How Accurate Are Outcome Predictions?
Short answer: Our AI model correctly predicts tribunal outcomes 79.0% of the time when tested on cases it’s never seen before.
How We Measured Accuracy
Training vs. Testing Split
- Training set: 252,978 decisions (~98.5%) - Used to teach the AI
- Test set: 3,756 decisions (~1.5%) - Hidden from the AI, used only for accuracy testing
- Result: 79.0% of test cases predicted correctly
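To make the evaluation concrete, here is a minimal sketch of a hold-out test like the one described above, using scikit-learn and synthetic stand-in data (the real features, labels, and model are not shown in this document):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the real keyword features and outcome labels.
X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)

# Hold out a small test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.015, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy = fraction of held-out cases predicted correctly.
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```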
What Does 79% Mean?
- Overall accuracy: Across all test cases, 79% of predictions matched the actual decision
- Per-label accuracy varies: The hit rate when the AI predicts “Allowed” is not necessarily the same as when it predicts “Dismissed” (see the sketch below)
- 21% error rate: Roughly 1 in 5 predictions is wrong
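A caveat worth making explicit: 79% is the overall accuracy, while the hit rate for a specific predicted label is a different statistic (per-class precision). A toy sketch of the difference, using scikit-learn and made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score

# Illustrative ground truth and predictions for six test cases.
y_true = ["Allowed", "Dismissed", "Allowed", "Allowed", "Dismissed", "Allowed"]
y_pred = ["Allowed", "Dismissed", "Dismissed", "Allowed", "Allowed", "Allowed"]

# Overall accuracy: one number across all cases.
print(accuracy_score(y_true, y_pred))  # 0.667

# Per-class precision: how often each predicted label turns out to be right.
print(precision_score(y_true, y_pred, labels=["Allowed", "Dismissed"],
                      average=None))  # [0.75, 0.5] -- differs per label
```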
Industry Context
- Legal AI benchmarks: 70-85% is typical for outcome prediction tasks
- Human accuracy: Experienced lawyers predict outcomes at 66-75% accuracy (according to legal research)
- Our 79% accuracy: Above average, but not perfect
Confidence Levels Explained
High Confidence (≥80% probability)
- What we show: Only high-confidence predictions appear in app search results
- Example: “Allowed (85% confidence)” = AI is 85% certain this is correct
- Deployment: 25,213 decisions (18.4%) meet the high-confidence threshold
- Your risk: 15-20% chance the prediction is wrong
Medium Confidence (60-79% probability)
- What we show: Warning label: “AI predicted, medium confidence”
- Example: “Dismissed (72% confidence)” = AI thinks it’s probably Dismissed, but less certain
- Your risk: 21-40% chance the prediction is wrong
Low Confidence (<60% probability)
- What we show: “Outcome unknown” or no prediction displayed
- Example: “Allowed (48% confidence)” = AI is guessing; don’t trust it
- Not deployed: 11,430 WSIAT decisions (2020-2026) flagged low-confidence
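Putting the three tiers together, here is a hypothetical sketch of how the thresholds above could drive what the app displays; the function name and display strings are illustrative, not the app’s actual code:

```python
def display_prediction(label: str, probability: float) -> str:
    """Map a predicted label and its probability to the display rules above."""
    if probability >= 0.80:
        # High confidence: shown in app search results.
        return f"{label} ({probability:.0%} confidence)"
    if probability >= 0.60:
        # Medium confidence: shown with a warning label.
        return f"{label} ({probability:.0%} confidence) [AI predicted, medium confidence]"
    # Low confidence: no prediction displayed.
    return "Outcome unknown"

print(display_prediction("Allowed", 0.85))    # Allowed (85% confidence)
print(display_prediction("Dismissed", 0.72))  # Dismissed (72% confidence) [AI predicted, medium confidence]
print(display_prediction("Allowed", 0.48))    # Outcome unknown
```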
Why Some Predictions Are Wrong
1. Limited Training Data for Rare Outcomes
- Problem: Only 27 “Settled” cases in training data vs. 47,198 “Granted” cases
- Result: AI is terrible at predicting settlements (too few examples)
- Your risk: Higher for rare outcomes like “Partial Win” or “No Jurisdiction”
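One standard mitigation for this kind of imbalance is to weight rare classes more heavily during training. A sketch using the counts above and the common inverse-frequency (“balanced”) weighting formula; whether our pipeline does exactly this is an implementation detail not covered here:

```python
from collections import Counter

# Outcome counts from the training data described above.
counts = Counter({"Granted": 47_198, "Settled": 27})

# Inverse-frequency weights: n_samples / (n_classes * n_label).
# Rare classes get much larger weights, so errors on them cost more in training.
total = sum(counts.values())
weights = {label: total / (len(counts) * n) for label, n in counts.items()}
print(weights)  # {'Granted': ~0.50, 'Settled': ~874.5}
```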
2. Missing Context in Keywords
- Problem: The CanLII API returns only brief keywords, not the full decision text
- Example: Decision says “chronic pain,” but doesn’t say if medical evidence was strong
- Result: AI can’t distinguish between well-documented vs. poorly-documented chronic pain claims
- Your risk: Higher for recent WSIAT cases (2020-2026) with sparse keywords
3. Tribunal Policy Changes
- Problem: The AI learns from historical decisions (2020-2023), but tribunals change their policies over time
- Example: If WSIAT started favoring chronic pain claims more in 2025, AI trained on 2020-2023 data won’t know
- Your risk: Higher for very recent decisions (2025-2026)
4. Case-Specific Nuances
- Problem: Two “low back pain” cases can have completely different evidence quality
- Example: Case A has 5 specialist reports + employer incident report; Case B has only family doctor note
- Result: AI sees both as “low back pain” but can’t distinguish evidence strength
- Your risk: Higher if your evidence is weaker than in similar cases; always compare evidence quality before relying on the prediction
When to Trust Predictions
✅ Trust predictions when:
- High confidence (≥80%): AI is relatively certain
- Common outcome: “Allowed,” “Granted,” “Dismissed” (AI has seen thousands of examples)
- Historical WSIAT data: Pre-2020 decisions with rich keywords
- HRTO/ONSBT: These tribunals have more consistent outcome patterns
⚠️ Question predictions when:
- Medium/low confidence: AI is uncertain
- Rare outcome: “Partial Win,” “Settled,” “No Jurisdiction”
- Recent WSIAT (2020-2026): Sparse keywords, lower AI accuracy
- Your case has unique factors: Complex pre-existing conditions, multiple injuries, policy violations
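The checklist above can be read as a simple decision rule. A hypothetical helper that encodes it (the names and the keyword-richness flag are illustrative, not part of the app):

```python
# Outcomes with thousands of training examples.
COMMON_OUTCOMES = {"Allowed", "Granted", "Dismissed"}

def should_trust(outcome: str, confidence: float, has_rich_keywords: bool) -> bool:
    """Return True only when all of the 'trust' conditions above hold."""
    return (
        confidence >= 0.80              # high confidence
        and outcome in COMMON_OUTCOMES  # common outcome
        and has_rich_keywords           # e.g. pre-2020 WSIAT decisions
    )

print(should_trust("Allowed", 0.85, True))  # True
print(should_trust("Settled", 0.85, True))  # False: rare outcome
print(should_trust("Allowed", 0.72, True))  # False: medium confidence
```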
Comparing to Official Statistics
| Data Source | Worker Win Rate | Notes |
|---|---|---|
| WSIAT Annual Report (official) | 65-73% | Worker success rate on entitlement appeals |
| Our AI predictions (onwsiat tribunal) | 100% | Likely overstated; reflects data limitations |
| Our AI predictions (BC WCAT) | 86.4% | More accurate; better training data |
| Our AI predictions (other tribunals) | 84.1% | Averaged across mixed jurisdictions |
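For readers who want to reproduce this kind of comparison, here is a hypothetical sketch of computing per-tribunal predicted win rates with pandas; the DataFrame and its column names are illustrative:

```python
import pandas as pd

# Illustrative predictions table; in practice, one row per decision.
df = pd.DataFrame({
    "tribunal": ["onwsiat", "onwsiat", "bcwcat", "bcwcat", "bcwcat"],
    "predicted_outcome": ["Allowed", "Allowed", "Allowed", "Allowed", "Dismissed"],
})

# Predicted worker win rate per tribunal, comparable to the table above.
win_rate = (df["predicted_outcome"] == "Allowed").groupby(df["tribunal"]).mean()
print(win_rate)  # bcwcat 0.67, onwsiat 1.00 (toy numbers)
```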
Conclusion: AI predictions are directionally correct but not gospel truth. Use them as a starting point, not a guarantee.
How to Use Predictions Responsibly
1. Compare to Similar Cases
- Don’t rely on the AI prediction alone; read 5-10 similar cases yourself
- Ask: “Do those cases have better or worse evidence than mine?”
- If your evidence is weaker, your real odds are lower than the AI prediction suggests
2. Focus on Patterns, Not Individual Cases
- AI says “chronic pain” has an 87% success rate across 1,200 cases? That’s useful.
- AI says your specific case will be Allowed? That’s less reliable.
3. Consult a Lawyer for High-Stakes Cases
- If your appeal affects $50K+ in benefits, don’t rely solely on AI
- Lawyers can assess evidence quality (AI can’t)
4. Check Confidence Level
- High confidence: Reasonable to trust as a data point
- Medium confidence: Treat as “maybe, maybe not”
- Low confidence: Ignore it; look at similar cases manually instead
Ongoing Improvements
What We’re Doing to Increase Accuracy
- Requesting official outcome data from WSIAT (covers 11,430 low-confidence cases)
- Adding full decision text (when available) instead of keywords only
- Retraining model quarterly as new decisions are published
- Integrating user feedback (“Was this prediction accurate?”) to identify weak spots
How You Can Help
- Report errors: If you know the actual outcome and it differs from prediction, let us know
- Share your case: If you win/lose an appeal, tell us what worked/didn’t (anonymously)
Related Resources
- Understanding Tribunal Outcomes - What “Allowed,” “Dismissed,” etc. mean
- What Affects Your Appeal Outcome? - Evidence factors that predict success
- All Outcome Statistics - Full research methodology
📧 Email: empowrapp08162025@gmail.com
🔗 Mastodon: @3mpwrApp@mastodon.social
🔗 Bluesky: @3mpwrapp.bsky.social