How Accurate Are Outcome Predictions?

Short answer: Our AI model correctly predicts tribunal outcomes 79.0% of the time when tested on cases it’s never seen before.


How We Measured Accuracy

Training vs. Testing Split

  • Training set: 252,978 decisions (~98.5%) - Used to teach the AI
  • Test set: 3,756 decisions (~1.5%) - Hidden from the AI, used only for accuracy testing
  • Result: 79.0% of test cases predicted correctly
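The split-and-test procedure above can be sketched in a few lines with scikit-learn. The dataset, features, and model below are synthetic stand-ins, not the actual pipeline: the point is only the shape of the evaluation (hold out a small test set, train on the rest, score accuracy on the held-out cases).

```python
# Sketch of the evaluation described above, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in "decisions": 5,000 samples with 20 numeric features
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold out ~2% for testing; the model never sees these during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.02, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.1%}")
```

Holding the test set completely apart from training is what makes the accuracy number meaningful: scoring on cases the model has already seen would inflate it.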

What Does 79% Mean?

  • What it measures: Across all test cases, the predicted outcome matched the actual decision 79% of the time
  • What it doesn’t guarantee: Overall accuracy is not a per-prediction promise; an “Allowed” prediction and a “Dismissed” prediction can each be right more or less often than 79%
  • 21% error rate: Roughly 1 in 5 predictions is wrong
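The distinction between overall accuracy and per-outcome reliability is easy to see in a small worked example (illustrative labels, not real data): a model can be exactly 79% accurate overall while its “Allowed” and “Dismissed” predictions have different precision.

```python
# Overall accuracy vs. per-class precision on illustrative labels.
from sklearn.metrics import accuracy_score, precision_score

# 70 truly Allowed cases, 30 truly Dismissed cases
y_true = ["Allowed"] * 70 + ["Dismissed"] * 30
# Model gets 65 of the Allowed and 14 of the Dismissed right
y_pred = (["Allowed"] * 65 + ["Dismissed"] * 5
          + ["Allowed"] * 16 + ["Dismissed"] * 14)

acc = accuracy_score(y_true, y_pred)  # (65 + 14) / 100 = 0.79
p_allowed = precision_score(y_true, y_pred, pos_label="Allowed")      # 65/81
p_dismissed = precision_score(y_true, y_pred, pos_label="Dismissed")  # 14/19
print(f"accuracy={acc:.2f}  "
      f"P(Allowed correct)={p_allowed:.2f}  "
      f"P(Dismissed correct)={p_dismissed:.2f}")
```

Here overall accuracy is 0.79, but an “Allowed” prediction is right about 80% of the time while a “Dismissed” prediction is right about 74% of the time.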

Industry Context

  • Legal AI benchmarks: 70-85% is typical for outcome prediction tasks
  • Human accuracy: Published studies of lawyer outcome forecasting report 66-75% accuracy for experienced lawyers
  • Our 79% accuracy: Above average, but not perfect

Confidence Levels Explained

High Confidence (≥80% probability)

  • What we show: Only high-confidence predictions appear in app search results
  • Example: “Allowed (85% confidence)” = AI is 85% certain this is correct
  • Deployment: 25,213 decisions (18.4%) meet high-confidence threshold
  • Your risk: 15-20% chance the prediction is wrong

Medium Confidence (60-79% probability)

  • What we show: Warning label: “AI predicted, medium confidence”
  • Example: “Dismissed (72% confidence)” = AI thinks it’s probably Dismissed, but less certain
  • Your risk: 21-40% chance the prediction is wrong

Low Confidence (<60% probability)

  • What we show: “Outcome unknown” or no prediction displayed
  • Example: “Allowed (48% confidence)” = AI is guessing—don’t trust it
  • Not deployed: 11,430 WSIAT decisions (2020-2026) flagged low-confidence
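The three tiers above reduce to a simple threshold rule on the model’s predicted probability. This sketch mirrors the cutoffs as described in this section (the function name is hypothetical, not the app’s actual code):

```python
# Map a prediction probability to the display treatment described above.
def display_tier(prob: float) -> str:
    """Return how the app treats a prediction with this probability."""
    if prob >= 0.80:
        return "show prediction"          # high confidence
    if prob >= 0.60:
        return "show with warning label"  # medium confidence
    return "outcome unknown"              # low confidence: not displayed

print(display_tier(0.85))  # show prediction
print(display_tier(0.72))  # show with warning label
print(display_tier(0.48))  # outcome unknown
```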

Why Some Predictions Are Wrong

1. Limited Training Data for Rare Outcomes

  • Problem: Only 27 “Settled” cases in training data vs. 47,198 “Granted” cases
  • Result: AI is terrible at predicting settlements (too few examples)
  • Your risk: Higher for rare outcomes like “Partial Win” or “No Jurisdiction”
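The imbalance above is extreme in rough numbers. Using the counts quoted in this section, the model sees one “Settled” example for every ~1,748 “Granted” examples, which is why it rarely learns to predict the rare class:

```python
# Imbalance ratio from the counts quoted above.
from collections import Counter

counts = Counter({"Granted": 47198, "Settled": 27})
ratio = counts["Granted"] / counts["Settled"]
print(f"Granted-to-Settled ratio: about {ratio:.0f} to 1")
```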

2. Missing Context in Keywords

  • Problem: CanLII API returns only brief keywords, not full decision text
  • Example: Decision says “chronic pain,” but doesn’t say if medical evidence was strong
  • Result: AI can’t distinguish between well-documented vs. poorly-documented chronic pain claims
  • Your risk: Higher for recent WSIAT cases (2020-2026) with sparse keywords

3. Tribunal Policy Changes

  • Problem: AI learns from historical decisions (2020-2023), but tribunals change policies
  • Example: If WSIAT started favoring chronic pain claims more in 2025, AI trained on 2020-2023 data won’t know
  • Your risk: Higher for very recent decisions (2025-2026)

4. Case-Specific Nuances

  • Problem: Two “low back pain” cases can have completely different evidence quality
  • Example: Case A has 5 specialist reports + employer incident report; Case B has only family doctor note
  • Result: AI sees both as “low back pain” but can’t distinguish evidence strength
  • Your risk: Always compare your evidence quality to similar cases

When to Trust Predictions

✅ Trust predictions when:

  • High confidence (≥80%): AI is relatively certain
  • Common outcome: “Allowed,” “Granted,” “Dismissed” (AI has seen thousands of examples)
  • Historical WSIAT data: Pre-2020 decisions with rich keywords
  • HRTO/ONSBT: These tribunals have more consistent outcome patterns

⚠️ Question predictions when:

  • Medium/low confidence: AI is uncertain
  • Rare outcome: “Partial Win,” “Settled,” “No Jurisdiction”
  • Recent WSIAT (2020-2026): Sparse keywords, lower AI accuracy
  • Your case has unique factors: Complex pre-existing conditions, multiple injuries, policy violations

Comparing to Official Statistics

Data Source                      WSIAT Win Rate            Notes
WSIAT Annual Report (official)   65-73%                    Worker success rate on entitlement appeals
Our AI Predictions               100% (onwsiat tribunal)   Likely overstates—reflects data limitations
Our AI Predictions               86.4% (BC WCAT)           More accurate—better training data
Our AI Predictions               84.1% (other tribunals)   Averaged across mixed jurisdictions

Conclusion: AI predictions are directionally correct but not gospel truth. Use them as a starting point, not a guarantee.


How to Use Predictions Responsibly

1. Compare to Similar Cases

  • Don’t just look at AI prediction—read 5-10 similar cases manually
  • Ask: “Do those cases have better/worse evidence than mine?”
  • If your evidence is weaker, your real odds are lower than AI prediction

2. Focus on Patterns, Not Individual Cases

  • AI says “chronic pain” has 87% success rate across 1,200 cases? That’s useful.
  • AI says your specific case will be Allowed? That’s less reliable.

3. Consult a Lawyer for High-Stakes Cases

  • If your appeal affects $50K+ in benefits, don’t rely solely on AI
  • Lawyers can assess evidence quality (AI can’t)

4. Check Confidence Level

  • High confidence: Reasonable to trust as a data point
  • Medium confidence: Treat as “maybe, maybe not”
  • Low confidence: Ignore—look at similar cases manually instead

Ongoing Improvements

What We’re Doing to Increase Accuracy

  1. Requesting official outcome data from WSIAT (covers 11,430 low-confidence cases)
  2. Adding full decision text (when available) instead of keywords only
  3. Retraining model quarterly as new decisions are published
  4. Integrating user feedback (“Was this prediction accurate?”) to identify weak spots

How You Can Help

  • Report errors: If you know the actual outcome and it differs from prediction, let us know
  • Share your case: If you win/lose an appeal, tell us what worked/didn’t (anonymously)


📧 Email: empowrapp08162025@gmail.com
🔗 Mastodon: @3mpwrApp@mastodon.social
🔗 Bluesky: @3mpwrapp.bsky.social