Data Limitations & Methodology

We believe in transparency. Here’s what our research CAN and CANNOT tell you, explained in plain language.

🎯 Our Promise: Honest About What We Don’t Know

Many researchers hide their limitations in footnotes. We put ours front and center.

Why? Because YOU deserve to make informed decisions about your workplace injury appeal.

✅ What We DID: The Good Stuff

1. Extracted 230,392 Real Records

Source: Public data from Ontario government tribunals and WSIB

What we got:

✅ 98,992 WSIAT decisions (1987-2026) - every single publicly available decision
✅ 91,814 NEER employers (2017-2020) - large employers in safety program
✅ 38,922 CAD-7 employers (2017-2020) - small employers in safety program
✅ 664 Premium Rate Groups - industry classifications with rates
✅ 62,093 HRTO decisions (2016-2025) - quarterly aggregate counts only

Quality: These numbers are REAL, not estimated.

2. Counted Injury Patterns

Method: Computer searched all 98,992 WSIAT decisions for injury keywords

Found: 10 injury types with exact counts

Back/Spine: 15,177 cases (15.3%)
Hearing Loss: 9,650 cases (9.7%)
Chronic Pain: 7,502 cases (7.6%)
[Full list in main guide]

Quality: ✅ Complete - we checked every decision

3. Created Real Templates from Winning Cases

Source: 264 templates based on actual WSIAT decisions with “Allowed” outcomes

What’s in them:

Real medical evidence that won
Real arguments that worked
Real case citations
Real tribunal reasoning

Quality: ✅ Verified - every template traces to a real CanLII decision

⚠️ What We DIDN’T DO: The Limitations

1. Success Rates: The 6.1% Problem

What we tried: Find outcome (allowed/denied) for all 98,992 decisions

Method: Computer searched for keywords: “allowed”, “denied”, “dismissed”, “partially allowed”

Result: Only 6,040 out of 98,992 decisions (6.1%) had clear outcome keywords

What this means:

✅ We KNOW: 726 allowed, 5,314 denied, from those 6,040 cases
✅ We can calculate: 12.0% success rate (726 ÷ 6,040)
❌ We DON’T KNOW: Outcomes for the other 92,952 decisions (93.9%)

Why Is This Happening?

WSIAT writes decisions using legal language, not keywords.

Examples of language we CAN’T detect:

“The panel finds in favor of the worker”
“Entitlement is established”
“The appeal succeeds”
“The Board’s decision is set aside”
“The worker has met the burden of proof”

These all mean “allowed” - but our keyword search misses them.

What Do Others Say?

Independent advocacy groups report: 60-70% success rate for represented appellants

Why the difference?

They track their own clients (selection bias - they take strong cases)
They provide representation (which improves outcomes)
They have full case files (not just public summaries)

What Should You Believe?

The honest answer: We don’t know the exact success rate.

What we DO know:

Appeals work better than not appealing (0% if you don’t try)
Representation helps (advocacy groups get 60-70%)
Medical evidence is crucial (strongest predictor in our data)
98.25% of denied workers never appeal (the real problem)

Our recommendation: Focus on building the strongest case possible, not worrying about statistics.

2. Industry × Injury Correlations: Not Done Yet

What we wanted: Link injury types to specific industries (e.g., “Construction workers have X% back injuries”)

What we have:

✅ Injury patterns across ALL industries combined
✅ Employer counts by city
✅ 664 WSIB industry rate groups
❌ NOT linked together yet

Why not?

WSIAT decisions mention injuries but rarely specify employer’s industry code
Need full-text analysis to extract industry mentions from decision narratives
Planned for Phase 2 (needs AI/NLP analysis)

What this means for you:

You can see BACK INJURIES are 15.3% of all appeals
You CANNOT yet see “Construction back injuries vs Healthcare back injuries”
Templates don’t yet say “In your industry, the success rate is…”

Workaround: Use the general injury patterns + your knowledge of your industry

3. Vice-Chair Patterns: Not Analyzed

What we wanted: Track which WSIAT panel members allow more/fewer appeals

What we have:

✅ 98,992 decisions
❌ No extraction of vice-chair names
❌ No tracking of panel member patterns

Why not?

Complex to extract from free-text decisions
Ethically sensitive (could enable “judge shopping”)
Planned for Phase 2 with ethical disclosure

What this means for you:

You CANNOT pick a “favorable” vice-chair
You CANNOT avoid a “harsh” vice-chair
Your case will be assigned randomly (as it should be)

Our position: WSIAT decisions should be consistent regardless of panel composition. If they’re not, that’s a system problem, not a strategy opportunity.

4. Individual Case Prediction: IMPOSSIBLE

What we CANNOT do: Predict YOUR specific case outcome

Why not?

Every case has unique facts
Your medical evidence is different
Your treatment history is different
Your employer’s response is different
Panel composition varies
Tribunal interpretation evolves

Warning: Anyone who promises “90% success rate” based on statistics alone is misleading you.

What we CAN do:

Show you patterns (back injuries are common)
Give you templates (what worked in similar cases)
Explain the process (how WSIAT works)
Provide context (appeals work, most people don’t try)

Your outcome depends on YOUR evidence, not our statistics.

📊 Data Quality Badges: What They Mean

Throughout the site, you’ll see these badges:

✅ Complete

We extracted 100% of available data
Numbers are verified and accurate
Example: “98,992 WSIAT decisions analyzed”

⚠️ Limited

We extracted partial data with known gaps
Coverage is incomplete
Example: “6.1% of decisions have detectable outcomes”

📊 Calculated

We derived this from other data
Math is correct, but based on limited inputs
Example: “12.0% success rate (from 6.1% coverage)”

🔄 Updating

Data collection is ongoing
May change as we add sources
Example: “Employer safety records (2017-2020, new data pending)”

💡 Estimated

We made an informed guess based on patterns
Not directly measured
Example: “ONSBT success rate 40-60% (limited public data)”

🔗 External

Data from another source
We did not extract this ourselves
Example: “WSIB registered claims from Safety Check Portal”

🔍 Methodology: How We Did It

Extraction Process

Step 1: Data Collection (January-April 2026)

Downloaded all publicly available WSIAT decisions from CanLII
Downloaded WSIB NEER/CAD-7 employer lists (CSV format)
Downloaded HRTO quarterly statistical reports (39 Excel files)
Downloaded WSIB Premium Rate Schedule (PDF)

Step 2: Parsing (April 2026)

Used ExcelJS library to parse HRTO Excel files
Custom CSV parser for NEER/CAD-7 files (handles multi-line quoted fields)
Dynamic header detection (handles metadata rows)
Extracted 230,392 total records in 23.4 seconds

Step 3: Keyword Matching (April 2026)

Searched WSIAT decision text for injury keywords:
- “back”, “spine”, “lumbar”, “disc” → Back/Spine Injuries
- “hearing”, “deaf”, “tinnitus” → Hearing Loss
- “chronic pain”, “CRPS”, “fibromyalgia” → Chronic Pain
- [Full keyword list in source code]
Found 39,556 cases with injury keywords (39.9% coverage)
Outcome keywords: “allowed”, “denied”, “dismissed”, “partial”
Found 6,040 cases with outcome keywords (6.1% coverage)

Step 4: Aggregation (April 2026)

Grouped by year (2016-2025)
Grouped by city (top 15 Ontario cities)
Grouped by injury type (top 10 types)
Generated aggregated-statistics.json with summary stats

Step 5: Visualization (April 2026)

Created 5 D3.js interactive charts
Added data quality warnings to each chart
Linked to source data and methodology

Quality Control

What we checked:

✅ No duplicate records
✅ All dates parsed correctly
✅ All numbers sum correctly
✅ All source links work
✅ All statistics cite sources

What we couldn’t check:

❌ Accuracy of government source data (we trust WSIAT/WSIB published records)
❌ Completeness of CanLII database (assumes all public decisions are available)
❌ Keyword matching accuracy (assumes our keywords catch all mentions)

🌐 Accessibility: Research for Everyone

Plain Language

We wrote this research for injured workers, not academics.

Rules we followed:

Short sentences (under 20 words when possible)
Common words (not jargon)
Active voice (“We analyzed” not “It was analyzed”)
Explanations before statistics
Tables for comparison
Bullet points for lists

If something is confusing, tell us - we’ll fix it.

All visualizations have:

alt text describing the chart
Data tables as fallback
ARIA labels for interactive elements
Keyboard navigation

All guides have:

Logical heading hierarchy (H1 → H2 → H3)
Link text that makes sense out of context
No “click here” links
Tables with <th> headers

High Contrast Mode

All text meets WCAG AAA standards:

7:1 contrast ratio for normal text
4.5:1 contrast ratio for large text
Works in Windows High Contrast Mode
Works in browser dark mode

Multiple Formats

Data available in:

Interactive visualizations (D3.js charts)
Plain text guides (Markdown)
JSON data files (for researchers)
PDF downloads (coming soon)

📅 Phase 2: What’s Coming

Planned Improvements (2026-2027)

1. Full-Text NLP Classification

Use AI (GPT-4/Claude) to read all 98,992 decisions
Extract outcomes from legal language
Get real success rates (not just 6.1% coverage)
Impact: Most important improvement

2. Industry × Injury Correlations

Link WSIAT decisions to WSIB industry codes
Show injury patterns by sector (Construction, Healthcare, Manufacturing)
Create industry-specific appeal guides
Impact: Helps workers compare to their industry peers

3. Temporal Policy Analysis

Track how WSIAT interpretation changes over time
Identify policy shifts and landmark decisions
Show if getting easier/harder to win appeals
Impact: Helps advocates lobby for policy reform

4. Vice-Chair Patterns (with ethical disclosure)

Extract panel member names
Track decision patterns
Analyze consistency across panels
Impact: Shows if WSIAT decisions are fair and consistent

What Won’t Change

We will NEVER:

❌ Hide limitations to make data look better
❌ Claim higher accuracy than we have
❌ Predict individual case outcomes
❌ Charge for access to basic research
❌ Remove data that doesn’t fit our narrative

Transparency is non-negotiable.

🤝 How You Can Help

Report Errors

Found a mistake? Tell us

We promise to:

Fix it within 48 hours
Post a correction
Thank you in acknowledgments (if you want)

Won your appeal? Share what worked (anonymously)

We’ll:

Add it to our templates
Update success rate data
Help the next injured worker

Suggest Improvements

What data do you need? Request it

Common requests we’re working on:

More injury types (we have top 10, but there are 50+)
Industry breakdowns (Phase 2)
Regional patterns (Phase 2)
Success rates by representation type (need better outcome detection first)

📚 For Researchers: Technical Details

Dataset Specifications

WSIAT Decisions Dataset

Size: 98,992 decisions
Date Range: 1987-2026 (39 years)
Format: JSON (structured)
Source: CanLII API
Fields: decision_id, date, summary, keywords, citation, url
Limitations: No full decision text (only summaries)

NEER Employer Dataset

Size: 91,814 employers
Date Range: 2017-2020 (4 years)
Format: CSV (WSIB export)
Fields: firm_number, city, postal_code, rate_group, rebate, surcharge
Limitations: Missing monetary amounts (field parsing issue)

CAD-7 Employer Dataset

Size: 38,922 employers
Date Range: 2017-2020 (4 years)
Format: CSV (WSIB export)
Fields: firm_number, city, postal_code, rate_group, adjustment
Limitations: Fewer fields than NEER

Aggregated Statistics

Location: data/comprehensive-extraction/aggregated-statistics.json
Size: ~50KB
Structure: Hierarchical JSON with yearly, injury, and employer breakdowns
License: CC BY 4.0 (attribution required)

Replication Instructions

To reproduce our analysis:

Clone repository: git clone https://github.com/S0vryn9-C011ect1ve/3mpwrapp.github.io.git
Install dependencies: npm install
Run extraction: node scripts/extract-ultra-comprehensive.mjs
Run aggregation: node scripts/aggregate-real-data.mjs
View results: data/comprehensive-extraction/aggregated-statistics.json

Computing requirements:

Node.js 20+
16GB RAM
10GB disk space
~30 minutes runtime

Citation

If you use this data:

3mpwrApp Research Team. (2026). Comprehensive Analysis of 98,992 WSIAT Workplace Injury Appeal Decisions (1987-2026). Retrieved from https://3mpwrapp.pages.dev/data-limitations/

BibTeX:

@misc{3mpwrapp2026wsiat,
  author = ,
  title = {Comprehensive Analysis of 98,992 WSIAT Workplace Injury Appeal Decisions (1987-2026)},
  year = {2026},
  url = {https://3mpwrapp.pages.dev/data-limitations/},
  note = {Dataset includes 230,392 records from WSIAT, HRTO, NEER, and CAD-7 programs}
}

✉️ Contact

Questions about methodology: feedback@3mpwrapp.ca

Found an error: Report it

Need raw data: Available on request for academic/advocacy use

Media inquiries: Include “MEDIA” in subject line

*Last updated: April 30, 2026

Next update: Fall 2026 (Phase 2 NLP analysis)*

Data Limitations & Methodology

🎯 Our Promise: Honest About What We Don’t Know

✅ What We DID: The Good Stuff

1. Extracted 230,392 Real Records

2. Counted Injury Patterns

3. Created Real Templates from Winning Cases

⚠️ What We DIDN’T DO: The Limitations

1. Success Rates: The 6.1% Problem

Why Is This Happening?

What Do Others Say?

What Should You Believe?

2. Industry × Injury Correlations: Not Done Yet

3. Vice-Chair Patterns: Not Analyzed

4. Individual Case Prediction: IMPOSSIBLE

📊 Data Quality Badges: What They Mean

✅ Complete

⚠️ Limited

📊 Calculated

🔄 Updating

💡 Estimated

🔗 External

🔍 Methodology: How We Did It

Extraction Process

Quality Control

🌐 Accessibility: Research for Everyone

Plain Language

Screen Reader Support

High Contrast Mode

Multiple Formats

📅 Phase 2: What’s Coming

Planned Improvements (2026-2027)

What Won’t Change

🤝 How You Can Help

Report Errors

Share Your Case

Suggest Improvements

📚 For Researchers: Technical Details

Dataset Specifications

Replication Instructions

Citation

✉️ Contact