Contents
- 1) Business Problem
- 2) Objectives & KPIs
- 3) Data Sources
- 4) Architecture
- 5) Feature Store Design
- 6) Anomaly Detection Logic
- 7) Explainability & Governance
- 8) Results & Impact
- 9) How to Run
- 10) Dashboard (Screens)
- 11) Credits & Contact
1) Business Problem
Healthcare payers process millions of claims per day, making manual detection of fraud, waste, and abuse impractical.
Traditional rule-based systems:
- Generate high false-positive rates
- Miss emerging patterns
- Provide limited explainability for audits and appeals
Special Investigations Unit (SIU) teams need a ranked, explainable view of which provider-months or member-months deserve attention first.
2) Objectives & KPIs
Objectives
- Identify abnormal utilization, cost, and coding behavior
- Benchmark providers against true peers (specialty + state + time)
- Reduce noise while preserving investigatory explainability
- Produce BI-ready outputs for operational triage
Core KPIs / Signals
- Claims per member
- Cost per claim (mean & p95)
- Weekend billing %
- E/M upcoding mix (99214 vs 99213)
- ICD-10 entropy & rare diagnosis rate
- Provider hopping index (members)
- Composite anomaly risk score
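Three of the signals above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the record fields (`service_date`, `cpt`, `icd10`) and the function names are assumptions, not the project's actual schema or code.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical claim records; field names are assumptions for illustration.
claims = [
    {"service_date": date(2024, 1, 6), "cpt": "99214", "icd10": "E11.9"},   # Saturday
    {"service_date": date(2024, 1, 8), "cpt": "99213", "icd10": "I10"},     # Monday
    {"service_date": date(2024, 1, 9), "cpt": "99214", "icd10": "E11.9"},   # Tuesday
    {"service_date": date(2024, 1, 14), "cpt": "99214", "icd10": "M54.5"},  # Sunday
]

def weekend_billing_pct(rows):
    """Share of claims whose service date falls on Saturday or Sunday."""
    if not rows:
        return 0.0
    weekend = sum(1 for r in rows if r["service_date"].weekday() >= 5)
    return weekend / len(rows)

def em_upcoding_mix(rows):
    """Share of 99214 among 99213 + 99214 visits (None if neither code appears)."""
    counts = Counter(r["cpt"] for r in rows)
    total = counts["99213"] + counts["99214"]
    return counts["99214"] / total if total else None

def icd10_entropy(rows):
    """Shannon entropy, in bits, of the diagnosis-code distribution."""
    counts = Counter(r["icd10"] for r in rows)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(weekend_billing_pct(claims), em_upcoding_mix(claims), icd10_entropy(claims))
```

Low entropy means a provider bills a narrow set of diagnoses, while a high 99214 share relative to peers is one possible upcoding indicator.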
3) Data Sources
Synthetic / De-identified Healthcare Claims
- Medical claims (ICD-10-CM, CPT, POS)
- Provider attributes (specialty, state)
- Member attributes (state, utilization)
- Allowed & paid amounts
The demo uses synthetic claims generated via Python to mirror real payer data structures.
4) Architecture
- Ingestion: Python generators & CSV inputs
- Profiling: Data quality, distributions, baselines
- Feature Store: Provider-month & member-month marts
- Detection: Peer robust z-scores + Isolation Forest
- Outputs: Ranked risk tables for BI
- Analytics: Tableau / Power BI dashboards
5) Feature Store Design
Provider-Month Features
- Claims, members, claims/member
- Allowed & paid cost aggregates
- Weekend billing %
- E/M mix & imaging rates
- ICD-10 entropy & rare diagnosis rate
- Rolling 3- and 6-month baselines
- Spike & shift indicators
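A rolling baseline and a spike indicator could look roughly like the sketch below. The function names, the choice to exclude the current month from its own baseline, and the 1.5x threshold are all assumptions for illustration; the project's actual definitions may differ.

```python
from statistics import mean

def rolling_baseline(values, window):
    """Mean of the previous `window` months, excluding the current month.

    Returns None for months that do not yet have a full window of history.
    """
    out = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]
        out.append(mean(history) if len(history) == window else None)
    return out

def spike_flags(values, window=3, ratio=1.5):
    """Flag months whose value exceeds `ratio` times the trailing baseline.

    The 1.5x default is an illustrative assumption, not a tuned threshold.
    """
    baselines = rolling_baseline(values, window)
    return [
        b is not None and b > 0 and v > ratio * b
        for v, b in zip(values, baselines)
    ]

# Hypothetical monthly claim counts for one provider.
monthly_claims = [100, 110, 90, 105, 240, 120]
print(rolling_baseline(monthly_claims, 3))
print(spike_flags(monthly_claims))
```

Excluding the current month keeps a spike from inflating the baseline it is compared against.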
Member-Month Features
- Claims & providers visited
- Provider hopping index
- Utilization & cost spikes
- Specialty diversity
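The provider hopping index is not defined in this document, so the sketch below uses one plausible definition as a stated assumption: within a member-month, the fraction of consecutive claims (ordered by service date) that switch to a different provider. Field names and the function name are hypothetical.

```python
from collections import defaultdict

def provider_hopping_index(claims):
    """Per member-month: fraction of consecutive claims that switch provider.

    0.0 means the member stayed with one provider; 1.0 means every claim
    was billed by a different provider than the previous one.
    """
    by_key = defaultdict(list)
    for c in claims:
        month = c["service_date"][:7]  # ISO date string -> "YYYY-MM"
        by_key[(c["member_id"], month)].append(c)

    result = {}
    for key, rows in by_key.items():
        rows.sort(key=lambda r: r["service_date"])
        providers = [r["provider_id"] for r in rows]
        switches = sum(a != b for a, b in zip(providers, providers[1:]))
        result[key] = switches / (len(providers) - 1) if len(providers) > 1 else 0.0
    return result

# Hypothetical claims for two members in one month.
sample = [
    {"member_id": "M1", "provider_id": "P1", "service_date": "2024-01-02"},
    {"member_id": "M1", "provider_id": "P2", "service_date": "2024-01-05"},
    {"member_id": "M1", "provider_id": "P3", "service_date": "2024-01-09"},
    {"member_id": "M1", "provider_id": "P1", "service_date": "2024-01-20"},
    {"member_id": "M2", "provider_id": "P9", "service_date": "2024-01-03"},
    {"member_id": "M2", "provider_id": "P9", "service_date": "2024-01-10"},
    {"member_id": "M2", "provider_id": "P9", "service_date": "2024-01-17"},
]
print(provider_hopping_index(sample))
```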
6) Anomaly Detection Logic
Explainable Layer (Peer Benchmarking)
- Robust z-scores within: specialty × state × month
- Highlights behavior above peer norms
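A common way to compute a robust z-score is to center on the peer median and scale by the median absolute deviation (MAD). The sketch below applies that within specialty, state, and month peer groups. It assumes the median/MAD formulation and hypothetical field names; the document does not state which robust estimator the project uses.

```python
from collections import defaultdict
from statistics import median

# Scaling constant that makes MAD comparable to a standard deviation
# when the underlying data are roughly normal.
MAD_SCALE = 1.4826

def peer_robust_z(rows, metric):
    """Robust z-score of `metric` within (specialty, state, month) peer groups."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["specialty"], r["state"], r["month"])].append(r[metric])

    # Precompute the median and MAD once per peer group.
    stats = {}
    for key, values in groups.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        stats[key] = (med, mad)

    scores = {}
    for r in rows:
        med, mad = stats[(r["specialty"], r["state"], r["month"])]
        # A zero MAD means peers are essentially identical; report 0.0 (an assumption).
        z = (r[metric] - med) / (MAD_SCALE * mad) if mad else 0.0
        scores[(r["provider_id"], r["month"])] = z
    return scores

# Hypothetical provider-months in one peer group.
peers = [
    {"provider_id": f"P{i}", "specialty": "cardiology", "state": "TX",
     "month": "2024-01", "cost_per_claim": cost}
    for i, cost in enumerate([100, 110, 90, 105, 400], start=1)
]
z = peer_robust_z(peers, "cost_per_claim")
print(z[("P5", "2024-01")])
```

Unlike a mean and standard deviation, the median and MAD are barely moved by the outlier itself, so the extreme provider still stands out clearly from its peers.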
Multivariate Layer (ML)
- Isolation Forest on utilization + cost + coding signals
- Captures nonlinear interactions missed by rules
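As a rough illustration of the multivariate layer, the sketch below fits scikit-learn's `IsolationForest` on three synthetic provider-month features. The feature choice, the synthetic distributions, and the hyperparameters are assumptions made for the example, not the project's configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 200 synthetic provider-months with three illustrative features:
# claims per member, cost per claim, and 99214 share of E/M visits.
normal = rng.normal(loc=[2.0, 150.0, 0.4], scale=[0.3, 20.0, 0.05], size=(200, 3))
# One provider-month that is extreme on all three features.
outlier = np.array([[6.0, 600.0, 0.95]])
X = np.vstack([normal, outlier])

model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(X)

# score_samples returns higher values for more normal points,
# so negate it to get a score where higher means more anomalous.
anomaly_score = -model.score_samples(X)
top = int(np.argmax(anomaly_score))
print(top, anomaly_score[top])
```

Isolation Forest splits on one feature at a time, so the features do not need to be scaled to a common range before fitting.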
Composite Risk Score
7) Explainability & Governance
- Every anomaly includes:
  - Peer deviation drivers
  - Trend context (rolling baselines)
  - Transparent flags (upcoding, weekend, rare ICD)
- Supports audit review and appeal workflows
- Avoids black-box risk scores
8) Results & Impact
9) How to Run
```bash
# Generate synthetic claims
python src/generate_synthetic_claims.py

# Build features
PYTHONPATH=. python scripts/run_02_feature_engineering.py

# Run anomaly detection
PYTHONPATH=. python scripts/run_03_anomaly_detection.py