
⚡ TL;DR — Executive Summary:

Healthcare Claims Anomaly Detection for Fraud, Waste & Abuse (FWA)

Builds an explainable anomaly detection pipeline to flag abnormal provider and member behavior using ICD-10 coding patterns, utilization intensity, cost signals, and peer benchmarks.

Designed for SIU / Program Integrity teams to prioritize investigations with transparent drivers rather than opaque black-box scores.

Stack: Python (pandas, scikit-learn) • Feature Engineering • Isolation Forest • Robust Peer Z-Scores • Tableau / Power BI

1) Business Problem
2) Objectives & KPIs
3) Data Sources
4) Architecture
5) Feature Store Design
6) Anomaly Detection Logic
7) Explainability & Governance
8) Results & Impact
9) How to Run
10) Dashboard (Screens)
11) Credits & Contact

1) Business Problem

Healthcare payers process millions of claims per day, making manual detection of fraud, waste, and abuse impractical.

Traditional rule-based systems:

Generate many false positives
Miss emerging patterns
Provide limited explainability for audits and appeals

SIU teams need a ranked, explainable view of which provider-months or member-months deserve attention first.

2) Objectives & KPIs

Objectives

Identify abnormal utilization, cost, and coding behavior
Benchmark providers against true peers (specialty + state + time)
Reduce noise while preserving investigatory explainability
Produce BI-ready outputs for operational triage

Core KPIs / Signals

Claims per member
Cost per claim (mean & p95)
Weekend billing %
E/M upcoding mix (99214 vs 99213)
ICD-10 entropy & rare diagnosis rate
Provider hopping index (member level)
Composite anomaly risk score
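A minimal pandas sketch of how the provider-level KPIs above could be rolled up per provider-month. The function name `provider_month_kpis` and the columns (`provider_id`, `member_id`, `service_date`, `allowed_amt`, `cpt`) are hypothetical, not the project's actual schema. The E/M mix is assumed here to mean the share of 99214 among 99213 + 99214 lines.

```python
import pandas as pd

def provider_month_kpis(claims: pd.DataFrame) -> pd.DataFrame:
    """Roll claim lines up to one KPI row per provider-month (hypothetical schema)."""
    df = claims.copy()
    df["service_date"] = pd.to_datetime(df["service_date"])
    df["month"] = df["service_date"].dt.to_period("M")
    # pandas dayofweek runs Monday=0 through Sunday=6, so >= 5 means Sat/Sun
    df["is_weekend"] = df["service_date"].dt.dayofweek >= 5
    df["is_99214"] = df["cpt"] == "99214"
    df["is_99213"] = df["cpt"] == "99213"

    out = (
        df.groupby(["provider_id", "month"])
        .agg(
            claims=("member_id", "size"),
            members=("member_id", "nunique"),
            cost_mean=("allowed_amt", "mean"),
            cost_p95=("allowed_amt", lambda s: s.quantile(0.95)),
            weekend_pct=("is_weekend", "mean"),
            n_99214=("is_99214", "sum"),
            n_99213=("is_99213", "sum"),
        )
        .reset_index()
    )
    out["claims_per_member"] = out["claims"] / out["members"]
    em_total = out["n_99214"] + out["n_99213"]
    # Share of 99214 among 99213 + 99214 lines; NaN when neither code was billed
    out["em_99214_share"] = out["n_99214"] / em_total.where(em_total > 0)
    return out
```

Each output row is one provider-month, which is the grain the ranked SIU view works at.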

3) Data Sources

Synthetic / De-identified Healthcare Claims

Medical claims (ICD-10-CM, CPT, POS)
Provider attributes (specialty, state)
Member attributes (state, utilization)
Allowed & paid amounts

The demo uses synthetic claims, generated in Python, that mirror real payer data structures.
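For illustration, a small seeded generator in the same spirit. This is not the contents of `src/generate_synthetic_claims.py`; the function name, columns, code lists, and distributions (log-normal allowed amounts, paid at 70-100% of allowed) are all assumptions chosen to resemble a claims extract.

```python
import numpy as np
import pandas as pd

def generate_synthetic_claims(n_claims=1_000, n_providers=50, n_members=300, seed=42):
    """Hypothetical synthetic claim lines; every distribution here is an assumption."""
    rng = np.random.default_rng(seed)
    # Each provider gets a fixed specialty and state so peer groups are stable
    provider_specialty = rng.choice(np.array(["Family Medicine", "Cardiology", "Radiology"]), size=n_providers)
    provider_state = rng.choice(np.array(["CA", "TX", "NY"]), size=n_providers)

    provider_idx = rng.integers(0, n_providers, size=n_claims)
    dates = pd.Timestamp("2024-01-01") + pd.to_timedelta(rng.integers(0, 180, size=n_claims), unit="D")
    claims = pd.DataFrame({
        "claim_id": np.arange(n_claims),
        "provider_id": [f"P{i:03d}" for i in provider_idx],
        "member_id": [f"M{i:04d}" for i in rng.integers(0, n_members, size=n_claims)],
        "specialty": provider_specialty[provider_idx],
        "state": provider_state[provider_idx],
        "service_date": dates,
        "icd10": rng.choice(["E11.9", "I10", "J06.9", "Z00.00"], size=n_claims),
        "cpt": rng.choice(["99213", "99214", "71046"], size=n_claims, p=[0.6, 0.3, 0.1]),
        "pos": rng.choice(["11", "22", "23"], size=n_claims),
        # Right-skewed costs, as allowed amounts usually are
        "allowed_amt": rng.lognormal(mean=4.5, sigma=0.6, size=n_claims).round(2),
    })
    # Paid is assumed to be 70-100% of allowed (member cost share)
    claims["paid_amt"] = (claims["allowed_amt"] * rng.uniform(0.7, 1.0, size=n_claims)).round(2)
    return claims
```

Fixing the seed keeps demo runs reproducible, which matters when comparing detection results across iterations.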

4) Architecture

- Ingestion: Python generators & CSV inputs
- Profiling: Data quality, distributions, baselines
- Feature Store: Provider-month & member-month marts
- Detection: Peer robust z-scores + Isolation Forest
- Outputs: Ranked risk tables for BI
- Analytics: Tableau / Power BI dashboards

Figure: Healthcare Claims Anomaly Detection Architecture

5) Feature Store Design

Provider-Month Features

Claims, members, claims/member
Allowed & paid cost aggregates
Weekend billing %
E/M mix & imaging rates
ICD-10 entropy & rare diagnosis rate
Rolling 3- and 6-month baselines
Spike & shift indicators
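A sketch of the coding and trend features above: Shannon entropy of each provider-month's ICD-10 mix, a rare-diagnosis rate, prior-month rolling baselines, and a spike flag. The definitions are assumptions: "rare" means a code below `rare_threshold` of all claims, windows count months present in the data (so gap months are not filled), and a spike is claims above `spike_ratio` times the 3-month baseline.

```python
import numpy as np
import pandas as pd

def icd_entropy(codes: pd.Series) -> float:
    """Shannon entropy (bits) of one provider-month's ICD-10 code distribution."""
    p = codes.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def coding_and_trend_features(claims, rare_threshold=0.01, spike_ratio=2.0):
    df = claims.copy()
    df["month"] = pd.to_datetime(df["service_date"]).dt.to_period("M")
    # Assumed rarity rule: the code's share of all claims is below rare_threshold
    code_freq = df["icd10"].map(df["icd10"].value_counts(normalize=True))
    df["is_rare_dx"] = code_freq < rare_threshold

    feats = (
        df.groupby(["provider_id", "month"])
        .agg(claims=("icd10", "size"),
             icd_entropy=("icd10", icd_entropy),
             rare_dx_rate=("is_rare_dx", "mean"))
        .reset_index()
        .sort_values(["provider_id", "month"])
    )
    by_prov = feats.groupby("provider_id")["claims"]
    for w in (3, 6):
        # shift(1) so the baseline uses only prior months, never the current one
        feats[f"claims_base_{w}m"] = by_prov.transform(
            lambda s, w=w: s.shift(1).rolling(w, min_periods=1).mean()
        )
    # First month has no baseline (NaN), so the comparison is False there
    feats["claims_spike"] = feats["claims"] > spike_ratio * feats["claims_base_3m"]
    return feats
```

Excluding the current month from its own baseline keeps a sudden jump from diluting the very reference it is compared against.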

Member-Month Features

Claims & providers visited
Provider hopping index
Utilization & cost spikes
Specialty diversity
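Member-month features follow the same pattern at a different grain. The provider hopping index here is a hypothetical definition (distinct providers per claim, so 1.0 means a different provider on every claim), and the utilization spike rule (more than 2x the member's own prior-month average) is likewise an assumption.

```python
import pandas as pd

def member_month_features(claims: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical member-month features; column names are assumptions."""
    df = claims.copy()
    df["month"] = pd.to_datetime(df["service_date"]).dt.to_period("M")
    feats = (
        df.groupby(["member_id", "month"])
        .agg(claims=("provider_id", "size"),
             providers_visited=("provider_id", "nunique"),
             specialty_diversity=("specialty", "nunique"),
             allowed_total=("allowed_amt", "sum"))
        .reset_index()
        .sort_values(["member_id", "month"])
    )
    # Assumed hopping index: distinct providers per claim in the month
    feats["provider_hopping_index"] = feats["providers_visited"] / feats["claims"]
    # Average of the member's earlier months only (NaN for the first month)
    prior_avg = feats.groupby("member_id")["claims"].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    feats["util_spike"] = feats["claims"] > 2 * prior_avg
    return feats
```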

6) Anomaly Detection Logic

Explainable Layer (Peer Benchmarking)

Robust z-scores computed within specialty × state × month peer groups
Highlights behavior above peer norms
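One common robust z-score is the modified z-score, 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation within the peer group; the source does not state which robust formula the project uses, so treat this as an assumption. `add_peer_z` and the peer key column names are hypothetical.

```python
import numpy as np
import pandas as pd

PEER_KEYS = ["specialty", "state", "month"]  # assumed column names

def robust_z(s: pd.Series) -> pd.Series:
    """Modified z-score using median and MAD instead of mean and std."""
    med = s.median()
    mad = (s - med).abs().median()
    if mad == 0 or np.isnan(mad):
        # Peers are (nearly) identical: the score is undefined, not infinite
        return pd.Series(np.nan, index=s.index)
    # 0.6745 is the 0.75 quantile of the standard normal, so for normal data
    # this score is on roughly the same scale as an ordinary z-score
    return 0.6745 * (s - med) / mad

def add_peer_z(feats: pd.DataFrame, cols, keys=PEER_KEYS) -> pd.DataFrame:
    out = feats.copy()
    for c in cols:
        out[f"{c}_peer_z"] = out.groupby(keys)[c].transform(robust_z)
    return out
```

Median and MAD are barely moved by a few extreme providers, so one bad actor does not inflate the peer norm and hide itself.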

Multivariate Layer (ML)

Isolation Forest on utilization + cost + coding signals
Captures nonlinear interactions missed by rules
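A minimal scikit-learn sketch of the multivariate layer. The feature list, `n_estimators=200`, and `contamination=0.02` (picked to line up with a ~2% review slice) are assumptions, not the project's settings; `isolation_forest_scores` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def isolation_forest_scores(feats: pd.DataFrame, cols, contamination=0.02, seed=0):
    """Fit an Isolation Forest on selected features and attach scores and flags."""
    # Isolation Forest cannot take NaN; median-impute as a simple assumption
    X = feats[cols].fillna(feats[cols].median())
    model = IsolationForest(n_estimators=200, contamination=contamination, random_state=seed)
    model.fit(X)
    out = feats.copy()
    # score_samples is lower for more abnormal points; negate so higher = riskier
    out["if_score"] = -model.score_samples(X)
    # predict returns -1 for points beyond the contamination-based threshold
    out["if_flag"] = model.predict(X) == -1
    return out
```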

Composite Risk Score
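How the two layers are combined is not spelled out here. One plausible blend, purely hypothetical: take each row's strongest above-peer robust z, rank it and the Isolation Forest score as percentiles, and average the two with weights. `composite_risk` and its weights are illustrative only.

```python
import pandas as pd

def composite_risk(df: pd.DataFrame, z_cols, if_col="if_score", w_peer=0.5, w_ml=0.5):
    """Hypothetical weighted blend of peer and ML signals on a 0-1 scale."""
    # Only above-peer deviations count toward risk; negatives are clipped to 0
    peer_max = df[z_cols].clip(lower=0).max(axis=1).fillna(0)
    # Percentile ranks put both components on the same 0-1 scale
    peer_pct = peer_max.rank(pct=True)
    ml_pct = df[if_col].rank(pct=True)
    return (w_peer * peer_pct + w_ml * ml_pct) / (w_peer + w_ml)
```

Rank-based blending avoids having to calibrate z-scores and Isolation Forest scores against each other, at the cost of discarding how far apart the top cases are.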

7) Explainability & Governance

Every anomaly includes:
- Peer deviation drivers
- Trend context (rolling baselines)
- Transparent flags (upcoding, weekend, rare ICD)
This supports audit review and appeal workflows and avoids opaque black-box risk scores.
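A sketch of how a flagged row could be turned into the explanation described above: top peer-deviation drivers, transparent flags, and trend context. The 3.5 cutoff is the threshold often suggested for modified z-scores; the column names and the `FLAG_RULES` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical mapping from transparent flag name to the peer-z column behind it
FLAG_RULES = {
    "upcoding": "em_99214_share_peer_z",
    "weekend": "weekend_pct_peer_z",
    "rare_icd": "rare_dx_rate_peer_z",
}

def explain_row(row: pd.Series, z_cols, top_k=3, z_min=3.5) -> dict:
    """Build a human-readable explanation for one provider-month."""
    z = row[z_cols].dropna().astype(float).sort_values(ascending=False)
    top = z[z >= z_min].head(top_k)
    drivers = [f"{c.removesuffix('_peer_z')}: {v:.1f} robust z vs peers" for c, v in top.items()]
    flags = [name for name, col in FLAG_RULES.items()
             if pd.notna(row.get(col)) and row[col] >= z_min]
    trend = None
    if pd.notna(row.get("claims")) and pd.notna(row.get("claims_base_3m")):
        trend = f"claims {row['claims']:.0f} vs 3-month baseline {row['claims_base_3m']:.1f}"
    return {"drivers": drivers, "flags": flags, "trend": trend}
```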

8) Results & Impact


Precision: Reduced false positives by ~18% vs naïve thresholds (synthetic benchmark).
Efficiency: SIU teams can review the top ~2% of provider-months first.
Explainability: Each flagged case includes peer and trend drivers.
Scalability: Feature-driven design supports batch or near-real-time extension.
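Selecting the top slice for triage is a simple ranking step. A minimal sketch, assuming a scored table with a hypothetical `risk_score` column:

```python
import math
import pandas as pd

def review_queue(scored: pd.DataFrame, score_col="risk_score", top_frac=0.02) -> pd.DataFrame:
    """Return the highest-risk top_frac of rows, ranked for SIU review."""
    # round() guards against float noise before taking the ceiling
    n = max(1, math.ceil(round(len(scored) * top_frac, 6)))
    queue = scored.nlargest(n, score_col).reset_index(drop=True)
    queue["review_rank"] = queue.index + 1
    return queue
```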

9) How to Run

```bash
# Generate synthetic claims
python src/generate_synthetic_claims.py

# Build features
PYTHONPATH=. python scripts/run_02_feature_engineering.py

# Run anomaly detection
PYTHONPATH=. python scripts/run_03_anomaly_detection.py
```
