FHIR, Spark, Snowflake
FHIR ETL — Spark to Snowflake
Tech: PySpark, Delta/Parquet, Snowflake Streams/Tasks, dbt
End-to-end ingestion of FHIR resources (Patient, Encounter, Observation, Claim) into curated Snowflake models.
Includes flattening, schema validation, ICD-10 / LOINC enrichment, SCD2 dimensions, and dbt tests for data quality.
💻 Code
📊 Architecture
📄 Case Study
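As an illustration of the flattening and schema-validation step named above, here is a minimal sketch in plain Python rather than PySpark. The `flatten` helper, the `REQUIRED` field set, and the trimmed Patient resource are hypothetical stand-ins and are not taken from the project code.

```python
from typing import Any


def flatten(resource: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one level with dotted keys."""
    flat: dict[str, Any] = {}
    if isinstance(resource, dict):
        for key, value in resource.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(resource, list):
        for index, value in enumerate(resource):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = resource
    return flat


# Hypothetical, heavily trimmed Patient resource used only for illustration.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "gender": "female",
    "birthDate": "1980-04-12",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
}

row = flatten(patient)

# Assumed minimal schema check: these columns must exist after flattening.
REQUIRED = {"resourceType", "id", "birthDate"}
missing = REQUIRED - row.keys()
print(row)
print("missing required fields:", sorted(missing))
```

Dotted keys such as `name.0.family` map directly onto column names, which is one simple way to land nested resources in a tabular staging layer before modeling.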
Claims, ICD-10, Anomaly
Healthcare Claims — Anomaly Detection
Tech: Python (pandas/PySpark), feature engineering, BI
Claims profiling and anomaly scoring using ICD-10 groupers, utilization patterns, cost outliers, and frequency spikes.
Generates prioritized review queues and summary dashboards for fraud, waste, and abuse (FWA) integrity teams.
💻 Code
📄 Case Study
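The cost-outlier part of the scoring above can be sketched with a modified z-score based on the median and the median absolute deviation (MAD). This is one common robust technique, not necessarily the one the project uses; the claim amounts, the `robust_z_scores` helper, and the 3.5 cutoff are assumptions for illustration.

```python
from statistics import median


def robust_z_scores(values: list[float]) -> list[float]:
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be scored as an outlier.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical billed amounts for claims in one ICD-10 group.
claims = [("C1", 120.0), ("C2", 135.0), ("C3", 110.0), ("C4", 128.0), ("C5", 2400.0)]

scores = robust_z_scores([amount for _, amount in claims])

# Review queue: highest absolute score first, keeping only scores above the cutoff.
THRESHOLD = 3.5  # assumed cutoff, a conventional choice for modified z-scores
ranked = sorted(zip(claims, scores), key=lambda pair: abs(pair[1]), reverse=True)
flagged = [claim_id for (claim_id, _), score in ranked if abs(score) > THRESHOLD]
print(flagged)
```

Median-based scores are less distorted by the very outliers being hunted than mean and standard deviation, which matters for skewed claim amounts.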
Utilization, LOS
Patient Utilization & LOS Dashboard
Tech: Snowflake, semantic layer, Looker Studio
Combines EHR encounters, claims, and provider data into marts for utilization, length of stay (LOS),
readmissions, and throughput. Exposes flexible filters for payer type, diagnosis groups, and service lines.
📊 Live Dashboard
📝 Design Notes
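The two headline metrics in this mart, LOS and readmissions, can be sketched as follows. The encounter tuples, the `summarize` helper, and the 30-day window are illustrative assumptions; the actual marts are built in Snowflake, and real readmission definitions usually add exclusions (planned admissions, transfers) that are omitted here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inpatient encounters: (patient_id, admit_date, discharge_date).
encounters = [
    ("P1", date(2024, 1, 2), date(2024, 1, 6)),
    ("P1", date(2024, 1, 20), date(2024, 1, 23)),
    ("P2", date(2024, 2, 1), date(2024, 2, 2)),
    ("P2", date(2024, 4, 15), date(2024, 4, 18)),
]


def summarize(encounters, window_days=30):
    """Return one row per encounter with LOS and a readmission flag."""
    by_patient = defaultdict(list)
    for patient_id, admit, discharge in encounters:
        by_patient[patient_id].append((admit, discharge))

    rows = []
    for patient_id, stays in by_patient.items():
        stays.sort()  # chronological order by admit date
        for i, (admit, discharge) in enumerate(stays):
            next_admit = stays[i + 1][0] if i + 1 < len(stays) else None
            # Flag the stay if the patient's next admit falls within the window.
            readmitted = (
                next_admit is not None
                and 0 <= (next_admit - discharge).days <= window_days
            )
            rows.append(
                {
                    "patient_id": patient_id,
                    "los_days": (discharge - admit).days,
                    "readmit_30d": readmitted,
                }
            )
    return rows


rows = summarize(encounters)
mean_los = sum(r["los_days"] for r in rows) / len(rows)
readmit_rate = sum(r["readmit_30d"] for r in rows) / len(rows)
print(mean_los, readmit_rate)
```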
Network, Leakage
Provider Network & Leakage Analytics
Tech: SQL, graph-style modeling, BI
Models patient flows across in-network and out-of-network providers to quantify leakage, referral patterns,
and high-opportunity service lines. Supports network optimization and contracting strategy.
💻 Code
📊 Network Dashboard
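A leakage rate of the kind described above can be read as the share of referrals whose destination provider is out of network. The sketch below treats referrals as graph edges; the provider IDs, the `IN_NETWORK` set, and the `leakage_by_service_line` helper are hypothetical, and the project itself does this in SQL.

```python
from collections import Counter

# Hypothetical referral edges: (referring_provider, receiving_provider, service_line).
referrals = [
    ("PCP-A", "ORTHO-1", "Orthopedics"),
    ("PCP-A", "ORTHO-9", "Orthopedics"),
    ("PCP-B", "ORTHO-9", "Orthopedics"),
    ("PCP-B", "CARD-1", "Cardiology"),
    ("PCP-A", "CARD-1", "Cardiology"),
    ("PCP-B", "CARD-7", "Cardiology"),
    ("PCP-A", "ORTHO-9", "Orthopedics"),
]

# Assumed roster of in-network providers.
IN_NETWORK = {"PCP-A", "PCP-B", "ORTHO-1", "CARD-1"}


def leakage_by_service_line(referrals, in_network):
    """Share of referrals per service line that land out of network."""
    total = Counter()
    leaked = Counter()
    for _source, destination, service_line in referrals:
        total[service_line] += 1
        if destination not in in_network:
            leaked[service_line] += 1
    return {line: leaked[line] / total[line] for line in total}


rates = leakage_by_service_line(referrals, IN_NETWORK)

# Service lines with the highest leakage are the first candidates for review.
for line, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{line}: {rate:.0%}")
```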
Feature Store, Readmission
Readmission Risk & Care Management Feature Store
Tech: Databricks, Delta Lake, feature store
Curates patient-level features such as comorbidities, prior utilization, social determinants of health (SDoH) proxies, and discharge disposition
into an ML-ready feature store for 30-day readmission and care management prioritization models.
💻 Feature Engineering
📄 Data Dictionary
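To make the patient-level features above concrete, here is a small sketch that derives a prior-utilization count and a crude comorbidity count for one index discharge. The `build_features` helper, the 180-day lookback, and the use of three-character ICD-10 categories as a comorbidity proxy are assumptions for illustration; a production feature store would typically use an established comorbidity grouper instead.

```python
from datetime import date, timedelta


def build_features(index_discharge, prior_admits, diagnosis_codes, lookback_days=180):
    """Build a feature row for a single index discharge."""
    window_start = index_discharge - timedelta(days=lookback_days)
    # Only admissions inside the lookback window and before the index discharge count.
    recent_admits = [d for d in prior_admits if window_start <= d < index_discharge]
    # The first three characters of an ICD-10 code identify its category.
    categories = {code[:3] for code in diagnosis_codes}
    return {
        "prior_admits_in_lookback": len(recent_admits),
        "distinct_dx_categories": len(categories),
    }


# Hypothetical patient history.
features = build_features(
    index_discharge=date(2024, 6, 30),
    prior_admits=[date(2023, 12, 15), date(2024, 2, 10), date(2024, 5, 1)],
    diagnosis_codes=["E11.9", "E11.65", "I10", "I50.9"],
)
print(features)
```

Computing every feature relative to the index discharge date keeps later events out of the training rows, which guards against leakage into the 30-day readmission label.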