Health Data Engineering Pipeline
A local lakehouse architecture built on Airflow, dbt, DuckDB, and Apache Iceberg for reproducible healthcare analytics on versioned datasets. Synthea generates synthetic patient records, MinIO provides S3-compatible object storage, and Nessie handles catalog versioning. Modular DAGs support schema evolution, data validation, and analytics-ready outputs.
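
As a rough illustration of how the components above might be chained, the sketch below models the pipeline as a dependency graph and resolves a valid run order using only the Python standard library. The stage names and their ordering are assumptions made for illustration; the project's actual Airflow DAGs and task names are not described here and may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names: the description names the tools, not the tasks.
# Each key maps to the set of stages that must finish before it can run.
PIPELINE = {
    "generate_synthea": set(),                    # synthetic patient records
    "upload_to_minio": {"generate_synthea"},      # S3-compatible object storage
    "load_iceberg_tables": {"upload_to_minio"},   # Iceberg tables in the lakehouse
    "run_dbt_models": {"load_iceberg_tables"},    # dbt transformations via DuckDB
    "validate_outputs": {"run_dbt_models"},       # data validation checks
    "commit_nessie_branch": {"validate_outputs"}, # versioned catalog commit
}


def execution_order(graph):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
for step, stage in enumerate(order, start=1):
    print(f"{step}. {stage}")
```

Keeping each stage as a separate node is what makes the design modular: a stage can be swapped or re-run on its own as long as its upstream dependencies are satisfied. In a real deployment Airflow would own this ordering; the standard-library sorter only stands in for it here.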