Jeremy Tran · Data Engineer & AI Developer

5+ years building production-grade batch and streaming pipelines on GCP and AWS. Delivered data systems for Walmart, Macy's, and BlueCross BlueShield. Google Professional Data Engineer certified.

Projects

Production-oriented pipelines built with modern open-source tooling.

Health Data Engineering Pipeline

Local lakehouse architecture using Airflow, dbt, DuckDB, and Apache Iceberg to support reproducible healthcare analytics and versioned datasets. Synthea generates synthetic patient records; MinIO provides S3-compatible object storage; Nessie handles catalog versioning. Modular DAGs enable schema evolution, data validation, and analytics-ready outputs.

Airflow · dbt · DuckDB · Apache Iceberg · MinIO · Nessie · Docker

ScoreChat

Retrieval-augmented generation (RAG) application that indexes classical music scores and related text, enabling natural-language Q&A over sheet music, program notes, and musicological sources. Built with LangChain, it gives musicians and researchers a conversational interface to a corpus of classical repertoire.

LangChain · RAG · Python · Vector Search · Classical Music

Services

Available for short-term engagements and project-based contracts. Reach out to discuss scope and rates.

Pipeline Engineering

Batch and streaming ETL/ELT pipelines from ingestion to analytics-ready outputs. Airflow, Flink, Kafka, dbt, Apache Iceberg.

Cloud Data Architecture

Data infrastructure design and migration on GCP and AWS. BigQuery optimization, lakehouse architecture, and cost-effective storage strategy.

AI & Agent Workflows

LLM-powered automation using LangChain and multi-agent patterns. RAG pipelines, SQL agents, structured extraction, and workflow orchestration.

Data Quality & Modeling

Schema design, data validation frameworks, and dbt modeling. SQL audits, data integrity fixes, and stakeholder-ready outputs.

AI/ML Data Preparation

Training and evaluation dataset production for ML models — schema consistency, edge case coverage, labeling accuracy, and annotation QA.

Let's Work Together

Have a project in mind? Available for contracts, consulting, and remote full-time roles.

Start a conversation

About

I'm Jeremy Tran, a data engineer and AI developer based in Irvine, CA, with 5+ years of experience delivering production data systems at scale. I've worked across the stack at Tredence (Walmart), Infosys (Macy's, BlueCross BlueShield), and as an independent AI/ML data contractor — with a consistent focus on data quality, schema evolution, and analytics-ready outputs.

I hold a BA in Neuroscience from Pomona College and am Google Professional Data Engineer certified. Currently open to remote contract and full-time roles in data engineering and AI.

Google Professional Data Engineer · Google Cloud Certified
AI/ML Data Contractor · Freelance, Remote · January 2024 – Present
Tredence: Data Engineering Consultant · Walmart · GCP BigQuery · Python · Power BI
Infosys: Associate, Data Engineering · Macy's · BlueCross BlueShield · Airflow · BigQuery

Languages

Python · SQL · Java · Scala · JavaScript/TypeScript

Pipelines & Orchestration

Apache Airflow · Apache Flink · Apache Kafka · dbt · Apache Spark

Data & Storage

Apache Iceberg · DuckDB · BigQuery · Databricks · Snowflake

Cloud & Infra

GCP · AWS · Azure · Docker · Linux

Web

TypeScript · Node.js · React

AI & Tooling

LangChain · Vector DBs · RAG · Prompt Engineering

Contact

Get in touch.

Open to remote contract work and full-time roles in data engineering and AI.

jeremy@jt0321.com · linkedin.com/in/jt0321 · github.com/jt0321