Available for contract & full-time remote

Jeremy Tran

Data Engineer & AI Developer

5+ years building production-grade batch and streaming pipelines on GCP and AWS. Delivered data systems for Walmart, Macy's, and BlueCross BlueShield. Google Professional Data Engineer certified.

Python · SQL · Apache Airflow · dbt · Apache Flink · Apache Iceberg · Kafka · DuckDB · GCP · BigQuery · AWS · Databricks · Snowflake · Docker · LangChain · LLM Agents

Projects

Production-oriented pipelines built with modern open-source tooling.

Health Data Engineering Pipeline

Local lakehouse architecture using Airflow, dbt, DuckDB, and Apache Iceberg to support reproducible healthcare analytics and versioned datasets. Synthea generates synthetic patient records; MinIO provides S3-compatible object storage; Nessie handles catalog versioning. Modular DAGs enable schema evolution, data validation, and analytics-ready outputs.

Airflow · dbt · DuckDB · Apache Iceberg · MinIO · Nessie · Docker
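
As a rough illustration of the data validation step described above, here is a minimal sketch in plain Python. The field names (`patient_id`, `birth_date`, `gender`) and the rules are assumptions made for this example, not the project's actual schema or checks, and the project itself is described as running this kind of logic inside modular Airflow DAGs rather than as standalone functions.

```python
from datetime import date

# Hypothetical required fields for a synthetic patient record
# (assumed for illustration; not the project's actual schema).
REQUIRED_FIELDS = ("patient_id", "birth_date", "gender")


def validate_patient(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    birth_date = record.get("birth_date")
    if birth_date:
        try:
            # Dates are assumed to arrive as ISO 8601 strings (YYYY-MM-DD).
            if date.fromisoformat(birth_date) > date.today():
                problems.append("birth_date is in the future")
        except ValueError:
            problems.append("birth_date is not an ISO date")
    return problems


def split_valid_invalid(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition records so only clean rows flow on to downstream tables."""
    valid, invalid = [], []
    for record in records:
        (invalid if validate_patient(record) else valid).append(record)
    return valid, invalid
```

Keeping rejected rows in a separate list, instead of dropping them, makes it possible to report or quarantine them later in the pipeline.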

AI Automation Framework

Modular multi-agent orchestration framework built with LangChain. A coordinator agent delegates tasks to specialized sub-agents: a research agent for web summarization, a SQL agent for natural-language-to-query execution, and a RAG pipeline over local documents using FAISS. Structured extraction from unstructured text via Pydantic models.

LangChain · Multi-Agent · RAG · FAISS · GPT-4o · Python · SQLite
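
The coordinator-and-sub-agents pattern described above can be sketched in a few lines of plain Python. This is only an illustration of the delegation idea, not the framework's code: the agent functions are stubs, and the keyword-based `route` function is a hypothetical stand-in for whatever LLM-driven routing the real coordinator uses.

```python
from typing import Callable, Dict

# Stub sub-agents: each takes a task string and returns a result string.
# In a LangChain-based framework these would wrap LLM calls and tools.
def research_agent(task: str) -> str:
    return f"[research] summary for: {task}"


def sql_agent(task: str) -> str:
    return f"[sql] query result for: {task}"


def rag_agent(task: str) -> str:
    return f"[rag] answer from local documents for: {task}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "sql": sql_agent,
    "rag": rag_agent,
}

# Hypothetical keyword routing table (an assumption for this sketch);
# an LLM-based coordinator would classify the task instead.
ROUTING_KEYWORDS = {
    "sql": ("table", "query", "rows", "database"),
    "rag": ("document", "pdf", "notes"),
}


def route(task: str) -> str:
    """Pick a sub-agent name for the task, defaulting to research."""
    lowered = task.lower()
    for name, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return name
    return "research"


def coordinate(task: str) -> str:
    """Delegate the task to the chosen sub-agent and return its result."""
    return AGENTS[route(task)](task)
```

Because the coordinator only depends on the `AGENTS` mapping, a new specialist can be added by registering one more function, which is the main appeal of keeping the sub-agents modular.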

More on GitHub

Services

Available for short-term engagements and project-based contracts. Reach out to discuss scope and rates.

Pipeline Engineering

Batch and streaming ETL/ELT pipelines from ingestion to analytics-ready outputs. Airflow, Flink, Kafka, dbt, Apache Iceberg.

Cloud Data Architecture

Data infrastructure design and migration on GCP and AWS. BigQuery optimization, lakehouse architecture, and cost-effective storage strategy.

AI & Agent Workflows

LLM-powered automation using LangChain and multi-agent patterns. RAG pipelines, SQL agents, structured extraction, and workflow orchestration.

Data Quality & Modeling

Schema design, data validation frameworks, and dbt modeling. SQL audits, data integrity fixes, and stakeholder-ready outputs.

AI/ML Data Preparation

Training and evaluation dataset production for ML models, covering schema consistency, edge case coverage, labeling accuracy, and annotation QA.

Let's Work Together

Have a project in mind? Available for contracts, consulting, and remote full-time roles.

Start a conversation

About

I'm Jeremy Tran, a data engineer and AI developer based in Irvine, CA, with 5+ years of experience delivering production data systems at scale. I've worked across the stack at Tredence (Walmart) and Infosys (Macy's, BlueCross BlueShield), and as an independent AI/ML data contractor, with a consistent focus on data quality, schema evolution, and analytics-ready outputs.

I hold a BA in Neuroscience from Pomona College and the Google Professional Data Engineer certification. I'm currently open to remote contract and full-time roles in data engineering and AI.

  • Google Professional Data Engineer · Google Cloud Certified
  • AI/ML Data Contractor · Freelance, Remote · January 2024 – Present
  • Tredence, Data Engineering Consultant · Walmart · GCP BigQuery · Python · Power BI
  • Infosys, Associate, Data Engineering · Macy's · BlueCross BlueShield · Airflow · BigQuery

Languages

Python · SQL · Java · Scala · JavaScript

Pipelines & Orchestration

Apache Airflow · Apache Flink · Kafka · dbt · Apache Spark

Data & Storage

Apache Iceberg · DuckDB · BigQuery · Databricks · Snowflake

Cloud & Infra

GCP · AWS · Azure · Docker · Linux

AI & Tooling

LangChain · LLM Agents · RAG · Prompt Engineering

Contact