I'm a Data Engineer III focused on building reliable, scalable data platforms and production pipelines for biotech, diagnostics, and high-trust analytical workflows.
My work spans data engineering, cloud infrastructure, distributed data processing, production analytics, and ML-adjacent systems. I enjoy designing systems that move data from messy operational sources into clean, governed, and reusable data products.
- Website: jowinjestine.github.io
What I build:
- Production data pipelines for API, file, database, and event-driven ingestion
- Incremental and idempotent ingestion systems that reduce duplicate processing and improve reliability
- Data models that support traceability, auditability, lineage, and operational reporting
- Batch and near-real-time workflows for internal platforms and scientific operations
- Cloud-native pipelines using Azure storage, functions, events, queues, and PostgreSQL
- Internal data tools and services using Python, SQL, FastAPI, Streamlit, R Shiny, and Docker
- ML-ready datasets, model input pipelines, prediction output tracking, and reproducible reporting flows
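The incremental, idempotent ingestion pattern above can be sketched with a natural key plus an `updated_at` watermark, so that replaying a batch never creates duplicates. This is a minimal illustration using SQLite; the table and column names (`samples`, `sample_id`) are hypothetical, not from a real system.

```python
import sqlite3

def init(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS samples (
               sample_id  TEXT PRIMARY KEY,
               payload    TEXT,
               updated_at TEXT
           )"""
    )

def upsert(conn: sqlite3.Connection, records: list[dict]) -> None:
    # ON CONFLICT makes the load idempotent: replaying a batch updates
    # rows in place instead of inserting duplicates, and the WHERE clause
    # ignores records older than what is already stored.
    conn.executemany(
        """INSERT INTO samples (sample_id, payload, updated_at)
           VALUES (:sample_id, :payload, :updated_at)
           ON CONFLICT(sample_id) DO UPDATE SET
               payload = excluded.payload,
               updated_at = excluded.updated_at
           WHERE excluded.updated_at > samples.updated_at""",
        records,
    )

conn = sqlite3.connect(":memory:")
init(conn)
batch = [{"sample_id": "S1", "payload": "a", "updated_at": "2024-01-01"}]
upsert(conn, batch)
upsert(conn, batch)  # replaying the same batch is a no-op
count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

The same upsert-with-watermark shape carries over to PostgreSQL's `INSERT ... ON CONFLICT`.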
I care about building systems that are:
- Reliable – resilient to retries, late data, bad payloads, and partial failures
- Scalable – designed for growing data volume, more users, and more downstream consumers
- Observable – logged, monitored, and easy to debug when something breaks
- Governed – traceable from source system to transformed data product
- Maintainable – clear schemas, modular pipelines, and practical documentation
- Useful – built around real business, scientific, and operational workflows
Areas I focus on:
- High-usage data pipelines and platform engineering
- API ingestion, webhook reconciliation, and change-data tracking
- Distributed and cloud-based data processing
- Data lake and warehouse architecture
- PostgreSQL schema design and analytical data modeling
- Data quality, lineage, observability, and audit logging
- Production analytics and ML-adjacent data systems
- Replacing spreadsheet-heavy workflows with durable data platforms
Tools I work with:
Languages: Python, SQL, R
Data Engineering: PostgreSQL, pandas, PySpark, DuckDB, dbt, SQLMesh
Cloud & Infra: Azure, Docker, Linux, GitHub, Bitbucket
Azure: ADLS Gen2, Azure Functions, Event Grid, Event Hub, Azure PostgreSQL
Apps & APIs: FastAPI, Streamlit, R Shiny, Plumber, Plotly
ML / Analytics: scikit-learn, XGBoost, PyTorch, statistical modeling
Orchestration: Mage AI, scheduled pipelines, event-driven workflows
I am especially interested in data platform work where pipelines are treated as production systems, not one-off scripts. That includes clear contracts, schema evolution, validation, logging, retries, backfills, monitoring, and well-defined ownership of data products.
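Two of those production concerns, payload validation against a contract and retrying transient failures with backoff, can be sketched in a few lines. The required field names and the flaky fetch function here are purely illustrative.

```python
import time

# Minimal contract: fields every ingested record must carry (illustrative).
REQUIRED_FIELDS = {"sample_id", "run_id", "updated_at"}

def validate(record: dict) -> None:
    # Reject bad payloads at the boundary instead of letting them
    # propagate into downstream tables.
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"payload missing fields: {sorted(missing)}")

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    # Retry transient errors with exponential backoff; re-raise after the
    # final attempt so the failure surfaces in monitoring.
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky_fetch():
    # Simulated source that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"sample_id": "S1", "run_id": "R1", "updated_at": "2024-01-01"}

record = with_retries(flaky_fetch)
validate(record)
```

In a real pipeline the validation step would typically be a schema library or dbt test rather than a hand-rolled check, but the contract-at-the-boundary idea is the same.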
A lot of my work involves connecting lab data, metadata, files, models, reports, and operational events through a clear data model so that teams can trust how data moves through the system.
Typical problem spaces include:
- Incremental API ingestion and webhook reconciliation
- ELN, inventory, and operational data integration
- Sample, run, file, and report-level traceability
- Azure-based storage and event-driven processing
- PostgreSQL-backed analytical and operational systems
- Internal tools for scientific and business workflows
- Data quality checks, logging, monitoring, and audit trails
- ML-ready data generation and prediction reporting pipelines
- Website: jowinjestine.github.io
- GitHub: @jowinjestine
- Email: jowinjestine@gmail.com
Thanks for stopping by.

