I'm a Data Engineer III focused on building reliable, scalable data platforms and production pipelines for biotech, diagnostics, and high-trust analytical workflows.
My work spans data engineering, cloud infrastructure, distributed data processing, production analytics, and ML-adjacent systems. I enjoy designing systems that move data from messy operational sources into clean, governed, and reusable data products.
- Website: jowinjestine.github.io
What I build:
- Production data pipelines for API, file, database, and event-driven ingestion
- Incremental and idempotent ingestion systems that reduce duplicate processing and improve reliability
- Data models that support traceability, auditability, lineage, and operational reporting
- Batch and near-real-time workflows for internal platforms and scientific operations
- Cloud-native pipelines using Azure storage, functions, events, queues, and PostgreSQL
- Internal data tools and services using Python, SQL, FastAPI, Streamlit, R Shiny, and Docker
- ML-ready datasets, model input pipelines, prediction output tracking, and reproducible reporting flows
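The incremental, idempotent ingestion pattern above can be sketched with a natural key plus an `updated_at` watermark, so that replaying a batch never creates duplicates. This is a minimal illustration using SQLite; the table and column names (`samples`, `sample_id`) are hypothetical, not from a real system.

```python
import sqlite3

def init(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS samples (
               sample_id  TEXT PRIMARY KEY,
               payload    TEXT,
               updated_at TEXT
           )"""
    )

def upsert(conn: sqlite3.Connection, records: list[dict]) -> None:
    # ON CONFLICT makes the load idempotent: replaying a batch updates
    # rows in place instead of inserting duplicates, and the WHERE clause
    # ignores records older than what is already stored.
    conn.executemany(
        """INSERT INTO samples (sample_id, payload, updated_at)
           VALUES (:sample_id, :payload, :updated_at)
           ON CONFLICT(sample_id) DO UPDATE SET
               payload = excluded.payload,
               updated_at = excluded.updated_at
           WHERE excluded.updated_at > samples.updated_at""",
        records,
    )

conn = sqlite3.connect(":memory:")
init(conn)
batch = [{"sample_id": "S1", "payload": "a", "updated_at": "2024-01-01"}]
upsert(conn, batch)
upsert(conn, batch)  # replaying the same batch is a no-op
count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

The same upsert-with-watermark shape carries over to PostgreSQL's `INSERT ... ON CONFLICT`.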
I care about building systems that are:
- Reliable – resilient to retries, late data, bad payloads, and partial failures
- Scalable – designed for growing data volume, more users, and more downstream consumers
- Observable – logged, monitored, and easy to debug when something breaks
- Governed – traceable from source system to transformed data product
- Maintainable – clear schemas, modular pipelines, and practical documentation
- Useful – built around real business, scientific, and operational workflows
Areas I focus on:
- High-usage data pipelines and platform engineering
- API ingestion, webhook reconciliation, and change-data tracking
- Distributed and cloud-based data processing
- Data lake and warehouse architecture
- PostgreSQL schema design and analytical data modeling
- Data quality, lineage, observability, and audit logging
- Production analytics and ML-adjacent data systems
- Replacing spreadsheet-heavy workflows with durable data platforms
Tools I work with:
Languages: Python, SQL, R
Data Engineering: PostgreSQL, pandas, PySpark, DuckDB, dbt, SQLMesh
Cloud & Infra: Azure, Docker, Linux, GitHub, Bitbucket
Azure: ADLS Gen2, Azure Functions, Event Grid, Event Hub, Azure PostgreSQL
Apps & APIs: FastAPI, Streamlit, R Shiny, Plumber, Plotly
ML / Analytics: scikit-learn, XGBoost, PyTorch, statistical modeling
Orchestration: Mage AI, scheduled pipelines, event-driven workflows
I am especially interested in data platform work where pipelines are treated as production systems, not one-off scripts. That includes clear contracts, schema evolution, validation, logging, retries, backfills, monitoring, and well-defined ownership of data products.
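Two of those production concerns, payload validation against a contract and retrying transient failures with backoff, can be sketched in a few lines. The required field names and the flaky fetch function here are purely illustrative.

```python
import time

# Minimal contract: fields every ingested record must carry (illustrative).
REQUIRED_FIELDS = {"sample_id", "run_id", "updated_at"}

def validate(record: dict) -> None:
    # Reject bad payloads at the boundary instead of letting them
    # propagate into downstream tables.
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"payload missing fields: {sorted(missing)}")

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    # Retry transient errors with exponential backoff; re-raise after the
    # final attempt so the failure surfaces in monitoring.
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}
def flaky_fetch():
    # Simulated source that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"sample_id": "S1", "run_id": "R1", "updated_at": "2024-01-01"}

record = with_retries(flaky_fetch)
validate(record)
```

In a real pipeline the validation step would typically be a schema library or dbt test rather than a hand-rolled check, but the contract-at-the-boundary idea is the same.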
A lot of my work involves connecting lab data, metadata, files, models, reports, and operational events through a clear data model so that teams can trust how data moves through the system.
Typical problem spaces include:
- Incremental API ingestion and webhook reconciliation
- ELN, inventory, and operational data integration
- Sample, run, file, and report-level traceability
- Azure-based storage and event-driven processing
- PostgreSQL-backed analytical and operational systems
- Internal tools for scientific and business workflows
- Data quality checks, logging, monitoring, and audit trails
- ML-ready data generation and prediction reporting pipelines
- Website: jowinjestine.github.io
- GitHub: @jowinjestine
- Email: jowinjestine@gmail.com
Thanks for stopping by.

