jowinjestine/README.md

Hi, I'm Jowin Jestine 👋

I'm a Data Engineer III focused on building reliable, scalable data platforms and production pipelines for biotech, diagnostics, and high-trust analytical workflows.

My work sits at the intersection of data engineering, cloud infrastructure, distributed data processing, production analytics, and ML-adjacent systems. I enjoy designing systems that can move data from messy operational sources into clean, governed, and reusable data products.

🌐 Website: jowinjestine.github.io

What I build

  • Production data pipelines for API, file, database, and event-driven ingestion
  • Incremental and idempotent ingestion systems that reduce duplicate processing and improve reliability
  • Data models that support traceability, auditability, lineage, and operational reporting
  • Batch and near-real-time workflows for internal platforms and scientific operations
  • Cloud-native pipelines using Azure storage, functions, events, queues, and PostgreSQL
  • Internal data tools and services using Python, SQL, FastAPI, Streamlit, R Shiny, and Docker
  • ML-ready datasets, model input pipelines, prediction output tracking, and reproducible reporting flows
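Idempotent ingestion, as mentioned above, usually comes down to upserting on a natural key so that retries and replays never create duplicates. The sketch below is a minimal illustration of that pattern using Python's built-in sqlite3 (a real pipeline would more likely target PostgreSQL, which supports the same `ON CONFLICT ... DO UPDATE` syntax); the `events` table and its `(source, source_id)` key are hypothetical.

```python
import sqlite3

def ingest(conn, records):
    """Idempotently load records keyed by (source, source_id).

    Re-running the same batch updates rows in place instead of
    inserting duplicates, so retries and replays are safe.
    """
    conn.executemany(
        """
        INSERT INTO events (source, source_id, payload)
        VALUES (:source, :source_id, :payload)
        ON CONFLICT (source, source_id) DO UPDATE SET
            payload = excluded.payload
        """,
        records,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           source TEXT NOT NULL,
           source_id TEXT NOT NULL,
           payload TEXT,
           PRIMARY KEY (source, source_id)
       )"""
)

batch = [
    {"source": "eln", "source_id": "42", "payload": "v1"},
    {"source": "eln", "source_id": "43", "payload": "v1"},
]
ingest(conn, batch)
ingest(conn, batch)  # replaying the same batch is a no-op, not a duplicate

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

Because the unique key lives in the database rather than in pipeline state, idempotency holds even when two workers process overlapping batches.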

Data engineering focus

I care about building systems that are:

  • Reliable: resilient to retries, late data, bad payloads, and partial failures
  • Scalable: designed for growing data volume, more users, and more downstream consumers
  • Observable: logged, monitored, and easy to debug when something breaks
  • Governed: traceable from source system to transformed data product
  • Maintainable: clear schemas, modular pipelines, and practical documentation
  • Useful: built around real business, scientific, and operational workflows

Areas I care about

  • High-usage data pipelines and platform engineering
  • API ingestion, webhook reconciliation, and change-data tracking
  • Distributed and cloud-based data processing
  • Data lake and warehouse architecture
  • PostgreSQL schema design and analytical data modeling
  • Data quality, lineage, observability, and audit logging
  • Production analytics and ML-adjacent data systems
  • Replacing spreadsheet-heavy workflows with durable data platforms
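Incremental API ingestion and change-data tracking typically hinge on a persisted watermark: each run pulls only records updated since the last stored cursor, then advances it. The sketch below illustrates the idea with a stand-in `fetch_page` function and in-memory state; the field names and the watermark format are assumptions, not a real API.

```python
def fetch_page(updated_since):
    """Stand-in for a paginated API call; a real client would pass
    `updated_since` as a query parameter and follow pagination links."""
    data = [
        {"id": 1, "updated_at": "2024-01-01T00:00:00+00:00"},
        {"id": 2, "updated_at": "2024-01-02T00:00:00+00:00"},
        {"id": 3, "updated_at": "2024-01-03T00:00:00+00:00"},
    ]
    # ISO-8601 timestamps in a fixed offset compare correctly as strings.
    return [r for r in data if r["updated_at"] > updated_since]

def run_increment(state):
    """Pull only records changed since the stored watermark, then
    advance the watermark to the newest timestamp seen."""
    cursor = state.get("watermark", "1970-01-01T00:00:00+00:00")
    records = fetch_page(cursor)
    if records:
        state["watermark"] = max(r["updated_at"] for r in records)
    return records

state = {}
first = run_increment(state)   # initial backfill: everything
second = run_increment(state)  # nothing changed since: empty
print(len(first), len(second))
```

In production the watermark would live in a durable store (e.g. a PostgreSQL state table) and only advance after the batch is committed, so a failed run replays rather than skips data.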

Tech stack

Languages: Python, SQL, R
Data Engineering: PostgreSQL, pandas, PySpark, DuckDB, dbt, SQLMesh
Cloud & Infra: Azure, Docker, Linux, GitHub, Bitbucket
Azure: ADLS Gen2, Azure Functions, Event Grid, Event Hub, Azure PostgreSQL
Apps & APIs: FastAPI, Streamlit, R Shiny, Plumber, Plotly
ML / Analytics: scikit-learn, XGBoost, PyTorch, statistical modeling
Orchestration: Mage AI, scheduled pipelines, event-driven workflows

Current focus

I am especially interested in data platform work where pipelines are treated as production systems, not one-off scripts. That includes clear contracts, schema evolution, validation, logging, retries, backfills, monitoring, and well-defined ownership of data products.
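One concrete form a "clear contract" can take is a declared schema that every incoming record is validated against, with failures quarantined and logged instead of silently dropped. The sketch below is a deliberately minimal illustration; the `CONTRACT` fields (`sample_id`, `run_id`, `value`) are hypothetical, and a real pipeline would more likely use a schema library such as Pydantic or a dbt/Great Expectations-style test suite.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# A minimal data contract: required fields and their expected types.
CONTRACT = {"sample_id": str, "run_id": str, "value": float}

def validate(record):
    """Return a list of contract violations for one record."""
    errors = []
    for field, expected in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

def load(records):
    """Split a batch into accepted rows and quarantined bad payloads,
    logging each rejection for later inspection."""
    accepted, quarantined = [], []
    for record in records:
        errors = validate(record)
        if errors:
            log.warning("quarantined %r: %s", record, "; ".join(errors))
            quarantined.append((record, errors))
        else:
            accepted.append(record)
    return accepted, quarantined

good = {"sample_id": "S1", "run_id": "R1", "value": 0.9}
bad = {"sample_id": "S2", "value": "high"}  # missing run_id, wrong type
accepted, quarantined = load([good, bad])
print(len(accepted), len(quarantined))
```

Quarantining rather than failing the whole batch keeps one bad payload from blocking downstream consumers, while the log trail preserves auditability.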

A lot of my work involves connecting lab data, metadata, files, models, reports, and operational events through a clear data model so that teams can trust how data moves through the system.

Selected themes from my work

  • Incremental API ingestion and webhook reconciliation
  • ELN, inventory, and operational data integration
  • Sample, run, file, and report-level traceability
  • Azure-based storage and event-driven processing
  • PostgreSQL-backed analytical and operational systems
  • Internal tools for scientific and business workflows
  • Data quality checks, logging, monitoring, and audit trails
  • ML-ready data generation and prediction reporting pipelines

Connect


Thanks for stopping by.

Pinned repositories

  1. career-ops (JavaScript): AI job search pipeline with auto-apply, forked from santifer/career-ops, with automated application submission via ATS APIs and Playwright
  2. f1-data-lakehouse (HCL): end-to-end F1 analytics platform on GCP. FastF1 + Jolpica API -> Cloud Functions -> GCS -> dbt -> BigQuery -> Looker Studio
  3. f1-race-predictor (Jupyter Notebook): ML-powered race outcome predictions using FastF1, XGBoost, and SHAP explanations
  4. f1-race-replay (Python): forked from IAmTomShaw/f1-race-replay; an interactive Formula 1 race visualisation and data analysis tool built with Python 🏎️
  5. jowinjestine.github.io (JavaScript): portfolio website
  6. nmrglue (Python): forked from jjhelmus/nmrglue; a module for working with NMR data in Python