tarekmasryo/README.md

Tarek Masryo Banner


AI/ML Engineer building production-ready ML and Generative AI systems across modeling, serving, evaluation, monitoring, and decision support.
From validated data pipelines → model evaluation → deployed APIs → decision-support systems.

Kaggle Datasets Grandmaster · Kaggle Notebooks Master

GitHub · Website · LinkedIn · Kaggle

Hugging Face · Streamlit · Repositories


🧭 What I build

| Area | What it means in practice |
| --- | --- |
| Production ML systems | Leakage-safe pipelines, calibrated outputs, threshold policies, reproducible artifacts, and decision-ready outputs |
| Generative AI & RAG | Retrieval attribution, hallucination exposure, tool-calling agents, structured outputs, and quality checks before rollout |
| APIs & serving | Dockerized FastAPI services, strict schemas, versioned artifacts, and CI-friendly delivery |
| Monitoring & ops | Telemetry, drift signals, cost/latency trade-offs, triage thresholds, and operator-facing workflows with clear handoff logic |
| Applied NLP & CV | Text classification, semantic search, threshold tuning, explainability, image restoration, and practical computer vision apps |

🌟 Featured work

| Project | What it demonstrates |
| --- | --- |
| Fraud Risk Ops Platform | FastAPI inference, Streamlit analyst UI, calibrated risk scores, threshold policies, audit logs, batch jobs, and monitoring hooks |
| RAG QA Command Center | Retrieval quality evaluation, hallucination exposure, trace review, configuration trade-offs, and policy simulation |
| LLMOps Telemetry Command Center | Reliability, latency, cost, routing-policy review, drift signals, triage thresholds, and evidence exports |
| Advanced ML Sentiment Lab | TF-IDF pipelines, ROC/PR evaluation, threshold tuning, error review, live prediction, and exportable artifacts |
| Health Intelligence Platform | Behavioral risk analytics, cohort KPIs, threshold diagnostics, feature importance, trends, and scenario simulation |
| Old Photo Restorer | Gradio computer vision app with GFPGAN restoration, optional upscaling, before/after preview, and batch ZIP export |

📊 Analytics & decision apps

| Project | Focus |
| --- | --- |
| Short-Video Intelligence Dashboard | Virality scoring, engagement metrics, creator leaderboards, timing patterns, and segment benchmarks |
| EV Charging Analytics | Geospatial infrastructure analytics, fast-DC allocation scenarios, market slices, and network planning |
| Football Matches Dashboard | European football and UCL analytics: KPIs, standings, team explorer, head-to-head, and interactive match tables |
| Seaborn & Matplotlib Visual Lab | Interactive Streamlit lab to build, compare, and export Seaborn vs Matplotlib charts with UI controls and generated code snippets |
| Hugging Face QuickStart Tool | Gradio tool that converts model/repo URLs into run commands, download snippets, file views, risk hints, and ZIP scaffolds |

🧠 Selected ML, NLP & healthcare-style workflows

| Project | Focus |
| --- | --- |
| Road Accident Risk Prediction | Two-stage risk scoring with LightGBM, XGBoost residual modeling, NNLS blending, stable OOF evaluation, and interpretable risk features |
| Cancer Risk Analysis | Clean tabular data, validation, leakage-aware benchmarking, and interpretable risk modeling for educational analytical use |
| Clinical Deterioration Early Warning | 12-hour deterioration baseline with tabular models, probability ensembling, and cost-based threshold policy tables |
| Pima Diabetes Pipeline | End-to-end diabetes risk pipeline with EDA, feature engineering, calibration, cost-aware thresholding, and deployable artifacts |
| SMS Spam Detection | Dual TF-IDF pipeline with calibrated Linear SVM, nested CV, threshold tuning, explainability, and robustness checks |
| Text Sentiment Analysis | IMDB sentiment pipeline with calibrated TF-IDF baselines, threshold tuning, explainability, and BiLSTM baseline |

📦 Selected data products

| Dataset | What it enables |
| --- | --- |
| RAG QA Logs & Corpus | RAG evaluation with QA logs, retrieval events, corpus documents, and evidence-style review workflows |
| LLM Production Telemetry | Offline LLMOps telemetry for reliability, latency, cost, routing, drift, and triage-policy review |
| Cancer Risk Factors | Health, lifestyle, environmental, and genetic features for leakage-aware risk modeling |
| Global EV Infrastructure | Standardized EV charging data for geospatial analytics, planning, and network modeling |
| YouTube Shorts & TikTok Trends 2025 | Short-form content analytics, trend exploration, creator benchmarks, and virality analysis |
| Digital Lifestyle & Mental Wellness | Behavioral signals for wellbeing analytics, cohort exploration, and predictive workflows |

🛠️ Stack

| Category | Tools |
| --- | --- |
| Languages & Core | Python, SQL, C++, Bash, Git, Linux |
| Data & Analytics | NumPy, Pandas, Polars, DuckDB, Jupyter |
| ML / DL | scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow |
| NLP / CV / LLM | Hugging Face Transformers, OpenCV, LangChain, LlamaIndex, LangGraph, FAISS, pgvector, Ollama, vLLM |
| Apps & Interfaces | Streamlit, Plotly, Matplotlib, Seaborn, Gradio, React, PyDeck |
| APIs & Serving | FastAPI, Pydantic, SQLAlchemy, Alembic, ONNX, Docker, Postgres, Redis, RQ |
| Monitoring & Quality | MLflow, OpenTelemetry, Prometheus, GitHub Actions, pytest, Ruff, mypy |

🤝 Open to collaborating on

  • 🚀 Production ML & GenAI systems: FastAPI services, Dockerized delivery, evaluation-first workflows, and review-ready outputs
  • 🧠 RAG reliability: retrieval attribution, grounded outputs, guardrails, and regression-friendly review
  • 🗂️ Validated data products: clean schemas, documented pipelines, reusable notebooks, and ML-ready artifacts
  • 📊 Decision-support tooling: monitoring, threshold policies, analytics interfaces, and operator workflows

Best contact: LinkedIn

If the work is useful, a ⭐ helps others find it.


Pinned

  1. advanced-ml-sentiment-lab Public

    Advanced Streamlit + Plotly sentiment analysis lab for TF-IDF word/char features, multi-model training, ROC/PR-AUC evaluation, cost-aware threshold tuning, error analysis, and live prediction.

    Python 8

  2. fraud-risk-ops-platform Public

    Production-structured fraud risk operations platform with FastAPI, Streamlit, policy-driven decisions, audit logging, batch jobs, and monitoring hooks.

    Python 5

  3. rag-qa-command-center Public

    RAG QA command center for retrieval quality, hallucination exposure, config trade-offs, trace review, and review-policy simulation.

    Python 2

  4. llmops-telemetry-command-center Public

    Decision-ready LLMOps telemetry dashboard for reliability, latency, cost, routing-policy review, triage thresholds, drift signals, and evidence exports.

    Python 1

  5. pima-diabetes-pipeline Public

    End-to-end diabetes risk prediction pipeline (Pima): EDA → feature engineering → calibration + cost-aware threshold → deployable artifacts.

    Jupyter Notebook 8

  6. tarekmasryo.github.io Public

    Tarek Masryo — AI/ML Engineer Portfolio

    JavaScript 2