An end-to-end MLOps project that analyzes the sentiment of YouTube comments in real time and delivers the results through a Chrome extension.
YouTube Sentiment Insights classifies the comments on a YouTube video as Positive, Negative, or Neutral and shows the breakdown in a Chrome extension, so content creators can gauge audience feedback without reading thousands of comments by hand.
- ⚡ Real-time sentiment analysis via Chrome extension
- 📊 Visual analytics (pie charts, word clouds, trend graphs)
- 🤖 LightGBM model with 72% accuracy
- 🔄 Automated MLOps pipeline with DVC
- 📈 Experiment tracking with MLflow
- 🚀 CI/CD deployment on AWS
ML/MLOps: Python, scikit-learn, LightGBM, Optuna, MLflow, DVC, NLTK
Backend: Flask REST API
Frontend: Chrome Extension (HTML, CSS, JavaScript)
Cloud: AWS (EC2, ECR, S3)
DevOps: Docker, GitHub Actions
┌────────────────────────────────────────┐
│             MLOPS PIPELINE             │
└────────────────────────────────────────┘
1. DATA MANAGEMENT
│
├─► Data Collection (Reddit/YouTube Comments)
├─► Data Versioning (DVC)
├─► Preprocessing Pipeline
└─► Storage (AWS S3)
2. EXPERIMENT TRACKING (MLflow)
│
├─► Experiment 1: Baseline (Random Forest) → 64%
├─► Experiment 2: Vectorization (TF-IDF + Trigrams) → 65%
├─► Experiment 3: Feature Tuning (1000 features) → 66%
├─► Experiment 4: Imbalance Handling (SMOTE) → 68%
└─► Experiment 5: Model Selection (LightGBM) → 72% ✓
3. PIPELINE AUTOMATION (DVC)
│
├─► Data Ingestion
├─► Data Preprocessing
├─► Model Building
├─► Model Evaluation
└─► Model Registration
4. MODEL REGISTRY
│
├─► MLflow Model Registry
├─► Version Control (v1, v2, v3...)
├─► Stage Management (Staging → Production)
└─► Artifact Storage (S3)
5. MODEL SERVING
│
├─► Flask REST API
├─► Load Model from Registry
└─► Endpoints: /predict, /generate_chart, /word_cloud, /generate_trends
6. CONTAINERIZATION
│
├─► Dockerfile
├─► Build Docker Image
└─► Push to ECR (AWS)
7. CI/CD (GitHub Actions)
│
├─► Push Code to GitHub
├─► CI: Run Tests
├─► CD: Build & Push Docker Image
└─► Deploy: Pull Image on EC2 → Run Container
8. PRODUCTION
│
├─► EC2 Instance (Flask API)
├─► Chrome Extension (Frontend)
└─► End Users
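Experiment 4 in the diagram rebalances the three sentiment classes with SMOTE. In practice this is usually done with a library such as imbalanced-learn and applied only to the training split, never to the evaluation data. The dependency-free sketch below shows the core idea on dense feature vectors; `smote` and `oversample` are hypothetical helpers for illustration, not the project's code.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of same-class feature vectors.

    Each point is placed at a random position on the line segment between a
    randomly chosen sample and one of its k nearest neighbours of the same class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [x for j, x in enumerate(minority) if j != i]
        # k nearest same-class neighbours by Euclidean distance
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(X, y, k=5, seed=0):
    """Oversample every class with SMOTE until it matches the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new_points = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new_points)
            y_out.extend([label] * len(new_points))
    return X_out, y_out
```

Because every synthetic point is a convex combination of two real samples from the same class, new points stay inside the region that class already occupies; with TF-IDF features they are interpolated weight vectors, not new comments.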
         ┌─────────────────┐
         │    Developer    │
         │    Push Code    │
         └────────┬────────┘
                  │
         ┌────────▼────────┐
         │     GitHub      │
         │  (Version Ctrl) │
         └────────┬────────┘
                  │
    ┌─────────────▼─────────────┐
    │  GitHub Actions (CI/CD)   │
    │  • Build Docker Image     │
    │  • Push to ECR            │
    │  • Deploy to EC2          │
    └─────────────┬─────────────┘
                  │
    ┌─────────────▼─────────────┐
    │   AWS EC2 (Production)    │
    │  • Flask API Running      │
    │  • Model Serving          │
    └─────────────┬─────────────┘
                  │
         ┌────────▼────────┐
         │   Chrome Ext    │
         │     (Users)     │
         └─────────────────┘
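The workflow itself lives in `.github/workflows/cicd.yaml` and is not reproduced in this README. As a rough guide to what its build, push, and deploy steps involve, the sketch below assembles the standard AWS CLI and Docker commands for an ECR-hosted image. The account ID, region, repository name, tag, and port mapping are placeholders (the port is assumed to match the 8080 used for production access), not values taken from the project.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Registry host of a private Amazon ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def deployment_commands(account_id, region, repository, tag="latest", port=8080):
    """Return (ci_commands, ec2_commands) as shell command strings.

    ci_commands would run in GitHub Actions; ec2_commands on the instance.
    """
    registry = ecr_registry(account_id, region)
    image = f"{registry}/{repository}:{tag}"
    # Docker authenticates to ECR with a temporary password from the AWS CLI.
    login = (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )
    ci_commands = [
        login,
        f"docker build -t {image} .",
        f"docker push {image}",
    ]
    ec2_commands = [
        login,
        f"docker pull {image}",
        # Assumes the Flask app inside the container listens on `port`.
        f"docker run -d -p {port}:{port} {image}",
    ]
    return ci_commands, ec2_commands
```

On EC2, the login and `docker pull` only succeed if the instance has AWS credentials allowed to read from ECR, typically via an IAM instance role.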
youtube-sentiment-insights/
├── .github/workflows/cicd.yaml # CI/CD pipeline
├── src/
│ ├── data/
│ │ ├── data_ingestion.py
│ │ └── data_preprocessing.py
│ └── model/
│ ├── model_building.py
│ ├── model_evaluation.py
│ └── model_registration.py
├── flask_api/main.py # REST API
├── yt-chrome-plugin-frontend/ # Chrome extension
├── dvc.yaml # DVC pipeline
├── params.yaml # Model parameters
├── Dockerfile # Container config
└── requirements.txt # Dependencies
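`dvc repro` reruns a stage only when something it depends on has changed since the hashes recorded in `dvc.lock` (DVC stores MD5 hashes of each stage's dependencies and outputs there). The sketch below models that rule in plain Python. The stage names follow the pipeline diagram, but the dependency and output paths (`data/raw`, `model.pkl`, `metrics.json`, ...) are hypothetical; the real definitions are in `dvc.yaml`, and this is a simplified illustration rather than DVC's implementation.

```python
import hashlib

# Hypothetical stage graph following the DVC stages in the diagram.
STAGES = {
    "data_ingestion": {
        "deps": ["src/data/data_ingestion.py"],
        "outs": ["data/raw"],
    },
    "data_preprocessing": {
        "deps": ["src/data/data_preprocessing.py", "data/raw"],
        "outs": ["data/interim"],
    },
    "model_building": {
        # Simplification: DVC tracks declared params by value, not the whole file.
        "deps": ["src/model/model_building.py", "data/interim", "params.yaml"],
        "outs": ["model.pkl"],
    },
    "model_evaluation": {
        "deps": ["src/model/model_evaluation.py", "model.pkl"],
        "outs": ["metrics.json"],
    },
    "model_registration": {
        "deps": ["src/model/model_registration.py", "metrics.json"],
        "outs": [],
    },
}


def md5(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()


def stages_to_run(files, lock):
    """Return the stages a repro would execute, in pipeline order.

    files: path -> current content (bytes)
    lock:  stage -> {dep path: md5 recorded at the last successful run}
    """
    stale_outputs = set()
    to_run = []
    for name, stage in STAGES.items():  # insertion order is a valid topological order
        recorded = lock.get(name)
        changed = recorded is None or any(
            dep in stale_outputs or recorded.get(dep) != md5(files.get(dep, b""))
            for dep in stage["deps"]
        )
        if changed:
            to_run.append(name)
            stale_outputs.update(stage["outs"])  # downstream consumers must rerun too
    return to_run
```

For example, editing only `params.yaml` would mark `model_building` stale, and its new outputs would in turn force `model_evaluation` and `model_registration` to rerun, while both data stages are skipped.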
# 1. Clone repository
git clone https://github.com/yourusername/youtube-sentiment-insights.git
cd youtube-sentiment-insights
# 2. Create environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Initialize DVC (skip this if the cloned repo already contains a .dvc/ directory)
dvc init
# 5. Run ML pipeline
dvc repro
# 6. Start Flask API
python flask_api/main.py
# 7. Install Chrome extension
# Go to chrome://extensions/ → enable Developer mode → Load unpacked → select yt-chrome-plugin-frontend/

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| LightGBM | 0.72 | 0.71 | 0.70 | 0.70 |
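The table does not state how precision, recall, and F1 are averaged over the three classes. Assuming macro averaging (the unweighted mean over classes, as with scikit-learn's `average="macro"`), this sketch shows how all four columns follow from a confusion matrix; `classification_metrics` is a hypothetical helper, not part of the project.

```python
def classification_metrics(cm, labels):
    """Accuracy and macro-averaged precision/recall/F1 from a confusion matrix.

    cm[i][j] = number of comments whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision, recall, f1 = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column total
        actual = sum(cm[i])  # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "f1": sum(f1) / n,  # mean of per-class F1, not F1 of the macro P and R
    }
```

With macro averaging, F1 is the mean of the per-class F1 scores, so the F1 column need not equal the harmonic mean of the Precision and Recall columns.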
Configuration:
- Vectorizer: TF-IDF (trigrams)
- Max Features: 1000
- Imbalance: SMOTE oversampling
- Hyperparameters: Tuned with Optuna
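The vectorizer settings above correspond roughly to scikit-learn's `TfidfVectorizer(ngram_range=(1, 3), max_features=1000)`; reading "trigrams" as n-grams of length 1 to 3 is an assumption. To make the two settings concrete, here is a dependency-free sketch that follows the vectorizer's default formulas (lowercasing, the default token pattern, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalisation) and keeps only the `max_features` most frequent n-grams. It is illustrative, not the project's code.

```python
import math
import re
from collections import Counter

# scikit-learn's default token pattern: runs of 2+ word characters
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def ngrams(text, n_max=3):
    """All word n-grams of length 1..n_max, after lowercasing."""
    tokens = TOKEN.findall(text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_tfidf(docs, max_features=1000, n_max=3):
    """Learn (vocabulary, idf) from a list of comments."""
    grams_per_doc = [ngrams(d, n_max) for d in docs]
    corpus_counts = Counter(g for grams in grams_per_doc for g in grams)
    # Keep the max_features most frequent n-grams (ties broken alphabetically here).
    kept = sorted(corpus_counts, key=lambda g: (-corpus_counts[g], g))[:max_features]
    vocab = {g: i for i, g in enumerate(sorted(kept))}
    # Document frequency: in how many comments each kept n-gram appears.
    df = Counter(g for grams in grams_per_doc for g in set(grams) if g in vocab)
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}  # smoothed IDF
    return vocab, idf


def transform(doc, vocab, idf, n_max=3):
    """L2-normalised TF-IDF vector of one comment, as {column index: weight}."""
    counts = Counter(g for g in ngrams(doc, n_max) if g in vocab)
    weights = {vocab[g]: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {j: w / norm for j, w in weights.items()} if norm else {}
```

Capping the vocabulary matters here because n-grams up to length 3 multiply the number of candidate features; keeping only the most frequent ones bounds the model's input size, which is the knob Experiment 3 tunes.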
# Health check
GET /
# Predict sentiment
POST /predict
Body: {"comments": ["Great video!", "Bad quality"]}
# Generate visualizations
GET /generate_chart
GET /word_cloud
GET /generate_trends

AWS Infrastructure:
- EC2: Flask API server + MLflow server
- ECR: Docker image registry
- S3: Artifact storage
- GitHub Actions: Automated CI/CD
Access: http://<EC2-PUBLIC-IP>:8080
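Once the API is reachable, the `/predict` endpoint listed above can be exercised from any HTTP client. A minimal standard-library sketch, assuming the endpoint accepts the JSON body shown earlier with a `Content-Type: application/json` header; the response format is not documented in this README, so the hypothetical `predict` helper simply returns the parsed JSON.

```python
import json
import urllib.request


def build_predict_request(base_url, comments):
    """Build a POST /predict request with body {"comments": [...]}."""
    body = json.dumps({"comments": comments}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, comments, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(base_url, comments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://<EC2-PUBLIC-IP>:8080", ["Great video!", "Bad quality"])` with your instance's public IP substituted, or a `localhost` URL when running the API locally.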
Contributions are welcome! Please fork the repository and submit a pull request.
⭐ Star this repo if you find it helpful!