ravindu-danthanarayana/SentiScope

🎬 YouTube Sentiment Insights

An end-to-end MLOps project that analyzes the sentiment of YouTube comments in real time through a Chrome extension.

Python MLflow DVC Docker AWS


📖 Overview

YouTube Sentiment Insights classifies YouTube comments as Positive, Negative, or Neutral and surfaces the results through a Chrome extension. Content creators can gauge audience feedback at a glance instead of reading thousands of comments manually.

Key Features

  • ⚡ Real-time sentiment analysis via Chrome extension
  • 📊 Visual analytics (pie charts, word clouds, trend graphs)
  • 🤖 LightGBM model with 72% accuracy
  • 🔄 Automated MLOps pipeline with DVC
  • 📈 Experiment tracking with MLflow
  • 🚀 CI/CD deployment on AWS
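
As an illustration of the analytics feature, the sketch below shows one way predicted labels could be rolled up into the percentage shares a pie chart would display. The function name `sentiment_distribution` and the string label names are assumptions made for this example; the project may encode its classes differently.

```python
from collections import Counter

# Assumed label names, borrowed from the Overview section.
LABELS = ("Positive", "Neutral", "Negative")


def sentiment_distribution(predictions):
    """Return each label's share of the predictions as a percentage (one decimal)."""
    counts = Counter(predictions)
    # Only known labels count toward the total; anything else is ignored.
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


if __name__ == "__main__":
    sample = ["Positive", "Positive", "Negative", "Neutral"]
    print(sentiment_distribution(sample))
```

Returning a plain dictionary keeps the aggregation independent of whichever plotting library draws the chart.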

🛠️ Tech Stack

ML/MLOps: Python, Scikit-learn, LightGBM, MLflow, DVC, NLTK
Backend: Flask REST API
Frontend: Chrome Extension (HTML, CSS, JavaScript)
Cloud: AWS (EC2, ECR, S3)
DevOps: Docker, GitHub Actions


🔄 MLOps Workflow

┌─────────────────────────────────────────────────────────────────┐
│                         MLOPS PIPELINE                           │
└─────────────────────────────────────────────────────────────────┘

1. DATA MANAGEMENT
   │
   ├─► Data Collection (Reddit/YouTube Comments)
   ├─► Data Versioning (DVC)
   ├─► Preprocessing Pipeline
   └─► Storage (AWS S3)

2. EXPERIMENT TRACKING (MLflow)
   │
   ├─► Experiment 1: Baseline (Random Forest) → 64%
   ├─► Experiment 2: Vectorization (TF-IDF + Trigrams) → 65%
   ├─► Experiment 3: Feature Tuning (1000 features) → 66%
   ├─► Experiment 4: Imbalance Handling (SMOTE) → 68%
   └─► Experiment 5: Model Selection (LightGBM) → 72% ✓
   
3. PIPELINE AUTOMATION (DVC)
   │
   ├─► Data Ingestion
   ├─► Data Preprocessing
   ├─► Model Building
   ├─► Model Evaluation
   └─► Model Registration

4. MODEL REGISTRY
   │
   ├─► MLflow Model Registry
   ├─► Version Control (v1, v2, v3...)
   ├─► Stage Management (Staging → Production)
   └─► Artifact Storage (S3)

5. MODEL SERVING
   │
   ├─► Flask REST API
   ├─► Load Model from Registry
   └─► Endpoints: /predict, /generate_chart, /word_cloud, /generate_trends

6. CONTAINERIZATION
   │
   ├─► Dockerfile
   ├─► Build Docker Image
   └─► Push to ECR (AWS)

7. CI/CD (GitHub Actions)
   │
   ├─► Push Code to GitHub
   ├─► CI: Run Tests
   ├─► CD: Build & Push Docker Image
   └─► Deploy: Pull to EC2 → Run Container

8. PRODUCTION
   │
   ├─► EC2 Instance (Flask API)
   ├─► Chrome Extension (Frontend)
   └─► End Users

              ┌─────────────────┐
              │  Developer      │
              │  Push Code      │
              └────────┬────────┘
                       │
              ┌────────▼────────┐
              │  GitHub         │
              │  (Version Ctrl) │
              └────────┬────────┘
                       │
         ┌─────────────▼──────────────┐
         │  GitHub Actions (CI/CD)    │
         │  • Build Docker Image      │
         │  • Push to ECR             │
         │  • Deploy to EC2           │
         └─────────────┬──────────────┘
                       │
         ┌─────────────▼──────────────┐
         │  AWS EC2 (Production)      │
         │  • Flask API Running       │
         │  • Model Serving           │
         └─────────────┬──────────────┘
                       │
              ┌────────▼────────┐
              │  Chrome Ext     │
              │  (Users)        │
              └─────────────────┘
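
The preprocessing step in the pipeline above is not spelled out in this README. The following is a minimal sketch of what such a step might look like, using only the standard library. The function name `preprocess_comment`, the regular expressions, and the tiny stop-word list are all assumptions for illustration; the actual `data_preprocessing.py` may rely on NLTK and apply different rules.

```python
import re

# Assumed, deliberately small stop-word list. Negations such as "not" are
# kept because removing them can flip the sentiment of a comment.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "this", "it"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a lowercase letter, digit, whitespace, or apostrophe.
SYMBOL_PATTERN = re.compile(r"[^a-z0-9\s']")


def preprocess_comment(text):
    """Lowercase a comment, drop URLs and symbols, and remove stop words."""
    text = text.lower().strip()
    text = URL_PATTERN.sub(" ", text)
    text = SYMBOL_PATTERN.sub(" ", text)
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess_comment("This is NOT a good video!! https://example.com"))
```

Stripping URLs before symbols matters here: otherwise the symbol pass would break a link into stray word fragments that survive as tokens.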

📁 Project Structure

youtube-sentiment-insights/
├── .github/workflows/cicd.yaml    # CI/CD pipeline
├── src/
│   ├── data/
│   │   ├── data_ingestion.py
│   │   └── data_preprocessing.py
│   └── model/
│       ├── model_building.py
│       ├── model_evaluation.py
│       └── model_registration.py
├── flask_api/main.py              # REST API
├── yt-chrome-plugin-frontend/     # Chrome extension
├── dvc.yaml                       # DVC pipeline
├── params.yaml                    # Model parameters
├── Dockerfile                     # Container config
└── requirements.txt               # Dependencies

🚀 Quick Start

# 1. Clone repository
git clone https://github.com/yourusername/youtube-sentiment-insights.git
cd youtube-sentiment-insights

# 2. Create environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Initialize DVC (skip if the cloned repository already contains a .dvc/ directory)
dvc init

# 5. Run ML pipeline
dvc repro

# 6. Start Flask API
python flask_api/main.py

# 7. Install Chrome extension
# Go to chrome://extensions/ → enable Developer mode → Load unpacked → Select yt-chrome-plugin-frontend/

📊 Model Performance

Model      Accuracy   Precision   Recall   F1-Score
LightGBM   72%        0.71        0.70     0.70
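
The README does not say how precision, recall, and F1 are averaged across the three classes. As a hedged illustration, the sketch below computes accuracy together with macro-averaged scores (each class weighted equally). The function name `macro_metrics` and the short labels in the example are hypothetical, and macro averaging is an assumption rather than the project's confirmed method.

```python
def macro_metrics(y_true, y_pred, labels):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


if __name__ == "__main__":
    truth = ["pos", "pos", "neg", "neu"]
    preds = ["pos", "neg", "neg", "neu"]
    print(macro_metrics(truth, preds, labels=["pos", "neg", "neu"]))
```

With imbalanced classes, macro averages can differ noticeably from accuracy, which is one reason to report all four numbers.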

Configuration:

  • Vectorizer: TF-IDF (trigrams)
  • Max Features: 1000
  • Imbalance: SMOTE oversampling
  • Hyperparameters: Tuned with Optuna
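
To show what "TF-IDF (trigrams)" with a feature cap means in practice, here is a small standard-library sketch. It assumes the n-gram range covers unigrams through trigrams and that the vocabulary keeps the most frequent terms in the corpus; the README does not confirm either choice. The IDF uses the smoothed form `ln((1 + n) / (1 + df)) + 1`, and the sketch skips vector normalization. All function names are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(tokens, max_n=3):
    """Return all contiguous word n-grams for n = 1..max_n, joined by spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(documents, max_features=1000, max_n=3):
    """Keep the max_features n-grams with the highest total corpus count."""
    totals = Counter()
    for doc in documents:
        totals.update(word_ngrams(doc.split(), max_n))
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_features]]


def tfidf_vector(doc, vocabulary, documents, max_n=3):
    """Weight a document's n-gram counts by smoothed inverse document frequency."""
    counts = Counter(word_ngrams(doc.split(), max_n))
    doc_grams = [set(word_ngrams(d.split(), max_n)) for d in documents]
    n_docs = len(documents)
    vector = []
    for term in vocabulary:
        df = sum(1 for grams in doc_grams if term in grams)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vector.append(counts[term] * idf)
    return vector


if __name__ == "__main__":
    corpus = ["great video", "great audio"]
    vocab = build_vocabulary(corpus, max_features=3)
    print(vocab, tfidf_vector("great video", vocab, corpus))
```

Capping the vocabulary keeps the feature matrix small, at the cost of dropping rare n-grams that may carry sentiment.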

📡 API Endpoints

# Health check
GET /

# Predict sentiment
POST /predict
Body: {"comments": ["Great video!", "Bad quality"]}

# Generate visualizations
GET /generate_chart
GET /word_cloud
GET /generate_trends
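
A minimal client sketch for the `/predict` endpoint listed above, using only the standard library. The request body follows the shape shown in this section; the base URL, the JSON `Content-Type` header, and the function names are assumptions. The response schema is not documented here, so the sketch returns the decoded JSON unchanged.

```python
import json
import urllib.request

# Assumed local address; port 8080 is taken from the Deployment section.
BASE_URL = "http://localhost:8080"


def build_predict_request(comments, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request with a JSON body."""
    payload = json.dumps({"comments": list(comments)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(comments, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(comments, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_predict_request(["Great video!", "Bad quality"])
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect without a running server.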

☁️ Deployment

AWS Infrastructure:

  • EC2: Flask API server + MLflow server
  • ECR: Docker image registry
  • S3: Artifact storage
  • GitHub Actions: Automated CI/CD

Access: http://<EC2-PUBLIC-IP>:8080
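
After deployment, a quick reachability check against the health endpoint (`GET /`) can confirm that the container is serving traffic. The sketch below is one possible approach. It assumes the endpoint answers with HTTP 200 when healthy, and the helper names and the injectable `opener` parameter are hypothetical conveniences for testing.

```python
import urllib.request


def api_base_url(host, port=8080):
    """Build the base URL from the EC2 public IP or hostname."""
    return f"http://{host}:{port}"


def is_healthy(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET / answers with HTTP 200, False on any connection error."""
    try:
        with opener(f"{base_url}/", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError and HTTPError are subclasses of OSError,
        # so refused connections, timeouts, and 4xx/5xx replies land here.
        return False


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only placeholder address.
    print(api_base_url("203.0.113.10"))
```

If the check fails from outside the instance, a common cause is an EC2 security group that does not allow inbound traffic on the chosen port.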


🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request.


📧 Contact

@ravindudanthanarayana

Star this repo if you find it helpful!
