# rss-anything

A Cloudflare Worker that generates RSS feeds from websites that don't have them.

It scrapes listing pages, uses Workers AI to identify articles, then fetches and parses each article into a proper RSS 2.0 feed — complete with images, excerpts, and full content.

## How It Works

  1. Cron trigger runs hourly (configurable)
  2. Fetches the listing page for each configured feed
  3. Workers AI extracts article URLs, titles, and dates from the page text
  4. Scrapes new articles using HTMLRewriter to pull titles, content, and Open Graph metadata
  5. Stores articles in KV and serves them as RSS 2.0 feeds
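
The end product of step 5 is a standard RSS 2.0 document. As an illustration only (this is a hypothetical sketch, not the project's actual serializer, which also embeds images and full content), a minimal RSS 2.0 builder might look like:

```ts
interface Article {
  url: string;
  title: string;
  pubDate: string; // RFC 822 date, as RSS 2.0 expects
  excerpt: string;
}

// Minimal RSS 2.0 serializer (illustrative sketch).
function toRss(feedTitle: string, link: string, items: Article[]): string {
  const esc = (s: string) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  const body = items
    .map(
      (a) =>
        `<item><title>${esc(a.title)}</title><link>${esc(a.url)}</link>` +
        `<pubDate>${esc(a.pubDate)}</pubDate>` +
        `<description>${esc(a.excerpt)}</description></item>`
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<rss version="2.0"><channel><title>${esc(feedTitle)}</title>` +
    `<link>${esc(link)}</link>${body}</channel></rss>`
  );
}
```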

## Quick Start

### Prerequisites

- Node.js and npm
- A Cloudflare account with access to Workers, Workers AI, and KV
- A Cloudflare API token for wrangler

### Setup

```sh
git clone https://github.com/tamm/rss-anything.git
cd rss-anything
npm install

# Copy the example configs
cp wrangler.toml.example wrangler.toml
cp .env.example .env
```

Edit `.env` with your Cloudflare API token.

Create a KV namespace and update `wrangler.toml`:

```sh
npx wrangler kv namespace create KV
# Copy the output id into wrangler.toml
```
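
The resulting binding in `wrangler.toml` should look roughly like this (the `id` value is a placeholder for the one printed by the command above):

```toml
[[kv_namespaces]]
binding = "KV"
id = "<id from the command above>"
```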

### Development

```sh
npm run dev
# Visit http://localhost:8787
```

## Adding a Feed

Feeds are defined in `src/config.ts`. Each feed needs:

| Field | Description |
| --- | --- |
| `slug` | URL-safe identifier (e.g. `"anthropic-news"`) |
| `title` | Human-readable feed title |
| `description` | Short description |
| `listUrl` | The listing/index page to monitor |
| `baseUrl` | Base URL for resolving relative links |
| `titleSelector` | CSS selector for the article title on article pages |
| `contentSelector` | CSS selector for article content on article pages |
| `maxArticles` | Maximum articles to keep in the feed |
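
Assuming these fields map one-to-one onto the config type in `src/config.ts`, the shape is roughly as follows (the interface name is an assumption; the actual type may be named differently):

```ts
// Assumed shape of one feed entry in src/config.ts.
interface FeedConfig {
  slug: string;            // URL-safe identifier, e.g. "anthropic-news"
  title: string;           // human-readable feed title
  description: string;     // short description
  listUrl: string;         // listing/index page to monitor
  baseUrl: string;         // base URL for resolving relative links
  titleSelector: string;   // CSS selector for the title on article pages
  contentSelector: string; // CSS selector for content on article pages
  maxArticles: number;     // maximum articles to keep in the feed
}
```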

### Example

```ts
{
  slug: "anthropic-news",
  title: "Anthropic News",
  description: "Latest news and announcements from Anthropic",
  listUrl: "https://www.anthropic.com/news",
  baseUrl: "https://www.anthropic.com",
  titleSelector: "h1",
  contentSelector: "main",
  maxArticles: 20,
}
```

## How AI Extraction Works

The listing page is fetched, and its text content and links are extracted using HTMLRewriter. This data is sent to a Workers AI model (Llama 3.1 8B by default) with a prompt asking it to identify the article entries and return structured JSON with titles, URLs, and dates.

This approach means you don't need to write fragile CSS selectors for every listing page — the AI handles the varying layouts.
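
A model reply is not guaranteed to be bare JSON; it may be wrapped in prose or a code fence. A tolerant extractor along these lines is one way to handle that (a hypothetical helper, not necessarily the project's actual code):

```ts
interface ExtractedArticle {
  title: string;
  url: string;
  date?: string;
}

// Pull the first JSON array out of a model reply that may include
// surrounding prose or markdown fencing, dropping malformed entries.
function parseArticleJson(reply: string): ExtractedArticle[] {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end === -1 || end < start) return [];
  try {
    const parsed = JSON.parse(reply.slice(start, end + 1));
    return Array.isArray(parsed)
      ? parsed.filter(
          (a) => typeof a?.url === "string" && typeof a?.title === "string"
        )
      : [];
  } catch {
    return [];
  }
}
```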

## Deploying

```sh
npm run deploy
```

The worker will be available at `https://rss-anything.<your-subdomain>.workers.dev`. Feeds are accessible at `/feed/<slug>`.

### Custom Domain

To serve feeds from a custom domain, uncomment and edit the `routes` section in `wrangler.toml`.
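
A routes entry might look like the following, with `example.com` as a placeholder for your zone and the pattern assuming the default `MOUNT_PREFIX` of `/rss`:

```toml
routes = [
  { pattern = "example.com/rss/*", zone_name = "example.com" }
]
```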

## Configuration

Tuneable defaults are in `src/defaults.ts`:

| Constant | Default | Description |
| --- | --- | --- |
| `USER_AGENT` | `"rss-anything/1.0"` | User-Agent for outgoing requests |
| `AI_TEXT_LIMIT` | `4000` | Max characters of page text sent to AI |
| `AI_MAX_TOKENS` | `2000` | Max tokens the AI may generate |
| `AI_MODEL` | `"@cf/meta/llama-3.1-8b-instruct"` | Workers AI model |
| `FEED_CACHE_MAX_AGE` | `900` | Cache-Control max-age (seconds) |
| `MOUNT_PREFIX` | `"/rss"` | Path prefix for custom domain routing |
| `FETCH_TIMEOUT` | `15000` | Timeout (ms) for outgoing requests |
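
To illustrate how `USER_AGENT` and `FETCH_TIMEOUT` plausibly apply to outgoing requests (a sketch under assumed behavior, not the project's actual fetch wrapper):

```ts
const USER_AGENT = "rss-anything/1.0";
const FETCH_TIMEOUT = 15000; // ms

// Fetch a page with the configured User-Agent and a hard timeout,
// mirroring the defaults in the table above.
async function fetchPage(url: string): Promise<string> {
  const res = await fetch(url, {
    headers: { "User-Agent": USER_AGENT },
    signal: AbortSignal.timeout(FETCH_TIMEOUT),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}
```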

## Development

```sh
npm run dev          # Start local dev server
npm test             # Run tests
npm run test:watch   # Run tests in watch mode
npm run typecheck    # Type-check without emitting
```

## Licence

MIT
