diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 000000000..fd53fb4ba --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,125 @@ +# RD-Agent Copilot Instructions + +RD-Agent is a Microsoft Research R&D automation framework that automates the **Research (propose hypotheses) → Development (implement)** cycle for data-driven ML tasks. Primary scenarios: quantitative finance (factor/model development via Qlib), data science/Kaggle competitions, and LLM fine-tuning. + +## Build, Test & Lint + +### Setup +```bash +make dev # install all optional deps + pre-commit hook +make install # editable install only +``` + +### Testing +```bash +make test # run all tests with coverage +make test-offline # offline tests only (no LLM API calls) +pytest test/path/to/test_file.py::TestClass::test_method # single test +pytest -m offline # run only @pytest.mark.offline tests +``` + +Mark tests that don't call external APIs with `@pytest.mark.offline`. The `workspace/` directory is excluded from test discovery. Coverage threshold is 80%. + +### Linting +```bash +make lint # mypy + ruff + isort + black + toml-sort (check only) +make auto-lint # auto-fix isort + black + toml-sort +make mypy # type check rdagent/core (scope is expanding) +make ruff # lint rdagent/core +``` + +- **Line length:** 120 (black + ruff) +- **isort profile:** black +- **mypy:** strict (`disallow_untyped_defs=true`, `warn_return_any=true`) — currently scoped to `rdagent/core/` +- Conventional commit prefixes required on PRs: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`, `chore:`, etc. 
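The offline-marker convention above can be sketched as a minimal, hypothetical test (not an actual test from this repo): it makes no LLM/API calls, so both `make test-offline` and `pytest -m offline` pick it up.

```python
import pytest


@pytest.mark.offline
def test_local_only_logic():
    # Pure local computation: safe to run without any API keys configured,
    # which is exactly what the `offline` marker promises.
    assert sum(range(5)) == 10
```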
+
+## Architecture
+
+### Package Layout
+
+```
+rdagent/
+├── core/        # Abstract base classes and framework interfaces
+├── components/  # Reusable building blocks (coder, runner, proposal, workflow, …)
+├── scenarios/   # Domain-specific implementations (qlib, kaggle, data_science, …)
+├── app/         # Entry points per scenario (qlib_rd_loop, data_science, kaggle, …)
+├── oai/         # LLM backend abstraction (LiteLLM default, OpenAI, Azure)
+├── log/         # Logging (loguru-based), Streamlit trace viewer UI
+└── utils/       # Env execution, workflow loop, repo/blob utilities
+```
+
+### The R&D Loop
+
+Each scenario runs a `LoopBase` (via `LoopMeta`) that orchestrates these steps:
+
+```
+HypothesisGen → Hypothesis2Experiment → Developer (coder) → Developer (runner) → Experiment2Feedback
+      ↑                                                                                  |
+      └─────────────────────────── Trace (accumulates history) ──────────────────────────┘
+```
+
+`RDLoop` in `rdagent/components/workflow/rd_loop.py` is the canonical implementation. Each component is dynamically loaded by class path from a `BasePropSetting` config object.
+
+### Core Abstractions (`rdagent/core/`)
+
+| Class | Role |
+|---|---|
+| `Scenario` | Domain context — provides background, runtime environment description |
+| `Developer[Exp]` | Transforms an experiment **in-place** (coder or runner) |
+| `Evaluator` / `IterEvaluator` | Produces `Feedback`; iter variant uses coroutine `yield`/`send` |
+| `EvolvingStrategy[T]` | Algorithm for one evolution iteration; yields partial states |
+| `RAGStrategy[T]` | Manages an `EvolvingKnowledgeBase`: query, generate, dump, load |
+| `Hypothesis` | Research idea with `hypothesis`, `reason`, `concise_*` fields |
+| `ExperimentFeedback` | Boolean `decision` + `reason`; `bool(fb)` is `fb.decision` |
+| `Workspace[Task, FB]` | Mutable container for code/data during task execution |
+| `EvoStep[T]` | Dataclass: `evolvable_subjects`, `queried_knowledge`, `feedback` |
+
+**Critical Developer contract:** `develop(exp)` mutates `exp` in-place. 
Do not return a new object. + +### Configuration + +All settings use **Pydantic v2 `BaseSettings`** via `ExtendedBaseSettings` (supports env-prefix inheritance across base classes): + +```python +from rdagent.core.conf import ExtendedBaseSettings, RD_AGENT_SETTINGS +from rdagent.oai.llm_conf import LLM_SETTINGS +``` + +Settings are overridable via environment variables or a `.env` file in the working directory (auto-loaded by the CLI). Key settings: +- `RD_AGENT_SETTINGS.workspace_path` — where experiment artifacts are written (`git_ignore_folder/` by default) +- `LLM_SETTINGS.backend` — LLM provider class path (default: `rdagent.oai.backend.LiteLLMAPIBackend`) +- `LLM_SETTINGS.chat_model` / `embedding_model` + +### LLM Access + +```python +from rdagent.oai.llm_utils import APIBackend +response = APIBackend().build_messages_and_create_chat_completion(...) +embeddings = APIBackend().create_embedding(str_list) +``` + +`APIBackend` is a factory that returns the configured backend. LLM calls are cached via MD5-keyed pickle when `LLM_SETTINGS.use_chat_cache = True`. + +### Prompts + +Prompts live in YAML files alongside the module that uses them. They are loaded via `Prompts(file_path)` (a `SingletonBaseClass` dict). Jinja2-style templating is used for variable substitution. Prompt files are typically at `rdagent/{component}/prompts.yaml`. + +### Logging + +```python +from rdagent.log import rdagent_logger as logger +logger.info("message") +logger.log_object(obj, tag="label") # structured object logging +``` + +`rdagent_logger` is a global `RDAgentLog` singleton backed by loguru. Use `logger.log_object()` for structured data (scenarios, settings, experiments). The Streamlit UI (`rdagent ui`) reads these logs for visualization. + +## Key Conventions + +- **`from __future__ import annotations`** at the top of every module — required for forward references with mypy. +- **TypeVars are bound**: `ASpecificExp = TypeVar("ASpecificExp", bound=Experiment)`. 
Follow this pattern when adding generics. +- **`SingletonBaseClass`**: kwargs-only construction enforced. Do not pass positional args to singletons. +- **`import_class(dotted.path)`** from `rdagent.core.utils` — used everywhere to dynamically load scenario components from config strings. Use this rather than direct imports when the class path is user-configurable. +- **`git_ignore_folder/`** — runtime workspace output; already gitignored. Experiment artifacts, pickle caches, and checkpoints go here. +- **Parallel loops**: `RD_AGENT_SETTINGS.step_semaphore` controls concurrency. When `> 1`, subprocesses are used automatically. +- **Ruff ignore list**: `ANN401`, `D` (docstrings), `ERA001`, `T20` (print), `S101` (assert), `TD`/`FIX` (todos) are intentionally suppressed project-wide. diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 000000000..3bbca0473 --- /dev/null +++ b/Dockerfile @@ -0,0 +1,33 @@ +FROM node:20-bookworm-slim AS frontend-builder + +WORKDIR /app/web +COPY web/package.json web/package-lock.json ./ +RUN npm install --legacy-peer-deps +COPY web/ ./ +RUN npm run build:flask + + +FROM python:3.11-slim AS runtime + +ENV PYTHONDONTWRITEBYTECODE=1 \ + PYTHONUNBUFFERED=1 \ + PIP_NO_CACHE_DIR=1 \ + SETUPTOOLS_SCM_PRETEND_VERSION_FOR_RDAGENT=0.0.0 \ + PYTHONPATH=/app + +WORKDIR /app + +RUN apt-get update && apt-get install -y --no-install-recommends \ + build-essential \ + git \ + && rm -rf /var/lib/apt/lists/* + +COPY . /app +COPY --from=frontend-builder /app/git_ignore_folder/static /app/git_ignore_folder/static + +RUN python -m pip install --upgrade pip setuptools wheel && \ + python -m pip install . + +EXPOSE 19899 + +CMD sh -c 'python -m rdagent.app.cli server_ui --port "${PORT:-19899}"' diff --git a/README.md b/README.md index aed48bd47..ad5b5b68a 100644 --- a/README.md +++ b/README.md @@ -376,6 +376,10 @@ rdagent server_ui --port 19899 After that, open `http://127.0.0.1:19899` in your browser. 
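If the page does not come up, one quick way to check whether anything is actually listening on the port is a stdlib-only Python probe (a hypothetical sketch, not a documented RD-Agent command — the port number is the default used above):

```shell
# Probe 127.0.0.1:19899: connect_ex returns 0 only when something
# is already listening there (i.e. the UI server is up).
python3 -c "import socket; s = socket.socket(); s.settimeout(1); print('listening' if s.connect_ex(('127.0.0.1', 19899)) == 0 else 'nothing on port 19899'); s.close()"
```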
+#### Railway Deployment + +The repository now includes `nixpacks.toml` and `railway.json` for deploying `rdagent server_ui` on Railway. If you want uploads, trace artifacts, and stdout logs to survive container replacement, enable the Supabase-backed persistence settings documented in [docs/deployment/railway.md](docs/deployment/railway.md). + #### Common Notes Port `19899` is used in the examples above. Before starting either UI, check whether this port is already occupied. If it is, please change it to another available port. diff --git a/docs/deployment/railway.md b/docs/deployment/railway.md new file mode 100644 index 000000000..f44003e13 --- /dev/null +++ b/docs/deployment/railway.md @@ -0,0 +1,116 @@ +# Deploying RD-Agent to Railway with Supabase + +This repository now includes a Railway-ready `nixpacks.toml` and `railway.json` for deploying the Flask-backed Web UI: + +- backend entrypoint: `rdagent server_ui` +- frontend build: `web/` via `npm run build:flask` +- persistent artifacts: optional Supabase-backed upload/trace/stdout sync + +## What this deployment supports today + +This deployment path runs the existing `server_ui` service on Railway. + +- The Vue frontend is built into `git_ignore_folder/static` +- Flask serves the built frontend and real-time APIs +- RD-Agent jobs still run as subprocesses launched by the Flask service +- If Supabase persistence is enabled, uploaded files, trace artifacts, and stdout logs are mirrored to Supabase Storage and can be hydrated back after container replacement + +This is the shortest path to getting the current project online on Railway without rewriting the execution model. + +## 1. Create the Railway service + +1. Create a new Railway project from this repository. +2. Keep the service root at the repository root. +3. Railway will pick up `nixpacks.toml` automatically. + +The service starts with: + +```bash +rdagent server_ui --port $PORT +``` + +## 2. 
Create Supabase resources
+
+Create one Supabase project and at least one Storage bucket for RD-Agent artifacts.
+
+Recommended setup:
+
+- **Postgres**: optional for future task metadata / queue state
+- **Storage bucket**: required for persisted artifacts
+
+Suggested bucket name:
+
+```text
+rdagent
+```
+
+## 3. Configure Railway environment variables
+
+### Required for the web service
+
+```bash
+PORT # Provided by Railway
+UI_STATIC_PATH=./git_ignore_folder/static
+UI_TRACE_FOLDER=./git_ignore_folder/traces
+```
+
+### Enable Supabase-backed artifact persistence
+
+```bash
+UI_SUPABASE_ENABLED=true
+UI_SUPABASE_URL=https://<your-project-ref>.supabase.co
+UI_SUPABASE_SERVICE_ROLE_KEY=
+UI_SUPABASE_BUCKET=rdagent
+```
+
+Optional path prefixes inside the bucket:
+
+```bash
+UI_SUPABASE_TRACE_PREFIX=traces
+UI_SUPABASE_STDOUT_PREFIX=stdout
+UI_SUPABASE_UPLOAD_PREFIX=uploads
+```
+
+### LLM / provider configuration
+
+Set the same provider credentials you already use locally. Common examples:
+
+```bash
+LLM_BACKEND=rdagent.oai.backend.LiteLLMAPIBackend
+OPENAI_API_KEY=
+OPENAI_API_BASE=
+CHAT_MODEL=
+EMBEDDING_MODEL=
+```
+
+If you use Azure or other providers, set the corresponding `LLM_SETTINGS` environment variables used by this repo.
+
+## 4. Deploy
+
+Once the environment variables are set, trigger a deploy in Railway.
+
+The build pipeline will:
+
+1. install Python dependencies
+2. install frontend dependencies in `web/`
+3. build the Vue frontend into `git_ignore_folder/static`
+4. start `rdagent server_ui`
+
+## 5. Validate the deployment
+
+After Railway finishes deploying:
+
+1. open the Railway public URL
+2. confirm the frontend loads
+3. start a task from the UI
+4. confirm traces stream normally
+5. 
after the task produces output, verify that artifacts appear in Supabase Storage under: + - `uploads/...` + - `traces/...` + - `stdout/...` + +## Current architecture notes + +- The current codebase still executes RD-Agent jobs as subprocesses inside the same `server_ui` service. +- Supabase persistence now protects uploaded files, trace logs, and stdout artifacts against container replacement. +- A fully split Web/API + Worker deployment would require an additional queue / orchestration layer; that is not wired in this repository yet. diff --git a/nixpacks.toml b/nixpacks.toml new file mode 100644 index 000000000..b29ee7058 --- /dev/null +++ b/nixpacks.toml @@ -0,0 +1,16 @@ +[phases.setup] +nixPkgs = ["python311", "nodejs_20"] + +[phases.install] +cmds = [ + "python3 -m ensurepip --upgrade", + "python3 -m pip install --upgrade pip setuptools wheel", + "python3 -m pip install .", + "cd web && npm ci" +] + +[phases.build] +cmds = ["cd web && npm run build:flask"] + +[start] +cmd = "rdagent server_ui --port ${PORT:-19899}" diff --git a/railway.json b/railway.json new file mode 100644 index 000000000..34b79ae22 --- /dev/null +++ b/railway.json @@ -0,0 +1,12 @@ +{ + "$schema": "https://railway.com/railway.schema.json", + "build": { + "builder": "DOCKERFILE" + }, + "deploy": { + "healthcheckPath": "/", + "healthcheckTimeout": 300, + "restartPolicyType": "ON_FAILURE", + "restartPolicyMaxRetries": 10 + } +} diff --git a/rdagent/log/server/app.py b/rdagent/log/server/app.py index 575f14bcf..a4f628410 100644 --- a/rdagent/log/server/app.py +++ b/rdagent/log/server/app.py @@ -15,6 +15,12 @@ from flask_cors import CORS from werkzeug.utils import secure_filename +from rdagent.log.server.persistence import ( + SupabaseStorageClient, + stdout_object_path, + trace_object_prefix, + upload_object_path, +) from rdagent.log.storage import FileStorage from rdagent.log.ui.conf import UI_SETTING from rdagent.log.ui.storage import WebStorage @@ -23,6 +29,9 @@ CORS(app) 
app.config["UI_SERVER_PORT"] = 19899 +SUPABASE_STORAGE = SupabaseStorageClient() +TRACE_SYNC_INTERVAL_SECONDS = 15 + _YELLOW = "\033[33m" _RESET = "\033[0m" @@ -76,6 +85,7 @@ def __init__( # NOTE: Use multiprocessing.Queue because rdagent is started as a separate process. self.user_request_q: Queue = Queue(maxsize=1024) self.user_response_q: Queue = Queue(maxsize=1024) + self.last_synced_at: datetime | None = None if create_process: self.process = Process( @@ -172,6 +182,76 @@ def _run(self) -> None: log_folder_path = Path(UI_SETTING.trace_folder).absolute() +def _trace_root() -> Path: + return Path(UI_SETTING.trace_folder).absolute() + + +def _relative_trace_id(trace_id: str | Path) -> str: + trace_value = Path(str(trace_id).strip()) + trace_root = _trace_root() + if trace_value.is_absolute(): + try: + return trace_value.relative_to(trace_root).as_posix() + except ValueError: + return trace_value.as_posix().strip("/") + return trace_value.as_posix().strip("/") + + +def _stdout_path_from_trace_id(trace_id: str) -> Path | None: + trace_parts = Path(trace_id).parts + if len(trace_parts) < 2: + return None + return _trace_root() / trace_parts[0] / f"{trace_parts[-1]}.log" + + +def _sync_task_artifacts(task: RDAgentTask, *, force: bool = False) -> None: + if not SUPABASE_STORAGE.is_enabled(): + return + + now = datetime.now(timezone.utc) + if ( + not force + and task.last_synced_at is not None + and (now - task.last_synced_at).total_seconds() < TRACE_SYNC_INTERVAL_SECONDS + ): + return + + relative_trace_id = _relative_trace_id(task.log_trace_path) + trace_path = Path(task.log_trace_path) + stdout_path = Path(task.stdout_path) + + if trace_path.exists(): + SUPABASE_STORAGE.upload_directory(trace_path, trace_object_prefix(relative_trace_id)) + if stdout_path.exists(): + SUPABASE_STORAGE.upload_file(stdout_path, stdout_object_path(relative_trace_id)) + + task.last_synced_at = now + + +def _hydrate_trace_dir(trace_path: Path, trace_id: str) -> bool: + if not 
SUPABASE_STORAGE.is_enabled(): + return False + + relative_trace_id = _relative_trace_id(trace_id) + if not relative_trace_id: + return False + + SUPABASE_STORAGE.download_directory(trace_object_prefix(relative_trace_id), trace_path) + return True + + +def _hydrate_stdout_file(trace_id: str, stdout_path: Path) -> bool: + if not SUPABASE_STORAGE.is_enabled(): + return False + + relative_trace_id = _relative_trace_id(trace_id) + if not relative_trace_id: + return False + + SUPABASE_STORAGE.download_file(stdout_object_path(relative_trace_id), stdout_path) + return True + + def _drain_user_requests_into_messages(task: RDAgentTask) -> None: """Move a single pending user-interaction request into `task.messages`. @@ -233,11 +313,14 @@ def _resolve_stdout_path(trace_id: str) -> Path | None: return None task = rdagent_processes.get(str(log_folder_path / normalized_trace_id)) - if task is None or not task.stdout_path: + stdout_path = ( + Path(task.stdout_path).resolve() + if task is not None and task.stdout_path + else _stdout_path_from_trace_id(normalized_trace_id) + ) + if stdout_path is None: return None - stdout_path = Path(task.stdout_path).resolve() - try: if os.path.commonpath([str(stdout_path), str(log_folder_path)]) != str(log_folder_path): return None @@ -248,6 +331,12 @@ def _resolve_stdout_path(trace_id: str) -> Path | None: def read_trace(log_path: Path, id: str = "") -> None: + if not log_path.exists(): + try: + _hydrate_trace_dir(log_path, id or str(log_path)) + except requests.RequestException as e: + app.logger.warning(f"Failed to hydrate trace '{id}': {e}") + fs = FileStorage(log_path) ws = WebStorage(port=1, path=log_path) task = _get_or_create_task(id) @@ -294,11 +383,23 @@ def update_trace(): trace_id = str(log_folder_path / trace_id) task = _get_or_create_task(trace_id) + if not task.messages and task.process is None: + read_trace(Path(trace_id), id=trace_id) # Make sure any pending user-interaction requests are visible to the frontend. 
_drain_user_requests_into_messages(task) + if task.process is not None and task.is_alive(): + try: + _sync_task_artifacts(task) + except requests.RequestException as e: + app.logger.warning(f"Failed to sync running task '{trace_id}' to Supabase: {e}") + if task.process is not None and not task.is_alive(): + try: + _sync_task_artifacts(task, force=True) + except requests.RequestException as e: + app.logger.warning(f"Failed to sync completed task '{trace_id}' to Supabase: {e}") if not task.messages or task.messages[-1].get("tag") != "END": task.messages.append( { @@ -337,7 +438,12 @@ def download_stdout_file(): if stdout_path is None: return jsonify({"error": "Trace ID is required or invalid"}), 400 if not stdout_path.exists() or not stdout_path.is_file(): - return jsonify({"error": "Stdout file not found"}), 404 + try: + _hydrate_stdout_file(trace_id, stdout_path) + except requests.RequestException as e: + app.logger.warning(f"Failed to hydrate stdout for '{trace_id}': {e}") + if not stdout_path.exists() or not stdout_path.is_file(): + return jsonify({"error": "Stdout file not found"}), 404 return send_file( stdout_path, @@ -381,6 +487,15 @@ def upload_file(): if not p.exists(): p.mkdir(parents=True, exist_ok=True) file.save(target_path) + if SUPABASE_STORAGE.is_enabled(): + try: + SUPABASE_STORAGE.upload_file( + target_path, + upload_object_path(scenario, trace_name, sanitized_filename), + ) + except requests.RequestException as e: + app.logger.warning(f"Failed to sync upload '{target_path}' to Supabase: {e}") + return jsonify({"error": f"Failed to persist upload to Supabase: {e}"}), 500 else: return jsonify({"error": "Invalid file path"}), 400 @@ -518,6 +633,10 @@ def control_process(): try: if task.is_alive(): task.stop() + try: + _sync_task_artifacts(task, force=True) + except requests.RequestException as e: + app.logger.warning(f"Failed to sync stopped task '{id}' to Supabase: {e}") if not task.messages or task.messages[-1].get("tag") != "END": 
task.messages.append( diff --git a/rdagent/log/server/persistence.py b/rdagent/log/server/persistence.py new file mode 100644 index 000000000..1c68b3276 --- /dev/null +++ b/rdagent/log/server/persistence.py @@ -0,0 +1,146 @@ +from __future__ import annotations + +import json +import mimetypes +from pathlib import Path, PurePosixPath +from urllib.parse import quote + +import requests +from pydantic_settings import SettingsConfigDict + +from rdagent.core.conf import ExtendedBaseSettings + + +class UISupabaseSettings(ExtendedBaseSettings): + model_config = SettingsConfigDict(env_prefix="UI_SUPABASE_", protected_namespaces=()) + + enabled: bool = False + url: str = "" + service_role_key: str = "" + bucket: str = "rdagent" + trace_prefix: str = "traces" + stdout_prefix: str = "stdout" + upload_prefix: str = "uploads" + request_timeout: int = 30 + + def is_enabled(self) -> bool: + return self.enabled and bool(self.url and self.service_role_key and self.bucket) + + +UI_SUPABASE_SETTINGS = UISupabaseSettings() + + +def _join_object_path(*parts: str) -> str: + normalized_parts = [part.strip("/") for part in parts if part and part.strip("/")] + return str(PurePosixPath(*normalized_parts)) + + +def trace_object_prefix(trace_id: str, settings: UISupabaseSettings = UI_SUPABASE_SETTINGS) -> str: + return _join_object_path(settings.trace_prefix, trace_id) + + +def stdout_object_path(trace_id: str, settings: UISupabaseSettings = UI_SUPABASE_SETTINGS) -> str: + return _join_object_path(settings.stdout_prefix, f"{trace_id}.log") + + +def upload_object_path( + scenario: str, + trace_name: str, + filename: str, + settings: UISupabaseSettings = UI_SUPABASE_SETTINGS, +) -> str: + return _join_object_path(settings.upload_prefix, scenario, trace_name, filename) + + +class SupabaseStorageClient: + def __init__(self, settings: UISupabaseSettings = UI_SUPABASE_SETTINGS) -> None: + self.settings = settings + + def is_enabled(self) -> bool: + return self.settings.is_enabled() + + def 
upload_bytes( + self, + payload: bytes, + object_path: str, + *, + content_type: str = "application/octet-stream", + ) -> None: + response = requests.post( + self._object_url(object_path), + headers=self._headers(content_type=content_type), + data=payload, + timeout=self.settings.request_timeout, + ) + response.raise_for_status() + + def upload_file(self, local_path: str | Path, object_path: str) -> None: + path = Path(local_path) + content_type, _ = mimetypes.guess_type(str(path)) + self.upload_bytes( + path.read_bytes(), + object_path, + content_type=content_type or "application/octet-stream", + ) + + def upload_directory(self, local_dir: str | Path, object_prefix: str) -> list[str]: + root = Path(local_dir) + uploaded_files: list[str] = [] + + for file_path in sorted(path for path in root.rglob("*") if path.is_file()): + rel_path = file_path.relative_to(root).as_posix() + self.upload_file(file_path, _join_object_path(object_prefix, rel_path)) + uploaded_files.append(rel_path) + + self.upload_bytes( + json.dumps(uploaded_files, indent=2).encode("utf-8"), + _join_object_path(object_prefix, "__manifest__.json"), + content_type="application/json", + ) + return uploaded_files + + def download_file(self, object_path: str, local_path: str | Path) -> Path: + response = requests.get( + self._object_url(object_path), + headers=self._headers(), + timeout=self.settings.request_timeout, + ) + response.raise_for_status() + + target_path = Path(local_path) + target_path.parent.mkdir(parents=True, exist_ok=True) + target_path.write_bytes(response.content) + return target_path + + def download_directory(self, object_prefix: str, local_dir: str | Path) -> list[Path]: + local_root = Path(local_dir) + manifest_path = local_root / "__manifest__.json" + self.download_file(_join_object_path(object_prefix, "__manifest__.json"), manifest_path) + + uploaded_files = json.loads(manifest_path.read_text(encoding="utf-8")) + if not isinstance(uploaded_files, list): + error_message = f"Invalid 
manifest for {object_prefix}: expected list, got {type(uploaded_files)!r}" + raise ValueError(error_message) + + downloaded_paths: list[Path] = [] + for rel_path in uploaded_files: + if not isinstance(rel_path, str): + error_message = f"Invalid manifest entry for {object_prefix}: expected str, got {type(rel_path)!r}" + raise ValueError(error_message) + downloaded_paths.append(self.download_file(_join_object_path(object_prefix, rel_path), local_root / rel_path)) + + return downloaded_paths + + def _headers(self, *, content_type: str | None = None) -> dict[str, str]: + headers = { + "apikey": self.settings.service_role_key, + "Authorization": f"Bearer {self.settings.service_role_key}", + "x-upsert": "true", + } + if content_type is not None: + headers["Content-Type"] = content_type + return headers + + def _object_url(self, object_path: str) -> str: + object_key = quote(object_path, safe="/") + return f"{self.settings.url.rstrip('/')}/storage/v1/object/{self.settings.bucket}/{object_key}" diff --git a/requirements.txt b/requirements.txt index 3cc19cc71..e83312c40 100644 --- a/requirements.txt +++ b/requirements.txt @@ -22,6 +22,7 @@ matplotlib langchain langchain-community tiktoken +requests pymupdf # Extract shotsreens from pdf # PDF related @@ -83,4 +84,4 @@ prefect datasets # DuckDuckGo search -duckduckgo-search \ No newline at end of file +duckduckgo-search diff --git a/test/utils/test_supabase_persistence.py b/test/utils/test_supabase_persistence.py new file mode 100644 index 000000000..be03c531f --- /dev/null +++ b/test/utils/test_supabase_persistence.py @@ -0,0 +1,101 @@ +import json +import tempfile +import unittest +from pathlib import Path + +import pytest + +from rdagent.log.server.persistence import ( + SupabaseStorageClient, + UISupabaseSettings, + stdout_object_path, + trace_object_prefix, + upload_object_path, +) + + +class DummySupabaseClient(SupabaseStorageClient): + def __init__(self) -> None: + super().__init__( + UISupabaseSettings( + 
enabled=True, + url="https://example.supabase.co", + service_role_key="service-role-key", + bucket="rdagent-artifacts", + ) + ) + self.remote_payloads: dict[str, bytes] = {} + + def upload_bytes( + self, + payload: bytes, + object_path: str, + *, + content_type: str = "application/octet-stream", + ) -> None: + self.remote_payloads[object_path] = payload + + def download_file(self, object_path: str, local_path: str | Path) -> Path: + target_path = Path(local_path) + target_path.parent.mkdir(parents=True, exist_ok=True) + target_path.write_bytes(self.remote_payloads[object_path]) + return target_path + + +@pytest.mark.offline +class SupabasePersistenceTest(unittest.TestCase): + def test_object_path_helpers(self) -> None: + settings = UISupabaseSettings( + enabled=True, + url="https://example.supabase.co", + service_role_key="service-role-key", + bucket="rdagent-artifacts", + trace_prefix="trace-root", + stdout_prefix="stdout-root", + upload_prefix="upload-root", + ) + + self.assertEqual(trace_object_prefix("Data Science/demo", settings), "trace-root/Data Science/demo") + self.assertEqual(stdout_object_path("Data Science/demo", settings), "stdout-root/Data Science/demo.log") + self.assertEqual( + upload_object_path("Data Science", "demo", "report.pdf", settings), + "upload-root/Data Science/demo/report.pdf", + ) + + def test_upload_directory_writes_manifest(self) -> None: + client = DummySupabaseClient() + + with tempfile.TemporaryDirectory() as tmp_dir: + root = Path(tmp_dir) + (root / "nested").mkdir() + (root / "summary.txt").write_text("summary", encoding="utf-8") + (root / "nested" / "details.json").write_text('{"ok": true}', encoding="utf-8") + + uploaded_files = client.upload_directory(root, "traces/demo") + + self.assertEqual(uploaded_files, ["nested/details.json", "summary.txt"]) + manifest = json.loads(client.remote_payloads["traces/demo/__manifest__.json"].decode("utf-8")) + self.assertEqual(manifest, ["nested/details.json", "summary.txt"]) + 
self.assertEqual(client.remote_payloads["traces/demo/summary.txt"], b"summary") + + def test_download_directory_uses_manifest(self) -> None: + client = DummySupabaseClient() + client.remote_payloads["traces/demo/__manifest__.json"] = json.dumps( + ["nested/details.json", "summary.txt"] + ).encode("utf-8") + client.remote_payloads["traces/demo/nested/details.json"] = b'{"ok": true}' + client.remote_payloads["traces/demo/summary.txt"] = b"summary" + + with tempfile.TemporaryDirectory() as tmp_dir: + root = Path(tmp_dir) + downloaded_paths = client.download_directory("traces/demo", root) + + self.assertEqual( + sorted(path.relative_to(root).as_posix() for path in downloaded_paths), + ["nested/details.json", "summary.txt"], + ) + self.assertEqual((root / "summary.txt").read_text(encoding="utf-8"), "summary") + + +if __name__ == "__main__": + unittest.main()
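The object-path expectations encoded in these tests come from `_join_object_path` in `rdagent/log/server/persistence.py`. Its behavior can be reproduced standalone — a sketch mirroring the implementation in this patch:

```python
from pathlib import PurePosixPath


def join_object_path(*parts: str) -> str:
    # Mirror of _join_object_path: drop empty parts, strip leading and
    # trailing slashes, and join with "/" regardless of the host OS.
    normalized = [part.strip("/") for part in parts if part and part.strip("/")]
    return str(PurePosixPath(*normalized))


print(join_object_path("trace-root/", "/Data Science/demo"))  # → trace-root/Data Science/demo
```

This is why `trace_object_prefix("Data Science/demo", settings)` yields `trace-root/Data Science/demo` in `test_object_path_helpers` above: stray slashes in prefixes or trace IDs never produce empty path segments.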