microsoft · Jensen246 · Dec 21, 2025
diff --git a/.ai/README.md b/.ai/README.md
@@ -0,0 +1,3 @@
+# Introduction
+
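+We believe users rely on AI tools to complete tasks. Our goal is to centralize task context in a single folder so that users can quickly build their own AI tools that follow the context and keep the task moving forward.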
diff --git a/.ai/task/nav.md b/.ai/task/nav.md
@@ -0,0 +1,86 @@
+
+```task
+
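+We want to build a benchmark and an agent system for RL post-training (post-training with reinforcement learning).
+
+You can draw heavily on `rdagent/scenarios/finetune`.
+
+We start with the benchmark.
+
+The final benchmark code will be implemented in rdagent/scenarios/rl/eval
+
+To-do list:
+- [ ] Build an example workspace (a workspace is a solution generated by the agent system) and run it in a Docker environment
+  - Build a dedicated environment for evaluation, and write tests that evaluate the example workspace
+  - Example test case: test/rl/test_example_workspace.py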
+
+## Build the benchmark & example workspace
+rdagent/scenarios/rl/eval/AutoRL-Bench/example_workspace
+
+Based on the Dockerfile in `rdagent/scenarios/rl/eval/AutoRL-Bench/env/`,
+write an RL-specific DockerEnv in rdagent/utils/env.py
+
+
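+We are starting a new task.
+
+## Build an agent to generate the solution (workspace)
+
+### Step 1:
+
+Refer to the finetune agent; the main workflow is in `rdagent/scenarios/finetune/loop.py`
+
+Please implement a scaffold for RL post-training following the structure above.
+
+
+### Step 2:
+
+Implement concrete examples within the scaffold.
+
+
+Let's start with Step 1:
+
+Please add an entry point for the RL scenario, similar to rdagent/app/finetune/llm/loop.py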
+
+
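+Code structure:
+- `rdagent/scenarios/rl/`: scenario-specific functionality
+- `/rdagent/app/rl`: CLI entry point & configuration
+
+## Components
+
+CoSTEER: code generation is hard, so we need multiple steps to generate code. It is responsible for executing the plan (the plan comes from the outer loop).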
+- for step in all_steps:
+  - run step:
+    - When the step is coding, we have an inner loop to generate the code.
+
+
+- TODO:
+  - Simplify the scaffold logic
+```
+
+
+[[test/rl/test_example_workspace.py:6]]
+
+
+## Coding Principles
+Don't catch unknown exceptions when implementing new code. I prefer to let the error propagate so it can be detected and fixed promptly.
+
+## (R)un a specific feature
+
+
+```
+```
+
+### For debugging
+
+## (A)I edits
+ <instructions sent to the AI>
+
+## (E)xplanation
+ <key parts for understanding the code>
+
+## (Q)uestions
+ <questions to ask colleagues>
+
+ ## (B)acklogs
+ <design improvements>
diff --git a/.ai/task/rl-naive-bench.md b/.ai/task/rl-naive-bench.md
@@ -0,0 +1,42 @@
+
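+# Task description
+
+We are developing a minimal RL post-training benchmark. Development follows these principles:
+- Keeping the code simple is the top priority
+- Performance is out of scope
+
+## Technical decisions:
+
+- We don't want to reinvent repo-level code generation, so we plan to use an existing coder to generate repo-level code.
+  - Candidates: aider, openhands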
+
+## TODO:
+
+- [/] (xiao) The repo-level coder may not provide interfaces that fit the current CoSTEER interface.
+  - related code:
+    - `rdagent/components/coder/CoSTEER/evolving_strategy.py`
+  - This is not required.
+  - Key question:
+    - Do we have requirements to launch multiple runs?
+    - Extremely long code (2~3K lines)
+
+- UI:
+  - Ideal UI: if we use the same framework, we expect a unified UI for all scenarios.
+    - BUT: the current UI may not be general enough for all scenarios.
+
+- Define benchmark interface:
+  - Users (e.g. the agent) interact only with the benchmark's public interface.
+  - Interaction scenarios:
+    - Code in R&D-Agent interacting with the benchmark
+    - ...
+
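+# Coding principles
+Don't catch unknown exceptions when implementing new code. I prefer to let errors propagate so they can be detected and fixed promptly.
+
+
+
+# Potential refactoring backlog
+## Framework
+- Simplify the process of building a new CoSTEER coder (xiao is thinking about this)
+  - Related code: `rdagent/components/coder/rl/costeer.py`
+- In `rdagent/core/experiment.py`: can we create a new Workspace in the Generic class?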
diff --git a/.env.example b/.env.example
@@ -55,5 +55,7 @@ EMBEDDING_MODEL="litellm_proxy/BAAI/bge-large-en-v1.5"
 # Cache Setting (Optional):
 # USE_CHAT_CACHE=True
 # USE_EMBEDDING_CACHE=True
+# FT_DOCKER_ENABLE_CACHE=True
+# DS_DOCKER_ENABLE_CACHE=True
 # Senario Configs:
 # ==========================================
diff --git a/.gitignore b/.gitignore
@@ -182,4 +182,13 @@ static/
 # AI assistant
 .cursor/
 .claude/
-AGENTS.md
+AGENTS.md
+
+scripts/
+
+# AutoRL-Bench (legacy)
+rdagent/scenarios/rl/eval/autorl_bench/runs/
+rdagent/scenarios/rl/eval/autorl_bench/example_workspace/
+
+# Temporary files
+tmp/
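
Reviewer note: the CoSTEER description in `.ai/task/nav.md` (an outer loop over planned steps, with an inner loop only for coding steps) and the "don't catch unknown exceptions" principle could be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from R&D-Agent: `Step`, `run_plan`, `generate_code`, and `max_inner_iters` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical plan step; not a class from R&D-Agent."""
    name: str
    is_coding: bool = False


def run_plan(
    steps: List[Step],
    generate_code: Callable[[Step, int], bool],
    max_inner_iters: int = 3,
) -> List[str]:
    """Run every step of a plan; coding steps get an inner retry loop.

    generate_code(step, attempt) is an assumed callback that returns True
    once the generated code is accepted. There is deliberately no
    try/except here: unknown errors propagate to the caller.
    """
    log: List[str] = []
    for step in steps:  # outer loop: execute the plan step by step
        if not step.is_coding:
            log.append(f"{step.name}: done")
            continue
        for attempt in range(1, max_inner_iters + 1):  # inner loop: coding only
            if generate_code(step, attempt):
                log.append(f"{step.name}: accepted on attempt {attempt}")
                break
        else:
            # The inner loop ran out of attempts without a break.
            log.append(f"{step.name}: not accepted after {max_inner_iters} attempts")
    return log
```

Because nothing is caught, an exception raised inside `generate_code` surfaces immediately with its original traceback, which is the behavior the coding principle asks for.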