504 commits
936a181
feat: add panorama dataset, refactor dataset interface
Jensen246 Dec 21, 2025
a81ffb4
feat: calculate token using tiktoken, and ndarray bug
Jensen246 Dec 21, 2025
08f8e86
Merge upstream/chemcot: add chemcot dataset with DatasetConfig interface
Jensen246 Dec 21, 2025
7763cc6
fix: download subtasks of chemcotdataset seperately
Jensen246 Dec 21, 2025
0a41502
feat: customized prepare func for datasets
Jensen246 Dec 21, 2025
781d6d0
feat: update new benchmarks
Jensen246 Dec 21, 2025
5b22c9c
add datasets package
Jensen246 Dec 21, 2025
a963bfe
docs: readme for llm finetune
Jensen246 Dec 22, 2025
4a6a4fe
feat: download raw data directly, with post-process function
Jensen246 Dec 22, 2025
b26e72c
feat: analyze raw dataset
Jensen246 Dec 22, 2025
3d71857
suppress litellm debug info
Jensen246 Dec 22, 2025
0d1fd17
feat(ui): summary page
Jensen246 Dec 23, 2025
473cfe5
feat: run multi-jobs
Jensen246 Dec 23, 2025
fe3374c
feat: improve ui
Jensen246 Dec 23, 2025
60c3e75
feat: add path and checkout options to LLM finetune loop entrypoint
you-n-g Dec 23, 2025
37c7804
feat: add FinanceIQ_ppl benchmark with auto-download and dataset desc…
you-n-g Dec 23, 2025
37147c4
refactor: remove unused imports and dead code, fix session folder log…
you-n-g Dec 23, 2025
1000aa0
feat: enable tablebench and tableInstruct dataset
chelsea97 Dec 23, 2025
3d18e0a
refine dataset readme, and coder prompt
Jensen246 Dec 23, 2025
93ecc78
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 23, 2025
5b88eac
refine proposal and coder prompt
Jensen246 Dec 23, 2025
d830351
fix: ui path (default log path)
Jensen246 Dec 23, 2025
a225fd5
feat: add automatic LoRA model merging for benchmarking with vLLM
you-n-g Dec 23, 2025
90c621d
refactor: reorganize finetune benchmark and merge modules under bench…
you-n-g Dec 23, 2025
7cc2a8a
refactor: modularize benchmark config and error extraction for finetu…
you-n-g Dec 23, 2025
d232af0
fix: update benchmark import paths and disable env cache for device info
you-n-g Dec 24, 2025
bc0742b
refactor docke&conda env and fix import bugs
Jensen246 Dec 24, 2025
46743d0
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 24, 2025
18f85be
modify init python file
chelsea97 Dec 24, 2025
97e2f4c
feat: add FinanceIQ dataset split utility and integrate with pipeline
you-n-g Dec 24, 2025
e73ffc6
feat: set weak and strong model by env, distribute workload across mo…
Jensen246 Dec 24, 2025
18e7207
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 24, 2025
325d0d2
feat: sample dataset and rm params for tensorboard, wandb
Jensen246 Dec 24, 2025
66a5677
update script to run jobs
Jensen246 Dec 24, 2025
b6b967f
refine proposal prompt, remove specific dataset name
Jensen246 Dec 24, 2025
2c9fbc2
fix(ui): auto switch log folder
Jensen246 Dec 24, 2025
9143e6f
fix: estimate the processed full data after sample
Jensen246 Dec 25, 2025
bdf9f5b
feat: filter raw data more aggressively, and lower data_eval standard
Jensen246 Dec 25, 2025
62f0c58
feat: sync workspace to blob
Jensen246 Dec 25, 2025
5d07fea
feat: rdkit for chemcotbench
Jensen246 Dec 25, 2025
7c0610e
update qwen2.5&llama3.1 context
Jensen246 Dec 25, 2025
0bbc492
fix: force failure on validation error and remove try/except in valid…
you-n-g Dec 26, 2025
01a65b5
feat: unified error sample extraction (with test scripts)
Jensen246 Dec 28, 2025
3e60b88
feat: set conda cache with .env
Jensen246 Dec 28, 2025
ee51fa8
feat: skip data eval if data pass in last evo
Jensen246 Dec 28, 2025
afb867a
fix: rm redundant param
Jensen246 Dec 28, 2025
55ca4b0
fix ui bug
Jensen246 Dec 28, 2025
eb31cb4
refactor: centralize assign_code_list_to_evo in MultiProcessEvolvingS…
you-n-g Dec 26, 2025
80f1efa
feat: add test_params.yaml generation and workspace cleanup improveme…
you-n-g Dec 29, 2025
76fc4a8
refactor: replace get_clear_ws_cmd with clear_workspace and update pr…
you-n-g Dec 29, 2025
620d329
add bioprobench dataset
XianBW Dec 29, 2025
95e798b
fix: handle commas in training config extraction and refactor prompt …
you-n-g Dec 29, 2025
f83d6f3
bioprobench description
Jensen246 Dec 29, 2025
1b7400b
add bioprobench readme
XianBW Dec 29, 2025
b5c3fa2
feat: merge lora adapter for blackwell gpu
Jensen246 Dec 30, 2025
8c9d810
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 30, 2025
6aa465b
feat: support for multi benchmarks in one job
Jensen246 Dec 30, 2025
3647dae
change dfficult aware content for training
chelsea97 Dec 30, 2025
8ed9a5d
update difficulty-aware and logging principles
chelsea97 Dec 30, 2025
5ef0847
fix: resolve variable name conflict in FTRunnerEvaluator
Jensen246 Dec 31, 2025
04518e9
set job id accuracy to minute
Jensen246 Dec 31, 2025
7a4bb2c
feat(ui): display one selected metric per benchmark
Jensen246 Dec 31, 2025
10b5b02
feat: store sota exp, and fix ws_ckp bug
Jensen246 Dec 31, 2025
57757dd
fix: truncate data.json in feedback
Jensen246 Dec 31, 2025
72bed66
fix: opencompass data for conda env
Jensen246 Jan 1, 2026
623b33e
fix: save only the last model
Jensen246 Jan 1, 2026
79dcdee
feat: set log path and ws path
Jensen246 Jan 1, 2026
8b22b80
fix: set overwrite_cache to avoid lock contention(through injecting p…
Jensen246 Jan 4, 2026
e70b815
feat: redirect stdout to file in localenv
Jensen246 Jan 5, 2026
8b58335
add pickle cache to dataset desc
peteryang1 Jan 5, 2026
d126f35
fix CI
peteryang1 Jan 5, 2026
8ac0796
fix: remove redundant wrapper
Jensen246 Jan 5, 2026
c99d7dd
feat: set python_unbuffered
Jensen246 Jan 5, 2026
6958372
move redirect stdout to env run
peteryang1 Jan 6, 2026
6692036
fix a small bug
peteryang1 Jan 6, 2026
4fdc9d1
move model folder
peteryang1 Jan 6, 2026
85e26a8
feat(ui): display benchmark baseline
Jensen246 Jan 6, 2026
64f20c8
fix: enrich scenario and benchmark description
Jensen246 Jan 6, 2026
de23a99
fix: rewrite runner eval to accept easier
Jensen246 Jan 6, 2026
7067629
feat: compare with baseline when no SOTA
Jensen246 Jan 6, 2026
4fa82e4
update tablebench readme
chelsea97 Jan 7, 2026
fb6dfdb
fix: switch back to single benchmark (for baseline)
Jensen246 Jan 7, 2026
8c4fcf6
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Jan 7, 2026
276bbff
feat(ui): add ws path in ui
Jensen246 Jan 7, 2026
e548eec
refactor: update SOTA tracking to use DAG traversal and parent selection
you-n-g Jan 7, 2026
87a991d
fix: prioritize local_selection in trace and refactor sibling retriev…
you-n-g Jan 7, 2026
dc569b1
refactor: unify error handling in feedback generation and update work…
you-n-g Jan 7, 2026
2816b56
feat: add skip_loop_error_stepname to control error skip step in Loop…
you-n-g Jan 7, 2026
228eb45
fix: set local_selection to NEW_ROOT for experiments without parent
you-n-g Jan 8, 2026
162813a
feat: set different ports for jobs
Jensen246 Jan 8, 2026
7a5bdb8
feat: set different ports for jobs
Jensen246 Jan 8, 2026
dbc2ba2
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
XianBW Jan 8, 2026
5c5eaf2
feat: add upper data size limit for LLM fine-tuning and update relate…
peteryang1 Jan 8, 2026
3270aa3
fix: replace get_truncated_stdout() with stdout for consistent output…
peteryang1 Jan 8, 2026
29b2f7b
refactor: remove data.json from cache and workspace logic, focus on s…
you-n-g Jan 8, 2026
462fc21
fix: rm target_scenario
Jensen246 Jan 8, 2026
753a6a0
feat: add selective cache extraction and custom cache key for data pr…
you-n-g Jan 8, 2026
9f66f09
fix(ui): bug when displaying tablebench
Jensen246 Jan 9, 2026
f3c6494
fix: filter config in dataset_info.json
Jensen246 Jan 11, 2026
7ddfd9d
feat: add test set, set valid set
Jensen246 Jan 12, 2026
3c6ee92
feat(ui): update test score, and set color for final decision
Jensen246 Jan 12, 2026
bc23e83
feat: add test score for baseline and update ui
Jensen246 Jan 12, 2026
e7d3153
fix: use [-100:] as test range
Jensen246 Jan 12, 2026
e4aea14
feat: update data_stats in runner
Jensen246 Jan 12, 2026
fa2cb19
feat: wait for opencompass init when run multi jobs
Jensen246 Jan 12, 2026
1a0da06
fix: adjust test&valid split
Jensen246 Jan 12, 2026
0980ea0
feat: force to generate COT(with <think> token), and add answer forma…
Jensen246 Jan 12, 2026
8cd3fa5
feat: improve ui
Jensen246 Jan 12, 2026
d5a2a1e
fix: unify benchmark volume mounts and set extra_volumes for conda env
you-n-g Jan 12, 2026
66c3538
fix(ui): number color
Jensen246 Jan 13, 2026
c668a2c
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Jan 13, 2026
04579e7
Merge branch 'main' into finetune
peteryang1 Jan 13, 2026
e510b7d
fix: update GPU memory handling to use total memory in GB and streaml…
peteryang1 Jan 13, 2026
a114f3b
fix: set use_cot_postprocessor
Jensen246 Jan 13, 2026
9f7cf11
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Jan 13, 2026
bd4d4ec
feat: add env_dict to config classes and merge env vars in Env run
you-n-g Jan 14, 2026
973300e
fix: let coder obey proposal
Jensen246 Jan 14, 2026
18a9eff
fix(ui): direction bug and update chemcot core metirc
Jensen246 Jan 14, 2026
8a9f22b
fix: set consistent benchmark mount points and env vars for docker an…
you-n-g Jan 14, 2026
5d4e39e
fix: addintional target for LoRA
Jensen246 Jan 14, 2026
0c4c2a2
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Jan 14, 2026
ea8cf80
feat: workspace dir log for benchmark running
Jensen246 Jan 15, 2026
28b2e33
fix: tableInstruct path bug and update benchmark description
Jensen246 Jan 15, 2026
49ed287
feat: timeout for whole job
Jensen246 Jan 15, 2026
31d8558
fix: align FinanceIQ import to opencompass
Jensen246 Jan 15, 2026
5e7702e
feat: use llm_judge for FinanceIQ
Jensen246 Jan 15, 2026
4a7a002
feat: switch to turn on <think> or not
Jensen246 Jan 15, 2026
492bb74
feat: using scripts to redirect stdout, and run in different windows
Jensen246 Jan 15, 2026
dfdd000
feat: sync litellm log
Jensen246 Jan 15, 2026
f8d9203
fix: gpu memory format
Jensen246 Jan 15, 2026
cb8889f
fix: escape special characters in benchmark desc
Jensen246 Jan 15, 2026
a8aeadf
fix: set data processing timeout to 1h
Jensen246 Jan 16, 2026
0466818
feat: set valid_loss and save_best_model
Jensen246 Jan 17, 2026
f227a87
fix: inject timeout and stage
Jensen246 Jan 17, 2026
26d6561
fix: loss history extract logic
Jensen246 Jan 17, 2026
9301b2a
feat: inject output dir
Jensen246 Jan 18, 2026
0dd026b
feat: inject eval batch size
Jensen246 Jan 18, 2026
e64a9f9
feat: inject save_total_limit
Jensen246 Jan 18, 2026
16e9e24
feat: update data prompt
Jensen246 Jan 18, 2026
9bea9c4
fix: escape shell special characters
Jensen246 Jan 19, 2026
fa37212
fix: tablebench visualization UI
Jensen246 Jan 19, 2026
2d47a11
fix: move implementation validation to coder, and ignore injected params
Jensen246 Jan 19, 2026
a9a016c
docs: add README for RL-PostTraining evaluation system
you-n-g Jan 19, 2026
71c7264
Add AutoRL-Bench evaluation framework for RL post-training
couragec Jan 19, 2026
c2be2ed
Add architecture documentation
couragec Jan 19, 2026
9d94aab
docs: update architecture and interface documentation for AutoRL-Bench
you-n-g Jan 19, 2026
d128e05
improve doc
couragec Jan 20, 2026
9284871
fix
couragec Jan 20, 2026
7375559
refactor: YAML config-driven
couragec Jan 20, 2026
e41491a
feat: add RL Docker env, workspace test, and update project structure
you-n-g Jan 20, 2026
3fa10e6
feat: rename autorl_bench, add RLWorkspace, configure Docker extra_volumes
couragec Jan 20, 2026
c4a05a6
Add eval-only AutoRL-Bench pipeline
sakura657 Jan 21, 2026
21271af
sturcture clean
sakura657 Jan 21, 2026
55f66ef
docs: add autorl_bench README
sakura657 Jan 21, 2026
610ad14
feat(rl): Implement RL post-training agent scaffold and example
you-n-g Jan 21, 2026
19f80fc
refactor: simplify RL scenario classes and update RL CoSTEER integration
you-n-g Jan 21, 2026
6c98631
feat(rl): get the scaffold working; mock data runs through the 5-step loop
couragec Jan 21, 2026
44dd3f3
feat(rl): integrate LLM code generation, support passing model_path
couragec Jan 21, 2026
3d3e933
feat(rl): Docker execution framework, RLWorkspace.run() + RLPostTrainingRunner
couragec Jan 21, 2026
e034730
feat(rl): LLM generates hypotheses/feedback; full loop runs end to end
couragec Jan 21, 2026
507fa91
feat: add RL post-training entry point with configurable options
you-n-g Jan 22, 2026
48261d2
refactor: simplify RL proposal and trace classes, update config and docs
you-n-g Jan 22, 2026
c80ca9b
Update rl eval autorl_bench layout
shatianming5 Jan 22, 2026
a5342ae
Update RL workflow and evaluation setup
shatianming5 Jan 22, 2026
370f96f
Integrate AutoRL-Bench evaluation in RL workflow
shatianming5 Jan 22, 2026
8d97e8f
feat(rl): add --base-model/--benchmark CLI arguments, simplify RLTask
couragec Jan 22, 2026
6f20926
feat(rl): dynamic Docker environment selection + full example_agent training/evaluation flow (no LLM)
couragec Jan 23, 2026
2925b5f
fix(rl): fix feedback passing + add verl dependency
couragec Jan 23, 2026
ac3ad22
refactor: remove unused validate in BenchmarkAdapter and add core uti…
you-n-g Jan 23, 2026
71899f7
feat(rl): UI
couragec Jan 23, 2026
3204c81
Refactor autorl_bench layout and docker entrypoint
TianMing-Sha Jan 23, 2026
cb3271a
autorl_bench: add aider autoloop tool
TianMing-Sha Jan 25, 2026
b646898
feat(rl): environment docker
couragec Jan 26, 2026
4a4057f
refactor: simplify aider autoloop tooling
shatianming5 Jan 26, 2026
52d7d74
chore: update misc files
shatianming5 Jan 26, 2026
5783694
feat(rl): yaml-driven dataset download & auto-download on startup
couragec Jan 26, 2026
19042a7
feat(rl): yaml-driven dataset download & auto-download on startup
couragec Jan 26, 2026
9095a10
Refactor RL eval runner and clean up
shatianming5 Jan 26, 2026
62f5000
Simplify RL eval runner and env
TianMing-Sha Jan 26, 2026
c8f9c88
rl: include litellm in RL docker image
shatianming5 Jan 26, 2026
9a783a7
feat(rl): unified resource path & model repo_id structure
couragec Jan 27, 2026
468f4a5
feat(rl): refactor eval with OpenCompass & add training code template
couragec Jan 27, 2026
47c5f4f
feat(rl): refactor eval with OpenCompass & add training code template
couragec Jan 27, 2026
1e22e13
feat(rl): delete test bench
couragec Jan 27, 2026
82e7cca
docs: add benchmark interface notes and TODOs for unified evaluation
you-n-g Jan 27, 2026
41ab4c4
feat(rl): unified benchmark eval interface + shared configs
couragec Jan 27, 2026
b570585
feat(rl): make it elegant
couragec Jan 27, 2026
6115b6d
feat(rl): prompt prososal+coder improve
couragec Jan 28, 2026
1482298
feat(rl): fix eval
couragec Jan 28, 2026
a57f72f
fix(rl): docker
couragec Jan 28, 2026
4aea705
fix(rl): eval
couragec Jan 29, 2026
263e17b
v 1.0 tmep
couragec Jan 30, 2026
7e8eaf9
benchmark v1.0
couragec Jan 30, 2026
3efb261
benchmark v1.1
couragec Feb 1, 2026
ddda159
benchmark v1.1: grading logs + code deduplication
couragec Feb 1, 2026
f6d8af0
benchmark v1.1: grading logs + code deduplication
couragec Feb 1, 2026
8a60e6f
benchmark v1.1: grading logs + code deduplication + task description
couragec Feb 1, 2026
6b56ea2
benchmark v1.2: fix
couragec Feb 1, 2026
47e8bf0
benchmark v1.3: fix,example-agent ok,rdagent test,openhands develop
couragec Feb 1, 2026
4021310
benchmark v1.4: fix,example-agent ok,rdagent ok,openhands develop
couragec Feb 2, 2026
271f778
benchmark : add alfworld
couragec Feb 4, 2026
90ee622
benchmark : update readme
couragec Feb 9, 2026
292565e
benchmark : update readme
couragec Feb 9, 2026
bf8a70d
benchmark :
couragec Feb 9, 2026
96a63b8
chore: add eval bypass block and mark TODO in grading server
you-n-g Feb 9, 2026
7605eff
benchmark
couragec Feb 9, 2026
95b4c59
benchmark
couragec Feb 9, 2026
7a76be1
benchmark
couragec Feb 9, 2026
75bd517
benchmark
couragec Feb 10, 2026
b050e98
alfworld
couragec Feb 10, 2026
a2e25c1
alfworld
couragec Feb 10, 2026
761c07d
benchmark
couragec Feb 11, 2026
f455327
rdagent
couragec Feb 11, 2026
fffa1ab
rdagent
couragec Feb 11, 2026
edce91b
benchmark
couragec Feb 11, 2026
8f7a473
benchmark:ui
couragec Feb 12, 2026
77ab322
benchmark:delete docker + log
couragec Feb 12, 2026
305a898
1
couragec Feb 12, 2026
027032a
alfworld
couragec Feb 24, 2026
0b3ffc6
ui
couragec Feb 27, 2026
3450b8a
alfworld
couragec Feb 28, 2026
28d2f5a
readme
couragec Feb 28, 2026
240a7ec
alfworld
couragec Feb 28, 2026
7bff58d
parallex
couragec Feb 28, 2026
dd0faa3
alfworld
couragec Feb 28, 2026
d7919d4
run
couragec Mar 1, 2026
c3fa363
eval gpu
couragec Mar 2, 2026
6e50e82
alfworld
couragec Mar 2, 2026
1135bb3
alfworld
couragec Mar 2, 2026
d24ad8a
fix conda init in start.sh for non-interactive shells
couragec Mar 2, 2026
6a192e0
simplify start.sh: read TRAINING_PYTHON from .env
couragec Mar 2, 2026
866a4df
use OPENHANDS_PYTHON from .env to run agent
couragec Mar 2, 2026
51737db
feat: register OpenCode agent into autorl_bench framework
shatianming5 Mar 1, 2026
f542ca2
Update opencode agent, benchmarks, and eval configs
shatianming5 Mar 2, 2026
b87a153
Update OpenCode agent docs for external opencode-rl integration
shatianming5 Mar 2, 2026
c39940b
feat: add smith benchmark discovery and per-sample evaluator
shatianming5 Mar 2, 2026
088f4b7
enforce RL-only in instructions.md; remove embedded opencode-rl
couragec Mar 3, 2026
6bf943e
comment out OpenCode-only deps in requirements.txt
couragec Mar 3, 2026
7251e8c
refactor: extract _kill_process_group, narrow exception catches
couragec Mar 3, 2026
ca520db
move kill_process_group to core/utils for reuse
couragec Mar 3, 2026
e2ae657
add comments to run.py for workspace isolation and signal handling
couragec Mar 3, 2026
83ff188
remove OpenCode-only deps from requirements.txt entirely
couragec Mar 3, 2026
02c1068
allow SFT in instructions, RL as ultimate goal
couragec Mar 3, 2026
3ac4f8c
add workspace isolation rules to instructions.md
couragec Mar 3, 2026
278308a
update opencode start.sh: use OPENCODE_PYTHON, add PATH for opencode …
couragec Mar 3, 2026
0683730
opencode start.sh: pass --run-dir to use AutoRL-Bench workspace
couragec Mar 3, 2026
5007063
opencode start.sh: prepend training env bin to PATH
couragec Mar 3, 2026
56fbab3
opencode start.sh: restore --max-retries and --eval-timeout for openc…
couragec Mar 3, 2026
cdc83ff
Add webshop benchmark and update description
Jensen246 Mar 4, 2026
3785a3e
adjust flask dependency
Jensen246 Mar 4, 2026
3 changes: 3 additions & 0 deletions .ai/README.md
@@ -0,0 +1,3 @@
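# Introduction

We believe users rely on AI tools to get their work done. Our goal is to gather all of a task's context into a single folder, so that users can quickly build their own AI tooling that follows that context and keeps the task moving forward.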
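As a minimal sketch of what "all task context in one folder" enables, the helper below concatenates every Markdown file under a folder such as `.ai/` into a single prompt-ready string. `collect_task_context` is a hypothetical name for illustration only; it is not part of RD-Agent, and the HTML-comment separators are an arbitrary choice.

```python
from pathlib import Path


def collect_task_context(root: Path) -> str:
    """Concatenate every Markdown file under `root` into one string.

    Hypothetical helper: the `.ai/` folder layout comes from this PR,
    but this function is only an illustration.
    """
    parts = []
    # Sort so the same folder always yields the same, deterministic context.
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        # Tag each file with its relative path so the AI tool can cite it.
        parts.append(f"<!-- {rel} -->\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(parts)
```

For example, `collect_task_context(Path(".ai"))` would return `README.md` followed by `task/nav.md` and `task/rl-naive-bench.md`, each prefixed with its path.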
86 changes: 86 additions & 0 deletions .ai/task/nav.md
@@ -0,0 +1,86 @@

```task

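We are building a benchmark and agent system for RL post-training (reinforcement-learning post-training).

You can draw heavily on `rdagent/scenarios/finetune`.

We start with the benchmark.

The final benchmark code will live in rdagent/scenarios/rl/eval.

To-do list:
- [ ] Build an example workspace (a workspace is a solution generated by the agent system) and run it in a Docker environment
  - Build a dedicated environment for evaluation, and write tests that evaluate the example workspace
  - Example test case: test/rl/test_example_workspace.py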

## Build the benchmark & example workspace
rdagent/scenarios/rl/eval/AutoRL-Bench/example_workspace

Based on the Dockerfile in `rdagent/scenarios/rl/eval/AutoRL-Bench/env/`,
write an RL-specific DockerEnv in rdagent/utils/env.py.


We are starting a new task.

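## Build an Agent to generate the solution (workspace)

### Step 1:

Use the finetune agent as a reference; its main workflow is in `rdagent/scenarios/finetune/loop.py`.

Following that structure, implement a scaffold for RL post-training.


### Step 2:

Implement a concrete example within the scaffold.


Let's start with Step 1:

Please add an entry point for the RL scenario, similar to rdagent/app/finetune/llm/loop.py.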


Code structure:
- `rdagent/scenarios/rl/`: scenario-specific functionality
- `/rdagent/app/rl`: CLI entry point & configuration

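## Components

CoSTEER: Code generation is hard, so generating code takes multiple steps. CoSTEER executes the plan (the plan comes from the outer loop).
- for step in all_steps:
  - run step:
    - when the step is coding, an inner loop generates the code.


- TODO:
  - Simplify the scaffold logic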
```
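The CoSTEER control flow described in the Components note (an outer plan of steps, with an inner loop for coding steps) can be sketched in Python as follows. `Step`, `StepResult`, `run_plan`, `generate_code`, and `passes_eval` are hypothetical names chosen for illustration; this is not the actual CoSTEER API under `rdagent/components/coder/CoSTEER/`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One step of a plan handed down by the outer loop (hypothetical type)."""
    name: str
    is_coding: bool = False


@dataclass
class StepResult:
    name: str
    code: Optional[str] = None  # only coding steps produce code
    attempts: int = 0           # inner-loop rounds used (0 for non-coding steps)


def run_plan(
    plan: List[Step],
    generate_code: Callable[[Step, Optional[str]], str],
    passes_eval: Callable[[str], bool],
    max_rounds: int = 3,
) -> List[StepResult]:
    """Execute each planned step; coding steps get an inner generate-and-check loop."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be >= 1")
    results: List[StepResult] = []
    for step in plan:  # outer loop over the plan
        if not step.is_coding:
            # Non-coding steps are only recorded in this sketch.
            results.append(StepResult(step.name))
            continue
        code: Optional[str] = None
        attempts = 0
        while attempts < max_rounds:  # inner loop: refine the previous attempt
            attempts += 1
            code = generate_code(step, code)
            if passes_eval(code):
                break
        results.append(StepResult(step.name, code, attempts))
    return results
```

Here the inner loop stops at the first attempt that passes evaluation, or after `max_rounds` attempts. Each round receives the previous attempt, so the generator can refine it instead of starting from scratch.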


[[test/rl/test_example_workspace.py:6]]


## Coding Principles
Don't catch unknown exceptions when implementing new code. I prefer to let the error propagate so it can be detected and fixed promptly.
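A small illustration of this principle, assuming a hypothetical `load_config` helper (not RD-Agent code): only the one failure we understand is handled, and everything else propagates.

```python
import json
from pathlib import Path


def load_config(path: Path) -> dict:
    """Return the parsed config, or {} only when the file does not exist.

    FileNotFoundError is an expected, well-understood case, so it is handled.
    Anything else (e.g. json.JSONDecodeError from a malformed file) is left to
    propagate, so the bug surfaces immediately instead of being hidden by a
    blanket `except Exception: return {}`.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return {}
    return json.loads(text)
```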

## (R)un: run a specific feature


```
```

### For debugging

## (A)I Edit
<instructions to send to the AI>

## (E)xplanation
<key parts of the code to understand>

## (Q)uestions
<questions to ask colleagues>

## (B)acklogs
<design improvements>
42 changes: 42 additions & 0 deletions .ai/task/rl-naive-bench.md
@@ -0,0 +1,42 @@

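# Task Description

We are developing a minimal RL post-training benchmark, following these principles:
- Keeping the code simple is the top priority
- Performance is out of scope

## Technical Decisions:

- We don't want to reinvent repo-level code generation, so we plan to use an existing coder to generate repo-level code.
  - Candidates: aider, openhands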

## TODO:

- [/] (xiao) Repo-level coders may not provide interfaces that fit the current CoSTEER interface.
  - Related code:
    - `rdagent/components/coder/CoSTEER/evolving_strategy.py`
  - This is not required.
  - Key questions:
    - Do we need to launch multiple runs?
    - Extremely long code (2K to 3K lines)

- UI:
  - Ideal UI: if we use the same framework, we expect a unified UI for all scenarios.
  - However, the current UI may not be general enough for all scenarios.

- Define the benchmark interface:
  - Users (e.g., agents) interact only with the benchmark's public interface.
  - Interaction scenarios:
    - Code in R&D-Agent interacting with the benchmark
    - ...
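One possible shape for "users only interact with the benchmark's public interface" is a `typing.Protocol` that agent code depends on, with grading data kept private to the implementation. `Benchmark`, `EvalResult`, `ToyQABenchmark`, and `run_agent` are hypothetical names, and exact-match QA scoring is a stand-in; a real RL benchmark would evaluate a trained model instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class EvalResult:
    score: float      # fraction of samples answered correctly
    num_samples: int


class Benchmark(Protocol):
    """Hypothetical public surface: the only methods an agent may call."""

    def describe(self) -> str: ...

    def evaluate(self, predict: Callable[[str], str]) -> EvalResult: ...


class ToyQABenchmark:
    """Toy implementation; the gold answers stay private to the benchmark."""

    def __init__(self, samples: Dict[str, str]) -> None:
        self._samples = dict(samples)  # question -> gold answer, never exposed

    def describe(self) -> str:
        return f"Answer {len(self._samples)} short questions; scored by exact match."

    def evaluate(self, predict: Callable[[str], str]) -> EvalResult:
        correct = sum(predict(q).strip() == a for q, a in self._samples.items())
        n = len(self._samples)
        return EvalResult(score=correct / n if n else 0.0, num_samples=n)


def run_agent(bench: Benchmark, predict: Callable[[str], str]) -> float:
    # Agent-side code depends only on the Protocol, not on ToyQABenchmark internals.
    print(bench.describe())
    return bench.evaluate(predict).score
```

Because `Benchmark` is structural, any benchmark class with matching `describe`/`evaluate` methods can be swapped in without changing agent code.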

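# Coding Principles
When implementing new code, don't catch unknown exceptions. I prefer to let errors propagate so they can be detected and fixed promptly.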



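# Potential Refactoring TODOs
## Framework
- Simplify the process of building a new CoSTEER coder (xiao is thinking about this)
  - Related code: `rdagent/components/coder/rl/costeer.py`
- In `rdagent/core/experiment.py`: can we create a new Workspace within the Generic class?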
2 changes: 2 additions & 0 deletions .env.example
@@ -55,5 +55,7 @@ EMBEDDING_MODEL="litellm_proxy/BAAI/bge-large-en-v1.5"
# Cache Setting (Optional):
# USE_CHAT_CACHE=True
# USE_EMBEDDING_CACHE=True
# FT_DOCKER_ENABLE_CACHE=True
# DS_DOCKER_ENABLE_CACHE=True
# Senario Configs:
# ==========================================
11 changes: 10 additions & 1 deletion .gitignore
@@ -182,4 +182,13 @@ static/
# AI assistant
.cursor/
.claude/
AGENTS.md
AGENTS.md

scripts/

# AutoRL-Bench (legacy)
rdagent/scenarios/rl/eval/autorl_bench/runs/
rdagent/scenarios/rl/eval/autorl_bench/example_workspace/

# Temporary files
tmp/