feat: python tools requirement #1040
Changes from 11 commits
dbef526
119699e
9055101
036cfee
e3d00f3
577eb00
41f09cc
85c5c67
8f391da
8e24e97
b7261ea
94051c9
acbef75
7b3f561
7437c97
dece270
b094918
|
markstur marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| # pytest: ollama, e2e, qualitative | ||
| """Repair plotting code with Python-tool and plotting-specific requirements.""" | ||
|
|
||
| import tempfile | ||
| import traceback | ||
| from pathlib import Path | ||
|
|
||
| import mellea | ||
| from mellea.backends import ModelOption | ||
| from mellea.backends.tools import MelleaTool | ||
| from mellea.stdlib.requirements import ( | ||
| python_plotting_requirements, | ||
| python_tool_requirements, | ||
| ) | ||
| from mellea.stdlib.sampling import SOFAISamplingStrategy | ||
| from mellea.stdlib.tools import local_code_interpreter | ||
| from mellea.stdlib.tools.interpreter import ExecutionResult | ||
|
|
||
|
|
||
| def python(code: str) -> ExecutionResult: | ||
| """Execute Python code. | ||
|
|
||
| Args: | ||
| code: Python code to execute | ||
|
|
||
| Returns: | ||
| Execution result containing stdout, stderr, and success status | ||
| """ | ||
| return local_code_interpreter(code) | ||
|
|
||
|
|
||
| def main(): | ||
| """Run the plotting repair example.""" | ||
| with tempfile.TemporaryDirectory() as tmpdir: | ||
| output_path = str(Path(tmpdir) / "plot.png") | ||
|
|
||
| m = mellea.start_session(context_type="chat") | ||
|
|
||
| requirements = [ | ||
| *python_tool_requirements(allowed_imports=["numpy", "matplotlib", "math"]), | ||
| *python_plotting_requirements(output_path=output_path), | ||
| ] | ||
|
|
||
| sampling_strategy = SOFAISamplingStrategy( | ||
| s1_solver_backend=m.backend, | ||
| s2_solver_backend=m.backend, | ||
| s2_solver_mode="fresh_start", | ||
| loop_budget=3, | ||
| feedback_strategy="first_error", | ||
| ) | ||
|
|
||
| task_summary = ( | ||
| f"Create a plot of sin(x) for x in 0..2π and save it to {output_path}" | ||
| ) | ||
|
|
||
| print("=" * 70) | ||
| print("Testing plotting-code repair with Python tool requirements") | ||
| print("=" * 70) | ||
| print(f"Task: {task_summary}\n") | ||
|
|
||
| try: | ||
| result = m.instruct( | ||
| task_summary, | ||
| requirements=requirements, | ||
| strategy=sampling_strategy, | ||
| return_sampling_results=True, | ||
| tool_calls=True, | ||
| model_options={ModelOption.TOOLS: [MelleaTool.from_callable(python)]}, | ||
|
Contributor
I previously commented asking when a Python tool would exist; I now understand how you are using it. Can you please clarify whether this is expected? That is, when models generate Python code, is it common practice to do so through a tool like this? I would have assumed they generate it in their normal generation mode, without structured output.
Contributor
Additionally, to support the current workflow, where we only look for Python in the `python` tool calls, we should have a much more built-out Python tool for this purpose. Otherwise, we should default to looking at the model output and asking it to generate Python.
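The alternative raised here, reading Python straight from the model's normal output, would need an extraction step of its own. A minimal sketch, assuming the model wraps its code in Markdown fences; `extract_python_blocks` is a hypothetical helper for illustration and not part of Mellea:

```python
import re

# Build the fence marker programmatically so this example contains no literal fences.
FENCE = "`" * 3

# Matches an opening fence (optionally tagged python/py), the body, and a closing fence.
_FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_blocks(model_output: str) -> list[str]:
    """Return the body of each fenced code block found in free-form model output."""
    return [match.group(1).strip() for match in _FENCE_RE.finditer(model_output)]


# Hypothetical model reply used only to exercise the helper.
reply = (
    "Here is the plot code:\n"
    + FENCE + "python\n"
    + "import math\n"
    + "print(math.sin(0.0))\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)
blocks = extract_python_blocks(reply)
```

This is deliberately naive: fences tagged with other languages, or code with no fences at all, are not handled, which is part of the trade-off being discussed.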
Member
Author
Here is the response from Claude Code. It seems correct and explains this well.
Yes, this is expected behavior and follows standard LLM patterns. Here's what's happening:
Why use a tool?
Is this common practice?
Without a tool, you'd have to parse code from free-form text and trust it immediately, with no structured validation step. This example shows the full lifecycle. If you just wanted direct code generation without the tool wrapper, you'd skip tool_calls=True and _call_tools(), but you'd lose the safety validation layer.
Contributor
The crux of my comment is still unaddressed: we don't have a standard Mellea Python tool. If we want to go this route, we ought to have one.
Sampling strategies / Mellea don't call tools by default, so this is true of either approach.
Same as above.
But the output is functionally free-form. Your code tool only requests a string, which could be any text. We are still parsing it. We aren't accepting a JSON schema that defines the total grammar of the Python language. |
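To make the last point concrete: a tool whose only parameter is `code: str` exposes a schema with a single string field, so whatever arrives still has to be parsed before it can be trusted. A minimal sketch using only the standard library; `string_params` and `check_code_argument` are hypothetical helpers, and the stand-in `python` function below only mirrors the signature from the example, not its behavior:

```python
import ast
import inspect


def python(code: str) -> str:
    """Stand-in tool with the same single-string signature as the example."""
    return code


def string_params(func) -> dict[str, str]:
    """Map each parameter to a JSON-schema-style type name (only str is recognized)."""
    signature = inspect.signature(func)
    return {
        name: "string" if param.annotation is str else "unknown"
        for name, param in signature.parameters.items()
    }


def check_code_argument(code: str, allowed_imports: set[str]) -> list[str]:
    """Parse the tool's string argument and list any problems found."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        # The schema accepted the string, but it is not Python at all.
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare the top-level package name against the allowlist.
            if module.split(".")[0] not in allowed_imports:
                problems.append(f"import not allowed: {module}")
    return problems
```

Under these assumptions the structured tool call guarantees only that an argument named `code` is a string; syntax and import checks remain a separate parsing step either way.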
||
| ) | ||
|
|
||
| print(f"\nResult: {'SUCCESS' if result.success else 'FAILED'}\n") | ||
|
|
||
| if result.success: | ||
| print("✓ Model successfully generated and executed plotting code") | ||
| print("\nFinal generated code:") | ||
| print("-" * 70) | ||
| print(result.result.value) | ||
|
markstur marked this conversation as resolved.
Outdated
|
||
| print("-" * 70) | ||
|
|
||
| if Path(output_path).exists(): | ||
| file_size = Path(output_path).stat().st_size | ||
| print(f"\n✓ Output file created: {output_path}") | ||
| print(f" File size: {file_size} bytes") | ||
| else: | ||
| print(f"\n✗ Output file not found: {output_path}") | ||
|
|
||
| print(f"\nRepair iterations: {len(result.sample_validations)}") | ||
| for attempt_idx, validations in enumerate(result.sample_validations, 1): | ||
| passed = sum(1 for _, val in validations if val.as_bool()) | ||
| total = len(validations) | ||
| status = "✓" if passed == total else "✗" | ||
| print( | ||
| f" {status} Attempt {attempt_idx}: {passed}/{total} " | ||
| f"requirements passed" | ||
| ) | ||
|
|
||
| for req, val in validations: | ||
| if not val.as_bool(): | ||
| print(f" - {req.description}") | ||
| if val.reason: | ||
| reason_preview = val.reason[:100].replace("\n", " ") | ||
| print(f" Error: {reason_preview}...") | ||
|
|
||
| else: | ||
| print("✗ Failed to generate working plotting code after all attempts\n") | ||
| print("Last attempt output:") | ||
| print("-" * 70) | ||
| print(result.result.value) | ||
| print("-" * 70) | ||
|
|
||
| print(f"\nFailure history ({len(result.sample_validations)} attempts):") | ||
| for attempt_idx, validations in enumerate(result.sample_validations, 1): | ||
| failed_count = sum(1 for _, val in validations if not val.as_bool()) | ||
| if failed_count > 0: | ||
| print(f"\n Attempt {attempt_idx}:") | ||
| for req, val in validations: | ||
| if not val.as_bool(): | ||
| print(f" - {req.description}") | ||
| if val.reason: | ||
| reason_lines = val.reason.split("\n")[:2] | ||
| for line in reason_lines: | ||
| print(f" {line}") | ||
|
|
||
| except Exception as e: | ||
| print(f"✗ Exception during sampling: {e}") | ||
| traceback.print_exc() | ||
|
|
||
| print("\n" + "=" * 70) | ||
| print("Test completed") | ||
| print("=" * 70) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() | ||
|
|
||
| # Made with Bob | ||
|
Contributor
This is the only one I found, but please search for and remove any other such references. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| """Plotting-specific requirements for Python tool validation. | ||
| Provides matplotlib and plotting-focused requirement factories separate from | ||
| generic Python tool requirements. | ||
| """ | ||
|
|
||
| from .matplotlib import python_plotting_requirements | ||
|
|
||
| __all__ = ["python_plotting_requirements"] |
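
The diff shows only the package's re-export, not the checks behind `python_plotting_requirements`. As a rough illustration of what a plotting-focused check could look like, here is a sketch that inspects generated code for a `savefig` call targeting the expected path. `calls_savefig_with` is a hypothetical helper and an assumption about the kind of validation involved, not the PR's implementation:

```python
import ast


def calls_savefig_with(code: str, output_path: str) -> bool:
    """Return True if the code calls .savefig(...) with output_path as a literal first argument."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "savefig"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and node.args[0].value == output_path
        ):
            return True
    return False


# Hypothetical generated code used only to exercise the check.
generated = (
    "import matplotlib.pyplot as plt\n"
    "plt.plot([0, 1], [0, 1])\n"
    "plt.savefig('/tmp/plot.png')\n"
)
saves_plot = calls_savefig_with(generated, "/tmp/plot.png")
```

A static check like this misses paths built at runtime or passed through variables, so it would complement, not replace, checking that the file exists after execution.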