Skip to content

WIP: Initial support of Tasks and Dataframes#73

Draft
3dgiordano wants to merge 5 commits into
SEARCH_ACTIONSfrom
TASKS_AND_DATAFRAMES
Draft

WIP: Initial support of Tasks and Dataframes#73
3dgiordano wants to merge 5 commits into
SEARCH_ACTIONSfrom
TASKS_AND_DATAFRAMES

Conversation

@3dgiordano

Copy link
Copy Markdown
Collaborator

This pull request introduces several enhancements and fixes across the codebase, focusing on improved tool integration, result handling, stricter adherence to BlazeMeter MCP usage, and more robust testing. The most significant changes are grouped below.


BlazeMeter MCP Integration and Usage Policies:

  • Enforces exclusive use of BlazeMeter MCP for all tasks, disallowing direct API access, and adds detailed, binding operational guidelines to the MCP instructions for safe parallel execution, batching, resource cleanup, and dataframe handling. [1] [2] [3]
  • Patches ArgModelBase in MCP to ensure tools with an arguments parameter always receive the full payload, improving compatibility with client requests.

Result Handling and Tool Output:

  • Adds new fields to BaseResult for tool call timing and debug info, and introduces a ToolResult class for standardized tool output, including properties for result metadata. [1] [2] [3]

Tool Registration and Management:

  • Registers the new tools_manager in the MCP server, ensuring all tools are available and properly integrated. [1] [2]

Testing Improvements:

  • Adds comprehensive tests for the async task manager, including Crockford Base32 ID generation, collision handling, and case-insensitive lookup/removal.
  • Refactors batch control tests to patch concurrency limits at the utility level and fixes method patching for skills and help batch actions. [1] [2] [3]

Other Enhancements:

  • Adds a minimal test formatter for concise test summaries.
  • Adds polars as a dependency for advanced dataframe support.

These changes collectively strengthen the reliability, maintainability, and compliance of the MCP server and its tool integrations.

@Baraujo25 Baraujo25 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loved this work @3dgiordano !!!

Comment thread tools/async_task_manager.py Outdated
Comment on lines +112 to +129
def _generate_task_id() -> str:
return "".join(secrets.choice(TASK_ID_ALPHABET) for _ in range(TASK_ID_LENGTH))


def _allocate_task_id() -> str:
for _ in range(TASK_ID_MAX_ATTEMPTS):
candidate = _generate_task_id()
if candidate not in _tasks:
return candidate

logger.error(
"Unable to allocate unique 8-char task id after 10 attempts. "
"attempts=%s id_length=%s alphabet=crockford32 active_pool_size=%s",
TASK_ID_MAX_ATTEMPTS,
TASK_ID_LENGTH,
len(_tasks),
)
raise RuntimeError("Unable to allocate unique 8-char task id after 10 attempts.")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the usage of uuid in this place? I'm asking mainly because I see that there is a chance (mathematically small, but still) that the id is never generated and that is constantly validating existence prior its allocation to avoid collision.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, it was a UUID, but it was too long and consumed too many space (tokens). It was also too complex for a user to type when listing and checking the status of a task.

Therefore, a unique generation mechanism was implemented based on the Douglas Crockford Base32 alphabet, but in lowercase so users wouldn't have to worry about case. The length of 8 was chosen as an acceptable limit for an identifier that a user would have to generate.

This mechanism has been tested and, moreover, should be generalized and used for unique identifiers of dataframes to reduce size and allow users to list or reference dataframes without using a very complex unique identifier, whether for a human or an AI.

Comment thread main.py
Comment on lines +30 to +48
# Patch MCP ArgModelBase so tools with an "arguments" param receive the full payload
# when the client sends {"action": "x", "key": "value"} instead of {"arguments": {...}}
from mcp.server.fastmcp.utilities import func_metadata
from pydantic import model_validator

_OriginalArgModelBase = func_metadata.ArgModelBase


class _PatchedArgModelBase(_OriginalArgModelBase):
@model_validator(mode="before")
@classmethod
def _wrap_root_as_arguments(cls, data: object) -> object:
if isinstance(data, dict) and "arguments" not in data:
return {"arguments": data}
return data


func_metadata.ArgModelBase = _PatchedArgModelBase

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this happening for a specific client? it would be good to know and keep track, good fix!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error caused by the client or the AI's misinterpretation of the specification's arguments.

The most common behavior is that the client performs an incorrect action, resulting in a Pydantic error because it doesn't comply with the specified interface. The AI ​​recognizes the error and attempts the action again, the next time correctly.

This commonly occurs with models that have low precision in function calls or in some clients like VS Code (we don't know if it's AI or client or both).

This approach allows us to disable the Pydantic error and allow raw data to be sent. It is then processed more defensively by our logic, accepting both correct and incorrect formatting, attempting to resolve the data regardless of the AI's errors.

This minimizes the number of round trips the AI ​​makes when it fails to correctly declare the tool call.
We doo some tricks to generate the arguments arg and also use normalize_action_args in utils to fix some of the patterns (in that function explain more about the supported format patterns).

from models.result import BaseResult


DATAFRAME_JSON_SIZE_THRESHOLD = 8000

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason behind this number? What about lowering down to 2k to save some context/tokens now that we can handle most of the work with dataframes?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size was determined through research, identifying the approximate point at which Claude, VS Code, and Cursor decide to process or split the text outside the context. This size was chosen to prevent these clients from performing these operations on the results with their external tools.

By converting the data to a DataFrame, the rudimentary tools used by these clients to process JSON are no longer necessary, allowing for more efficient use of the DataFrame mechanism.

If the JSON is small enough for these clients to accept it in the context, it passes directly to the context, but otherwise it tries to pass it to a dataframe before the client decides to use more expensive mechanisms for its processing (cropping, grep and other shell tools).

return len(ids)


def get_sql_capabilities() -> Dict[str, Any]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like capabilities could have their own sit on a different file? What do you think?
I was thinking to have something like

 ─── dataframe
    ├── dataframe_decorators.py
    ├── dataframe_manager_capabilities.py
    └── dataframe_manager.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants