Agent fails to communicate with mlserver over gRPC after model is downloaded #6612

@andreys42

Hey, folks!

I’ve installed Seldon Core 2 and all its sub-services in a Kubernetes cluster using the official production setup tutorial.

Everything seems to be running smoothly — all pods are communicating without issues.
My goal is to use MinIO (deployed as a pod with a local persistent volume) as the backend for storing models.

📦 Model CRD

Here’s the Model Custom Resource I’m applying:
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentence-embedding-qwen3
  namespace: seldon-mesh
spec:
  storageUri: "s3://seldon-models/sentence-embedding-qwen3"
  secretName: seldon-minio-secret
  requirements:
  - mlserver

📁 Model Storage Layout in MinIO

The bucket path /sentence-embedding-qwen3 contains:
model-settings.json:
{
  "name": "sentence-embedding-qwen3",
  "implementation": "inference.SentenceEmbeddingModel",
  "parameters": {
    "uri": "./model"
  }
}
/model/ directory:

  • 1_Pooling/
  • added_tokens.json
  • chat_template.jinja
  • config_sentence_transformers.json
  • config.json
  • merges.txt
  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors
  • model.safetensors.index.json
  • modules.json
  • README.md
  • sentence_bert_config.json
  • special_tokens_map.json
  • tokenizer_config.json
  • tokenizer.json
  • vocab.json
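
The layout above can be sanity-checked on a local copy before uploading it to the bucket. The following is only a sketch: `check_model_dir` is a hypothetical helper (not part of MLServer or Seldon), and it assumes that `parameters.uri` is resolved relative to the folder containing model-settings.json, which is not confirmed here.

```python
import json
from pathlib import Path


def check_model_dir(root):
    """Return a list of problems found in a local copy of the model folder."""
    root = Path(root)
    problems = []

    settings_path = root / "model-settings.json"
    if not settings_path.is_file():
        return ["model-settings.json is missing"]

    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"model-settings.json is not valid JSON: {exc}"]

    # Keys that appear in the model-settings.json shown in this issue.
    for key in ("name", "implementation"):
        if key not in settings:
            problems.append(f"missing key: {key}")

    # Assumption: a relative uri is resolved against the model folder.
    uri = settings.get("parameters", {}).get("uri")
    if uri is None:
        problems.append("missing key: parameters.uri")
    elif not (root / uri).is_dir():
        problems.append(f"parameters.uri does not point to a directory: {uri}")

    return problems
```

Calling `check_model_dir("./sentence-embedding-qwen3")` on a local copy returns an empty list when the settings file parses and `./model` exists. It does not verify that the `inference` module named in `implementation` is importable inside the mlserver container.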

❌ Problem

When I apply the Model manifest, the mlserver pod (with the rclone sidecar) successfully downloads the model into its volume and then immediately deletes it.

At the same time, the agent container logs show the following:

time="2025-07-14T06:28:25Z" level=info msg="Chose path /mnt/agent/rclone/1366483864 for model sentence-embedding-qwen3" Name=AgentServiceManager func=LoadModel
time="2025-07-14T06:28:30Z" level=error msg="Retry op #0" Name=AgentServiceManager error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 0.0.0.0:9500: connect: connection refused"" func=LoadModel

So it seems the agent tries to contact mlserver via gRPC (port 9500), but the connection is refused. In the mlserver logs, however, I can see that the gRPC server is running:

2025-07-15 09:53:00,755 [mlserver.grpc] INFO - gRPC server running on http://0.0.0.0:9500
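
To narrow down whether port 9500 is reachable at all, a plain TCP probe could be run from any container in the pod that has Python available. This is a sketch under assumptions: `port_open` is a hypothetical helper, and the host and port are simply the values from the logs above. A successful connect only shows that something is listening, not that the gRPC service is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9500 is the gRPC port from the mlserver log line; 127.0.0.1 assumes
    # the probe runs in the same pod as the mlserver container.
    print(port_open("127.0.0.1", 9500))
```

Running the probe repeatedly while the Model is being applied would show whether the port is closed only during startup or stays closed for the whole load attempt.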

🤔 Questions:

  • Does this mean the problem has nothing to do with gRPC itself, but rather with the way mlserver loads my model (e.g. some files are missing, the HF format is incorrect, or something else)?
  • Has anyone encountered this behavior, where mlserver deletes the model immediately after download? How can I get a more informative error trace without digging into the Go source code?
  • What could be preventing gRPC communication from the agent to mlserver (or is this trace misleading)?
  • Any advice or best practices for deploying custom sentence-transformer models (like this one) in Seldon Core?
