Agent fails to communicate with mlserver over gRPC after model is downloaded

Hey, folks!

I’ve installed Seldon Core 2 and all its sub-services in a Kubernetes cluster using the [official production setup tutorial](https://docs.seldon.ai/seldon-core-2/installation/production-environment).

Everything seems to be running smoothly — all pods are communicating without issues.
My goal is to use MinIO (deployed as a pod with a local persistent volume) as the backend for storing models.

📦 Model CRD

Here’s the Model Custom Resource I’m applying:
```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentence-embedding-qwen3
  namespace: seldon-mesh
spec:
  storageUri: "s3://seldon-models/sentence-embedding-qwen3"
  secretName: seldon-minio-secret
  requirements:
    - mlserver
```

📁 Model Storage Layout in MinIO

The bucket path `/sentence-embedding-qwen3` contains:

`model-settings.json`:
```json
{
  "name": "sentence-embedding-qwen3",
  "implementation": "inference.SentenceEmbeddingModel",
  "parameters": {
    "uri": "./model"
  }
}
```
`/model/` directory:

- 1_Pooling/
- added_tokens.json
- chat_template.jinja
- config_sentence_transformers.json
- config.json
- merges.txt
- model-00001-of-00002.safetensors
- model-00002-of-00002.safetensors
- model.safetensors.index.json
- modules.json
- README.md
- sentence_bert_config.json
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- vocab.json
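For anyone reproducing this, here is a minimal sketch of a local sanity check for the layout above. It assumes that the module named in `implementation` (here `inference`) is expected to sit next to `model-settings.json`, which may not hold if the module is baked into the server image. The listing above does not mention such a file, so it may simply have been left out of the description. `check_model_dir` is a hypothetical helper, not part of MLServer or Seldon.

```python
import json
from pathlib import Path


def check_model_dir(root):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []

    settings_path = root / "model-settings.json"
    if not settings_path.is_file():
        return ["missing model-settings.json"]

    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"model-settings.json is not valid JSON: {exc}"]

    # "inference.SentenceEmbeddingModel" -> module name "inference"
    implementation = settings.get("implementation") or ""
    module_name = implementation.rpartition(".")[0]
    if not module_name:
        problems.append("'implementation' is not a dotted module.Class path")
    else:
        module_rel = Path(*module_name.split("."))
        as_file = root / module_rel.with_suffix(".py")
        as_package = root / module_rel / "__init__.py"
        # Assumption: the custom runtime is shipped alongside the settings file.
        if not as_file.is_file() and not as_package.is_file():
            problems.append(
                f"module '{module_name}' not found next to model-settings.json"
            )

    # Only relative, local URIs are checked; remote schemes are skipped.
    uri = (settings.get("parameters") or {}).get("uri")
    if uri is not None and "://" not in uri and not (root / uri).exists():
        problems.append(f"parameters.uri '{uri}' does not exist")

    return problems
```

Running `check_model_dir("path/to/local/copy")` on a local copy of the bucket contents returns an empty list when nothing obvious is missing. It does not load the model, so an empty result only rules out the simplest layout mistakes.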

❌ Problem

When I apply the Model manifest, the mlserver pod (with the rclone sidecar) successfully downloads the model into its volume and then immediately deletes it.

At the same time, the `agent` container logs show the following:

time="2025-07-14T06:28:25Z" level=info msg="Chose path /mnt/agent/rclone/1366483864 for model sentence-embedding-qwen3" Name=AgentServiceManager func=LoadModel
time="2025-07-14T06:28:30Z" level=error msg="Retry op #0" Name=AgentServiceManager error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 0.0.0.0:9500: connect: connection refused\"" func=LoadModel

So it seems the agent tries to contact mlserver via gRPC (port 9500), but the connection is refused. In the mlserver logs, however, I can see that the gRPC server is running:

`2025-07-15 09:53:00,755 [mlserver.grpc] INFO - gRPC server running on http://0.0.0.0:9500`
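To separate "nothing is listening" from "something is listening but the load fails", a plain TCP probe can help. The sketch below uses only the Python standard library and could be run from inside the mlserver container, which ships Python. Since containers in one pod share a network namespace, `127.0.0.1:9500` should reach the same listener the agent is dialing. The helper name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when the port is not reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("127.0.0.1", 9500)` at the moment the agent reports the error shows whether the port was actually closed at that time. A `True` result only proves that the TCP handshake works, not that the gRPC service behind it is healthy.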

🤔 Questions:

- Does this mean the problem has nothing to do with gRPC itself, but rather with the way mlserver loads my model (for example, some files are missing, the HF format is incorrect, or something else)?
- Has anyone encountered this behavior where mlserver deletes the model immediately after download? How can I find a more informative error trace without digging into the Go source code?
- What could be preventing gRPC communication from the agent to mlserver (or is this trace simply misleading)?
- Any advice or best practices for deploying custom sentence-transformer models (like this one) in Seldon Core?

Agent fails to communicate with mlserver over gRPC after model is downloaded #6612
