Skip to content

Use streaming to avoid TTFT timeouts #285

Description

@lhoestq

I'm getting timeouts using flock to query a model with hard maths problems

INSTALL flock FROM community;
LOAD flock;

CREATE SECRET (TYPE OPENAI, BASE_URL 'https://router.huggingface.co/v1', API_KEY 'hf_xxx');
CREATE MODEL('GLM-5.2', 'zai-org/GLM-5.2:cheapest', 'openai');
CREATE PROMPT('answer', 'Anwser the mathematical question. {question}');

SELECT llm_complete(
    {'model_name': 'GLM-5.2'},
    {'prompt_name': 'answer', 'context_columns': [{'data': problem}]}
)
FROM 'hf://datasets/PeakStars/Math-Instruct/train-00000-of-00001.parquet'
LIMIT 5;

I'm getting

Invalid Error:
[ModelProvider] Invalid JSON response (HTTP 504)

It seems to be because flock doesn't use streaming and the response takes too long to arrive.
Should the openai implementation in flock use streaming instead to avoid such issues ? cc @anasdorbani

impacted inference providers so far: novita, deepinfra, zai-org

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingenhancementNew feature or request

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions