Use streaming to avoid TTFT timeouts

I'm getting timeouts using flock to query a model with hard maths problems

```sql
INSTALL flock FROM community;
LOAD flock;

CREATE SECRET (TYPE OPENAI, BASE_URL 'https://router.huggingface.co/v1', API_KEY 'hf_xxx');
CREATE MODEL('GLM-5.2', 'zai-org/GLM-5.2:cheapest', 'openai');
CREATE PROMPT('answer', 'Anwser the mathematical question. {question}');

SELECT llm_complete(
    {'model_name': 'GLM-5.2'},
    {'prompt_name': 'answer', 'context_columns': [{'data': problem}]}
)
FROM 'hf://datasets/PeakStars/Math-Instruct/train-00000-of-00001.parquet'
LIMIT 5;
```

I'm getting
```
Invalid Error:
[ModelProvider] Invalid JSON response (HTTP 504)
```

It seems to be because flock doesn't use streaming and the response takes too long to arrive.
Should the openai implementation in flock use streaming instead to avoid such issues ? cc @anasdorbani

impacted inference providers so far: novita, deepinfra, zai-org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use streaming to avoid TTFT timeouts #285

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Use streaming to avoid TTFT timeouts #285

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions