Open
Changes from 36 commits
Commits
45 commits
df8c195
✨ Added chart for vllm-openvino
krish918 Sep 5, 2024
d339c74
✨ Added charts for llm-vllm microservice
krish918 Sep 5, 2024
c8a420c
➕ Updated chatqna to have conditional dependency on tgi and vllm
krish918 Sep 6, 2024
21be6c9
🧪 Added tests for verifying pod sanity
krish918 Sep 6, 2024
25528c9
📝 Added docs for instruction to setup chatqna with vllm
krish918 Sep 6, 2024
140d1b5
🔥 removed unsupported env vars
krish918 Sep 6, 2024
815c51b
♻️ Removed global Model ID var | resolved readme conflicts
krish918 Sep 6, 2024
2621fa3
Merge branch 'main' into chart/vllm-ov
krish918 Sep 6, 2024
4ac8fb0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 6, 2024
5fffdd0
📌 Bumped up the chart version
krish918 Sep 6, 2024
7497322
🔥 Removed unused vars and resources
krish918 Sep 10, 2024
4154f02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 18, 2024
027923c
🔧 added openvino values files
krish918 Sep 18, 2024
1f513a4
Merge branch 'main' into chart/vllm-ov
krish918 Sep 18, 2024
8b911f5
🩹 minor fixes
krish918 Sep 18, 2024
2ba4c8f
🩹 renamed chart llm-vllm-uservice to avoid conflict
krish918 Sep 18, 2024
207d2bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 18, 2024
b36ac56
Merge branch 'main' into chart/vllm-ov
krish918 Sep 19, 2024
20670b7
Merge branch 'main' into chart/vllm-ov
krish918 Sep 19, 2024
01eb2b4
updated vllm-openvino image
krish918 Sep 30, 2024
738ff59
🔖 updated tags for llm-vllm and ctrl-uservice
krish918 Oct 8, 2024
ad96222
Merge branch 'main' into chart/vllm-ov
krish918 Oct 8, 2024
e7de84c
🔖 added latest tag for llm-vllm and ctrl-uservice
krish918 Oct 8, 2024
86b8064
🩹 fixed openvino values issue for chatqna
krish918 Oct 8, 2024
34f71b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 8, 2024
e382dea
📄 added missing openvino values file
krish918 Oct 9, 2024
4065c9e
🔥 removed tags for conditional chart selection
krish918 Oct 9, 2024
81d269c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 9, 2024
1fce716
Merge branch 'main' into chart/vllm-ov
krish918 Oct 9, 2024
05a2be2
📝 formatting fixes in readme files
krish918 Oct 9, 2024
f0dae33
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 9, 2024
a8b85d7
🎨 prettier formatting fixes
krish918 Oct 9, 2024
294f1a0
🎨 prettier formatting fixes for chatqna readme
krish918 Oct 9, 2024
9b12618
retrigger CI checks
krish918 Oct 9, 2024
e890448
📝 minor updates in readme files
krish918 Oct 10, 2024
ef59964
retrigger CI checks
krish918 Oct 10, 2024
acd9a47
💚 enabled ci checks for new values files
krish918 Oct 29, 2024
afc3d45
Merge branch 'main' into chart/vllm-ov
krish918 Oct 29, 2024
cbb8d65
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 29, 2024
d570861
🩹 fixed vllm charts multiple installation
krish918 Oct 29, 2024
26562a9
Merge branch 'main' into chart/vllm-ov
krish918 Oct 30, 2024
03b7d26
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
9e1cdf1
increased helm rollout timeout in ci
krish918 Oct 30, 2024
df8261e
💚 fixes to enable ci for openvino-vllm
krish918 Nov 4, 2024
ac341ac
triggering CI checks
krish918 Nov 6, 2024
22 changes: 11 additions & 11 deletions helm-charts/README.md
@@ -32,16 +32,16 @@ AI application examples you can run directly on Xeon and Gaudi. You can also ref

### Components

Components which are building blocks for AI application.
All components helm charts are put in the ./common directory, and the support list is growing.
Components which are building blocks for AI application.
All components helm charts are put in the ./common directory, and the support list is growing.
Refer to [GenAIComps](https://github.com/opea-project/GenAIComps) for details of each component.

## Deploy with helm charts

### From Source Code

These helm charts are designed to be easy to start, which means you can deploy a workload easily without further options.
However, `HUGGINGFACEHUB_API_TOKEN` should be set in most cases for a workload to start up correctly.
These helm charts are designed to be easy to start, which means you can deploy a workload easily without further options.
However, `HUGGINGFACEHUB_API_TOKEN` should be set in most cases for a workload to start up correctly.
Examples of deploy a workload:

```
@@ -91,7 +91,7 @@ There are global options(which should be shared across all components of a workl

## Using Persistent Volume

It's common to use Persistent Volume(PV) for model caches(huggingface hub cache) in a production k8s cluster. We support to pass the PersistentVolumeClaim(PVC) to containers, but it's the user's responsibility to create the PVC depending on your k8s cluster's capability.
It's common to use Persistent Volume(PV) for model caches(huggingface hub cache) in a production k8s cluster. We support to pass the PersistentVolumeClaim(PVC) to containers, but it's the user's responsibility to create the PVC depending on your k8s cluster's capability.
Here is an setup example using NFS on Ubuntu 22.04.

- Export NFS directory from NFS server
@@ -154,10 +154,10 @@ helm install tgi common/tgi --set global.modelUsePVC=model-volume

## Using Private Docker Hub

By default, we're using docker images from [official docker hub](https://hub.docker.com/u/opea), with docker image version aligned with OPEA releases.
By default, we're using docker images from [official docker hub](https://hub.docker.com/u/opea), with docker image version aligned with OPEA releases.
If you have private hub or would like to use different docker image versions, see the following examples.

To use the latest tag for all images:
To use the latest tag for all images:
`find . -name '*values.yaml' -type f -exec sed -i 's#tag: ""#tag: latest#g' {} \;`

To use local docker registry:
@@ -169,8 +169,8 @@ find . -name '*values.yaml' -type f -exec sed -i "s#repository: opea/*#repositor

## Generate manifests from Helm Charts

Some users may want to use kubernetes manifests(yaml files) for workload deployment, we do not maintain manifests itself, and will generate them using `helm template`.
See update_genaiexamples.sh for how the manifests are generated for supported GenAIExamples.
See update_manifests.sh for how the manifests are generated for supported GenAIComps.
Please note that the above scripts have hardcoded settings to reduce user configuration effort.
Some users may want to use kubernetes manifests(yaml files) for workload deployment, we do not maintain manifests itself, and will generate them using `helm template`.
See update_genaiexamples.sh for how the manifests are generated for supported GenAIExamples.
See update_manifests.sh for how the manifests are generated for supported GenAIComps.
Please note that the above scripts have hardcoded settings to reduce user configuration effort.
They are not supposed to be directly used by users.
10 changes: 10 additions & 0 deletions helm-charts/chatqna/Chart.yaml
@@ -18,9 +18,19 @@ dependencies:
- name: tgi
version: 1.0.0
repository: "file://../common/tgi"
condition: tgi.enabled
- name: vllm
version: 1.0.0
repository: "file://../common/vllm"
condition: vllm.enabled
- name: llm-uservice
version: 1.0.0
repository: "file://../common/llm-uservice"
condition: tgi.enabled
- name: llm-ctrl-uservice
version: 1.0.0
repository: "file://../common/llm-ctrl-uservice"
condition: vllm.enabled
Comment on lines +26 to +33
Collaborator

Why are you adding wrappers?

They were removed over a month ago for v1.1 (#474), are unnecessary, and the LLM wrapper uses a langserve component with a problematic license (opea-project/GenAIComps#264).
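
For illustration, a dependency list without the two wrapper charts might look like the sketch below. Names, versions, and paths are copied from this diff; whether ChatQnA on this branch can reach the backends without the wrappers is an assumption, not something verified here.

```yaml
# Sketch only: chatqna/Chart.yaml LLM dependencies with the wrapper charts left out.
# llm-uservice and llm-ctrl-uservice are omitted; everything else is as in the diff.
dependencies:
  - name: tgi
    version: 1.0.0
    repository: "file://../common/tgi"
    condition: tgi.enabled
  - name: vllm
    version: 1.0.0
    repository: "file://../common/vllm"
    condition: vllm.enabled
```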

- name: tei
version: 1.0.0
repository: "file://../common/tei"
80 changes: 68 additions & 12 deletions helm-charts/chatqna/README.md
@@ -9,36 +9,90 @@ Helm chart for deploying ChatQnA service. ChatQnA depends on the following servi
- [redis-vector-db](../common/redis-vector-db/README.md)
- [reranking-usvc](../common/reranking-usvc/README.md)
- [teirerank](../common/teirerank/README.md)
- [llm-uservice](../common/llm-uservice/README.md)
- [tgi](../common/tgi/README.md)

For LLM inference, two more microservices will be required. We can either use [TGI](https://github.com/huggingface/text-generation-inference) or [vLLM](https://github.com/vllm-project/vllm) as our LLM backend. Depending on that, we will have following microservices as part of dependencies for ChatQnA application.

1. For using **TGI** as an inference service, following 2 microservices will be required:

- [llm-uservice](../common/llm-uservice/README.md)
- [tgi](../common/tgi/README.md)

2. For using **vLLM** as an inference service, following 2 microservices would be required:

- [llm-ctrl-uservice](../common/llm-ctrl-uservice/README.md)
Comment on lines +13 to +22
Collaborator

Ditto, why add wrappers?

Collaborator

This PR dates from the 1.0 release, so it contains some old code.
I think it's better to merge it with #610, or to make only the simple changes needed to support OpenVINO after #610 gets merged.

Collaborator

Sounds good, but note that I'm testing my PR only with the Gaudi version of vLLM.

I.e. both CPU and GPU/OpenVINO support currently still need to be added and tested after it.

That PR also has quite a few TODO comments about vLLM options where feedback would be appreciated.

- [vllm](../common/vllm/README.md)

> **_NOTE :_** We shouldn't have both inference engine deployed. It is required to only setup either of them. To achieve this, conditional flags are added in the chart dependency. We will be switching off flag corresponding to one service and switching on the other, in order to have a proper setup of all ChatQnA dependencies.
Collaborator

Why could there not be multiple inference engines?

ChatQnA has 4 inferencing subservices for which it is already using 2 inferencing engines, TEI and TGI.

And I do not see why it could not use e.g. TEI for embed + rerank, TGI for guardrails, and vLLM for LLM.

Please rephrase.
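
For context, the conditional flags the note refers to are the `tgi.enabled` and `vllm.enabled` values used as dependency conditions in `chatqna/Chart.yaml`. A minimal values override that selects vLLM, mirroring `vllm-openvino-values.yaml` from this PR, might look like the sketch below. As far as this diff shows, these flags gate only the LLM backend charts and their wrappers, not the TEI-based services.

```yaml
# Sketch of a values override; key names are taken from this PR's values.yaml.
tgi:
  enabled: false   # turns off the tgi and llm-uservice dependencies
vllm:
  enabled: true    # turns on the vllm and llm-ctrl-uservice dependencies
```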


## Installing the Chart

To install the chart, run the following:
Please follow the following steps to install the ChatQnA Chart:

1. Clone the GenAIInfra repository:

```bash
git clone https://github.com/opea-project/GenAIInfra.git
```

2. Setup the dependencies and required environment variables:

```console
```bash
cd GenAIInfra/helm-charts/
./update_dependency.sh
helm dependency update chatqna
export HFTOKEN="insert-your-huggingface-token-here"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

3. Depending on the device which we are targeting for running ChatQnA, please use one the following installation commands:

```bash
# Install the chart on a Xeon machine

# If you would like to use the traditional UI, please change the image as well as the containerport within the values
# append these at the end of the command "--set chatqna-ui.image.repository=opea/chatqna-ui,chatqna-ui.image.tag=latest,chatqna-ui.containerPort=5173"

helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME}
```

```bash
# To use Gaudi device
Collaborator

Now that there's support for both TGI and vLLM, all these comments here could state which one is used, e.g. like this:

Suggested change
# To use Gaudi device
# To use Gaudi device for TGI

#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/gaudi-values.yaml
helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/gaudi-values.yaml
```

```bash
# To use Nvidia GPU
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/nv-values.yaml
helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/nv-values.yaml
```

```bash
# To include guardrail component in chatqna on Xeon
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} -f chatqna/guardrails-values.yaml
helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} -f chatqna/guardrails-values.yaml
```

```bash
# To include guardrail component in chatqna on Gaudi
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} -f chatqna/guardrails-gaudi-values.yaml
helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} -f chatqna/guardrails-gaudi-values.yaml
```

> **_NOTE :_** Default installation will use [TGI (Text Generation Inference)](https://github.com/huggingface/text-generation-inference) as inference engine. To use vLLM as inference engine, please see below.

```bash
# To use vLLM inference engine on XEON device

helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-ctrl-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME} --set tgi.enabled=false --set vllm.enabled=true

# To use OpenVINO optimized vLLM inference engine on XEON device

helm -f ./chatqna/vllm-openvino-values.yaml install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-ctrl-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}
```

### IMPORTANT NOTE

1. Make sure your `MODELDIR` exists on the node where your workload is schedueled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.
1. Make sure your `MODELDIR` exists on the node where your workload is scheduled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.

2. Please set `http_proxy`, `https_proxy` and `no_proxy` values while installing chart, if you are behind a proxy.

Comment on lines +95 to +96
Collaborator

IMHO duplicating general information in application READMEs is not maintainable; there are too many of them. Instead, you could include a link to the general options (helm-charts/README.md).

## Verify

@@ -52,8 +106,9 @@ Run the command `kubectl port-forward svc/chatqna 8888:8888` to expose the servi

Open another terminal and run the following command to verify the service if working:

```console
```bash
curl http://localhost:8888/v1/chatqna \
-X POST \
Collaborator

Why add redundant POST? -d already implies that (see man curl).

-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
@@ -76,8 +131,9 @@ Open a browser to access `http://<k8s-node-ip-address>:${port}` to play with the
| image.repository | string | `"opea/chatqna"` | |
| service.port | string | `"8888"` | |
| tgi.LLM_MODEL_ID | string | `"Intel/neural-chat-7b-v3-3"` | Models id from https://huggingface.co/, or predownloaded model directory |
| global.horizontalPodAutoscaler.enabled | bop; | false | HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See HPA section in ../README.md before enabling! |
| vllm-openvino.LLM_MODEL_ID | string | `"Intel/neural-chat-7b-v3-3"` | Models id from https://huggingface.co/, or predownloaded model directory |
| global.horizontalPodAutoscaler.enabled | bool | false | HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See HPA section in ../README.md before enabling! |

## Troubleshooting

If you encount any issues, please refer to [ChatQnA Troubleshooting](troubleshooting.md)
If you encounter any issues, please refer to [ChatQnA Troubleshooting](troubleshooting.md).
5 changes: 5 additions & 0 deletions helm-charts/chatqna/templates/deployment.yaml
@@ -33,8 +33,13 @@ spec:
containers:
- name: {{ .Release.Name }}
env:
{{- if .Values.vllm.enabled }}
- name: LLM_SERVICE_HOST_IP
value: {{ .Release.Name }}-llm-ctrl-uservice
{{- else }}
- name: LLM_SERVICE_HOST_IP
value: {{ .Release.Name }}-llm-uservice
{{- end }}
- name: RERANK_SERVICE_HOST_IP
value: {{ .Release.Name }}-reranking-usvc
- name: RETRIEVER_SERVICE_HOST_IP
18 changes: 17 additions & 1 deletion helm-charts/chatqna/values.yaml
@@ -18,6 +18,14 @@ service:
type: ClusterIP
port: 8888

imagePullSecrets: []

podAnnotations: {}

podSecurityContext: {}

resources: {}

securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
@@ -43,6 +51,14 @@ horizontalPodAutoscaler:
# Override values in specific subcharts
tgi:
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
enabled: true

vllm:
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
enabled: false

llm-ctrl-uservice:
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3

# disable guardrails-usvc by default
# See guardrails-values.yaml for guardrail related options
@@ -62,9 +78,9 @@ global:
https_proxy: ""
no_proxy: ""
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"

# set modelUseHostPath or modelUsePVC to use model cache.
modelUseHostPath: ""
# modelUseHostPath: /mnt/opea-models
# modelUsePVC: model-volume

# Prometheus Helm installation info for subchart serviceMonitors
25 changes: 25 additions & 0 deletions helm-charts/chatqna/vllm-openvino-values.yaml
@@ -0,0 +1,25 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

tgi:
enabled: false

vllm:
enabled: true
openvino_enabled: true
Collaborator

Does not conform to Helm best practices: https://helm.sh/docs/chart_best_practices/values/

Should be either openvinoEnabled: true, or openvino: true.
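
A sketch of the first option applied to this values file. `openvinoEnabled` is the name proposed in this comment, not a key that exists in the chart today, and any template that reads `openvino_enabled` would need the same rename.

```yaml
# Sketch only: the overrides from vllm-openvino-values.yaml with a camelCase key.
tgi:
  enabled: false

vllm:
  enabled: true
  openvinoEnabled: true   # hypothetical rename of openvino_enabled
```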

image:
repository: opea/vllm-openvino
pullPolicy: IfNotPresent
Collaborator

Drop the value, it breaks CI testing for latest tag (see #587).

# Overrides the image tag whose default is the chart appVersion.
tag: "latest"

extraCmdArgs: []

LLM_MODEL_ID: Intel/neural-chat-7b-v3-3

CUDA_GRAPHS: "0"
VLLM_CPU_KVCACHE_SPACE: 50
VLLM_OPENVINO_KVCACHE_SPACE: 32
OMPI_MCA_btl_vader_single_copy_mechanism: none

ov_command: ["/bin/bash"]
23 changes: 23 additions & 0 deletions helm-charts/common/llm-ctrl-uservice/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
14 changes: 14 additions & 0 deletions helm-charts/common/llm-ctrl-uservice/Chart.yaml
@@ -0,0 +1,14 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: v2
name: llm-ctrl-uservice
description: A Helm chart for LLM controller microservice which connects with vLLM microservice to provide inferences.
type: application
version: 1.0.0
appVersion: "v1.0"
dependencies:
- name: vllm
version: 1.0.0
repository: file://../vllm
condition: autodependency.enabled
Collaborator

autodependency.enabled is no longer used, use vllm.enabled instead.
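
The requested change would make the dependency block read roughly as follows. This is a sketch based on the Chart.yaml in this diff; it assumes the chart's own values.yaml defines `vllm.enabled`, because Helm treats a condition whose path is missing from the values as having no effect.

```yaml
# Sketch only: llm-ctrl-uservice/Chart.yaml dependency with the updated condition.
dependencies:
  - name: vllm
    version: 1.0.0
    repository: file://../vllm
    condition: vllm.enabled   # replaces autodependency.enabled
```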

Author

updated.
