
feat(oceanbase): add multi-process loader, configurable index/partition params, and HNSW_BQ cosine support #769

Open
wyfanxiao wants to merge 2 commits into zilliztech:main from wyfanxiao:develop

Conversation

@wyfanxiao
Contributor

Hi team! 👋

This PR enhances the OceanBase client and adds a multi-process data loader to improve load throughput for SQL-based vector DBs.

Summary

  • Add MultiprocessInsertRunner for parallel data loading across worker processes, bypassing the GIL limitation for SQL-based vector DBs
  • OceanBase: declare thread_safe=False, auto-switch to multi-process loader when --load-concurrency>1
  • Add --load-processes CLI option for explicit multi-process control
  • Add --create-index-parallel CLI option (default 16) for configurable index build parallelism
  • Add --extra-info-max-size CLI option (default 32, set 0 to omit) for HNSW index
  • Add --partitions CLI option for KEY partitioning (default 0, no partition)
  • HNSW_BQ: remove forced L2 for cosine metric, now supports cosine natively
  • Improve concurrent_runner log message for thread_safe=False fallback
  • pyproject.toml: add pyyaml dependency, fix packages.find to include all subpackages
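
For reviewers who want a quick mental model of the loader selection and the per-process batching described above, here is a minimal sketch. It is not the code in this PR: `chunked`, `insert_batch`, `choose_loader`, and `load` are hypothetical names, the worker only counts rows instead of talking to a database, and reusing the concurrency value as the process count on auto-switch is an assumption.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def insert_batch(batch):
    """Hypothetical worker: pretend to insert one batch, return the row count.

    A real worker would open its own database connection inside the
    process, because connections generally cannot be shared across processes.
    """
    return len(batch)


def choose_loader(thread_safe, load_concurrency, load_processes=0):
    """Pick a loader kind and worker count (illustrative rules only).

    - An explicit process count always selects the multi-process loader.
    - A client that is not thread-safe switches to processes as soon as
      more than one concurrent loader is requested.
    - Otherwise the thread-based loader is kept.
    """
    if load_processes > 0:
        return "process", load_processes
    if not thread_safe and load_concurrency > 1:
        # Assumption: the requested concurrency becomes the process count.
        return "process", load_concurrency
    return "thread", max(load_concurrency, 1)


def load(rows, batch_size, workers):
    """Fan batches out to worker processes and return the total rows inserted."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(insert_batch, chunked(rows, batch_size)))
```

Because each batch runs in a separate process, CPU-bound work such as row serialization is not serialized by the GIL. When calling `load` from a script, place the call under an `if __name__ == "__main__":` guard so it also works with the spawn start method.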

We've been using these changes in our OceanBase benchmark testing and they've been working well. Would love to get this merged so it can benefit other users too. Thanks in advance for taking the time to review! Happy to address any feedback. 🙏

…W_BQ cosine support

- Add MultiprocessInsertRunner for parallel data loading across worker processes
- OceanBase: declare thread_safe=False, auto-switch to multi-process loader
- Add --load-processes CLI option for explicit multi-process control
- Add --create-index-parallel CLI option (default 16)
- Add --extra-info-max-size CLI option (default 32, set 0 to omit)
- Add --partitions CLI option for KEY partitioning (default 0, no partition)
- HNSW_BQ: remove forced L2 for cosine, now supports cosine natively
- Improve concurrent_runner log message for thread_safe=False fallback
- pyproject.toml: add pyyaml dependency, fix packages.find to include all subpackages
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wyfanxiao
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wyfanxiao wyfanxiao changed the title feat(oceanbase): multi-process loader, configurable index params, HNS… feat(oceanbase): add multi-process loader, configurable index/partition params, and HNSW_BQ cosine support Apr 29, 2026

2 participants