
Bump brace-expansion from 1.1.12 to 1.1.13 in /src/rest-server#172

Closed
dependabot[bot] wants to merge 110 commits into main from dependabot/npm_and_yarn/src/rest-server/brace-expansion-1.1.13

dependabot[bot] commented on behalf of GitHub on Mar 31, 2026

Bumps brace-expansion from 1.1.12 to 1.1.13.


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the Security Alerts page.

RuiGaoMS and others added 30 commits June 16, 2025 05:00
## Add script to update DCGM when CUDA is newer than 12.8

### Description
This PR updates the DCGM binary when the CUDA version is newer than 12.8.

### Test Actions:
Run a test job on an NVIDIA node whose CUDA version is newer than 12.8.

### Expected Results
The test job runs successfully and the DCGM version is updated, which can be verified in the openpai runtime logs.
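The PR does not show how the script decides that CUDA is "newer than 12.8". A hypothetical Python helper for that comparison is sketched below; the function names are illustrative, and how the CUDA version string is read on the node is left out. Comparing integer tuples rather than floats matters here: as floats, `12.10` would sort below `12.8`.

```python
from typing import Tuple

# Threshold from the PR description: update DCGM when CUDA is newer than 12.8.
CUDA_THRESHOLD: Tuple[int, ...] = (12, 8)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "12.9" into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_dcgm_update(cuda_version: str, threshold: Tuple[int, ...] = CUDA_THRESHOLD) -> bool:
    """Return True only when the CUDA version is strictly newer than the threshold."""
    # Tuples compare element by element, so (12, 10) > (12, 8) holds as intended.
    return parse_version(cuda_version) > threshold
```

With this sketch, `"12.8"` itself does not trigger an update, while `"12.9"`, `"12.10"`, and `"13.0"` do.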
- Merged PR 46: Fix sdk path and submodule checkout
- Merged PR 49: Fix dockerfile build issues of alert-manager
- Merged PR 54: Bug fix of alert-parser and node-issue-classifiler
- Merged PR 57: Add init vm states and info for first launch
- Merged PR 62: alert-parser: handle node availability status changes while the service is down
- Merged PR 61: fix bug and add logging for node recycler
- Merged PR 63: Update validation job config
- Merged PR 59: security fix
…ncrease memory for deployment

fix validation result parse issue and increase memory for deployment
…running jobs

## [NewFeature] Enable VC admin for users so that the VC admin can stop running jobs

### Description:
This feature allows a specific user to be set as a VC admin, so that the user can stop other users' running jobs.

### Test Actions:
1. Create a running job as user 1
2. Log in as user 2 and try to stop user 1's running job; a "cannot stop" message box is expected
3. Use the update user API to set user 2 as the VC admin for the virtual cluster that the running job belongs to
4. Try to stop the running job with user 2's credentials; the job should be stopped successfully.

### Expected Results:
1. Job started successfully
2. The job cannot be stopped
3. The user has been updated successfully and is VC admin for the specific virtual cluster through ['extension']['vcadmins']
4. The job stopped successfully
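The permission check described above can be sketched as follows. This is a hypothetical Python illustration, not rest-server's implementation: only the `['extension']['vcadmins']` path comes from the PR. The `username`, `admin`, and `virtualCluster` fields, the assumption that `vcadmins` holds a list of VC names, and the rule that cluster admins may stop any job are all assumptions.

```python
def can_stop_job(user: dict, job: dict) -> bool:
    """Return True if `user` may stop `job` under the assumed rules."""
    if user.get("admin", False):
        return True  # assumption: cluster admins may stop any job
    if user["username"] == job["username"]:
        return True  # job owners may stop their own jobs
    # From the PR: VC admin rights live under ['extension']['vcadmins'].
    # Assumption: the value is a list of virtual cluster names.
    vcadmins = (user.get("extension") or {}).get("vcadmins", [])
    return job["virtualCluster"] in vcadmins
```

The test actions map onto this directly: user 2 is refused until the job's (hypothetical) virtual cluster is added to user 2's `vcadmins`, after which the stop is allowed.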
Add feature: Dashboard
1. add initial ltp_productivity dashboard, preview version
2. table name convention: [category] - [index] - [title]
…ylon service

## [NewFeature]: "Add Reverse Proxy Client Support in Pylon"

### Description:

This feature adds support for a reverse proxy client in Pylon, allowing external access to the Pai cluster while keeping the internal network secure. The reverse proxy client forwards requests to the Pylon service running on the cluster.

### Test Actions:

1. Build the images of this service with `./build/pai_build.py build -c [config_file] -s pylon`
2. Deploy the pylon service with the reverse proxy client by following [this Readme file](./src/pylon/README.md)
3. Submit a test job to the Pai cluster:

    Using ubuntu image:

    ```sh
    python3 -V
    mkdir -p /app
    wget https://zhogu.github.io/plugins/test/test_server.py -O /app/server.py
    cat /app/server.py
    python3 /app/server.py $PAI_PORT_LIST_taskrole_0_http
    ```
4. Access the internal job server through the reverse proxy client:

    ```sh
    curl https://gateway.openpai.org/auto-test/job-server/<$PAI_HOST_IP_taskrole_0>:<$PAI_PORT_LIST_taskrole_0_http>/
    ```

### Expected Results:

1. Build successfully
2. Deploy successfully
3. Test job should run successfully
4. Accessing the job server through the reverse proxy client should return the expected response from the job server.
1. update MTBF and MTBI tables with per week and cumulative results
2. fix bug in Top 1 failure category retrieval
add dind runtimeplugin in job protocol
disable non-acr image
…rd.CapacityStatus table from kusto cluster azcore

Service to back up the Dashboard.CapacityStatus table from the Kusto cluster azcore
clean up the virtualClusters whose resourcesTotal is empty
…database from local disk to Azure disk

Remove internal storage service and replace PostgreSql database from local disk to Azure disk
support node logs in log manager
- syslog, kern.log, dmesg
- journalctl --category
add job summary data recorder to kusto
…ob for prometheus

add deploying disk and blob of prometheus
## [bugfix] Fix the user's group list when the group list changes

### Description
When group information is updated in rest-server, each user's related information is not updated unless an administrator updates it manually, user by user. This PR fixes that problem. Testing showed that the existing delete-group REST API already removes the group information from each user, but newly added group information is not propagated to each user. To propagate newly added groups, we add a REST API that clears all tokens in the AKS system, and we add token reading from the cache and token verification to the code. After all tokens are removed, users must log in again, and at that point the new group information is written to their accounts.

### Test Actions
1. List all users to check their group list
2. Call the REST API to remove a group from the system that is in the tester's group list
3. List all users to check if the group has been removed from the user information as well as the VCs associated with this group
4. Add new group into the system
5. List all users to check their group list again
6. Call REST API to remove all the tokens in the system
7. Login to the system again
8. List all users to check if the group list has been updated

### Expected Results
1. All users' information, including each user's group list, is listed
2. The API returned status code 200
3. All the user information has been updated. The removed group and its associated VCs are no longer shown in the result.
4. The service restarts successfully
5. All the user information is listed, the same as in step 3.
6. The API returned status code 200
7. The message box "token is invalid" is shown, and after login, the main page can be seen
8. All the user information is listed, and the new group and VCs have been added in the tester's user information.
…iewed as a non-admin...

# Enable a new field in plugin to set whether it can be viewed by a non-admin user

In the `service-configuration.yaml`, add a new field `onlyadmin` to configure whether a plugin can be viewed by a non-admin user.

If `onlyadmin` is set and its value is not 0, the plugin can only be viewed by admin users.

The configuration can be like:

```yaml
webportal:
  plugins:
  - id: plugin1
    title: title 1
    uri: https://..../main-test.js
  - id: admin-plugin
    title: Admin Plugin
    uri: https://.../main.js
    onlyadmin: 1
```
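The visibility rule can be sketched as a small filter over the parsed `plugins` list. This is a hypothetical Python illustration, not the webportal's code: the function names are made up, and it assumes `onlyadmin` is parsed from YAML as an integer (a quoted string such as `"0"` would count as non-zero here).

```python
from typing import Dict, List


def is_admin_only(plugin: Dict) -> bool:
    """Documented rule: `onlyadmin` set to a non-zero value means admin-only."""
    value = plugin.get("onlyadmin")
    return value is not None and value != 0


def visible_plugins(plugins: List[Dict], is_admin: bool) -> List[Dict]:
    """Return the plugins a user may see; admins see every plugin."""
    return [p for p in plugins if is_admin or not is_admin_only(p)]
```

With the example configuration above, a non-admin user would see only `plugin1`, while an admin sees both plugins.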
increase the timeout of job server
- add cert expiration checker in alert-manager
…stop button in chat plugin

1. add history clean button
2. add request stop button
3. add sigma model logo for chatting with sigma model
…tgreSql database from local disk to Azure disk'

Remove internal storage service and replace PostgreSql database from local disk to Azure disk

Reverts !76
Add `available_nodata` status in Kusto SDK.
align url between copilot frontend and backend
…s service

# Node Failure Detection System Implementation

## Overview
This PR implements a comprehensive node failure detection module. The module includes monitoring, detection, and alerting capabilities to identify and respond to node failures automatically. The implementation uses Redis for communication between components and integrates with the existing alert-manager for notifications.

## New Components

### 1. **Monitor Service** (`monitor/`)
**Purpose**: Collects monitoring data from various sources (Prometheus, logs, APIs) based on configurable specifications.

#### Key Files:
- `monitor_service.py`: Main orchestrator for data collection
- `service_runner.py`: Service lifecycle management and Redis request handling
- `scheduler.py`: Manages scheduled pattern-based data collection
- `executor.py`: Executes data collection specifications
- `data_sources.py`: Clients for Prometheus, job logs, node logs, and metadata
- `models.py`: Data structures for collection specifications and results
- `validator.py`: Validates collection specifications

#### Features:
- **Immediate Collection**: On-demand data collection for investigations
- **Scheduled Collection**: Periodic data collection for continuous monitoring
- **Multi-Source Data**: Collects from Prometheus metrics, job logs, node logs, and REST APIs
- **Configurable Specs**: YAML-based specifications for data collection patterns
- **Result Storage**: Stores results to both local files and Redis streams

### 2. **Detector Service** (`detector/`)
**Purpose**: Analyzes monitoring data using pattern-based detection rules and sends alerts to alert-manager.

#### Key Files:
- `detector_service.py`: Main detector orchestrator
- `analysis_executor.py`: Executes pattern analysis and sends alerts
- `monitor_listener.py`: Listens for monitor results and triggers analysis
- `pattern_registry.py`: Manages detection pattern registration and loading
- `redis_client.py`: Redis communication for monitor-detector coordination

#### Features:
- **Pattern-Based Detection**: Configurable Python-based detection patterns
- **Real-Time Analysis**: Analyzes monitoring results as they arrive
- **Alert Integration**: Sends alerts directly to alert-manager with proper routing
- **Action-Based Routing**: Routes alerts to appropriate receivers based on action types

### 3. **Configuration Management**
**Purpose**: Manages configuration for monitoring specifications and detection patterns.

#### Pattern Types:
- **Scheduled Patterns**: Run periodically for continuous monitoring
- **Event-Driven Patterns**: Triggered on-demand for investigations

#### Key Files:
- `configs/monitor_specs/`: YAML specifications for data collection
- `configs/detect_patterns/`: Python detection rules for various node failure scenarios
- `deploy/monitor-specs-configmap.yaml.template`: Kubernetes ConfigMap for monitor specs
- `deploy/detector-patterns-configmap.yaml.template`: Kubernetes ConfigMap for detection patterns

### Configuration
```yaml
node-fa...
```
Add deployment for cluster-local storage service.
Add cluster-local storage service:
* download data from blob by specifying path, blob_dir, blob_token
* delete data by specifying path
* sync data periodically
…de recycler

In start.sh, when the node recycler is not configured, its container is not deployed, but the icm-certs service map still is, which causes a deployment error. This fix skips deploying the icm-certs service map in that case.
hippogr and others added 23 commits November 6, 2025 09:54
…ad of sysnodepool (#104)

Co-authored-by: Rui Gao <ruigao@microsoft.com>
Major Revision
 - fix deployment template parse error when cluster-utilization, abnormal detector and related setting not configured
…sh specific images in alertmanager to GHCR; (#102)

* install python-icm when build alert-manager

* XXX: trigger building

* update

* update

* update

* update

* add a new cicd workflow to build all services

* fix build error in docker 28

* change cicd name

* update

* update

* update

* update

* update

* update

* update image pushing

* update

* update

* update

* update

* update

* update

* update
Add tolerations and priority class for cluster local storage daemons to
make them less likely to be evicted in disk pressure events.
* enable imagelist argument for image build script to enable build single docker image instead of whole service

* add default value for imagelist argument

---------

Co-authored-by: Rui Gao <ruigao@microsoft.com>
…manually or on release. (#120)

* disable building all images during PR
… inference job (#113)

* add job type support; add inference job support

* update

* deploy webportal-dind when webportal changed

* update

* update

* update

* update

* update

* update
* update

* check job protocol for inference job

* support jobType when query jobs

* update

* fix tag filter bug

* update

* update

* update

* update
remove force acr image and add image regex to limit the valid image
* update cilium version to 1.15.17 (with docker image from non-ACR)

* update the docker image from our own ACR with Ubuntu update

* update cilium version to v1.17.5

* enable host legacy porting

* change docker image pull policy to Always

---------

Co-authored-by: Rui Gao <ruigao@microsoft.com>
Fix service build and deployment on arm64 architecture.
…kusto (#117)

## Pull Request Overview

This PR adds PostgreSQL as an alternative storage backend alongside the existing Kusto implementation. It introduces a storage abstraction layer through `ltp-storage-common`, implements a PostgreSQL SDK with schema management, and updates all services to support dual backends via a factory pattern.

- Introduces `ltp-storage-common` package with shared data schemas and a storage factory
- Implements `postgresql-sdk` with full CRUD operations, Alembic migrations, and a Kusto-compatible interface, and adds a schema management service with health checks and migration support
- Updates `kusto-sdk`, `alert-manager`, and `cluster-local-storage` to use the factory pattern for backend selection
* support job Type and inference job parameters

* update

* update

* update

* update

* update

---------

Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
**Description**

Merge bug fixes from v1.4 to dev branch.

**Major Revisions**
* ModelProxy: fix uncommit changes (#122) 
* Fix cicd errors in branch creation and when building all images (#123)
* Bugfix - fix bug of image regex deployment (#125) 
* Bugfix - fix bugs in alert manager test and fix kusto alert query issue (#126)
* Fix openpai runtime build on arm64 (#128)
* Fix fluentd build on arm64 (#129)
* Bugfix - bug fix for baremetal support (#127)
* Doc - Add Release note for v1.4.0 (#130)

---------

Co-authored-by: zhogu <57975490+zhogu@users.noreply.github.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
Fix job exporter compatibility issue on ARM node.
* support assign job name

* fix copilot comments
* fix the problem of unsupported container registry in AKS

* update k8s-host-device-plugin docker file

* add patches for k8s-host-device-plugin

* support multi platform for k8s-host-device-plugin docker file

* add k8s-rdma-shared-dev-plugin docker file to build amd64 / arm version

* add missing tools for k8s-rdma-shared-dev-plugin.k8s.dockerfile

* update rocm device plugin docker file to apply go package update

* fix the path problem in k8s-rocm-device-plugin.k8s.dockerfile

* update kube-scheduler version to 1.33.1

* update NPM packages for alert-handler

* update cluster local storage docker file to fix python packages

* update requirement to fix security issues in copilot-chat

* update pip version for dashboard-data-backup docker file

* update RPM packages for database-controller

* update GO version to 1.24.9 for framework controller

* update GO version to 1.24.9 for hivedscheduler

* update node.js package for job-status-change-notification

* update software version for reverse proxy

* add architecture support for reverseproxy

* update node.js packages for rest server

* update go version for watchdog

* update docker version for webportal-dind

* remove the package update for nvidia device plugin

* Update src/cluster-local-storage/build/cluster-local-storage.common.dockerfile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix the docker pull problem after docker version updated

* fix the vfs storage for docker in docker

---------

Co-authored-by: Rui Gao <ruigao@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add missing module when updating DCGM higher version

* install dcgm with specified version

---------

Co-authored-by: Rui Gao <ruigao@microsoft.com>
* add history_vclist in user info to save the vc list which the user used to belong to

* use history vc list to retrieve user's jobs

* add history_vclist into validation schema

* retrieve the vc list and update user info when creating/updating a user

* add history vc list retrieval when a user logs in/is updated

* clean the code

* add user access checking when retrieving job information including job config and logs

* Update src/rest-server/src/controllers/v2/job.js

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix the try-catch following copilot suggestions

* fix PR comments to remove unused attributes from database

---------

Co-authored-by: Rui Gao <ruigao@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Mount ssh key pairs for cluster local storage.
* Fix numa parsing in job exporter (#143)

Fix numa parsing in job exporter on GB200.

* Fix kubespray deployment on bare metal (#144)

Fix kubespray deployment on bare metal when cffi package is installed by package manager.

* fix the circular dependency (#142)

Co-authored-by: Rui Gao <ruigao@microsoft.com>

* fix more circular dependencies in rest server (#145)

Co-authored-by: Rui Gao <ruigao@microsoft.com>
Co-authored-by: zhogu <57975490+zhogu@users.noreply.github.com>

* Update the workflow trigger (#146)

* change trigger of github workflow

* update

* update

* update

* add release note for v1.5 (#148)

Co-authored-by: Rui Gao <ruigao@microsoft.com>

---------

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
Co-authored-by: Rui Gao <ruigao@microsoft.com>
Co-authored-by: zhogu <57975490+zhogu@users.noreply.github.com>
Bumps [brace-expansion](https://github.com/juliangruber/brace-expansion) from 1.1.12 to 1.1.13.
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](juliangruber/brace-expansion@v1.1.12...v1.1.13)

---
updated-dependencies:
- dependency-name: brace-expansion
  dependency-version: 1.1.13
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and javascript (Pull requests that update javascript code) labels on Mar 31, 2026
hippogr closed this on Apr 2, 2026
dependabot[bot] commented on behalf of GitHub on Apr 2, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot[bot] deleted the dependabot/npm_and_yarn/src/rest-server/brace-expansion-1.1.13 branch on April 2, 2026 at 05:57


7 participants