DAOS-18882 vos: avoid heap_curr_allocated underflow #18108

Merged
gnailzenh merged 9 commits into master from grom72/DAOS-18882-2nd on Apr 28, 2026

Conversation

@grom72
Contributor

@grom72 grom72 commented Apr 25, 2026

Substitutes: #18103
Update PMDK to incorporate the following fixes:

- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37, daos-stack/pmdk#38

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

grom72 and others added 4 commits April 24, 2026 12:15
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Allow-unstable-test: true

Focus validation on PMem version

Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: true
Skip-func-hw-test-medium-vmd: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-medium-verbs-provider-md-on-ssd: true
Skip-func-hw-test-large: false
Skip-func-hw-test-large-md-on-ssd: true
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Oksana Salyk <oksana.salyk@hpe.com>
Signed-off-by: Ryon Jensen <ryon.jensen@hpe.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Skip-unit-tests:true
Skip-unit-test: true
Skip-NLT: true
Skip-unit-test-memcheck: true

Skip-func-vm: true
Skip-func-test-el9: true
Skip-fault-injection-test: true
Skip-test-el-9.6-rpms: true
Skip-test-leap-15-rpms: true

Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-large: false
@github-actions

github-actions Bot commented Apr 25, 2026

Ticket title is '0 size SCM on pool with no containers'
Status is 'Awaiting backport'
Labels: 'request_for_2.8'
https://daosio.atlassian.net/browse/DAOS-18882

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/1/testReport/

@daosbuild3
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/1/testReport/

grom72 added 2 commits April 27, 2026 13:23
obj: recalculate curr_allocated on underflow (fix) (#38)
Use compare-and-swap loop in STATS_INC/DEC_persistent to avoid overflow-underflow race.

Validation only in environment with PMem

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Cancel-prev-build: false
Priority: 2

Skip-build-el8-gcc: true
Skip-build-leap15-gcc: true
Skip-build-leap15-icc: true

Skip-unit-tests:true
Skip-unit-test: true
Skip-NLT: true
Skip-unit-test-memcheck: true

Skip-func-vm: true
Skip-func-test-el9: true
Skip-fault-injection-test: true
Skip-test-el-9.6-rpms: true
Skip-test-leap-15-rpms: true

Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: true
Skip-func-hw-test-medium-vmd: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-medium-verbs-provider-md-on-ssd: true
Skip-func-hw-test-medium-ucx-provider: false
Skip-func-hw-test-large: false
Skip-func-hw-test-large-md-on-ssd: true
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Skip-list: test_dfuse_daos_build_wb:SRE-3734 test_dfuse_daos_build_wt:SRE-3734
Cancel-prev-build: false
Priority: 2
@grom72
Contributor Author

grom72 commented Apr 27, 2026

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Skip-list: test_dfuse_daos_build_wb:SRE-3734 test_dfuse_daos_build_wt:SRE-3734
Cancel-prev-build: false
Priority: 2
@grom72 grom72 marked this pull request as ready for review April 27, 2026 13:09
@grom72 grom72 requested a review from a team as a code owner April 27, 2026 13:10
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Doc-only: true
Cancel-prev-build: false
Comment thread utils/rpms/daos.changelog Outdated
Comment on lines +2 to +4
* Fri Apr 24 2026 Tomasz Gromadzki <tomasz.gromadzki@intel.com> 2.9.100-3
- Update PMDK to version 2.1.3-2

Contributor

I sort of agree with Jerome on this: a DAOS version bump is not needed here. We cannot keep bumping the DAOS version for every dependency upgrade; the version will get out of control pretty quickly.

But I won't block the PR for this.

Contributor Author

Validation is in progress (no critical issues so far).
I will come back to this issue if we need to make another update to this PR.

@daosbuild3
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/3/testReport/

mchaarawi previously approved these changes Apr 27, 2026
janekmi previously approved these changes Apr 27, 2026
Contributor

@janekmi janekmi left a comment

You may want to fix the following if there is another validation round. Otherwise, do not bother.

Comment thread utils/rpms/daos.changelog Outdated
@phender
Contributor

phender commented Apr 27, 2026

Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/3/testReport/

Known issue https://daosio.atlassian.net/browse/SRE-3734 with the 1-./dfuse/daos_build.py:DaosBuild.test_dfuse_daos_build_wb test

@daosbuild3
Collaborator

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18108/3/execution/node/590/log

@phender
Contributor

phender commented Apr 27, 2026

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18108/3/execution/node/590/log

The 06-./daos_test/suite.py:DaosCoreTest.test_daos_single_rdg_tx failure and the corresponding FTEST_daos_test.DaosCoreTest-DAOS_Single_RDG_TX.DTX[19-23] cmocka failures look like https://daosio.atlassian.net/browse/DAOS-18888.

The 24-./daos_test/suite.py:DaosCoreTest.test_daos_rebuild_ec failure and the corresponding FTEST_daos_test.DaosCoreTest-DAOS_Rebuild_EC.REBUILD[8-48] cmocka failures have been reported in https://daosio.atlassian.net/browse/DAOS-18897.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Doc-only: true
Cancel-prev-build: false
@grom72 grom72 dismissed stale reviews from janekmi and mchaarawi via 4a417c8 April 28, 2026 04:46
@grom72 grom72 requested a review from janekmi April 28, 2026 04:48
@grom72 grom72 requested a review from mchaarawi April 28, 2026 04:48
@grom72
Contributor Author

grom72 commented Apr 28, 2026

@gnailzenh gnailzenh merged commit abf7984 into master Apr 28, 2026
47 of 48 checks passed
@gnailzenh gnailzenh deleted the grom72/DAOS-18882-2nd branch April 28, 2026 09:44
mchaarawi pushed a commit that referenced this pull request Apr 28, 2026
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37


Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Co-authored-by: Ryon Jensen <ryon.jensen@hpe.com>
mchaarawi pushed a commit that referenced this pull request Apr 28, 2026
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Oksana Salyk <oksana.salyk@hpe.com>
Signed-off-by: Ryon Jensen <ryon.jensen@hpe.com>
Co-authored-by: Oksana Salyk <oksana.salyk@hpe.com>
Co-authored-by: Ryon Jensen <ryon.jensen@hpe.com>

8 participants