DAOS-18882 vos: avoid heap_curr_allocated underflow#18108
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Priority: 2
Allow-unstable-test: true
Focus validation on PMem version
Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: true
Skip-func-hw-test-medium-vmd: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-medium-verbs-provider-md-on-ssd: true
Skip-func-hw-test-large: false
Skip-func-hw-test-large-md-on-ssd: true
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Oksana Salyk <oksana.salyk@hpe.com>
Signed-off-by: Ryon Jensen <ryon.jensen@hpe.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Priority: 2
Skip-unit-tests:true
Skip-unit-test: true
Skip-NLT: true
Skip-unit-test-memcheck: true
Skip-func-vm: true
Skip-func-test-el9: true
Skip-fault-injection-test: true
Skip-test-el-9.6-rpms: true
Skip-test-leap-15-rpms: true
Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-large: false
Ticket title is '0 size SCM on pool with no containers'
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/1/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/1/testReport/
obj: recalculate curr_allocated on underflow (fix) (#38)
Use compare-and-swap loop in STATS_INC/DEC_persistent to avoid overflow-underflow race.
Validation only in environment with PMem
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Cancel-prev-build: false
Priority: 2
Skip-build-el8-gcc: true
Skip-build-leap15-gcc: true
Skip-build-leap15-icc: true
Skip-unit-tests:true
Skip-unit-test: true
Skip-NLT: true
Skip-unit-test-memcheck: true
Skip-func-vm: true
Skip-func-test-el9: true
Skip-fault-injection-test: true
Skip-test-el-9.6-rpms: true
Skip-test-leap-15-rpms: true
Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: true
Skip-func-hw-test-medium-vmd: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-medium-verbs-provider-md-on-ssd: true
Skip-func-hw-test-medium-ucx-provider: false
Skip-func-hw-test-large: false
Skip-func-hw-test-large-md-on-ssd: true
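For readers unfamiliar with the pattern named in the commit message above, here is a minimal sketch of a compare-and-swap loop that refuses to wrap a counter. It is not the PMDK code: `stat_counter`, `stat_inc` and `stat_dec` are hypothetical names, the counter lives in ordinary memory rather than in persistent memory, and C11 `<stdatomic.h>` stands in for whatever atomic primitives PMDK actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a statistic such as curr_allocated. */
typedef struct {
	_Atomic uint64_t value;
} stat_counter;

/* Add delta; return false and leave the counter untouched on overflow. */
static bool stat_inc(stat_counter *c, uint64_t delta)
{
	uint64_t old = atomic_load(&c->value);
	do {
		if (old > UINT64_MAX - delta)
			return false;
		/* On CAS failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&c->value, &old, old + delta));
	return true;
}

/* Subtract delta; return false and leave the counter untouched on underflow. */
static bool stat_dec(stat_counter *c, uint64_t delta)
{
	uint64_t old = atomic_load(&c->value);
	do {
		if (old < delta)
			return false; /* caller may decide to recalculate */
	} while (!atomic_compare_exchange_weak(&c->value, &old, old - delta));
	return true;
}
```

Because the range check and the update are applied to the same observed value, a concurrent increment or decrement cannot slip in between them, which is the race a plain fetch-and-sub followed by a check would leave open. What the caller does on a `false` return (for example, recalculating the statistic) is outside this sketch.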
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com> Skip-list: test_dfuse_daos_build_wb:SRE-3734 test_dfuse_daos_build_wt:SRE-3734 Cancel-prev-build: false Priority: 2
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com> Skip-list: test_dfuse_daos_build_wb:SRE-3734 test_dfuse_daos_build_wt:SRE-3734 Cancel-prev-build: false Priority: 2
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com> Doc-only: true Cancel-prev-build: false
* Fri Apr 24 2026 Tomasz Gromadzki <tomasz.gromadzki@intel.com> 2.9.100-3
- Update PMDK to version 2.1.3-2
I sort of agree with Jerome that a DAOS version bump is not needed here. We cannot keep bumping the DAOS version for every dependency upgrade; the version will get out of control pretty quickly.
But I won't block the PR for this.
Validation is in progress (so far no critical issue).
I will come back to this issue if we need to make another upgrade to this PR.
Test stage Functional Hardware Medium completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18108/3/testReport/
janekmi
left a comment
You may want to fix the following if you have another validation round. Do not bother otherwise.
Known issue https://daosio.atlassian.net/browse/SRE-3734 with the
Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18108/3/execution/node/590/log
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com> Doc-only: true Cancel-prev-build: false
All regular PR tests have passed: https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/view/change-requests/job/PR-18108/5/pipeline-overview
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Co-authored-by: Ryon Jensen <ryon.jensen@hpe.com>
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Oksana Salyk <oksana.salyk@hpe.com>
Signed-off-by: Ryon Jensen <ryon.jensen@hpe.com>
Co-authored-by: Oksana Salyk <oksana.salyk@hpe.com>
Co-authored-by: Ryon Jensen <ryon.jensen@hpe.com>
Substitutes: #18103
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37, daos-stack/pmdk#38