LST: add LSTGeometry package and associated ESProducer #50679
ariostas wants to merge 4 commits into cms-sw:master from …
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50679/48907
A new Pull Request was created by @ariostas for master. It involves the following packages:
The following packages do not have a category, yet: RecoTracker/LSTGeometry
@Martin-Grunewald, @Moanwar, @cmsbuild, @jfernan2, @mandrenguyen, @mmusich, @srimanob can you please review it and eventually sign? Thanks. cms-bot commands are listed here
test parameters:
@cmsbuild, please test
-1
Failed Tests: UnitTests HLTP2Timing
Failed Unit Tests: I found 1 error in the following unit tests:
---> test test-das-selected-lumis had ERRORS
Comparison Summary:
Max Memory Comparisons exceeding threshold: @cms-sw/core-l2, I found 17 workflow step(s) with memory usage exceeding the error threshold.
Is the ~190 MB increase in memory usage expected?
That seems a bit high, but it's likely. I'll double-check. Either way, it is only temporary: most of it is freed once the maps are constructed.
According to the monitoring, the peak memory usage would increase by ~190 MB, so freeing it afterwards doesn't help much if the job was killed for going over the limit.
test parameters:
@cmsbuild, please test
Maybe one round of profiling tests would be worth it.
I did back when I hadn't tightened the module maps, and the underlying reason was that it was running out of memory. I assume the same thing is still happening. I'm looking into it to see what else could be contributing to the higher VRAM usage.
OK. I thought/misunderstood that it went away on that test machine with the latest updates.
Now you have the full logs link:
-1
Failed Tests: HLTP2Timing
Comparison Summary:
I was looking at this PR's timing log compared to a "reference" run with #50479's log.
Yes.
Not yet, see #50479 (comment).
No, we're using both GPUs. Only one CPU socket (out of two) is used in order to have a 50/50 compute split.
I've been doing some debugging, and I'm puzzled by what I've been finding. I found that to reliably and clearly reproduce the issue, it's better to restrict to a single GPU, use 1 job, and 16 threads/streams. I'm using the … I made a new branch that adds this extra commit: SegmentLinking@a9ab182. The commit just switches back to loading the maps from the binary files instead of using the ES product, but leaves all the setup in place. With this setup, the VRAM usage still increases a lot, even though the ESProducer is CPU-only and the product is not used at all. However, simply commenting out this line resolves the issue. Here is a plot comparing VRAM usage with and without that line.
So it seems that just having the ESProducer run causes VRAM usage to increase, even though it is purely constructed on the host and the product is not being used. I find this very confusing, so I was wondering if you have any suggestions. I should mention that if I dial it back to 1 thread/stream, then everything looks identical in both cases. Also, I have tried to profile it with …
The situation almost smells like (or, that would be the easiest explanation I could quickly think of) some other component consuming an ES data product on the device, which would trigger the production, but in a way that the component does not fail if the data product is missing. Is e.g. …?
Does the behavior of excessive memory usage reproduce on 1 thread/stream? Does the behavior reproduce if processing only a few (down to 1) events? If the answers are "yes", I'd suggest adding

```python
process.add_(cms.Service("Tracer", dumpPathsAndConsumes=cms.untracked.bool(True)))
```

and putting the (large) log somewhere accessible. This service prints every framework transition for every module, and when configured like this, also the ED and ES data product consumption information.
No, for 1 thread/stream everything looks normal.
Yeah, it still happens with only a few events.
Here's the log with the tracer: part1 part2. Nothing seems obviously wrong.
Commenting out the request in produce is not enough: saying you consume the item will cause the framework to prefetch it. So to actually keep the module from being called requires that no module says it consumes it.
Well, if I just comment out the consume, it's back to normal. The point is that somehow the module being called causes VRAM usage to increase, even though it's a CPU module and the product is never used, so it should have no effect on VRAM usage.
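For context on the mechanism described in this exchange: in CMSSW, a module declares its EventSetup dependencies with esConsumes(), and the framework prefetches the declared product (running its ESProducer) before the module runs, whether or not produce() ever reads the token. Below is a minimal sketch of that general pattern, with hypothetical product and record names ("MyMaps", "MyMapsRecord", "MapsUser"); it is not code from this PR.

```cpp
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Utilities/interface/ESGetToken.h"

// "MyMaps" and "MyMapsRecord" stand in for a real ES product and its record.
class MapsUser : public edm::stream::EDProducer<> {
public:
  explicit MapsUser(edm::ParameterSet const&)
      // The declaration alone is enough: the framework prefetches the
      // product, i.e. runs its ESProducer, before produce() is called...
      : mapsToken_(esConsumes<MyMaps, MyMapsRecord>()) {}

  void produce(edm::Event&, edm::EventSetup const& iSetup) override {
    // ...so commenting out this access does not stop the ESProducer from
    // running; only removing the esConsumes() declaration above does.
    // auto const& maps = iSetup.getData(mapsToken_);
  }

private:
  edm::ESGetToken<MyMaps, MyMapsRecord> mapsToken_;
};

DEFINE_FWK_MODULE(MapsUser);
```

This is why the observations above are consistent: removing the getData() call changes nothing, while removing the consume declaration restores normal behavior.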
Right. This behavior is visible in the … So when you …, the …
This analysis does not answer the question of how …
The Tracer log shows only …
If 1 thread/stream shows "good behavior", I'm wondering if the caching allocator could play a role. The allocator is shared, and if some modules concurrently allocate large temporary buffers, those buffers might end up being held by the caching allocator without being used later in the job. On 1 thread these temporary buffers would be allocated and deallocated serially, and the same large buffer could be reused by multiple modules. But this is, of course, pure speculation, and does not explain the role of the existence of …
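The arithmetic behind this hypothesis can be illustrated with a toy model. This sketch is pure bookkeeping: it is not the real cms::alpakatools::CachingAllocator, touches no device memory, and the 12 MB buffer size and names are arbitrary choices for illustration. With one worker, sixteen serial "module calls" reuse a single cached block; with sixteen concurrent workers, each call pins its own block, so the cache ends up holding roughly sixteen times as much.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a caching device allocator: freed blocks are kept in
// the cache for reuse instead of being returned to the device.
class ToyCachingAllocator {
public:
  std::size_t allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& b : blocks_)
      if (b.free && b.bytes >= bytes) {  // reuse a cached block if possible
        b.free = false;
        return b.id;
      }
    blocks_.push_back({nextId_, bytes, false});  // otherwise grow the cache
    held_ += bytes;
    return nextId_++;
  }
  void release(std::size_t id) {  // block returns to the cache; memory stays held
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& b : blocks_)
      if (b.id == id) b.free = true;
  }
  std::size_t heldBytes() const { return held_; }

private:
  struct Block { std::size_t id, bytes; bool free; };
  std::mutex mutex_;
  std::vector<Block> blocks_;
  std::size_t held_ = 0, nextId_ = 0;
};

int main() {
  constexpr std::size_t kBufferBytes = 12 << 20;  // arbitrary 12 MB temporary
  constexpr unsigned kCalls = 16;                 // "module calls" needing a temporary
  for (unsigned nThreads : {1u, 16u}) {
    ToyCachingAllocator alloc;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
      workers.emplace_back([&] {
        for (unsigned c = 0; c < kCalls / nThreads; ++c) {
          // Concurrent workers each hold a buffer at the same time, so
          // the cache cannot serve them all from a single block.
          std::size_t id = alloc.allocate(kBufferBytes);
          std::this_thread::sleep_for(std::chrono::milliseconds(5));
          alloc.release(id);
        }
      });
    for (auto& w : workers) w.join();
    std::printf("%2u threads -> cache holds %zu MB\n", nThreads, alloc.heldBytes() >> 20);
  }
}
```

On real hardware the exact multiplier is timing-dependent, but the pattern matches the observation above: 1 thread/stream looks normal, while 16 threads/streams do not.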
The CachingAllocator hypothesis could be investigated further by comparing the behavior between the 1-thread and many-thread cases (on a few events). The debug prints of the CachingAllocator can be enabled with

```python
if not hasattr(process, "AlpakaServiceCudaAsync"):
    process.load("HeterogeneousCore.AlpakaServices.AlpakaServiceCudaAsync_cfi")
process.AlpakaServiceCudaAsync.verbose = True
```

A crude way to see the functions that lead to actual memory allocations would be

```
cmsTraceFunction "cms::alpakatools::CachingAllocator<alpaka::DevCudaRt, alpaka::QueueCudaRtNonBlocking>::allocateBuffer" cmsRun ...
```

(I'm not 100 % sure I got the CachingAllocator template instantiation right; possibly tracing calls to just …)
I'm currently recompiling everything after adding

```xml
<flags CXXFLAGS="-DALPAKA_DISABLE_CACHING_ALLOCATOR -DALPAKA_DISABLE_ASYNC_ALLOCATOR"/>
```

to all the LST build files. I'll see what happens and try using the debug prints. Thanks!
Could it be something to do with the number of queues (and subsequently some extra allocations coming per queue)?
In addition to the problems already discussed, this branch now has conflicts that must be resolved. @ariostas
This PR adds a new RecoTracker/LSTGeometry package containing the module map computation used by the LST algorithm. Currently, the maps are pre-computed by the code in https://github.com/SegmentLinking/LSTGeometry and stored in https://github.com/cms-data/RecoTracker-LSTCore. This PR allows for the on-the-fly computation of these maps via an ESProducer, ensuring that they stay consistent with the tracker geometry being used. This is the last major task in #46746.
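For reference, this is the general shape of such an ESProducer, as a minimal sketch of the standard CMSSW pattern. The product, record, and helper names below ("LSTModuleMaps", "LSTGeometryRecord", "buildModuleMaps") are placeholders, not the actual code added by this PR.

```cpp
#include <memory>

#include "FWCore/Framework/interface/ESProducer.h"
#include "FWCore/Framework/interface/ModuleFactory.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Utilities/interface/ESGetToken.h"
#include "FWCore/Utilities/interface/ESInputTag.h"
#include "Geometry/Records/interface/TrackerDigiGeometryRecord.h"
#include "Geometry/TrackerGeometryBuilder/interface/TrackerGeometry.h"

// "LSTModuleMaps", "LSTGeometryRecord", and "buildModuleMaps" are
// placeholders for this sketch, not the names used in the PR.
class LSTModuleMapsESProducer : public edm::ESProducer {
public:
  explicit LSTModuleMapsESProducer(edm::ParameterSet const&) {
    // Register produce() and declare the tracker-geometry dependency,
    // so the maps are recomputed whenever the geometry changes.
    auto cc = setWhatProduced(this);
    geometryToken_ = cc.consumesFrom<TrackerGeometry, TrackerDigiGeometryRecord>(edm::ESInputTag{});
  }

  std::unique_ptr<LSTModuleMaps> produce(LSTGeometryRecord const& iRecord) {
    auto const& geometry = iRecord.get(geometryToken_);
    // Build the module maps on the fly from the current geometry,
    // instead of loading the pre-computed binaries from cms-data.
    return std::make_unique<LSTModuleMaps>(buildModuleMaps(geometry));
  }

private:
  edm::ESGetToken<TrackerGeometry, TrackerDigiGeometryRecord> geometryToken_;
};

DEFINE_FWK_EVENTSETUP_MODULE(LSTModuleMapsESProducer);
```

Declaring the geometry dependency through setWhatProduced is what ties the lifetime of the maps to the tracker geometry record, which is the consistency guarantee described above.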
c.c. @slava77