Pilot manager handler simplifications #8624

Draft
fstagni wants to merge 5 commits into DIRACGrid:integration from fstagni:pilotManagerHadler_simplifications

fstagni commented Jun 17, 2026

Contributor

First part for #8597

BEGINRELEASENOTES

*WMS
CHANGE: Removed several RPCs and DB calls for PilotManager

ENDRELEASENOTES

fstagni force-pushed the pilotManagerHadler_simplifications branch 2 times, most recently from d51a5b2 to e5aa050 (June 17, 2026 09:12)

aldbr commented Jun 18, 2026

Contributor

ddev: we just need to make sure that HTCondor does not remove the outputs from the spool after they have been requested once.

fstagni force-pushed the pilotManagerHadler_simplifications branch from e5aa050 to 1e804d8 (June 18, 2026 12:46)

iueda commented Jun 22, 2026

Contributor

Following the question at the last Dops+Ddev meeting, I checked the BelleDIRAC code: we don't have any extension of PilotManagerHandler or PilotAgentsDB, but I see some RPC calls made with RPCClient('WorkloadManagement/PilotManager'), such as getGroupedPilotSummary().

fstagni commented Jun 22, 2026

Contributor Author

> Following the question at the last Dops+Ddev meeting, I checked the BelleDIRAC code: we don't have any extension of PilotManagerHandler or PilotAgentsDB, but I see some RPC calls made with RPCClient('WorkloadManagement/PilotManager'), such as getGroupedPilotSummary().

That has been moved to a DB-only call, so there is no need to go through the RPC.

fstagni force-pushed the pilotManagerHadler_simplifications branch 2 times, most recently from 6c696d9 to 7a863b7 (June 22, 2026 15:58)

fstagni commented Jun 29, 2026

Contributor Author

Some information relevant to the removal of the PilotOutput table and its related methods:

Historically, running condor_transfer_data did not affect the job's presence in the queue, so it was possible to invoke it multiple times on the same job and retrieve the same output files each time.
However, this behavior changed recently in the HTCondor 25.x series. A successful condor_transfer_data now causes the job to become eligible to leave the queue shortly afterwards. This change was introduced because many users expected the job to be automatically cleaned up after retrieving the output, rather than having to run condor_rm separately.
So the answer depends on the HTCondor version in use: older versions allow repeated retrievals, while newer versions may remove the job from the queue after a successful transfer, preventing subsequent condor_transfer_data invocations.
Jamie (James Frey) also mentioned that the HTCondor team would be open to adding an option to preserve the old behavior if there is interest in such a use case.

For the moment, this puts a halt on the attempt to simply remove the DIRAC functionality.
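
A minimal sketch of why the version dependence matters, under stated assumptions. It is not DIRAC or HTCondor code: `repeated_transfer_is_safe`, `OutputCache` and the `fetch` callable are hypothetical names, and the exact release in which the behaviour changed is assumed here to be the first 25.x version. The cache stands in for any persistent store that keeps the output after the first retrieval, so that a second request never has to reach the batch system.

```python
from typing import Callable, Dict, Tuple

# Assumption: the new clean-up behaviour applies from major version 25 onwards.
FIRST_MAJOR_WITH_CLEANUP = 25


def repeated_transfer_is_safe(htcondor_version: str) -> bool:
    """Return True if the output of one job can be retrieved more than once."""
    major = int(htcondor_version.split(".")[0])
    return major < FIRST_MAJOR_WITH_CLEANUP


class OutputCache:
    """Hypothetical store that keeps (stdout, stderr) after the first retrieval."""

    def __init__(self, fetch: Callable[[str], Tuple[str, str]]) -> None:
        # fetch is a stand-in for whatever actually retrieves the output
        self._fetch = fetch
        self._store: Dict[str, Tuple[str, str]] = {}
        self.fetch_count = 0

    def get_output(self, pilot_ref: str) -> Tuple[str, str]:
        # Only the first request reaches the batch system; later ones are local
        if pilot_ref not in self._store:
            self.fetch_count += 1
            self._store[pilot_ref] = self._fetch(pilot_ref)
        return self._store[pilot_ref]
```

With a store of this kind, a second request for the same pilot output is served locally, so it does not matter whether the job has already left the queue.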

Development

Successfully merging this pull request may close these issues.

PilotManager refactor
Create a legacy adaptor for PilotManager

3 participants