Preliminary Checks
Description
Running a load-testing experiment (72+ hours) triggered a previously unobserved condition on the seed-1 and plain-2 nodes.
At some point both nodes froze: the mina process kept running but stopped responding. Because the process is still running, automatic restart is not triggered. No logs are written. The behavior looks like a deadlock.
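Since the only liveness signal described here is whether the process exists, a hang like this goes unnoticed. As a hypothetical illustration (not part of the existing infrastructure), a watchdog could instead treat a log file that has not been written to for some time as a sign of a hang. The function name, log path, and silence threshold below are assumptions chosen for the sketch.

```python
import os
import time
from typing import Optional


def log_is_stale(log_path: str, max_silence_s: float, now: Optional[float] = None) -> bool:
    """Return True if log_path has not been modified for more than max_silence_s seconds.

    Hypothetical helper: a process that is alive but no longer writing logs
    (as observed on seed-1 and plain-2) would be reported as stale here,
    whereas a plain "is the process running?" check would not catch it.
    """
    if now is None:
        now = time.time()
    try:
        last_write = os.path.getmtime(log_path)
    except FileNotFoundError:
        # A missing log file is treated as stale as well.
        return True
    return (now - last_write) > max_silence_s


if __name__ == "__main__":
    # Hypothetical log path and a 10-minute silence threshold.
    print(log_is_stale("/var/log/mina/mina.log", max_silence_s=600))
```

A supervisor could call this periodically and restart the node (for example with kill -9, which worked for seed-1) when it returns True; the right threshold depends on how chatty the daemon's logging normally is.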
Steps to Reproduce
It is not clear what the simplest reproduction steps are, but running a 5-day experiment series with the following configuration triggered the issue:
[
{
"infra_rev": "georgeee/mesa",
"release_tag": "3.3.0-martyall-martyall-inc-rxs-per-zkapp-develop-9f82ebf-bullseye-devnet",
"description": "Mesa: smoke testing latest pre-RC"
},
{
"base_tps": 0.02,
"stress_tps": 0.03,
"zkapp_ratio": 0.5,
"rounds": 1,
"round_duration_min": 30,
"pause_min": 0,
"max_cost": false,
"zkapp_soft_limit": 12
},
{
"max_cost_mixed_tps_ratio": 0.5,
"base_tps": 0.05,
"stress_tps": 0.06,
"zkapp_ratio": 0.5,
"rounds": 2,
"round_duration_min": 55,
"pause_min": 5,
"fees": {
"min_zkapp": 4000000000,
"max_zkapp": 5000000000
},
"zkapp_soft_limit": 12
},
{
"base_tps": 0.65,
"stress_tps": 0.65,
"zkapp_ratio": 0.1,
"max_cost": true,
"rounds": 1,
"round_duration_min": 55,
"pause_min": 5,
"min_stop_ratio": 0.1,
"max_stop_ratio": 0.3,
"stop_clean_ratio": 0.5,
"stops_per_round": 4,
"fees": {
"min_zkapp": 4000000000,
"max_zkapp": 5000000000
},
"zkapp_soft_limit": 12
}
]
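To relate this configuration to the 5-day run, the sketch below estimates the wall-clock length of one pass over the series. It assumes that each step runs for rounds × (round_duration_min + pause_min) minutes and that the header entry (infra_rev, release_tag) contributes no time; the actual scheduler semantics may differ. Only the timing fields are copied from the configuration above.

```python
from typing import Any, Dict, List


def experiment_minutes(steps: List[Dict[str, Any]]) -> int:
    """Estimate wall-clock minutes for one pass over an experiment series.

    Assumption: a step with "rounds" lasts
    rounds * (round_duration_min + pause_min) minutes; entries without
    "rounds" (such as the infra/release header) contribute nothing.
    """
    total = 0
    for step in steps:
        rounds = step.get("rounds")
        if rounds is None:
            continue
        total += rounds * (step["round_duration_min"] + step.get("pause_min", 0))
    return total


# Timing fields from the configuration in this issue.
SERIES = [
    {"infra_rev": "georgeee/mesa"},
    {"rounds": 1, "round_duration_min": 30, "pause_min": 0},
    {"rounds": 2, "round_duration_min": 55, "pause_min": 5},
    {"rounds": 1, "round_duration_min": 55, "pause_min": 5},
]

if __name__ == "__main__":
    print(experiment_minutes(SERIES))  # 210
```

Under these assumptions one pass takes about 3.5 hours, so a 5-day series would consist of many repeated passes rather than a single run.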
Expected Result
The nodes should keep running and responding without deadlocking.
Actual Result
Both nodes appear to be deadlocked.
After seed-1 was killed with kill -9, it restarted successfully.
Daemon version
9f82ebf
Tag: 3.3.0-martyall-martyall-inc-rxs-per-zkapp-develop-9f82ebf-bullseye-devnet
Runtime config: https://github.com/o1-labs/gitops-infrastructure/blob/ee7aa533951657e609e215185fbb9781375ade03/platform/hetzner-rivendell-1/applications/mina-standard-itn/genesis-config.json
How frequently do you see this issue?
Frequently
What is the impact of this issue on your ability to run a node?
Blocker
Status
Cannot be obtained because the nodes are deadlocked.
Additional information