Seed node is stale #18634

@georgeee

Preliminary Checks

Description

Running a load testing experiment (72+ hours) triggered a previously unobserved condition in the seed-1 and plain-2 nodes.

Both nodes froze at some point: the mina process kept running but stopped responding. Because the process was still alive, the automatic restart was not triggered. No logs were written. The behavior looks like a deadlock.
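Since a frozen process does not exit, a restart policy based on process exit never fires. As a rough illustration only (not part of the current setup), an external liveness probe could run a status command with a timeout and treat a hang as a failure. The function names, the timeout, and the retry count below are hypothetical; the sketch also assumes the chosen status command hangs or fails while the daemon is frozen, which matches the "Status" field of this report but has not been verified beyond that.

```python
import subprocess
from typing import Sequence


def probe_is_responsive(cmd: Sequence[str], timeout_s: float) -> bool:
    """Return True if `cmd` exits with status 0 within `timeout_s` seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The probe did not return in time: treat the daemon as frozen.
        return False
    except FileNotFoundError:
        # The probe binary is missing: report failure instead of crashing.
        return False
    return result.returncode == 0


def should_restart(cmd: Sequence[str], timeout_s: float, attempts: int) -> bool:
    """Recommend a restart only if every one of `attempts` probes fails."""
    return not any(probe_is_responsive(cmd, timeout_s) for _ in range(attempts))
```

A supervisor could call, for example, `should_restart(["mina", "client", "status"], timeout_s=30.0, attempts=3)` on a timer and kill the process when it returns `True`. Requiring several consecutive failures is meant to avoid restarting a node that is only briefly slow under load.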

Steps to Reproduce

It is unclear what the minimal reproduction steps are, but running a 5-day experiment series with the following configuration triggered the issue:

[
  {
    "infra_rev": "georgeee/mesa",
    "release_tag": "3.3.0-martyall-martyall-inc-rxs-per-zkapp-develop-9f82ebf-bullseye-devnet",
    "description": "Mesa: smoke testing latest pre-RC"
  },
  {
    "base_tps": 0.02,
    "stress_tps": 0.03,
    "zkapp_ratio": 0.5,
    "rounds": 1,
    "round_duration_min": 30,
    "pause_min": 0,
    "max_cost": false,
    "zkapp_soft_limit": 12
  },
  {
    "max_cost_mixed_tps_ratio": 0.5,
    "base_tps": 0.05,
    "stress_tps": 0.06,
    "zkapp_ratio": 0.5,
    "rounds": 2,
    "round_duration_min": 55,
    "pause_min": 5,
    "fees": {
      "min_zkapp": 4000000000,
      "max_zkapp": 5000000000
    },
    "zkapp_soft_limit": 12
  },
  {
    "base_tps": 0.65,
    "stress_tps": 0.65,
    "zkapp_ratio": 0.1,
    "max_cost": true,
    "rounds": 1,
    "round_duration_min": 55,
    "pause_min": 5,
    "min_stop_ratio": 0.1,
    "max_stop_ratio": 0.3,
    "stop_clean_ratio": 0.5,
    "stops_per_round": 4,
    "fees": {
      "min_zkapp": 4000000000,
      "max_zkapp": 5000000000
    },
    "zkapp_soft_limit": 12
  }
]
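To relate this configuration to the length of the run, the sketch below estimates how long one pass through the series takes. It rests on an assumption about the experiment runner that this report does not confirm: each experiment runs `rounds` rounds back to back, and each round lasts `round_duration_min` followed by `pause_min`. The helper name is hypothetical, and only the timing fields of the configuration are reproduced.

```python
import json

# Timing fields copied from the configuration above; other fields are omitted.
CONFIG = json.loads("""
[
  {"infra_rev": "georgeee/mesa"},
  {"rounds": 1, "round_duration_min": 30, "pause_min": 0},
  {"rounds": 2, "round_duration_min": 55, "pause_min": 5},
  {"rounds": 1, "round_duration_min": 55, "pause_min": 5}
]
""")


def series_duration_min(config: list) -> int:
    """Sum rounds * (round duration + pause) over all experiment entries.

    Assumption: entries without a "rounds" key (such as the first,
    release-describing entry) are metadata and contribute no time.
    """
    total = 0
    for entry in config:
        if "rounds" not in entry:
            continue
        per_round = entry["round_duration_min"] + entry.get("pause_min", 0)
        total += entry["rounds"] * per_round
    return total


print(series_duration_min(CONFIG))  # 210
```

Under that reading, one pass takes 210 minutes (3.5 hours), so a multi-day run would repeat the series many times before the freeze was observed.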

Expected Result

No deadlock behavior is expected.

Actual Result

Nodes are in a deadlock.

After seed-1 was killed with kill -9, it successfully restarted.

Daemon version

9f82ebf

Tag: 3.3.0-martyall-martyall-inc-rxs-per-zkapp-develop-9f82ebf-bullseye-devnet

Runtime config: https://github.com/o1-labs/gitops-infrastructure/blob/ee7aa533951657e609e215185fbb9781375ade03/platform/hetzner-rivendell-1/applications/mina-standard-itn/genesis-config.json

How frequently do you see this issue?

Frequently

What is the impact of this issue on your ability to run a node?

Blocker

Status

Cannot be obtained because the node is deadlocked.

Additional information
