I had a good chat with Jose about agent safety after he shared this article:
https://www.the-independent.com/tech/claude-ai-agent-deletes-startup-anthropic-b2966176.html
It’s a good reminder that we should not rely only on the model “doing the right thing.” If an agent has too much access, one bad decision can cause real damage.
I think this is an opportunity for ASTRA to be safer than a raw Claude Code/Cursor-style workflow by putting stronger controls around the agent.
Some questions worth discussing:
- What should the default permission mode be?
- Should broad autonomous permissions be opt-in?
- Which actions should always need user approval?
- How should we handle destructive commands, production systems, databases, and credentials?
- Can we add deterministic safety checks outside the prompt/model?
- What should we log so users can audit what happened?
A few possible ideas:
- Default to restricted/read-only permissions when possible.
- Require approval for destructive or production-impacting actions.
- Add a guard layer for risky commands.
- Make production credentials and production workspaces harder to touch accidentally.
- Show clearer risk labels for tasks, skills, tools, and connectors.
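To make the "guard layer" and "deterministic checks outside the model" ideas a bit more concrete, here's a rough sketch of a check that runs before any command the agent proposes is executed. It's in Python purely for illustration. The `classify` function, the three `Decision` tiers, and every pattern below are hypothetical and not an existing ASTRA API. A regex list like this is also easy to bypass, so it would sit on top of restricted permissions rather than replace them.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


# Hypothetical, illustrative patterns only; a real guard would need a much fuller list.
BLOCKED = [
    re.compile(r"\brm\s+(-\w+\s+)*/(\s|$)"),  # rm targeting the filesystem root
    re.compile(r"\bmkfs\b"),                  # reformatting a disk
]

NEEDS_APPROVAL = [
    re.compile(r"\brm\b"),                                                 # any other delete
    re.compile(r"\bdrop\s+(table|database|schema)\b|\btruncate\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*--force\b"),                              # force push
    re.compile(r"\bterraform\s+destroy\b"),
    re.compile(r"\bkubectl\s+delete\b"),
    re.compile(r"\bprod(uction)?\b", re.IGNORECASE),                       # anything naming production
]


def classify(command: str) -> Decision:
    """Deterministically classify a proposed command; the model's opinion is never consulted."""
    if any(p.search(command) for p in BLOCKED):
        return Decision.BLOCK
    if any(p.search(command) for p in NEEDS_APPROVAL):
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

With this shape, `classify("ls -la")` would be allowed, `classify("rm -rf /tmp/build")` would pause for user approval, and `classify("rm -rf /")` would be refused outright. The point is that the decision is made by plain code we can test and audit, not by the prompt.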
Would love to hear thoughts and ideas from everyone. @garricko @jdposada @irvins @sboosi