Draft
53 commits
a11c07a
Adding initial scalable setup
yl-nuwan Jun 5, 2026
b17ebc1
fixed the yaml
yl-nuwan Jun 5, 2026
b23c318
removed the conditional exit added for testing
yl-nuwan Jun 5, 2026
1c7ba00
fixed the typo
yl-nuwan Jun 5, 2026
9b228cc
feat(aws-queue-mode): separate ECS execution and task roles with Dyna…
yl-nuwan Jun 5, 2026
b6f6d39
feat(cleanup): add clean and rebuild scripts for scalable containeriz…
yl-nuwan Jun 5, 2026
8fb5329
feat(ecs): add command override variables for ECS tasks and update ma…
yl-nuwan Jun 5, 2026
cc5ef72
feat(aws-containerized): add queue-aware REST handler for ECS deploym…
yl-nuwan Jun 5, 2026
dfcc8fe
fix(ecs-queue-handler): correct message attribute parameter in queue …
yl-nuwan Jun 5, 2026
2c981bb
feat(aws-containerized): add structured logging for agent request lif…
yl-nuwan Jun 5, 2026
ed49d79
testing with the rest async mode
yl-nuwan Jun 8, 2026
8efdd03
Fixed the mode type
yl-nuwan Jun 8, 2026
1afc5e6
fix(aws-containerized): correct path parameter substitution in async …
yl-nuwan Jun 8, 2026
36b6794
chore(aws-containerized): rename sqs.tf to queue.tf and remove redund…
yl-nuwan Jun 8, 2026
8faeac4
docs(aws-containerized): add queue mode documentation and update exam…
yl-nuwan Jun 8, 2026
d3858f3
style(aws-containerized): format code and improve string consistency
yl-nuwan Jun 8, 2026
79f1230
formated the documant
yl-nuwan Jun 8, 2026
b6e8e5e
Enabled a basic containerized test to check the backward compatibility
yl-nuwan Jun 8, 2026
189275e
fix(aws-deployment): add session ID validation for REST_ASYNC polling
yl-nuwan Jun 8, 2026
b53802a
fix(aws-containerized): improve error handling and validation in ECS …
yl-nuwan Jun 8, 2026
e3eb26e
fix(aws): correct HTTP status code and error messages for missing res…
yl-nuwan Jun 8, 2026
2d4b53b
feat(aws-containerized): add SQS-based autoscaling for Agent Runner
yl-nuwan Jun 8, 2026
256fc97
removed duplicate definition
yl-nuwan Jun 8, 2026
2d62c50
load testing
yl-nuwan Jun 9, 2026
cfab4f3
disabled the cache tempararly
yl-nuwan Jun 9, 2026
ecaad3c
Merge branch 'develop' into CNT-scalability
yl-nuwan Jun 9, 2026
9bd2f92
Merge branch 'develop' into CNT-scalability
yl-nuwan Jun 9, 2026
517dce4
fix: update agentkernel dependency version to 0.5.1 in deploy scripts…
yl-nuwan Jun 9, 2026
66cef74
fix: correct indentation in deploy script for config file copy
yl-nuwan Jun 9, 2026
81436c3
feat: add agent runner autoscaling documentation and configuration de…
yl-nuwan Jun 14, 2026
c61869e
refactor: clean up comments and enabled integration test configuration
yl-nuwan Jun 15, 2026
fd84cc3
re enabled the cache
yl-nuwan Jun 15, 2026
85dc2ce
linted
yl-nuwan Jun 15, 2026
fae5c49
enabled the agent code after load testing
yl-nuwan Jun 18, 2026
45ebaf2
fix: update autoscaling conditions to use local.enable_autoscaling in…
yl-nuwan Jun 18, 2026
5362d28
Potential fix for pull request finding
yl-nuwan Jun 18, 2026
0db8c89
Potential fix for pull request finding
yl-nuwan Jun 18, 2026
234b3fe
resolve pr review suggestions
yl-nuwan Jun 20, 2026
8729ad7
refactor(containerized): restructure infrastructure into modular comp…
yl-nuwan Jun 20, 2026
1ba0207
reduced test scope
yl-nuwan Jun 21, 2026
b475aaf
refactor: update containerized module configurations for consistency …
yl-nuwan Jun 21, 2026
b8faff2
feat: add create_dynamodb_memory_table variable and update IAM polici…
yl-nuwan Jun 21, 2026
c7a11d3
refactor: update package_path references in containerized module conf…
yl-nuwan Jun 21, 2026
71b4243
switch to sync mode for testing for the application
yl-nuwan Jun 21, 2026
b8cfb32
reduced test scope
yl-nuwan Jun 21, 2026
fbc2bee
containerized code refactor
lakindu-yl Jun 26, 2026
8dda440
containerized mode documentation updates
lakindu-yl Jun 26, 2026
b2c5c34
refactor sqs_consumer
lakindu-yl Jun 28, 2026
66ee9fa
function naming update
lakindu-yl Jun 28, 2026
5e4e09d
update groups thread execution
lakindu-yl Jun 28, 2026
4965e5c
updates, and refactors
lakindu-yl Jun 28, 2026
3fbe6ce
example import fixes
lakindu-yl Jun 28, 2026
8b4e1c7
example import fix 2
lakindu-yl Jun 28, 2026
93 changes: 91 additions & 2 deletions .agents/skills/ak-dev-architecture/SKILL.md
@@ -4,7 +4,9 @@ description: >
Agent Kernel architectural principles, core abstractions, and design patterns.
Use this skill when you need to understand the codebase structure, how components
interact, or before making changes to core functionality. Covers Session, Agent,
Runner, Module, Runtime, AgentService, AKConfig, tools, hooks, multimodal, and the adapter pattern.
Runner, Module, Runtime, AgentService, AKConfig, tools, hooks, multimodal, the adapter pattern,
and the AWS ECS containerized deployment classes (ECSIOHandler, ECSOutputConsumer,
ECSAgentRunner, ECSSQSConsumer, ThreadRunner).
license: Apache-2.0
metadata:
author: yaalalabs
@@ -205,7 +207,17 @@ ak-py/src/agentkernel/
│ ├── a2a/ # Agent-to-Agent server
│ └── mcp/ # MCP server
├── deployment/ # Cloud deployment adapters
│ ├── aws/ # Lambda handler
│ ├── aws/
│ │ ├── serverless/ # Lambda handlers: Lambda, ResponseHandler, ServerlessAgentRunner, etc.
│ │ ├── containerized/ # ECS Fargate handlers
│ │ │ ├── core/
│ │ │ │ ├── sqs_consumer.py # ECSSQSConsumer — ABC: SQS poll loop
│ │ │ │ └── thread_runner.py # ThreadRunner — run N callables as peer threads
│ │ │ ├── akagentrunner.py # ECSAgentRunner — polls Input Queue, runs agent
│ │ │ ├── akoutputconsumer.py # ECSOutputConsumer — polls Output Queue, writes to DB/WS
│ │ │ ├── ecs_io_handler.py # ECSIOHandler — entrypoint: wires both threads
│ │ │ └── ecs_queue_handler.py # ECSQueueRequestHandler — FastAPI routes
│ │ └── core/ # Shared: SQSHandler, WebSocketHandler, ResponseStore
│ └── azure/ # Azure Functions handler
├── integration/ # Messaging integrations
│ ├── slack/
@@ -241,6 +253,83 @@ ak-py/src/agentkernel/
└── session_cache.py # SessionNonVolatileCacheAttachmentStore (legacy)
```

## AWS ECS Containerized Deployment

The containerized deployment runs on ECS Fargate and uses a two-container architecture for scalable queue-based processing.

### Class Hierarchy

| Class | File | Role |
|---|---|---|
| `ECSSQSConsumer` | `containerized/core/sqs_consumer.py` | Abstract base: SQS long-poll loop, retry/DLQ logic |
| `ThreadRunner` | `containerized/core/thread_runner.py` | Runs N callables as peer threads via `ThreadPoolExecutor` |
| `ECSOutputConsumer` | `containerized/akoutputconsumer.py` | Extends `ECSSQSConsumer` — polls Output Queue, writes to DynamoDB or broadcasts via WebSocket |
| `ECSAgentRunner` | `containerized/akagentrunner.py` | Extends `ECSSQSConsumer` — polls Input Queue, runs the agent, sends to Output Queue |
| `ECSIOHandler` | `containerized/ecs_io_handler.py` | Entrypoint for the IO container: wires REST API + output consumer as peer threads |
| `ECSQueueRequestHandler` | `containerized/ecs_queue_handler.py` | FastAPI routes: `POST /api/v1/chat` enqueues; `GET /api/v1/chat/{id}` polls |

### Two-Container Layout

```
Container 1 — ECSIOHandler
  Thread 1 (ThreadRunner): RESTAPI.run(handlers=[ECSQueueRequestHandler()])
      FastAPI/uvicorn; handles POST /api/v1/chat and GET /api/v1/chat/{id}
  Thread 2 (ThreadRunner): ECSOutputConsumer.run()
      polls Output Queue, writes to DynamoDB / broadcasts via WebSocket

Container 2 — ECSAgentRunner
  Main thread: ECSAgentRunner.run() (inherited from ECSSQSConsumer)
      polls Input Queue, runs agent, sends result to Output Queue
```
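
The layout above amounts to two queues connecting three workers. The following is a minimal in-process sketch of that flow, assuming `queue.Queue` as a stand-in for the SQS Input and Output Queues and a plain dict as a stand-in for the DynamoDB response store. The function names (`enqueue_chat`, `agent_runner_step`, `output_consumer_step`, `poll_chat`) and the message fields are hypothetical and chosen for illustration; they are not Agent Kernel APIs.

```python
import queue
import uuid

# Stand-ins (assumptions): queue.Queue for the SQS queues, a dict for the response store.
input_queue = queue.Queue()
output_queue = queue.Queue()
response_store = {}


def enqueue_chat(prompt):
    """Container 1, REST thread: accept a request and put it on the Input Queue."""
    request_id = str(uuid.uuid4())
    input_queue.put({"id": request_id, "prompt": prompt})
    return request_id


def agent_runner_step(agent):
    """Container 2: take one message, run the agent, send the result onward."""
    msg = input_queue.get()
    output_queue.put({"id": msg["id"], "result": agent(msg["prompt"])})


def output_consumer_step():
    """Container 1, consumer thread: persist one result so a later GET can find it."""
    msg = output_queue.get()
    response_store[msg["id"]] = msg["result"]


def poll_chat(request_id):
    """Container 1, REST thread: return the result, or None while still pending."""
    return response_store.get(request_id)


request_id = enqueue_chat("hello")
pending = poll_chat(request_id)               # nothing has been stored yet
agent_runner_step(lambda text: text.upper())  # toy agent for the sketch
output_consumer_step()
finished = poll_chat(request_id)
```

Because the REST thread only ever touches the Input Queue and the response store, the agent container can be scaled independently of the IO container, which is the point of splitting the work across two queues.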

### ECSSQSConsumer Contract

- **`_get_queue_url(cls) → str`** *(abstract)*: return the SQS queue URL to poll.
- **`process_message(cls, record)`** *(abstract)*: handle one message; called on every successful receive.
- **`on_permanent_failure(cls, record)`** *(abstract)*: called when `ApproximateReceiveCount > max_receive_count`; **must catch its own exceptions**: if it raises, the message is not deleted, so SQS delivers it again.
- **`delete_message(cls, client, msg)`** *(public)*: subclasses may call this directly when manual deletion is needed.
- **`run(cls)`**: blocking poll loop — the container entry-point.
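
To make the contract concrete, here is a hedged sketch of how a subclass might satisfy it. `SketchConsumer` is a hypothetical stand-in written for this example, not the real `ECSSQSConsumer`; its `dispatch` helper, the `max_receive_count = 3` value, the placeholder queue URL, and the record shape (modelled on a boto3 `receive_message` entry with `Body` and `Attributes`) are all assumptions.

```python
from abc import ABC, abstractmethod


class SketchConsumer(ABC):
    """Hypothetical stand-in that mirrors the contract; not the real ECSSQSConsumer."""

    max_receive_count = 3  # assumed value for this sketch

    @classmethod
    @abstractmethod
    def _get_queue_url(cls) -> str: ...

    @classmethod
    @abstractmethod
    def process_message(cls, record: dict) -> None: ...

    @classmethod
    @abstractmethod
    def on_permanent_failure(cls, record: dict) -> None: ...

    @classmethod
    def dispatch(cls, record: dict) -> str:
        """Route one received record according to its receive count."""
        count = int(record["Attributes"]["ApproximateReceiveCount"])
        if count > cls.max_receive_count:
            cls.on_permanent_failure(record)
            return "failed-permanently"
        cls.process_message(record)
        return "processed"


class EchoConsumer(SketchConsumer):
    handled = []
    dead = []

    @classmethod
    def _get_queue_url(cls) -> str:
        return "https://sqs.example.invalid/000000000000/input"  # placeholder URL

    @classmethod
    def process_message(cls, record: dict) -> None:
        cls.handled.append(record["Body"])

    @classmethod
    def on_permanent_failure(cls, record: dict) -> None:
        try:
            cls.dead.append(record["Body"])
        except Exception:
            # The contract asks this hook to swallow its own errors.
            pass


first = EchoConsumer.dispatch({"Body": "a", "Attributes": {"ApproximateReceiveCount": "1"}})
last = EchoConsumer.dispatch({"Body": "b", "Attributes": {"ApproximateReceiveCount": "4"}})
```

The `try`/`except` inside `on_permanent_failure` reflects the requirement that the hook must not raise; the real base class may enforce or wrap this differently.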

### ThreadRunner Contract

`ThreadRunner.run(*targets, thread_names=..., exit_on_failure=True)` submits all callables to a `ThreadPoolExecutor` and waits for `FIRST_COMPLETED`:

- Thread **raises** → logs exception; if `exit_on_failure=True`, calls `os._exit(1)` inside the `with` block so the container restarts cleanly via ECS (the `_exit` is placed before `executor.shutdown(wait=True)` to avoid blocking on the other infinite-loop thread).
- Thread **returns normally** (no exception) → logs unexpected exit; `os._exit` is **not** called.
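
The two outcomes above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `ThreadRunner` source: `run_peers` and its `exit_fn` parameter are hypothetical, with `exit_fn` added only so the sketch can be exercised without terminating the interpreter (it defaults to `os._exit`).

```python
import logging
import os
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

log = logging.getLogger("thread_runner_sketch")


def run_peers(*targets, thread_names=None, exit_on_failure=True, exit_fn=os._exit):
    """Run callables as peer threads and react to whichever finishes first."""
    names = list(thread_names or [f"peer-{i}" for i in range(len(targets))])
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {pool.submit(target): name for target, name in zip(targets, names)}
        done, _ = wait(list(futures), return_when=FIRST_COMPLETED)
        finished = next(iter(done))
        name = futures[finished]
        error = finished.exception()
        if error is None:
            log.warning("thread %s returned unexpectedly", name)
            outcome = ("returned", name)
        else:
            log.error("thread %s raised %r", name, error)
            if exit_on_failure:
                # Called while still inside the with block: leaving the block
                # waits for the remaining threads, which normally never return.
                exit_fn(1)
            outcome = ("raised", name)
    return outcome


def failing():
    raise RuntimeError("boom")


def short_lived():
    time.sleep(0.2)


exit_codes = []
raised = run_peers(failing, short_lived,
                   thread_names=["api", "consumer"], exit_fn=exit_codes.append)
returned = run_peers(lambda: None, short_lived,
                     thread_names=["api", "consumer"], exit_fn=exit_codes.append)
```

In the sketch the peer threads are short-lived so the `with` block can exit; with real infinite poll loops, only the hard exit inside the block prevents the runner from hanging on the surviving thread.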

### Entry Point Pattern

```python
# Container 1 — app_rest_service.py
from agentkernel.deployment.aws.containerized import ECSIOHandler

runner = ECSIOHandler.run

if __name__ == "__main__":
    runner()

# Container 2 — app_agent_runner.py
from agentkernel.deployment.aws import ECSAgentRunner
from agentkernel.openai import OpenAIModule

OpenAIModule([...])

if __name__ == "__main__":
    ECSAgentRunner.run()
```

### Public Exports

```python
# agentkernel.deployment.aws
from agentkernel.deployment.aws import (
    ECSAgentRunner,     # Container 2 entry-point
    ECSIOHandler,       # Container 1 entry-point
    ECSOutputConsumer,  # Subclass ECSSQSConsumer for custom output processing
)
from agentkernel.deployment.aws.containerized.core import ECSSQSConsumer, ThreadRunner
```

## Execution Flow

```
119 changes: 61 additions & 58 deletions .github/integration-test-config.yaml
@@ -37,71 +37,74 @@ weekly:
# then each project is deployed and tested
tests:
#AWS Containerized
# - type: aws-containerized
# path: examples/aws-containerized/adk
# deploy_dir: deploy
# - type: aws-containerized
# path: examples/aws-containerized/openai-dynamodb
# deploy_dir: deploy
# - type: aws-containerized
# path: examples/aws-containerized/crewai
# deploy_dir: deploy
# - type: aws-containerized
# path: examples/aws-containerized/mcp/multi
# deploy_dir: deploy
- type: aws-containerized
path: examples/aws-containerized/adk
deploy_dir: deploy
- type: aws-containerized
path: examples/aws-containerized/openai-dynamodb
deploy_dir: deploy
- type: aws-containerized
path: examples/aws-containerized/crewai
deploy_dir: deploy
- type: aws-containerized
path: examples/aws-containerized/mcp/multi
path: examples/aws-containerized/openai-dynamodb-scalable
deploy_dir: deploy

# AWS Serverless
- type: aws-serverless
path: examples/aws-serverless/adk
deploy_dir: deploy
- type: aws-serverless
path: examples/aws-serverless/crewai
deploy_dir: deploy
- type: aws-serverless
path: examples/aws-serverless/langgraph
deploy_dir: deploy
- type: aws-serverless
path: examples/aws-serverless/openai-auth
deploy_dir: deploy
- type: aws-serverless
path: examples/aws-serverless/scalable-openai
deploy_dir: deploy
# # AWS Serverless
# - type: aws-serverless
# path: examples/aws-serverless/adk
# deploy_dir: deploy
# - type: aws-serverless
# path: examples/aws-serverless/crewai
# deploy_dir: deploy
# - type: aws-serverless
# path: examples/aws-serverless/langgraph
# deploy_dir: deploy
# - type: aws-serverless
# path: examples/aws-serverless/openai-auth
# deploy_dir: deploy
# - type: aws-serverless
# path: examples/aws-serverless/scalable-openai
# deploy_dir: deploy

# Memory options
- type: aws-serverless
path: examples/memory/redis
- type: aws-serverless
path: examples/memory/dynamodb
- type: azure-serverless
path: examples/memory/cosmos
deploy_dir: deploy
# # # Memory options
# - type: aws-serverless
# path: examples/memory/redis
# - type: aws-serverless
# path: examples/memory/dynamodb
# - type: azure-serverless
# path: examples/memory/cosmos
# deploy_dir: deploy

# Azure Serverless
- type: azure-serverless
path: examples/azure-serverless/openai
deploy_dir: deploy
# # Azure Serverless
# - type: azure-serverless
# path: examples/azure-serverless/openai
# deploy_dir: deploy

# Azure Containerized
- type: azure-containerized
path: examples/azure-containerized/openai-cosmos
deploy_dir: deploy
# # Azure Containerized
# - type: azure-containerized
# path: examples/azure-containerized/openai-cosmos
# deploy_dir: deploy

# GCP Serverless
- type: gcp-serverless
path: examples/gcp-serverless/openai-firestore
deploy_dir: deploy
- type: gcp-serverless
path: examples/gcp-serverless/openai
deploy_dir: deploy
- type: gcp-serverless
path: examples/gcp-serverless/openai-auth
deploy_dir: deploy
# # GCP Serverless
# - type: gcp-serverless
# path: examples/gcp-serverless/openai-firestore
# deploy_dir: deploy
# - type: gcp-serverless
# path: examples/gcp-serverless/openai
# deploy_dir: deploy
# - type: gcp-serverless
# path: examples/gcp-serverless/openai-auth
# deploy_dir: deploy

# # GCP Containerized
- type: gcp-containerized
path: examples/gcp-containerized/openai
deploy_dir: deploy
- type: gcp-containerized
path: examples/gcp-containerized/openai-auth
deploy_dir: deploy
# - type: gcp-containerized
# path: examples/gcp-containerized/openai
# deploy_dir: deploy
# - type: gcp-containerized
# path: examples/gcp-containerized/openai-auth
# deploy_dir: deploy
