diff --git a/README.md b/README.md
index 422cb02..51c65d8 100644
--- a/README.md
+++ b/README.md
@@ -127,6 +127,7 @@ Evaluation is how you know if your AI actually works (and not hallucinating). Th
 
 ### Open Source Platforms
 - ![](https://img.shields.io/github/stars/Agenta-AI/agenta?style=social&label=github.com) [**Agenta**](https://github.com/Agenta-AI/agenta) - End-to-end LLM developer platform for prompt engineering, evaluation, and deployment.
+- ![](https://img.shields.io/github/stars/luoyuctl/agenttrace?style=social&label=github.com) [**agenttrace**](https://github.com/luoyuctl/agenttrace) - Local TUI for evaluating AI coding agent sessions with cost, failures, latency, anomalies, health gates, and diffs.
 - ![](https://img.shields.io/github/stars/Arize-ai/phoenix?style=social&label=github.com) [**Arize Phoenix**](https://github.com/Arize-ai/phoenix) - OpenTelemetry-native observability and evaluation toolkit for RAG, LLMs, and agents.
 - ![](https://img.shields.io/github/stars/ucbepic/docetl?style=social&label=github.com) [**DocETL**](https://github.com/ucbepic/docetl) - ETL system for complex document processing with LLMs and built-in quality checks.
 - ![](https://img.shields.io/github/stars/Giskard-AI/giskard?style=social&label=github.com) [**Giskard**](https://github.com/Giskard-AI/giskard) - Testing framework for ML models with vulnerability scanning and LLM-specific detectors.