- Remote
Popular repositories
-
medical-ai-failure-atlas (Public)
Clinician-led synthetic medical AI safety evaluation resources: Failure Atlas, SourceCheckup, Turkish medical language risk, and outside objection routes.
Python 1
-
tr-ai-card-radar (Public)
Audit Hugging Face model and dataset cards for Turkish AI resources and write small, reproducible metadata reports. Clinician-led, open-source. No model ranking, no legal or clinical claims.
Python
-
lighteval (Public, forked from huggingface/lighteval)
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Python
-
lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
A framework for few-shot evaluation of language models.
Python
-
inspect_ai (Public, forked from UKGovernmentBEIS/inspect_ai)
Inspect: A framework for large language model evaluations
Python
-
trust-safety-evals (Public, forked from The-AI-Alliance/trust-safety-evals)
The AI Alliance project to define a reference stack for AI model and system evaluation, with evaluations, benchmarks, and leaderboards.
Makefile

