Popular repositories
- confabulations (Public): Hallucinations (Confabulations) Document-Based Benchmark for RAG
- generalization (Public): Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which ite…