This is great. I'm also building an LLM evaluation framework with all these benchmarks integrated in one place so anyone can go benchmark these new models on their local setup in under 10 lines of code. Hope someone finds this useful: https://github.com/confident-ai/deepeval
The package acts as a provider for 10+ different evaluation metrics, which can run locally on your machine using models from Hugging Face, or in the cloud if you want more functionality.
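For a feel of the API, here's roughly what a minimal evaluation looks like. This is a sketch based on the current docs, so class names like LLMTestCase and AnswerRelevancyMetric (and the threshold parameter) may differ slightly across versions:

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# One test case: the prompt you sent and the output your model produced.
test_case = LLMTestCase(
    input="What are your shipping times?",
    actual_output="We ship within 3-5 business days.",
)

# Score how relevant the answer is to the input; fail the test below 0.7.
assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```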
If you want to evaluate a fine-tuned model, we have integrations with LM Harness and Stanford HELM coming out. If you want to evaluate a RAG application, we have 7+ metrics available for that.
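As an example of the RAG side, a faithfulness check against retrieved context looks something like this (again a sketch; FaithfulnessMetric and the retrieval_context field are the names used in the current docs, and the example data is made up):

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import FaithfulnessMetric

# A RAG test case also carries the chunks your retriever returned,
# so metrics can check the answer against them.
test_case = LLMTestCase(
    input="When was the company founded?",
    actual_output="The company was founded in 2015.",
    retrieval_context=["Acme Corp was founded in 2015 in Berlin."],
)

# Faithfulness: does the answer stick to what the retrieved context says?
evaluate([test_case], [FaithfulnessMetric(threshold=0.8)])
```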
You can also create your own custom metrics using our interface!
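A custom metric is basically a subclass with a measure() method. Here's a sketch under the same caveat about version differences; BaseMetric is the name in the current docs, and the conciseness scoring below is just a hypothetical stand-in for real logic:

```python
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase


class ConcisenessMetric(BaseMetric):
    """Hypothetical metric that rewards shorter answers."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def measure(self, test_case: LLMTestCase) -> float:
        # Toy scoring: 1.0 for very short answers, trending to 0 for long ones.
        self.score = max(0.0, 1.0 - len(test_case.actual_output) / 500)
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, test_case: LLMTestCase) -> float:
        # Async variant; newer versions expect this, so just reuse measure().
        return self.measure(test_case)

    def is_successful(self) -> bool:
        return self.success

    @property
    def __name__(self):
        return "Conciseness"
```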
There's also a native Ragas implementation, but one that supports all models.