Stop Evaluating LLMs on Vibes

35 points by shayaks 2 years ago

shayaks 2 years ago

Learn more about TruLens here if you're curious: https://medium.com/trulens/evaluate-and-track-your-llm-exper...

Docs and other resources at: https://www.trulens.org/

Developer on the TruLens team here. We just put out a pretty cool new release (0.1.2). Integration with Hugging Face pipelines in LangChain and asynchronous feedback management are just some of the useful updates in this release.

Moving from vibe checks to real evaluations (maybe scored by LLMs themselves) is the next big step we need to take as an industry.
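To make "real evaluation" concrete, here is a rough sketch in plain Python. It is not the TruLens API: `keyword_coverage`, `evaluate`, and `toy_app` are hypothetical names made up for illustration, and the scoring rule (fraction of expected keywords found in the answer) is just a stand-in for whatever feedback function you would actually use, including one backed by an LLM.

```python
from statistics import mean
from typing import Callable


def keyword_coverage(answer: str, expected_keywords: list[str]) -> float:
    """Hypothetical feedback function: fraction of expected keywords present in the answer."""
    if not expected_keywords:
        return 1.0
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def evaluate(app: Callable[[str], str], cases: list[tuple[str, list[str]]]) -> float:
    """Run the app on every test prompt and return the mean feedback score."""
    scores = [keyword_coverage(app(prompt), keywords) for prompt, keywords in cases]
    return mean(scores)


def toy_app(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption: your app maps a prompt to a string).
    return "Paris is the capital of France."


cases = [
    ("What is the capital of France?", ["Paris"]),
    ("Name the capital of France and its river.", ["Paris", "Seine"]),
]

score = evaluate(toy_app, cases)
print(f"mean keyword coverage: {score:.2f}")
```

The point is only that the score comes from a fixed set of cases and a repeatable function, so you can track it across prompt or model changes instead of eyeballing a few outputs.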

vjsingh2793 2 years ago

It's essential to assess language models based on their performance and capabilities rather than subjective measures like "vibes."

mblemanski 2 years ago

Any acceleration in the evaluation process is welcome and much, much needed. Looking forward to checking out TruLens.

shayaks 2 years ago

Thanks! Would love to get any feedback.

shayaks 2 years ago

Stop deploying LLMs to your users based on vibes. Use TruLens to evaluate your LLM apps.

duncanid 2 years ago

You guys are on fire