docs/examples/llm/cleanlab.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/cleanlab.ipynb" target="_parent"></a>
Cleanlab’s Trustworthy Language Model (TLM) scores the trustworthiness of every LLM response in real time, using state-of-the-art uncertainty estimates for LLMs. Trust scoring is crucial for applications where unchecked hallucinations and other LLM errors are show-stoppers.
This page demonstrates how to use TLM in place of your own LLM, both to generate responses and to score their trustworthiness. That’s not the only way to use TLM, though.
To add trust scoring to your existing, unmodified RAG application, see this Trustworthy RAG tutorial instead.
Beyond RAG applications, you can score the trustworthiness of responses already generated by any LLM via TLM.get_trustworthiness_score().
Learn more in the Cleanlab documentation.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-cleanlab
%pip install llama-index
from llama_index.llms.cleanlab import CleanlabTLM
# Set your API key in the environment or pass it to the LLM directly.
# Get a free API key from: https://cleanlab.ai/
# import os
# os.environ["CLEANLAB_API_KEY"] = "your api key"
llm = CleanlabTLM(api_key="your_api_key")
resp = llm.complete("Who is Paul Graham?")
print(resp)
You also get the trustworthiness score of the above response in additional_kwargs. TLM automatically computes this score for every <prompt, response> pair.
print(resp.additional_kwargs)
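Once the score is available in additional_kwargs, applications typically compare it against a threshold before trusting the answer. The sketch below assumes the score is stored under a "trustworthiness_score" key (as in this integration's output); the helper name and the 0.8 threshold are illustrative choices, not part of the Cleanlab API.

```python
def is_trustworthy(additional_kwargs: dict, threshold: float = 0.8) -> bool:
    """Return True if the response's trustworthiness score clears `threshold`."""
    score = additional_kwargs.get("trustworthiness_score")
    if score is None:
        raise KeyError("no trustworthiness score attached to this response")
    return score >= threshold


# Example with mocked response payloads (no API call needed):
print(is_trustworthy({"trustworthiness_score": 0.93}))  # True  (score >= 0.8)
print(is_trustworthy({"trustworthiness_score": 0.42}))  # False (score < 0.8)
```

In a real application you would call `is_trustworthy(resp.additional_kwargs)` on the response object returned by `llm.complete(...)`.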
A high score indicates that the LLM's response can be trusted. Let's look at another example.
resp = llm.complete(
"What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
)
print(resp)
print(resp.additional_kwargs)
A low score indicates that the LLM's response shouldn't be trusted.
From these two straightforward examples, we can observe that LLM responses with high scores are direct, accurate, and appropriately detailed.
On the other hand, responses with low trustworthiness scores convey unhelpful or factually inaccurate answers, sometimes referred to as hallucinations.
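This observation suggests a simple guardrail: only surface the LLM's answer when its trust score clears a threshold, and fall back to a safe message otherwise. Below is a minimal sketch of that pattern; the 0.7 threshold and the fallback wording are illustrative assumptions, not Cleanlab recommendations.

```python
FALLBACK = "I'm not confident in my answer; please verify with another source."


def gated_answer(response_text: str, score: float, threshold: float = 0.7) -> str:
    """Return the LLM's answer only when its trustworthiness score clears the threshold."""
    if score >= threshold:
        return response_text
    return FALLBACK


# High-score response passes through unchanged:
print(gated_answer("Paul Graham is a programmer, essayist, and co-founder of Y Combinator.", 0.95))
# Low-score response is replaced by the fallback:
print(gated_answer("The first commercial truck engine had 6 horsepower.", 0.3))
```

The same pattern extends naturally to other fallbacks, such as escalating to a human reviewer or retrieving additional context and retrying.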
Cleanlab’s TLM does not natively support streaming the response together with its trustworthiness score. However, an alternative approach is available to achieve low-latency, streaming responses in your application.
Detailed information about the approach, along with example code, is available here.
TLM supports several configuration options, which are passed as a dictionary to the CleanlabTLM object during initialization.
More details about these options can be found in Cleanlab's API documentation, and a few use cases for them are explored in this notebook.
Let's consider an example where the application requires the gpt-4 model and at most 128 output tokens.
options = {
"model": "gpt-4",
"max_tokens": 128,
}
llm = CleanlabTLM(api_key="your_api_key", options=options)
resp = llm.complete("Who is Paul Graham?")
print(resp)
To understand why TLM estimated low trustworthiness for the earlier horsepower-related question, specify the "explanation" flag in the options when initializing the TLM.
options = {
"log": ["explanation"],
}
llm = CleanlabTLM(api_key="your_api_key", options=options)
resp = llm.complete(
"What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
)
print(resp)
print(resp.additional_kwargs["explanation"])
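Score and explanation together are useful for triage: low-scoring responses can be flagged for human review along with TLM's reason. The helper below is a hypothetical sketch (function and key names are illustrative; it assumes the score lives under "trustworthiness_score" and the explanation under "explanation" in additional_kwargs, as shown above).

```python
def triage(text: str, kwargs: dict, threshold: float = 0.7) -> dict:
    """Package a response, its trust score, and (when low-scoring) TLM's explanation."""
    score = kwargs.get("trustworthiness_score")
    record = {"text": text, "score": score, "needs_review": False, "reason": None}
    if score is not None and score < threshold:
        record["needs_review"] = True
        # Fall back to a generic reason if no explanation was logged.
        record["reason"] = kwargs.get("explanation", "score below threshold")
    return record


# Mocked low-trust response (no API call needed):
low = triage(
    "The first commercial truck engine had 6 horsepower.",
    {"trustworthiness_score": 0.2, "explanation": "Sources disagree on this figure."},
)
print(low["needs_review"], "-", low["reason"])
```

In practice you would call `triage(str(resp), resp.additional_kwargs)` and route flagged records to whatever review queue or logging your application uses.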