LongLLMLingua2

llama-index-integrations/postprocessor/llama-index-postprocessor-longllmlingua/examples/longllmlingua2.ipynb


Get data for RAG

python
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'pg_essay.txt'

Build a simple RAG query engine

  • First, we chunk the source document into smaller pieces, and
  • Build an in-memory vector index from those chunks using LlamaIndex's VectorStoreIndex abstraction
python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor

documents = SimpleDirectoryReader(input_files=["pg_essay.txt"]).load_data()
python
index = VectorStoreIndex.from_documents(documents)
python
retriever = index.as_retriever(similarity_top_k=8)
python
query = "What did the author do during his time at Yale?"
python
nodes = retriever.retrieve(query)
python
nodes

Prompt compression using LLMLingua-2

LLMLingua-2's claim to fame is performant prompt compression using a small token-classification model trained via data distillation from GPT-4. This compression also comes with a 3x-6x speedup over the original LLMLingua.

https://aclanthology.org/2024.findings-acl.57/
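To build intuition for the token-classification approach, here is a toy sketch (not the library's implementation): a trained classifier assigns each token a "keep" probability, and tokens below a threshold are dropped. The probabilities below are hard-coded stand-ins for the classifier's output.

```python
def compress_by_token_classification(tokens, keep_probs, threshold=0.5):
    """Keep only tokens whose predicted keep-probability passes the threshold."""
    return [t for t, p in zip(tokens, keep_probs) if p >= threshold]


# Hypothetical per-token probabilities standing in for the classifier.
tokens = ["The", "author", "worked", "on", "spam", "filters", "at", "Viaweb"]
keep_probs = [0.2, 0.9, 0.8, 0.1, 0.95, 0.9, 0.15, 0.85]

compressed = compress_by_token_classification(tokens, keep_probs)
print(" ".join(compressed))  # prints "author worked spam filters Viaweb"
```

The real model makes these keep/drop decisions bidirectionally over the full context, which is what lets it preserve key entities and discard filler.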

python
compressor_llmlingua2 = LongLLMLinguaPostprocessor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    device_map="mps",  # Mac users rejoice!
    use_llmlingua2=True,
)
python
from llama_index.core.schema import QueryBundle

results = compressor_llmlingua2.postprocess_nodes(
    nodes, query_bundle=QueryBundle(query_str=query)
)
python
from IPython.display import display, Markdown

display(Markdown(results[0].text))
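To quantify how much the postprocessor shrank the context, you can compare token counts before and after. The helper below is a rough sketch using a crude whitespace tokenizer; the sample strings are stand-ins for the concatenated retrieved-node text and the compressed `results[0].text`.

```python
def compression_ratio(original_text: str, compressed_text: str) -> float:
    """Ratio of original to compressed length, by whitespace token count."""
    n_orig = len(original_text.split())
    n_comp = len(compressed_text.split())
    return n_orig / max(n_comp, 1)  # guard against empty compressed text


# Stand-in strings; in the notebook you would pass the real node text.
original_text = "What I Worked On . Before college the two main things I worked on"
compressed_text = "Worked On Before college two main things"

print(f"{compression_ratio(original_text, compressed_text):.1f}x")  # prints "2.0x"
```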
python
query_engine1 = index.as_query_engine(
    similarity_top_k=8, node_postprocessors=[compressor_llmlingua2]
)

response = query_engine1.query(query)
python
display(Markdown(str(response)))
python
response.metadata

Test LLMLingua 1

python
compressor_llmlingua1 = LongLLMLinguaPostprocessor(
    device_map="mps"  # Mac users rejoice!
)
python
results = compressor_llmlingua1.postprocess_nodes(
    nodes, query_bundle=QueryBundle(query_str=query)
)
python
results
python
query_engine_llmlingua1 = index.as_query_engine(
    similarity_top_k=8, node_postprocessors=[compressor_llmlingua1]
)

response = query_engine_llmlingua1.query(query)

display(Markdown(str(response)))
python
response.metadata