LlamaIndex MarkItDown Reader Integration

MarkItDown is a Python utility from Microsoft that converts various file formats to Markdown.

llama-index-readers-markitdown is a LlamaIndex reader integration that uses MarkItDown to extract text from the following file formats:

  • .txt files and text-based files without an extension
  • .csv, .xml and .json files
  • HTML files (.html)
  • Presentations (.pptx)
  • Word documents (.docx)
  • PDF documents (.pdf)
  • ZIP files (.zip)

You can install it via:

```bash
pip install llama-index-readers-markitdown
```

And you can use it in your scripts as follows:

```python
from llama_index.readers.markitdown import MarkItDownReader

reader = MarkItDownReader()
documents = reader.load_data("presentation.pptx")
```
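
When working with a mixed folder of files, it can help to filter paths against the supported formats listed above before handing them to the reader. The following is a minimal sketch, not part of the integration itself: `SUPPORTED_SUFFIXES`, `filter_supported` and `load_supported` are hypothetical helper names. It assumes that `load_data` returns a list of documents, as in the usage above, and that files without an extension are text-based.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions taken from the supported-formats list in this README.
SUPPORTED_SUFFIXES = {
    ".txt", ".csv", ".xml", ".json", ".html", ".pptx", ".docx", ".pdf", ".zip",
}


def filter_supported(paths: Iterable[str]) -> List[Path]:
    """Keep only paths whose extension appears in the supported-formats list.

    Files without an extension are kept as well, on the assumption that they
    are text-based; this helper does not inspect file contents.
    """
    kept = []
    for raw in paths:
        path = Path(raw)
        suffix = path.suffix.lower()  # compare case-insensitively
        if suffix == "" or suffix in SUPPORTED_SUFFIXES:
            kept.append(path)
    return kept


def load_supported(paths: Iterable[str]) -> list:
    """Load every supported file with MarkItDownReader, one file at a time."""
    # Imported lazily so filter_supported works without the package installed.
    from llama_index.readers.markitdown import MarkItDownReader

    reader = MarkItDownReader()
    documents = []
    for path in filter_supported(paths):
        # Same call shape as the usage example: one path string per call.
        documents.extend(reader.load_data(str(path)))
    return documents
```

Calling `load_supported(["presentation.pptx", "photo.png"])` would, under these assumptions, skip `photo.png` and pass only `presentation.pptx` to the reader.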