llama-index-integrations/readers/llama-index-readers-legacy-office/README.md
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/data_connectors/lagecy_office_reader.ipynb" target="_parent">Open in Colab</a>
The Legacy Office Reader allows loading data from legacy Office documents (like Word 97 .doc files) using Apache Tika. It runs the Tika server locally to avoid remote server calls.
You can install the Legacy Office Reader via pip:
pip install llama-index-readers-legacy-office
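Because Apache Tika is a Java application, it may be worth checking the local environment before creating the reader. The following is a minimal sketch using only the standard library; `preflight_problems` is a hypothetical helper written for this example and is not part of the package.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def preflight_problems(tika_server_jar_path: Optional[str] = None) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Apache Tika runs on the JVM, so a Java runtime must be on PATH.
    if shutil.which("java") is None:
        problems.append("No 'java' executable found on PATH")
    # If a JAR path is supplied, check that it points at an existing file.
    if tika_server_jar_path is not None and not Path(tika_server_jar_path).is_file():
        problems.append(f"Tika server JAR not found: {tika_server_jar_path}")
    return problems
```

Calling `preflight_problems("path/to/tika-server.jar")` and printing each returned string gives a quick diagnosis; this check only looks at the filesystem and PATH and does not start or contact a Tika server.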
from llama_index.readers.legacy_office import LegacyOfficeReader

# Initialize LegacyOfficeReader
reader = LegacyOfficeReader(
    tika_server_jar_path="path/to/tika-server.jar",  # Optional: Path to Tika server JAR
)

# Load data from a legacy Office document
documents = reader.load_data(
    file="path/to/document.doc",  # Path to the legacy Office document
)
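When several legacy files need to be loaded one at a time, it can help to collect their paths first. The sketch below uses only `pathlib`; `find_legacy_docs` and its `suffixes` parameter are hypothetical names introduced for illustration, and matching extensions case-insensitively is a choice made in this example.

```python
from pathlib import Path
from typing import Iterable, List


def find_legacy_docs(root: str, suffixes: Iterable[str] = (".doc",)) -> List[Path]:
    """Recursively collect files under root whose extension matches, ignoring case."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Each returned path could then be handed to `reader.load_data(file=...)` in a loop, in the same way as the single-file call above.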
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.legacy_office import LegacyOfficeReader

reader = SimpleDirectoryReader(
    input_dir="path/to/directory/",
    file_extractor={".doc": LegacyOfficeReader()},
)

documents = reader.load_data()
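The `file_extractor` argument maps a file extension to the reader that should handle it. As a conceptual illustration only, and not LlamaIndex's actual implementation, the idea of dispatching on the extension can be sketched with plain callables; `dispatch_by_extension` and its parameters are hypothetical, and lowercasing the suffix is an assumption of this sketch.

```python
from pathlib import Path
from typing import Callable, Dict, List


def dispatch_by_extension(
    paths: List[str],
    extractors: Dict[str, Callable[[Path], str]],
    default: Callable[[Path], str],
) -> Dict[str, str]:
    """Map each path to the output of the extractor registered for its suffix."""
    results = {}
    for raw in paths:
        path = Path(raw)
        # Look up the handler by lowercased suffix; fall back to the default.
        handler = extractors.get(path.suffix.lower(), default)
        results[raw] = handler(path)
    return results
```

In this picture, files ending in `.doc` go to the legacy Office handler while every other file falls through to the default one.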
Apache Tika's server listens on port 9998 by default.
This reader is built on top of: