llama-index-integrations/readers/llama-index-readers-microsoft-sharepoint/README.md
pip install llama-index-readers-microsoft-sharepoint
The loader loads files from a folder in a SharePoint site, or loads SharePoint Site Pages.
It can also traverse sub-folders recursively.
Note: If you use Sites.Selected, you must grant your app access to the specific SharePoint site(s) via the SharePoint admin center. See Grant access to a specific site for details.
For more information, refer to the Microsoft Graph API documentation.
To use this loader, the client_id, client_secret, and tenant_id of the app registered in the Microsoft Azure Portal are required.
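One way to avoid hard-coding these three values is to read them from environment variables. The following is a minimal sketch; the variable names (SHAREPOINT_CLIENT_ID and so on) and the helper credentials_from_env are assumptions made for illustration and are not defined by the loader.

```python
import os

# Hypothetical environment variable names; pick whatever suits your deployment.
ENV_VARS = {
    "client_id": "SHAREPOINT_CLIENT_ID",
    "client_secret": "SHAREPOINT_CLIENT_SECRET",
    "tenant_id": "SHAREPOINT_TENANT_ID",
}


def credentials_from_env(environ=None):
    """Return the three credentials as keyword arguments, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [var for var in ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: environ[var] for key, var in ENV_VARS.items()}
```

The returned dictionary can be unpacked into the constructor, for example SharePointReader(**credentials_from_env()).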
This loader loads the files present in a specific folder in SharePoint.
For example, if the files are in the Test folder under the root directory of the SharePoint site, pass Test as sharepoint_folder_path.
from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
)

documents = loader.load_data(
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
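To make the effect of recursive concrete, here is a conceptual sketch of recursive versus non-recursive listing over an in-memory folder tree. It does not call SharePoint and is not the loader's implementation; the dictionary layout and the list_files helper are assumptions chosen only for illustration.

```python
from typing import Dict, List, Optional

# A folder is a dict mapping names to either a nested folder (dict) or None (a file).
# This is an in-memory stand-in, not a SharePoint call.
Folder = Dict[str, Optional[dict]]


def list_files(folder: Folder, prefix: str = "", recursive: bool = True) -> List[str]:
    """Return file paths in `folder`, descending into sub-folders only if `recursive`."""
    paths: List[str] = []
    for name, child in sorted(folder.items()):
        path = f"{prefix}/{name}" if prefix else name
        if child is None:
            paths.append(path)  # a file directly inside this folder
        elif recursive:
            paths.extend(list_files(child, path, recursive))
    return paths


test_folder = {"a.docx": None, "reports": {"q1.pdf": None, "old": {"q0.pdf": None}}}
print(list_files(test_folder, "Test", recursive=False))
print(list_files(test_folder, "Test", recursive=True))
```

With recursive=False only Test/a.docx is listed; with recursive=True the files under Test/reports and Test/reports/old are included as well.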
If you have only been granted access to a specific site (using Sites.Selected), you can use the site host name and relative URL instead of the site name:
from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_host_name="contoso.sharepoint.com",
    sharepoint_relative_url="sites/YourSiteName",
)

documents = loader.load_data(
    sharepoint_folder_path="<Folder Path>",
    recursive=True,
)
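If you start from the full site URL, the host name and relative URL can be derived with the standard library. This is a minimal sketch; split_site_url is a hypothetical helper, and it returns the relative URL without a leading slash to match the sites/YourSiteName form used above.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_site_url(site_url: str) -> Tuple[str, str]:
    """Split a full site URL into (host name, relative URL without a leading slash)."""
    parsed = urlparse(site_url)
    if not parsed.netloc:
        raise ValueError(f"Not an absolute URL: {site_url!r}")
    return parsed.netloc, parsed.path.strip("/")


host_name, relative_url = split_site_url(
    "https://contoso.sharepoint.com/sites/YourSiteName"
)
print(host_name)     # contoso.sharepoint.com
print(relative_url)  # sites/YourSiteName
```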
You can also load SharePoint Site Pages as documents by setting sharepoint_type to PAGE:
from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)

loader = SharePointReader(
    client_id="<Client ID of the app>",
    client_secret="<Client Secret of the app>",
    tenant_id="<Tenant ID of the Microsoft Azure Directory>",
    sharepoint_site_name="<Sharepoint Site Name>",
    sharepoint_host_name="<your-tenant>.sharepoint.com",
    sharepoint_relative_url="/sites/<YourSite>",
    sharepoint_type=SharePointType.PAGE,
)

# Load all pages
documents = loader.load_data()

# Or load a specific page by ID
loader.sharepoint_file_id = "<page_id>"
documents = loader.load_data()
You can filter which pages to process using the process_document_callback:
from llama_index.readers.microsoft_sharepoint import (
    SharePointReader,
    SharePointType,
)


def page_filter(page_name: str) -> bool:
    # Only process pages that don't start with "Draft"
    return not page_name.startswith("Draft")


loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    sharepoint_site_name="<Site Name>",
    sharepoint_type=SharePointType.PAGE,
    process_document_callback=page_filter,
)
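A filter callback is easy to check on its own before wiring it into the reader. The sketch below applies the same page_filter to a few made-up page names; select_pages is a hypothetical helper that only mimics "keep the page when the callback returns True" and is not how the reader itself is implemented.

```python
from typing import Callable, List


def page_filter(page_name: str) -> bool:
    # Only process pages that don't start with "Draft"
    return not page_name.startswith("Draft")


def select_pages(page_names: List[str], callback: Callable[[str], bool]) -> List[str]:
    """Keep only the names for which the callback returns True."""
    return [name for name in page_names if callback(name)]


# Made-up page names for illustration.
names = ["Home", "Draft - Roadmap", "Team Handbook", "Drafting Guidelines"]
print(select_pages(names, page_filter))
```

Note that a plain prefix check also drops "Drafting Guidelines"; a stricter test such as page_name.startswith("Draft - ") may be preferable if that is not intended.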
Control error behavior with fail_on_error:
from llama_index.readers.microsoft_sharepoint import SharePointReader

loader = SharePointReader(
    client_id="<Client ID>",
    client_secret="<Client Secret>",
    tenant_id="<Tenant ID>",
    fail_on_error=False,  # Log errors and continue instead of raising
)
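The difference between the two settings can be sketched as a generic "raise, or log and continue" loop. This is an illustration of the pattern only, under the assumption that the reader behaves as the comment above describes; process_all and fake_process are hypothetical names and do not come from the library.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def process_all(
    items: Iterable[str],
    process: Callable[[str], str],
    fail_on_error: bool,
) -> List[str]:
    """Process each item; on failure either re-raise or log and move on."""
    results: List[str] = []
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            if fail_on_error:
                raise  # stop at the first failure
            logger.warning("Skipping %s: %s", item, exc)
    return results


def fake_process(name: str) -> str:
    # Stand-in for real work; fails for one specific input.
    if name == "broken":
        raise ValueError("cannot parse")
    return name.upper()


print(process_all(["a", "broken", "b"], fake_process, fail_on_error=False))
```

With fail_on_error=False the failing item is logged and skipped, so the remaining items are still returned; with fail_on_error=True the ValueError propagates to the caller.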
The SharePoint reader emits events during page processing for monitoring:
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.readers.microsoft_sharepoint import (
    TotalPagesToProcessEvent,
    PageDataFetchCompletedEvent,
    PageFailedEvent,
)


class SharePointEventHandler(BaseEventHandler):
    def handle(self, event, **kwargs):
        # Report progress based on the type of event received
        if isinstance(event, TotalPagesToProcessEvent):
            print(f"Processing {event.total_pages} pages...")
        elif isinstance(event, PageDataFetchCompletedEvent):
            print(f"Completed: {event.page_id}")
        elif isinstance(event, PageFailedEvent):
            print(f"Failed: {event.page_id} - {event.error}")


# Attach the handler to the reader's dispatcher
dispatcher = get_dispatcher("llama_index.readers.microsoft_sharepoint.base")
dispatcher.add_event_handler(SharePointEventHandler())
Available events:
TotalPagesToProcessEvent: Total number of pages to process
PageDataFetchStartedEvent: Page processing started
PageDataFetchCompletedEvent: Page successfully processed
PageSkippedEvent: Page skipped (via callback)
PageFailedEvent: Page processing failed
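These events lend themselves to a simple end-of-run summary. The sketch below counts events by class name; EventTally is a hypothetical helper, and the three event classes are empty stand-ins defined here only so the snippet runs without the package installed. In real code you would import the event classes from the reader package and call record from inside your handler's handle method.

```python
from collections import Counter


# Empty stand-ins for the reader's event classes (assumption for illustration).
class PageDataFetchCompletedEvent:
    pass


class PageSkippedEvent:
    pass


class PageFailedEvent:
    pass


class EventTally:
    """Count events by class name (hypothetical helper, not part of the reader)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, event: object) -> None:
        self.counts[type(event).__name__] += 1

    def summary(self) -> str:
        # Counter returns 0 for event types that were never recorded.
        return (
            f"completed={self.counts['PageDataFetchCompletedEvent']} "
            f"skipped={self.counts['PageSkippedEvent']} "
            f"failed={self.counts['PageFailedEvent']}"
        )


tally = EventTally()
for event in [
    PageDataFetchCompletedEvent(),
    PageSkippedEvent(),
    PageDataFetchCompletedEvent(),
    PageFailedEvent(),
]:
    tally.record(event)
print(tally.summary())  # completed=2 skipped=1 failed=1
```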