llama-index-integrations/readers/llama-index-readers-wordpress/README.md
pip install llama-index-readers-wordpress
This loader fetches the text of WordPress pages and blog posts through the WordPress REST API. It uses the BeautifulSoup library to parse the returned HTML and extract the text of each article.
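To illustrate the HTML-to-text step, here is a minimal sketch. It deliberately uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra dependencies; the names `_TextExtractor` and `html_to_text` are hypothetical and are not part of the loader, whose actual extraction logic may differ.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = "<h1>Hello</h1><p>This is a <b>WordPress</b> post.</p>"
print(html_to_text(sample))
```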
To use this loader, pass the base URL of the WordPress installation
(e.g. https://www.mysite.com) and, optionally, a username and an application
password for that user (see the WordPress documentation on application
passwords for details).
from llama_index.readers.wordpress import WordpressReader
loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
)
documents = loader.load_data()
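WordPress application passwords are sent using HTTP Basic authentication. The sketch below shows how a username and application password combine into an Authorization header. The helper name `basic_auth_header` is hypothetical and only illustrates the scheme; it is an assumption that the loader authenticates this way internally.

```python
import base64


def basic_auth_header(username: str, app_password: str) -> dict:
    """Build an HTTP Basic Authorization header from a username and application password."""
    # Basic auth encodes "username:password" as base64.
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("my_username", "my_password")
print(headers["Authorization"].split()[0])
```

Because Basic authentication is only an encoding, not encryption, credentials like these should only be sent to a site served over HTTPS.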
This loader is designed to load data into LlamaIndex.
By default, the loader retrieves both WordPress pages (static content) and
posts (blog entries) from the target site. To change this behavior, set
get_pages=False or get_posts=False when initializing the
WordpressReader object.
To load custom post types besides posts and pages, specify additional_post_types as a comma-separated list (e.g., additional_post_types="custom-pages,custom-posts") when initializing the WordpressReader object.
from llama_index.readers.wordpress import WordpressReader
loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
    additional_post_types="webinars,podcasts",
)
documents = loader.load_data()
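As a rough illustration of how get_pages, get_posts, and additional_post_types combine, the sketch below lists the REST routes that would be queried for a given configuration. It relies on the standard WordPress REST route pattern /wp-json/wp/v2/<type>; the function `planned_routes` is hypothetical, and it is an assumption that the loader builds its URLs in exactly this way. Custom post types are only reachable if the site exposes them through the REST API.

```python
def planned_routes(base_url, get_pages=True, get_posts=True, additional_post_types=""):
    """List the REST routes implied by the given options (illustrative only)."""
    post_types = []
    if get_posts:
        post_types.append("posts")
    if get_pages:
        post_types.append("pages")
    # Split the comma-separated list, ignoring blanks and surrounding spaces.
    post_types.extend(
        name.strip() for name in additional_post_types.split(",") if name.strip()
    )
    root = base_url.rstrip("/")
    return [f"{root}/wp-json/wp/v2/{name}" for name in post_types]


routes = planned_routes(
    "https://www.mysite.com",
    get_pages=False,
    additional_post_types="custom-pages,custom-posts",
)
for route in routes:
    print(route)
```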