docs/schemas.mdx
A schema defines the exact data format you expect websites within a particular group to use. Every row of data Reworkd processes goes through strict schema validation to guarantee that your data is consistent with your schema.
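To make the idea of strict validation concrete, here is a minimal Python sketch. It is not Reworkd's implementation or API: the `PRODUCT_SCHEMA` dictionary, the `validate_row` helper, and the choice to represent a schema as a mapping from field names to Python types are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical schema representation: field name -> expected Python type.
# This is an illustrative assumption, not Reworkd's actual schema format.
PRODUCT_SCHEMA: dict[str, type] = {"name": str, "price": float}


def validate_row(row: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of validation errors; an empty list means the row conforms."""
    errors = []
    # Strict validation: reject fields the schema does not declare.
    for field in row:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    # Check that every present, non-null value has the declared type.
    for field, expected in schema.items():
        value = row.get(field)
        if value is not None and not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


print(validate_row({"name": "Lamp", "price": 19.99}, PRODUCT_SCHEMA))
print(validate_row({"name": "Lamp", "price": "19.99"}, PRODUCT_SCHEMA))
```

In this sketch a row is accepted only when every value matches its declared type and no undeclared fields are present, which is one plausible reading of "strict".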
Schemas support basic data types, such as strings and numbers, as well as a collection of advanced fields that apply transformations to the data:
How well you can scrape a page is heavily affected by your schema choices. Here are some loose guidelines for designing a good schema:
Often, not every website will conform to the unified schema you’ve created. Sometimes individual pages are missing fields; other times an entire website does not present a field at all.
If a field is missing, it is left as null in the output. If the missing field is an array, it is left as an empty array instead.
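The missing-field rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SCHEMA` mapping, its `"string"`, `"number"`, and `"array"` type names, and the `fill_missing` helper are hypothetical and do not reflect Reworkd's actual schema format or internals.

```python
from typing import Any

# Hypothetical schema: field name -> type name.
# These names are assumptions for illustration, not Reworkd's schema format.
SCHEMA = {"title": "string", "price": "number", "tags": "array"}


def fill_missing(row: dict[str, Any], schema: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the row in which every schema field is present."""
    filled = {}
    for field, kind in schema.items():
        value = row.get(field)
        if value is None:
            # Missing arrays become empty lists; any other missing field becomes None (null).
            value = [] if kind == "array" else None
        filled[field] = value
    return filled


# A page that only exposed a title.
print(fill_missing({"title": "Desk lamp"}, SCHEMA))
# → {'title': 'Desk lamp', 'price': None, 'tags': []}
```

Under this rule, downstream consumers can rely on every schema field being present in every row, and can iterate over array fields without checking for null first.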