scientific-skills/parallel-web/references/web-extract.md
Extract content from: $ARGUMENTS
Choose a short, descriptive filename based on the URL or content (e.g., vespa-docs, react-hooks-api). Use lowercase with hyphens, no spaces.
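The naming rule above can be approximated mechanically when no better name suggests itself. The following is a minimal sketch, not part of parallel-cli: `slugify_url` is a hypothetical helper, and the 40-character cap is an assumed limit chosen only to keep names short. A hand-picked name based on the page content is usually more descriptive.

```shell
# Hypothetical helper (not part of parallel-cli): derive a lowercase,
# hyphen-separated filename stem from a URL.
slugify_url() {
  printf '%s\n' "$1" \
    | sed -E 's|^[a-zA-Z]+://||; s|^www\.||; s|[?#].*$||' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-40 \
    | sed -E 's/-+$//'   # drop a hyphen left dangling by the length cap (assumed: 40)
}

# Example: strips the scheme, "www.", query string, and fragment first.
FILENAME="$(slugify_url 'https://www.Example.com/React Hooks/API?ref=1#top')"
echo "$FILENAME"
```

The resulting `$FILENAME` can then be passed to the `-o "$FILENAME.json"` argument of the extract command.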
parallel-cli extract "$ARGUMENTS" --json -o "$FILENAME.json"
Options if needed:
--objective "focus area" to focus on specific content
When extracting from academic sources (arXiv, PubMed, journal sites, conference proceedings), use --objective to focus on the most valuable sections:
parallel-cli extract "$URL" --json --objective "extract abstract, methodology, key findings, and conclusions" -o "$FILENAME.json"
For arXiv papers, prefer the /abs/ URL (which has structured metadata) over the raw PDF URL when available. If the user provides a PDF link, extract it directly — parallel-cli handles PDFs.
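The academic-source guidance above can be sketched as a small wrapper that adds `--objective` only for recognized hosts. This is an illustration under assumptions: `academic_objective` is a hypothetical helper, the two host patterns are an assumed, incomplete list, and the URL and filename are example values. The command is printed for review rather than executed.

```shell
# Hypothetical helper: print the objective text for academic hosts,
# and nothing for any other URL. The host list is an assumption.
academic_objective() {
  case "$1" in
    *arxiv.org/*|*pubmed.ncbi.nlm.nih.gov/*)
      printf '%s\n' "extract abstract, methodology, key findings, and conclusions" ;;
    *) ;;
  esac
}

URL="https://arxiv.org/abs/1706.03762"   # example value; /abs/ form preferred for arXiv
FILENAME="arxiv-1706-03762"              # example value
OBJECTIVE="$(academic_objective "$URL")"

# Build the argument list; --objective is added only when one applies.
set -- extract "$URL" --json
if [ -n "$OBJECTIVE" ]; then
  set -- "$@" --objective "$OBJECTIVE"
fi
set -- "$@" -o "$FILENAME.json"

# Print the command instead of running it.
echo parallel-cli "$@"
```

Journal sites and conference proceedings do not share a common hostname, so in practice the decision to pass `--objective` for those is better made by inspecting the page than by pattern matching.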
Return content as:
For academic papers, include structured metadata when available:
Then the extracted content verbatim, with these rules:
After the response, mention the output file path ($FILENAME.json) so the user knows it's available for follow-up questions.