Log in Get started

Back to Bentopdf

Extract Tables

docs/tools/extract-tables.md

2.7.02.1 KB

Original Source

Extract Tables

Detects all tables in a PDF and lets you export them in your choice of format: CSV, JSON, or Markdown. Each table is extracted individually with page and position metadata. Powered by PyMuPDF's table detection engine.

How It Works

Upload a PDF by clicking the drop zone or dragging a file onto it.
Select your preferred export format -- CSV, JSON, or Markdown.
Click Extract to start processing.
If one table is found, it downloads as a single file. Multiple tables produce a ZIP archive with one file per table.

Options

Export Format -- choose between:
- CSV -- comma-separated values with proper quoting for fields containing commas, quotes, or newlines. Suitable for spreadsheets and databases.
- JSON -- array-of-arrays structure, pretty-printed with 2-space indentation. Suitable for programmatic consumption.
- Markdown -- pipe-delimited table format rendered by PyMuPDF. Suitable for documentation and README files.

Output Format

Single table: filename_table.csv (or .json / .md)
Multiple tables: filename_tables.zip containing files named table_1_page3.csv, table_2_page5.csv, etc.

Use Cases

Extracting specific tables from lengthy PDF reports without pulling the entire document's data.
Getting tables in Markdown format for pasting directly into GitHub issues, wikis, or documentation.
Pulling structured JSON data from PDF tables for API integrations or scripts.
Comparing tables across different PDF versions by exporting both to CSV and diffing.

Tips

This tool gives you per-table control. If you just want all tables dumped into a single CSV, use PDF to CSV. For a full Excel workbook, use PDF to Excel.
Tables that span multiple pages may be detected as separate tables per page. You may need to concatenate them manually.
If no tables are detected, the PDF might be a scanned image. Run it through OCR first to add a text layer.