PDF Parser
ParseJet is a developer-friendly PDF parser that extracts text, title, and metadata from any PDF via a single API call. No dependencies to install — replace pdf-parse, pdfplumber, or PyMuPDF with one HTTP endpoint.
Free: 3 requests/day, no signup. Create a free account for 300 credits/month.
How it works
Send your PDF
Upload a file in the tool above, or POST it to the API. ParseJet auto-detects the format — no configuration needed.
Parse and extract
ParseJet extracts text, title, author, page count, and content structure. OCR is applied automatically to scanned pages.
Get structured JSON
Receive a clean JSON response with text, title, source_type, and metadata — ready to use in your application.
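A minimal sketch of consuming that response in Python, assuming only the four top-level fields named above and treating every metadata key as optional. `ParseResult` and `from_json` are illustrative names, not part of any ParseJet SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParseResult:
    """Illustrative container for the documented top-level fields."""

    text: str
    title: str
    source_type: str
    metadata: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, payload: str) -> "ParseResult":
        data = json.loads(payload)
        # Assumption: a field may be missing or null, so fall back to empty values.
        return cls(
            text=data.get("text") or "",
            title=data.get("title") or "",
            source_type=data.get("source_type") or "",
            metadata=data.get("metadata") or {},
        )


# Sample payload shaped like the response described on this page.
sample = json.dumps({
    "text": "Invoice #1234\nDate: 2026-03-15\n...",
    "title": "Invoice #1234",
    "source_type": "pdf",
    "metadata": {"pages": 2, "author": "Acme Corp"},
})
result = ParseResult.from_json(sample)
print(result.title, result.metadata.get("pages"))
```

Reading metadata with `.get()` keeps downstream code working when a PDF has no author or creation date set.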
Key features
What makes this PDF parser stand out.
Zero dependencies
No need to install poppler, pdftotext, or any native libraries. ParseJet is a hosted API — just make an HTTP request.
Drop-in replacement
Replace pdf-parse (Node.js), pdfplumber (Python), or PyMuPDF with a single API call. Works from any language.
Rich metadata
Returns document title, author, creation date, page count, and detected content type — not just raw text.
Consistent JSON output
Every response follows the same schema: { text, title, source_type, metadata }. No format-specific handling needed.
Built-in OCR
Scanned PDFs are processed with OCR automatically. No separate OCR step or configuration required.
Table detection
Detects tabular data in PDFs. Request Markdown output for properly formatted tables.
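When you request Markdown output, tables arrive as pipe-delimited text. The sketch below assumes standard pipe tables (header row, separator row, data rows) and uses a hypothetical helper named `markdown_table_rows` to turn the first table into a list of dictionaries. It has not been checked against ParseJet's actual table output, and it does not handle escaped pipes inside cells.

```python
def markdown_table_rows(markdown: str) -> list[dict[str, str]]:
    """Convert the first pipe table found in `markdown` into row dicts."""
    table: list[str] = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            table.append(stripped)
        elif table:
            break  # the first table has ended

    if len(table) < 3:
        return []  # need a header, a separator, and at least one data row

    def cells(row: str) -> list[str]:
        # Drop the outer pipes, then split the remaining text into cells.
        return [c.strip() for c in row.removeprefix("|").removesuffix("|").split("|")]

    header = cells(table[0])
    # table[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, cells(row))) for row in table[2:]]


sample = """Line items:

| Item | Qty |
|------|-----|
| Widget | 2 |
| Gadget | 5 |

Total due: 7 units
"""
rows = markdown_table_rows(sample)
print(rows)
```

All values come back as strings; convert quantities or amounts to numbers in your own code where needed.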
Use cases
Common scenarios where this tool saves you time.
Replace pdf-parse in Node.js
If you're using the npm pdf-parse package and hitting issues with native dependencies or maintenance, ParseJet is a drop-in replacement via HTTP.
Replace pdfplumber in Python
pdfplumber requires a Python runtime and its own package dependencies. ParseJet provides comparable extraction via API, so you can call it from any language or serverless function.
Document processing pipelines
Build automated workflows that parse incoming PDFs — invoices, reports, forms — and route the extracted data to your database or CRM.
RAG document ingestion
Parse PDFs as part of your retrieval-augmented generation pipeline. ParseJet returns structured text that gives LLMs better context.
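For RAG ingestion, the extracted `text` usually has to be split into overlapping chunks before it is embedded. A minimal sketch, assuming character-based chunk sizes and a hypothetical `chunk_text` helper; real pipelines often split on tokens, headings, or paragraphs instead.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters, where each chunk
    repeats the last `overlap` characters of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")

    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Stand-in for the `text` field of a parsed PDF.
document_text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(document_text, size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.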
Automate with the API
Use the same tool programmatically. Works with any language — just HTTP.
# Parse a PDF and get text + metadata
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf"

# Response:
# {
#   "text": "Invoice #1234\nDate: 2026-03-15\n...",
#   "title": "Invoice #1234",
#   "source_type": "pdf",
#   "metadata": { "pages": 2, "author": "Acme Corp" }
# }

import httpx

# Before (pdfplumber):
# import pdfplumber
# with pdfplumber.open("invoice.pdf") as pdf:
#     text = "\n".join(p.extract_text() for p in pdf.pages)

# After (ParseJet, no native dependencies):
with open("invoice.pdf", "rb") as f:  # close the file handle once the upload finishes
    resp = httpx.post(
        "https://api.parsejet.com/v1/parse/auto/file",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
    )
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
result = resp.json()
text = result["text"]  # All text, all pages
title = result["title"]  # Document title
pages = result["metadata"]["pages"]  # Page count

// Before (pdf-parse):
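// const pdfParse = require("pdf-parse");
// const data = await pdfParse(buffer);

// After (ParseJet, no native dependencies):
// pdfBuffer is a Buffer or Uint8Array holding the PDF bytes.
const formData = new FormData();
// The standard FormData accepts a filename only alongside a Blob or File, so wrap the bytes.
formData.append("file", new Blob([pdfBuffer], { type: "application/pdf" }), "invoice.pdf");
const res = await fetch("https://api.parsejet.com/v1/parse/auto/file", {
  method: "POST",
  headers: { Authorization: "Bearer YOUR_API_KEY" },
  body: formData,
});
if (!res.ok) throw new Error(`ParseJet request failed: ${res.status}`);
const { text, title, source_type, metadata } = await res.json();
// Works in Node.js 18+, Deno, Bun, Cloudflare Workers: anywhere with fetch

Want to automate this?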
ParseJet API gives you the same parsing power via a single HTTP endpoint. No ffmpeg, no poppler, no tesseract — just one API call.
Frequently asked questions
How does ParseJet compare to pdf-parse (npm)?
pdf-parse is a Node.js library that requires local installation and only handles basic text extraction. ParseJet is a hosted API that extracts text, metadata, and structure from any PDF — including scanned documents via OCR — with zero dependencies.
How does it compare to pdfplumber (Python)?
pdfplumber is excellent for table extraction but requires Python and local processing. ParseJet offers similar capabilities via HTTP, so you can use it from any language without installing Python or native dependencies.
What metadata does the PDF parser extract?
ParseJet extracts the document title, author, creation date, page count, and detected content type. The full text and metadata are returned in a structured JSON response.
Can I use it in a serverless environment?
Yes. Since ParseJet is an HTTP API, it works in AWS Lambda, Vercel Functions, Cloudflare Workers, and any serverless platform — no native binary dependencies to bundle.
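In a serverless function where you want no third-party packages at all, the upload can be built with Python's standard library alone. A minimal sketch, assuming the endpoint and the `file` form field used in the API examples on this page; `build_multipart` and `parse_pdf` are hypothetical helper names, and the error handling a production function needs is left out.

```python
import json
import urllib.request
import uuid

# Endpoint taken from the API examples on this page.
API_URL = "https://api.parsejet.com/v1/parse/auto/file"


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_pdf(pdf_bytes: bytes, api_key: str, filename: str = "upload.pdf") -> dict:
    """POST the PDF bytes and return the decoded JSON response."""
    body, content_type = build_multipart("file", filename, pdf_bytes)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Inside a handler you would call `parse_pdf(pdf_bytes, api_key)` with the key read from an environment variable or secret store rather than hard-coded.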
Does it support PDF table extraction?
Yes. ParseJet detects tables in PDFs and, when you request Markdown output, returns them as properly formatted Markdown tables.
Is it free?
Yes. You get 3 free parses per day with no signup. Create a free account for 300 credits per month. Paid plans start at $19/month with higher rate limits and file size quotas.
Related tools
PDF to Text Converter
Convert PDF to plain text online for free. Handles multi-page documents, scanned PDFs with OCR, and complex layouts. No signup required — use instantly or automate via API.
PDF to Markdown Converter
Convert PDF to Markdown online for free. Preserves headings, lists, tables, and code blocks. No signup required — try it instantly or automate with the ParseJet API.
Extract Text from PDF
Extract text from PDF files online for free. Supports scanned documents, multi-page PDFs, and complex layouts. No installation needed — works in your browser.