ParseJet

PDF Parser

ParseJet is a developer-friendly PDF parser that extracts text, title, and metadata from any PDF via a single API call. No dependencies to install — replace pdf-parse, pdfplumber, or PyMuPDF with one HTTP endpoint.

Free: 3 requests/day, no signup. Create a free account for 300 credits/month.

How it works

1. Send your PDF

Upload a file in the tool above, or POST it to the API. ParseJet auto-detects the format — no configuration needed.

2. Parse and extract

ParseJet extracts text, title, author, page count, and content structure. OCR is applied automatically to scanned pages.

3. Get structured JSON

Receive a clean JSON response with text, title, source_type, and metadata — ready to use in your application.
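
As a rough sketch of consuming that response in Python, the snippet below maps the four top-level fields onto a small dataclass. The field names come from the response described above; the `ParseResult` class and `parse_response` helper are hypothetical names used for illustration, and treating every field except `text` as optional is an assumption, not documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ParseResult:
    """Hypothetical container for a ParseJet response (not an official SDK type)."""
    text: str
    title: Optional[str] = None
    source_type: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_response(raw: str) -> ParseResult:
    """Decode a JSON response body into a ParseResult.

    Assumes `text` is always present; the other fields fall back to defaults.
    """
    data = json.loads(raw)
    return ParseResult(
        text=data["text"],
        title=data.get("title"),
        source_type=data.get("source_type"),
        metadata=data.get("metadata") or {},
    )


# Sample body shaped like the { text, title, source_type, metadata } schema.
sample = json.dumps({
    "text": "Invoice #1234\nDate: 2026-03-15\n",
    "title": "Invoice #1234",
    "source_type": "pdf",
    "metadata": {"pages": 2, "author": "Acme Corp"},
})
result = parse_response(sample)
print(result.title, result.metadata.get("pages"))
```

Keeping the mapping in one function means a missing optional field surfaces as a default value rather than a `KeyError` deep inside application code.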

Key features

What makes this PDF parser stand out.

Zero dependencies

No need to install poppler, pdftotext, or any native libraries. ParseJet is a hosted API — just make an HTTP request.

Drop-in replacement

Replace pdf-parse (Node.js), pdfplumber (Python), or PyMuPDF with a single API call. Works from any language.

Rich metadata

Returns document title, author, creation date, page count, and detected content type — not just raw text.

Consistent JSON output

Every response follows the same schema: { text, title, source_type, metadata }. No format-specific handling needed.

Built-in OCR

Scanned PDFs are processed with OCR automatically. No separate OCR step or configuration required.

Table detection

Detects tabular data in PDFs. Request Markdown output for properly formatted tables.
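
Once a table arrives as Markdown, turning it into rows takes only a few lines. The sketch below assumes the table uses the common pipe-delimited layout with a `|---|---|` separator row; the exact Markdown that ParseJet emits is not shown here, so treat that layout, the `parse_markdown_table` helper, and the sample text as illustrative assumptions.

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Convert the first pipe-delimited Markdown table in `markdown` to row dicts."""
    rows: List[List[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|") and len(line) > 1:
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []
    # The second row must be the header separator, e.g. |---|:---:|
    if not all(cell and set(cell) <= set("-:") for cell in rows[1]):
        return []
    header, body = rows[0], rows[2:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical Markdown text, as it might appear in a parsed invoice.
sample = """Line items:

| Item | Qty | Price |
|------|-----|-------|
| Widget | 2 | 9.50 |
| Gadget | 1 | 20.00 |

Total: 39.00
"""
for row in parse_markdown_table(sample):
    print(row["Item"], row["Qty"], row["Price"])
```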

Use cases

Common scenarios where this tool saves you time.

Replace pdf-parse in Node.js

If you're using the npm pdf-parse package and hitting issues with native dependencies or maintenance, ParseJet is a drop-in replacement via HTTP.

Replace pdfplumber in Python

pdfplumber runs only in Python and processes files locally. ParseJet provides similar extraction via API, so you can call it from any language or serverless function.

Document processing pipelines

Build automated workflows that parse incoming PDFs — invoices, reports, forms — and route the extracted data to your database or CRM.
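
A minimal sketch of the routing step, assuming the parsed response is already available as a dict. The regular expressions match the sample invoice text used in the API examples; the field names and the destination labels (`invoices`, `manual_review`) are hypothetical and would depend on your own documents and systems.

```python
import re
from typing import Dict, Optional

# Patterns tuned to text such as "Invoice #1234\nDate: 2026-03-15".
INVOICE_RE = re.compile(r"Invoice\s+#(\d+)")
DATE_RE = re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})")


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Pull an invoice number and ISO date out of extracted text, if present."""
    number = INVOICE_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "date": date.group(1) if date else None,
    }


def route(result: dict) -> str:
    """Choose a destination for a parsed document (labels are hypothetical)."""
    fields = extract_invoice_fields(result.get("text", ""))
    return "invoices" if fields["invoice_number"] else "manual_review"


parsed = {"text": "Invoice #1234\nDate: 2026-03-15\n...", "title": "Invoice #1234"}
print(extract_invoice_fields(parsed["text"]), route(parsed))
```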

RAG document ingestion

Parse PDFs as part of your retrieval-augmented generation pipeline. ParseJet returns structured text that gives LLMs better context.
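
For ingestion, the extracted text usually needs to be split into overlapping chunks before embedding. The sketch below uses simple fixed-size character windows; the chunk size, the overlap, and the `chunk_text` and `build_records` helpers are illustrative assumptions, not part of the ParseJet API.

```python
from typing import Dict, List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of up to max_chars that share `overlap` characters."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("require max_chars > 0 and 0 <= overlap < max_chars")
    chunks: List[str] = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def build_records(result: dict, max_chars: int = 1000, overlap: int = 100) -> List[Dict]:
    """Attach the document title and chunk position to each chunk."""
    return [
        {"chunk_index": i, "title": result.get("title"), "text": chunk}
        for i, chunk in enumerate(chunk_text(result["text"], max_chars, overlap))
    ]


records = build_records({"text": "abcdefghij", "title": "Demo"}, max_chars=4, overlap=1)
print([r["text"] for r in records])
```

Character windows are the simplest option; splitting on paragraphs or headings often retrieves better, at the cost of more code.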

Automate with the API

Use the same tool programmatically. Works with any language — just HTTP.

cURL
# Parse a PDF and get text + metadata
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf"

# Response:
# {
#   "text": "Invoice #1234\nDate: 2026-03-15\n...",
#   "title": "Invoice #1234",
#   "source_type": "pdf",
#   "metadata": { "pages": 2, "author": "Acme Corp" }
# }

Python
import httpx

# Before (pdfplumber):
#   import pdfplumber
#   with pdfplumber.open("invoice.pdf") as pdf:
#       text = "\n".join(p.extract_text() or "" for p in pdf.pages)

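# After (ParseJet: no native dependencies, only an HTTP client such as httpx):
with open("invoice.pdf", "rb") as pdf_file:  # close the file handle after upload
    resp = httpx.post(
        "https://api.parsejet.com/v1/parse/auto/file",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": pdf_file},
        timeout=60.0,  # httpx defaults to 5 s; large or scanned PDFs may need longer
    )
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
result = resp.json()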
text = result["text"]          # All text, all pages
title = result["title"]        # Document title
pages = result["metadata"]["pages"]  # Page count

JavaScript
// Before (pdf-parse):
//   const pdfParse = require("pdf-parse");
//   const data = await pdfParse(buffer);

// After (ParseJet — no native dependencies):
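const formData = new FormData();
// FormData needs a Blob (not a raw Buffer) to send a file part with a filename
formData.append("file", new Blob([pdfBuffer], { type: "application/pdf" }), "invoice.pdf");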

const res = await fetch("https://api.parsejet.com/v1/parse/auto/file", {
  method: "POST",
  headers: { Authorization: "Bearer YOUR_API_KEY" },
  body: formData,
});
const { text, title, source_type, metadata } = await res.json();
// Works in Node.js, Deno, Bun, Cloudflare Workers — anywhere with fetch

Want to automate this?

ParseJet API gives you the same parsing power via a single HTTP endpoint. No ffmpeg, no poppler, no tesseract — just one API call.

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

Frequently asked questions

How does ParseJet compare to pdf-parse (npm)?

pdf-parse is a Node.js library that requires local installation and only handles basic text extraction. ParseJet is a hosted API that extracts text, metadata, and structure from any PDF — including scanned documents via OCR — with zero dependencies.

How does it compare to pdfplumber (Python)?

pdfplumber is excellent for table extraction but requires Python and local processing. ParseJet offers similar capabilities via HTTP, so you can use it from any language without installing Python or native dependencies.

What metadata does the PDF parser extract?

ParseJet extracts the document title, author, creation date, page count, and detected content type. The full text and metadata are returned in a structured JSON response.

Can I use it in a serverless environment?

Yes. Since ParseJet is an HTTP API, it works in AWS Lambda, Vercel Functions, Cloudflare Workers, and any serverless platform — no native binary dependencies to bundle.
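
To illustrate the point about bundling, here is a sketch of a serverless handler that calls the file endpoint using only the Python standard library. The endpoint and the `file` field name come from the API examples; the `PARSEJET_API_KEY` environment variable, the helper names, and the assumption that the event carries a base64-encoded PDF in `body` (as in an API Gateway proxy event) are illustrative and not confirmed by ParseJet.

```python
import base64
import json
import os
import urllib.request
import uuid
from typing import Tuple

API_URL = "https://api.parsejet.com/v1/parse/auto/file"


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_pdf(data: bytes, api_key: str, filename: str = "document.pdf") -> dict:
    """POST the PDF bytes to the API and return the decoded JSON response."""
    body, content_type = build_multipart("file", filename, data)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def handler(event, context):
    """Lambda-style entry point; assumes event["body"] is a base64-encoded PDF."""
    pdf_bytes = base64.b64decode(event["body"])
    result = parse_pdf(pdf_bytes, os.environ["PARSEJET_API_KEY"])  # hypothetical variable name
    return {"statusCode": 200, "body": json.dumps({"title": result.get("title")})}
```

Because nothing beyond the standard library is imported, the deployment package contains only this file.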

Does it support PDF table extraction?

Yes. ParseJet detects and extracts tables from PDFs and returns them as properly formatted Markdown tables when you request Markdown output.

Is it free?

Yes. You get 3 free parses per day with no signup. Create a free account for 300 credits per month. Paid plans start at $19/month with higher rate limits and file size quotas.

Start extracting text for free

No signup required. Parse your first file in seconds.
