ParseJet Documentation

ParseJet extracts text from any file or URL. One API call handles PDF, DOCX, YouTube, web pages, images, audio, video, and 25+ more formats.

Quick Start

Get your first parse result in under 60 seconds. No signup required.

1. Try it instantly

Paste any URL into ParseJet — no API key needed for your first 3 requests per day.

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

2. Get your API key

Sign in with Google or GitHub to get a free API key. The free tier includes 300 credits per month.

# Add your API key to requests
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

3. Use the result

Every response returns the same JSON structure regardless of input format:

{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "webpage",
  "metadata": { "url": "https://example.com" }
}
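
Because the four fields are the same for every format, a client can decode them into one small type. The sketch below is a minimal illustration, not part of the official SDK: the `ParseResult` class and the `load_result` helper are hypothetical names, and treating `title` and `metadata` as optional is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParseResult:
    """The four documented response fields."""
    text: str
    title: str
    source_type: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_result(body: str) -> ParseResult:
    """Decode a JSON response body into a ParseResult (hypothetical helper)."""
    data = json.loads(body)
    return ParseResult(
        text=data["text"],
        title=data.get("title", ""),          # assumption: may be absent
        source_type=data["source_type"],
        metadata=data.get("metadata") or {},  # assumption: may be absent
    )


# The sample response shown above, as a JSON string.
sample = (
    '{"text": "Extracted text content...", "title": "Document Title", '
    '"source_type": "webpage", "metadata": {"url": "https://example.com"}}'
)
result = load_result(sample)
print(result.source_type, result.metadata["url"])
```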

Authentication

ParseJet offers three levels of access. You can start using the API immediately without any authentication.

Level | How to access | Rate limit | Best for
Anonymous | No headers | 3/day, 2MB | Quick testing
Session | Sign in (cookie) | 10/day, 5MB | Dashboard tool
API Key | Authorization: Bearer pj_xxx | By plan | Production

Tip: You don't need an API key to get started. Just send requests directly — the first 3 per day are free with no signup.
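
The difference between the anonymous and API key levels comes down to one header. As a rough sketch using only the Python standard library, the hypothetical `build_parse_request` helper below adds the `Authorization` header only when a key is supplied. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.parsejet.com/v1/parse/auto/url"


def build_parse_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/parse/auto/url."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # API Key level: limits follow the plan.
        headers["Authorization"] = f"Bearer {api_key}"
    # With no key the request is anonymous (3/day, 2MB).
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


anonymous = build_parse_request("https://example.com")
keyed = build_parse_request("https://example.com", api_key="pj_YOUR_KEY")
print(anonymous.has_header("Authorization"), keyed.get_header("Authorization"))
```

Passing either request to `urllib.request.urlopen` would send it.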

Core Concepts

Supported formats

ParseJet auto-detects the format from the file extension or URL pattern. You don't need to specify the format — just send the file or URL to /v1/parse/auto and ParseJet handles the rest.

Category | Formats | Credits
Text | TXT, MD, JSON, CSV, XML, HTML | 1
Documents | DOCX, PPTX, XLSX, EPUB | 2
Complex | PDF, web pages, video | 3
YouTube | YouTube video URLs | 5
Other | Audio (MP3, WAV), images (JPG, PNG), RSS, OPML, email, notebooks | 1

Credits

Each API request consumes credits based on the complexity of the format being parsed. Simple text files cost 1 credit, while YouTube transcripts cost 5. Your monthly credit allowance depends on your plan.
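
To see how a workload maps onto a monthly allowance, the per-category costs from the table can be summed. This is an illustrative calculation only: the category keys and the `estimate_credits` helper are names made up for the example.

```python
# Credits per request, copied from the supported formats table.
CREDITS_BY_CATEGORY = {
    "text": 1,
    "documents": 2,
    "complex": 3,   # PDF, web pages, video
    "youtube": 5,
    "other": 1,
}


def estimate_credits(requests_by_category: dict[str, int]) -> int:
    """Total credits for a mapping of category -> number of requests."""
    return sum(
        CREDITS_BY_CATEGORY[category] * count
        for category, count in requests_by_category.items()
    )


# 50 PDFs (150) + 20 YouTube videos (100) + 40 text files (40) = 290 credits
monthly = {"complex": 50, "youtube": 20, "text": 40}
total = estimate_credits(monthly)
print(total, total <= 300)
```

In this example the workload fits within a 300-credit monthly allowance even though it is only 110 requests.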

Output format

By default, ParseJet returns raw extracted text. Add ?output_format=markdown to any request to get post-processed output with detected headings, lists, tables, and code blocks.
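
When building request URLs in code, appending the parameter through a URL library is safer than string concatenation, since it preserves any query string already present. A small sketch with the standard library follows; `with_output_format` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_output_format(endpoint: str, output_format: str = "markdown") -> str:
    """Return the endpoint URL with output_format set in its query string."""
    parts = urlsplit(endpoint)
    query = dict(parse_qsl(parts.query))
    query["output_format"] = output_format
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_output_format("https://api.parsejet.com/v1/parse/auto/file"))
```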

Guide

Parse a PDF

Extract text from any PDF file, including scanned documents and multi-page reports.

Upload a PDF file

curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

Convert to Markdown

Add output_format=markdown to preserve document structure:

curl -X POST "https://api.parsejet.com/v1/parse/auto/file?output_format=markdown" \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

Credit cost: 3 credits per PDF. Supports files up to your plan's file size limit (10MB-200MB).
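
For readers who want to upload without curl or an SDK, the following is a rough standard-library sketch of the same multipart request. The `build_upload_request` helper is hypothetical, and it assumes the upload field is named `file`, as in the audio and image field tables. The request is built but not sent.

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(
    endpoint: str, filename: str, content: bytes, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build a multipart/form-data POST with one part named "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


request = build_upload_request(
    "https://api.parsejet.com/v1/parse/auto/file?output_format=markdown",
    "report.pdf",
    b"%PDF-1.7 placeholder bytes",  # in practice: open("report.pdf", "rb").read()
    api_key="pj_YOUR_KEY",
)
print(request.get_method(), len(request.data))
```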

Guide

YouTube Transcripts

Get the full transcript of any YouTube video. Supports auto-generated captions in 100+ languages.

Get a transcript

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'

Specify language

Use the language parameter for non-English videos:

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "ja"}'

Or use auto-detect

The /v1/parse/auto/url endpoint automatically detects YouTube URLs:

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtu.be/VIDEO_ID"}'

Credit cost: 5 credits per YouTube video. Metadata includes video_id, channel, and duration.
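
The request body differs from a plain URL parse only by the optional `language` field. The sketch below builds that body and rejects values that are not two-letter codes before a request is spent on them. The `youtube_payload` helper and its validation rule are assumptions for illustration. The API may accept other forms.

```python
from typing import Optional


def youtube_payload(url: str, language: Optional[str] = None) -> dict:
    """Build the JSON body for /v1/parse/youtube."""
    payload = {"url": url}
    if language is not None:
        # Assumption: ISO 639-1 codes are exactly two ASCII letters.
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
        payload["language"] = language.lower()
    return payload


print(youtube_payload("https://youtube.com/watch?v=VIDEO_ID", language="ja"))
```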

Guide

Web Scraping

Extract the main content from any web page. ParseJet automatically removes navigation, ads, sidebars, and boilerplate.

curl -X POST https://api.parsejet.com/v1/parse/webpage \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/blog/article"}'

Credit cost: 3 credits per web page. Returns clean text with title and source URL in metadata.

Guide

Office Documents

Parse Word (DOCX), Excel (XLSX), PowerPoint (PPTX), and CSV files. Just upload the file — ParseJet detects the format automatically.

# Works with any Office format
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@document.docx"

# Also works with spreadsheets
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@spreadsheet.xlsx"

Credit cost: 2 credits per Office document (DOCX, PPTX, XLSX). CSV files are billed in the Text category at 1 credit.

API Reference

Response Format

All endpoints return the same JSON structure:

{
  "text": "Extracted text content...",
  "title": "Document Title",
  "source_type": "pdf",
  "metadata": { "pages": 12, "author": "Jane Doe" }
}

Field | Type | Description
text | string | The extracted text content
title | string | Document or page title
source_type | string | Format identifier (pdf, webpage, youtube, etc.)
metadata | object | Format-specific metadata (page count, author, duration, etc.)

POST /v1/parse/auto

The recommended endpoint. Auto-detects format from file extension or URL type. Accepts file (multipart) or url (form field), not both.

curl -X POST https://api.parsejet.com/v1/parse/auto \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

POST /v1/parse/auto/url

Parse any URL. Automatically distinguishes YouTube from regular web pages.

Parameter | Type | Required | Description
url | string | yes | URL to parse
language | string | no | ISO 639-1 code for YouTube transcript language

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

POST /v1/parse/auto/file

Parse any uploaded file. Detects format from file extension, falls back to content-based detection.

curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"

POST /v1/parse/webpage

Extract main content from a web page. Removes navigation, ads, and boilerplate.

Parameter | Type | Required | Description
url | string | yes | Web page URL

curl -X POST https://api.parsejet.com/v1/parse/webpage \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

POST /v1/parse/youtube

Extract transcript from a YouTube video. Metadata includes video_id, channel, and duration.

Parameter | Type | Required | Description
url | string | yes | YouTube video URL or video ID
language | string | no | ISO 639-1 language code

curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en"}'

POST /v1/parse/audio

Parse audio files. Supports MP3, WAV, M4A, OGG, FLAC, WebM. Max 25MB.

Field | Type | Required | Description
file | file | yes | Audio file
language | string | no | ISO 639-1 code
with_timestamps | boolean | no | Include word-level timestamps

curl -X POST https://api.parsejet.com/v1/parse/audio \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@audio.mp3" -F "language=en"

POST /v1/parse/video

Extract audio from video for transcription. Supports MP4, MKV, AVI, MOV, WebM.

curl -X POST https://api.parsejet.com/v1/parse/video \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@video.mp4" -F "language=en"

POST /v1/parse/epub

Parse EPUB ebook. Extracts text organized by chapters.

curl -X POST https://api.parsejet.com/v1/parse/epub \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@book.epub"

POST /v1/parse/feed

Parse RSS or Atom feed. Also supports OPML via /v1/parse/opml.

curl -X POST https://api.parsejet.com/v1/parse/feed \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@feed.xml"

POST /v1/parse/image

Analyze image. Supports JPG, PNG, GIF, BMP, WebP, TIFF. Max 20MB.

Field | Type | Required | Description
file | file | yes | Image file
prompt | string | no | Custom prompt for image analysis
model | string | no | Vision model override

curl -X POST https://api.parsejet.com/v1/parse/image \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@photo.jpg" -F "prompt=Describe this image"

POST /v1/parse/image/ocr

Extract text from image via OCR.

curl -X POST https://api.parsejet.com/v1/parse/image/ocr \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@scan.png"

SDKs

Official SDKs

TypeScript / JavaScript

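npm install parsejet

import { readFile } from "node:fs/promises";
import { ParseJet } from "parsejet";

const client = new ParseJet({ apiKey: "pj_YOUR_KEY" });

// Parse a URL
const urlResult = await client.parse.url("https://example.com");
console.log(urlResult.text);

// Parse a file: read it into a Buffer, then pass the contents and the filename
const buffer = await readFile("report.pdf");
const fileResult = await client.parse.file(buffer, "report.pdf");
console.log(fileResult.text);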

Python

pip install parsejet

from parsejet import ParseJet

client = ParseJet(api_key="pj_YOUR_KEY")

# Parse a URL
result = client.parse.url("https://example.com")
print(result.text)

# Parse a file
with open("report.pdf", "rb") as f:
    result = client.parse.file(f, "report.pdf")
    print(result.text)

AI Agents

MCP Server

Use ParseJet as an MCP (Model Context Protocol) server with Claude Code, Cursor, or any MCP-compatible AI agent.

Install

npm install -g @parsejet/mcp-server

Claude Code

Add to your project's .claude/settings.json:

{
  "mcpServers": {
    "parsejet": {
      "command": "parsejet-mcp",
      "env": {
        "PARSEJET_API_KEY": "pj_YOUR_KEY"
      }
    }
  }
}

Available tools

Tool | Description
parse_url | Parse any URL (web page, YouTube, etc.)
parse_file | Parse a local file (PDF, DOCX, images, etc.)
get_youtube_transcript | Get YouTube video transcript with optional language

Rate Limits & Pricing

ParseJet uses a credit-based system. Each request consumes credits based on the format complexity.

Plan | Price | Credits/mo | RPM | Max file
Free | $0 | 300 | 5 | 10MB
Pro | $19/mo | 3,000 | 30 | 50MB
Business | $49/mo | 20,000 | 60 | 100MB
Scale | $99/mo | 50,000 | 200 | 200MB
Enterprise | Custom | Custom | Custom | Custom
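
A plan has to satisfy three limits at once: monthly credits, requests per minute, and maximum file size. As an illustration, the sketch below encodes the fixed-price rows of the table and picks the cheapest one that covers a given workload. The `Plan` type and `cheapest_plan` helper are hypothetical names. A `None` result means the workload exceeds every listed plan and would need Enterprise.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    price_usd: int
    credits: int
    rpm: int
    max_file_mb: int


# Fixed-price rows from the pricing table, ordered by price.
PLANS = [
    Plan("Free", 0, 300, 5, 10),
    Plan("Pro", 19, 3_000, 30, 50),
    Plan("Business", 49, 20_000, 60, 100),
    Plan("Scale", 99, 50_000, 200, 200),
]


def cheapest_plan(credits_needed: int, rpm_needed: int, largest_file_mb: int) -> Optional[Plan]:
    """Return the first (cheapest) plan that meets all three limits."""
    for plan in PLANS:
        if (
            plan.credits >= credits_needed
            and plan.rpm >= rpm_needed
            and plan.max_file_mb >= largest_file_mb
        ):
            return plan
    return None


# Few credits, but an 80MB file pushes the workload past Free and Pro.
print(cheapest_plan(credits_needed=100, rpm_needed=5, largest_file_mb=80))
```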

Response headers include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After on 429 responses.
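
A client can use these headers to decide how long to wait before retrying. The following sketch is one possible policy, not a documented one: it assumes `Retry-After` carries a number of seconds (HTTP also allows a date, which is not handled here) and falls back to capped exponential backoff when the header is missing. The `retry_delay` helper is a hypothetical name.

```python
from typing import Mapping, Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # not a number of seconds; fall through to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(base * 2 ** attempt, cap)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))
```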

Error Codes

All errors return JSON with error and message fields.

Status | Code | Description
400 | unsupported_format | File type not supported
401 | invalid_api_key | Missing or invalid API key
413 | file_too_large | File exceeds plan limit
422 | parse_error | File corrupted or unreadable
429 | rate_limit_exceeded | RPM or daily/monthly limit hit
502 | parser_unavailable | Parser backend unreachable
504 | parser_timeout | Parse operation timed out
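
One way to consume these errors is to turn the JSON body into an exception that records whether a retry is worthwhile. The sketch below assumes the `error` field holds the code string from the table. The `ParseJetError` class, the `error_from_response` helper, the choice of which codes count as retryable, and the sample message text are all illustrative assumptions.

```python
import json

# Assumption: transient conditions worth retrying. The other codes indicate
# a problem with the request itself.
RETRYABLE_CODES = {"rate_limit_exceeded", "parser_unavailable", "parser_timeout"}


class ParseJetError(Exception):
    """An API error carrying the HTTP status plus the error and message fields."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def error_from_response(status: int, body: str) -> ParseJetError:
    """Build a ParseJetError from an error response body."""
    data = json.loads(body)
    return ParseJetError(status, data.get("error", "unknown"), data.get("message", ""))


# Made-up sample body for illustration.
err = error_from_response(429, '{"error": "rate_limit_exceeded", "message": "Monthly limit hit"}')
print(err.retryable, err)
```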