# ParseJet API Reference > Universal file and URL parsing API. Extracts text content from 25+ formats via a single HTTP endpoint. No local dependencies required. Base URL: `https://api.parsejet.com` ParseJet accepts files (via multipart upload) or URLs (via JSON body) and returns structured text output. Every response uses the same format regardless of input type. Supports PDF, DOCX, XLSX, PPTX, CSV, TXT, MD, JSON, XML, HTML, EPUB, YouTube, audio (MP3/WAV/M4A), video (MP4/MKV/AVI), images (JPG/PNG/GIF), RSS, Atom, and OPML. Anonymous access is available (no API key required) with 3 requests/day for evaluation. --- ## Authentication | Method | Header | Use case | Limits | |--------|--------|----------|--------| | Anonymous | None | Quick testing | 3 requests/day, 2MB files | | Session | Cookie | Dashboard tool | 10/day, 100/month, 5MB files | | API Key | `Authorization: Bearer pj_YOUR_KEY` | Production | Based on plan | Free API keys provide 300 requests/month at no cost. --- ## Response Format All endpoints return: ```json { "text": "Extracted text content", "title": "Document Title", "source_type": "pdf", "metadata": { "pages": 12, "author": "Jane Doe" } } ``` - `text` (string): Extracted text content. - `title` (string): Document or page title. - `source_type` (string): Format identifier. One of: `txt`, `markdown`, `json`, `csv`, `html`, `xml`, `pdf`, `document`, `epub`, `webpage`, `youtube`, `video`, `audio`, `image`, `feed`, `notebook`, `email`. - `metadata` (object): Format-specific metadata (page count, author, duration, word count, etc.). All endpoints accept `?output_format=raw` (default) or `?output_format=markdown` (post-processed with structure detection). --- ## Endpoints ### POST /v1/parse/auto Auto-detect format from file extension or URL type. This is the recommended starting point. Accepts `file` (multipart) or `url` (form field), not both. ```bash curl -X POST https://api.parsejet.com/v1/parse/auto \ -H "Authorization: Bearer pj_YOUR_KEY" \ -F "file=@report.pdf" ``` ### POST /v1/parse/auto/url Parse any URL. Distinguishes YouTube from regular webpages automatically. Request body (JSON): - `url` (string, required): URL to parse. - `language` (string, optional): ISO 639-1 code for YouTube transcript language. ```bash curl -X POST https://api.parsejet.com/v1/parse/auto/url \ -H "Authorization: Bearer pj_YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://example.com/article"}' ``` ### POST /v1/parse/auto/file Parse any file. Detects format from file extension. Falls back to content-based detection. ```bash curl -X POST https://api.parsejet.com/v1/parse/auto/file \ -H "Authorization: Bearer pj_YOUR_KEY" \ -F "file=@presentation.pptx" ``` ### POST /v1/parse/webpage Extract main content from a webpage URL. Removes navigation, ads, and boilerplate. Request body (JSON): - `url` (string, required): Webpage URL. ```bash curl -X POST https://api.parsejet.com/v1/parse/webpage \ -H "Authorization: Bearer pj_YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://example.com/article"}' ``` ### POST /v1/parse/youtube Extract transcript from a YouTube video. Request body (JSON): - `url` (string, required): YouTube video URL or video ID. - `language` (string, optional): ISO 639-1 language code. Returns `source_type: "youtube"`. Metadata includes `video_id`, `channel`, `duration`. ```bash curl -X POST https://api.parsejet.com/v1/parse/youtube \ -H "Authorization: Bearer pj_YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en"}' ``` ### POST /v1/parse/youtube-audio Download audio from YouTube for backend transcription. Returns base64-encoded audio in metadata. Request body (JSON): - `url` (string, required): YouTube video URL. - `language` (string, optional): Language hint for transcription. ```bash curl -X POST https://api.parsejet.com/v1/parse/youtube-audio \ -H "Authorization: Bearer pj_YOUR_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}' ``` ### POST /v1/parse/audio Parse audio file. Supports MP3, WAV, M4A, OGG, FLAC, WebM. Maximum 25MB. Multipart form fields: - `file` (file, required): Audio file. - `language` (string, optional): ISO 639-1 code. - `with_timestamps` (boolean, optional): Include word-level timestamps. Default `false`. ```bash curl -X POST https://api.parsejet.com/v1/parse/audio \ -F "file=@recording.mp3" -F "language=en" ``` ### POST /v1/parse/video Extract audio from video file for transcription. Supports MP4, MKV, AVI, MOV, WebM. Multipart form fields: - `file` (file, required): Video file. - `language` (string, optional): ISO 639-1 code. ```bash curl -X POST https://api.parsejet.com/v1/parse/video \ -F "file=@lecture.mp4" -F "language=en" ``` ### POST /v1/parse/epub Parse EPUB ebook. Extracts text organized by chapters. ```bash curl -X POST https://api.parsejet.com/v1/parse/epub -F "file=@book.epub" ``` ### POST /v1/parse/feed Parse RSS or Atom feed. ```bash curl -X POST https://api.parsejet.com/v1/parse/feed -F "file=@feed.xml" ``` ### POST /v1/parse/opml Parse OPML subscription list. ```bash curl -X POST https://api.parsejet.com/v1/parse/opml -F "file=@subscriptions.opml" ``` ### POST /v1/parse/image Analyze image. Supports JPG, PNG, GIF, BMP, WebP, TIFF. Maximum 20MB. Multipart form fields: - `file` (file, required): Image file. - `prompt` (string, optional): Custom prompt for image analysis. - `model` (string, optional): Vision model override. ```bash curl -X POST https://api.parsejet.com/v1/parse/image \ -F "file=@photo.jpg" -F "prompt=Describe this image" ``` ### POST /v1/parse/image/ocr Extract text from image via OCR. ```bash curl -X POST https://api.parsejet.com/v1/parse/image/ocr -F "file=@screenshot.png" ``` --- ## Supported Formats | Category | Extensions | source_type | |----------|-----------|-------------| | Documents | .pdf | pdf | | Documents | .docx, .pptx, .xlsx | document | | Documents | .csv | csv | | Text | .txt, .md, .json, .xml, .html | txt, markdown, json, xml, html | | Web | Any HTTP/HTTPS URL | webpage | | YouTube | youtube.com, youtu.be URLs | youtube | | Audio | .mp3, .wav, .m4a, .ogg, .flac, .webm | audio | | Video | .mp4, .mkv, .avi, .mov, .webm | video | | Ebooks | .epub | epub | | Feeds | .rss, .atom, .opml | feed | | Images | .jpg, .png, .gif, .bmp, .webp, .tiff | image | | Notebooks | .ipynb | notebook | | Email | .eml, .msg | email | --- ## Error Responses ```json { "error": "unsupported_format", "message": "Unsupported file type: .xyz", "source_type": "unknown" } ``` | Status | Code | Description | |--------|------|-------------| | 400 | unsupported_format | File type not supported | | 401 | invalid_api_key | Missing or invalid API key | | 411 | content_length_required | Content-Length header missing (anonymous/session) | | 413 | file_too_large | File exceeds plan file size limit | | 422 | parse_error | File corrupted or unreadable | | 429 | rate_limit_exceeded | RPM limit hit | | 429 | daily_limit_exceeded | Daily quota reached | | 429 | monthly_limit_exceeded | Monthly quota reached | | 502 | parser_unavailable | Parser backend unreachable | | 504 | parser_timeout | Parse operation timed out | --- ## Pricing | Plan | Price | Credits/month | RPM | Max file size | API Keys | |------|-------|--------------|-----|---------------|----------| | Free | $0 | 300 | 5 | 10MB | 1 | | Pro | $19/mo | 3,000 | 30 | 50MB | 3 | | Business | $49/mo | 20,000 | 60 | 100MB | 10 | | Scale | $99/mo | 50,000 | 200 | 200MB | 50 | | Enterprise | Custom | Custom | Custom | Custom | Unlimited | Credit costs per request: 1 for text formats (txt, md, json, csv, html, xml, feed, notebook, email, audio, image), 2 for documents (docx, pptx, xlsx, epub), 3 for PDF/webpage/video, 5 for YouTube. Overage: Pro $8/1k credits, Business $6/1k, Scale $5/1k. --- ## SDKs TypeScript: ``` npm install parsejet ``` ```typescript import { ParseJet } from "parsejet"; const client = new ParseJet({ apiKey: "pj_YOUR_KEY" }); const result = await client.parse.url("https://example.com"); console.log(result.text); ``` Python: ``` pip install parsejet ``` ```python from parsejet import ParseJet client = ParseJet(api_key="pj_YOUR_KEY") result = client.parse.url("https://example.com") print(result.text) ``` Raw HTTP (no SDK, no API key — anonymous): ```bash curl -X POST https://api.parsejet.com/v1/parse/auto/url \ -H "Content-Type: application/json" \ -d '{"url": "https://en.wikipedia.org/wiki/Main_Page"}' ```