# ParseJet API Reference

> Universal file and URL parsing API. Extracts text content from 25+ formats via a single HTTP endpoint. No local dependencies required.

Base URL: `https://api.parsejet.com`

ParseJet accepts files (via multipart upload) or URLs (via JSON body) and returns structured text output. Every response uses the same format regardless of input type. Supports PDF, DOCX, XLSX, PPTX, CSV, TXT, MD, JSON, XML, HTML, EPUB, YouTube, audio (MP3/WAV/M4A), video (MP4/MKV/AVI), images (JPG/PNG/GIF), RSS, Atom, and OPML.

Anonymous access is available (no API key required) with 3 requests/day for evaluation.

---

## Authentication

| Method | Header | Use case | Limits |
|--------|--------|----------|--------|
| Anonymous | None | Quick testing | 3 requests/day, 2MB files |
| Session | Cookie | Dashboard tool | 10/day, 100/month, 5MB files |
| API Key | `Authorization: Bearer pj_YOUR_KEY` | Production | Based on plan |

Free API keys provide 300 requests/month at no cost.

---

## Response Format

All endpoints return:

```json
{
  "text": "Extracted text content",
  "title": "Document Title",
  "source_type": "pdf",
  "metadata": { "pages": 12, "author": "Jane Doe" }
}
```

- `text` (string): Extracted text content.
- `title` (string): Document or page title.
- `source_type` (string): Format identifier. One of: `txt`, `markdown`, `json`, `csv`, `html`, `xml`, `pdf`, `document`, `epub`, `webpage`, `youtube`, `video`, `audio`, `image`, `feed`, `notebook`, `email`.
- `metadata` (object): Format-specific metadata (page count, author, duration, word count, etc.).

All endpoints accept `?output_format=raw` (default) or `?output_format=markdown` (post-processed with structure detection).

---

## Endpoints

### POST /v1/parse/auto

Auto-detect format from file extension or URL type. This is the recommended starting point.

Accepts `file` (multipart) or `url` (form field), not both.

```bash
curl -X POST https://api.parsejet.com/v1/parse/auto \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@report.pdf"
```

### POST /v1/parse/auto/url

Parse any URL. Distinguishes YouTube from regular webpages automatically.

Request body (JSON):
- `url` (string, required): URL to parse.
- `language` (string, optional): ISO 639-1 code for YouTube transcript language.

```bash
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'
```

### POST /v1/parse/auto/file

Parse any file. Detects format from file extension. Falls back to content-based detection.

```bash
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -F "file=@presentation.pptx"
```

### POST /v1/parse/webpage

Extract main content from a webpage URL. Removes navigation, ads, and boilerplate.

Request body (JSON):
- `url` (string, required): Webpage URL.

```bash
curl -X POST https://api.parsejet.com/v1/parse/webpage \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'
```

### POST /v1/parse/youtube

Extract transcript from a YouTube video.

Request body (JSON):
- `url` (string, required): YouTube video URL or video ID.
- `language` (string, optional): ISO 639-1 language code.

Returns `source_type: "youtube"`. Metadata includes `video_id`, `channel`, `duration`.

```bash
curl -X POST https://api.parsejet.com/v1/parse/youtube \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID", "language": "en"}'
```

### POST /v1/parse/youtube-audio

Download audio from YouTube for backend transcription. Returns base64-encoded audio in metadata.

Request body (JSON):
- `url` (string, required): YouTube video URL.
- `language` (string, optional): Language hint for transcription.

```bash
curl -X POST https://api.parsejet.com/v1/parse/youtube-audio \
  -H "Authorization: Bearer pj_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=VIDEO_ID"}'
```

### POST /v1/parse/audio

Parse audio file. Supports MP3, WAV, M4A, OGG, FLAC, WebM. Maximum 25MB.

Multipart form fields:
- `file` (file, required): Audio file.
- `language` (string, optional): ISO 639-1 code.
- `with_timestamps` (boolean, optional): Include word-level timestamps. Default `false`.

```bash
curl -X POST https://api.parsejet.com/v1/parse/audio \
  -F "file=@recording.mp3" -F "language=en"
```

### POST /v1/parse/video

Extract audio from video file for transcription. Supports MP4, MKV, AVI, MOV, WebM.

Multipart form fields:
- `file` (file, required): Video file.
- `language` (string, optional): ISO 639-1 code.

```bash
curl -X POST https://api.parsejet.com/v1/parse/video \
  -F "file=@lecture.mp4" -F "language=en"
```

### POST /v1/parse/epub

Parse EPUB ebook. Extracts text organized by chapters.

```bash
curl -X POST https://api.parsejet.com/v1/parse/epub -F "file=@book.epub"
```

### POST /v1/parse/feed

Parse RSS or Atom feed.

```bash
curl -X POST https://api.parsejet.com/v1/parse/feed -F "file=@feed.xml"
```

### POST /v1/parse/opml

Parse OPML subscription list.

```bash
curl -X POST https://api.parsejet.com/v1/parse/opml -F "file=@subscriptions.opml"
```

### POST /v1/parse/image

Analyze image. Supports JPG, PNG, GIF, BMP, WebP, TIFF. Maximum 20MB.

Multipart form fields:
- `file` (file, required): Image file.
- `prompt` (string, optional): Custom prompt for image analysis.
- `model` (string, optional): Vision model override.

```bash
curl -X POST https://api.parsejet.com/v1/parse/image \
  -F "file=@photo.jpg" -F "prompt=Describe this image"
```

### POST /v1/parse/image/ocr

Extract text from image via OCR.

```bash
curl -X POST https://api.parsejet.com/v1/parse/image/ocr -F "file=@screenshot.png"
```

---

## Supported Formats

| Category | Extensions | source_type |
|----------|-----------|-------------|
| Documents | .pdf | pdf |
| Documents | .docx, .pptx, .xlsx | document |
| Documents | .csv | csv |
| Text | .txt, .md, .json, .xml, .html | txt, markdown, json, xml, html |
| Web | Any HTTP/HTTPS URL | webpage |
| YouTube | youtube.com, youtu.be URLs | youtube |
| Audio | .mp3, .wav, .m4a, .ogg, .flac, .webm | audio |
| Video | .mp4, .mkv, .avi, .mov, .webm | video |
| Ebooks | .epub | epub |
| Feeds | .rss, .atom, .opml | feed |
| Images | .jpg, .png, .gif, .bmp, .webp, .tiff | image |
| Notebooks | .ipynb | notebook |
| Email | .eml, .msg | email |

---

## Error Responses

```json
{
  "error": "unsupported_format",
  "message": "Unsupported file type: .xyz",
  "source_type": "unknown"
}
```

| Status | Code | Description |
|--------|------|-------------|
| 400 | unsupported_format | File type not supported |
| 401 | invalid_api_key | Missing or invalid API key |
| 411 | content_length_required | Content-Length header missing (anonymous/session) |
| 413 | file_too_large | File exceeds plan file size limit |
| 422 | parse_error | File corrupted or unreadable |
| 429 | rate_limit_exceeded | RPM limit hit |
| 429 | daily_limit_exceeded | Daily quota reached |
| 429 | monthly_limit_exceeded | Monthly quota reached |
| 502 | parser_unavailable | Parser backend unreachable |
| 504 | parser_timeout | Parse operation timed out |

---

## Pricing

| Plan | Price | Credits/month | RPM | Max file size | API Keys |
|------|-------|--------------|-----|---------------|----------|
| Free | $0 | 300 | 5 | 10MB | 1 |
| Pro | $19/mo | 3,000 | 30 | 50MB | 3 |
| Business | $49/mo | 20,000 | 60 | 100MB | 10 |
| Scale | $99/mo | 50,000 | 200 | 200MB | 50 |
| Enterprise | Custom | Custom | Custom | Custom | Unlimited |

Credit costs per request: 1 for text formats (txt, md, json, csv, html, xml, feed, notebook, email, audio, image), 2 for documents (docx, pptx, xlsx, epub), 3 for PDF/webpage/video, 5 for YouTube.

Overage: Pro $8/1k credits, Business $6/1k, Scale $5/1k.

---

## SDKs

TypeScript:
```
npm install parsejet
```

```typescript
import { ParseJet } from "parsejet";
const client = new ParseJet({ apiKey: "pj_YOUR_KEY" });
const result = await client.parse.url("https://example.com");
console.log(result.text);
```

Python:
```
pip install parsejet
```

```python
from parsejet import ParseJet
client = ParseJet(api_key="pj_YOUR_KEY")
result = client.parse.url("https://example.com")
print(result.text)
```

Raw HTTP (no SDK, no API key — anonymous):
```bash
curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://en.wikipedia.org/wiki/Main_Page"}'
```