ParseJet

PDF to TXT File Converter

Need a .txt file from your PDF? Upload your document and get pure plain text output — no Markdown, no HTML, no formatting tags. Just raw text content you can save as a .txt file, pipe into a script, or import into any system that accepts plain text input.

Drop a file here or browse

Accepts PDF files

Free: 3 requests/day, no signup. Create a free account for 300 credits/month.

How it works

1. Upload your PDF

Drop a PDF file above or click to browse. Works with any PDF — text-based, scanned, or mixed content.

2. Extract as plain text

ParseJet strips all formatting — bold, italic, colors, fonts, headers, footers, page numbers — and returns pure text content in reading order.

3. Save as .txt

Copy the output and save it as a .txt file locally. Or use the API to batch-convert entire PDF folders to .txt files programmatically.

Key features

What makes this PDF to TXT converter stand out.

Pure .txt output

No Markdown syntax, no HTML tags, no formatting artifacts. Just raw text — exactly what tools like grep, awk, and sed expect as input.
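
Because the output carries no markup, line-oriented tools can work on it directly. As a rough illustration, the sketch below does a grep-style search over a folder of converted files. The `search_txt` helper and the `txt_output` folder name are assumptions made for this example, not part of ParseJet.

```python
import re
from pathlib import Path


def search_txt(folder, pattern):
    """Yield (file name, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(folder).glob("*.txt")):
        # Read as UTF-8, the encoding the converted files are saved in.
        with path.open(encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, number, line.rstrip("\n")
```

For example, `list(search_txt("txt_output", r"Invoice \d+"))` would list every matching line along with the file and line number it came from.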

UTF-8 encoded

Output is always UTF-8 encoded, handling international characters, CJK text, and special symbols correctly in the resulting .txt file.

Scanned PDF → TXT

Image-only PDFs are processed with OCR automatically. The scanned pages become real text in your .txt output.

Batch conversion ready

Use the API to convert an entire directory of PDFs to .txt files in a single script. See the Python and Node.js examples below.

Noise removal

Automatically strips headers, footers, page numbers, and watermarks that would clutter a .txt file.

Use cases

Common scenarios where this tool saves you time.

Data pipeline input

Convert PDFs to .txt files for ingestion into ETL pipelines, Apache Spark, pandas DataFrames, or data warehouses. Plain text is the universal input format.
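
As one possible starting point for such a pipeline, the sketch below gathers a folder of converted files into a list of records. The `load_records` helper and the field names `source`, `chars`, and `text` are illustrative choices, not a ParseJet format.

```python
from pathlib import Path


def load_records(folder):
    """Return one dict per .txt file: source name, character count, full text."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        records.append({"source": path.name, "chars": len(text), "text": text})
    return records
```

A list of dicts like this can be passed straight to `pandas.DataFrame(records)` if pandas is installed, or written out with the standard `csv` or `json` modules.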

Search engine indexing

Batch-convert a PDF archive to .txt files for indexing in Elasticsearch, Solr, Meilisearch, or any full-text search engine that reads plain text.
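
For Elasticsearch specifically, converted files can be packed into the newline-delimited JSON body that its bulk API accepts. The following is a minimal sketch: the index name `pdf-text` and the document fields `filename` and `content` are assumptions for illustration, and other engines such as Solr or Meilisearch expect different payloads.

```python
import json
from pathlib import Path


def build_bulk_payload(folder, index_name="pdf-text"):
    """Build a newline-delimited JSON body in the Elasticsearch bulk format."""
    lines = []
    for path in sorted(Path(folder).glob("*.txt")):
        # Action line: index this document, using the file stem as its ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": path.stem}}))
        # Source line: the document itself (field names are hypothetical).
        lines.append(json.dumps({
            "filename": path.name,
            "content": path.read_text(encoding="utf-8"),
        }))
    # The bulk format requires a newline after the final line.
    return "\n".join(lines) + "\n" if lines else ""
```

The returned string would be sent as the body of a POST to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.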

Training data for ML/AI

Build text corpora from PDF document collections. Save each PDF as a .txt file to create clean training datasets for language models, classifiers, or NER systems.
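
One common way to package such a corpus is a JSON Lines file with one document per line. The sketch below assumes the converted files already sit in a local folder. The `build_corpus` helper, the `id` and `text` field names, and the choice to collapse whitespace and drop exact duplicates are all illustrative decisions, not ParseJet behavior.

```python
import hashlib
import json
from pathlib import Path


def build_corpus(folder, out_path):
    """Write one JSON object per unique document; return how many were written."""
    seen = set()
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).glob("*.txt")):
            # Collapse runs of whitespace so layout differences do not matter.
            text = " ".join(path.read_text(encoding="utf-8").split())
            if not text:
                continue  # skip empty extractions
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # skip documents identical to one already written
            seen.add(digest)
            row = {"id": path.stem, "text": text}
            out.write(json.dumps(row, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Collapsing whitespace also removes paragraph breaks, so a corpus that depends on paragraph structure would need a gentler normalization step.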

Legacy system import

Many older systems, databases, and mainframe applications only accept .txt or CSV input. Convert PDFs to .txt for import into these systems without manual retyping.

Automate with the API

Use the same tool programmatically. Works with any language — just HTTP.

cURL
# Convert a single PDF to .txt (document.pdf is a placeholder file name)
curl -X POST https://api.parsejet.com/v1/parse/auto/file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  | jq -r '.text' > output.txt
Python
import httpx
from pathlib import Path

# Batch-convert all PDFs in a folder to .txt files
pdf_dir = Path("pdfs/")
txt_dir = Path("txt_output/")
txt_dir.mkdir(exist_ok=True)

for pdf_file in pdf_dir.glob("*.pdf"):
    resp = httpx.post(
        "https://api.parsejet.com/v1/parse/auto/file",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": (pdf_file.name, pdf_file.read_bytes(), "application/pdf")},
        timeout=60.0,  # httpx defaults to 5 seconds; 60 is an arbitrary, more generous limit
    )
    resp.raise_for_status()  # stop on an HTTP error instead of writing a bad file
    txt_path = txt_dir / pdf_file.with_suffix(".txt").name
    txt_path.write_text(resp.json()["text"], encoding="utf-8")
    print(f"Saved {txt_path}")
JavaScript
import { mkdir, readdir, readFile, writeFile } from "fs/promises";
import { join, basename } from "path";

// Batch-convert all PDFs in a folder to .txt files
const pdfDir = "./pdfs";
const outDir = "./txt_output";
await mkdir(outDir, { recursive: true }); // create the output folder if it is missing

for (const file of await readdir(pdfDir)) {
  if (!file.endsWith(".pdf")) continue;
  const formData = new FormData();
  // Send the file name and MIME type, as the Python example does
  const pdf = new Blob([await readFile(join(pdfDir, file))], { type: "application/pdf" });
  formData.append("file", pdf, file);

  const res = await fetch("https://api.parsejet.com/v1/parse/auto/file", {
    method: "POST",
    headers: { Authorization: "Bearer YOUR_API_KEY" },
    body: formData,
  });
  if (!res.ok) throw new Error(`${file}: HTTP ${res.status}`); // do not write a bad file
  const { text } = await res.json();
  await writeFile(join(outDir, basename(file, ".pdf") + ".txt"), text, "utf8");
}

Want to automate this?

ParseJet API gives you the same parsing power via a single HTTP endpoint. No ffmpeg, no poppler, no tesseract — just one API call.

curl -X POST https://api.parsejet.com/v1/parse/auto/url \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

Frequently asked questions

How do I convert a PDF to a .txt file?

Upload your PDF above — ParseJet extracts all text and returns clean plain text output. Copy it and save as a .txt file, or use the API with output redirection (see the cURL example) to save directly.

What is the difference between PDF to TXT and PDF to Markdown?

PDF to TXT gives you raw plain text with no formatting — ideal for data processing, search indexing, and scripts. PDF to Markdown preserves structure (headings, tables, lists) using Markdown syntax — better for documentation and content migration.

Can I batch-convert multiple PDFs to .txt files?

Yes. Use the ParseJet API to loop through a folder of PDFs and save each as a .txt file. See the Python and JavaScript batch conversion examples above.
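
For larger folders it can help to make the loop resumable, so that a rerun does not spend requests on files that were already converted. The sketch below is one way to structure that, not an official client: `batch_convert` is a hypothetical helper, and `convert` stands for any function you supply that takes PDF bytes and returns the extracted text, such as a thin wrapper around the HTTP request.

```python
from pathlib import Path


def batch_convert(pdf_dir, txt_dir, convert):
    """Convert every PDF in pdf_dir, skipping finished files and collecting failures.

    convert is a caller-supplied callable: PDF bytes in, extracted text out.
    Returns (names of saved .txt files, {failed PDF name: error message}).
    """
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], {}
    for pdf_file in sorted(pdf_dir.glob("*.pdf")):
        txt_path = txt_dir / (pdf_file.stem + ".txt")
        if txt_path.exists():
            continue  # already converted on an earlier run
        try:
            text = convert(pdf_file.read_bytes())
        except Exception as exc:  # record the failure and keep going
            failed[pdf_file.name] = str(exc)
            continue
        txt_path.write_text(text, encoding="utf-8")
        saved.append(txt_path.name)
    return saved, failed
```

Passing the converter in as an argument keeps the file handling separate from the network call, which also makes the loop easy to try out with a stand-in function before using real requests.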

Can I convert a scanned PDF to TXT?

Yes. ParseJet uses OCR to extract text from scanned PDFs and image-based pages automatically. The result is the same clean .txt output.

What encoding does the .txt output use?

ParseJet returns UTF-8 encoded text, which supports all languages and special characters. When saving as a .txt file, use UTF-8 encoding to preserve the content correctly.
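
In Python, for instance, the platform's default text encoding is not always UTF-8, so it is safer to name the encoding explicitly when writing and reading the file. A minimal sketch, where `save_txt` and `load_txt` are hypothetical helper names:

```python
from pathlib import Path


def save_txt(text, path):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    # newline="" writes line endings exactly as they appear in the text.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
    return Path(path)


def load_txt(path):
    """Read the file back with the same explicit encoding."""
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()
```

Reading the file back with the same explicit encoding returns the accented, CJK, and symbol characters unchanged.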

Is it free?

Yes. You get 3 free conversions per day with no signup. Create a free account for 300 credits per month. Paid plans start at $19/month for batch conversion workflows.

Start extracting text for free

No signup required. Parse your first file in seconds.
