Why Can't I Copy Text from a PDF?

You open a PDF, try to select some text, and... nothing happens. Or the text selects but pastes as gibberish. This is one of the most common frustrations with PDFs, and it almost always comes down to one of these 6 reasons.

1. The PDF is a scanned image (most common)

This is the #1 reason people can't copy text from a PDF. When a document is scanned — with a flatbed scanner, a multifunction printer, or a phone camera app like CamScanner — the resulting PDF contains a photograph of each page, not actual text characters. Your PDF viewer renders it as a normal-looking document, but there is literally nothing to select because every page is just a picture.

This is extremely common with older documents, government forms, academic papers from before the digital era, and anything you received as a physical printout that someone later scanned to share electronically.

How to tell: Try clicking and dragging across the text. If nothing highlights at all, or the entire page selects as one big block (like selecting an image), it's a scanned PDF. Another test: zoom to 400% — if the text looks slightly blurry or pixelated like a photograph, it's an image.
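If you have many files to check, the same test can be automated by asking a PDF library for each page's text and flagging pages that come back empty. The sketch below is a rough heuristic, not a definitive detector. It assumes the third-party pypdf package is installed, and the function names and the 20-character threshold are illustrative choices, not anything prescribed by a tool mentioned in this article.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is (nearly) empty.

    min_chars is an arbitrary threshold chosen for this sketch.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(path):
    """Extract text page by page with pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Usage, assuming a local file named report.pdf exists:
#   suspect_pages = pages_without_text(extract_page_texts("report.pdf"))
```

Pages flagged this way are probably scans, but text converted to vector outlines (#3) also yields no text, and a scan that already carries an OCR text layer will not be flagged.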

Fix: You need OCR (Optical Character Recognition) software to convert the images into text. Free options include Google Docs (upload to Google Drive, then choose "Open with → Google Docs") and the open-source Tesseract CLI tool, which works on image files, so PDF pages have to be converted to images first. For better accuracy, especially with complex layouts, tables, or non-English text, a dedicated tool like ParseJet applies OCR automatically and preserves reading order.
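For the Tesseract route, a minimal sketch of the two-step pipeline might look like this. It assumes both the tesseract and pdftoppm (part of Poppler) command-line tools are installed and on your PATH. The helper names, the 300 DPI default, and the "page" file prefix are choices made for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that renders each PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path, lang="eng"):
    """Build the tesseract call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_pdf(pdf_path, lang="eng", dpi=300):
    """Rasterize a PDF, OCR every page image, and return the combined text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(rasterize_command(pdf_path, prefix, dpi), check=True)
        texts = []
        # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for long files)
        for image in sorted(Path(tmp).glob("page-*.png")):
            result = subprocess.run(
                ocr_command(str(image), lang),
                check=True,
                capture_output=True,
                text=True,
            )
            texts.append(result.stdout)
    return "\n".join(texts)
```

Calling ocr_pdf("scan.pdf") on a scanned file would return the recognized text as one string. Accuracy depends heavily on scan quality and on passing the right language code.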

2. The PDF has copy protection enabled

PDF creators can set security permissions that specifically disable text selection and copying. This is common with published ebooks, corporate reports marked "confidential," government publications, and documents from paid databases like JSTOR or IEEE.

You can usually still read the document on screen — the restriction only blocks the copy function. Some viewers show a lock icon or display "Secured" in the title bar.

How to tell: In Adobe Acrobat Reader, go to File → Properties → Security tab. Look at "Document Restrictions Summary." If "Content Copying" shows "Not Allowed," copy protection is active. In Chrome's PDF viewer, try Ctrl+A. If nothing selects, the PDF may be restricted, although a scanned PDF (#1) behaves the same way.
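Under the hood, these restrictions are stored as a set of bit flags. In a PDF that uses the standard security handler, the encryption dictionary carries an integer entry named /P, and according to the PDF specification bit 5 of that value (counting from 1) controls copying and extracting text. The sketch below only decodes the flag. How you obtain the /P integer from a file is left out, and the function and constant names are made up for this example.

```python
# Permission flags from the PDF specification (bit positions count from 1).
PRINT_BIT = 1 << 2  # bit 3: printing
COPY_BIT = 1 << 4   # bit 5: copying or extracting text and graphics


def copy_allowed(p_value):
    """Return True if the /P permission integer allows copying text.

    /P is a signed 32-bit value, so it is usually negative. Python's
    bitwise AND handles negative integers as two's complement.
    """
    return bool(p_value & COPY_BIT)


def print_allowed(p_value):
    """Return True if the /P permission integer allows printing."""
    return bool(p_value & PRINT_BIT)
```

For example, a /P value of -3904 has the copy bit cleared, while -44 has it set. Keep in mind that these flags are a request to the viewing software. A viewer that honors them disables the copy command.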

Fix: If you have legitimate access to the content (you purchased it, received it through authorized channels, or it's a public government document), tools that process the PDF server-side can extract the text. Google Docs often works — upload to Drive and open as a Google Doc. ParseJet also handles this, since it processes the PDF on its server rather than respecting client-side restrictions.

3. The text is rendered as vector outlines

This is a sneaky one. Some PDFs look perfectly crisp and professional, but the "text" is actually made up of vector shapes — curves and paths that draw the outline of each letter, rather than font characters. This happens when a designer exports from Adobe Illustrator, InDesign, or Figma with the "Convert text to outlines" option enabled (often done to avoid font licensing issues).

The result looks identical to real text on screen, but to the computer, each letter is an abstract drawing — like a tiny logo. There are no characters to select or copy.

How to tell: Zoom in to 800% or more on a character. Unlike a scan, outlined text stays perfectly sharp because it is vector artwork, so sharpness alone does not distinguish it from real text. The giveaway is selection: you won't be able to select individual characters. Your cursor will select the entire text block as one object, or nothing at all. Another sign: the PDF file size is unusually large for a text-heavy document, because vector outlines take more space than font-rendered text.

Fix: Since the original character data is gone, OCR is the only way to recover the text. Upload the PDF to ParseJet or Google Docs — the OCR engine reads the visual shapes and outputs real text characters.

4. Custom font encoding makes text paste as gibberish

This is different from not being able to select text — here, you can select and copy just fine, but when you paste, you get garbage: □□□□, random symbols like "˙ˆ˜¯", or completely wrong characters. The PDF looks fine on screen because the viewer uses the embedded font to render it, but the underlying character codes are non-standard.

This happens when the PDF creator embedded a subset of a font that uses a custom encoding table and did not include a correct mapping back to standard Unicode characters. Instead of mapping character code 65 to the letter "A" (standard ASCII), the font might map code 65 to "Z" or some other character. The viewer draws the right glyphs because it looks each code up in the embedded font, but copy-paste has only the raw character codes to go on, so you get the wrong characters.
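A toy model makes the mechanism concrete. The code below is not how any real font or PDF library works internally. It is a simplified simulation in which a hypothetical subset font hands out codes in order of first use, starting at 65. In a real PDF, the role of the to_unicode table is played by the font's ToUnicode map, which may be missing or wrong.

```python
def build_subset_encoding(text, first_code=65):
    """Assign codes to characters in order of first appearance (toy model)."""
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = first_code + len(code_for_char)
    return code_for_char


def draw(text, code_for_char):
    """Return the character codes that would be stored in the page content."""
    return [code_for_char[ch] for ch in text]


def copy_without_mapping(codes):
    """Simulate copy-paste that treats the stored codes as standard characters."""
    return "".join(chr(code) for code in codes)


def copy_with_mapping(codes, to_unicode):
    """Simulate extraction that uses a code-to-character table."""
    return "".join(to_unicode[code] for code in codes)


encoding = build_subset_encoding("Hello")
codes = draw("Hello", encoding)                        # [65, 66, 67, 67, 68]
to_unicode = {code: ch for ch, code in encoding.items()}

print(copy_without_mapping(codes))           # ABCCD
print(copy_with_mapping(codes, to_unicode))  # Hello
```

The page still displays "Hello" because the font draws the right shape for each code. Only the text you paste is wrong.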

How to tell: Select a line of text, paste it into Notepad or any plain text editor. If the result is unreadable — symbols, wrong letters, or empty squares — encoding is the culprit.
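When you are checking many documents, a crude script can stand in for the eyeball test. The sketch below counts characters that are neither plain ASCII nor Unicode letters, digits, or punctuation, which catches empty squares, replacement characters, and stray symbols. The function names and the 0.3 threshold are arbitrary choices for this example, and the heuristic cannot detect the "wrong letters" case, where the pasted text consists of ordinary characters in a meaningless order.

```python
import unicodedata


def is_ordinary(ch):
    """Treat printable ASCII and any Unicode letter, digit, or punctuation as normal."""
    if " " <= ch <= "~":
        return True
    return unicodedata.category(ch)[0] in "LNP"


def garbled_ratio(text):
    """Return the share of non-whitespace characters that look suspicious."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    suspicious = sum(1 for ch in chars if not is_ordinary(ch))
    return suspicious / len(chars)


def looks_garbled(text, threshold=0.3):
    """Flag text when suspicious characters reach the (arbitrary) threshold."""
    return garbled_ratio(text) >= threshold
```

Text in languages that use accented letters or non-Latin scripts still counts as ordinary here, because those characters are classified as letters.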

Fix: Standard copy-paste can't solve this because the issue is in how the characters are stored. Tools that analyze the PDF's internal font tables can remap the characters back to standard encoding. ParseJet does this automatically during extraction. Alternatively, you can try Adobe Acrobat Pro's "Save As Text" function, which sometimes resolves encoding better than copy-paste.

5. Multi-column layout scrambles text order

Here, copying technically works, but the result is unusable. In PDFs with two or three columns (common in academic papers, newspapers, magazines, and newsletters), selecting text with your cursor often grabs text left-to-right across the full page width. Line 1 of column A gets concatenated with line 1 of column B, then line 2 of column A with line 2 of column B, creating an alternating mess.

Tables have the same problem. When you select and copy a table, you usually get cell values jumbled in an unpredictable order, with no clear separation between rows and columns.

How to tell: Select text in a multi-column area, paste it into a text editor, and read it. If alternating lines seem to come from different parts of the page, layout is the issue.

Fix: You need a tool that detects columns and reads each one separately, in order. Adobe Acrobat Pro has a "Reading Order" tool but it requires manual correction. ParseJet detects columns, tables, and reading order automatically, extracting text in the correct sequence.
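The difference between the two reading strategies can be shown with a few positioned text fragments. This is a simplified illustration, not the algorithm used by Acrobat, ParseJet, or any other tool named here. The Line type, the coordinates, and the fixed gutter position at x = 300 are all assumptions for the example. Real tools have to detect where the columns are.

```python
from typing import NamedTuple


class Line(NamedTuple):
    x: float   # left edge of the text fragment
    y: float   # distance from the top of the page
    text: str


def naive_order(lines):
    """Read top to bottom, left to right across the full page width."""
    return [line.text for line in sorted(lines, key=lambda l: (l.y, l.x))]


def column_order(lines, gutter_x):
    """Read the left column top to bottom, then the right column."""
    left = sorted((l for l in lines if l.x < gutter_x), key=lambda l: l.y)
    right = sorted((l for l in lines if l.x >= gutter_x), key=lambda l: l.y)
    return [line.text for line in left + right]


page = [
    Line(50, 100, "A1"), Line(320, 100, "B1"),
    Line(50, 120, "A2"), Line(320, 120, "B2"),
]

print(naive_order(page))        # ['A1', 'B1', 'A2', 'B2']
print(column_order(page, 300))  # ['A1', 'A2', 'B1', 'B2']
```

The first result is the alternating mess you get from a plain copy-paste. The second is the order a human would read.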

6. The PDF is corrupted or incomplete

Sometimes the PDF file itself is damaged — truncated during a download (the file size is suspiciously small), created by buggy software, or partially overwritten. The viewer may still render some or all pages visually, but the internal text data is missing or broken, so selection and copying fail silently.

How to tell: Check for warning messages when opening the PDF ("This document may be damaged"). Compare the file size to what you'd expect — a 200-page report that's only 50 KB is almost certainly corrupt. Try opening the file in a different viewer (Chrome vs Adobe vs Preview) — if they all have trouble, the file is damaged.
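Two structural markers give a quick first check: a well-formed PDF begins with a %PDF- header and ends with a %%EOF marker. The sketch below tests for both. The function names and the 1024-byte search windows are choices made for this example. Passing the check does not prove a file is healthy, but failing it is a strong hint of truncation or of a file that is not a PDF at all.

```python
def quick_pdf_check(data):
    """Return a list of structural problems found in raw PDF bytes."""
    problems = []
    # The header should appear at (or very near) the start of the file.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header")
    # A complete file ends with an %%EOF marker; truncated downloads lack it.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF marker near end of file")
    return problems


def check_file(path):
    """Read a file from disk and run the quick structural check on it."""
    with open(path, "rb") as handle:
        return quick_pdf_check(handle.read())
```

An empty list from check_file("report.pdf") means both markers were found. A download that was cut off partway typically reports the missing %%EOF marker.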

Fix: First, try downloading the file again from the original source. If that's not possible, try opening it in Google Chrome (which has a relatively tolerant PDF renderer) and copying from there. As a last resort, ParseJet can often extract text from partially corrupt PDFs that cause other tools to fail entirely, because it processes the raw PDF byte stream rather than relying on a standard PDF rendering pipeline.

Summary: how to identify and fix your specific problem

Can't select text at all → Most likely a scanned image (#1), vector outlines (#3), or copy protection (#2). Try Google Docs first (free), then a dedicated tool like ParseJet for stubborn cases.

Text selects but pastes as gibberish → Custom font encoding (#4). Use ParseJet or Adobe Acrobat Pro's "Save As Text" to remap the characters.

Text copies but is in the wrong order → Multi-column or table layout (#5). Use a layout-aware extraction tool like ParseJet.

Can't open the file or some pages are blank → Corrupted PDF (#6). Re-download from the source, or try ParseJet, which handles partial corruption.

Frequently asked questions

Why can't I highlight or select text in my PDF?

Most likely the PDF is a scanned image (not real text) or has copy protection enabled. Use ParseJet to extract the text — it handles both cases automatically via OCR and server-side processing.

Why does text from my PDF paste as gibberish?

This happens when the PDF uses custom font encoding that maps characters to non-standard positions. ParseJet resolves encoding during extraction, returning clean readable text.

How do I know if a PDF is scanned or text-based?

Try selecting text with your cursor. If you can highlight individual words, it's text-based. If nothing highlights or the entire page selects as one block, it's most likely a scanned image.

Can I copy text from a protected PDF legally?

If you have legitimate access to the content (you purchased it, it's a public document, etc.), extracting text for personal use is generally fine. ParseJet processes files server-side without cracking passwords — it simply extracts the visible text content.

Why does copy-paste from PDFs mix up columns?

Many PDF viewers select text left-to-right across the full page width, ignoring column boundaries. Use a layout-aware extraction tool like ParseJet that detects columns and extracts text in the correct reading order.
