Repair Corrupted PDF — Rebuild xref Online Free, In-Browser

PDF won't open and the deadline is tomorrow

That is the situation behind the search query "repair corrupted pdf online." The file opened yesterday. Today Adobe Reader throws "There was an error opening this document. The file is damaged and could not be repaired." The PDF holds a signed contract, a tax return, or a thesis chapter, and it needs to render in the next few hours.

The first instinct is to drop it into the first repair tool that ranks for the query. Most of those tools upload the file to a server, run a desktop PDF repair binary on it, and email back a result. If the file holds anything sensitive, that path is wrong — corrupted PDFs disproportionately contain financial statements, ID scans, and legal documents, exactly the categories where a server round-trip is the worst privacy tradeoff.

This tool runs entirely in your browser. It fixes the most common form of PDF corruption — a damaged cross-reference table — without ever uploading your file. When it cannot help, it says so clearly and points you at qpdf, the open-source desktop tool that handles deeper damage. We would rather tell you the truth than guess.

How the repair works

Every PDF carries a cross-reference table, the xref, that lists every object in the file and where to find it by byte offset. It normally sits near the end of the file: PDF readers jump to the bottom, read the startxref pointer, follow it to the xref, and use the table as an index. When the xref is damaged but the page objects above it are intact, the file is unreadable even though the actual content is fine.
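As a rough illustration of that "jump to the bottom" step, here is a minimal sketch of how a reader might locate the xref. It looks for the startxref keyword near the end of the file and parses the byte offset that follows it. The function names and the 2048-byte search window are our own assumptions for this example, not part of any particular reader or of this tool.

```javascript
// Convert bytes to a string with one character per byte, so that
// string indices equal byte offsets.
function toBinaryString(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    out += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return out;
}

// Return the byte offset recorded after the last "startxref" keyword,
// or null when the keyword or its number is missing (a typical symptom
// of a truncated file).
function findStartXref(bytes) {
  // Only the tail matters here; 2048 bytes is an arbitrary window (assumption).
  const tailStart = Math.max(0, bytes.length - 2048);
  const tail = toBinaryString(bytes.subarray(tailStart));
  const at = tail.lastIndexOf("startxref");
  if (at === -1) return null;
  const match = /^startxref\s+(\d+)/.exec(tail.slice(at));
  return match ? Number(match[1]) : null;
}
```

A null result, or an offset that points past the end of the file, is a strong hint that the tail of the file was cut off and the xref needs rebuilding.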

The repair makes up to two attempts and reports one of three result states:

GREEN — lenient load succeeds

We try pdf-lib's permissive parser first with throwOnInvalidObject disabled. If the file loads, the corruption was minor enough that the parser's own recovery handled it. We re-save with normalized structure, which strips object streams and incremental updates that some viewers choke on. Useful even for "mostly fine" files.

YELLOW — xref rebuilt

The lenient load failed. We scan the byte stream for "N N obj" patterns, record their byte offsets, locate the catalog reference (/Root), and synthesize a fresh xref table plus trailer. We append the new xref to the original byte stream, then re-attempt the lenient load on the patched bytes. If it loads, you get a working PDF. Form fields and signatures may not survive — that is the cost of xref reconstruction.
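The rebuild described above can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the tool's actual source: all function names are hypothetical, gaps in object numbering are handled by writing one xref subsection per contiguous run, and the fallback that looks for a /Type /Catalog object when no /Root reference survives is our own addition. Objects packed inside compressed object streams are invisible to a byte scan like this one.

```javascript
const WS = "[\\0\\t\\n\\f\\r ]"; // PDF whitespace, as a regex character class

// One character per byte, so string indices equal byte offsets.
function toBinaryString(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    out += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return out;
}

// Map object number -> { offset, gen }. A later definition of the same
// number overwrites the earlier one.
function scanObjects(text) {
  const objects = new Map();
  const pattern = new RegExp(`(\\d+)${WS}+(\\d+)${WS}+obj(?![A-Za-z])`, "g");
  for (const m of text.matchAll(pattern)) {
    const num = Number(m[1]);
    if (num > 0) objects.set(num, { offset: m.index, gen: Number(m[2]) });
  }
  return objects;
}

// Prefer the last surviving "/Root N G R". Fallback (assumption): the first
// scanned object whose body declares /Type /Catalog.
function findRoot(text, objects) {
  const rootRef = new RegExp(`/Root${WS}*(\\d+)${WS}+(\\d+)${WS}+R(?![A-Za-z])`, "g");
  const refs = [...text.matchAll(rootRef)];
  if (refs.length > 0) {
    const last = refs[refs.length - 1];
    return { num: Number(last[1]), gen: Number(last[2]) };
  }
  const isCatalog = new RegExp(`/Type${WS}*/Catalog(?![A-Za-z])`);
  for (const [num, { offset, gen }] of objects) {
    const end = text.indexOf("endobj", offset);
    const body = text.slice(offset, end === -1 ? offset + 4096 : end);
    if (isCatalog.test(body)) return { num, gen };
  }
  return null;
}

// Each xref entry is exactly 20 bytes: 10-digit offset, 5-digit generation,
// a type letter, and a two-byte line ending (space + newline).
function buildXrefAndTrailer(objects, root, xrefOffset) {
  const entry = (offset, gen, kind) =>
    `${String(offset).padStart(10, "0")} ${String(gen).padStart(5, "0")} ${kind} \n`;
  const nums = [...objects.keys()].sort((a, b) => a - b);
  let out = "xref\n0 1\n" + entry(0, 65535, "f"); // object 0 is always free
  for (let i = 0; i < nums.length; ) {
    let j = i;
    while (j + 1 < nums.length && nums[j + 1] === nums[j] + 1) j++;
    out += `${nums[i]} ${j - i + 1}\n`; // subsection header: first number, count
    for (let k = i; k <= j; k++) {
      const { offset, gen } = objects.get(nums[k]);
      out += entry(offset, gen, "n");
    }
    i = j + 1;
  }
  const size = (nums.length ? nums[nums.length - 1] : 0) + 1;
  out += `trailer\n<< /Size ${size} /Root ${root.num} ${root.gen} R >>\n`;
  return out + `startxref\n${xrefOffset}\n%%EOF\n`;
}

// Append a fresh xref and trailer to the original bytes, or return null
// when no objects or no catalog could be found.
function rebuildXref(bytes) {
  const text = toBinaryString(bytes);
  const objects = scanObjects(text);
  const root = findRoot(text, objects);
  if (objects.size === 0 || root === null) return null;
  const xrefOffset = bytes.length + 1; // +1 for the separating newline below
  const tail = "\n" + buildXrefAndTrailer(objects, root, xrefOffset);
  const patched = new Uint8Array(bytes.length + tail.length);
  patched.set(bytes);
  for (let i = 0; i < tail.length; i++) patched[bytes.length + i] = tail.charCodeAt(i);
  return patched;
}
```

Because the new table is appended rather than spliced in, the original bytes stay untouched and every recorded offset remains valid; the patched bytes can then go back through the lenient load.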

RED — beyond in-browser repair

Both attempts failed. Either the damage is in the page objects or content streams themselves, or the file lacks a recognizable %PDF header. In-browser tools cannot fix this: proper recovery needs binary tools that decompress object streams, walk damaged content streams, and attempt page-tree reconstruction. We hand you the qpdf command that does the right thing locally.
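The GREEN / YELLOW / RED decision can be summarized as a small control-flow sketch. The loader and the rebuild step are passed in as parameters so the logic can be read and tested without a PDF library. In a real page, load might wrap pdf-lib's PDFDocument.load with lenient options, and rebuild would be the xref reconstruction; both of those bindings, and the name classifyRepair, are assumptions made for this example.

```javascript
// Try the lenient load, then the xref rebuild, and report which state applied.
// `load(bytes)` must resolve to a document or reject; `rebuild(bytes)` must
// return patched bytes or null. Both are hypothetical injected helpers.
async function classifyRepair(bytes, { load, rebuild }) {
  try {
    return { state: "GREEN", doc: await load(bytes) };
  } catch (_) {
    // Lenient load failed: fall through to the xref rebuild.
  }
  const patched = rebuild(bytes);
  if (patched !== null) {
    try {
      return { state: "YELLOW", doc: await load(patched) };
    } catch (_) {
      // The rebuilt file still does not load.
    }
  }
  return { state: "RED", doc: null };
}
```

Keeping the state machine separate from the parsing code makes it easy to see that RED is simply "nothing above worked", with no partial output handed back.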

What this tool can't fix

Honest scope is the point of this page. Other "PDF repair" sites imply they fix every corruption case. They don't, but the easy xref-only cases (what we call YELLOW) let them claim a success rate. We'd rather list the failure modes plainly so you don't waste time before reaching for the right tool.

1. Page-object damage

If the corruption hit the content streams inside individual pages — the actual drawing instructions, embedded fonts, or compressed image data — rebuilding the xref doesn't help. The xref points at objects that exist but are themselves corrupt.

Local fix: qpdf --decode-level=generalized broken.pdf fixed.pdf. qpdf can decompress and re-encode content streams, which sometimes resurrects damaged pages. mutool clean (part of MuPDF) is a stronger fallback.

2. Missing or corrupt %PDF header

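Every PDF is supposed to start with "%PDF-1.x" or "%PDF-2.x"; in practice readers such as Acrobat accept the header anywhere in the first 1024 bytes, so that is the window we check. If the file was truncated at the start, or if it's actually a different file format mislabeled as .pdf, our header check fails and we abort.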
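A header check along these lines can be very small. The sketch below is an illustration only (findPdfHeader is a hypothetical name, not the tool's actual function): it searches the first 1024 bytes for a "%PDF-x.y" marker and reports where it starts and which version it declares.

```javascript
// Look for "%PDF-x.y" starting within the first 1024 bytes.
// Returns { offset, version } or null when no header is found.
function findPdfHeader(bytes) {
  // Read 7 extra bytes so a header that starts at byte 1023 is still complete.
  const head = String.fromCharCode(...bytes.subarray(0, 1024 + 7));
  const match = /%PDF-(\d)\.(\d)/.exec(head);
  if (!match || match.index >= 1024) return null;
  return { offset: match.index, version: `${match[1]}.${match[2]}` };
}
```

A non-zero offset means there is junk in front of the header, which some viewers tolerate and others do not.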

Local fix: open the file in a hex editor or viewer (HxD on Windows, hexyl on macOS/Linux; note that hexyl only displays bytes, so you need a separate tool to edit). If you see PDF object syntax further into the file, you can manually prepend "%PDF-1.7\n" and try again. If the file shows JPEG, ZIP, or RTF magic bytes, it was never a PDF.
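If you would rather script that inspection than read hex by eye, the same checks fit in a few lines. This is a sketch with hypothetical helper names; the signature list covers only the formats mentioned above and is not exhaustive.

```javascript
// Leading "magic bytes" for the formats named above.
const SIGNATURES = [
  { kind: "pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] },  // %PDF-
  { kind: "jpeg", magic: [0xff, 0xd8, 0xff] },
  { kind: "zip", magic: [0x50, 0x4b, 0x03, 0x04] },        // PK\x03\x04
  { kind: "rtf", magic: [0x7b, 0x5c, 0x72, 0x74, 0x66] },  // {\rtf
];

// Classify a file by its first bytes; "unknown" when nothing matches.
function sniffFileType(bytes) {
  for (const { kind, magic } of SIGNATURES) {
    if (magic.every((b, i) => bytes[i] === b)) return kind;
  }
  return "unknown";
}

// Return a copy of the bytes with a "%PDF-<version>\n" header in front.
function prependPdfHeader(bytes, version = "1.7") {
  const header = Uint8Array.from(`%PDF-${version}\n`, (c) => c.charCodeAt(0));
  const out = new Uint8Array(header.length + bytes.length);
  out.set(header);
  out.set(bytes, header.length);
  return out;
}
```

One caveat: prepending shifts every later byte offset by the length of the header, so any xref offsets that survived in the file become stale and the result will still need an xref rebuild.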

3. Encrypted PDFs with corrupt encryption dictionary

Password-protected PDFs have an encryption dictionary that tells readers how to decrypt content streams. If that dictionary itself is mangled, the file is essentially scrambled. Bulk reconstruction of an encrypted PDF requires the password plus the original encryption parameters, which we cannot guess.

Local fix: if you know the password, qpdf can decrypt and re-save: qpdf --password=YOUR_PW --decrypt broken.pdf decrypted.pdf. If you don't know the password and the encryption dictionary is also damaged, recovery is unlikely with any tool.

4. Broken page tree

The page tree (a /Pages object pointing at every page) glues the document together. If the page tree is missing or its references are circular, our xref rebuild succeeds but the loaded PDF has zero pages.

Local fix: mutool clean attempts page-tree reconstruction by walking discovered Page objects. Install MuPDF tools and run mutool clean -d broken.pdf fixed.pdf.
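To judge whether this category applies before installing anything, one rough diagnostic is to count the Page objects that are still physically present in the file. The sketch below uses hypothetical names and a simple byte scan, so it cannot see pages stored inside compressed object streams. If it finds pages while the repaired PDF reports zero, the page tree is the likely culprit.

```javascript
// One character per byte, so string indices equal byte offsets.
function toBinaryString(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    out += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return out;
}

// List the objects whose body declares /Type /Page (but not /Type /Pages).
function findPageObjects(bytes) {
  const text = toBinaryString(bytes);
  const objectPattern =
    /(\d+)[\0\t\n\f\r ]+(\d+)[\0\t\n\f\r ]+obj(?![A-Za-z])([\s\S]*?)endobj/g;
  // The lookahead rejects "/Pages", which is the tree node rather than a leaf page.
  const isPage = /\/Type[\0\t\n\f\r ]*\/Page(?![A-Za-z])/;
  const pages = [];
  for (const m of text.matchAll(objectPattern)) {
    if (isPage.test(m[3])) pages.push({ num: Number(m[1]), gen: Number(m[2]) });
  }
  return pages;
}
```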

5. Linearized PDFs with damaged hint table

Linearized ("web-optimized") PDFs have a hint table that lets browsers stream-render before the full file arrives. If only the hint table is damaged, our lenient load typically passes and you get GREEN. But if the linearization metadata is mixed up with xref damage, the rebuild can produce a structurally valid file that some readers refuse to open.

Local fix: qpdf --linearize broken.pdf fixed.pdf rebuilds linearization from scratch.
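To check whether a file is linearized in the first place, you can look for the /Linearized key, which the PDF specification places in the first object, within the first 1024 bytes. A minimal sketch, with a hypothetical function name:

```javascript
// True when the start of the file contains a "/Linearized <number>" entry.
function isLinearized(bytes) {
  const head = String.fromCharCode(...bytes.subarray(0, 1024));
  return /\/Linearized[\0\t\n\f\r ]+\d/.test(head);
}
```

If this returns false, linearization is not involved and the qpdf --linearize step is unnecessary.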

If your file falls into one of these five categories, paying an online repair service is unlikely to get you further than the commands above: such services typically run the same open-source binaries we're recommending you run locally, just on their server with your file in the middle.

Why browser-local matters for damaged PDFs

Corrupted PDFs are a privacy edge case. Files that fail to open are disproportionately important — backups people are checking after a hardware failure, statements pulled from a defunct portal, contracts saved during a flaky internet session. Routine PDFs don't get repaired; meaningful ones do.

That makes the upload-to-fix path a worse tradeoff than it looks. A typical online repair flow uploads the file, runs qpdf or pdftk on a Linux box, and downloads the result. The user's tax return, credit card statement, or signed NDA sits on a third-party server for the round trip. Most services delete after 24 hours; some don't. The privacy posture is whatever the cheapest TOS lets them get away with.

Browser-local repair removes the upload entirely. The file is read into the page's memory, processed by JavaScript, and offered as a download. Open dev tools, watch the Network tab — there are no PDF requests. The privacy claim is verifiable, not just stated.

Frequently asked questions

What kinds of PDF corruption does this tool actually fix?

It fixes the xref-corrupt case, where the cross-reference table at the end of the file is mangled but the page objects themselves are intact. This is the most common form of PDF damage — a partial download, an interrupted save, or a transfer that truncated the trailer without affecting the page content. The tool scans the byte stream for intact PDF objects, rebuilds a fresh xref table pointing at them, and writes a new trailer.

What kinds of corruption does it not fix?

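Five failure modes are out of scope, each covered in detail under "What this tool can't fix." First: page-object damage, where the content streams inside individual pages are corrupted. Second: missing %PDF header, where the first 1024 bytes don't identify the file as a PDF, usually meaning the start of the file was truncated. Third: encrypted PDFs whose encryption dictionary is itself corrupt. For these three, the tool returns a RED result and points you at qpdf, a desktop tool that can attempt deeper recovery. Fourth: a broken page tree, where the rebuild succeeds but the loaded PDF has zero pages. Fifth: linearized PDFs whose damaged hint table is mixed up with xref damage, where the rebuilt file can be structurally valid yet rejected by some readers. For these two, mutool clean and qpdf --linearize are the local fixes.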

Why use qpdf instead of an online repair service?

qpdf is free, open-source, and runs locally on your machine. Online repair services upload your PDF to their server — corrupted PDFs frequently contain financial statements, tax returns, or signed contracts, and shipping those to an unknown server is the wrong tradeoff when a local tool exists. The command we recommend is qpdf --decode-level=generalized --object-streams=disable broken.pdf fixed.pdf, which attempts the same xref reconstruction we do plus content-stream recovery we don't.

Does the tool work on encrypted or password-protected PDFs?

Partially. The lenient-load step sets pdf-lib's ignoreEncryption option to true, which can pass through PDFs whose encryption dictionary is intact. If the encryption metadata itself is corrupted, the file falls through to xref rebuild, which sometimes works and sometimes does not. If you have a password and need to remove it after repair, use our PDF Unlocker tool.

Will form fields, signatures, and annotations survive repair?

Form field values and digital signatures are often stored as incremental updates layered on top of the original PDF. Rebuilding the xref table flattens the file to its base state, which can drop those layers. Even when the signature data survives, any rewrite changes the bytes the signature covers, so it will no longer validate. Plain-text content, page layouts, and embedded images are preserved. If you need form values back, you may have to re-enter them after repair.

Is anything uploaded to a server?

No. The repair runs entirely in your browser using JavaScript. The file is read into memory, processed, and the result is offered as a download; it never crosses a network. You can verify this by opening browser dev tools, switching to the Network tab, running a repair, and confirming that no request carries your file.

How is this different from iLovePDF or PDF2Go's repair tool?

Both upload your PDF to their servers. We do not. Beyond that, server-side tools can run heavier reconstruction (qpdf, pdftk, mutool) that we cannot bundle into a browser. We make the tradeoff explicit: privacy first, with an honest scope. If we can't fix it, we tell you which local tool can — we don't pretend to handle every corruption case.