The MCA21 portal rejected your PDF — now what
That is the search query behind "pdf to pdf/a online." A company secretary has spent the morning preparing an annual return. The document looks fine in Adobe Reader. The MCA21 upload screen rejects it with "file not in PDF/A format." The deadline is today. The CS pastes the rejection message into a search bar and clicks the first "PDF to PDF/A converter" that ranks.
Most of those tools upload the file to a server, run Ghostscript or a commercial library on a Linux box, and email back a result. Some produce valid PDF/A-1b output. Many produce files that pass the upload but fail when the regulator's validator runs the next business day — at which point the deadline has passed. A second category of tools, including most browser-based ones, calls itself a PDF/A converter while doing little more than setting a PDF/A flag in the metadata. The output renders fine; veraPDF rejects it; the regulator rejects it.
We built this as a checker rather than a converter because the honest path is to tell you exactly what would fail before you waste another upload attempt. The page below explains what we check, what we do not check, and which real tools (free and paid) actually produce valid PDF/A-1b output today.
Why this is a checker, not a converter
A real PDF/A-1b converter has to do four things that pdf-lib (the JavaScript PDF library that runs in the browser) cannot do well:
1. Embed a full ICC color profile
PDF/A-1b §6.2.3 requires that any document using DeviceRGB, DeviceCMYK, or DeviceGray declare an OutputIntent dictionary pointing at an embedded ICC profile (typically sRGB IEC61966-2.1, ~3KB binary). pdf-lib has no built-in API to embed an ICC stream as a /DestOutputProfile reference. We could ship the profile as a static file and write the dictionary by hand — but the resulting structure rarely passes validation on the first try and the failure modes are not user-actionable in a browser.
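For orientation, the hand-written structure would look roughly like the sketch below. It only builds the PDF object text for an OutputIntent and the header of the ICC stream it points at. The function name `buildOutputIntentObjects`, the object numbers, and the sRGB labels are illustrative assumptions, not part of pdf-lib or of this tool. It does not splice anything into a real file or update the cross-reference table, which is where the hard-to-diagnose failures tend to come from.

```javascript
// Hypothetical helper (not a pdf-lib API): returns PDF syntax for an
// OutputIntent dictionary and the header of the ICC profile stream.
// The caller chooses the object numbers; nothing is written to a file.
function buildOutputIntentObjects(intentObjNum, iccObjNum, iccByteLength) {
  const outputIntent = [
    `${intentObjNum} 0 obj`,
    '<<',
    '  /Type /OutputIntent',
    '  /S /GTS_PDFA1', // output intent subtype used for PDF/A-1
    '  /OutputConditionIdentifier (sRGB IEC61966-2.1)',
    '  /Info (sRGB IEC61966-2.1)',
    `  /DestOutputProfile ${iccObjNum} 0 R`, // indirect reference to the ICC stream
    '>>',
    'endobj',
  ].join('\n');

  const iccStreamHeader = [
    `${iccObjNum} 0 obj`,
    `<< /N 3 /Length ${iccByteLength} >>`, // /N 3 = three colour components (RGB)
    'stream',
  ].join('\n');

  // The document catalog additionally needs this entry.
  const catalogEntry = `/OutputIntents [ ${intentObjNum} 0 R ]`;

  return { outputIntent, iccStreamHeader, catalogEntry };
}
```

The raw ICC bytes, `endstream`, and `endobj` would follow the stream header. Getting byte offsets and the cross-reference table right after that insertion is the part a browser tool cannot make user-actionable.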
2. Subset every font
§6.3.4 mandates that every font used in the document have its program embedded in the PDF. PDF/A-1 also permits font subsets (§6.3.5), and a converter will normally subset to keep file size down and limit licensing exposure. pdf-lib can embed a font you supply, but it cannot extract a font already referenced in an existing PDF, subset it to the glyphs used, and re-embed it. That is a font-engineering problem solved by HarfBuzz and FreeType. It is not a pdf-lib feature, and not something a browser-only tool will ship in the next year.
3. Write a valid XMP packet with the pdfaid namespace
§6.7 mandates an XMP metadata stream in the document catalog. §6.7.11 mandates that the packet include the pdfaid namespace (xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/") declaring the conformance part and level. pdf-lib has no API to author an XMP packet — you would write the XML by hand, including the rdf:Description machinery, byte order marks, and the trailing whitespace padding the spec recommends. Possible, but error-prone.
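To make the shape of that packet concrete, here is a minimal sketch in JavaScript. The function names `buildPdfaXmpPacket` and `readPdfaId` are hypothetical, and the packet is deliberately bare: a real one usually also carries dc:, xmp:, and pdf: properties that must agree with the document's Info dictionary, which is where much of the error-proneness lies.

```javascript
// Sketch: builds a bare XMP packet that declares PDF/A part and conformance.
// Assumption: only the pdfaid properties are emitted; real packets carry more.
function buildPdfaXmpPacket(part = 1, conformance = 'B', paddingLines = 20) {
  // Trailing whitespace padding lets the packet be edited in place later.
  const padding = (' '.repeat(100) + '\n').repeat(paddingLines);
  return [
    '<?xpacket begin="\uFEFF" id="W5M0MpCehiHzreSzNTczkc9d"?>',
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
    ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
    '  <rdf:Description rdf:about=""',
    '    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">',
    `   <pdfaid:part>${part}</pdfaid:part>`,
    `   <pdfaid:conformance>${conformance}</pdfaid:conformance>`,
    '  </rdf:Description>',
    ' </rdf:RDF>',
    '</x:xmpmeta>',
    padding + '<?xpacket end="w"?>',
  ].join('\n');
}

// Reads the identifier back, accepting either element or attribute form.
function readPdfaId(xmp) {
  const pick = (name) => {
    const re = new RegExp(
      `<pdfaid:${name}>\\s*([^<\\s]+)\\s*</pdfaid:${name}>` +
      `|pdfaid:${name}\\s*=\\s*["']([^"']+)["']`
    );
    const m = xmp.match(re);
    return m ? (m[1] || m[2]) : null;
  };
  return { part: pick('part'), conformance: pick('conformance') };
}
```

A round trip such as `readPdfaId(buildPdfaXmpPacket())` gives a quick sanity check, but it says nothing about whether the rest of the file conforms. A correct identifier on a non-conformant file is exactly the metadata-flag shortcut described on this page.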
4. Flatten transparency without rasterising
§6.4 forbids transparency. If the source PDF uses transparency (modern Office exports often do), every transparent region needs to be flattened — split into opaque regions that approximate the visual result. Done correctly, this is what Adobe calls the "Flattener Library"; done badly, the document becomes a giant rasterised image. pdf-lib cannot flatten transparency at all. The browser path is to rasterise everything, which destroys text searchability and drops PDF/A-1a tagged-content compliance.
The browser-based competitors that claim to convert to PDF/A typically skip all four problems and just set a metadata flag. The output renders, the marketing copy is happy, and the file fails veraPDF. We are not interested in shipping that.
Honest scope — what this tool checks and what it doesn't
We run ten static checks against ISO 19005-1, conformance level B. Each check produces PASS, WARN, or FAIL. The full list with what we look for and what we recommend on failure:
What we check
- Encryption (§6.1.3) — pdf-lib's isEncrypted flag plus a regex scan for the /Encrypt dictionary.
- Forbidden actions (§6.6.1, §6.6.2) — JavaScript, URI, and Launch action types in any annotation or catalog OpenAction.
- Embedded files (§6.1.11) — /EmbeddedFiles, /Filespec, FileAttachment annotations.
- Transparency (§6.4) — /SMask references and /CA or /ca alpha values less than 1.0 in any extended graphics state.
- XMP metadata stream (§6.7) — /Metadata reference in the document catalog.
- PDF/A identifier (§6.7.11) — pdfaid:part and pdfaid:conformance elements in the XMP packet.
- Font embedding (§6.3.4) — count of font dictionaries vs count of FontFile / FontFile2 / FontFile3 streams plus Type3 charprocs.
- OutputIntent and ICC profile (§6.2.3, §6.2.4) — /OutputIntents array and /DestOutputProfile reference.
- Multimedia annotations (§6.5.2) — Movie, Sound, 3D annotation subtypes.
- AcroForm field appearances (§6.9) — /NeedAppearances flag and AcroForm presence.
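Several of the checks above reduce to pattern scans over the raw bytes. The sketch below shows the general idea for six of them. The rule table `RAW_CHECKS`, the function `runRawChecks`, and the exact regular expressions are illustrative assumptions rather than the tool's real rule set, and the sketch reports only PASS or FAIL.

```javascript
// Illustrative rule table. "failIf" patterns must be absent;
// "requires" patterns must be present.
const RAW_CHECKS = [
  { id: 'encryption', failIf: /\/Encrypt\b/ },
  { id: 'forbidden-actions', failIf: /\/S\s*\/(JavaScript|Launch|URI)\b/ },
  { id: 'embedded-files', failIf: /\/(EmbeddedFiles|Filespec)\b|\/Subtype\s*\/FileAttachment\b/ },
  { id: 'multimedia', failIf: /\/Subtype\s*\/(Movie|Sound|3D)\b/ },
  { id: 'xmp-metadata', requires: /\/Metadata\s+\d+\s+\d+\s+R\b/ },
  { id: 'output-intent', requires: /\/OutputIntents\b/ },
];

// Takes the PDF as a Uint8Array and returns one result per rule.
function runRawChecks(bytes) {
  // latin1 maps each byte to one character, so byte patterns survive decoding.
  const text = new TextDecoder('latin1').decode(bytes);
  return RAW_CHECKS.map((rule) => {
    if (rule.failIf) {
      return { id: rule.id, status: rule.failIf.test(text) ? 'FAIL' : 'PASS' };
    }
    return { id: rule.id, status: rule.requires.test(text) ? 'PASS' : 'FAIL' };
  });
}
```

One known limit of any scan like this: PDF 1.5 and later can store dictionaries inside compressed object streams, where a raw-byte regex cannot see them. That is one reason to pair the scan with a structural parse.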
What we don't check
- Structure tree integrity — required for PDF/A-1a (we target 1b, which does not require tagging).
- Tagged-content reading order — same reason as above.
- Font character encoding completeness — for non-Latin scripts, the font program must include cmap entries for every glyph used. We confirm the font is embedded but do not parse the cmap.
- ICC profile binary validity — we confirm a /DestOutputProfile reference exists; we do not parse the ICC bytes to confirm the profile is well-formed.
- Content streams inside Form XObjects — transparency hidden in a nested Form XObject content stream may slip past our regex scan.
- Digital signature dictionary conformance — /Sig dictionaries have their own conformance rules in §6.10 we do not currently check.
- JavaScript embedded as a stream — we catch /S /JavaScript actions, but not arbitrary JavaScript hidden inside a Names tree.
For the categories above, the right tool is veraPDF. Run our pre-check first to catch the common 80 percent of issues fast and free; run veraPDF as the final gate before submission.
Tools that actually produce valid PDF/A-1b
If the pre-check shows FAIL on your file, here are the three tools that produce veraPDF-clean output today. None of them is browser-only.
Adobe Acrobat Pro
File → Save As Other → PDF/A. Acrobat Pro is the most reliable conversion path in our experience: Adobe originated the PDF format, and Acrobat handles font embedding, color conversion, and transparency flattening in a single step. Subscription is around $14.99/month at the time of writing.
Ghostscript (free, open source)
One command, runs locally, no upload:
gs -dPDFA=1 -dPDFACompatibilityPolicy=1 \
   -sColorConversionStrategy=RGB \
   -sOutputICCProfile=sRGB.icc \
   -sDEVICE=pdfwrite \
   -o output.pdf input.pdf
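sRGB.icc ships with most Ghostscript installs (look in the iccprofiles/ directory). Ghostscript's own documentation also calls for a PDF/A definition file to be passed before the input so that the OutputIntent is written; a sample, PDFA_def.ps, ships in its lib/ directory. If veraPDF reports a missing OutputIntent on the result, add that file to the command. On macOS: brew install ghostscript. On Ubuntu: apt install ghostscript. On Windows: ghostscript.com/releases.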
LibreOffice (free, open source)
File → Export As → Export As PDF → General tab → check "PDF/A-1a". Level A is a stricter superset of level B, so a valid PDF/A-1a file also satisfies 1b. Works only when the source document is editable in LibreOffice (.docx, .odt, .xlsx, etc.). The output is generally clean but has occasional font-embedding gaps for less common scripts, so run our pre-check on the result.
India regulatory context — when PDF/A actually matters
Three regulatory contexts in India routinely demand PDF/A-conformant output:
Ministry of Corporate Affairs (MCA21). The MCA21 portal accepts annual returns, board resolutions, and director KYC submissions. Several form types require PDF/A and the upload screen rejects non-conformant files with a generic error. The MCA general circulars on e-filing reference ISO 19005 directly. A failed upload near a deadline is the most common reason a CS searches for "pdf to pdf/a converter" — and the reason we built the checker first.
GST Network (gst.gov.in). Taxpayer document submissions for refund claims, advance ruling applications, and notice replies sometimes require PDF/A archival format. The portal validation is less strict than MCA21 but still rejects files with embedded multimedia or encryption.
High Court e-filing portals. The e-Courts project (ecourts.gov.in) and several High Court e-filing portals (Delhi HC, Bombay HC, Karnataka HC) accept case documents in PDF/A. The validation rules vary by court — the Delhi HC e-filing rules document explicitly references PDF/A-1b. For a missed-deadline case, a non-conformant PDF rejection is the reason a litigant pays a printer to certify a paper copy at 11 PM.
For all three contexts, the cost of a failed validation is concrete — a missed deadline, a paid extension, a paper-print fallback. That is why we will not ship a fake converter. The pre-check tells you the truth in 5 to 15 seconds, in your browser, without ever uploading the file.
Privacy — why browser-local matters here
PDFs that get pre-checked are disproportionately sensitive — annual returns, board minutes, court submissions, GST refund claims. The cost of routing them through a third-party server is asymmetric: most files would never matter, but the ones that do are exactly the ones a server log retains.
This tool runs entirely in your browser. The PDF bytes are read into the page's memory, scanned by JavaScript, and the report is rendered locally. There is no upload. Open dev tools, switch to the Network tab, drop a file, and run the checks — you will see zero PDF requests. The privacy claim is verifiable, not just stated.
If you are routing a sensitive submission through us, that should be a verifiable promise. We'd rather you check than trust.
Frequently asked questions
Why is this a checker and not a converter?
An honest PDF/A-1b converter has to do four hard things: embed a complete ICC color profile (typically sRGB, a ~3KB binary blob), subset every font down to the glyphs actually used, write a valid XMP packet with the pdfaid namespace declaring part 1, conformance B, and flatten transparency without rasterising the page. pdf-lib does none of these well. We could ship a tool that strips encryption, sets a few metadata fields, and calls itself a PDF/A converter, as most browser-based competitors do, but the output would fail veraPDF, the canonical ISO 19005 validator. For an India regulatory audience submitting to MCA, GST, or court e-filing, shipping a fake converter is a trust-killing event. We would rather tell you what is wrong with your PDF and what real tool to use to fix each issue.
What checks does the tool actually run?
Ten checks against ISO 19005-1, conformance level B: (1) encryption (forbidden by §6.1.3); (2) JavaScript, URI, and Launch actions (checked against §6.6.1 / §6.6.2); (3) embedded files and FileAttachment annotations (forbidden by §6.1.11); (4) transparency, including SMask soft masks and non-1.0 alpha values (forbidden by §6.4); (5) presence of an XMP metadata stream (required by §6.7); (6) PDF/A identifier in XMP (required by §6.7.11 so that validators recognise the file); (7) font embedding completeness (required by §6.3.4); (8) OutputIntent with embedded ICC profile when device color spaces are used (required by §6.2.3 / §6.2.4); (9) absence of multimedia annotations such as Movie, Sound, and 3D (forbidden by §6.5.2); (10) AcroForm field appearance streams (required by §6.9).
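Check (7) is the least obvious of the ten, so here is a rough sketch of a count-based comparison. The function name `checkFontEmbedding` and the rule for which fonts need a program are assumptions made for illustration: Type0 wrappers are skipped because the program belongs to the descendant font, and Type3 fonts are skipped because their glyphs are content streams.

```javascript
// Sketch: compares font dictionaries against embedded font program references.
// Input is the PDF decoded to a string (one character per byte).
function checkFontEmbedding(pdfText) {
  const count = (re) => (pdfText.match(re) || []).length;

  // \b stops /Type /Font from also matching /Type /FontDescriptor.
  const fontDicts = count(/\/Type\s*\/Font\b/g);
  const type0 = count(/\/Subtype\s*\/Type0\b/g); // composite wrappers
  const type3 = count(/\/Subtype\s*\/Type3\b/g); // glyphs defined by CharProcs
  const fontFiles = count(/\/FontFile[23]?\s+\d+\s+\d+\s+R\b/g);

  const needingProgram = fontDicts - type0 - type3;
  return {
    fontDicts,
    fontFiles,
    needingProgram,
    status: fontFiles >= needingProgram ? 'PASS' : 'FAIL',
  };
}
```

A count comparison is a heuristic. Two font dictionaries that share one descriptor would be reported as a failure here, which is the kind of case where a WARN is more appropriate than a FAIL.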
What real tools convert a PDF to a valid PDF/A-1b?
Three options that produce veraPDF-clean output today: (1) Adobe Acrobat Pro — File → Save As Other → PDF/A. Most reliable, costs about $14.99/month. (2) Ghostscript (free, open source) — gs -dPDFA=1 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=RGB -sOutputICCProfile=sRGB.icc -sDEVICE=pdfwrite -o out.pdf input.pdf. Runs locally, no upload. (3) LibreOffice (free) — File → Export As PDF → General → check "PDF/A-1a". Works only if the source is editable in LibreOffice. None of these is browser-only. Browser-only PDF/A conversion is not a solved problem — anyone claiming otherwise is shipping output that fails strict validation.
Why does Indian e-filing care about PDF/A?
The Ministry of Corporate Affairs (MCA21 portal), GST Network, and several High Courts mandate PDF/A for archival submissions because the format guarantees the document will render the same way ten or twenty years from now. Standard PDFs depend on system fonts and ICC profiles that may not be available in the future; PDF/A embeds everything needed to render the document into the file itself. A submission rejected at the portal because of a non-conformant PDF can mean a missed filing deadline. The cost of a failed PDF/A check is real — that is why we built the checker honestly rather than shipping a converter that lies.
Is anything uploaded to a server?
No. The pre-check runs entirely in your browser. The file is read into memory, scanned with JavaScript, and the report is rendered in the page. Open browser dev tools, go to the Network tab, drop a PDF, and click Run pre-flight checks. You will see zero PDF requests. The privacy claim is verifiable, not just stated. This matters because PDFs headed for regulatory filing disproportionately hold financial statements, tax returns, board resolutions, and signed contracts, which are exactly the categories where shipping the file to an unknown server is the wrong tradeoff.
How accurate is this static analysis?
We catch the common failure modes — missing XMP, unembedded fonts, transparency, JavaScript, encryption, multimedia annotations, missing OutputIntent. We use a hybrid approach: pdf-lib for catalog and page structure, plus raw byte-stream regex scanning for objects and annotations. False positives are possible (e.g. a /CA value of 0.95 in a comment string) but rare. False negatives are also possible — we do not deeply parse content streams, so transparency hidden inside a Form XObject content stream will slip through. This is a static-analysis pre-check, not a substitute for veraPDF. If the regulator runs veraPDF and it passes, you are clean. If our pre-check shows FAIL, veraPDF will almost certainly also reject — fix the issues before submitting.
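The false-positive case is easy to reproduce. The sketch below scans for /CA and /ca operands below 1.0. The function name `findAlphaBelowOne` and the regular expression are illustrative assumptions, not the tool's actual code.

```javascript
// Sketch: finds /CA (stroke alpha) and /ca (fill alpha) values below 1.0.
// Input is the PDF decoded to a string (one character per byte).
function findAlphaBelowOne(pdfText) {
  const hits = [];
  for (const m of pdfText.matchAll(/\/(CA|ca)\s+(\d*\.?\d+)/g)) {
    const value = parseFloat(m[2]);
    if (value < 1) hits.push({ key: m[1], value, offset: m.index });
  }
  return hits;
}

// A genuine extended graphics state: the scan flags /CA but not the opaque /ca.
const genuine = findAlphaBelowOne('<< /Type /ExtGState /CA 0.5 /ca 1.0 >>');

// A text string that merely mentions an alpha value: flagged as well,
// because a regex cannot tell a literal string from a dictionary entry.
const spurious = findAlphaBelowOne('(Set /CA 0.95 for the watermark) Tj');
```

The false-negative case cannot be demonstrated the same way: once a Form XObject's content stream is Flate-compressed, the pattern is simply absent from the raw bytes, which is why veraPDF remains the final gate.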
What is veraPDF and why do you keep mentioning it?
veraPDF (verapdf.org) is the canonical open-source PDF/A validator, developed by a consortium led by the Open Preservation Foundation and the PDF Association and originally funded by the EU PREFORMA project. It is the reference implementation that other PDF/A claims are measured against. Many archive systems and government portals validate submissions with veraPDF or with libraries that implement the same rules. Running veraPDF locally is free and takes one command: java -jar verapdf-greenfield-X.Y.Z.jar --format text yourfile.pdf. Our pre-checker is a faster first pass that runs in the browser; veraPDF is the definitive verdict.
What if my file passes your pre-check but fails veraPDF?
Possible. Our checker is byte-pattern and structure based; it does not implement every clause of ISO 19005-1. Categories we do not check exhaustively: structure tree integrity (PDF/A-1a only; we target 1b), tagged content reading order, font character encoding completeness for non-Latin scripts, ICC profile validity beyond presence, content streams inside Form XObjects, and digital signature dictionary conformance. If veraPDF flags an issue we missed, the fix is usually in the same family: re-export from a trusted PDF/A generator. We log the gap and add a check in a future release.
References: ISO 19005-1:2005 (the PDF/A-1 specification, iso.org/standard/38920.html). veraPDF, the canonical open-source validator (verapdf.org). MCA21 portal e-filing guidelines (mca.gov.in). PDF Association PDF/A primer (pdfa.org).