A PDF is supposed to be a compact, universal way to share a document. Yet many of the PDFs that arrive in your inbox are 20, 40, sometimes 80 megabytes: well past the attachment limits of most email providers and painful to download on a mobile connection. The good news is that almost every oversized PDF is oversized for the same few reasons, and you can usually cut one down by 60–90% with no difference a reader would notice on screen. This guide walks through what makes a PDF large and how to fix it.
1. Why most PDFs are large
A PDF is a container. It stores text (as vector descriptions that reference embedded fonts), vector graphics, images, and metadata. In a text-only document, the entire body of a 30-page report might be 50 KB. Nearly everything past that, and PDFs routinely hit 5–50 MB, comes from embedded images.
A single US Letter page (8.5 × 11 in) scanned at 300 DPI in colour is 2,550 × 3,300 pixels, or roughly 8.4 megapixels. Stored uncompressed at 24 bits per pixel, that's about 25 MB per page. Even with JPEG compression at high quality, you're looking at 2–4 MB per page for colour scans, and 300–500 KB for black-and-white. PDFs that originated as scans, design exports, or slide decks with photo-heavy content follow this pattern almost universally.
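The page arithmetic is easy to check. The sketch below assumes 24-bit colour (3 bytes per pixel) and decimal megabytes; the function name `scan_page_stats` is illustrative and not part of any library.

```python
def scan_page_stats(width_in: float, height_in: float, dpi: int,
                    bytes_per_pixel: int = 3):
    """Return (width_px, height_px, megapixels, uncompressed_mb) for one page.

    Assumes 24-bit colour by default and 1 MB = 1,000,000 bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return width_px, height_px, pixels / 1e6, pixels * bytes_per_pixel / 1e6


w, h, mp, mb = scan_page_stats(8.5, 11, 300)
print(f"{w} x {h} px, {mp:.1f} MP, {mb:.1f} MB uncompressed")
# prints "2550 x 3300 px, 8.4 MP, 25.2 MB uncompressed"
```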
The remaining contributors are minor: unoptimized embedded fonts (a full font file can be 500 KB; properly subsetted it's often 30 KB) and uncompressed object streams. Compared to image bloat, these are small, but on a text-heavy document they still matter.
2. The three techniques that actually reduce size
a. Image downsampling
This is the single biggest lever. If your PDF contains 300 DPI images and will only ever be viewed on a screen, you can downsample them to 150 DPI (or even 100 DPI for draft-quality sharing). Halving the resolution quarters the pixel count, so image data shrinks by roughly 75%, usually without a visible difference on a normal display. For printing, stay at 300 DPI for colour and greyscale, 600 DPI for line art.
Downsampling means resizing the raster data itself: fewer pixels, fewer bytes. It is lossy by definition, but the loss is imperceptible as long as the target DPI matches the viewing medium.
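To make the scaling concrete, here is a minimal sketch of downsampling on a tiny greyscale image represented as a list of rows. It uses a plain box filter (averaging each block of pixels); real PDF optimizers typically offer bicubic or similar filters, so treat this as an illustration of the idea rather than what any particular tool does. Both function names are hypothetical.

```python
def pixel_fraction(src_dpi: int, dst_dpi: int) -> float:
    """Fraction of the original pixels that remain after downsampling."""
    return (dst_dpi / src_dpi) ** 2


def box_downsample(pixels, factor: int):
    """Average each factor x factor block of a greyscale image (list of rows).

    Rows and columns that do not fill a whole block are dropped.
    """
    height = len(pixels) - len(pixels) % factor
    width = len(pixels[0]) - len(pixels[0]) % factor
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
small = box_downsample(image, 2)   # 4x4 -> 2x2, like 300 DPI -> 150 DPI
```

Going from 300 to 150 DPI leaves `pixel_fraction(300, 150)` of the pixels, which is one quarter.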
b. Image re-encoding (JPEG recompression)
If the images inside your PDF are stored as Flate-compressed raw pixels (common for PNG-style embeds) or as very-high-quality JPEG, re-encoding them with JPEG at quality 70–85 typically cuts size another 40–60%. On photographic content this is almost always worth doing; on screenshots with text, be more conservative because JPEG artifacts around sharp edges are visible.
Modern PDF compressors support JPEG 2000 and JBIG2 as well. JBIG2 in particular is designed for bilevel (black-and-white) scanned documents and typically produces files several times smaller than standard CCITT G4 fax compression. If your scan is a contract, receipt, or textual document, JBIG2 is the right tool.
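The guidance in this subsection boils down to a small decision table. The sketch below encodes it; the content labels and the function name are hypothetical, and keeping screenshots lossless (Flate) is one conservative reading of "be more conservative", not a rule from any specification.

```python
def choose_encoding(content: str) -> dict:
    """Pick an image encoding for a content type, following the text above."""
    if content == "photo":
        # Photographic content tolerates JPEG well; 70-85 is the suggested range.
        return {"codec": "JPEG", "quality": 80}
    if content == "screenshot":
        # Assumption: avoid JPEG artifacts around sharp text edges entirely.
        return {"codec": "Flate", "quality": None}
    if content == "bilevel_scan":
        # Black-and-white scans of text: JBIG2 is designed for this case.
        return {"codec": "JBIG2", "quality": None}
    raise ValueError(f"unknown content type: {content!r}")
```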
c. Content-stream compression and object deduplication
Every stream inside a PDF (page content, embedded font programs, metadata) can be passed through a compression filter, most commonly Flate/DEFLATE. A surprising number of PDFs created by legacy tools skip this step on some streams, leaving those streams 30–60% larger than they need to be. Proper optimizers enable Flate compression on every compressible stream and deduplicate identical objects (for example, when the same logo image is embedded on every page).
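Both ideas can be sketched with the Python standard library, since Flate is the same DEFLATE-based format that `zlib` produces. This is a simplified illustration: a real optimizer also has to rewrite each stream's `/Filter` and `/Length` entries and update object references, which is not shown here. The helper names are hypothetical.

```python
import hashlib
import zlib


def flate(data: bytes) -> bytes:
    """Compress a stream with zlib at maximum compression."""
    return zlib.compress(data, 9)


def deduplicate(objects):
    """Collapse byte-identical objects.

    Returns (unique_objects, refs) where refs[i] is the index in
    unique_objects that original object i should point to.
    """
    unique = []
    index_by_digest = {}
    refs = []
    for obj in objects:
        key = hashlib.sha256(obj).digest()
        if key not in index_by_digest:
            index_by_digest[key] = len(unique)
            unique.append(obj)
        refs.append(index_by_digest[key])
    return unique, refs


# A repetitive, uncompressed content stream shrinks considerably.
stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 200
packed = flate(stream)

# The same logo embedded on three pages is stored once.
logo, chart = b"<logo image bytes>", b"<chart image bytes>"
unique, refs = deduplicate([logo, chart, logo, logo])
```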
3. Font subsetting
Fonts are embedded in PDFs so the document renders identically everywhere. A full font file contains thousands of glyphs covering many scripts and symbols — but a typical document uses 50–100. Subsetting keeps only the glyphs the document actually uses, which can shrink a font embed from 500 KB down to 30 KB.
Most design tools subset by default, but PDFs generated from older word processors or from Word/Excel prints often embed full fonts. If you're creating a multi-language document or one that includes unusual symbols, make sure the subsetting tool doesn't strip glyphs you need.
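The core of subsetting is working out which characters a document actually uses. The sketch below does only that bookkeeping on plain text; it does not touch a font file, and real subsetters work on glyph IDs and handle ligatures and composite glyphs, which this ignores. Function names are hypothetical.

```python
def used_code_points(text: str) -> set:
    """Code points of every printable character (including spaces) in the text."""
    return {ord(ch) for ch in text if ch.isprintable()}


def missing_from_subset(subset: set, extra_text: str) -> set:
    """Characters in extra_text that a font subset built from `subset` lacks."""
    return {ch for ch in extra_text if ch.isprintable() and ord(ch) not in subset}


subset = used_code_points("Hello, world")
print(len(subset))                           # prints 9
print(missing_from_subset(subset, "hello"))  # prints {'h'}
```

The second function illustrates the multi-language caveat: text added later may need glyphs that the subset no longer contains.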
4. What NOT to do
Don't flatten a searchable PDF into a single image. Some "compress PDF" services simply rasterize every page to a low-quality JPEG and wrap it back in a PDF. The file gets small, but you lose text search, text selection, copy-paste, and screen-reader accessibility. The file will also look visibly worse at any zoom level. Avoid any tool that does this without telling you.
Don't upload sensitive PDFs to unknown web services. PDFs often contain signatures, financial statements, personal addresses, or proprietary information. Free online PDF compressors do not always state how long they retain uploaded files or what they use them for, and you usually cannot verify their claims. Prefer client-side tools (like OnlyFormat's in-browser compressors) or local software for anything confidential.
Don't compress a digitally-signed PDF. Compression changes the file's bytes, which invalidates any digital signature attached to it. If the PDF needs to remain signed, either sign after compressing, or leave the file alone.
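Here is a simplified illustration of why recompression breaks a signature. Real PDF signatures sign a digest over a declared byte range using a private key; this sketch shows only the digest part, with SHA-256 as an assumed hash. The point is that two files can decode to identical content and still differ byte for byte.

```python
import hashlib
import zlib


def digest(data: bytes) -> str:
    """SHA-256 digest of the exact bytes, as a signer would compute it."""
    return hashlib.sha256(data).hexdigest()


page = b"BT /F1 12 Tf 72 720 Td (Signed contract) Tj ET\n" * 50

signed_bytes = zlib.compress(page, 1)   # bytes as they were when signed
recompressed = zlib.compress(page, 9)   # same content, recompressed harder

same_content = zlib.decompress(signed_bytes) == zlib.decompress(recompressed)
same_digest = digest(signed_bytes) == digest(recompressed)
print(same_content, same_digest)  # prints "True False"
```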
5. Expected results by document type
| Source | Typical before | After screen-quality compression | Reduction |
|---|---|---|---|
| 20-page colour scan | 40–80 MB | 4–10 MB | 85–90% |
| B/W document scan | 8–15 MB | 0.3–0.8 MB (JBIG2) | 90–95% |
| Slide deck (PowerPoint export) | 20–50 MB | 3–8 MB | 70–85% |
| Designed report with photography | 15–40 MB | 2–6 MB | 75–85% |
| Text-only Word export | 1–3 MB | 0.2–0.5 MB | 70–85% |
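To compare your own results against the table, the reduction figure is just the size saved as a share of the original size. A small helper (the name is illustrative):

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage of the original size that compression removed."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return 100.0 * (before_mb - after_mb) / before_mb


print(reduction_percent(40, 4))  # prints 90.0
```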
6. A practical workflow
- Identify the viewing medium: email, web, print, archival. That dictates your target DPI.
- Use a tool that exposes image DPI and JPEG quality as separate knobs. "Compress" buttons without settings usually over-compress.
- Downsample images to 150 DPI for screen, keep 300 DPI for print.
- Re-encode images as JPEG at quality 80 for photographic content, or JBIG2 for bilevel scans.
- Subset embedded fonts if your tool has that option.
- Check the output: zoom to 100% and compare side-by-side with the original. If you see artifacts, bump quality back up.
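One way to script the downsampling and font-subsetting steps is Ghostscript's `pdfwrite` device. The sketch below only builds the command line; it assumes a `gs` executable on your PATH and uses hypothetical file names. The DPI mapping covers only the media this guide gives numbers for. JPEG quality tuning and JBIG2 encoding are not covered here, and you should check the flags against the documentation for your installed Ghostscript version.

```python
import shlex

# Target resolution per viewing medium, taken from the workflow above.
TARGET_DPI = {"email": 150, "web": 150, "print": 300}


def build_gs_command(src: str, dst: str, medium: str) -> list:
    """Build a Ghostscript command that downsamples images and subsets fonts."""
    dpi = TARGET_DPI[medium]  # KeyError for media this guide gives no DPI for
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true",
        f"-dGrayImageResolution={dpi}",
        "-dSubsetFonts=true",
        f"-sOutputFile={dst}",
        src,
    ]


cmd = build_gs_command("report.pdf", "report-small.pdf", "email")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`, then compare the output against the original at 100% zoom as described in the last step.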
Frequently asked questions
Q. Why is my PDF so large?
A. In almost every case, the answer is embedded images. A scanned document or a PDF exported from a design tool typically contains page-size images at 300 DPI or higher. A single 300 DPI colour scan of a letter-size page is roughly 8.4 megapixels, which is easily 2–4 MB per page even after high-quality JPEG compression. Multiply that by 20 pages and you have a 40–80 MB PDF. Text-only PDFs are almost never large.
Q. What's the difference between 'compress' and 'optimize'?
A. In PDF tooling, they usually mean the same thing — a set of operations that reduce the file's footprint without changing its visual content beyond acceptable thresholds. 'Optimize' sometimes also includes restructuring the PDF for fast web view (linearization) and removing unused objects.
Q. Will compressing a PDF degrade text quality?
A. No, as long as text is stored as actual text (not as a rasterized image). PDF text is a vector description — characters reference embedded font glyphs — and compression operates on those streams losslessly. You only lose quality when images inside the PDF are re-encoded at lower quality or downsampled.
Q. What DPI should I use to reduce file size for email?
A. For screen viewing, 150 DPI is the pragmatic default and looks indistinguishable from 300 DPI on a modern display. For printing, you want to keep 300 DPI for professional output, but 200 DPI is often fine for office printing. Going below 100 DPI becomes visibly soft on high-resolution screens.
Q. Is it safe to compress a PDF I'll sign digitally?
A. Compress before signing, not after. A digital signature is applied to the exact byte content of the PDF at the time of signing. Any modification — including re-compression — invalidates the signature.
References
- ISO 32000-2 — Document management — Portable document format — Part 2 (PDF 2.0)
- ISO/IEC 14492 — JBIG2 (bilevel image compression)
- pdf-lib (github.com/Hopding/pdf-lib)
- pdf.js (github.com/mozilla/pdf.js)
- Ghostscript PDF optimization documentation
About the OnlyFormat Editorial Team
OnlyFormat's editorial team is made up of working web developers and image-workflow engineers who ship file-conversion tooling for a living. Every guide is reviewed against primary sources — W3C/WHATWG specifications, IETF RFCs, MDN Web Docs, ISO/IEC media standards, and the official documentation of libraries we actually use in production (libwebp, libjpeg-turbo, libavif, FFmpeg, pdf-lib). We update articles when standards change so the guidance stays current.