Markdown to PDF: A Developer's Documentation Workflow

OnlyFormat Editorial Team · 8 min read

Markdown is how developers write. READMEs, architecture notes, runbooks, RFCs, release notes — they all start life as .md files in a repo. But at some point, a non-developer needs to read the same content. Legal wants the spec archived. A customer wants a printable handout. A manager wants to annotate a PDF in Preview. The conversion from Markdown to PDF is one of the most common handoffs in modern engineering, and it is also one of the easiest places to produce something that looks amateurish — broken images, code blocks wrapped weirdly, diagrams missing, no page numbers. This guide walks through what actually works.

1. Why developers convert Markdown to PDF

The use cases cluster into four groups, and each one has slightly different expectations for the output.

README handoffs to non-technical stakeholders. A GitHub README is rendered in a specific sans-serif layout with sticky navigation, file tree, and clickable anchors. Sending a stakeholder a link works, but half of them will ask for "the document" — they want an email attachment they can forward, mark up, and archive. A PDF is the universal lowest-common-denominator format.

Portable offline specs. Technical specifications that need to be read on a plane, in a secure room, or by a contractor who doesn't have VPN access need to be standalone. A PDF is self-contained: one file, everything embedded, no dependencies.

Invoice and report generation from templated Markdown. This is increasingly common. A build pipeline takes a templated .md file, fills in numbers or transaction data, and emits a branded PDF invoice or report. The toolchain is almost the same as for documentation, with tighter constraints on styling.

Archival records. PDF/A (ISO 19005) is the ISO standard for long-term preservation. If your compliance team needs to archive a document for seven years, the archival format is almost certainly PDF, and the source of truth is almost certainly a Markdown file sitting in a repo.

2. The three main conversion paths

There are dozens of Markdown-to-PDF tools, but under the hood they almost all reduce to three engines. Knowing which one you're using matters because each has a different failure mode.

a. HTML-in-browser printing

Render the Markdown to HTML, open it in a browser, and use Print → Save as PDF. This is the path most in-browser tools (including OnlyFormat's) take. Pros: zero install, full CSS support, syntax highlighting works natively, and the print preview shows what will be saved. Cons: page-break control is limited to whatever parts of the CSS Paged Media and Fragmentation modules your browser implements, and complex typography (footnotes, cross-references, bibliographies) is harder than in LaTeX. Wins for: READMEs, one-off docs, landing pages, anything under ~50 pages that doesn't need academic typesetting.

b. Pandoc + LaTeX engine

Pandoc parses Markdown into its internal AST and emits LaTeX, which a TeX engine (xelatex, lualatex, or pdflatex) renders to PDF. Pros: publication-quality typography, proper hyphenation, real footnotes and cross-references, bibliography support (via --citeproc, or natbib/BibLaTeX on the LaTeX side), full control over page geometry. Cons: the toolchain is large (roughly 2 GB or more for a typical TeX distribution), SVG and Mermaid don't render natively, and LaTeX error messages are cryptic. Wins for: long-form documents, books, academic papers, anything that needs a table of contents, lists of figures, or a bibliography.

c. Headless browser (Puppeteer / Playwright)

Automate a headless Chromium to load the HTML and call page.pdf(). Pros: identical rendering to the browser-print path, fully automatable, works in CI, can wait for JavaScript-heavy content (Mermaid diagrams, KaTeX) to finish rendering before capture. Cons: browser startup adds overhead to every run, it requires a Chromium binary of roughly 150 MB, and you're at the mercy of whatever CSS Paged Media features the bundled Chromium version happens to support. Wins for: CI pipelines, templated invoice generation, anything with dynamic diagrams, and any workflow where "it looks the same as in Chrome" is the acceptance criterion.
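As a rough illustration of the headless path, here is a minimal sketch using Playwright's Python bindings (a third-party package, assumed to be installed along with its Chromium build). The function names, the A4 format, and the 2.5 cm margin are illustrative choices, not something the toolchain requires.

```python
from pathlib import Path


def pdf_options(output_path: str, margin_cm: float = 2.5) -> dict:
    """Keyword arguments for Playwright's page.pdf(); the values are illustrative."""
    margin = f"{margin_cm}cm"
    return {
        "path": output_path,
        "format": "A4",
        "print_background": True,  # keep code-block background colors
        "margin": {"top": margin, "right": margin, "bottom": margin, "left": margin},
    }


def render_pdf(html_path: str, output_path: str) -> None:
    """Load a local HTML file in headless Chromium and save it as a PDF."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    url = Path(html_path).resolve().as_uri()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # "networkidle" gives scripts such as Mermaid or KaTeX time to load and run.
        page.goto(url, wait_until="networkidle")
        page.pdf(**pdf_options(output_path))
        browser.close()
```

Calling render_pdf("handout.html", "handout.pdf") would write the PDF next to the script. Keeping the options in a separate function makes them easy to check in a unit test without launching a browser.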

3. Styling tradeoffs

The default GitHub-style Markdown theme is optimized for screen reading: wide content area, medium-weight sans-serif, flat code blocks, minimal spacing. Print-ready typography has different priorities: a narrower measure (60–70 characters per line), a serif face for body copy, hierarchical headings with generous top margins, and code blocks with visible borders so they don't get lost in a black-and-white printout.

Page breaks are the hardest part. Without intervention, a browser will break a page mid-paragraph, mid-code-block, or right after a heading, all of which look terrible. The CSS controls you need are break-inside: avoid on code blocks and tables, break-after: avoid on headings, and orphans: 3; widows: 3; on paragraphs. The older page-break-inside and page-break-after properties are legacy aliases for these and still work. Pandoc handles most of this automatically in its LaTeX templates.

Margins, headers, and footers live in the @page CSS rule (for browser-based paths) or in the Pandoc template's geometry variable. A good default is 2.5 cm margins all round, with the document title in the top-left and page number in the bottom-right. For printed output, keep everything at least 1.5 cm from the physical edge: most consumer printers leave an unprintable border of several millimetres, and the extra room keeps headers and footers clear of it.
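The page-break and margin rules above can be collected into one print stylesheet. The sketch below generates it from a small Python helper, on the assumption that a build script injects the result into the HTML before printing. The function name and the selectors (pre, table, h1 to h3, p) are illustrative.

```python
def print_stylesheet(page_size: str = "A4", margin_cm: float = 2.5) -> str:
    """Return print CSS with page geometry and break-avoidance rules."""
    return f"""@page {{
  size: {page_size};
  margin: {margin_cm}cm;
}}
/* Keep code blocks and tables on one page where possible. */
pre, table {{ break-inside: avoid; page-break-inside: avoid; }}
/* Never leave a heading stranded at the bottom of a page. */
h1, h2, h3 {{ break-after: avoid; page-break-after: avoid; }}
/* Require at least three lines of a paragraph on either side of a break. */
p {{ orphans: 3; widows: 3; }}
"""
```

Each rule carries both the modern break-* property and its page-break-* alias, which costs nothing and covers older engines.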

Syntax highlighting survives the PDF conversion in all three paths, but the contrast of popular dark themes (Dracula, One Dark) becomes unreadable on white paper. Pick a light theme (GitHub light, Solarized light, Kate's default) specifically for print output, even if your docs site uses a dark theme for on-screen reading.

4. Handling images

Broken images are the single most common complaint about Markdown-to-PDF conversion, and almost every case is a path-resolution problem.

Relative paths like ![diagram](./img/flow.png) work when the converter knows the base directory. Browser-based tools break here if you paste Markdown in without also supplying the image files. Fixes: bundle images into the same upload, use absolute file URLs, or host the images publicly and reference them over HTTPS.
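One way to apply the absolute-file-URL fix is to rewrite image links before handing the Markdown to the converter. The following is a minimal sketch using only the standard library. The function name is hypothetical, and the regular expression deliberately handles only the simple ![alt](path) form, leaving links with titles or spaces untouched.

```python
import os.path
import re
from pathlib import Path

# Matches ![alt](target) where the target has no spaces and no title.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
# Anything that already starts with a URL scheme (https:, file:, data:, ...).
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def absolutize_images(markdown: str, base_dir: str) -> str:
    """Rewrite relative image targets as absolute file:// URLs under base_dir."""
    base = Path(base_dir)
    if not base.is_absolute():
        raise ValueError("base_dir must be an absolute path")

    def replace(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        if HAS_SCHEME.match(target) or target.startswith("/"):
            return match.group(0)  # already absolute, leave as-is
        absolute = Path(os.path.normpath(base / target))
        return f"![{alt}]({absolute.as_uri()})"

    return IMAGE_LINK.sub(replace, markdown)
```

On a POSIX system, a base directory of /docs/project turns ./img/flow.png into file:///docs/project/img/flow.png, while HTTPS links pass through unchanged.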

Absolute HTTPS URLs work in every toolchain but introduce a network dependency at conversion time. If the target server is slow or briefly down, your PDF build fails. For production pipelines, mirror the images into the repo and use relative paths with an explicit resource path.

SVGs work natively in the browser and headless-Chromium paths, where they're rendered like any other web content. Pandoc + LaTeX, by contrast, can't include SVG directly: Pandoc converts SVG images with rsvg-convert (from librsvg) when it is installed, and otherwise you need Inkscape or another pre-processing step that turns each SVG into PDF or PNG before conversion. If you're seeing "cannot find file flow.svg" errors from LaTeX even though the file exists, this is why.
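A build script can catch the missing-converter case before LaTeX fails. The sketch below is one possible preflight check: it looks for SVG image links and reports them if neither rsvg-convert nor inkscape is on the PATH. The function name and message wording are made up for illustration, and the which parameter exists only so the check can be tested without touching the real PATH.

```python
import re
import shutil
from typing import Callable, List, Optional

# Simple ![alt](something.svg) links; titles and spaces are not handled.
SVG_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+\.svg)\)", re.IGNORECASE)


def svg_preflight(
    markdown: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return one warning per SVG image if no SVG converter is installed."""
    svgs = SVG_IMAGE.findall(markdown)
    if not svgs:
        return []
    if which("rsvg-convert") or which("inkscape"):
        return []
    return [
        f"{path}: no SVG converter (rsvg-convert or inkscape) found on PATH"
        for path in svgs
    ]
```

An empty list means the document either has no SVGs or a converter is available; anything else is worth failing the build on, since the LaTeX error that follows is much harder to read.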

5. Tables, math, and Mermaid diagrams

| Feature | Browser print | Pandoc + LaTeX | Headless Chromium |
|---|---|---|---|
| GFM tables | Native | Native (longtable) | Native |
| KaTeX / MathJax | KaTeX works, MathJax needs JS wait | Rendered natively by LaTeX | Both work, wait for render-complete |
| Mermaid diagrams | JS required; waits depend on browser | Needs pre-render via mmdc | Native, wait for render-complete |
| Footnotes | Inline only, no bottom-of-page | Proper bottom-of-page | Inline or CSS-based |
| Table of contents | Manual | --toc flag | Manual or JS-generated |

Common failure modes: tables that overflow the page width (fix: narrower columns or font-size: smaller in print CSS), MathJax rendering as raw LaTeX source (fix: wait for typesetting to finish before capturing, for example by awaiting the promise returned by MathJax.typesetPromise() in MathJax 3), and Mermaid diagrams appearing as code blocks (fix: ensure the Mermaid JS actually ran, or pre-render with the mmdc CLI and replace the code block with the resulting SVG/PNG).
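The mmdc pre-render fix can be scripted. The sketch below pulls each fenced mermaid block out of the Markdown, swaps in an image reference, and builds the matching mmdc command (using its -i and -o options). The function names and the diagram-N file naming are hypothetical, and actually running mmdc is left to the caller.

```python
import re
from typing import Dict, List, Tuple

FENCE = "`" * 3  # three backticks, built at runtime to keep this listing fence-free
MERMAID_BLOCK = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_mermaid(markdown: str, prefix: str = "diagram") -> Tuple[str, Dict[str, str]]:
    """Replace mermaid blocks with image links; return new text and .mmd sources."""
    sources: Dict[str, str] = {}

    def replace(match: re.Match) -> str:
        name = f"{prefix}-{len(sources) + 1}"
        sources[f"{name}.mmd"] = match.group(1)
        return f"![{name}]({name}.svg)"

    return MERMAID_BLOCK.sub(replace, markdown), sources


def mmdc_command(mmd_file: str) -> List[str]:
    """Command line that renders one .mmd file to an SVG of the same name."""
    return ["mmdc", "-i", mmd_file, "-o", mmd_file[: -len(".mmd")] + ".svg"]
```

A build step would write each entry of sources to disk, run the corresponding command, and then convert the rewritten Markdown as usual. For the Pandoc + LaTeX path, rendering to PNG or PDF instead of SVG avoids the SVG conversion issue described earlier.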

6. File size expectations

A 20-page Markdown README with a handful of screenshots typically produces a 1–3 MB PDF. Text contributes almost nothing (a few hundred KB for a long document once fonts are subsetted). The rest is images. Fewer, smaller, better-compressed images mean smaller PDFs.

The browser-print and headless-Chromium paths embed images at whatever resolution the HTML references. If your README links to raw 4K screenshots, your PDF will be huge. Resize images before linking (best practice). Adding max-width: 800px in CSS controls how large an image appears on the page, but it does not necessarily shrink the embedded image data, so don't rely on it for file size. Pandoc+LaTeX behaves similarly: graphicx embeds what it's given.
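Oversized screenshots are easy to detect before conversion. The sketch below reads the pixel dimensions straight from a PNG file's IHDR header using only the standard library, so a build script can flag images that should be resized. The 1600-pixel threshold is an arbitrary assumption, and other formats such as JPEG would need a different parser or an imaging library.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_oversized(data: bytes, max_width: int = 1600) -> bool:
    """True if the image is wider than max_width pixels (threshold is illustrative)."""
    return png_dimensions(data)[0] > max_width
```

In practice the bytes would come from open(path, "rb").read(24), which is enough to read the header without loading the whole image.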

For hard size limits (email attachments, regulatory uploads), export first and compress second. Most PDF compressors can take a 5 MB exported PDF and shrink it to 1–2 MB with no visible loss. See our guide to reducing PDF file size for the specifics.

7. Practical workflow recommendations

Different document types reward different toolchains. Here's what actually works in the field.

Spec document (10–50 pages, internal)

Use the browser-print path or an in-browser converter. Set a print stylesheet with A4 @page, serif body font, and page-break-inside: avoid on code blocks. Export once per milestone. For version control, store the .md source in the repo and the exported PDF as a release artifact — never commit the PDF itself.

Landing handout (1–4 pages, external)

Headless Chromium with a custom print stylesheet. You want pixel-perfect control over typography, logo placement, and footer contact info. A small script that takes handout.md and emits handout.pdf with your brand styles embedded is worth the hour of setup.

Release notes (auto-generated, recurring)

Pandoc in CI. Release notes are text-heavy, benefit from a proper table of contents and cross-references, and rarely include dynamic diagrams. A pandoc --toc --pdf-engine=xelatex -o release-v${VERSION}.pdf CHANGELOG.md line in your release workflow typically runs in a few seconds for a text-only changelog and produces a document that archives well.
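If the release workflow is driven by a script rather than a shell one-liner, the same Pandoc invocation can be built as an argument list. This is a small sketch assuming pandoc and xelatex are installed on the CI runner. The function names are hypothetical; the flags are the ones from the command above.

```python
import subprocess
from typing import List


def release_notes_command(
    version: str,
    source: str = "CHANGELOG.md",
    engine: str = "xelatex",
) -> List[str]:
    """Build the pandoc argument list for a versioned release-notes PDF."""
    return [
        "pandoc",
        "--toc",
        f"--pdf-engine={engine}",
        "-o",
        f"release-v{version}.pdf",
        source,
    ]


def build_release_notes(version: str) -> None:
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(release_notes_command(version), check=True)
```

Passing a list to subprocess.run avoids shell quoting problems if a version string or file name ever contains spaces, and check=True makes a failed conversion fail the CI job.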

Frequently asked questions

Q. Is the PDF searchable after conversion?

A. Yes — as long as the conversion path renders Markdown text as actual text rather than rasterizing it. All three mainstream paths (browser print, Pandoc+LaTeX, and headless Chromium) emit real text streams with embedded fonts, so Ctrl+F, copy-paste, and screen readers all work. The only way to end up with an unsearchable PDF is if you (or some online service) flatten pages to images after conversion, which some low-quality tools do silently. Avoid those.

Q. Do my code-block syntax colors survive?

A. Yes, if the renderer applies syntax highlighting before PDF generation. In browser and headless-Chromium paths, highlight.js or Prism runs on the HTML and the resulting color spans are printed as-is. In Pandoc, highlighting is driven by the --highlight-style flag and the skylighting library, which uses Kate syntax definitions; colors carry through to both HTML and LaTeX backends. If your code blocks come out black-and-white, you either disabled highlighting or your theme targets dark backgrounds only. In that case, switch to a light theme for print output.

Q. How do I add a title page and page numbers?

A. In Pandoc, a title block at the top of the file (title, author, date) produces the document title, --toc adds a table of contents, and the default LaTeX template prints page numbers in the footer. For the browser and headless-Chromium paths, use the CSS Paged Media Module: an @page rule with @bottom-center { content: counter(page); } produces page numbers, and a dedicated .title-page element with break-after: page creates a standalone cover. Not every browser engine supports every Paged Media feature. Margin boxes such as @bottom-center only work in recent Chromium releases; on older ones, the displayHeaderFooter and footerTemplate options of Puppeteer's page.pdf() are the fallback.

Q. Why are my images broken in the exported PDF?

A. Almost always a path-resolution issue. Relative paths like ./diagrams/flow.png work when the renderer knows the base directory, but break when Markdown is piped in from stdin or rendered from a different working directory. Fixes: (1) convert relative paths to absolute file URLs before conversion, (2) for Pandoc, use --resource-path, (3) for hosted files, use fully-qualified HTTPS URLs. SVGs additionally break in Pandoc+LaTeX unless rsvg-convert is installed or you pre-convert them (for example with Inkscape), because LaTeX engines cannot include SVG files directly.

Q. What about converting a whole folder of .md files at once?

A. Two approaches. First, combine: pandoc --toc -o book.pdf chapter-*.md produces a single PDF with a shared table of contents. Pandoc concatenates the input files itself with blank lines between them, which avoids the run-together headings that plain cat can produce when a file lacks a trailing newline. Second, batch-convert: a shell loop or a small script that runs your chosen tool over every .md file individually, then optionally merges the resulting PDFs. OnlyFormat's PDF Merge tool handles the final combine step in-browser. For recurring pipelines (docs sites, release notes), put the conversion step into your CI so every commit produces a fresh PDF artifact.
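Both approaches reduce to building Pandoc command lines, which the following sketch does without running anything. The function names and the pdf output directory are hypothetical. Paths are handled as POSIX-style strings, and files are ordered by a plain lexical sort, so chapter-10.md would come before chapter-2.md unless the names are zero-padded.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def combined_command(md_files: Iterable[str], output: str = "book.pdf") -> List[str]:
    """One pandoc call that merges every file into a single PDF with a TOC."""
    return ["pandoc", "--toc", "-o", output, *sorted(md_files)]


def batch_commands(md_files: Iterable[str], out_dir: str = "pdf") -> List[List[str]]:
    """One pandoc call per file, each writing <out_dir>/<name>.pdf."""
    commands = []
    for name in sorted(md_files):
        src = PurePosixPath(name)
        target = PurePosixPath(out_dir) / (src.stem + ".pdf")
        # Point --resource-path at the file's own folder so relative images resolve.
        commands.append(
            ["pandoc", f"--resource-path={src.parent}", "-o", str(target), str(src)]
        )
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) in a loop. Keeping command construction separate from execution makes the plan easy to print in a dry run.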
