
CSV vs JSON vs XML: Choosing a Data Format in 2026

OnlyFormat Editorial Team · 10 min read

Three formats dominate data interchange in 2026, and they aren't really competing — they solve different problems. CSV is for flat tables, JSON is the universal language of web APIs, and XML still runs enterprise integrations, document formats, and anywhere digital signatures or formal schemas matter. Picking the wrong one costs you storage, parsing time, tooling, or all three. This guide walks through what each format is genuinely good at, compares them on the same 1,000-row dataset, and ends with a decision tree you can actually use.

1. What each format is really for

Before arguing about which is "best", it helps to remember what each format was designed to model. CSV represents a two-dimensional table: rows and columns, nothing more. Every row has the same columns, cells hold strings (types are implicit), and there's no way to nest anything. It's the lowest common denominator of tabular data.

JSON represents hierarchical objects: strings, numbers, booleans, null, arrays, and objects composed recursively. It's structurally identical to an in-memory data structure in any modern language, which is why it parses fast and round-trips cleanly. JSON was specified by Douglas Crockford in the early 2000s and later standardised as ECMA-404 and IETF RFC 8259.

XML represents a tree of elements with attributes, mixed content, and namespaces — and crucially, it was designed around the idea of documents as well as data. XML supports schemas (XSD), queries (XPath/XQuery), transforms (XSLT), digital signatures (XML-DSig), and namespaces for composing vocabularies. That machinery is overkill for an API payload, but exactly right for a 200-page contract or a docx file.

2. CSV: when flat tables are enough

CSV (comma-separated values) is the format spreadsheets, databases, and analytics tools all speak natively. Export from Excel, import into Postgres, drop into pandas, load as a BigQuery external table — it just works. Because each record is a single line, CSV streams perfectly: you can process a 50 GB file with a few KB of memory using a line-based reader.

The catch is that CSV is deceptively underspecified. IETF RFC 4180 gives a reasonable baseline (commas as delimiters, CRLF line endings, double-quote escaping), but in the wild you'll hit semicolons (European locales), tabs (TSV), embedded newlines inside quoted fields, BOM markers from Excel, inconsistent quoting, and UTF-8 vs Windows-1252 encoding mismatches. A naive split(',') parser will eventually corrupt your data — always use a real CSV library.
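To see why the naive approach fails, here's a minimal sketch using Python's standard-library csv module on a row whose last field contains both a quoted comma and an embedded newline (the sample data is invented for illustration):

```python
import csv
import io

# A row whose last field contains a comma and an embedded newline —
# both legal under RFC 4180 when the field is double-quoted.
raw = 'id,name,notes\r\n1,Alice,"likes commas, and\nnewlines"\r\n'

# Naive splitting corrupts the record: the quoted comma becomes a field break,
# and the embedded newline splits the row in two.
naive = raw.splitlines()[1].split(',')
print(naive)    # ['1', 'Alice', '"likes commas', ' and'] — four broken pieces

# A real CSV parser honours the quoting and reassembles the field.
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['1', 'Alice', 'likes commas, and\nnewlines']
```

The same applies in every language: reach for the platform's CSV library (csv in Python, encoding/csv in Go, papaparse in JavaScript) rather than string splitting.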

Use CSV when: your data is genuinely tabular, you need spreadsheet compatibility, or you're moving bulk records between systems where size and streaming matter more than structure.

3. JSON: the modern default for APIs

JSON is text, UTF-8 by default, and maps onto native data structures in JavaScript, Python, Go, Rust, Java, C#, and every other mainstream language. JSON.parse in V8 is one of the most heavily optimised pieces of software on Earth, and SIMD-accelerated parsers such as simdjson push typical API payloads through at multiple gigabytes per second. For a new web API in 2026, JSON is the default, and everything else has to justify itself.

There are sharp edges. JSON has no comments, which makes it painful as a config format (JSONC and JSON5 are popular workarounds but aren't part of the standard). Numbers are IEEE 754 doubles, so integers larger than 2^53 silently lose precision — a real problem when you serialise 64-bit IDs from Postgres or Twitter's Snowflake IDs (the standard workaround is to send them as strings). There's no native date type (you encode dates as ISO 8601 strings), and no schema is enforced by default — you add that with JSON Schema, which is optional and out-of-band.
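The 2^53 boundary is easy to demonstrate. Python's own int is arbitrary-precision, but coercing through a double (which is what JSON.parse does in JavaScript) loses the low bits — hence the send-IDs-as-strings convention. A small sketch with an invented payload:

```python
import json

# IEEE 754 doubles have a 53-bit significand, so not every integer
# above 2**53 can be represented exactly.
big_id = 2**53 + 1                       # e.g. a 64-bit database ID
assert float(big_id) == float(2**53)     # the +1 is silently lost as a double

# Python's json module keeps ints exact, but a JavaScript consumer running
# JSON.parse would coerce the number to a double and corrupt the ID.
# The portable fix: serialise 64-bit IDs as strings.
payload = json.dumps({"id": str(big_id), "created_at": "2026-01-15T09:30:00Z"})
print(payload)  # {"id": "9007199254740993", "created_at": "2026-01-15T09:30:00Z"}
```

Note the date is also a plain ISO 8601 string — the consumer is expected to parse it by convention, since JSON has no date type.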

Use JSON when: you're building or consuming a web API, shipping structured data between services, or persisting hierarchical objects that need to be human-readable.

4. XML: still dominant in specific worlds

It's easy to dismiss XML as legacy, but huge industries run on it. SOAP web services still power many bank, government, and insurance integrations. HL7 v3 and FHIR XML carry healthcare records. SEPA payments, SAML assertions, and most B2B EDI pipelines are XML. Office Open XML (docx, xlsx, pptx) is a zip of XML files. SVG is XML. RSS and Atom feeds are XML. Android layouts are XML. If you're near any of those worlds, XML isn't a choice — it's a given.

What XML gives you that JSON doesn't is a mature document stack: XSD validates structure and types, XPath queries arbitrary nodes, XSLT transforms one dialect into another, and XML-DSig signs fragments of a document so a counterparty can verify a portion without the whole. That tooling is why regulated industries keep using XML even when JSON would be lighter — the contracts, signatures, and validators are already in place.
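The query side is easy to try without any enterprise tooling. Python's standard-library ElementTree supports a limited XPath subset (full XPath 1.0 and XSLT need a library like lxml); the document below is an invented example:

```python
import xml.etree.ElementTree as ET

doc = """
<orders>
  <order id="1"><customer>Alice</customer><total currency="EUR">42.50</total></order>
  <order id="2"><customer>Bob</customer><total currency="USD">13.00</total></order>
</orders>
"""

root = ET.fromstring(doc)

# ElementTree understands a small XPath subset — here an attribute predicate
# selects orders whose <total> element carries currency="EUR".
eur_orders = [
    order.get("id")
    for order in root.findall("order")
    if order.find("total[@currency='EUR']") is not None
]
print(eur_orders)  # ['1']
```

In an XSD-backed pipeline the same selector logic would typically live in XPath expressions inside an XSLT transform or a validator, rather than in application code.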

Use XML when: you're integrating with a system that already speaks it, working with document formats (docx/xlsx/SVG), or you genuinely need the schema, query, or signature machinery that XML ecosystems provide.

5. File-size comparison on the same dataset

To make the trade-offs concrete, here's the same 1,000-row dataset — a typical orders export with 8 fields per row (id, customer, email, product, qty, price, currency, created_at) — encoded three ways:

Format                     Typical size   Relative to CSV   Gzipped
CSV                        ~85 KB         100%              ~18 KB
JSON (array of objects)    ~210 KB        ~245%             ~22 KB
JSON (array of arrays)     ~110 KB        ~130%             ~19 KB
XML (element per field)    ~340 KB        ~400%             ~26 KB

Exact numbers depend on field lengths and structure — expect ±20% variation. Notice that gzip flattens most of the gap: JSON-object and XML both carry repeated keys/tags that compress extremely well, so over the wire the difference is much smaller than on disk.
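You can reproduce the shape of this comparison yourself. The sketch below generates 1,000 invented order rows with the same eight fields, encodes them as CSV and as a JSON array of objects, and gzips both — exact byte counts will differ from the table, but the ordering and the narrowing after compression should hold:

```python
import csv, gzip, io, json

# Hypothetical orders mirroring the fields used in the table above.
rows = [
    {"id": i, "customer": f"cust-{i}", "email": f"cust-{i}@example.com",
     "product": "widget", "qty": 2, "price": 9.99, "currency": "EUR",
     "created_at": "2026-01-15T09:30:00Z"}
    for i in range(1000)
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()     # field names appear once, in the header row
writer.writerows(rows)

as_csv = buf.getvalue().encode()
as_json = json.dumps(rows).encode()   # array of objects: keys repeat per row

for name, blob in (("CSV", as_csv), ("JSON", as_json)):
    print(f"{name}: {len(blob):>7} B raw, {len(gzip.compress(blob)):>6} B gzipped")
```

The repeated JSON keys are exactly the redundancy gzip is best at removing, which is why the gzipped gap is so much smaller than the raw one.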

6. Parsing speed and memory

Raw file size is only half the story. The other half is what it costs to turn bytes into an in-memory structure. JSON wins this comfortably in most runtimes: JSON.parse in V8, orjson in Python, and simdjson for C++/Go all hit multi-GB/s throughput on modern CPUs because the grammar is simple and regular. A 10 MB JSON payload typically parses in well under 100 ms.

CSV parsing is even cheaper per byte, and streams naturally: a good CSV reader holds one row at a time and processes the file in constant memory. That's why ETL pipelines handling tens of GB of records still use CSV — you can't fit the whole file in RAM, and you don't need to.
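The constant-memory pattern looks like this in Python — here an in-memory buffer stands in for what would normally be open("orders.csv", newline=""), and the two-column data is invented:

```python
import csv
import io

# Simulate a large export: 100,000 rows of qty,price.
big = io.StringIO("qty,price\n" + "2,9.99\n" * 100_000)

# csv.reader yields one row at a time, so memory stays constant no matter
# how large the file is — the full dataset is never materialised.
reader = csv.reader(big)
next(reader)                      # skip the header row
total = sum(int(qty) * float(price) for qty, price in reader)
print(round(total, 2))
```

The same shape works for filtering, re-writing, or loading into a database: read a row, act on it, forget it.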

XML is the slowest of the three in DOM mode (loading the whole tree into memory) because the grammar is richer and most parsers build more structure per node. The workaround is SAX or StAX parsing, which emits events ("start element", "end element", "text") as bytes arrive and uses constant memory. Every serious XML pipeline uses streaming for large documents — if you're loading a 500 MB docx the DOM way, you're doing it wrong.
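Python's stdlib exposes this event-driven style through ElementTree's iterparse, which is a reasonable stand-in for SAX here. A minimal sketch on an invented document — the key detail is clearing each element once processed, so memory stays flat even for multi-gigabyte inputs:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"""<orders>
  <order id="1"><total>42.50</total></order>
  <order id="2"><total>13.00</total></order>
</orders>"""

# iterparse emits an event as each element closes; we handle it and then
# clear() the subtree so the tree never accumulates in memory.
grand_total = 0.0
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "order":
        grand_total += float(elem.findtext("total"))
        elem.clear()            # drop the subtree we no longer need
print(grand_total)  # 55.5
```

In Java the equivalent is StAX (XMLStreamReader); in C#, XmlReader.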

Need to move data between formats?

OnlyFormat's converters handle CSV, JSON, and XML in either direction. Everything runs in your browser — your data never leaves your device.

7. Decision tree: which format should you use?

  • Tabular data, no nesting → CSV. Smallest, streams easily, opens in every spreadsheet. If consumers live in Excel or a data warehouse, this is the right answer.
  • Web API payload or config file → JSON. Native in every language, fast to parse, small enough after gzip. Describe it with JSON Schema (via OpenAPI) for contract safety.
  • Integrating with an existing XML pipeline → XML. Don't fight SOAP, HL7, SEPA, or docx — their tooling assumes XML and converting away loses schemas and signatures.
  • Config with comments and humans editing it → YAML or TOML, not JSON. JSON forbids comments, which makes it a poor fit for files humans edit by hand.
  • Converting between them → CSV ↔ JSON is safe when data is flat; JSON ↔ XML round-trips cleanly only if you agree on a mapping convention (attributes vs elements, repeated tags vs arrays).
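For the flat-data case, a CSV ↔ JSON round trip is a few lines of stdlib Python. The one trap, shown below with invented data, is that DictReader returns every value as a string — CSV has no types, so you restore them yourself:

```python
import csv
import io
import json

flat_csv = "id,customer,qty\n1,Alice,2\n2,Bob,5\n"

# CSV → JSON: DictReader maps the header row onto each record.
# All values come back as strings, so types must be restored by hand.
records = [
    {"id": int(r["id"]), "customer": r["customer"], "qty": int(r["qty"])}
    for r in csv.DictReader(io.StringIO(flat_csv))
]
as_json = json.dumps(records)

# JSON → CSV: safe here because every object has the same flat keys.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "customer", "qty"])
writer.writeheader()
writer.writerows(json.loads(as_json))
print(out.getvalue())
```

As soon as the JSON contains nesting or arrays, there's no single correct flattening — you have to choose one (dotted column names, one row per array element, etc.), which is the "mapping convention" problem in miniature.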

Frequently asked questions

Q. Is JSON always better than XML in 2026?

A. For new web APIs, config files, and browser-to-server payloads — yes, JSON is the obvious default. It's smaller, parses faster in every mainstream runtime, and maps directly onto JavaScript and Python objects. But "better" depends on context: if you're integrating with enterprise systems (SOAP, HL7, SEPA, SAML) or editing documents (docx, xlsx, SVG) the data is already XML and converting away from it loses tooling, schemas, and digital signatures. JSON is the modern default; XML is still right where it's entrenched.

Q. When should I use CSV instead of JSON?

A. Whenever your data is genuinely tabular — same columns for every row, no nesting. CSV is dramatically smaller than JSON for the same data (often 40–60% the size), streams trivially line-by-line, and every spreadsheet on the planet opens it. Exports for accounting, analytics dumps, bulk imports into CRMs, and ML training sets all belong in CSV. Reach for JSON as soon as you need nested structures, arrays inside rows, or explicit types.

Q. Does JSON have a schema like XML does?

A. Yes, but it's optional and out-of-band. JSON Schema (currently draft 2020-12) lets you describe the shape, types, required fields, and constraints of a JSON document, and tools like Ajv validate against it at runtime. Unlike XSD for XML, JSON Schema isn't embedded in the document itself — you reference it externally or by convention. For API contracts, OpenAPI wraps JSON Schema and is the de facto standard.

Q. What about newer formats like YAML, TOML, or Protobuf?

A. YAML is a superset of JSON with human-friendly syntax — great for config files (Kubernetes, GitHub Actions, CI pipelines) but a poor fit for machine-to-machine payloads because of ambiguous parsing rules. TOML is the better config alternative when you want something strictly typed and obvious (Cargo, pyproject.toml). Protobuf, MessagePack, and CBOR are binary formats — smaller and faster to parse than JSON, but not human-readable; Protobuf additionally requires an agreed schema (.proto files), while MessagePack and CBOR are schema-less encodings of JSON-like data. Use them for high-throughput internal RPC, not public APIs.

Q. How do I pick a format for a new API?

A. Default to JSON over HTTPS, described by an OpenAPI spec with JSON Schema for each endpoint. It's what every client library, every documentation tool, and every developer expects. Only deviate if you have a specific reason: Protobuf/gRPC for internal high-throughput services, XML if you're integrating with an existing SOAP ecosystem, or CSV endpoints alongside JSON when your users explicitly need bulk exports for spreadsheets.

References

  • IETF RFC 8259 — The JavaScript Object Notation (JSON) Data Interchange Format
  • IETF RFC 4180 — Common Format and MIME Type for Comma-Separated Values (CSV) Files
  • W3C Recommendation — Extensible Markup Language (XML) 1.0 (5th edition)
  • JSON Schema — json-schema.org (draft 2020-12)
  • W3C Recommendation — XML Schema (XSD) 1.1 Parts 1 and 2
  • ECMA-404 — The JSON Data Interchange Syntax
  • MDN Web Docs — Working with JSON

About the OnlyFormat Editorial Team

OnlyFormat's editorial team is made up of working web developers and image-workflow engineers who ship file-conversion tooling for a living. Every guide is reviewed against primary sources — W3C/WHATWG specifications, IETF RFCs, MDN Web Docs, ISO/IEC media standards, and the official documentation of libraries we actually use in production (libwebp, libjpeg-turbo, libavif, FFmpeg, pdf-lib). We update articles when standards change so the guidance stays current.

Sources we cite: W3C · WHATWG · MDN Web Docs · IETF RFCs · ISO/IEC · libwebp · libavif · FFmpeg · pdf-lib