Deterministic Output

PDFixa guarantees that the same sequence of API calls, with the same data, produces byte-identical PDF output on every run.

What makes a PDF non-deterministic

Most PDF libraries introduce at least one of these:

Source            | Example
Timestamps        | /CreationDate (D:20240315...) changes every run
Random IDs        | Document ID is a random UUID embedded in the trailer
HashMap iteration | Resource dictionary keys in unpredictable order
Floating-point    | Locale-dependent decimal formatting in content streams
Font subsetting   | Glyphs subsetted in hash-map order

PDFixa eliminates all of these by design.

What PDFixa does

  • No timestamps — Creation and modification dates are not written, or can be set to a fixed value.
  • Stable IDs — Document ID is derived from content, not randomly generated.
  • Ordered resources — Dictionaries, fonts, and image resources are written in insertion order.
  • Fixed number format — PDF unit values are serialised with a fixed locale and precision.
  • Deterministic font subsetting — Glyph selection and ordering are deterministic.
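The techniques in the list above can be illustrated with plain JDK classes. This is a minimal sketch, not PDFixa's implementation: the class and method names are hypothetical, the name-based UUID is only a stand-in for whatever content-derived ID PDFixa actually computes, and the three-decimal precision is an assumed value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration; none of these names are part of PDFixa.
public class DeterministicWriterSketch {

    // Stable ID: a name-based UUID is a pure function of the content bytes,
    // unlike UUID.randomUUID(), which differs on every call.
    static String stableId(byte[] content) {
        return UUID.nameUUIDFromBytes(content).toString();
    }

    // Ordered resources: LinkedHashMap iterates in insertion order,
    // so keys are written in the order they were registered.
    static String resourceNames() {
        Map<String, String> fonts = new LinkedHashMap<>();
        fonts.put("F2", "Helvetica-Bold");
        fonts.put("F1", "Helvetica");
        return String.join(" ", fonts.keySet());
    }

    // Fixed number format: Locale.ROOT always uses '.' as the decimal
    // separator, whatever the JVM's default locale is.
    // The precision of 3 is an assumption made for this sketch.
    static String number(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        byte[] content = {1, 2, 3};
        System.out.println(stableId(content).equals(stableId(content))); // true
        System.out.println(resourceNames()); // F2 F1
        System.out.println(number(72.5));    // 72.500
    }
}
```

Each helper is a pure function of its arguments, which is the property that makes byte-identical output possible.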

Verification

You can verify determinism in a test:

@Test
void pdfIsDeterministic() throws Exception {
    // Generate the same report twice from identical input.
    byte[] first = generateReport(sampleData);
    byte[] second = generateReport(sampleData);

    // Every byte must match.
    assertArrayEquals(first, second);
}

Or compare SHA-256 hashes across deploys:

String hash = sha256Hex(generateReport(sampleData));
assertEquals("e3b0c44298fc...", hash);
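The snippet above relies on a sha256Hex helper that is not shown. One possible version, using only the JDK, might look like the sketch below. The class name Hashes is hypothetical, and the expected value in your test should be the hash recorded from a known-good build.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; only sha256Hex is referenced by the test above.
public class Hashes {

    // Returns the SHA-256 digest of the given bytes as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Emit the high nibble, then the low nibble, of each byte.
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A pinned hash fails whenever the output changes for any reason, including an intentional template change or a library upgrade, so update the expected value deliberately when that happens.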

Your responsibilities

PDFixa controls its own output, but you control the inputs. For byte-identical results:

Do                                 | Avoid
Derive titles/authors from input   | Instant.now() in metadata
Use consistent font files          | Different font versions
Pass images as stable byte arrays  | Re-encoding images with different quality
Sort collections before iterating  | Iterating HashMap or Set
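Two of the rows above lend themselves to a short sketch: sorting a collection before iterating over it, and taking dates from the input instead of the wall clock. The class and method names here are hypothetical, and injecting a java.time.Clock is only one way to keep Instant.now() out of the metadata.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical caller-side helpers; not part of PDFixa.
public class StableInputs {

    // Copies the map into a TreeMap so rows are always produced in key order,
    // regardless of how the source map happens to iterate.
    static List<String> sortedRows(Map<String, Integer> totals) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(totals).entrySet()) {
            rows.add(e.getKey() + "=" + e.getValue());
        }
        return rows;
    }

    // Reads the time from an injected clock, so tests and reruns can pin it.
    static String reportDate(Clock clock) {
        return Instant.now(clock).toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = new HashMap<>();
        totals.put("b", 2);
        totals.put("a", 1);
        totals.put("c", 3);
        System.out.println(sortedRows(totals)); // [a=1, b=2, c=3]

        Clock fixed = Clock.fixed(Instant.parse("2024-03-15T00:00:00Z"), ZoneOffset.UTC);
        System.out.println(reportDate(fixed)); // 2024-03-15T00:00:00Z
    }
}
```

The same idea applies to any other ambient input: if a value can differ between two runs, pass it in explicitly alongside the report data.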

See Metadata for field-level guidance.