Best Java PDF Libraries in 2026

March 15, 2026

Java PDF Engineering

Generating PDFs from application data is one of the most common tasks in Java backend development. Invoices, financial reports, contracts, certificates, compliance documents — all of these end up as PDF files served from a REST endpoint, attached to an email, or stored in an archive.

The Java ecosystem has several libraries for this, and the differences between them go beyond feature lists. Licensing determines whether you can use the library at all. Dependency footprint affects your deployment. Determinism determines whether you can write meaningful tests for your output.

This article compares the four most relevant Java PDF libraries in 2026: iText, Apache PDFBox, OpenPDF, and PDFixa. The goal is to help you make an informed decision based on your actual requirements.

How to evaluate a Java PDF library

Before comparing specific libraries, establish what matters for your project. The criteria below appear in every PDF library evaluation — but their priority depends on your use case.

Dependencies

Some Java PDF libraries ship as a single JAR with no transitive dependencies. Others bring in Bouncy Castle, Apache Commons, XML parsers, font subsystems, and logging frameworks. For teams deploying to containers, serverless functions, or GraalVM native images, every transitive dependency adds to image size, startup time, and CVE surface area.

A library's dependency count is not inherently good or bad — but it should match your deployment constraints.

API complexity

Java PDF libraries fall into two categories:

Low-level (PDF-native): You work with content streams, font objects, page dictionaries, and cross-reference tables. You have full control, but generating a simple invoice requires significant boilerplate.
High-level (document-drawing): You call methods like drawTextBox(x, y, w, h, text) and drawImage(x, y, w, h, bytes). Less control over the raw PDF, but faster to implement.

If your use case is generating invoices and reports from structured data, a high-level API is usually more productive. If you need to manipulate existing PDFs, fill forms, or extract text, a low-level API is likely necessary.

Licensing

This is the single most important filter for commercial applications.

License type	What it means	Who uses it
AGPL	Your application must be open source if users interact with it over a network	iText 7 community edition
Commercial	Paid license, negotiated per deployment	iText 7 commercial
Apache 2.0	Free for any use, including commercial	Apache PDFBox
LGPL / MPL	Free for commercial use with minor linking constraints	OpenPDF
Permissive	Free for any use, no copyleft obligations	PDFixa

If you are building a commercial SaaS application and cannot use AGPL software, this eliminates the iText community edition immediately. This is the most common reason developers search for iText alternatives.

Performance

Performance matters in two distinct scenarios:

On-demand generation: A REST endpoint generates a single PDF per request. Latency per document matters. Startup time does not.
Batch generation: A scheduled job generates thousands of documents. Memory pressure, GC behavior, and streaming output become critical.

Most libraries perform acceptably for single-document generation. Batch scenarios require benchmarking with your specific document structure and data volume.

Determinism

This criterion is unfamiliar to most developers but critical for certain workflows.

Most PDF libraries embed a creation timestamp and a randomly generated document ID into every file. This means the same code with the same input produces different bytes on every execution. The document looks identical when viewed, but the files are not byte-equal.

This matters when:

You want to write unit tests that compare expected output to actual output
You cache generated PDFs by content hash
You store documents in a content-addressable system with deduplication
You need to prove that the same input produces the same document (audit, compliance)

Only one library in this comparison produces deterministic output by design. If determinism is a requirement, this narrows the field significantly.

Java PDF library overviews

iText

Overview

iText is the oldest and most feature-complete Java PDF library. iText 5 was the industry standard for a decade. iText 7, the current major version, is a complete rewrite with a modular architecture, an extensive plugin system, and coverage of nearly the full PDF specification.

Strengths

Most comprehensive feature set of any Java PDF library
Supports digital signatures, PDF/A, PDF/UA, tagged PDF, XFA forms, and barcodes
Active development with regular releases and commercial support
Extensive documentation, tutorials, and community knowledge base
Well-suited for regulated industries requiring compliance output

Considerations

Community edition is licensed under AGPL — not usable in most commercial applications without purchasing a commercial license
Commercial license pricing is not public and is negotiated per deployment
Significant transitive dependency tree (Bouncy Castle, SLF4J, and others)
API reflects the PDF specification model — requires understanding of PDF internals for advanced features
Generation is not deterministic: output bytes differ across runs

Best for: Enterprise applications with complex PDF requirements — digital signatures, interactive forms, PDF/A archival, accessibility compliance. Organizations with budget for a commercial license.

Apache PDFBox

Overview

Apache PDFBox is an open-source library from the Apache Software Foundation. It covers PDF creation, reading, text extraction, rendering, printing, and signing. Its API models the PDF specification closely, providing fine-grained control at the cost of verbosity.

Strengths

Apache 2.0 license — fully permissive for commercial use
Part of the Apache ecosystem with long-term maintenance expectations
Strong capabilities for reading, parsing, and extracting text from existing PDFs
Supports digital signatures, form filling, and PDF/A validation
Renders PDF pages to images (useful for previews and testing)

Considerations

Creating documents from scratch requires working directly with PDPage, PDPageContentStream, and manual coordinate positioning
No high-level layout abstraction — no built-in tables, automatic line wrapping, or pagination
More naturally suited for PDF reading/manipulation than generation from application data
Generation is not deterministic: output bytes differ across runs
Multiple transitive dependencies (Commons Logging, FontBox)

Best for: Applications that need to read, extract, modify, or sign existing PDF files. Text extraction pipelines. Teams that need Apache 2.0 licensing and are comfortable with a verbose API.

OpenPDF

Overview

OpenPDF is a community-maintained fork of iText 4, created after iText moved to more restrictive licensing. It preserves the iText 4 API under LGPL and MPL, making it a drop-in replacement for legacy iText 4 and early iText 5 codebases.

Strengths

LGPL / MPL license — permissive for commercial use without AGPL concerns
API is nearly identical to iText 4/5 — minimal migration effort from legacy codebases
Covers basic document generation: text, paragraphs, tables, images, fonts
Actively maintained with regular releases as of 2026

Considerations

Feature set is based on iText 4 — does not include features introduced in iText 7 (PDF 2.0, advanced PDF/A, tagged PDF)
Limited support for modern PDF standards
Community is smaller than iText or PDFBox
Generation is not deterministic: output bytes differ across runs
Documentation is sparser than the other three libraries

Best for: Teams migrating from iText 4 or iText 5. Applications that need basic PDF generation (text, tables, images) under a permissive license. Projects where the iText 4 API is already well understood.

PDFixa

Overview

PDFixa is a PDF generation library for Java 17 and later. It is designed for a single purpose: generating PDF documents from application data with deterministic, byte-identical output. It ships as a single JAR with zero transitive dependencies.

Strengths

Permissive open-source license — no AGPL, no copyleft
Zero transitive dependencies — single JAR, no Bouncy Castle, no Commons, no logging framework
Deterministic output by design: same API calls + same data = same bytes, every run, every machine
Clean three-class API: PdfDocument, PdfPage, PdfFont
Built for modern Java: AutoCloseable, clean method signatures, Java 17 minimum
Available on Maven Central — standard Maven/Gradle dependency, no custom repository

Considerations

Younger project with a narrower feature set than iText or PDFBox
Does not read or parse existing PDF files — generation only
No built-in support for digital signatures, PDF/A, or interactive forms
PDFixa Core supports Latin-1 (WinAnsiEncoding); full Unicode requires PDFixa Pro

Best for: Generating invoices, reports, receipts, and document exports from application data. Teams that need reproducible output for testing or caching. Microservice and serverless deployments where dependency footprint matters. Projects on Java 17+ that want a minimal, modern API.

Minimal example:

import dev.offixa.pdfixa.PdfDocument;
import dev.offixa.pdfixa.PdfPage;
import java.io.FileOutputStream;

try (PdfDocument doc = new PdfDocument()) {
    PdfPage page = doc.addPage(595, 842);  // A4
    page.drawTextBox(72, 750, 450, 40, "Invoice #1001");
    doc.writeTo(new FileOutputStream("invoice.pdf"));
}

Maven dependency:

<dependency>
    <groupId>dev.offixa</groupId>
    <artifactId>pdfixa</artifactId>
    <version>1.0.0</version>
</dependency>

Java PDF library comparison table

Library	License	Dependencies	API Style	Deterministic	Best For
iText 7	AGPL / commercial	Many	PDF-native, layered	No	Enterprise compliance, signatures, PDF/A
Apache PDFBox	Apache 2.0	Several	PDF-native, verbose	No	Reading, extraction, signing
OpenPDF	LGPL / MPL	Several	iText 4 style	No	iText 4 migration, basic generation
PDFixa	Permissive	None	Document-drawing	Yes	Generation from data, testing, microservices

When to choose each Java PDF library

Scenario 1: Enterprise compliance and digital signatures

Your application must produce PDF/A-compliant documents, apply PKCS#7 digital signatures, or fill interactive XFA forms. Regulatory requirements (banking, insurance, government) dictate the document format.

Choose iText 7 with a commercial license. No other Java PDF library covers the full breadth of enterprise PDF features.

Scenario 2: Reading, extracting, or modifying existing PDFs

Your application ingests existing PDF files and needs to extract text, fill forms, apply annotations, merge documents, or validate compliance. Generation from scratch is secondary.

Choose Apache PDFBox. Its PDF reading and manipulation capabilities are the strongest in the open-source space, and the Apache 2.0 license removes any commercial concern.

Scenario 3: Migrating from iText 4 or 5

Your codebase already uses iText 4 or early iText 5. The AGPL license change in iText 7 is not acceptable. You need a drop-in replacement with minimal code changes.

Choose OpenPDF. The API is nearly identical to iText 4, and the LGPL license is compatible with commercial use.

Scenario 4: Generating documents from data with deterministic output

Your application generates invoices, reports, receipts, or certificates from structured data. You want to write unit tests that verify PDF output byte-for-byte. You want to cache generated PDFs by content hash. Your deployment is a container or serverless function where every transitive dependency adds to image size and cold start time.

Choose PDFixa. It is the only Java PDF library in this comparison that produces deterministic output by design and ships with zero transitive dependencies.

Why deterministic PDF generation matters

Most developers have never thought about whether their PDF library produces the same bytes for the same input. The answer, for every major library except one, is no.

Here is why this matters in practice.

Golden file testing

The simplest and most reliable way to test PDF generation is golden file testing: generate the expected output once, store it in your test resources, and compare every future run against it.

@Test
void invoice_output_is_stable() throws Exception {
    byte[] expected = Files.readAllBytes(
        Path.of("src/test/resources/golden/invoice.pdf")
    );
    byte[] actual = InvoiceGenerator.generate(testOrder);

    assertArrayEquals(expected, actual);
}

With a non-deterministic library, this test fails on every run — even when the output is visually identical. With a deterministic library, the test passes as long as the output has not changed. A failing test means something actually changed in the document.

Caching by content hash

If your application generates the same PDF for the same request, you can cache it by computing a hash of the input and storing the result:

String key = sha256(orderId + ":" + records.hashCode());
byte[] cached = cache.get(key);
if (cached != null) return cached;

byte[] pdf = InvoiceGenerator.generate(order);
cache.put(key, pdf);
return pdf;

With deterministic output, the same input always produces the same bytes, so the cache key reliably identifies the correct document. With non-deterministic output, you would need to store and compare the full byte array — defeating the purpose of a cache key.

Storage deduplication

Content-addressable storage systems (Amazon S3 with deduplication, IPFS, and similar object stores) identify objects by their content hash. If two files have the same bytes, they are stored once. Non-deterministic PDFs generated from the same data produce different hashes and are stored as separate objects.

Audit and reproducibility

Regulated industries — banking, healthcare, insurance — sometimes require that a document can be reproduced from the same input at any future point in time. If the PDF library embeds a timestamp or random ID, the original file must be stored; it cannot be regenerated. Deterministic output means the document can be regenerated from its inputs and verified against the original hash.

Conclusion

There is no single best Java PDF library. Each library occupies a distinct position:

iText 7 is the right choice when you need the full breadth of the PDF specification — signatures, compliance, forms — and can accommodate the license model.
Apache PDFBox is the right choice when reading, parsing, and manipulating existing PDFs is the primary task.
OpenPDF is the right choice when you need a familiar iText 4 API under a permissive license.
PDFixa is the right choice when you are generating documents from application data and need deterministic output, zero dependencies, or both.

The decision comes down to three questions:

Are you generating new PDFs, or processing existing ones?
What is your licensing constraint?
Does determinism matter for your testing or caching strategy?

Answer those, and the comparison table above will point you to the right library.

PDFixa resources:

Documentation — Quick Start, guides, API reference
GitHub — source code and issues
pdfixa-examples — runnable invoice, report, and Spring Boot examples
Maven Central — dependency coordinates

How to evaluate a Java PDF library​

Dependencies​

API complexity​

Licensing​

Performance​

Determinism​

Java PDF library overviews​

iText​

Apache PDFBox​

OpenPDF​

PDFixa​

Java PDF library comparison table​

When to choose each Java PDF library​

Scenario 1: Enterprise compliance and digital signatures​

Scenario 2: Reading, extracting, or modifying existing PDFs​

Scenario 3: Migrating from iText 4 or 5​

Scenario 4: Generating documents from data with deterministic output​

Why deterministic PDF generation matters​

Golden file testing​

Caching by content hash​

Storage deduplication​

Audit and reproducibility​

Conclusion​

How to evaluate a Java PDF library

Dependencies

API complexity

Licensing

Performance

Determinism

Java PDF library overviews

iText

Apache PDFBox

OpenPDF

PDFixa

Java PDF library comparison table

When to choose each Java PDF library

Scenario 1: Enterprise compliance and digital signatures

Scenario 2: Reading, extracting, or modifying existing PDFs

Scenario 3: Migrating from iText 4 or 5

Scenario 4: Generating documents from data with deterministic output

Why deterministic PDF generation matters

Golden file testing

Caching by content hash

Storage deduplication

Audit and reproducibility

Conclusion