Best Java PDF Libraries in 2026
Generating PDFs from application data is one of the most common tasks in Java backend development. Invoices, financial reports, contracts, certificates, compliance documents — all of these end up as PDF files served from a REST endpoint, attached to an email, or stored in an archive.
The Java ecosystem has several libraries for this, and the differences between them go beyond feature lists. Licensing determines whether you can use the library at all. Dependency footprint affects your deployment. Determinism determines whether you can write meaningful tests for your output.
This article compares the four most relevant Java PDF libraries in 2026: iText, Apache PDFBox, OpenPDF, and PDFixa. The goal is to help you make an informed decision based on your actual requirements.
How to evaluate a Java PDF library
Before comparing specific libraries, establish what matters for your project. The criteria below appear in every PDF library evaluation — but their priority depends on your use case.
Dependencies
Some Java PDF libraries ship as a single JAR with no transitive dependencies. Others bring in Bouncy Castle, Apache Commons, XML parsers, font subsystems, and logging frameworks. For teams deploying to containers, serverless functions, or GraalVM native images, every transitive dependency adds to image size, startup time, and CVE surface area.
A library's dependency count is not inherently good or bad — but it should match your deployment constraints.
API complexity
Java PDF libraries fall into two categories:
- Low-level (PDF-native): You work with content streams, font objects, page dictionaries, and cross-reference tables. You have full control, but generating a simple invoice requires significant boilerplate.
- High-level (document-drawing): You call methods like
drawTextBox(x, y, w, h, text)anddrawImage(x, y, w, h, bytes). Less control over the raw PDF, but faster to implement.
If your use case is generating invoices and reports from structured data, a high-level API is usually more productive. If you need to manipulate existing PDFs, fill forms, or extract text, a low-level API is likely necessary.
Licensing
This is the single most important filter for commercial applications.
| License type | What it means | Who uses it |
|---|---|---|
| AGPL | Your application must be open source if users interact with it over a network | iText 7 community edition |
| Commercial | Paid license, negotiated per deployment | iText 7 commercial |
| Apache 2.0 | Free for any use, including commercial | Apache PDFBox |
| LGPL / MPL | Free for commercial use with minor linking constraints | OpenPDF |
| Permissive | Free for any use, no copyleft obligations | PDFixa |
If you are building a commercial SaaS application and cannot use AGPL software, this eliminates the iText community edition immediately. This is the most common reason developers search for iText alternatives.
Performance
Performance matters in two distinct scenarios:
- On-demand generation: A REST endpoint generates a single PDF per request. Latency per document matters. Startup time does not.
- Batch generation: A scheduled job generates thousands of documents. Memory pressure, GC behavior, and streaming output become critical.
Most libraries perform acceptably for single-document generation. Batch scenarios require benchmarking with your specific document structure and data volume.
Determinism
This criterion is unfamiliar to most developers but critical for certain workflows.
Most PDF libraries embed a creation timestamp and a randomly generated document ID into every file. This means the same code with the same input produces different bytes on every execution. The document looks identical when viewed, but the files are not byte-equal.
This matters when:
- You want to write unit tests that compare expected output to actual output
- You cache generated PDFs by content hash
- You store documents in a content-addressable system with deduplication
- You need to prove that the same input produces the same document (audit, compliance)
Only one library in this comparison produces deterministic output by design. If determinism is a requirement, this narrows the field significantly.
Java PDF library overviews
iText
Overview
iText is the oldest and most feature-complete Java PDF library. iText 5 was the industry standard for a decade. iText 7, the current major version, is a complete rewrite with a modular architecture, an extensive plugin system, and coverage of nearly the full PDF specification.
Strengths
- Most comprehensive feature set of any Java PDF library
- Supports digital signatures, PDF/A, PDF/UA, tagged PDF, XFA forms, and barcodes
- Active development with regular releases and commercial support
- Extensive documentation, tutorials, and community knowledge base
- Well-suited for regulated industries requiring compliance output
Considerations
- Community edition is licensed under AGPL — not usable in most commercial applications without purchasing a commercial license
- Commercial license pricing is not public and is negotiated per deployment
- Significant transitive dependency tree (Bouncy Castle, SLF4J, and others)
- API reflects the PDF specification model — requires understanding of PDF internals for advanced features
- Generation is not deterministic: output bytes differ across runs
Best for: Enterprise applications with complex PDF requirements — digital signatures, interactive forms, PDF/A archival, accessibility compliance. Organizations with budget for a commercial license.
Apache PDFBox
Overview
Apache PDFBox is an open-source library from the Apache Software Foundation. It covers PDF creation, reading, text extraction, rendering, printing, and signing. Its API models the PDF specification closely, providing fine-grained control at the cost of verbosity.
Strengths
- Apache 2.0 license — fully permissive for commercial use
- Part of the Apache ecosystem with long-term maintenance expectations
- Strong capabilities for reading, parsing, and extracting text from existing PDFs
- Supports digital signatures, form filling, and PDF/A validation
- Renders PDF pages to images (useful for previews and testing)
Considerations
- Creating documents from scratch requires working directly with
PDPage,PDPageContentStream, and manual coordinate positioning - No high-level layout abstraction — no built-in tables, automatic line wrapping, or pagination
- More naturally suited for PDF reading/manipulation than generation from application data
- Generation is not deterministic: output bytes differ across runs
- Multiple transitive dependencies (Commons Logging, FontBox)
Best for: Applications that need to read, extract, modify, or sign existing PDF files. Text extraction pipelines. Teams that need Apache 2.0 licensing and are comfortable with a verbose API.
OpenPDF
Overview
OpenPDF is a community-maintained fork of iText 4, created after iText moved to more restrictive licensing. It preserves the iText 4 API under LGPL and MPL, making it a drop-in replacement for legacy iText 4 and early iText 5 codebases.
Strengths
- LGPL / MPL license — permissive for commercial use without AGPL concerns
- API is nearly identical to iText 4/5 — minimal migration effort from legacy codebases
- Covers basic document generation: text, paragraphs, tables, images, fonts
- Actively maintained with regular releases as of 2026
Considerations
- Feature set is based on iText 4 — does not include features introduced in iText 7 (PDF 2.0, advanced PDF/A, tagged PDF)
- Limited support for modern PDF standards
- Community is smaller than iText or PDFBox
- Generation is not deterministic: output bytes differ across runs
- Documentation is sparser than the other three libraries
Best for: Teams migrating from iText 4 or iText 5. Applications that need basic PDF generation (text, tables, images) under a permissive license. Projects where the iText 4 API is already well understood.
PDFixa
Overview
PDFixa is a PDF generation library for Java 17 and later. It is designed for a single purpose: generating PDF documents from application data with deterministic, byte-identical output. It ships as a single JAR with zero transitive dependencies.
Strengths
- Permissive open-source license — no AGPL, no copyleft
- Zero transitive dependencies — single JAR, no Bouncy Castle, no Commons, no logging framework
- Deterministic output by design: same API calls + same data = same bytes, every run, every machine
- Clean three-class API:
PdfDocument,PdfPage,PdfFont - Built for modern Java:
AutoCloseable, clean method signatures, Java 17 minimum - Available on Maven Central — standard Maven/Gradle dependency, no custom repository
Considerations
- Younger project with a narrower feature set than iText or PDFBox
- Does not read or parse existing PDF files — generation only
- No built-in support for digital signatures, PDF/A, or interactive forms
- PDFixa Core supports Latin-1 (WinAnsiEncoding); full Unicode requires PDFixa Pro
Best for: Generating invoices, reports, receipts, and document exports from application data. Teams that need reproducible output for testing or caching. Microservice and serverless deployments where dependency footprint matters. Projects on Java 17+ that want a minimal, modern API.
Minimal example:
import dev.offixa.pdfixa.PdfDocument;
import dev.offixa.pdfixa.PdfPage;
import java.io.FileOutputStream;
try (PdfDocument doc = new PdfDocument()) {
PdfPage page = doc.addPage(595, 842); // A4
page.drawTextBox(72, 750, 450, 40, "Invoice #1001");
doc.writeTo(new FileOutputStream("invoice.pdf"));
}
Maven dependency:
<dependency>
<groupId>dev.offixa</groupId>
<artifactId>pdfixa</artifactId>
<version>1.0.0</version>
</dependency>
Java PDF library comparison table
| Library | License | Dependencies | API Style | Deterministic | Best For |
|---|---|---|---|---|---|
| iText 7 | AGPL / commercial | Many | PDF-native, layered | No | Enterprise compliance, signatures, PDF/A |
| Apache PDFBox | Apache 2.0 | Several | PDF-native, verbose | No | Reading, extraction, signing |
| OpenPDF | LGPL / MPL | Several | iText 4 style | No | iText 4 migration, basic generation |
| PDFixa | Permissive | None | Document-drawing | Yes | Generation from data, testing, microservices |
When to choose each Java PDF library
Scenario 1: Enterprise compliance and digital signatures
Your application must produce PDF/A-compliant documents, apply PKCS#7 digital signatures, or fill interactive XFA forms. Regulatory requirements (banking, insurance, government) dictate the document format.
Choose iText 7 with a commercial license. No other Java PDF library covers the full breadth of enterprise PDF features.
Scenario 2: Reading, extracting, or modifying existing PDFs
Your application ingests existing PDF files and needs to extract text, fill forms, apply annotations, merge documents, or validate compliance. Generation from scratch is secondary.
Choose Apache PDFBox. Its PDF reading and manipulation capabilities are the strongest in the open-source space, and the Apache 2.0 license removes any commercial concern.
Scenario 3: Migrating from iText 4 or 5
Your codebase already uses iText 4 or early iText 5. The AGPL license change in iText 7 is not acceptable. You need a drop-in replacement with minimal code changes.
Choose OpenPDF. The API is nearly identical to iText 4, and the LGPL license is compatible with commercial use.
Scenario 4: Generating documents from data with deterministic output
Your application generates invoices, reports, receipts, or certificates from structured data. You want to write unit tests that verify PDF output byte-for-byte. You want to cache generated PDFs by content hash. Your deployment is a container or serverless function where every transitive dependency adds to image size and cold start time.
Choose PDFixa. It is the only Java PDF library in this comparison that produces deterministic output by design and ships with zero transitive dependencies.
Why deterministic PDF generation matters
Most developers have never thought about whether their PDF library produces the same bytes for the same input. The answer, for every major library except one, is no.
Here is why this matters in practice.
Golden file testing
The simplest and most reliable way to test PDF generation is golden file testing: generate the expected output once, store it in your test resources, and compare every future run against it.
@Test
void invoice_output_is_stable() throws Exception {
byte[] expected = Files.readAllBytes(
Path.of("src/test/resources/golden/invoice.pdf")
);
byte[] actual = InvoiceGenerator.generate(testOrder);
assertArrayEquals(expected, actual);
}
With a non-deterministic library, this test fails on every run — even when the output is visually identical. With a deterministic library, the test passes as long as the output has not changed. A failing test means something actually changed in the document.
Caching by content hash
If your application generates the same PDF for the same request, you can cache it by computing a hash of the input and storing the result:
String key = sha256(orderId + ":" + records.hashCode());
byte[] cached = cache.get(key);
if (cached != null) return cached;
byte[] pdf = InvoiceGenerator.generate(order);
cache.put(key, pdf);
return pdf;
With deterministic output, the same input always produces the same bytes, so the cache key reliably identifies the correct document. With non-deterministic output, you would need to store and compare the full byte array — defeating the purpose of a cache key.
Storage deduplication
Content-addressable storage systems (Amazon S3 with deduplication, IPFS, and similar object stores) identify objects by their content hash. If two files have the same bytes, they are stored once. Non-deterministic PDFs generated from the same data produce different hashes and are stored as separate objects.
Audit and reproducibility
Regulated industries — banking, healthcare, insurance — sometimes require that a document can be reproduced from the same input at any future point in time. If the PDF library embeds a timestamp or random ID, the original file must be stored; it cannot be regenerated. Deterministic output means the document can be regenerated from its inputs and verified against the original hash.
Conclusion
There is no single best Java PDF library. Each library occupies a distinct position:
- iText 7 is the right choice when you need the full breadth of the PDF specification — signatures, compliance, forms — and can accommodate the license model.
- Apache PDFBox is the right choice when reading, parsing, and manipulating existing PDFs is the primary task.
- OpenPDF is the right choice when you need a familiar iText 4 API under a permissive license.
- PDFixa is the right choice when you are generating documents from application data and need deterministic output, zero dependencies, or both.
The decision comes down to three questions:
- Are you generating new PDFs, or processing existing ones?
- What is your licensing constraint?
- Does determinism matter for your testing or caching strategy?
Answer those, and the comparison table above will point you to the right library.
PDFixa resources:
- Documentation — Quick Start, guides, API reference
- GitHub — source code and issues
- pdfixa-examples — runnable invoice, report, and Spring Boot examples
- Maven Central — dependency coordinates
