Proper conversion from XLS to PDF in Java

3 minute read Published: 2021-06-30

Generating or modifying Excel files in Java is easy with Apache POI.

But when you generate Excel reports, you always end up having to export them to PDF too.

If you look for a library for this conversion, you'll find many, most of them expensive, and none of them gives acceptable results.

I explain here the efficient and reasonably reliable solution I've used in a Java application.

Overview

The idea is to use LibreOffice Headless, called as an external process from Java.

Called headless, LibreOffice is fast and reliable enough. And most importantly, it produces excellent results.

As it works on files, a sensible option is to work in dedicated temporary directories, thus avoiding race conditions and ensuring automated cleaning.

A working docker installation of LibreOffice

If you build on top of a Debian-based Java image, add this command to your Dockerfile to install LibreOffice for headless execution:

# installing LibreOffice for xls -> pdf conversions
RUN set -x \
    && apt-get update \
    && apt-get install -y --no-install-recommends libreoffice \
    && rm -rf /var/lib/apt/lists/*

The same apt-get commands can be run on a standard Debian-based Linux system if you don't use a container.

Doing the conversion in Java

I assume here the presence of a logging facility; change the lines starting with log.info to match your environment.

// get a temp directory in which to work
// (it is not cleaned automatically: we delete it at the end)
Path tempDir = Files.createTempDirectory("excel-to-pdf");

// write the workbook as a temporary file
// (if you don't start from a workbook, this step might differ)
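File tempExcelFile = tempDir.resolve("report.xlsm").toFile();
// close the stream before LibreOffice reads the file,
// so that everything is flushed to disk
try (FileOutputStream fos = new FileOutputStream(tempExcelFile)) {
	workbook.write(fos);
}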

// call libreoffice headless and politely
// ask it to convert our xlsm file to pdf
ProcessBuilder pb = new ProcessBuilder(
	"libreoffice", "--headless",
	"--convert-to", "pdf", tempExcelFile.getAbsolutePath(),
	"--outdir", tempDir.toAbsolutePath().toString()
);
pb.redirectErrorStream(true);
Process process = pb.start();
try (BufferedReader reader = new BufferedReader(
	new InputStreamReader(process.getInputStream())
)) {
	String line;
	while ((line = reader.readLine()) != null) {
		log.info("[libreoffice stdout+stderr] " + line);
	}
}
int exitCode = process.waitFor();
if (exitCode != 0) {
	throw new IOException("libreoffice exited with code " + exitCode);
}
log.info("converted");

// now the file has been converted

// read the converted file and send/use it
File tempPdfFile = tempDir.resolve("report.pdf").toFile();
try (FileInputStream fis = new FileInputStream(tempPdfFile)) {
	fis.transferTo(outputStream); // here the example of a servlet download
}

// remove the temp dir
// (deleteFileRecursive is not a JDK method, it is a helper of your own)
deleteFileRecursive(tempDir.toFile());
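
The snippet above calls deleteFileRecursive without showing it. Here is a minimal sketch of what such a helper could look like; the class name TempDirCleaner is hypothetical, and the choice not to follow symbolic links is my assumption, not something taken from the original application.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempDirCleaner {

    /** Deletes a file, or a directory together with everything below it. */
    public static void deleteFileRecursive(File file) throws IOException {
        // Only descend into real directories: a symbolic link to a directory
        // is removed as a link, its target is left untouched.
        if (file.isDirectory() && !Files.isSymbolicLink(file.toPath())) {
            File[] children = file.listFiles(); // null if the listing fails
            if (children != null) {
                for (File child : children) {
                    deleteFileRecursive(child);
                }
            }
        }
        // Throws if the entry cannot be deleted, does nothing if it is already gone.
        Files.deleteIfExists(file.toPath());
    }
}
```

In a real application it is probably worth calling it from a finally block, so that the temporary directory is also removed when the conversion fails.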

Conclusion

Using an external process might feel a little unsatisfying, and maybe we'll use a POI function in the future, but in the meantime, with no clean dedicated library available, it proves to be a reliable and efficient practical solution.

As you may imagine, this solution also applies to Microsoft formats other than Excel.
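
To illustrate that last point, here is a rough sketch of how the command line could be built for any input file instead of a hard-coded report.xlsm. The class and method names (LibreOfficeCommand, build, expectedOutput) are hypothetical. It assumes that targetFormat is a plain extension such as "pdf", and that LibreOffice names its output after the input file with the extension replaced, as it did for report.xlsm above.

```java
import java.nio.file.Path;
import java.util.List;

public class LibreOfficeCommand {

    /** Builds the headless command converting input to targetFormat inside outDir. */
    public static List<String> build(Path input, Path outDir, String targetFormat) {
        return List.of(
            "libreoffice", "--headless",
            "--convert-to", targetFormat, input.toAbsolutePath().toString(),
            "--outdir", outDir.toAbsolutePath().toString()
        );
    }

    /** Where the converted file is expected: same base name, new extension. */
    public static Path expectedOutput(Path input, Path outDir, String targetFormat) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // keep the whole name when there is no extension (or only a leading dot)
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + "." + targetFormat);
    }
}
```

The returned list can be passed directly to new ProcessBuilder(...), which accepts a List<String>, and the rest of the code stays the same.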