This article is based on iText in Action, Second Edition, published in October 2010. It is reproduced here by permission from Manning Publications. Manning publishes MEAP (Manning Early Access Program) eBooks and pBooks. MEAPs are sold exclusively through Manning.com. All pBook purchases include free PDF, mobi and epub. When mobile formats become available, all customers will be contacted and upgraded. Visit Manning.com for more information. [ Use promotional code ‘java40beat’ and get a 40% discount on eBooks and pBooks ]
Resizing an Image in an Existing Document
Introduction
Here’s a question that is often posted to the mailing list: “How do we reduce the size of an existing PDF containing lots of images?” There are many different answers to this question, depending on the nature of the PDF file. Maybe the same image is added multiple times, in which case passing the PDF through PdfSmartCopy could already result in a serious file size reduction. Maybe the PDF wasn’t compressed or maybe there are plenty of unused objects. You could try and see if the PdfReader method removeUnusedObjects() yields any results.
It’s more likely that the PDF contains high-resolution images, in which case the original question should be rephrased as, “How do I reduce the resolution of the images inside my PDF?” To achieve this, we extract the image from the PDF, downsample it, and then put it back into the PDF, replacing the high-resolution image. Listing 1 uses brute force instead of the PdfReaderContentParser to find images. With the method getXrefSize() we get the highest object number in the PDF document, and we loop over every object, searching for a stream that has the special id we’re looking for.
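Before turning to the listing, it may help to see what downsampling buys us. The following is a minimal arithmetic sketch, not part of the book’s example: the class ResolutionSketch, its methods, and the placement size of 432 points (6 inches) are assumptions made up for illustration. It relies only on the fact that the default PDF user space unit is 1/72 inch.

```java
class ResolutionSketch {

    // Effective resolution of an image that is 'pixels' wide (or high)
    // when it is drawn across 'points' user space units (1 point = 1/72 inch).
    static double effectiveDpi(int pixels, double points) {
        return pixels / (points / 72.0);
    }

    // Number of uncompressed sample bytes for an 8-bit RGB image.
    static long rawRgbBytes(int width, int height) {
        return 3L * width * height;
    }

    public static void main(String[] args) {
        // Hypothetical image: 1800 x 1200 pixels placed 432 points (6 inches) wide.
        System.out.println(effectiveDpi(1800, 432.0)); // resolution before scaling
        System.out.println(effectiveDpi(900, 432.0));  // resolution after scaling by 0.5
        System.out.println(rawRgbBytes(1800, 1200) / rawRgbBytes(900, 600));
    }
}
```

With these assumed numbers, scaling both dimensions by 0.5 halves the effective resolution (300 dpi becomes 150 dpi) and cuts the number of raw samples to a quarter. The size of the final JPEG depends on the image content and the encoder, so the actual saving in the file will differ.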
Listing 1 ResizeImage.java
PdfName key = new PdfName("ITXT_SpecialId");
PdfName value = new PdfName("123456789");
PdfReader reader = new PdfReader(SpecialId.RESULT);
int n = reader.getXrefSize();
PdfObject object;
PRStream stream;
for (int i = 0; i < n; i++) {
    object = reader.getPdfObject(i);
    if (object == null || !object.isStream())
        continue;
    stream = (PRStream)object;
    // A: Finds the image stream
    if (value.equals(stream.get(key))) {
        // B: Gets the BufferedImage
        PdfImageObject image = new PdfImageObject(stream);
        BufferedImage bi = image.getBufferedImage();
        if (bi == null) continue;
        // C: Creates a new BufferedImage
        int width = (int)(bi.getWidth() * FACTOR);
        int height = (int)(bi.getHeight() * FACTOR);
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        AffineTransform at = AffineTransform.getScaleInstance(FACTOR, FACTOR);
        Graphics2D g = img.createGraphics();
        g.drawRenderedImage(bi, at);
        // D: Writes JPG bytes
        ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
        ImageIO.write(img, "JPG", imgBytes);
        // E: Replaces the content of the image stream
        stream.clear();
        stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
        stream.put(PdfName.TYPE, PdfName.XOBJECT);
        stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
        stream.put(key, value);
        stream.put(PdfName.FILTER, PdfName.DCTDECODE);
        stream.put(PdfName.WIDTH, new PdfNumber(width));
        stream.put(PdfName.HEIGHT, new PdfNumber(height));
        stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
        stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    }
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(RESULT));
stamper.close();
Once we’ve found the stream we need, we create a PdfImageObject, which gives us a java.awt.image.BufferedImage named bi. We then create a second BufferedImage named img that is smaller by a given factor. In this example, the value of FACTOR is 0.5. We draw the image bi to the Graphics2D object of img, using an affine transformation that scales the image down by a factor of FACTOR.
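The scaling step can be tried in isolation, without iText. The following is a minimal sketch, assuming a synthetic 200 x 100 image in place of the one extracted from the PDF. The class name ScaleSketch and the method scale are made up for illustration, and the Math.max guard and the dispose() call are additions that are not in Listing 1.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

class ScaleSketch {

    // Draws 'source' into a new RGB image whose sides are scaled by 'factor'.
    static BufferedImage scale(BufferedImage source, double factor) {
        // Guard (not in Listing 1): BufferedImage rejects a width or height of 0.
        int width = Math.max(1, (int) (source.getWidth() * factor));
        int height = Math.max(1, (int) (source.getHeight() * factor));
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            // The same affine transformation as in the listing: scale x and y by 'factor'.
            g.drawRenderedImage(source, AffineTransform.getScaleInstance(factor, factor));
        } finally {
            g.dispose(); // release the graphics context once drawing is done
        }
        return target;
    }

    public static void main(String[] args) {
        // Stand-in for the BufferedImage obtained from PdfImageObject.
        BufferedImage bi = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = bi.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 200, 100);
        g.dispose();

        BufferedImage img = scale(bi, 0.5);
        System.out.println(img.getWidth() + " x " + img.getHeight()); // prints "100 x 50"
    }
}
```

The width and height computed here are the same values that later go into the Width and Height entries of the image dictionary, so the cast to int has to match the size of the image that is actually encoded.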
We write the image as a JPEG to a ByteArrayOutputStream. We use the bytes from this OutputStream as the new data for the stream object we’ve retrieved from PdfReader. We reset all the entries in the image dictionary and we add all the keys that are necessary for a PDF viewer to interpret the image bytes correctly. After changing the PRStream object in the reader, we use PdfStamper to write the altered file to a FileOutputStream. Again we get a look at the way iText works internally. When we add a JPEG to a document the normal way, iText selects all the entries for the image dictionary for us.
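The JPEG step also works on its own with plain JDK classes. The following is a minimal sketch, assuming a blank synthetic image; JpegBytesSketch, toJpeg, startsWithJpegMarker and readSize are hypothetical names, and turning off the ImageIO disk cache is a choice made for this sketch only. It encodes the image to a ByteArrayOutputStream and then decodes the bytes again to show where the values for the Width and Height entries come from.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class JpegBytesSketch {

    // Encodes the image as JPEG and returns the raw bytes.
    static byte[] toJpeg(BufferedImage img) {
        ImageIO.setUseCache(false); // keep intermediate data in memory, not in temp files
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no writer accepts the image.
            if (!ImageIO.write(img, "jpg", out)) {
                throw new IllegalStateException("No JPEG writer accepted this image type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // A JPEG file starts with the two marker bytes 0xFF 0xD8.
    static boolean startsWithJpegMarker(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8;
    }

    // Decodes the bytes again and returns { width, height }.
    static int[] readSize(byte[] jpeg) {
        try {
            BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(jpeg));
            return new int[] { decoded.getWidth(), decoded.getHeight() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        byte[] bytes = toJpeg(img);
        int[] size = readSize(bytes);
        System.out.println(startsWithJpegMarker(bytes));
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

Because the bytes are already JPEG-compressed, Listing 1 passes them to the stream without extra compression and names DCTDecode as the filter; the viewer then needs the Width, Height, BitsPerComponent and ColorSpace entries to agree with what was encoded.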
Working at the lowest level is fun and gives you a lot of power but you really have to know what you’re doing; otherwise, you can seriously damage a PDF file. Because of the high complexity, some requirements are close to impossible. For instance, it’s very hard to replace a font.
Summary
We discussed resizing images in a PDF. We used the method getXrefSize() to get the highest object number in the PDF document. Then, we created a PdfImageObject that produces a java.awt.image.BufferedImage named bi. Next, we created a second, smaller BufferedImage named img. We drew the image bi to the Graphics2D object of img, using an affine transformation that scales the image down by a factor of FACTOR. We wrote the image as a JPEG to a ByteArrayOutputStream. We used the bytes from this OutputStream as the new data for the stream object we had retrieved from PdfReader. We reset all the entries in the image dictionary and added all the keys that a PDF viewer needs to interpret the image bytes correctly. After changing the PRStream object in the reader, we used PdfStamper to write the altered file to a FileOutputStream.