Tuesday, 9 August 2011

Reading a PDF file in Java

As developers, we are usually asked to create PDF files in Java, and for that I recommend the iText library.
Today, instead, we want to read data from a PDF file that has copy protection, which prevents you from copying and pasting any of its content.
I tried Apache PDFBox and, with a few lines of code, was able to read the content of the PDF file.



package it.pdfbox.test;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class ReadPDF {
    /**
     * Libraries to use: pdfbox-1.4.0.jar, commons-logging.jar, fontbox-1.4.0.jar.
     * Also add bcprov-jdk6-146.jar if the PDF file is protected.
     *
     * @param args command-line arguments (not used)
     * @throws IOException if the PDF file cannot be read or parsed
     */
    public static void main(String[] args) throws IOException {
        // Create the text stripper, passing UTF-8 as the encoding
        PDFTextStripper stripper = new PDFTextStripper("UTF8");
        PDDocument doc = PDDocument.load(new File("myProtected.pdf"));
        try {
            // Extract the text of the whole document
            String text = stripper.getText(doc);
            System.out.println("---Print the content of pdf--");
            System.out.println(text);
        } finally {
            // Always close the document to release its resources
            doc.close();
        }
    }
}
