java.lang.Object
org.pdfbox.util.PDFStreamEngine
org.pdfbox.util.PDFTextStripper
org.pdfbox.util.PDFText2HTML
Wrap stripped text in simple HTML, trying to form HTML paragraphs. Paragraphs broken by pages, columns, or figures are not mended.
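The paragraph behaviour described above can be illustrated with a small standalone sketch. It does not use PDFBox. `ParagraphSketch` and `toHtml` are hypothetical names, and the sketch assumes the extracted text arrives as a list of lines per page, with a blank line separating paragraphs. Because each page is closed on its own, a paragraph that continues across a page break comes out as two `<p>` elements. This is the unmended case that the class description mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not PDFBox code: groups extracted lines into <p> elements.
final class ParagraphSketch {

    // Each inner list holds the text lines of one page; a blank line ends a paragraph.
    static String toHtml(List<List<String>> pages) {
        StringBuilder html = new StringBuilder();
        for (List<String> page : pages) {
            List<String> current = new ArrayList<>();
            for (String line : page) {
                if (line.trim().isEmpty()) {
                    flush(current, html);   // a blank line closes the open paragraph
                } else {
                    current.add(line.trim());
                }
            }
            // The end of a page always closes the paragraph, so text that
            // continues on the next page is not mended into the same <p>.
            flush(current, html);
        }
        return html.toString();
    }

    // Emits the collected lines as one paragraph and empties the buffer.
    private static void flush(List<String> lines, StringBuilder html) {
        if (lines.isEmpty()) {
            return;
        }
        html.append("<p>").append(String.join(" ", lines)).append("</p>\n");
        lines.clear();
    }
}
```

With a first page containing "First para", "continues", a blank line, and "Second starts", and a second page containing "and ends here", the sketch produces three paragraphs. The last sentence is split across two of them because it spans the page break.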
| Field Summary |
| Fields inherited from class org.pdfbox.util.PDFTextStripper |
charactersByArticle, output |
| Constructor Summary | |
| PDFText2HTML() | Constructor. |
| Method Summary | |
| void | endDocument(PDDocument pdf) | This method is available for subclasses of this class. It is called after processing of the document finishes. |
| protected void | endParagraph() | Write out the separator that ends a paragraph. |
| protected void | flushText() | Prints the text to the output stream. |
| protected String | getTitleGuess() | Returns the guessed title of the document. |
| protected TextPosition | guessTitle(Iterator textIter) | Attempts to guess the title of the document. |
| boolean | isSuppressParagraphs() | Returns the suppressParagraphs property. |
| void | setSuppressParagraphs(boolean shouldSuppressParagraphs) | Sets the suppressParagraphs property. |
| protected void | startParagraph() | Write out the separator that starts a paragraph. |
| protected void | writeCharacters(TextPosition position) | Write the string to the output stream. |
| protected void | writeHeader() | Write the header to the output document. |
| Methods inherited from class org.pdfbox.util.PDFStreamEngine |
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public PDFText2HTML() throws IOException
Constructor.
Throws: IOException - If there is an error during initialization.

| Method Detail |
protected void writeHeader() throws IOException
Write the header to the output document.
Throws: IOException - If there is a problem writing out the header to the document.

protected String getTitleGuess()
Returns the guessed title of the document.
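This page does not specify the exact markup that `writeHeader` produces. The following is a minimal sketch that assumes the header is an `<html><head><title>` block, filled from the title guess that `getTitleGuess` returns, followed by an opening `<body>`. `HeaderSketch` and its `writeHeader(Writer, String)` signature are hypothetical. The title is HTML-escaped because a guessed title is arbitrary text taken from the PDF.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch of an HTML header writer; not the PDFBox implementation.
final class HeaderSketch {

    // Writes the opening of an HTML document, using the guessed title (may be null).
    static void writeHeader(Writer out, String titleGuess) throws IOException {
        out.write("<html><head><title>");
        out.write(escape(titleGuess == null ? "" : titleGuess));
        out.write("</title></head>\n<body>\n");
    }

    // Replaces the characters that are significant in HTML text content.
    static String escape(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': b.append("&amp;"); break;
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                default:  b.append(c);
            }
        }
        return b.toString();
    }
}
```

For example, with a title guess of `Q&A <draft>` the sketch writes the title element as `<title>Q&amp;A &lt;draft&gt;</title>`.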

protected void flushText() throws IOException
Prints the text to the output stream.
Overrides: flushText in class PDFTextStripper
Throws: IOException - If there is an error writing the text.
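The `Overrides` entries show that `PDFText2HTML` customizes its output by overriding hooks that `PDFTextStripper` calls while flushing text. The standalone sketch below mimics that template-method arrangement under assumed names. `TextFlusher`, `HtmlFlusher`, and the model of a paragraph as a list of string fragments are all hypothetical. The real `flushText` works on `TextPosition` objects, not strings.

```java
import java.util.List;

// Hypothetical base class: flushText drives the output through overridable hooks.
class TextFlusher {
    protected final StringBuilder out = new StringBuilder();

    // Each inner list is one paragraph made of text fragments.
    void flushText(List<List<String>> paragraphs) {
        for (List<String> paragraph : paragraphs) {
            startParagraph();
            for (String fragment : paragraph) {
                writeCharacters(fragment);
            }
            endParagraph();
        }
    }

    protected void startParagraph() { }                    // plain text: no opening marker
    protected void endParagraph() { out.append('\n'); }    // plain text: newline separator

    // HTML escaping is omitted here to keep the sketch short.
    protected void writeCharacters(String text) { out.append(text); }

    String result() { return out.toString(); }
}

// Hypothetical HTML subclass: only the paragraph hooks change.
class HtmlFlusher extends TextFlusher {
    @Override protected void startParagraph() { out.append("<p>"); }
    @Override protected void endParagraph() { out.append("</p>\n"); }
}
```

The traversal logic lives once in the base class. The HTML variant differs only in what the hooks emit, which matches the set of methods this class overrides.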
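
public void endDocument(PDDocument pdf) throws IOException
This method is available for subclasses of this class. It is called after processing of the document finishes.
Overrides: endDocument in class PDFTextStripper
Parameters: pdf - The PDF document that is being processed.
Throws: IOException - If an I/O error occurs.

protected TextPosition guessTitle(Iterator textIter)
Attempts to guess the title of the document.
Parameters: textIter - The characters on the first page.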
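This page does not describe the heuristic that `guessTitle` uses. One plausible approach treats the first run of consecutive first-page characters set in the largest font size as the title. The sketch below is a hypothetical illustration of that approach, not PDFBox's algorithm. `Glyph` stands in for `TextPosition`, and its `text` and `fontSize` fields are assumptions. For simplicity, the sketch returns a `String`, whereas `guessTitle` returns a `TextPosition`.

```java
import java.util.Iterator;

// Hypothetical title heuristic: not the PDFBox algorithm.
final class TitleGuessSketch {

    // Stand-in for TextPosition: a piece of text and its font size.
    static final class Glyph {
        final String text;
        final float fontSize;
        Glyph(String text, float fontSize) { this.text = text; this.fontSize = fontSize; }
    }

    // Returns the first contiguous run of glyphs set in the largest font size.
    static String guessTitle(Iterator<Glyph> textIter) {
        float maxSize = -1f;
        StringBuilder best = new StringBuilder();
        boolean inBestRun = false;
        while (textIter.hasNext()) {
            Glyph g = textIter.next();
            if (g.fontSize > maxSize) {
                maxSize = g.fontSize;      // new largest size: start a new candidate run
                best.setLength(0);
                best.append(g.text);
                inBestRun = true;
            } else if (g.fontSize == maxSize && inBestRun) {
                best.append(g.text);       // extend the current largest-size run
            } else {
                inBestRun = false;         // run broken; later text of equal size is ignored
            }
        }
        return best.toString().trim();
    }
}
```

Preferring the first largest run over later text of the same size keeps a large-font header or footer further down the page from being appended to the title.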

protected void startParagraph() throws IOException
Write out the separator that starts a paragraph.
Overrides: startParagraph in class PDFTextStripper
Throws: IOException - If there is an error writing to the stream.

protected void endParagraph() throws IOException
Write out the separator that ends a paragraph.
Overrides: endParagraph in class PDFTextStripper
Throws: IOException - If there is an error writing to the stream.

protected void writeCharacters(TextPosition position) throws IOException
Write the string to the output stream.
Overrides: writeCharacters in class PDFTextStripper
Parameters: position - The text to write to the stream.
Throws: IOException - If there is an error when writing the text.

public boolean isSuppressParagraphs()
Returns the suppressParagraphs property.

public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
Sets the suppressParagraphs property.
Parameters: shouldSuppressParagraphs - The suppressParagraphs value to set.
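The documentation does not say what `suppressParagraphs` changes. Going only by its name and by the paragraph methods documented above, one reasonable reading is that it stops paragraph markup from being emitted. The sketch below models that assumption with a hypothetical `ParagraphMarkup` class. It is not PDFBox code, and the plain line break used while suppression is on is also an assumption.

```java
// Hypothetical model of the suppressParagraphs property; the behaviour is assumed.
final class ParagraphMarkup {
    private final StringBuilder out = new StringBuilder();
    private boolean suppressParagraphs;

    boolean isSuppressParagraphs() { return suppressParagraphs; }

    void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    // Opens a paragraph element unless paragraph markup is suppressed.
    void startParagraph() {
        if (!suppressParagraphs) {
            out.append("<p>");
        }
    }

    // Closes the paragraph; with suppression on, only a line break separates paragraphs.
    void endParagraph() {
        out.append(suppressParagraphs ? "\n" : "</p>\n");
    }

    void writeCharacters(String text) { out.append(text); }

    String result() { return out.toString(); }
}
```

By default the sketch wraps text as `<p>a</p>`. After `setSuppressParagraphs(true)`, the same calls produce just `a` followed by a newline.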