java.lang.Object
org.pdfbox.searchengine.lucene.LucenePDFDocument
This class creates a document for the Lucene search engine. It should plug easily into the IndexHTML or IndexFiles classes that come with the Lucene project. The class populates the following fields.
| Lucene Field Name | Description |
|---|---|
| path | File system path if loaded from a file |
| url | URL to PDF document |
| contents | Entire contents of PDF document, indexed but not stored |
| summary | First 500 characters of content |
| modified | The modified date/time according to the url or path |
| uid | A unique identifier for the Lucene document. |
| CreationDate | From PDF meta-data if available |
| Creator | From PDF meta-data if available |
| Keywords | From PDF meta-data if available |
| ModificationDate | From PDF meta-data if available |
| Producer | From PDF meta-data if available |
| Subject | From PDF meta-data if available |
| Trapped | From PDF meta-data if available |
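The `summary` field is described above as the first 500 characters of the content. The sketch below shows one way such a truncation could be written in plain Java. It is an illustration only: the class name `SummarySketch`, the method `summarize`, and the handling of `null` input are assumptions, not taken from the PDFBox source.

```java
// Illustrative stand-in only; this is not the LucenePDFDocument implementation.
final class SummarySketch {
    // Limit taken from the "First 500 characters of content" row in the field table.
    static final int SUMMARY_LENGTH = 500;

    /** Returns at most SUMMARY_LENGTH leading characters of the extracted text. */
    static String summarize(String contents) {
        if (contents == null) {
            // Assumption: treat missing content as an empty summary.
            return "";
        }
        int end = Math.min(contents.length(), SUMMARY_LENGTH);
        return contents.substring(0, end);
    }

    public static void main(String[] args) {
        // Build a 1200-character string to show the truncation.
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 1200; i++) {
            longText.append('x');
        }
        System.out.println(summarize(longText.toString()).length()); // prints 500
        System.out.println(summarize("short text"));                 // prints short text
    }
}
```

Text shorter than the limit is returned unchanged, so short documents have a `summary` equal to their full extracted text under this assumption.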
| Constructor Summary | |
LucenePDFDocument()
Constructor. |
|
| Method Summary | |
Document |
convertDocument(File file)
This will take a reference to a PDF document and create a Lucene document. |
Document |
convertDocument(InputStream is)
Convert the PDF stream to a Lucene document. |
Document |
convertDocument(URL url)
Convert the document from a PDF to a Lucene document. |
DateTools.Resolution |
getDateTimeResolution()
Get the Lucene date/time resolution. |
static Document |
getDocument(File file)
This will get a Lucene document from a PDF file. |
static Document |
getDocument(InputStream is)
This will get a Lucene document from a PDF input stream. |
static Document |
getDocument(URL url)
This will get a Lucene document from the PDF at a URL. |
static void |
main(String[] args)
This will test creating a document. |
void |
setDateTimeResolution(DateTools.Resolution resolution)
Set the Lucene date/time resolution. |
void |
setTextStripper(PDFTextStripper aStripper)
Set the text stripper that will be used during extraction. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public LucenePDFDocument()
| Method Detail |
public void setTextStripper(PDFTextStripper aStripper)
aStripper - The new PDF text stripper.
public DateTools.Resolution getDateTimeResolution()
public void setDateTimeResolution(DateTools.Resolution resolution)
resolution - The new date/time resolution.
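To illustrate what a date/time resolution means for an indexed date, the sketch below truncates a timestamp to a chosen granularity and formats it as a sortable UTC string, in the style of Lucene's `DateTools` output (for example `yyyyMMdd` for day resolution). It uses only `java.time`, not Lucene itself. The names `DateResolutionSketch`, `Resolution`, and `toIndexString` are hypothetical, and whether `LucenePDFDocument` applies the resolution to the `modified` field in exactly this way is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative stand-in only; this is neither Lucene's DateTools nor PDFBox code.
final class DateResolutionSketch {
    // Hypothetical enum mirroring the idea of a resolution: coarser values keep fewer digits.
    enum Resolution {
        YEAR("yyyy"),
        MONTH("yyyyMM"),
        DAY("yyyyMMdd"),
        HOUR("yyyyMMddHH"),
        MINUTE("yyyyMMddHHmm"),
        SECOND("yyyyMMddHHmmss"),
        MILLISECOND("yyyyMMddHHmmssSSS");

        final DateTimeFormatter formatter;

        Resolution(String pattern) {
            // Always format in UTC so the strings sort consistently.
            this.formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    /** Formats a timestamp as a sortable string, dropping detail below the resolution. */
    static String toIndexString(long epochMillis, Resolution resolution) {
        return resolution.formatter.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long oneDayOneHour = 86_400_000L + 3_600_000L; // 1970-01-02T01:00:00Z
        System.out.println(toIndexString(oneDayOneHour, Resolution.DAY));  // prints 19700102
        System.out.println(toIndexString(oneDayOneHour, Resolution.HOUR)); // prints 1970010201
    }
}
```

A coarser resolution produces fewer distinct terms in the index, at the cost of less precise date queries.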
public Document convertDocument(InputStream is)
throws IOException
is - The input stream.
IOException - If there is an error converting the PDF.
public Document convertDocument(File file)
throws IOException
file - A reference to a PDF document.
IOException - If there is an exception while converting the document.
public Document convertDocument(URL url)
throws IOException
url - A URL to a PDF document.
IOException - If there is an error while converting the document.
public static Document getDocument(InputStream is)
throws IOException
is - The stream to read the PDF from.
IOException - If there is an error parsing or indexing the document.
public static Document getDocument(File file)
throws IOException
file - The file to get the document for.
IOException - If there is an error parsing or indexing the document.
public static Document getDocument(URL url)
throws IOException
url - The URL of the PDF to get the document for.
IOException - If there is an error parsing or indexing the document.
public static void main(String[] args)
throws IOException
args - command line arguments.
IOException - If there is an error.