
Section A - DDX User Guide > Querying Documents > Getting the text of a document

Getting the text of a document
You can use the DocumentText DDX element to return an XML file containing the words in one or more PDF documents. Specify the documents as child elements of the DocumentText element; the children can be one or more PDF source or PDFGroup elements.
In this example, the words from doc1 are listed in the XML stream words.xml.
Example: Getting the text of a document
<DDX>
	<PDF result="doc1">
		<PDF source="doc2"/>
	</PDF>
	<DocumentText result="words.xml">
		<PDF source="doc1"/>
	</DocumentText>
</DDX>
The XML stream conforms to a schema specified in doctext.xsd. Its namespace is
	http://ns.adobe.com/DDX/DocText/1.0/
When more than one source document is specified, the pages are aggregated and the text is returned as if it came from a single document. In this example, words.xml contains the words from a subset of pages from two documents.
Example: Getting the words from pages in two documents
<DDX>
	<DocumentText result="words.xml">
		<PDF source="doc1" pages="1-10"/>
		<PDF source="doc2" pages="3-5"/>
	</DocumentText>
</DDX>
The result document looks like this:
<DocText xmlns="http://ns.adobe.com/DDX/DocText/1.0/">
	<TextPerPage>
		<Page pageNumber="1">
			It a re, uterest abuspiostam, C. Axim il hortam intiam tervisq uemorum ommodii fecte in sedii consulvid autea vehebem orurnum is.
		</Page>
		<Page pageNumber="2">
			Sample Text Sample Text Sample Text Sample Text Sample Text Sample Text
		</Page>
	</TextPerPage>
</DocText>
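A result in this shape can be read with any namespace-aware XML parser. The following is a minimal sketch in Python, not part of the DDX documentation: the function name page_texts and the SAMPLE string are hypothetical, and the namespace URI is copied from the DocText root element in the result shown above. If your result file declares a different URI, adjust NS to match.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the DocText root element of the sample result
# (an assumption: change it if your result file declares another URI).
NS = {"dt": "http://ns.adobe.com/DDX/DocText/1.0/"}


def page_texts(xml_string):
    """Return {page number: text} for each Page element in a DocText result.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_string)
    pages = {}
    for page in root.findall("dt:TextPerPage/dt:Page", NS):
        number = int(page.get("pageNumber"))
        # Collapse the indentation and line breaks that surround the text.
        pages[number] = " ".join((page.text or "").split())
    return pages


# Shortened stand-in for a words.xml result, used only to exercise the helper.
SAMPLE = """<DocText xmlns="http://ns.adobe.com/DDX/DocText/1.0/">
	<TextPerPage>
		<Page pageNumber="1">
			First page words
		</Page>
		<Page pageNumber="2">
			Sample Text Sample Text
		</Page>
	</TextPerPage>
</DocText>"""

if __name__ == "__main__":
    for number, text in page_texts(SAMPLE).items():
        print(number, text)
```

Because the pages of all source documents are aggregated, the pageNumber values read this way refer to positions in the combined result, not to page numbers in the individual source files.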


Document Description XML (DDX) Help
LiveCycle ES Update 1

Comments


Hodmi said on Jun 4, 2009 at 8:28 AM :
The first sample is a bit off. The DocumentText tag is closed with a </Text>

 


Current page: http://livedocs.adobe.com/livecycle/8.2/ddxRef/000711.html