Getting the text of a document
You can use the DocumentText DDX element to return an XML file containing the words in one or more PDF documents. Specify the documents as child elements of the DocumentText element, using one or more PDF source or PDFGroup elements.
In this example, doc1 is assembled from doc2, and the words from doc1 are listed in the XML stream words.xml.
Example: Getting the text of a document
<DDX>
	<PDF result="doc1">
		<PDF source="doc2"/>
	</PDF>
	<DocumentText result="words.xml">
		<PDF source="doc1"/>
	</DocumentText>
</DDX>
The XML stream conforms to a schema specified in doctext.xsd. Its namespace is
	http://ns.adobe.com/DDX/DocText/1.0
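As a rough illustration of how a client might confirm that a returned stream is a DocText document, the following Python sketch reads the namespace of the root element. The function names are hypothetical and not part of any Adobe API. Because the sample result later on this page writes the namespace with a trailing slash, the comparison ignores that difference; this is an assumption about how lenient a client should be, not a statement about the schema.

```python
import xml.etree.ElementTree as ET

# Namespace as printed above. The sample result on this page shows it with a
# trailing slash, so the comparison below tolerates both spellings (assumption).
DOCTEXT_NS = "http://ns.adobe.com/DDX/DocText/1.0"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or "" if it has none."""
    tag = ET.fromstring(xml_text).tag  # e.g. "{uri}DocText"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def is_doctext(xml_text):
    """True if the root element is in the DocText namespace."""
    return root_namespace(xml_text).rstrip("/") == DOCTEXT_NS.rstrip("/")
```

A stricter client would validate the stream against doctext.xsd instead of only inspecting the root element.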
When more than one source document is specified, the pages are aggregated and the text is returned as if they formed a single document. In this example, words.xml contains the words from pages 1-10 of doc1 and pages 3-5 of doc2.
Example: Getting the words from pages in two documents
<DDX>
	<DocumentText result="words.xml">
		<PDF source="doc1" pages="1-10"/>
		<PDF source="doc2" pages="3-5"/>
	</DocumentText>
</DDX>
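If the list of source documents is only known at run time, a DDX document like the one above can be generated instead of written by hand. The following Python sketch uses only the standard library; the function name build_document_text_ddx and its parameters are hypothetical. It mirrors the example above, which carries no xmlns declaration on the DDX element, so add one if your environment requires it.

```python
import xml.etree.ElementTree as ET


def build_document_text_ddx(result, sources):
    """Build a DDX string containing a single DocumentText element.

    sources is a list of (source_name, pages) pairs; pass None as pages
    to omit the pages attribute for that source.
    """
    ddx = ET.Element("DDX")
    doc_text = ET.SubElement(ddx, "DocumentText", {"result": result})
    for source, pages in sources:
        attrs = {"source": source}
        if pages is not None:
            attrs["pages"] = pages
        ET.SubElement(doc_text, "PDF", attrs)
    return ET.tostring(ddx, encoding="unicode")


# Same sources and page ranges as the example above.
ddx_text = build_document_text_ddx(
    "words.xml", [("doc1", "1-10"), ("doc2", "3-5")]
)
print(ddx_text)
```

The sketch only produces the DDX text. Submitting it to the service, together with the named input documents, is outside its scope.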
The result document has the following form:
<DocText xmlns="http://ns.adobe.com/DDX/DocText/1.0/">
	<TextPerPage>
		<Page pageNumber="1">
			It a re, uterest abuspiostam, C. Axim il hortam intiam tervisq uemorum ommodii fecte in sedii consulvid autea vehebem orurnum is.
		</Page>
		</Page>
		<Page pageNumber="2">
			Sample Text Sample Text Sample Text Sample Text Sample Text Sample Text
		</Page>
	</TextPerPage>
</DocText>
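A client can read a result of this form with any XML parser. The following Python sketch collects the text of each Page element into a dictionary keyed by page number. It assumes that each Page element holds plain text and a numeric pageNumber attribute, as in the sample above; the full schema in doctext.xsd may allow more than that. The function name text_per_page and the shortened sample string are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for a words.xml result (illustrative data only).
SAMPLE = """<DocText xmlns="http://ns.adobe.com/DDX/DocText/1.0/">
    <TextPerPage>
        <Page pageNumber="1">First page text</Page>
        <Page pageNumber="2">
            Sample Text Sample Text
        </Page>
    </TextPerPage>
</DocText>"""


def text_per_page(xml_text):
    """Map each pageNumber to its whitespace-normalized text."""
    root = ET.fromstring(xml_text)
    pages = {}
    # "{*}Page" matches Page in any namespace (requires Python 3.8+), so the
    # lookup does not depend on the exact spelling of the namespace URI.
    for page in root.findall(".//{*}Page"):
        number = int(page.get("pageNumber"))
        pages[number] = " ".join("".join(page.itertext()).split())
    return pages


pages = text_per_page(SAMPLE)
print(pages)
```

Whitespace is collapsed because the indentation inside each Page element is part of its text content.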

 
