
PdfDocumentProcessor.GetText(PdfDocumentPosition, PdfDocumentPosition, PdfTextExtractionOptions) Method

Retrieves the text located between the specified document positions, using the specified extraction options.

Namespace: DevExpress.Pdf

Assembly: DevExpress.Docs.v25.2.dll

NuGet Package: DevExpress.Document.Processor

Declaration

csharp
public string GetText(
    PdfDocumentPosition startPosition,
    PdfDocumentPosition endPosition,
    PdfTextExtractionOptions options
)
vb
Public Function GetText(
    startPosition As PdfDocumentPosition,
    endPosition As PdfDocumentPosition,
    options As PdfTextExtractionOptions
) As String

Parameters

| Name | Type | Description |
|---|---|---|
| startPosition | PdfDocumentPosition | The area’s start position. |
| endPosition | PdfDocumentPosition | The area’s end position. |
| options | PdfTextExtractionOptions | A PdfTextExtractionOptions object that contains extraction options. |

Returns

| Type | Description |
|---|---|
| String | The text obtained from the specified area. |

Remarks

The GetText method uses the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.
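Positions passed to GetText are usually measured in page coordinates, while locations you read off a printed page or an image are often measured from the top-left corner. The helper below is a sketch, assuming the page coordinate system places the origin at the bottom-left corner of the page, with X increasing to the right, Y increasing upward, and units of points (1/72 inch), as in standard PDF user space. PageCoordinateHelper and FromTopLeftInches are hypothetical names, not part of the DevExpress API.

```csharp
using System;

// Hypothetical helper, not part of the DevExpress API.
// Assumes page coordinates: origin at the bottom-left corner, X to the right,
// Y upward, units of points (1/72 inch).
public static class PageCoordinateHelper
{
    public const double PointsPerInch = 72.0;

    // xInches / yInchesFromTop: location measured from the top-left corner, in inches.
    // pageHeightPoints: page height in points (for example, 792 for US Letter).
    public static (double X, double Y) FromTopLeftInches(
        double xInches, double yInchesFromTop, double pageHeightPoints)
    {
        double x = xInches * PointsPerInch;
        // Flip the Y axis: a distance from the top becomes a distance from the bottom.
        double y = pageHeightPoints - yInchesFromTop * PointsPerInch;
        return (x, y);
    }
}
```

For a US Letter page (792 points tall), FromTopLeftInches(1, 1, 792) yields (72, 720), which could then be passed as new PdfPoint(72, 720) to a PdfDocumentPosition. Confirm the origin convention against the Coordinate Systems topic before relying on this conversion.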

This overload selects text the same way as the text-selection cursor in Adobe Acrobat Reader. To get text from a rectangular area, use the GetText method overloads that take a PdfDocumentArea area parameter.
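To illustrate why the two selection styles can return different text, the following is a simplified, hypothetical model, not DevExpress's implementation. Cursor-style selection typically takes every word between the start and end points in reading order, so the rest of the start line and the beginning of the end line are included, while a rectangle keeps only the words inside its bounds. The Word record, the SelectionModel class, and the use of line indexes instead of page coordinates are illustrative assumptions; the actual PdfDocumentArea overloads may use a different inclusion rule (for example, partial overlap).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model: each word has a line index (0 = first line in reading
// order) and a horizontal extent [Left, Right]. Words are listed in reading order.
public record Word(string Text, int Line, double Left, double Right);

public static class SelectionModel
{
    // Cursor-style (stream) selection: every word from the start point to the
    // end point in reading order.
    public static string Stream(IEnumerable<Word> words,
        int startLine, double startX, int endLine, double endX) =>
        string.Join(" ", words
            .Where(w => (w.Line > startLine || (w.Line == startLine && w.Left >= startX))
                     && (w.Line < endLine || (w.Line == endLine && w.Right <= endX)))
            .Select(w => w.Text));

    // Rectangle selection: only words that lie entirely inside [left, right]
    // on the lines the rectangle spans.
    public static string Rectangle(IEnumerable<Word> words,
        int firstLine, int lastLine, double left, double right) =>
        string.Join(" ", words
            .Where(w => w.Line >= firstLine && w.Line <= lastLine
                     && w.Left >= left && w.Right <= right)
            .Select(w => w.Text));
}
```

With two lines, "alpha beta gamma" and "delta eps zeta", selecting from just before "beta" to just after "eps" returns "beta gamma delta eps" in the stream model, but only "beta eps" in the rectangle model.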

If there is no text between the specified positions, this method returns text that is nearest to these positions.

The following code snippet retrieves the content located between two positions on the first page:

csharp
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");

    // Page number 1 is the first page; points use the page coordinate system.
    PdfDocumentPosition startPosition = new PdfDocumentPosition(1, new PdfPoint(0, 0));
    PdfDocumentPosition endPosition = new PdfDocumentPosition(1, new PdfPoint(500, 500));

    // ClipToCropBox = false: do not restrict extraction to the page's crop box.
    PdfTextExtractionOptions options = new PdfTextExtractionOptions { ClipToCropBox = false };
    string pageText = processor.GetText(startPosition, endPosition, options);
    Console.WriteLine(pageText);
}
vb
Imports System
Imports DevExpress.Pdf

Using processor As New PdfDocumentProcessor()
    processor.LoadDocument("TextExtraction.pdf")

    ' Page number 1 is the first page; points use the page coordinate system.
    Dim startPosition As New PdfDocumentPosition(1, New PdfPoint(0, 0))
    Dim endPosition As New PdfDocumentPosition(1, New PdfPoint(500, 500))

    ' ClipToCropBox = False: do not restrict extraction to the page's crop box.
    Dim options As New PdfTextExtractionOptions With {.ClipToCropBox = False}
    Dim pageText As String = processor.GetText(startPosition, endPosition, options)
    Console.WriteLine(pageText)
End Using

See Also

PdfDocumentProcessor Class

PdfDocumentProcessor Members

DevExpress.Pdf Namespace