PdfDocumentProcessor.GetText(PdfDocumentArea, PdfTextExtractionOptions) Method

Retrieves document content from the specified area using the specified extraction options.

Namespace: DevExpress.Pdf

Assembly: DevExpress.Docs.v25.2.dll

NuGet Package: DevExpress.Document.Processor

Declaration

```csharp
public string GetText(
    PdfDocumentArea area,
    PdfTextExtractionOptions options
)
```

```vb
Public Function GetText(
    area As PdfDocumentArea,
    options As PdfTextExtractionOptions
) As String
```

Parameters

| Name | Type | Description |
|---|---|---|
| area | PdfDocumentArea | The document area from which the content should be extracted. |
| options | PdfTextExtractionOptions | A PdfTextExtractionOptions object that contains extraction options. |

Returns

| Type | Description |
|---|---|
| String | The text obtained from the specified area. |

Remarks

The GetText method uses the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.

Set the PdfTextExtractionOptions.ClipToCropBox property to false to extract content without clipping to the crop box.

The code sample below retrieves document content from the specified area:

```csharp
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");
    PdfPage page = processor.Document.Pages[0];

    // Rectangle arguments: left, bottom, right, top (page coordinates).
    PdfRectangle pdfRectangle = new PdfRectangle(page.CropBox.Left / 3, page.CropBox.Bottom, page.CropBox.Right / 3, page.CropBox.Top);
    // The area targets page number 1, which is Pages[0] above.
    PdfDocumentArea pageArea = new PdfDocumentArea(1, pdfRectangle);

    // Extract text from the area without clipping to the crop box.
    string pageText =
        processor.GetText(pageArea, new PdfTextExtractionOptions { ClipToCropBox = false });
    Console.WriteLine(pageText);
}
```

```vb
Imports System
Imports DevExpress.Pdf

Using processor As New PdfDocumentProcessor()
  processor.LoadDocument("TextExtraction.pdf")
  Dim page As PdfPage = processor.Document.Pages(0)

  ' Rectangle arguments: left, bottom, right, top (page coordinates).
  Dim pdfRectangle As New PdfRectangle(page.CropBox.Left / 3, page.CropBox.Bottom, page.CropBox.Right / 3, page.CropBox.Top)
  ' The area targets page number 1, which is Pages(0) above.
  Dim pageArea As New PdfDocumentArea(1, pdfRectangle)

  ' Extract text from the area without clipping to the crop box.
  Dim pageText As String =
    processor.GetText(pageArea, New PdfTextExtractionOptions With {.ClipToCropBox = False})
  Console.WriteLine(pageText)
End Using
```
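
The rectangle in the sample above divides CropBox.Left and CropBox.Right by 3, which matches the left third of the crop box only when the crop box starts at x = 0. As a minimal sketch, assuming the goal is the left third of an arbitrary box, the bounds can be computed from the box width instead. AreaMath and LeftThird are hypothetical names introduced for illustration and are not part of the DevExpress API; the helper works on plain numbers in the left, bottom, right, top order that the sample passes to PdfRectangle.

```csharp
using System;

// Hypothetical helper (not part of the DevExpress API): computes the bounds
// of the left third of a box given in page coordinates.
public static class AreaMath
{
    // Returns (left, bottom, right, top) for the left third of the given box.
    public static (double Left, double Bottom, double Right, double Top) LeftThird(
        double left, double bottom, double right, double top)
    {
        if (right < left || top < bottom)
            throw new ArgumentException("Expected left <= right and bottom <= top.");

        double width = right - left;
        // Offset by 'left' so the result stays inside a box that does not start at x = 0.
        return (left, bottom, left + width / 3.0, top);
    }
}
```

The four returned values can then be passed to the PdfRectangle constructor in the same order, for example with page.CropBox.Left, page.CropBox.Bottom, page.CropBox.Right, and page.CropBox.Top as inputs.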

See Also

PdfDocumentProcessor Class

PdfDocumentProcessor Members

DevExpress.Pdf Namespace