PdfDocumentProcessor.GetText(PdfDocumentArea) Method

Retrieves the text found in the specified document area.

Namespace: DevExpress.Pdf

Assembly: DevExpress.Docs.v25.2.dll

NuGet Package: DevExpress.Document.Processor

Declaration

csharp
public string GetText(
    PdfDocumentArea area
)
vb
Public Function GetText(
    area As PdfDocumentArea
) As String

Parameters

| Name | Type | Description |
|---|---|---|
| area | PdfDocumentArea | A PdfDocumentArea object that specifies the page and the rectangle to extract text from. |

Returns

| Type | Description |
|---|---|
| String | The content retrieved from the specified area. |

Remarks

Like the other GetText overloads, this method interprets the area in the page coordinate system. Refer to the following help topic for more details: Coordinate Systems.
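As a rough illustration of working in page coordinates, the sketch below converts a region measured in inches from the top-left corner of the crop box into the (left, bottom, right, top) values that a PdfRectangle expects. It assumes the page coordinate system follows the usual PDF convention (y grows upward, one unit is 1/72 inch); the `PageAreaSketch` class and `FromTopLeftInches` helper are hypothetical names and are not part of the DevExpress API.

```csharp
using System;

static class PageAreaSketch
{
    // Assumption: one page-coordinate unit is 1/72 inch (a PDF point).
    const double PointsPerInch = 72.0;

    // Hypothetical helper: maps a region given in inches from the top-left
    // corner of the crop box to (left, bottom, right, top) page coordinates.
    // Assumes the y axis points upward, so "down" means a smaller y value.
    public static (double Left, double Bottom, double Right, double Top) FromTopLeftInches(
        double cropLeft, double cropTop,
        double xInches, double yInches,
        double widthInches, double heightInches)
    {
        double left = cropLeft + xInches * PointsPerInch;
        double top = cropTop - yInches * PointsPerInch;
        double right = left + widthInches * PointsPerInch;
        double bottom = top - heightInches * PointsPerInch;
        return (left, bottom, right, top);
    }

    static void Main()
    {
        // Example crop box of a US Letter page: (0, 0) to (612, 792) points.
        var r = FromTopLeftInches(0, 792, 1, 1, 3, 2);
        Console.WriteLine($"{r.Left}, {r.Bottom}, {r.Right}, {r.Top}");
        // prints "72, 576, 288, 720"
    }
}
```

The four values can then be passed to the PdfRectangle constructor in the same order (left, bottom, right, top) as in the sample for this method.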

To extract text without clipping the content to the crop box, call a GetText overload that accepts a PdfTextExtractionOptions object and pass the options as a method parameter.

The code sample below retrieves text from a specific part of the document.

csharp
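using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");
    PdfPage page = processor.Document.Pages[0];

    // PdfRectangle takes (left, bottom, right, top) in page coordinates.
    // The rectangle spans the full crop-box height; horizontally it runs from
    // Left / 3 to Right / 3 (the left third of the page when CropBox.Left is 0).
    PdfRectangle pdfRectangle = new PdfRectangle(page.CropBox.Left / 3, page.CropBox.Bottom, page.CropBox.Right / 3, page.CropBox.Top);
    // The page number is 1-based, so 1 refers to Pages[0].
    PdfDocumentArea pageArea = new PdfDocumentArea(1, pdfRectangle);

    string pageText = processor.GetText(pageArea);
    Console.WriteLine(pageText);
}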
vb
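Using processor As New PdfDocumentProcessor()
  processor.LoadDocument("TextExtraction.pdf")
  Dim page As PdfPage = processor.Document.Pages(0)

  ' PdfRectangle takes (left, bottom, right, top) in page coordinates.
  ' The rectangle spans the full crop-box height; horizontally it runs from
  ' Left / 3 to Right / 3 (the left third of the page when CropBox.Left is 0).
  Dim pdfRectangle As New PdfRectangle(page.CropBox.Left / 3, page.CropBox.Bottom, page.CropBox.Right / 3, page.CropBox.Top)
  ' The page number is 1-based, so 1 refers to Pages(0).
  Dim pageArea As New PdfDocumentArea(1, pdfRectangle)

  Dim pageText As String = processor.GetText(pageArea)
  Console.WriteLine(pageText)
End Using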
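
The remark about PdfTextExtractionOptions can be sketched as follows. This is a minimal example, not a definitive implementation: it assumes that your library version exposes the `ClipToCropBox` property on PdfTextExtractionOptions and the `GetText(PdfDocumentArea, PdfTextExtractionOptions)` overload, and it reuses the "TextExtraction.pdf" file name from the sample above.

```csharp
using System;
using DevExpress.Pdf;

using (PdfDocumentProcessor processor = new PdfDocumentProcessor())
{
    processor.LoadDocument("TextExtraction.pdf");
    PdfPage page = processor.Document.Pages[0];

    // Use the media box so the area may extend beyond the crop box.
    PdfRectangle rectangle = page.MediaBox;
    PdfDocumentArea pageArea = new PdfDocumentArea(1, rectangle);

    // Assumption: ClipToCropBox = false disables clipping to the crop box.
    PdfTextExtractionOptions options = new PdfTextExtractionOptions
    {
        ClipToCropBox = false
    };

    // Assumption: this overload accepts the area and the options together.
    string pageText = processor.GetText(pageArea, options);
    Console.WriteLine(pageText);
}
```

With clipping turned off, text that lies inside the requested area but outside the crop box should be included in the result; with the single-parameter overload documented here, such text is expected to be left out.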

See Also

PdfDocumentProcessor Class

PdfDocumentProcessor Members

DevExpress.Pdf Namespace