Use Word Processing Document API to Load HTML Files or Export Documents to HTML

  • Feb 21, 2025

The Word Processing Document API allows you to import HTML files and save documents as HTML. When you open an HTML file, the RichEditDocumentServer converts the HTML content into its internal document model. Some HTML tags and CSS attributes have no counterparts in the Open XML and RTF formats, so the imported document may differ from the source page. For details on supported and unsupported HTML tags, see the following help topic: HTML Support Limitations.

Import HTML Content

Load an HTML Page

Use the RichEditDocumentServer.LoadDocument or RichEditDocumentServer.Document.LoadDocument method to load an HTML page.

csharp
using (var wordProcessor = new RichEditDocumentServer())
{
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html);
}
vb
Using wordProcessor As New RichEditDocumentServer()
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html)
End Using

Pass the sourceUri parameter to the LoadDocument method to specify the location of files associated with the HTML document (for example, a folder with its images).

csharp
using (var wordProcessor = new RichEditDocumentServer())
{
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html, "HtmlDocument_files");
}
vb
Using wordProcessor As New RichEditDocumentServer()
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html, "HtmlDocument_files")
End Using

Specify HTML Text

Assign HTML text to the RichEditDocumentServer.HtmlText or RichEditDocumentServer.Document.HtmlText property.

csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.API.Native;
using System.Net;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    using (var client = new WebClient())
    {
        string html = client.DownloadString("https://docs.devexpress.com/WindowsForms/9610/");
        wordProcessor.HtmlText = html;
    }
}
vb
Imports DevExpress.XtraRichEdit
Imports DevExpress.XtraRichEdit.API.Native
Imports System.Net
' ...

Using wordProcessor As New RichEditDocumentServer()
    Using client As New WebClient()
        Dim html As String = client.DownloadString("https://docs.devexpress.com/WindowsForms/9610/")
        wordProcessor.HtmlText = html
    End Using
End Using

You can also call the SubDocument.InsertHtmlText or SubDocument.AppendHtmlText method to insert HTML text into the document.
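
The snippet below is a minimal sketch of inserting HTML fragments into an existing document with the SubDocument.AppendHtmlText and SubDocument.InsertHtmlText methods. The HTML strings are placeholders chosen for illustration and do not come from the original topic.

```csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.API.Native;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    Document document = wordProcessor.Document;

    // Append an HTML fragment to the end of the document.
    // The markup is a placeholder used for illustration only.
    document.AppendHtmlText("<p>Appended <b>HTML</b> paragraph.</p>");

    // Insert an HTML fragment at the start of the document.
    document.InsertHtmlText(document.Range.Start, "<h1>Inserted heading</h1>");
}
```

Unlike the HtmlText property, which replaces the entire document content, these methods add the converted HTML to the existing content.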

Define Import Options

Handle the RichEditDocumentServer.BeforeImport event and use the HtmlDocumentImporterOptions class properties to specify import options for HTML documents. These options are also returned by the RichEditDocumentServer.Options.Import.Html property.

csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.Import;
using System.Text;
// ...

wordProcessor.BeforeImport += (s, e) =>
{
    if (e.DocumentFormat == DocumentFormat.Html)
    {
        var options = (HtmlDocumentImporterOptions)e.Options;
        // Specify encoding.
        options.AutoDetectEncoding = false;
        options.Encoding = Encoding.UTF8;
        // Skip media rules.
        options.IgnoreMediaQueries = true;
        // Load images synchronously with HTML documents.
        options.AsyncImageLoading = false;
        // Preserve image resolution.
        options.ImageScalingDpi = 96;
    }
};
vb
Imports DevExpress.XtraRichEdit
Imports DevExpress.XtraRichEdit.Import
Imports System.Text
' ...

AddHandler wordProcessor.BeforeImport,
    Sub(s, e)
        If e.DocumentFormat = DocumentFormat.Html Then
            Dim options As HtmlDocumentImporterOptions = CType(e.Options, HtmlDocumentImporterOptions)
            ' Specify encoding.
            options.AutoDetectEncoding = False
            options.Encoding = Encoding.UTF8
            ' Skip media rules.
            options.IgnoreMediaQueries = True
            ' Load images synchronously with HTML documents.
            options.AsyncImageLoading = False
            ' Preserve image resolution.
            options.ImageScalingDpi = 96
        End If
    End Sub
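
If you prefer not to handle an event, the same settings can likely be applied through the Options.Import.Html property mentioned above. The following is a sketch under that assumption; it reuses property names from the BeforeImport example and sets the options before the document is loaded.

```csharp
using DevExpress.XtraRichEdit;
using System.Text;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    // Access the HTML import options directly (no event handler).
    var htmlImport = wordProcessor.Options.Import.Html;
    // Specify encoding.
    htmlImport.AutoDetectEncoding = false;
    htmlImport.Encoding = Encoding.UTF8;
    // Load images synchronously with HTML documents.
    htmlImport.AsyncImageLoading = false;

    // Options must be set before the load operation starts.
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html);
}
```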

Convert HTML To Another Format

After you load an HTML file, you can export the document to a different format.

Convert HTML to PDF

Use the RichEditDocumentServer.ExportToPdf method to export HTML content to PDF. Pass a PdfExportOptions instance to the method to specify export options.

csharp
using DevExpress.XtraRichEdit.API.Native;
using DevExpress.XtraRichEdit;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    // Load an HTML document.
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html);
    Document document = wordProcessor.Document;

    // Define PDF export options.
    var options = new DevExpress.XtraPrinting.PdfExportOptions();
    options.ConvertImagesToJpeg = false;
    options.ImageQuality = DevExpress.XtraPrinting.PdfJpegImageQuality.Highest;

    // Specify page settings before export to PDF.
    document.Unit = DevExpress.Office.DocumentUnit.Inch;
    // Set page size.
    document.Sections[0].Page.Width = 8.5f;
    document.Sections[0].Page.Height = 11f;
    // Change margin settings.
    document.Sections[0].Margins.Top = 0;
    document.Sections[0].Margins.Bottom = 0;
    document.Sections[0].Margins.Left = 0.2f;
    document.Sections[0].Margins.Right = 0.2f;

    // Export the document to PDF.
    wordProcessor.ExportToPdf("PdfDocument.pdf", options);
}
vb
Imports DevExpress.XtraRichEdit.API.Native
Imports DevExpress.XtraRichEdit
' ...

Using wordProcessor As New RichEditDocumentServer()
    ' Load an HTML document.
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html)
    Dim document As Document = wordProcessor.Document

    ' Define PDF export options.
    Dim options As New DevExpress.XtraPrinting.PdfExportOptions()
    options.ConvertImagesToJpeg = False
    options.ImageQuality = DevExpress.XtraPrinting.PdfJpegImageQuality.Highest

    ' Specify page settings before export to PDF.
    document.Unit = DevExpress.Office.DocumentUnit.Inch
    ' Set page size.
    document.Sections(0).Page.Width = 8.5F
    document.Sections(0).Page.Height = 11F
    ' Change margin settings.
    document.Sections(0).Margins.Top = 0
    document.Sections(0).Margins.Bottom = 0
    document.Sections(0).Margins.Left = 0.2F
    document.Sections(0).Margins.Right = 0.2F

    ' Export the document to PDF.
    wordProcessor.ExportToPdf("PdfDocument.pdf", options)
End Using

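Convert HTML to DOCX

Call the LoadDocument method to load the HTML file, and then call the SaveDocument method with DocumentFormat.Docx to save the result as DOCX.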

csharp
using DevExpress.XtraRichEdit.API.Native;
using DevExpress.XtraRichEdit;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    Document document = wordProcessor.Document;
    // Load an HTML document.
    document.LoadDocument("HtmlDocument.html", DocumentFormat.Html);

    // Modify and format the document.
    // ...

    // Save the document as DOCX.
    document.SaveDocument("Document.docx", DocumentFormat.Docx);
}
vb
Imports DevExpress.XtraRichEdit.API.Native
Imports DevExpress.XtraRichEdit
' ...

Using wordProcessor As New RichEditDocumentServer()
    Dim document As Document = wordProcessor.Document
    ' Load an HTML document.
    document.LoadDocument("HtmlDocument.html", DocumentFormat.Html)

    ' Modify and format the document.
    ' ...

    ' Save the document as DOCX.
    document.SaveDocument("Document.docx", DocumentFormat.Docx)
End Using

Import Tips

  1. Implement and register a custom IUriStreamProvider to specify how to load images from an HTML document. For example, you can use this service when you import an HTML file that contains images referenced with a custom prefix.

  2. If you import content from an HTTPS website, you may need to specify the security protocol enabled for this site. For example, if the site uses the TLS 1.2 protocol, enable it through the ServicePointManager.SecurityProtocol property before you download the content.

  3. If you import an HTML file that contains tables and these tables extend into page margins, disable the following option: RichEditDocumentServer.Document.CompatibilitySettings.AllowTablesOutstepMargins.

  4. The HTML document is loaded in CompatibilityMode.ModeNotSpecified mode. In this case, the Word Processing Document API uses the compatibility settings of previous Microsoft Word versions. After the document is loaded, set the CompatibilitySettings.CompatibilityMode property to CompatibilityMode.Mode15 to enable compatibility with the latest Microsoft Word version and process the document correctly.

  5. RichEditDocumentServer imports paragraphs from an HTML document with line spacing between paragraphs (i.e. the SpacingBefore and SpacingAfter properties are set to 12 pt). Change the value of these properties before import to remove line spacing.
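
Tips 2 and 4 can be sketched as follows. This is an illustrative fragment rather than a complete recipe: it assumes the standard .NET ServicePointManager API for the TLS setting and the compatibility property path named in tip 4. The file name is the same placeholder used elsewhere in this topic.

```csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.API.Native;
using System.Net;
// ...

// Tip 2: allow TLS 1.2 connections before HTML content is downloaded.
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

using (var wordProcessor = new RichEditDocumentServer())
{
    wordProcessor.LoadDocument("HtmlDocument.html", DocumentFormat.Html);

    // Tip 4: switch to the latest compatibility mode after the document is loaded.
    wordProcessor.Document.CompatibilitySettings.CompatibilityMode = CompatibilityMode.Mode15;
}
```

The `|=` operator adds TLS 1.2 to the protocols that are already enabled instead of replacing them.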

Export to HTML

Save as HTML File

Use the RichEditDocumentServer.SaveDocument method to export a document to HTML.

csharp
using DevExpress.XtraRichEdit;
using System.IO;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    // Load a document.
    wordProcessor.LoadDocument("Document.docx");
    // Export the document to a file in HTML format.
    wordProcessor.SaveDocument("HtmlDocument.html", DocumentFormat.Html);
    // Export the document to a file stream in HTML format.
    using (FileStream fileStream = new FileStream("HtmlDocument2.html", FileMode.Create))
    {
        wordProcessor.SaveDocument(fileStream, DocumentFormat.Html);
    }
}
vb
Imports DevExpress.XtraRichEdit
Imports System.IO
' ...

Using wordProcessor As New RichEditDocumentServer()
    ' Load a document.
    wordProcessor.LoadDocument("Document.docx")
    ' Export the document to a file in HTML format.
    wordProcessor.SaveDocument("HtmlDocument.html", DocumentFormat.Html)
    ' Export the document to a file stream in HTML format.
    Using fileStream As New FileStream("HtmlDocument2.html", FileMode.Create)
        wordProcessor.SaveDocument(fileStream, DocumentFormat.Html)
    End Using
End Using

Note

The following document elements are not exported to an HTML file:

  • OLE objects
  • Headers and footers

Return HTML Text

Use the RichEditDocumentServer.HtmlText property or SubDocument.GetHtmlText method to convert document content to HTML.

csharp
using (var wordProcessor = new RichEditDocumentServer())
{
    // Load a document.
    wordProcessor.LoadDocument("Document.docx");
    // Get the document content as HTML text and write it to a file.
    System.IO.File.WriteAllText("HtmlDocument.html", wordProcessor.HtmlText);
}
vb
Using wordProcessor As New RichEditDocumentServer()
    ' Load a document.
    wordProcessor.LoadDocument("Document.docx")
    ' Get the document content as HTML text and write it to a file.
    System.IO.File.WriteAllText("HtmlDocument.html", wordProcessor.HtmlText)
End Using
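
The example above covers only the HtmlText property. A rough sketch of the SubDocument.GetHtmlText method, which converts a specific range to HTML, might look like the snippet below. Passing null as the URI provider and the output file name are assumptions made for illustration; check the method reference for how images are handled in that case.

```csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.API.Native;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    // Load a document.
    wordProcessor.LoadDocument("Document.docx");
    Document document = wordProcessor.Document;

    // Convert only the first paragraph to HTML.
    DocumentRange range = document.Paragraphs[0].Range;
    // Assumption: no custom URI provider is supplied (null).
    string html = document.GetHtmlText(range, null);

    // "FirstParagraph.html" is a hypothetical output file name.
    System.IO.File.WriteAllText("FirstParagraph.html", html);
}
```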

Define Export Options

The HtmlDocumentExporterOptions class contains options used to export a document to HTML. You can specify these options in a RichEditDocumentServer.BeforeExport event handler or through the RichEditDocumentServer.Options.Export.Html property.

csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.Export;
// ...

wordProcessor.BeforeExport += (s, e) =>
{
    if (e.DocumentFormat == DocumentFormat.Html)
    {
        var options = (HtmlDocumentExporterOptions)e.Options;
        // Embed images in HTML pages.
        options.EmbedImages = true;
        // Specify how to export style sheets.
        options.CssPropertiesExportType = DevExpress.XtraRichEdit.Export.Html.CssPropertiesExportType.Style;
        // Specify the root tag for export.
        options.ExportRootTag = DevExpress.XtraRichEdit.Export.Html.ExportRootTag.Body;
    }
};
vb
Imports DevExpress.XtraRichEdit
Imports DevExpress.XtraRichEdit.Export
' ...

AddHandler wordProcessor.BeforeExport,
    Sub(s, e)
        If e.DocumentFormat = DocumentFormat.Html Then
            Dim options As HtmlDocumentExporterOptions = CType(e.Options, HtmlDocumentExporterOptions)
            ' Embed images in HTML pages.
            options.EmbedImages = True
            ' Specify how to export style sheets.
            options.CssPropertiesExportType = DevExpress.XtraRichEdit.Export.Html.CssPropertiesExportType.Style
            ' Specify the root tag for export.
            options.ExportRootTag = DevExpress.XtraRichEdit.Export.Html.ExportRootTag.Body
        End If
    End Sub
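
As an alternative to the event handler, the options can presumably be set through the Options.Export.Html property before the save operation. The sketch below assumes this property returns the same HtmlDocumentExporterOptions object that the BeforeExport handler receives, and reuses the option values from the example above.

```csharp
using DevExpress.XtraRichEdit;
using DevExpress.XtraRichEdit.Export.Html;
// ...

using (var wordProcessor = new RichEditDocumentServer())
{
    // Access the HTML export options directly (no event handler).
    var htmlExport = wordProcessor.Options.Export.Html;
    // Embed images in HTML pages.
    htmlExport.EmbedImages = true;
    // Specify how to export style sheets.
    htmlExport.CssPropertiesExportType = CssPropertiesExportType.Style;
    // Specify the root tag for export.
    htmlExport.ExportRootTag = ExportRootTag.Body;

    wordProcessor.LoadDocument("Document.docx");
    wordProcessor.SaveDocument("HtmlDocument.html", DocumentFormat.Html);
}
```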

See Also

Export Document Images to HTML

HTML Support Limitations and Troubleshooting in Word Processing Document API