Embed XMP Metadata in a PDF Document

  • Dec 12, 2024

Adobe Extensible Metadata Platform (XMP) is an XML-based ISO metadata standard, originally created by Adobe Systems Inc. It defines the data structure, serialization model, and basic metadata properties intended to form a unified metadata package that can be embedded into different media formats.

PDF Document API allows you to embed XMP metadata in your documents. You can load metadata from a stream or a string, edit existing metadata or generate a new XMP data model.

Add New Metadata

The PdfDocument.SetMetadata method allows you to embed XMP metadata in the document. You can pass a string with metadata or an XmpDocument object to this method.

The XmpDocument object is an instance of the XMP data model (XMP packet). You can load data to the packet from a stream or a string.

The code sample below loads metadata from a file and embeds it in the document:

csharp
using System.IO;
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    // Load the XMP packet from a file and embed it in the document:
    using (FileStream xmlStream = new FileStream("Documents//metadata.xml", FileMode.Open, FileAccess.Read))
    {
        XmpDocument metadata = XmpDocument.FromStream(xmlStream);
        document.SetMetadata(metadata);
    }

    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
vb
Imports System.IO
Imports DevExpress.Pdf
Imports DevExpress.Pdf.Xmp
'...

Using pdfDocumentProcessor As New PdfDocumentProcessor()
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document
    ' Load the XMP packet from a file and embed it in the document:
    Using xmlStream As New FileStream("Documents//metadata.xml", FileMode.Open, FileAccess.Read)
        Dim metadata As XmpDocument = XmpDocument.FromStream(xmlStream)
        document.SetMetadata(metadata)
    End Using

    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf")
End Using
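
The sample above passes an XmpDocument object. SetMetadata also accepts the packet as a string, which can be sketched as follows. This is a minimal sketch that assumes metadata.xml contains a complete, well-formed XMP packet; File.ReadAllText is standard .NET and not part of the PDF Document API.

```csharp
using System.IO;
using DevExpress.Pdf;

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    // Read the packet as text (assumed to be a well-formed XMP packet):
    string xmpPacket = File.ReadAllText("Documents//metadata.xml");

    // Pass the string to SetMetadata instead of an XmpDocument object:
    document.SetMetadata(xmpPacket);

    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
```

The string overload may be convenient when the packet is produced by another tool and needs no editing; load it into an XmpDocument when you need to inspect or change nodes first.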

Access Document Metadata

You can obtain metadata associated with a document, page or Form XObject. The PdfMetadata.Data property retrieves the object’s metadata. Use the XmpDocument.FromString method to convert the retrieved data to an XMP packet, as shown in the example below.

csharp
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    string metadata = document.Metadata.Data;
    XmpDocument xmpDocument = XmpDocument.FromString(metadata);
}
vb
Imports DevExpress.Pdf
Imports DevExpress.Pdf.Xmp
'...

Using pdfDocumentProcessor As New PdfDocumentProcessor()
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document

    Dim metadata As String = document.Metadata.Data
    Dim xmpDocument As XmpDocument = XmpDocument.FromString(metadata)
End Using
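
Page-level metadata can be read in a similar way. The sketch below assumes that each page exposes a Metadata property of the same PdfMetadata type as the document, and that this property is null when the page carries no metadata; verify both assumptions against the API reference for your version.

```csharp
using System;
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    for (int i = 0; i < document.Pages.Count; i++)
    {
        // Assumption: Metadata is null when the page has no metadata.
        PdfMetadata pageMetadata = document.Pages[i].Metadata;
        if (pageMetadata == null || string.IsNullOrEmpty(pageMetadata.Data))
            continue;

        // Convert the raw data to an XMP packet, as with document metadata:
        XmpDocument pageXmp = XmpDocument.FromString(pageMetadata.Data);
        Console.WriteLine($"Page {i + 1}: found {pageMetadata.Data.Length} characters of XMP data");
    }
}
```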

Manage Metadata Nodes

The XmpDocument.Values property returns the dictionary that contains packet nodes (name-value pairs for metadata properties). You can access a packet node by its name or value. A node name consists of a namespace prefix and a local name separated by a colon (for example, dc:creator).

You can add new nodes and change an existing node’s value. The table below lists the available node value types and the API used to create and access each node type.

| Value type | Description | Created by | Retrieved by |
|---|---|---|---|
| Simple | A Unicode string. The string may be empty. A simple value can be a regular string or URI string. | XmpDocument.Add, XmpDocument.CreateSimpleValue | XmpDocument.GetSimpleValue, XmpDocument.GetBoolean, XmpDocument.GetDate, XmpDocument.GetString, XmpDocument.GetFloat, XmpDocument.GetInteger |
| Array | A container for items of any available value type, including a nested array and structure. | XmpDocument.CreateArray | XmpDocument.GetArray |
| Structured value | A container for fields with unique names. Field values can have any available type. | XmpDocument.CreateStructure | XmpDocument.GetStructure |

When you add a new node to the packet, you can use an XmpName class object or a string to specify the node name. In the latter case, make sure that the specified prefix is registered. You can call the XmpDocument.RegisterNamespace method to register the prefix.

The code sample below edits document metadata:

View Example: How to Embed XMP Metadata to the PDF Document

csharp
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    // Retrieve metadata:
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);

    // Add items to the Creator array:
    XmpArray creators = metadata.GetArray("dc:creator");
    if (creators != null)
    {
        creators.Add("PDF Document API");
        creators.Add("Office File API");
    }

    // Change the CreatorTool node value:
    XmpSimpleNode creatorTool = metadata.GetSimpleValue("xmp:CreatorTool");
    creatorTool.SetValue("PDF Document API");

    // Add MaxPageSize structure:
    XmpName structureName = XmpName.Get("MaxPageSize", "http://ns.adobe.com/xap/1.0/t/pg/");
    XmpStructure dimensions = metadata.CreateStructure(structureName);
    metadata.RegisterNamespace("http://ns.adobe.com/xap/1.0/sType/Dimensions#", "stDim");
    dimensions.Add("stDim:h", 11);
    dimensions.Add("stDim:w", 8.5f);
    dimensions.Add("stDim:Unit", "inch");

    // Embed modified metadata in the document:
    document.SetMetadata(metadata);

    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
vb
Imports DevExpress.Pdf
Imports DevExpress.Pdf.Xmp
'...

Using pdfDocumentProcessor As New PdfDocumentProcessor()
    ' Load a document
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document

    ' Retrieve metadata:
    Dim metadata As XmpDocument = XmpDocument.FromString(document.Metadata.Data)

    ' Add items to the Creator array:
    Dim creators As XmpArray = metadata.GetArray("dc:creator")
    If creators IsNot Nothing Then
        creators.Add("PDF Document API")
        creators.Add("Office File API")
    End If

    ' Change the CreatorTool node value:
    Dim creatorTool As XmpSimpleNode = metadata.GetSimpleValue("xmp:CreatorTool")
    creatorTool.SetValue("PDF Document API")

    ' Add MaxPageSize structure:
    Dim structureName As XmpName = XmpName.Get("MaxPageSize", "http://ns.adobe.com/xap/1.0/t/pg/")
    Dim dimensions As XmpStructure = metadata.CreateStructure(structureName)
    metadata.RegisterNamespace("http://ns.adobe.com/xap/1.0/sType/Dimensions#", "stDim")
    dimensions.Add("stDim:h", 11)
    dimensions.Add("stDim:w", 8.5F)
    dimensions.Add("stDim:Unit", "inch")

    ' Embed modified metadata in the document:
    document.SetMetadata(metadata)

    ' Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf")
End Using
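
The sample above names the new structure with an XmpName object. A string name can be used once its prefix is registered, which can be sketched as follows. The namespace URI, the rev prefix, and the node names are hypothetical and serve only as an illustration; the sketch also assumes that CreateStructure accepts a string name, as the description of node names suggests.

```csharp
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);

    // Hypothetical namespace and prefix, used for illustration only:
    metadata.RegisterNamespace("https://example.com/ns/review/", "rev");

    // The prefix is registered, so string names can be used
    // (assumption: CreateStructure has an overload that takes a string name):
    XmpStructure approval = metadata.CreateStructure("rev:Approval");
    approval.Add("rev:Reviewer", "Office team");
    approval.Add("rev:Round", 2);

    document.SetMetadata(metadata);
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
```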

Use XMP Schemas

An XMP schema (or namespace) is a set of metadata properties. Each schema is identified by a unique namespace URI and can hold an arbitrary number of properties. The XMP specification defines a set of predefined schemas: standard general-purpose namespaces and namespaces that are specific to Adobe applications.

PDF Document API supports the following predefined XMP namespaces:

| Namespace | Description | Namespace URI | Prefix | Class |
|---|---|---|---|---|
| Basic XMP | Contains basic description information. | http://ns.adobe.com/xap/1.0/ | xmp | XmpProperties |
| Adobe PDF | Specifies properties used in Adobe PDF documents. | http://ns.adobe.com/pdf/1.3/ | pdf | AdobePdfProperties |
| PDF/A | Used to define a document’s PDF/A conformance level and version. | http://www.aiim.org/pdfa/ns/id/ | pdfaid | PdfAProperties |
| Dublin Core | Contains information defined in the Dublin Core Metadata Set, created by the Dublin Core Metadata Initiative (DCMI). | http://purl.org/dc/elements/1.1/ | dc | DublinCoreProperties |
| Rights Management | Contains information regarding the legal restrictions associated with a PDF document. | http://ns.adobe.com/xap/1.0/rights/ | xmpRights | XmpRightsManagementProperties |

The code sample below adds items from the Rights Management schema to the packet:

csharp
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document:
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    // Create a new XMP packet:
    XmpDocument metadata = new XmpDocument();
    XmpRightsManagementProperties rightsManagementSchema =
        metadata.RightsManagementProperties;
    rightsManagementSchema.Certificate = "https://www.devexpress.com/";
    rightsManagementSchema.Owner.Add("DevExpress");
    rightsManagementSchema.Marked = true;
    rightsManagementSchema.WebStatement = "https://www.devexpress.com/support/eulas/";
    rightsManagementSchema.UsageTerms.AddString("Copyright (C) 2021 DevExpress. All Rights Reserved.", 
        "x-default");

    // Embed metadata in the document:
    document.SetMetadata(metadata);

    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
vb
Imports DevExpress.Pdf
Imports DevExpress.Pdf.Xmp
'...

Using pdfDocumentProcessor As New PdfDocumentProcessor()
    ' Load a document:
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document

    ' Create a new XMP packet:
    Dim metadata As New XmpDocument()
    Dim rightsManagementSchema As XmpRightsManagementProperties = 
        metadata.RightsManagementProperties
    rightsManagementSchema.Certificate = "https://www.devexpress.com/"
    rightsManagementSchema.Owner.Add("DevExpress")
    rightsManagementSchema.Marked = True
    rightsManagementSchema.WebStatement = "https://www.devexpress.com/support/eulas/"
    rightsManagementSchema.UsageTerms.AddString("Copyright (C) 2021 DevExpress. All Rights Reserved.",
         "x-default")

    ' Embed metadata in the document:
    document.SetMetadata(metadata)

    ' Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf")
End Using

Create Custom Schema

Create a CustomProperties class object and fill it with items to create a custom schema. Assign this object to the XmpDocument.CustomProperties property to add your schema to the packet.

csharp
using DevExpress.Pdf;
using DevExpress.Pdf.Xmp;
//...

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);

    metadata.RegisterNamespace("https://www.devexpress.com/", "dx");
    CustomProperties customProperties = new CustomProperties(metadata, "https://www.devexpress.com/");
    customProperties["Team"] = "Office";
    customProperties["Checked"] = "true";
    customProperties["Project"] = "PDF Document API";

    document.SetMetadata(metadata);
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
vb
Imports DevExpress.Pdf
Imports DevExpress.Pdf.Xmp
'...

Using pdfDocumentProcessor As New PdfDocumentProcessor()
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document
    Dim metadata As XmpDocument = XmpDocument.FromString(document.Metadata.Data)

    metadata.RegisterNamespace("https://www.devexpress.com/", "dx")
    Dim customProperties As New CustomProperties(metadata, "https://www.devexpress.com/")
    customProperties("Team") = "Office"
    customProperties("Checked") = "true"
    customProperties("Project") = "PDF Document API"

    document.SetMetadata(metadata)
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf")
End Using

Tip

You can embed ZUGFeRD-compliant XML in your document, as described in the following help topic:

Read Tutorial: How to: Create a ZUGFeRD-Compliant PDF Invoice

Remove Metadata

Call the XmpDocument.Remove method to remove an XMP node with the specified name.

The code sample below removes the dc:title node:

csharp
using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    // Load a document
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");
    PdfDocument document = pdfDocumentProcessor.Document;

    // Retrieve metadata:
    XmpDocument metadata = XmpDocument.FromString(document.Metadata.Data);

    // Delete the node:
    metadata.Remove("dc:title");

    // Apply changes:
    document.SetMetadata(metadata);

    // Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf");
}
vb
Using pdfDocumentProcessor As New PdfDocumentProcessor()
    ' Load a document
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf")
    Dim document As PdfDocument = pdfDocumentProcessor.Document

    ' Retrieve metadata:
    Dim metadata As XmpDocument = XmpDocument.FromString(document.Metadata.Data)

    ' Delete the node:
    metadata.Remove("dc:title")

    ' Apply changes:
    document.SetMetadata(metadata)

    ' Save the result:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf")
End Using

Note

By default, each PdfDocumentProcessor.SaveDocument method call updates the following metadata nodes:

  • xmp:CreateDate
  • xmp:ModifyDate
  • xmp:MetadataDate

Set the PdfSaveOptions.DisableCreationDateUpdate property to true and pass the PdfSaveOptions object to the SaveDocument method to disable the xmp:CreateDate node update.

The PdfSaveOptions.DisableModDateUpdate property disables only the xmp:ModifyDate node update. Use the PdfSaveOptions.DisableMetadataUpdate property to disable updates to all mandatory metadata nodes.
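
The options named in this note can be applied as sketched below. The sketch assumes that these are Boolean properties where true turns the corresponding update off, and that PdfSaveOptions has a parameterless constructor; it keeps the existing xmp:CreateDate and xmp:ModifyDate values and leaves the xmp:MetadataDate update enabled.

```csharp
using DevExpress.Pdf;

using (PdfDocumentProcessor pdfDocumentProcessor = new PdfDocumentProcessor())
{
    pdfDocumentProcessor.LoadDocument("Documents//Invoice.pdf");

    // Assumption: true disables the corresponding date update on save.
    PdfSaveOptions saveOptions = new PdfSaveOptions
    {
        DisableCreationDateUpdate = true,
        DisableModDateUpdate = true
    };

    // Pass the options to SaveDocument so they apply to this save operation:
    pdfDocumentProcessor.SaveDocument("Invoice_Upd.pdf", saveOptions);
}
```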