The MDocument class processes documents for RAG applications. The main methods are .chunk() and .extractMetadata().
<PropertiesTable
  content={[
    {
      name: 'docs',
      type: 'Array<{ text: string, metadata?: Record<string, any> }>',
      description: 'Array of document chunks with their text content and optional metadata',
    },
    {
      name: 'type',
      type: "'text' | 'html' | 'markdown' | 'json' | 'latex'",
      description: 'Type of document content',
    },
  ]}
/>
fromText()
Creates a document from plain text content.
static fromText(text: string, metadata?: Record<string, any>): MDocument
fromHTML()
Creates a document from HTML content.
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
fromMarkdown()
Creates a document from Markdown content.
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
fromJSON()
Creates a document from JSON content.
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
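The four factories share one signature and differ only in the content type they record. As a rough mental model, and assuming each factory wraps its input in a single-entry `docs` array with the matching `type` (an assumption about the internals that this page does not confirm), the equivalent constructor arguments can be sketched with local stand-in types. `DocType`, `DocInput`, and `toConstructorArgs` are hypothetical names used for illustration and are not exports of `@mastra/rag`.

```typescript
// Local stand-in types mirroring the constructor parameters listed above.
// They are NOT imported from @mastra/rag.
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex';

interface DocInput {
  docs: Array<{ text: string; metadata?: Record<string, any> }>;
  type: DocType;
}

// Hypothetical helper: builds the constructor arguments a factory is
// assumed to produce for a given piece of content.
function toConstructorArgs(
  content: string,
  type: DocType,
  metadata: Record<string, any> = {},
): DocInput {
  return { docs: [{ text: content, metadata }], type };
}

const markdownArgs = toConstructorArgs('# Title\n\nBody text', 'markdown', {
  source: 'readme',
});

console.log(markdownArgs.type); // markdown
console.log(markdownArgs.docs.length); // 1
```

Under that assumption, a freshly created document holds exactly one entry until `chunk()` is called.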
chunk()
Splits document into chunks and optionally extracts metadata.
async chunk(params?: ChunkParams): Promise<Chunk[]>
See chunk() reference for detailed options.
getDocs()
Returns array of processed document chunks.
getDocs(): Chunk[]
getText()
Returns array of text strings from chunks.
getText(): string[]
getMetadata()
Returns array of metadata objects from chunks.
getMetadata(): Record<string, any>[]
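The three accessors can be read as different views of the same chunk list. The sketch below models that relationship with a local `ChunkLike` type (a hypothetical stand-in, not the library's `Chunk`), assuming `getText()` and `getMetadata()` return one entry per chunk in the same order as `getDocs()`.

```typescript
// Local stand-in for the chunk shape described on this page;
// NOT the Chunk type exported by @mastra/rag.
interface ChunkLike {
  text: string;
  metadata: Record<string, any>;
}

const chunkList: ChunkLike[] = [
  { text: 'First chunk', metadata: { title: 'Intro' } },
  { text: 'Second chunk', metadata: { title: 'Usage' } },
];

// Assumed equivalents of getText() and getMetadata():
// one entry per chunk, in the same order as the chunk list.
const chunkTexts: string[] = chunkList.map((chunk) => chunk.text);
const chunkMetadata: Record<string, any>[] = chunkList.map((chunk) => chunk.metadata);

console.log(chunkTexts.join(' | ')); // First chunk | Second chunk
console.log(chunkMetadata.length === chunkTexts.length); // true
```

If the ordering assumption holds, index `i` of the text array lines up with index `i` of the metadata array, which is convenient when pairing embeddings with metadata for storage.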
extractMetadata()
Extracts metadata using specified extractors. See ExtractParams reference for details.
async extractMetadata(params: ExtractParams): Promise<MDocument>
import { MDocument } from '@mastra/rag'

// Create document from text
const doc = MDocument.fromText('Your content here')

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [
    ['#', 'title'],
    ['##', 'section'],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
})

// Get processed chunks
const docs = doc.getDocs()
const texts = doc.getText()
const metadata = doc.getMetadata()
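To make the `headers` option in the example more concrete: each pair maps a Markdown heading prefix to a metadata key. The sketch below is a deliberately simplified stand-in, not the library's splitter. It assumes that body text under a heading is tagged with that heading's text under the paired key, and that heading lines themselves are dropped from the chunk text. `HeaderChunk` and `splitByHeaders` are hypothetical names.

```typescript
// Simplified illustration of header-based splitting.
// This is NOT the @mastra/rag implementation.
interface HeaderChunk {
  text: string;
  metadata: Record<string, string>;
}

function splitByHeaders(markdown: string, headers: Array<[string, string]>): HeaderChunk[] {
  // Check longer prefixes first so '##' is never mistaken for '#'.
  const ordered = [...headers].sort((a, b) => b[0].length - a[0].length);
  const result: HeaderChunk[] = [];
  let current: Record<string, string> = {};
  let buffer: string[] = [];

  // Emit the buffered body lines as one chunk tagged with the active headings.
  const flush = (): void => {
    const text = buffer.join('\n').trim();
    if (text.length > 0) result.push({ text, metadata: { ...current } });
    buffer = [];
  };

  for (const line of markdown.split('\n')) {
    const match = ordered.find(([prefix]) => line.startsWith(prefix + ' '));
    if (!match) {
      buffer.push(line);
      continue;
    }
    flush();
    const [prefix, key] = match;
    // A new heading keeps shallower headings and resets same-level or deeper ones.
    const next: Record<string, string> = {};
    for (const [p, k] of headers) {
      const value = current[k];
      if (p.length < prefix.length && value !== undefined) next[k] = value;
    }
    next[key] = line.slice(prefix.length + 1).trim();
    current = next;
  }
  flush();
  return result;
}

const sampleMarkdown = '# Guide\nIntro text\n## Setup\nInstall steps\n## Usage\nRun it';
const headerChunks = splitByHeaders(sampleMarkdown, [
  ['#', 'title'],
  ['##', 'section'],
]);

console.log(headerChunks.length); // 3
console.log(headerChunks.map((c) => c.text).join(' | ')); // Intro text | Install steps | Run it
```

In this sketch the first chunk carries only `title`, while the later two carry both `title` and `section`. The real splitter may treat nesting, header stripping, and chunk sizing differently, so the chunk() reference remains the authority on actual behavior.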