markdown/guide/schema.md
Each ProseMirror document has a schema associated with it. The schema describes the kind of nodes that may occur in the document, and the way they are nested. For example, it might say that the top-level node can contain one or more blocks, and that paragraph nodes can contain any number of inline nodes, with any marks applied to them.
There is a package with a basic schema available, but the nice thing about ProseMirror is that it allows you to define your own schemas.
Every node in a document has a type, which represents its semantic meaning and its properties, such as the way it is rendered in the editor.
When you define a schema, you enumerate the node types that may occur within it, describing each with a spec object:
const trivialSchema = new Schema({
nodes: {
doc: {content: "paragraph+"},
paragraph: {content: "text*"},
text: {inline: true},
/* ... and so on */
}
})
That defines a schema where the document may contain one or more paragraphs, and each paragraph can contain any amount of text.
Every schema must at least define a top-level node type (which
defaults to the name "doc", but you can
configure that), and a "text" type for
text content.
Nodes that count as inline must declare this with the
inline property (though for the text
type, which is inline by definition, you may omit this).
The strings in the content fields in the
example schema above are called content expressions. They control
what sequences of child nodes are valid for this node type.
You can say, for example "paragraph" for “one paragraph”, or
"paragraph+" to express “one or more paragraphs”. Similarly,
"paragraph*" means “zero or more paragraphs” and "caption?" means
“zero or one caption node”. You can also use regular-expression-like
ranges, such as {2} (“exactly two”) {1, 5} (“one to five”) or
{2,} (“two or more”) after node names.
Such expressions can be combined to create a sequence, for example
"heading paragraph+" means ‘first a heading, then one or more
paragraphs’. You can also use the pipe | operator to indicate a
choice between two expressions, as in "(paragraph | blockquote)+".
Some groups of element types will appear multiple times in your
schema—for example you might have a concept of “block” nodes, that may
appear at the top level but also nested inside of blockquotes. You can
create a node group by giving your node specs a
group property, and then refer to that
group by its name in your expressions.
const groupSchema = new Schema({
nodes: {
doc: {content: "block+"},
paragraph: {group: "block", content: "text*"},
blockquote: {group: "block", content: "block+"},
text: {}
}
})
Here "block+" is equivalent to "(paragraph | blockquote)+".
It is recommended to always require at least one child node in nodes
that have block content (such as "doc" and "blockquote" in the
example above), because browsers will completely collapse the node
when it's empty, making it rather hard to edit.
The order in which your nodes appear in an or-expression is
significant. When creating a default instance for a non-optional node,
for example to make sure a document still conforms to the schema after
a replace step the first type in the
expression will be used. If that is a group, the first type in the
group (determined by the order in which the group's members appear in
your nodes map) is used. If I switched the positions of
"paragraph" and "blockquote" in the the example schema, you'd get
a stack overflow as soon as the editor tried to create a block
node—it'd create a "blockquote" node, whose content requires at
least one block, so it'd try to create another "blockquote" as
content, and so on.
Not every node-manipulating function in the library checks that it is
dealing with valid content—higher level concepts like
transforms do, but primitive node-creation methods
usually don't and instead put the responsibility for providing sane
input on their caller. It is perfectly possible to use, for example
NodeType.create, to create a node with
invalid content. For nodes that are ‘open’ on the edge of
slices, this is even a reasonable thing to do. There
is a separate createChecked
method, as well as an after-the-fact
check method that can be used to assert that a
given node's content is valid.
Marks are used to add extra styling or other information to inline content. A schema must declare all mark types it allows in its schema. Mark types are objects much like node types, used to tag mark objects and provide additional information about them.
By default, nodes with inline content allow all marks defined in the
schema to be applied to their children. You can configure this with
the marks property on your node spec.
Here's a simple schema that supports strong and emphasis marks on text in paragraphs, but not in headings:
const markSchema = new Schema({
nodes: {
doc: {content: "block+"},
paragraph: {group: "block", content: "text*", marks: "_"},
heading: {group: "block", content: "text*", marks: ""},
text: {inline: true}
},
marks: {
strong: {},
em: {}
}
})
The set of marks is interpreted as a space-separated string of mark
names or mark groups—"_" acts as a wildcard, and the empty string
corresponds to the empty set.
The document schema also defines which attributes each node or mark has. If your node type requires extra node-specific information to be stored, such as the level of a heading node, that is best done with an attribute.
Attribute sets are represented as plain objects with a predefined (per
node or mark) set of properties holding any JSON-serializeable values.
To specify what attributes it allows, use the optional attrs field
in a node or mark spec.
heading: {
content: "text*",
attrs: {level: {default: 1}}
}
In this schema, every instance of the heading node will have a
level attribute under .attrs.level. If it isn't specified when the
node is created, it will default to 1.
<a id="generatable"></a>When you don't give a default value for an attribute, an error will be raised when you attempt to create such a node without specifying that attribute.
That will also make it impossible for the library to generate such
nodes as filler to satisfy schema constraints during a transform or
when calling createAndFill. This
is why you are not allowed to put such nodes in a required position in
the schema—in order to be able to enforce the schema constraints, the
editor needs to be able to generate empty nodes to fill missing pieces
in the content.
In order to be able to edit them in the browser, it must be possible
to represent document nodes in the browser DOM. The easiest way to do
that is to include information about each node's DOM representation in
the schema using the toDOM field in the
node spec.
This field should hold a function that, when called with the node as argument, returns a description of the DOM structure for that node. This may either be a direct DOM node or an array describing it, for example:
const schema = new Schema({
nodes: {
doc: {content: "paragraph+"},
paragraph: {
content: "text*",
toDOM(node) { return ["p", 0] }
},
text: {}
}
})
The expression ["p", 0] declares that a paragraph is rendered as an
HTML <p> tag. The zero is the ‘hole’ where its content should be
rendered. You may also include an object with HTML attributes after
the tag name, for example ["div", {class: "c"}, 0]. Leaf nodes don't
need a hole in their DOM representation, since they don't have
content.
Mark specs allow a similar toDOM method,
but they are required to render as a single tag that directly wraps
the content, so the content always goes directly in the returned node,
and the hole doesn't need to be specified.
You'll also often need to parse a document from DOM data, for
example when the user pastes or drags something into the editor. The
model module also comes with functionality for that, and you are
encouraged to include parsing information directly in your schema with
the parseDOM property.
This may list an array of parse rules, which describe DOM constructs that map to a given node or mark. For example, the basic schema has these for the emphasis mark:
parseDOM: [
{tag: "em"}, // Match <em> nodes
{tag: "i"}, // and <i> nodes
{style: "font-style=italic"} // and inline 'font-style: italic'
]
The value given to tag in a parse rule can
be a CSS selector, so you can do thing like "div.myclass" too.
Similarly, style matches inline CSS
styles.
When a schema includes parseDOM annotations, you can create a
DOMParser object for it with
DOMParser.fromSchema. This is done
by the editor to create the default clipboard parser, but you can
also override that.
Documents also come with a built-in JSON serialization format. You can
call toJSON on them to get an object that can
safely be passed to
JSON.stringify,
and schema objects have a nodeFromJSON method
that can parse this representation back into a document.
The nodes and marks options passed to the Schema
constructor take OrderedMap
objects as well as
plain JavaScript objects. The resulting schema's
spec.nodes and spec.marks properties are
always OrderedMaps, which can be used as the basis for further
schemas.
Such maps support a number of methods to conveniently create updated
versions. For example you could say
schema.spec.nodes.remove("blockquote") to derive a set of nodes
without the blockquote node, which can then be passed as the nodes
field for a new schema.
The schema-list module exports a convenience method to add the nodes exported by those modules to a nodeset.