Back to Spacy

Visualizers

website/docs/usage/visualizers.mdx

4.0.0.dev1023.3 KB
Original Source

Visualizing a dependency parse or named entities in a text is not only a fun NLP demo – it can also be incredibly helpful in speeding up development and debugging your code and training process. That's why our popular visualizers, displaCy and displaCy <sup>ENT</sup> are also an official part of the core library. If you're running a Jupyter notebook, displaCy will detect this and return the markup in a format ready to be rendered and exported.

The quickest way to visualize Doc is to use displacy.serve. This will spin up a simple web server and let you view the result straight from your browser. displaCy can either take a single Doc or a list of Doc objects as its first argument. This lets you construct them however you like – using any pipeline or modifications you like. If you're using Streamlit, check out the spacy-streamlit package that helps you integrate spaCy visualizations into your apps!

Visualizing the dependency parse {id="dep"}

The dependency visualizer, dep, shows part-of-speech tags and syntactic dependencies.

python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
displacy.serve(doc, style="dep")

The argument options lets you specify a dictionary of settings to customize the layout, for example:

<Infobox title="Important note" variant="warning">

There's currently a known issue with the compact mode for sentences with short arrows and long dependency labels, that causes labels longer than the arrow to wrap. So if you come across this problem, especially when using custom labels, you'll have to increase the distance setting in the options to allow longer arcs.

Moreover, you might need to modify the offset_x argument depending on the shape of your document. Otherwise, the left part of the document may overflow beyond the container's border.

</Infobox>
ArgumentDescription
compact"Compact mode" with square arrows that takes up less space. Defaults to False. bool
colorText color. Can be provided in any CSS legal format as a string e.g.: "#00ff00", "rgb(0, 255, 0)", "hsl(120, 100%, 50%)" and "green" all correspond to the color green (without transparency). Defaults to "#000000". str
bgBackground color. Can be provided in any CSS legal format as a string e.g.: "#00ff00", "rgb(0, 255, 0)", "hsl(120, 100%, 50%)" and "green" all correspond to the color green (without transparency). Defaults to "#ffffff". str
fontFont name or font family for all text. Defaults to "Arial". str
offset_xSpacing on left side of the SVG in px. You might need to tweak this setting for long texts. Defaults to 50. int

For a list of all available options, see the displacy API documentation.

Options example

python
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "Source Sans Pro"}
displacy.serve(doc, style="dep", options=options)

Visualizing long texts {id="dep-long-text",version="2.0.12"}

Long texts can become difficult to read when displayed in one row, so it's often better to visualize them sentence-by-sentence instead. As of v2.0.12, displacy supports rendering both Doc and Span objects, as well as lists of Docs or Spans. Instead of passing the full Doc to displacy.serve, you can also pass in a list doc.sents. This will create one visualization for each sentence.

python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
text = """In ancient Rome, some neighbors live in three adjacent houses. In the center is the house of Senex, who lives there with wife Domina, son Hero, and several slaves, including head slave Hysterium and the musical's main character Pseudolus. A slave belonging to Hero, Pseudolus wishes to buy, win, or steal his freedom. One of the neighboring houses is owned by Marcus Lycus, who is a buyer and seller of beautiful women; the other belongs to the ancient Erronius, who is abroad searching for his long-lost children (stolen in infancy by pirates). One day, Senex and Domina go on a trip and leave Pseudolus in charge of Hero. Hero confides in Pseudolus that he is in love with the lovely Philia, one of the courtesans in the House of Lycus (albeit still a virgin)."""
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.serve(sentence_spans, style="dep")

Visualizing the entity recognizer {id="ent"}

The entity visualizer, ent, highlights named entities and their labels in a text.

python
import spacy
from spacy import displacy

text = "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously."

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.serve(doc, style="ent")
<Standalone height={180}> <div style={{lineHeight: 2.5, fontFamily: "-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'", fontSize: 18}}>When <mark style={{ background: '#aa9cfc', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Sebastian Thrun <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>PERSON</span></mark> started working on self-driving cars at <mark style={{ background: '#7aecec', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Google <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>ORG</span></mark> in <mark style={{ background: '#bfe1d9', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>2007 <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>DATE</span></mark>, few people outside of the company took him seriously.</div> </Standalone>

The entity visualizer lets you customize the following options:

ArgumentDescription
entsEntity types to highlight (None for all types). Defaults to None. Optional[List[str]]
colorsColor overrides. Entity types should be mapped to color names or values. Defaults to {}. Dict[str, str]

If you specify a list of ents, only those entity types will be rendered – for example, you can choose to display PERSON entities. Internally, the visualizer knows nothing about available entity types and will render whichever spans and labels it receives. This makes it especially easy to work with custom entity types. By default, displaCy comes with colors for all entity types used by trained spaCy pipelines. If you're using custom entity types, you can use the colors setting to add your own colors for them.

Options example

python
colors = {"ORG": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"ents": ["ORG"], "colors": colors}
displacy.serve(doc, style="ent", options=options)
<Standalone height={225}> <div style={{lineHeight: 2.5, fontFamily: "-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'", fontSize: 18}}>But <mark style={{ background: 'linear-gradient(90deg, #aa9cfc, #fc9ce7)', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Google <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>ORG</span></mark> is starting from behind. The company made a late push into hardware, and <mark style={{ background: 'linear-gradient(90deg, #aa9cfc, #fc9ce7)', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Apple <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>ORG</span></mark>’s Siri, available on iPhones, and <mark style={{ background: 'linear-gradient(90deg, #aa9cfc, #fc9ce7)', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Amazon <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>ORG</span></mark>’s Alexa software, which runs on its Echo and Dot devices, have clear leads in consumer adoption.</div> </Standalone>

The above example uses a little trick: Since the background color values are added as the background style attribute, you can use any valid background value or shorthand – including gradients and even images!

Adding titles to documents {id="ent-titles"}

Rendering several large documents on one page can easily become confusing. To add a headline to each visualization, you can add a title to its user_data. User data is never touched or modified by spaCy.

python
doc = nlp("This is a sentence about Google.")
doc.user_data["title"] = "This is a title"
displacy.serve(doc, style="ent")

This feature is especially handy if you're using displaCy to compare performance at different stages of a process, e.g. during training. Here you could use the title for a brief description of the text example and the number of iterations.

Visualizing spans {id="span"}

The span visualizer, span, highlights overlapping spans in a text.

python
import spacy
from spacy import displacy
from spacy.tokens import Span

text = "Welcome to the Bank of China."

nlp = spacy.blank("en")
doc = nlp(text)

doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

displacy.serve(doc, style="span")
<Standalone height={100}> <div style={{ lineHeight: 2.5, direction: 'ltr', fontFamily: "-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'", fontSize: 18 }}>Welcome to the <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>Bank<span style={{ background: '#7aecec', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span><span style={{ background: '#7aecec', top: 40, height: 4, borderTopLeftRadius: 3, borderBottomLeftRadius: 3, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}><span style={{ background: '#7aecec', color: '#000', top: '-0.5em', padding: '2px 3px', position: 'absolute', fontSize: '0.6em', fontWeight: 'bold', lineHeight: 1, borderRadius: 3 }}>ORG</span></span></span> <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>of <span style={{ background: '#7aecec', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span></span> <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>China<span style={{ background: '#7aecec', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span><span style={{ background: '#feca74', top: 57, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span><span style={{ background: '#feca74', top: 57, height: 4, borderTopLeftRadius: 3, borderBottomLeftRadius: 3, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}><span style={{ background: '#feca74', color: '#000', top: '-0.5em', padding: '2px 3px', position: 'absolute', fontSize: '0.6em', fontWeight: 'bold', lineHeight: 1, borderRadius: 3 }}>GPE</span></span></span>.</div> </Standalone>

The span visualizer lets you customize the following options:

ArgumentDescription
spans_keyWhich spans key to render spans from. Default is "sc". str
templatesDictionary containing the keys "span", "slice", and "start". These dictate how the overall span, a span slice, and the starting token will be rendered. Optional[Dict[str, str]
kb_url_templateOptional template to construct the KB url for the entity to link to. Expects a python f-string format with single field to fill in Optional[str]
colorsColor overrides. Entity types should be mapped to color names or values. Dict[str, str]

Because spans can be stored across different keys in doc.spans, you need to specify which one displaCy should use with spans_key (sc is the default).

Options example

python
doc.spans["custom"] = [Span(doc, 3, 6, "BANK")]
options = {"spans_key": "custom"}
displacy.serve(doc, style="span", options=options)
<Standalone height={100}> <div style={{ lineHeight: 2.5, direction: 'ltr', fontFamily: "-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'", fontSize: 18 }}>Welcome to the <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>Bank<span style={{ background: '#ddd', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span><span style={{ background: '#ddd', top: 40, height: 4, borderTopLeftRadius: 3, borderBottomLeftRadius: 3, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}><span style={{ background: '#ddd', color: '#000', top: '-0.5em', padding: '2px 3px', position: 'absolute', fontSize: '0.6em', fontWeight: 'bold', lineHeight: 1, borderRadius: 3 }}>BANK</span></span></span> <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>of <span style={{ background: '#ddd', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span></span> <span style={{ fontWeight: 'bold', display: 'inline-block', position: 'relative'}}>China<span style={{ background: '#ddd', top: 40, height: 4, left: -1, width: 'calc(100% + 2px)', position: 'absolute' }}></span></span>.</div> </Standalone>

Using displaCy in Jupyter notebooks {id="jupyter"}

displaCy is able to detect whether you're working in a Jupyter notebook, and will return markup that can be rendered in a cell straight away. When you export your notebook, the visualizations will be included as HTML.

python
# Don't forget to install a trained pipeline, e.g.: python -m spacy download en

# In[1]:
import spacy
from spacy import displacy

# In[2]:
doc = nlp("Rats are various medium-sized, long-tailed rodents.")
displacy.render(doc, style="dep")

# In[3]:
doc2 = nlp(LONG_NEWS_ARTICLE)
displacy.render(doc2, style="ent")
<Infobox variant="warning" title="Important note">

To explicitly enable or disable "Jupyter mode", you can use the jupyter keyword argument – e.g. to return raw HTML in a notebook, or to force Jupyter rendering if auto-detection fails.

</Infobox>

Internally, displaCy imports display and HTML from IPython.display and returns a Jupyter HTML object. If you were doing it manually, it'd look like this:

python
from IPython.display import display, HTML

html = displacy.render(doc, style="dep")
display(HTML(html))

Rendering HTML {id="html"}

If you don't need the web server and just want to generate the markup – for example, to export it to a file or serve it in a custom way – you can use displacy.render. It works the same way, but returns a string containing the markup.

python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc1 = nlp("This is a sentence.")
doc2 = nlp("This is another sentence.")
html = displacy.render([doc1, doc2], style="dep", page=True)

page=True renders the markup wrapped as a full HTML page. For minified and more compact HTML markup, you can set minify=True. If you're rendering a dependency parse, you can also export it as an .svg file.

What's SVG?

Unlike other image formats, the SVG (Scalable Vector Graphics) uses XML markup that's easy to manipulate using CSS or JavaScript. Essentially, SVG lets you design with code, which makes it a perfect fit for visualizing dependency trees. SVGs can be embedded online in an `` tag, or inlined in an HTML document. They're also pretty easy to convert.

python
svg = displacy.render(doc, style="dep")
output_path = Path("/images/sentence.svg")
output_path.open("w", encoding="utf-8").write(svg)
<Infobox title="Important note" variant="warning">

Since each visualization is generated as a separate SVG, exporting .svg files only works if you're rendering one single doc at a time. (This makes sense – after all, each visualization should be a standalone graphic.) So instead of rendering all Docs at once, loop over them and export them separately.

</Infobox>

Example: Export SVG graphics of dependency parses {id="examples-export-svg"}

python
import spacy
from spacy import displacy
from pathlib import Path

nlp = spacy.load("en_core_web_sm")
sentences = ["This is an example.", "This is another one."]
for sent in sentences:
    doc = nlp(sent)
    svg = displacy.render(doc, style="dep", jupyter=False)
    file_name = '-'.join([w.text for w in doc if not w.is_punct]) + ".svg"
    output_path = Path("/images/" + file_name)
    output_path.open("w", encoding="utf-8").write(svg)

The above code will generate the dependency visualizations as two files, This-is-an-example.svg and This-is-another-one.svg.

Rendering data manually {id="manual-usage"}

You can also use displaCy to manually render data. This can be useful if you want to visualize output from other libraries, like NLTK or SyntaxNet. If you set manual=True on either render() or serve(), you can pass in data in displaCy's format as a dictionary (instead of Doc objects). There are helper functions for converting Doc objects to displaCy's format for use with manual=True: displacy.parse_deps, displacy.parse_ents, and displacy.parse_spans.

Example with parse function

python
doc = nlp("But Google is starting from behind.")
ex = displacy.parse_ents(doc)
html = displacy.render(ex, style="ent", manual=True)

Example with raw data

python
ex = [{"text": "But Google is starting from behind.",
       "ents": [{"start": 4, "end": 10, "label": "ORG"}],
       "title": None}]
html = displacy.render(ex, style="ent", manual=True)
python
{
    "words": [
        {"text": "This", "tag": "DT"},
        {"text": "is", "tag": "VBZ"},
        {"text": "a", "tag": "DT"},
        {"text": "sentence", "tag": "NN"}
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 2, "end": 3, "label": "det", "dir": "left"},
        {"start": 1, "end": 3, "label": "attr", "dir": "right"}
    ]
}
python
{
    "text": "But Google is starting from behind.",
    "ents": [{"start": 4, "end": 10, "label": "ORG"}],
    "title": None
}
python
{
    "text": "But Google is starting from behind.",
    "ents": [{"start": 4, "end": 10, "label": "ORG", "kb_id": "Q95", "kb_url": "https://www.wikidata.org/entity/Q95"}],
    "title": None
}
python
{
    "text": "Welcome to the Bank of China.",
    "spans": [
        {"start_token": 3, "end_token": 6, "label": "ORG"},
        {"start_token": 5, "end_token": 6, "label": "GPE"},
    ],
    "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."],
}

Using displaCy in a web application {id="webapp"}

If you want to use the visualizers as part of a web application, for example to create something like our online demo, it's not recommended to only wrap and serve the displaCy renderer. Instead, you should only rely on the server to perform spaCy's processing capabilities, and use a client-side implementation like displaCy.js to render the JSON-formatted output.

Why not return the HTML by the server?

It's certainly possible to just have your server return the markup. But outputting raw, unsanitized HTML is risky and makes your app vulnerable to cross-site scripting (XSS). All your user needs to do is find a way to make spaCy return text like <script src="malicious-code.js"></script>, which is pretty easy in NER mode. Instead of relying on the server to render and sanitize HTML, you can do this on the client in JavaScript. displaCy.js creates the markup as DOM nodes and will never insert raw HTML.

<Grid cols={2}>

Alternatively, if you're using Streamlit, check out the spacy-streamlit package that helps you integrate spaCy visualizations into your apps. It includes a full embedded visualizer, as well as individual components.

</Grid>