sanitize is a Crystal library for transforming HTML/XML trees. It's primarily
used to sanitize HTML from untrusted sources in order to prevent XSS attacks
and other adversities.
It builds on the XML module of Crystal's standard library to parse HTML/XML.
That module is backed by libxml2, a solid parser that turns malformed and
malicious input into valid and safe markup.
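To illustrate the parsing step on its own, here is a minimal sketch that uses only the standard library's `XML.parse_html`, without sanitize itself. The input string is a made-up example, and the exact serialized output depends on the libxml2 version, so none is shown.

```crystal
require "xml"

# Deliberately malformed input: neither <p> nor <b> is closed.
malformed = "<p>unclosed <b>bold"

# XML.parse_html uses libxml2's HTML parser, which recovers from such
# errors and returns a complete document tree.
document = XML.parse_html(malformed)

# Serializing the tree prints the repaired markup.
puts document.to_xml
```

Note that this only repairs the structure of the markup. Removing unsafe elements and attributes is the job of the sanitization policy.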
Add the dependency to your shard.yml:
dependencies:
  sanitize:
    github: straight-shoota/sanitize
Run shards install
The Sanitize::Policy::HTMLSanitizer policy applies the following sanitization steps. Except
for the first one (which is essential to the entire process), all can be disabled
or configured.
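- Turns malformed and malicious HTML into valid and safe markup.
- Strips elements and attributes that are not on the safe list.
- Sanitizes URL attributes (like href or src) with a customizable sanitization
  policy.
- Adds rel="nofollow" to all links and rel="noopener" to links with target.
- Validates the values of the accepted attributes align, width and height.
- Filters class attributes based on a whitelist (by default all classes are
  rejected).

Transformation is based on rules defined by Sanitize::Policy implementations.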
The recommended standard policy for HTML sanitization is Sanitize::Policy::HTMLSanitizer.common
which represents good defaults for most use cases.
It sanitizes user input against a known safe list of accepted elements and their
attributes.
require "sanitize"
sanitizer = Sanitize::Policy::HTMLSanitizer.common
sanitizer.process(%(<a href="javascript:alert('foo')">foo</a>)) # => %(foo)
sanitizer.process(%(<p><a href="foo">foo</a></p>)) # => %(<p><a href="foo" rel="nofollow">foo</a></p>)
sanitizer.process(%()) # => %()
sanitizer.process(%(<table><tr><td>foo</td><td>bar</td></tr></table>)) # => %(<table><tr><td>foo</td><td>bar</td></tr></table>)
Sanitization should always run after any other processing (for example rendering Markdown) and is a must when including HTML from untrusted sources into a web page.
A typical format for user generated content is Markdown. Even though it has
only a very limited feature set compared to HTML, it can still produce
potentially harmful HTML, and it is usually possible to embed raw HTML directly.
So sanitization is necessary.
The most common Markdown renderer is markd,
so here is an example of how to use it with sanitize:
require "sanitize"
require "markd"

sanitizer = Sanitize::Policy::HTMLSanitizer.common
# Allow classes with `language-` prefix which are used for syntax highlighting.
sanitizer.valid_classes << /language-.+/
markdown = <<-MD
Sanitization with [https://shardbox.org/shards/sanitize](sanitize) is not that
**difficult**.
```cr
puts "Hello World!"
```
<p><a href="javascript:alert("XSS attack!")">Hello world!</a></p>
MD
html = Markd.to_html(markdown)
sanitized = sanitizer.process(html)
puts sanitized
The result:
<p>Sanitization with <a href="sanitize" rel="nofollow">https://shardbox.org/shards/sanitize</a> is not that
<strong>difficult</strong>.</p>
<pre><code class="language-cr">puts "Hello World!"
</code></pre>
<p>Hello world!</p>
Sanitizing CSS is not supported. Thus style attributes can't be accepted in a
safe way.
CSS sanitization features may be added when a CSS parsing library is available.
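As a minimal sketch of what this limitation means in practice, the following passes markup with an inline style through the common policy. It assumes that style is not among the attributes accepted by `Sanitize::Policy::HTMLSanitizer.common`, so the attribute should be dropped. The input string is a made-up example, and the exact output is not shown because it has not been verified here.

```crystal
require "sanitize"

sanitizer = Sanitize::Policy::HTMLSanitizer.common

# The inline style cannot be checked for safety, so it is expected
# (assumption) to be removed rather than passed through.
input = %(<p style="background: url(javascript:alert('XSS'))">styled</p>)
result = sanitizer.process(input)

puts result
```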
If you want to privately disclose security issues, please contact
straightshoota on Keybase or
[email protected] (PGP: DF2D C9E9 FFB9 6AE0 2070 D5BC F0F3 4963 7AC5 087A).
1. Create your feature branch (git checkout -b my-new-feature)
2. Commit your changes (git commit -am 'Add some feature')
3. Push to the branch (git push origin my-new-feature)