docs/regular-expressions.md
Regular expressions are a powerful tool. However, they are also very expensive in terms of memory. Ensuring correct and useful functionality is the priority but we have a few tips to minimize impact without affecting capabilities.
Consider non-regular expressions options. strings.Contains(), strings.Replace(), and strings.ReplaceAll() are dramatically faster and less memory intensive than regular expressions. If one of these will work equally well, use the non-regular expression option.
Order character classes consistently. We use regular expression caching to reduce our memory footprint. This is more effective if character classes are consistently ordered. Since a character class is a set, order does not affect functionality. We have many equivalent regular expressions that only differ by character class order. Below is the order we recommend for consistency:
Numeric range, i.e., digits (e.g., 0-9)
Uppercase alphabetic range (e.g., A-Z, A-F)
Lowercase alphabetic range (e.g., a-z, a-f)
Underscore (_)
Everything else (except dash, -) in ASCII order: \t\n\r !"#$%&()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^abcdefghijklmnopqrstuvwxyz{|}~
Last, dash (-)
For example, consider the following expressions which are equivalent but vary character class ordering:
`[_a-zA-Z0-9-,.]` // wrong ordering
`[0-9A-Za-z_,.-]` // correct
`[;a-z0-9]` // wrong ordering
`[0-9a-z;]` // correct
Inside character classes, avoid unnecessary character escaping. Go does not complain about extra character escaping but avoid it to improve cache performance. Inside a character class, most characters do not need to be escaped, as Go assumes you mean the literal character.
These characters which normally have special meaning in regular expressions, inside character classes do not need to be escaped: $, (, ), *, +, ., ?, ^, {, |, }.
Dash (-), when it is last in the character class or otherwise unambiguously not part of a range, does not need to be escaped. If in doubt, place the dash last in the character class (e.g., [a-c-]) or escape the dash (e.g., \-).
Angle brackets ([, ]) always need to be escaped in a character class.
For example, consider the following expressions which are equivalent but include unnecessary character escapes:
`[\$\(\.\?\|]` // unnecessary escapes
`[$(.?|]` // correct
`[a-z\-0-9_A-Z\.]` // unnecessary escapes, wrong order
`[0-9A-Za-z_.-]` // correct