doc/syntax.html
| This page lists the regular expression syntax accepted by RE2. |
| It also lists syntax accepted by PCRE, PERL, and VIM. |
| Grayed out expressions are not supported by RE2. |
| |
| Single characters: |
| . | any character, possibly including newline (s=true) |
| [xyz] | character class |
| [^xyz] | negated character class |
| \d | Perl character class |
| \D | negated Perl character class |
| [[:alpha:]] | ASCII character class |
| [[:^alpha:]] | negated ASCII character class |
| \pN | Unicode character class (one-letter name) |
| \p{Greek} | Unicode character class |
| \PN | negated Unicode character class (one-letter name) |
| \P{Greek} | negated Unicode character class |
| |
| Composites: |
| xy | x followed by y |
| x|y | x or y (prefer x) |
| |
| Repetitions: |
| x* | zero or more x, prefer more |
| x+ | one or more x, prefer more |
| x? | zero or one x, prefer one |
| x{n,m} | n or n+1 or ... or m x, prefer more |
| x{n,} | n or more x, prefer more |
| x{n} | exactly n x |
| x*? | zero or more x, prefer fewer |
| x+? | one or more x, prefer fewer |
| x?? | zero or one x, prefer zero |
| x{n,m}? | n or n+1 or ... or m x, prefer fewer |
| x{n,}? | n or more x, prefer fewer |
| x{n}? | exactly n x |
| x{} | (≡ x*) VIM |
| x{-} | (≡ x*?) VIM |
| x{-n} | (≡ x{n}?) VIM |
| x= | (≡ x?) VIM |
| |
| Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} |
| reject forms that create a minimum or maximum repetition count above 1000. |
| Unlimited repetitions are not subject to this restriction. |
| |
| Possessive repetitions: |
| x*+ | zero or more x, possessive |
| x++ | one or more x, possessive |
| x?+ | zero or one x, possessive |
| x{n,m}+ | n or ... or m x, possessive |
| x{n,}+ | n or more x, possessive |
| x{n}+ | exactly n x, possessive |
| |
| Grouping: |
| (re) | numbered capturing group (submatch) |
| (?P<name>re) | named & numbered capturing group (submatch) |
| (?<name>re) | named & numbered capturing group (submatch) |
| (?'name're) | named & numbered capturing group (submatch) |
| (?:re) | non-capturing group |
| (?flags) | set flags within current group; non-capturing |
| (?flags:re) | set flags during re; non-capturing |
| (?#text) | comment |
| (?|x|y|z) | branch numbering reset |
| (?>re) | possessive match of re |
| re@> | possessive match of reVIM |
| %(re) | non-capturing group VIM |
| |
| Flags: |
| i | case-insensitive (default false) |
| m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
| s | let . match \n (default false) |
| U | ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) |
| Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). |
| |
| Empty strings: |
| ^ | at beginning of text or line (m=true) |
| $ | at end of text (like \z not \Z) or line (m=true) |
| \A | at beginning of text |
| \b | at ASCII word boundary (\w on one side and \W, \A, or \z on the other) |
| \B | not at ASCII word boundary |
| \G | at beginning of subtext being searched PCRE |
| \G | at end of last match PERL |
| \Z | at end of text, or before newline at end of text |
| \z | at end of text |
| (?=re) | before text matching re |
| (?!re) | before text not matching re |
| (?<=re) | after text matching re |
| (?<!re) | after text not matching re |
| re& | before text matching reVIM |
| re@= | before text matching reVIM |
| re@! | before text not matching reVIM |
| re@<= | after text matching reVIM |
| re@<! | after text not matching reVIM |
| \zs | sets start of match (= \K) VIM |
| \ze | sets end of match VIM |
| \%^ | beginning of file VIM |
| \%$ | end of file VIM |
| \%V | on screen VIM |
| \%# | cursor position VIM |
| \%'m | mark m position VIM |
| \%23l | in line 23 VIM |
| \%23c | in column 23 VIM |
| \%23v | in virtual column 23 VIM |
| |
| Escape sequences: |
| \a | bell (≡ \007) |
| \f | form feed (≡ \014) |
| \t | horizontal tab (≡ \011) |
| \n | newline (≡ \012) |
| \r | carriage return (≡ \015) |
| \v | vertical tab character (≡ \013) |
| \* | literal *, for any punctuation character * |
| \123 | octal character code (up to three digits) |
| \x7F | hex character code (exactly two digits) |
| \x{10FFFF} | hex character code |
| \C | match a single byte even in UTF-8 mode |
| \Q...\E | literal text ... even if ... has punctuation |
| |
| \1 | backreference |
| \b | backspace (use \010) |
| \cK | control char ^K (use \001 etc) |
| \e | escape (use \033) |
| \g1 | backreference |
| \g{1} | backreference |
| \g{+1} | backreference |
| \g{-1} | backreference |
| \g{name} | named backreference |
| \g<name> | subroutine call |
| \g'name' | subroutine call |
| \k<name> | named backreference |
| \k'name' | named backreference |
| \lX | lowercase X |
| \ux | uppercase x |
| \L...\E | lowercase text ... |
| \K | reset beginning of $0 |
| \N{name} | named Unicode character |
| \R | line break |
| \U...\E | upper case text ... |
| \X | extended Unicode sequence |
| |
| \%d123 | decimal character 123 VIM |
| \%xFF | hex character FF VIM |
| \%o123 | octal character 123 VIM |
| \%u1234 | Unicode character 0x1234 VIM |
| \%U12345678 | Unicode character 0x12345678 VIM |
| |
| Character class elements: |
| x | single character |
| A-Z | character range (inclusive) |
| \d | Perl character class |
| [:foo:] | ASCII character class foo |
| \p{Foo} | Unicode character class Foo |
| \pF | Unicode character class F (one-letter name) |
| |
| Named character classes as character class elements: |
| [\d] | digits (≡ \d) |
| [^\d] | not digits (≡ \D) |
| [\D] | not digits (≡ \D) |
| [^\D] | not not digits (≡ \d) |
| [[:name:]] | named ASCII class inside character class (≡ [:name:]) |
| [^[:name:]] | named ASCII class inside negated character class (≡ [:^name:]) |
| [\p{Name}] | named Unicode property inside character class (≡ \p{Name}) |
| [^\p{Name}] | named Unicode property inside negated character class (≡ \P{Name}) |
| |
| Perl character classes (all ASCII-only): |
| \d | digits (≡ [0-9]) |
| \D | not digits (≡ [^0-9]) |
| \s | whitespace (≡ [\t\n\f\r]) |
| \S | not whitespace (≡ [^\t\n\f\r]) |
| \w | word characters (≡ [0-9A-Za-z_]) |
| \W | not word characters (≡ [^0-9A-Za-z_]) |
| |
| \h | horizontal space |
| \H | not horizontal space |
| \v | vertical space |
| \V | not vertical space |
| |
| ASCII character classes: |
| [[:alnum:]] | alphanumeric (≡ [0-9A-Za-z]) |
| [[:alpha:]] | alphabetic (≡ [A-Za-z]) |
| [[:ascii:]] | ASCII (≡ [\x00-\x7F]) |
| [[:blank:]] | blank (≡ [\t]) |
| [[:cntrl:]] | control (≡ [\x00-\x1F\x7F]) |
| [[:digit:]] | digits (≡ [0-9]) |
| [[:graph:]] | graphical (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_{|}]] == [[:graph:]]) | | [[:lower:]]| lower case (≡[a-z]) | | [[:print:]]| printable (≡[-) | | [[:punct:]]| punctuation (≡[!-/:-@[-{-~]) |
| [[:space:]] | whitespace (≡ [\t\n\v\f\r]) |
| [[:upper:]] | upper case (≡ [A-Z]) |
| [[:word:]] | word characters (≡ [0-9A-Za-z_]) |
| [[:xdigit:]] | hex digit (≡ [0-9A-Fa-f]) |
| |
| Unicode character class names--general category: |
| C | other |
| Cc | control |
| Cf | format |
| Cn | unassigned code points |
| Co | private use |
| Cs | surrogate |
| L | letter |
| LC | cased letter |
| L& | cased letter |
| Ll | lowercase letter |
| Lm | modifier letter |
| Lo | other letter |
| Lt | titlecase letter |
| Lu | uppercase letter |
| M | mark |
| Mc | spacing mark |
| Me | enclosing mark |
| Mn | non-spacing mark |
| N | number |
| Nd | decimal number |
| Nl | letter number |
| No | other number |
| P | punctuation |
| Pc | connector punctuation |
| Pd | dash punctuation |
| Pe | close punctuation |
| Pf | final punctuation |
| Pi | initial punctuation |
| Po | other punctuation |
| Ps | open punctuation |
| S | symbol |
| Sc | currency symbol |
| Sk | modifier symbol |
| Sm | math symbol |
| So | other symbol |
| Z | separator |
| Zl | line separator |
| Zp | paragraph separator |
| Zs | space separator |
| |
| Unicode character class names--scripts: |
| Adlam |
| Ahom |
| Anatolian_Hieroglyphs |
| Arabic |
| Armenian |
| Avestan |
| Balinese |
| Bamum |
| Bassa_Vah |
| Batak |
| Bengali |
| Bhaiksuki |
| Bopomofo |
| Brahmi |
| Braille |
| Buginese |
| Buhid |
| Canadian_Aboriginal |
| Carian |
| Caucasian_Albanian |
| Chakma |
| Cham |
| Cherokee |
| Chorasmian |
| Common |
| Coptic |
| Cuneiform |
| Cypriot |
| Cypro_Minoan |
| Cyrillic |
| Deseret |
| Devanagari |
| Dives_Akuru |
| Dogra |
| Duployan |
| Egyptian_Hieroglyphs |
| Elbasan |
| Elymaic |
| Ethiopic |
| Georgian |
| Glagolitic |
| Gothic |
| Grantha |
| Greek |
| Gujarati |
| Gunjala_Gondi |
| Gurmukhi |
| Han |
| Hangul |
| Hanifi_Rohingya |
| Hanunoo |
| Hatran |
| Hebrew |
| Hiragana |
| Imperial_Aramaic |
| Inherited |
| Inscriptional_Pahlavi |
| Inscriptional_Parthian |
| Javanese |
| Kaithi |
| Kannada |
| Katakana |
| Kawi |
| Kayah_Li |
| Kharoshthi |
| Khitan_Small_Script |
| Khmer |
| Khojki |
| Khudawadi |
| Lao |
| Latin |
| Lepcha |
| Limbu |
| Linear_A |
| Linear_B |
| Lisu |
| Lycian |
| Lydian |
| Mahajani |
| Makasar |
| Malayalam |
| Mandaic |
| Manichaean |
| Marchen |
| Masaram_Gondi |
| Medefaidrin |
| Meetei_Mayek |
| Mende_Kikakui |
| Meroitic_Cursive |
| Meroitic_Hieroglyphs |
| Miao |
| Modi |
| Mongolian |
| Mro |
| Multani |
| Myanmar |
| Nabataean |
| Nag_Mundari |
| Nandinagari |
| New_Tai_Lue |
| Newa |
| Nko |
| Nushu |
| Nyiakeng_Puachue_Hmong |
| Ogham |
| Ol_Chiki |
| Old_Hungarian |
| Old_Italic |
| Old_North_Arabian |
| Old_Permic |
| Old_Persian |
| Old_Sogdian |
| Old_South_Arabian |
| Old_Turkic |
| Old_Uyghur |
| Oriya |
| Osage |
| Osmanya |
| Pahawh_Hmong |
| Palmyrene |
| Pau_Cin_Hau |
| Phags_Pa |
| Phoenician |
| Psalter_Pahlavi |
| Rejang |
| Runic |
| Samaritan |
| Saurashtra |
| Sharada |
| Shavian |
| Siddham |
| SignWriting |
| Sinhala |
| Sogdian |
| Sora_Sompeng |
| Soyombo |
| Sundanese |
| Syloti_Nagri |
| Syriac |
| Tagalog |
| Tagbanwa |
| Tai_Le |
| Tai_Tham |
| Tai_Viet |
| Takri |
| Tamil |
| Tangsa |
| Tangut |
| Telugu |
| Thaana |
| Thai |
| Tibetan |
| Tifinagh |
| Tirhuta |
| Toto |
| Ugaritic |
| Vai |
| Vithkuqi |
| Wancho |
| Warang_Citi |
| Yezidi |
| Yi |
| Zanabazar_Square |
| |
| Vim character classes: |
| \i | identifier character VIM |
| \I | \i except digits VIM |
| \k | keyword character VIM |
| \K | \k except digits VIM |
| \f | file name character VIM |
| \F | \f except digits VIM |
| \p | printable character VIM |
| \P | \p except digits VIM |
| \s | whitespace character (≡ [\t]) VIM |
| \S | non-white space character (≡ [^ \t]) VIM |
| \d | digits (≡ [0-9]) VIM |
| \D | not \d VIM |
| \x | hex digits (≡ [0-9A-Fa-f]) VIM |
| \X | not \xVIM |
| \o | octal digits (≡ [0-7]) VIM |
| \O | not \oVIM |
| \w | word character VIM |
| \W | not \w VIM |
| \h | head of word character VIM |
| \H | not \hVIM |
| \a | alphabetic VIM |
| \A | not \aVIM |
| \l | lowercase VIM |
| \L | not lowercase VIM |
| \u | uppercase VIM |
| \U | not uppercase VIM |
| \_x | \x plus newline, for any xVIM |
| |
| Vim flags: |
| \c | ignore case VIM |
| \C | match case VIM |
| \m | magic VIM |
| \M | nomagic VIM |
| \v | verymagic VIM |
| \V | verynomagic VIM |
| \Z | ignore differences in Unicode combining characters VIM |
| |
| Magic: |
| (?{code}) | arbitrary Perl code PERL |
| (??{code}) | postponed arbitrary Perl code PERL |
| (?n) | recursive call to regexp capturing group n |
| (?+n) | recursive call to relative group +n |
| (?-n) | recursive call to relative group -n |
| (?C) | PCRE callout PCRE |
| (?R) | recursive call to entire regexp (≡ (?0)) |
| (?&name) | recursive call to named group |
| (?P=name) | named backreference |
| (?P>name) | recursive call to named group |
| (?(cond)true|false) | conditional branch |
| (?(cond)true) | conditional branch |
| (*ACCEPT) | make regexps more like Prolog |
| (*COMMIT) | |
| (*F) | |
| (*FAIL) | |
| (*MARK) | |
| (*PRUNE) | |
| (*SKIP) | |
| (*THEN) | |
| (*ANY) | set newline convention |
| (*ANYCRLF) | |
| (*CR) | |
| (*CRLF) | |
| (*LF) | |
| (*BSR_ANYCRLF) | set \R convention PCRE |
| (*BSR_UNICODE) | PCRE |
| |