doc/choosing_a_combinator.md
Note: this list is meant to provide a nicer way to find a nom parser than reading through the documentation on docs.rs. Function combinators are organized in modules, so they are a bit easier to find.
Links present in this document will nearly always point to the complete version of a parser. Most parsers also have a streaming version.
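
The difference between the two versions shows up on truncated input. Below is a minimal sketch that uses only the Rust standard library. It is not nom's implementation. The Outcome enum and the tag_streaming and tag_complete functions are hypothetical stand-ins, and Outcome::Incomplete plays the role of nom's Incomplete result.

```rust
/// Hypothetical result type for this sketch (not nom's IResult).
#[derive(Debug, PartialEq)]
pub enum Outcome<'a> {
    /// (remaining input, matched text)
    Done(&'a str, &'a str),
    /// The input ended early; this many more bytes are needed.
    Incomplete(usize),
    /// The input can never match.
    Error,
}

/// Streaming-style tag: a truncated input that is still a prefix of
/// `expected` might match once more data arrives, so report Incomplete.
pub fn tag_streaming<'a>(expected: &str, input: &'a str) -> Outcome<'a> {
    if let Some(rest) = input.strip_prefix(expected) {
        Outcome::Done(rest, &input[..expected.len()])
    } else if expected.starts_with(input) {
        Outcome::Incomplete(expected.len() - input.len())
    } else {
        Outcome::Error
    }
}

/// Complete-style tag: the input is assumed to be all the data there is,
/// so running out of input is simply an error.
pub fn tag_complete<'a>(expected: &str, input: &'a str) -> Outcome<'a> {
    match tag_streaming(expected, input) {
        Outcome::Incomplete(_) => Outcome::Error,
        other => other,
    }
}
```

With input "hel", tag_streaming("hello", ...) asks for 2 more bytes, and tag_complete rejects it. nom's complete combinator performs the same Incomplete-to-Error conversion on a child parser.
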

## Basic elements

These are used to recognize the lowest level elements of your grammar, like "here is a dot" or "here is a big endian integer".

| combinator | usage | input | output | comment |
|---|---|---|---|---|
| char | char('a') | "abc" | Ok(("bc", 'a')) | Matches one character (works with non ASCII chars too) |
| is_a | is_a("ab") | "abbac" | Ok(("c", "abba")) | Matches a sequence of any of the characters passed as arguments |
| is_not | is_not("cd") | "ababc" | Ok(("c", "abab")) | Matches a sequence of none of the characters passed as arguments |
| one_of | one_of("abc") | "abc" | Ok(("bc", 'a')) | Matches one of the provided characters (works with non ASCII characters too) |
| none_of | none_of("abc") | "xyab" | Ok(("yab", 'x')) | Matches anything but the provided characters |
| tag | tag("hello") | "hello world" | Ok((" world", "hello")) | Recognizes a specific sequence of characters or bytes |
| tag_no_case | tag_no_case("hello") | "HeLLo World" | Ok((" World", "HeLLo")) | Case insensitive comparison. Note that case insensitive comparison is not well defined for Unicode, so it may give surprising results |
| take | take(4) | "hello" | Ok(("o", "hell")) | Takes a specific number of bytes or characters |
| take_while | take_while(is_alphabetic) | "abc123" | Ok(("123", "abc")) | Returns the longest list of bytes or characters for which the provided function returns true. take_while1 does the same, but must return at least one character, while take_while_m_n must return between m and n characters |
| take_till | take_till(is_alphabetic) | "123abc" | Ok(("abc", "123")) | Returns the longest list of bytes or characters until the provided function returns true. take_till1 does the same, but must return at least one character. This is the reverse behaviour from take_while: take_till(f) is equivalent to take_while(\|c\| !f(c)) |
| take_until | take_until("world") | "Hello world" | Ok(("world", "Hello ")) | Returns the longest list of bytes or characters until the provided tag is found. take_until1 does the same, but must return at least one character |
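
The table uses nom's (remaining input, output) convention. Below is a minimal plain-Rust sketch of three basic parsers that follows the same convention. It is not nom's implementation. PResult, tag, one_char and take_while are hypothetical stand-ins that only work on &str, and they report failure as Err(input) instead of using nom's error types.

```rust
/// Sketch result type: Ok((remaining_input, output)) or Err(input_at_failure).
pub type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Like nom's tag: matches an exact prefix and returns it.
pub fn tag<'a>(expected: &'a str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| -> PResult<'a, &'a str> {
        match input.strip_prefix(expected) {
            Some(rest) => Ok((rest, &input[..expected.len()])),
            None => Err(input),
        }
    }
}

/// Like nom's char: matches one specific character (multi-byte UTF-8 included).
pub fn one_char<'a>(expected: char) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| -> PResult<'a, char> {
        match input.chars().next() {
            Some(c) if c == expected => Ok((&input[c.len_utf8()..], c)),
            _ => Err(input),
        }
    }
}

/// Like nom's take_while: the longest prefix whose characters satisfy
/// `pred`. The prefix may be empty, so this never fails.
pub fn take_while<'a, F>(pred: F) -> impl Fn(&'a str) -> PResult<'a, &'a str>
where
    F: Fn(char) -> bool,
{
    move |input: &'a str| -> PResult<'a, &'a str> {
        // Byte offset of the first char that fails the predicate, or the end.
        let end = input
            .char_indices()
            .find(|&(_, c)| !pred(c))
            .map_or(input.len(), |(i, _)| i);
        Ok((&input[end..], &input[..end]))
    }
}
```

As in the take_till row, negating the predicate turns take_while into take_till. For example, take_while(|c: char| !c.is_ascii_alphabetic()) applied to "123abc" yields Ok(("abc", "123")).
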

## Choice combinators

| combinator | usage | input | output | comment |
|---|---|---|---|---|
| alt | alt((tag("ab"), tag("cd"))) | "cdef" | Ok(("ef", "cd")) | Tries a list of parsers and returns the result of the first successful one |
| permutation | permutation((tag("ab"), tag("cd"), tag("12"))) | "cd12abc" | Ok(("c", ("ab", "cd", "12"))) | Succeeds when all its child parsers have succeeded, whatever the order |
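
The backtracking behind alt can be sketched in plain Rust. Each alternative is tried on the same original input, and the first success wins. Below is a hypothetical two-branch version, alt2, with a minimal tag stand-in. It is not nom's alt, which takes a tuple of parsers and stops trying further branches once a branch has committed through cut.

```rust
/// Sketch result type: Ok((remaining_input, output)) or Err(input_at_failure).
pub type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Minimal stand-in for nom's tag.
pub fn tag<'a>(expected: &'a str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| -> PResult<'a, &'a str> {
        match input.strip_prefix(expected) {
            Some(rest) => Ok((rest, &input[..expected.len()])),
            None => Err(input),
        }
    }
}

/// Two-branch alternative: run `first`; if it fails, run `second` on the
/// *original* input, so a failed branch consumes nothing.
pub fn alt2<'a, O, P1, P2>(first: P1, second: P2) -> impl Fn(&'a str) -> PResult<'a, O>
where
    P1: Fn(&'a str) -> PResult<'a, O>,
    P2: Fn(&'a str) -> PResult<'a, O>,
{
    move |input: &'a str| -> PResult<'a, O> { first(input).or_else(|_| second(input)) }
}
```
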

## Sequence combinators

| combinator | usage | input | output | comment |
|---|---|---|---|---|
| delimited | delimited(char('('), take(2), char(')')) | "(ab)cd" | Ok(("cd", "ab")) | Matches an object from the first parser and discards it, then gets an object from the second parser, and finally matches an object from the third parser and discards it. |
| preceded | preceded(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", "XY")) | Matches an object from the first parser and discards it, then gets an object from the second parser. |
| terminated | terminated(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", "ab")) | Gets an object from the first parser, then matches an object from the second parser and discards it. |
| pair | pair(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", ("ab", "XY"))) | Gets an object from the first parser, then gets another object from the second parser. |
| separated_pair | separated_pair(tag("hello"), char(','), tag("world")) | "hello,world!" | Ok(("!", ("hello", "world"))) | Gets an object from the first parser, then matches an object from the sep_parser and discards it, then gets another object from the second parser. |
| tuple | tuple((tag("ab"), tag("XY"), take(1))) | "abXYZ!" | Ok(("!", ("ab", "XY", "Z"))) | Chains parsers and assembles the sub-results in a tuple. You can use as many child parsers as you can put elements in a tuple |
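
Sequence combinators pass the remaining input from one parser to the next. The plain-Rust sketch below shows pair and delimited, and uses the ? operator to stop at the first failure. It is a hypothetical illustration that relies only on the standard library. PResult, one_char and take are minimal stand-ins, not nom's code.

```rust
/// Sketch result type: Ok((remaining_input, output)) or Err(input_at_failure).
pub type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Stand-in for nom's char: matches one specific character.
pub fn one_char<'a>(expected: char) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| -> PResult<'a, char> {
        match input.chars().next() {
            Some(c) if c == expected => Ok((&input[c.len_utf8()..], c)),
            _ => Err(input),
        }
    }
}

/// Stand-in for nom's take: takes exactly `n` characters.
pub fn take<'a>(n: usize) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| -> PResult<'a, &'a str> {
        // Byte offset of the n-th character boundary (end of input counts).
        let end = input
            .char_indices()
            .map(|(i, _)| i)
            .chain(std::iter::once(input.len()))
            .nth(n);
        match end {
            Some(i) => Ok((&input[i..], &input[..i])),
            None => Err(input),
        }
    }
}

/// Runs `first`, then `second` on what is left, and returns both outputs.
pub fn pair<'a, O1, O2, P1, P2>(first: P1, second: P2) -> impl Fn(&'a str) -> PResult<'a, (O1, O2)>
where
    P1: Fn(&'a str) -> PResult<'a, O1>,
    P2: Fn(&'a str) -> PResult<'a, O2>,
{
    move |input: &'a str| -> PResult<'a, (O1, O2)> {
        let (rest, o1) = first(input)?;
        let (rest, o2) = second(rest)?;
        Ok((rest, (o1, o2)))
    }
}

/// Runs three parsers in order and keeps only the middle output.
pub fn delimited<'a, O1, O2, O3, P1, P2, P3>(
    open: P1,
    inner: P2,
    close: P3,
) -> impl Fn(&'a str) -> PResult<'a, O2>
where
    P1: Fn(&'a str) -> PResult<'a, O1>,
    P2: Fn(&'a str) -> PResult<'a, O2>,
    P3: Fn(&'a str) -> PResult<'a, O3>,
{
    move |input: &'a str| -> PResult<'a, O2> {
        let (rest, _) = open(input)?; // matched, then discarded
        let (rest, value) = inner(rest)?;
        let (rest, _) = close(rest)?; // matched, then discarded
        Ok((rest, value))
    }
}
```

preceded and terminated follow the same pattern. They keep the second output or the first output, respectively.
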

## Applying a parser multiple times

| combinator | usage | input | output | comment |
|---|---|---|---|---|
| count | count(take(2), 3) | "abcdefgh" | Ok(("gh", vec!["ab", "cd", "ef"])) | Applies the child parser a specified number of times |
| many0 | many0(tag("ab")) | "abababc" | Ok(("c", vec!["ab", "ab", "ab"])) | Applies the parser 0 or more times and returns the list of results in a Vec. many1 does the same operation but must return at least one element |
| many0_count | many0_count(tag("ab")) | "abababc" | Ok(("c", 3)) | Applies the parser 0 or more times and returns how often it was applicable. many1_count does the same operation but the parser must apply at least once |
| many_m_n | many_m_n(1, 3, tag("ab")) | "ababc" | Ok(("c", vec!["ab", "ab"])) | Applies the parser between m and n times (n included) and returns the list of results in a Vec |
| many_till | many_till(tag("ab"), tag("ef")) | "ababefg" | Ok(("g", (vec!["ab", "ab"], "ef"))) | Applies the first parser until the second applies. Returns a tuple containing the list of results from the first in a Vec and the result of the second |
| separated_list0 | separated_list0(tag(","), tag("ab")) | "ab,ab,ab." | Ok((".", vec!["ab", "ab", "ab"])) | Applies the second parser repeatedly, using the first parser as the separator between elements, and returns the elements in a Vec. separated_list1 works like separated_list0 but must return at least one element |
| fold_many0 | fold_many0(be_u8, \|\| 0, \|acc, item\| acc + item) | [1, 2, 3] | Ok(([], 6)) | Applies the parser 0 or more times and folds the list of return values. The fold_many1 version must apply the child parser at least once |
| fold_many_m_n | fold_many_m_n(1, 2, be_u8, \|\| 0, \|acc, item\| acc + item) | [1, 2, 3] | Ok(([3], 3)) | Applies the parser between m and n times (n included) and folds the list of return values |
| length_count | length_count(number, tag("ab")) | "2ababab" | Ok(("ab", vec!["ab", "ab"])) | Gets a number from the first parser, then applies the second parser that many times |
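
The repetition combinators are loops around a child parser. Below is a hypothetical sketch of many0 and count that uses only the standard library. PResult, tag and take are minimal stand-ins defined inline. The sketch reproduces the inputs and outputs of the rows above, but it is not nom's implementation. One detail is worth copying: a repeated parser that succeeds without consuming input would loop forever, so this sketch stops as soon as that happens.

```rust
/// Sketch result type: Ok((remaining_input, output)) or Err(input_at_failure).
pub type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Minimal stand-in for nom's tag.
pub fn tag<'a>(expected: &'a str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| -> PResult<'a, &'a str> {
        match input.strip_prefix(expected) {
            Some(rest) => Ok((rest, &input[..expected.len()])),
            None => Err(input),
        }
    }
}

/// Minimal stand-in for nom's take: exactly `n` characters.
pub fn take<'a>(n: usize) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| -> PResult<'a, &'a str> {
        // Byte offset of the n-th character boundary (end of input counts).
        match input.char_indices().map(|(i, _)| i).chain(std::iter::once(input.len())).nth(n) {
            Some(i) => Ok((&input[i..], &input[..i])),
            None => Err(input),
        }
    }
}

/// Applies `parser` until it fails and collects the outputs; never fails itself.
pub fn many0<'a, O, P>(parser: P) -> impl Fn(&'a str) -> PResult<'a, Vec<O>>
where
    P: Fn(&'a str) -> PResult<'a, O>,
{
    move |mut input: &'a str| -> PResult<'a, Vec<O>> {
        let mut results = Vec::new();
        while let Ok((rest, value)) = parser(input) {
            // Guard against an endless loop on a parser that consumes nothing.
            if rest.len() == input.len() {
                break;
            }
            results.push(value);
            input = rest;
        }
        Ok((input, results))
    }
}

/// Applies `parser` exactly `times` times; any failure fails the whole parse.
pub fn count<'a, O, P>(parser: P, times: usize) -> impl Fn(&'a str) -> PResult<'a, Vec<O>>
where
    P: Fn(&'a str) -> PResult<'a, O>,
{
    move |mut input: &'a str| -> PResult<'a, Vec<O>> {
        let mut results = Vec::with_capacity(times);
        for _ in 0..times {
            let (rest, value) = parser(input)?;
            results.push(value);
            input = rest;
        }
        Ok((input, results))
    }
}
```

A many1-style variant would differ only in returning an error when the collected Vec is empty.
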

## Integers

Parsing integers from binary formats can be done in two ways: with parser functions, or with combinators that have configurable endianness.
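
Both approaches come down to reading a fixed number of bytes in a given byte order. Below is a rough sketch that uses only the standard library (u32::from_be_bytes and u32::from_le_bytes). All names in it are hypothetical. be_u32 and le_u32 imitate the fixed-endianness functions, and u32_with imitates a configurable combinator. The Endianness enum here is a local stand-in for nom::number::Endianness, and it models only Big and Little.

```rust
/// Local stand-in for nom::number::Endianness (only Big and Little here).
#[derive(Clone, Copy, Debug)]
pub enum Endianness {
    Big,
    Little,
}

/// Sketch result type: Ok((remaining_input, value)) or Err(input).
pub type PResult<'a, O> = Result<(&'a [u8], O), &'a [u8]>;

/// Fixed endianness: reads 4 bytes as a big endian u32.
pub fn be_u32(input: &[u8]) -> PResult<'_, u32> {
    match input {
        [a, b, c, d, rest @ ..] => Ok((rest, u32::from_be_bytes([*a, *b, *c, *d]))),
        _ => Err(input),
    }
}

/// Fixed endianness: reads 4 bytes as a little endian u32.
pub fn le_u32(input: &[u8]) -> PResult<'_, u32> {
    match input {
        [a, b, c, d, rest @ ..] => Ok((rest, u32::from_le_bytes([*a, *b, *c, *d]))),
        _ => Err(input),
    }
}

/// Configurable endianness: picks one of the fixed parsers at runtime.
pub fn u32_with(endian: Endianness) -> for<'x> fn(&'x [u8]) -> PResult<'x, u32> {
    match endian {
        Endianness::Big => be_u32,
        Endianness::Little => le_u32,
    }
}
```

In this sketch, u32_with returns a plain function pointer because both fixed parsers have the same signature. With the bytes 00 00 01 02, be_u32 yields 0x0102, and le_u32 yields 0x0201_0000.
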

The following parsers can be found in the number section of the nom documentation on docs.rs.

- configurable endianness: i16, i32, i64, u16, u32, u64 are combinators that take a nom::number::Endianness as argument, like this: i16(endianness). With nom::number::Endianness::Big, i16(endianness) parses a big endian i16 integer; with Endianness::Little, it parses a little endian one; Endianness::Native uses the byte order of the target platform.
- fixed endianness: the functions are prefixed by be_ for big endian numbers and by le_ for little endian numbers, and the suffix is the type they parse to. As an example, be_u32 parses a big endian unsigned integer stored in 32 bits.
  - be_f32, be_f64: Big endian floating point numbers
  - le_f32, le_f64: Little endian floating point numbers
  - be_i8, be_i16, be_i24, be_i32, be_i64, be_i128: Big endian signed integers
  - be_u8, be_u16, be_u24, be_u32, be_u64, be_u128: Big endian unsigned integers
  - le_i8, le_i16, le_i24, le_i32, le_i64, le_i128: Little endian signed integers
  - le_u8, le_u16, le_u24, le_u32, le_u64, le_u128: Little endian unsigned integers

## Streaming related

- eof: Returns its input if it is at the end of input data
- complete: Replaces an Incomplete returned by the child parser with an Error

## Modifiers

- Parser::and: Method that creates a parser by applying the supplied parser to the rest of the input after applying self, returning both results as a tuple (like sequence::tuple, but it takes only one parser)
- Parser::and_then: Method that creates a parser by applying another parser to the output of self
- map_parser: Function variant of Parser::and_then
- Parser::map: Method that maps a function on the output of self
- map: Function variant of Parser::map
- Parser::flat_map: Method that maps a parser-returning function (such as take, or anything else that returns a parser) on the output of self, then applies the resulting parser to the rest of the input
- flat_map: Function variant of Parser::flat_map
- cond: Conditional combinator. Wraps another parser and calls it only if the condition is met
- map_opt: Maps a function returning an Option on the output of a parser
- map_res: Maps a function returning a Result on the output of a parser
- into: Converts the child parser's result to another type
- not: Returns a result only if the embedded parser returns Error or Incomplete. Does not consume the input
- opt: Makes the underlying parser optional
- cut: Transforms a recoverable error into an unrecoverable failure (commitment to the current branch)
- peek: Returns a result without consuming the input
- recognize: If the child parser was successful, returns the consumed input as the produced value
- consumed: If the child parser was successful, returns a tuple of the consumed input and the produced output
- verify: Returns the result of the child parser if it satisfies a verification function
- value: Returns a provided value if the child parser was successful
- all_consuming: Returns the result of the child parser only if it consumed all the input

## Error management and debugging

- dbg_dmp: Prints a message and the input if the parser fails

## Text parsing

- escaped: Matches a byte string with escaped characters
- escaped_transform: Matches a byte string with escaped characters, and returns a new string with the escaped characters replaced
- precedence: Parses an expression with regard to operator precedence

## Binary format parsing

- length_data: Gets a number from the first parser, then takes a subslice of the input of that size, and returns that subslice
- length_value: Gets a number from the first parser, takes a subslice of the input of that size, then applies the second parser on that subslice. If the second parser returns Incomplete, length_value returns an error

## Bit stream parsing

- bits: Transforms the current input type (byte slice &[u8]) into a bit stream on which bit-specific parsers and more general combinators can be applied
- bytes: Transforms its bit stream input back into a byte slice for the underlying parser

## Remaining combinators

- success: Returns a value without consuming any input; always succeeds
- fail: The inverse of success; always fails

## Character test functions

Use these functions with a combinator like take_while:

- is_alphabetic: Tests if byte is ASCII alphabetic: [A-Za-z]
- is_alphanumeric: Tests if byte is ASCII alphanumeric: [A-Za-z0-9]
- is_digit: Tests if byte is ASCII digit: [0-9]
- is_hex_digit: Tests if byte is ASCII hex digit: [0-9A-Fa-f]
- is_oct_digit: Tests if byte is ASCII octal digit: [0-7]
- is_bin_digit: Tests if byte is ASCII binary digit: [0-1]
- is_space: Tests if byte is ASCII space or tab: [ \t]
- is_newline: Tests if byte is ASCII newline: [\n]

Alternatively, there are ready-to-use functions:

- alpha0: Recognizes zero or more lowercase and uppercase alphabetic characters: [a-zA-Z]. alpha1 does the same but returns at least one character
- alphanumeric0: Recognizes zero or more numerical and alphabetic characters: [0-9a-zA-Z]. alphanumeric1 does the same but returns at least one character
- anychar: Matches one byte as a character
- crlf: Recognizes the string \r\n
- digit0: Recognizes zero or more numerical characters: [0-9]. digit1 does the same but returns at least one character
- double: Recognizes a floating point number in a byte string and returns an f64
- float: Recognizes a floating point number in a byte string and returns an f32
- hex_digit0: Recognizes zero or more hexadecimal numerical characters: [0-9A-Fa-f]. hex_digit1 does the same but returns at least one character
- hex_u32: Recognizes a hex-encoded integer
- line_ending: Recognizes an end of line (both \n and \r\n)
- multispace0: Recognizes zero or more spaces, tabs, carriage returns and line feeds. multispace1 does the same but returns at least one character
- newline: Matches a newline character \n
- not_line_ending: Recognizes a string of any char except \r or \n
- oct_digit0: Recognizes zero or more octal characters: [0-7]. oct_digit1 does the same but returns at least one character
- bin_digit0: Recognizes zero or more binary characters: [0-1]. bin_digit1 does the same but returns at least one character
- rest: Returns the remaining input
- rest_len: Returns the length of the remaining input
- space0: Recognizes zero or more spaces and tabs. space1 does the same but returns at least one character
- tab: Matches a tab character \t
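
The byte test functions listed above take a single u8 and are meant to be plugged into a combinator such as take_while. The ready-made parsers are built the same way. Below is a hypothetical sketch of that composition, using only the standard library. It uses u8::is_ascii_digit and u8::is_ascii_hexdigit in place of nom's is_digit and is_hex_digit. The local take_while1, like nom's, requires at least one matching byte. The digit1 and hex_digit1 here only imitate the named parsers.

```rust
/// Sketch result type over bytes: Ok((remaining_input, matched)) or Err(input).
pub type PResult<'a, O> = Result<(&'a [u8], O), &'a [u8]>;

/// Longest non-empty prefix of bytes satisfying `pred`; fails on an empty match.
pub fn take_while1<'a, F>(pred: F) -> impl Fn(&'a [u8]) -> PResult<'a, &'a [u8]>
where
    F: Fn(u8) -> bool,
{
    move |input: &'a [u8]| -> PResult<'a, &'a [u8]> {
        // Index of the first byte that fails the test, or the end of input.
        let end = input.iter().position(|&b| !pred(b)).unwrap_or(input.len());
        if end == 0 {
            Err(input)
        } else {
            Ok((&input[end..], &input[..end]))
        }
    }
}

/// A digit1-like parser built from a byte test function.
pub fn digit1(input: &[u8]) -> PResult<'_, &[u8]> {
    take_while1(|b: u8| b.is_ascii_digit())(input)
}

/// A hex_digit1-like parser built the same way.
pub fn hex_digit1(input: &[u8]) -> PResult<'_, &[u8]> {
    take_while1(|b: u8| b.is_ascii_hexdigit())(input)
}
```
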