
Strscan

doc/strscan/strscan.md


Class StringScanner supports processing a stored string as a stream. This code creates a new StringScanner object with the string 'foobarbaz':

```rb
require 'strscan'
scanner = StringScanner.new('foobarbaz')
```

About the Examples

All examples here assume that StringScanner has been required:

```rb
require 'strscan'
```

Some examples here assume that these constants are defined:

```rb
MULTILINE_TEXT = <<~EOT
Go placidly amid the noise and haste,
and remember what peace there may be in silence.
EOT

HIRAGANA_TEXT = 'こんにちは'

ENGLISH_TEXT = 'Hello'
```

Some examples here assume that certain helper methods are defined:

  • put_situation(scanner): Displays the values of the scanner's methods #pos, #charpos, #rest, and #rest_size.
  • put_match_values(scanner): Displays the scanner's [match values][9].
  • match_values_cleared?(scanner): Returns whether the scanner's [match values][9] are cleared.

See examples at helper methods.
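The helper methods themselves are not shown in this section. The following is a minimal sketch, written for this section and not the original definitions, that reproduces the output format used in the examples; in particular, the exact set of checks in match_values_cleared? is an assumption.

```ruby
require 'strscan'

# Sketch of the example helpers (assumed definitions, not the official ones).
# Each output line starts with "#" so it can be pasted as a Ruby comment.

def put_situation(scanner)
  puts '# Situation:'
  puts "#   pos:       #{scanner.pos}"
  puts "#   charpos:   #{scanner.charpos}"
  puts "#   rest:      #{scanner.rest.inspect}"
  puts "#   rest_size: #{scanner.rest_size}"
end

def put_match_values(scanner)
  puts '# Basic match values:'
  puts "#   matched?:       #{scanner.matched?}"
  puts "#   matched_size:   #{scanner.matched_size.inspect}"
  puts "#   pre_match:      #{scanner.pre_match.inspect}"
  puts "#   matched  :      #{scanner.matched.inspect}"
  puts "#   post_match:     #{scanner.post_match.inspect}"
  puts '# Captured match values:'
  puts "#   size:           #{scanner.size.inspect}"
  puts "#   captures:       #{scanner.captures.inspect}"
  puts "#   named_captures: #{scanner.named_captures.inspect}"
  return unless scanner.size # No match: the remaining values are all nil.

  # Go one index past the last group to show that it returns nil.
  indexes = (0..scanner.size).to_a
  puts "#   values_at:      #{scanner.values_at(*indexes).inspect}"
  puts '#   []:'
  indexes.each { |i| puts "#     [#{i}]:          #{scanner[i].inspect}" }
end

# True when every match value is in its "clear" state.
def match_values_cleared?(scanner)
  !scanner.matched? &&
    scanner.matched_size.nil? &&
    scanner.matched.nil? &&
    scanner.pre_match.nil? &&
    scanner.post_match.nil? &&
    scanner.size.nil? &&
    scanner[0].nil? &&
    scanner.captures.nil? &&
    scanner.values_at(0).nil? &&
    scanner.named_captures == {}
end
```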

The StringScanner Object

This code creates a StringScanner object (we'll call it simply a scanner) and shows some of its basic properties:

```rb
scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9
```

The scanner has:

  • A *stored string*, which is:

    • Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
    • Modifiable by methods #string=(new_string) and #concat(more_string).
    • Returned by method #string.

    More at [Stored String][1] below.

  • A position; a zero-based index into the bytes of the stored string (not into its characters):

    • Initially set by StringScanner.new to 0.
    • Returned by method #pos.
    • Modifiable explicitly by methods #reset, #terminate, and #pos=(new_pos).
    • Modifiable implicitly (various traversing methods, among others).

    More at [Byte Position][2] below.

  • A *target substring*, which is a trailing substring of the stored string; it extends from the current position to the end of the stored string:

    • Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
    • Returned by method #rest.
    • Modified by any modification to either the stored string or the position.

    **Most importantly**: the searching and traversing methods operate on the target substring, which may be (and often is) shorter than the entire stored string.

    More at [Target Substring][3] below.

Stored String

The *stored string* is the string stored in the StringScanner object.

Each of these methods sets, modifies, or returns the stored string:

| Method | Effect |
|---|---|
| ::new(string) | Creates a new scanner for the given string. |
| #string=(new_string) | Replaces the existing stored string and resets the position to zero. |
| #concat(more_string) | Appends a string to the existing stored string. |
| #string | Returns the stored string. |
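A short sketch of #concat and #string. It starts from 'foo'.dup because the scanner stores a reference to the given string rather than a copy, and #concat appends to that string in place (appending to a frozen string would raise FrozenError). The variable names are illustrative only.

```ruby
require 'strscan'

source  = 'foo'.dup              # A mutable string; #concat will append to it in place.
scanner = StringScanner.new(source)
scanner.scan(/fo/)               # => "fo"
scanner.concat('bar')            # Appends to the stored string; the position is unchanged.
scanner.string                   # => "foobar"
scanner.string.equal?(source)    # => true (the scanner holds the given string, not a copy)
scanner.pos                      # => 2
scanner.rest                     # => "obar"
```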

Positions

A StringScanner object maintains a zero-based *byte position* and a zero-based *character position*.

Each of these methods explicitly sets positions:

| Method | Effect |
|---|---|
| #reset | Sets both positions to zero (beginning of stored string). |
| #terminate | Sets both positions to the end of the stored string. |
| #pos=(new_byte_position) | Sets byte position; adjusts character position. |
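A brief sketch of the three explicit position setters from the table above, including a multi-byte string where #pos= moves by bytes and #charpos follows along:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3                  # Explicitly set the byte position.
scanner.rest                     # => "barbaz"
scanner.terminate                # Both positions to the end.
[scanner.pos, scanner.charpos]   # => [9, 9]
scanner.reset                    # Both positions back to zero.
[scanner.pos, scanner.charpos]   # => [0, 0]

# Five 3-byte characters: byte position 6 is character position 2.
hiragana = StringScanner.new('こんにちは')
hiragana.pos = 6
hiragana.charpos                 # => 2
hiragana.rest                    # => "にちは"
```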

Byte Position (Position)

The byte position (or simply position) is a zero-based index into the bytes in the scanner's stored string; for a new StringScanner object, the byte position is zero.

When the byte position is:

  • Zero (at the beginning), the target substring is the entire stored string.
  • Equal to the size of the stored string in bytes (at the end), the target substring is the empty string ''.

To get or set the byte position:

  • #pos: returns the byte position.
  • #pos=(new_pos): sets the byte position.

Many methods use the byte position as the basis for finding matches; many others set, increment, or decrement the byte position:

```rb
scanner = StringScanner.new('foobar')
scanner.pos         # => 0
scanner.scan(/foo/) # => "foo" # Match found.
scanner.pos         # => 3     # Byte position incremented.
scanner.scan(/foo/) # => nil   # Match not found.
scanner.pos         # => 3     # Byte position not changed.
```

Some methods implicitly modify the byte position; see:

  • [Setting the Target Substring][4].
  • [Traversing the Target Substring][5].

The values of these methods are derived directly from the values of #pos and #string:

  • #charpos: the [character position][7].
  • #rest: the [target substring][3].
  • #rest_size: rest.bytesize (the size of the target substring in bytes).
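A small sketch showing how these derived values relate to #pos and #string for a string with multi-byte characters; note that #rest_size counts bytes, not characters:

```ruby
require 'strscan'

scanner = StringScanner.new('Helloこんにちは')
scanner.pos = 8                                          # Past "Hello" (5 bytes) and "こ" (3 bytes).
scanner.charpos                                          # => 6
scanner.rest                                             # => "んにちは"
scanner.rest_size                                        # => 12 (bytes)
scanner.rest.size                                        # => 4  (characters)
scanner.rest_size == scanner.rest.bytesize               # => true
scanner.rest == scanner.string.byteslice(scanner.pos..)  # => true
```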

Character Position

The character position is a zero-based index into the characters in the stored string; for a new StringScanner object, the character position is zero.

Method #charpos returns the character position; there is no method that sets it directly, so it changes only when the byte position changes.

Some methods change (increment or reset) the character position; see:

  • [Setting the Target Substring][4].
  • [Traversing the Target Substring][5].

Example (string includes multi-byte characters):

```rb
scanner = StringScanner.new(ENGLISH_TEXT.dup) # Five 1-byte characters (a copy: #concat modifies the stored string).
scanner.concat(HIRAGANA_TEXT)                 # Five 3-byte characters.
scanner.string # => "Helloこんにちは"           # Twenty bytes in all.
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "Helloこんにちは"
#   rest_size: 20
scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
put_situation(scanner)
# Situation:
#   pos:       5
#   charpos:   5
#   rest:      "こんにちは"
#   rest_size: 15
scanner.getch         # => "こ"    # One 3-byte character.
put_situation(scanner)
# Situation:
#   pos:       8
#   charpos:   6
#   rest:      "んにちは"
#   rest_size: 12
```

Target Substring

The target substring is the part of the [stored string][1] that extends from the current [byte position][2] to the end of the stored string; it is always either:

  • The entire stored string (byte position is zero).
  • A trailing substring of the stored string (byte position positive).

The target substring is returned by method #rest, and its size is returned by method #rest_size.

Examples:

```rb
scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
# Situation:
#   pos:       0
#   charpos:   0
#   rest:      "foobarbaz"
#   rest_size: 9
scanner.pos = 3
put_situation(scanner)
# Situation:
#   pos:       3
#   charpos:   3
#   rest:      "barbaz"
#   rest_size: 6
scanner.pos = 9
put_situation(scanner)
# Situation:
#   pos:       9
#   charpos:   9
#   rest:      ""
#   rest_size: 0
```

Setting the Target Substring

The target substring is set whenever:

  • The [stored string][1] is set (position reset to zero; target substring set to stored string).
  • The [byte position][2] is set (target substring adjusted accordingly).
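A quick sketch of both ways the target substring gets set: moving the byte position adjusts it, and replacing the stored string resets the position so the target becomes the whole new string.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)        # => "foo"
scanner.rest               # => "barbaz"
scanner.pos = 6            # Setting the byte position adjusts the target substring.
scanner.rest               # => "baz"
scanner.string = 'quux'    # Setting the stored string resets the position to zero,
scanner.pos                # => 0
scanner.rest               # => "quux" (so the target substring is the whole new string)
```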

Querying the Target Substring

This table summarizes (details and examples at the links):

| Method | Returns |
|---|---|
| #rest | Target substring. |
| #rest_size | Size (bytes) of target substring. |

Searching the Target Substring

A search method examines the target substring, but does not advance the [positions][11] or (by implication) shorten the target substring.

This table summarizes (details and examples at the links):

| Method | Returns | Sets Match Values? |
|---|---|---|
| #check(pattern) | Matched leading substring or `nil`. | Yes. |
| #check_until(pattern) | Substring from the byte position through the end of the match (anywhere) or `nil`. | Yes. |
| #exist?(pattern) | Byte count from the byte position to the end of the match (anywhere) or `nil`. | Yes. |
| #match?(pattern) | Size (bytes) of matched leading substring or `nil`. | Yes. |
| #peek(size) | Leading substring of given length (bytes). | No. |
| #peek_byte | Integer leading byte or `nil`. | No. |
| #rest | Target substring (from byte position to end). | No. |
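A sketch of the search methods on one scanner; none of them moves the position. Note that #check_until returns everything from the position through the end of the match, and #exist? returns the length of that same span.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.check(/foo/)          # => "foo"
scanner.check(/bar/)          # => nil (not at the beginning of the target substring)
scanner.check_until(/bar/)    # => "foobar"
scanner.exist?(/bar/)         # => 6
scanner.match?(/foo/)         # => 3
scanner.matched               # => "foo" (match values set by #match?)
scanner.peek(4)               # => "foob"
scanner.pos                   # => 0 (no search method advanced it)
```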

Traversing the Target Substring

A traversal method examines the target substring, and, if successful:

  • Advances the [positions][11].
  • Shortens the target substring.

This table summarizes (details and examples at links):

| Method | Returns | Sets Match Values? |
|---|---|---|
| #get_byte | Leading byte (as a one-byte string) or `nil`. | No. |
| #getch | Leading character or `nil`. | No. |
| #scan(pattern) | Matched leading substring or `nil`. | Yes. |
| #scan_byte | Integer leading byte or `nil`. | No. |
| #scan_until(pattern) | Substring from the byte position through the end of the match (anywhere) or `nil`. | Yes. |
| #skip(pattern) | Size (bytes) of matched leading substring or `nil`. | Yes. |
| #skip_until(pattern) | Byte count from the byte position to the end of the match, or `nil`. | Yes. |
| #unscan | `self`. | No. |
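A sketch that walks one string with several traversal methods, noting the byte position after each step. #unscan moves the position back to where it was before the last successful match.

```ruby
require 'strscan'

scanner = StringScanner.new('foo bar baz')
scanner.scan(/\w+/)        # => "foo"   position 3
scanner.skip(/\s+/)        # => 1       position 4
scanner.scan_until(/a/)    # => "ba"    position 6
scanner.skip_until(/z/)    # => 5       position 11 (end)
scanner.eos?               # => true
scanner.unscan             # Undo the last match: back to position 6.
scanner.getch              # => "r"     position 7
scanner.get_byte           # => " "     position 8
scanner.scan(/\d/)         # => nil     position unchanged
scanner.rest               # => "baz"
```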

Querying the Scanner

Each of these methods queries the scanner object without modifying it (details and examples at the links):

| Method | Returns |
|---|---|
| #beginning_of_line? | `true` or `false`. |
| #charpos | Character position. |
| #eos? | `true` or `false`. |
| #fixed_anchor? | `true` or `false`. |
| #inspect | String representation of `self`. |
| #pos | Byte position. |
| #rest | Target substring. |
| #rest_size | Size (bytes) of target substring. |
| #string | Stored string. |
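A sketch of the boolean queries. #beginning_of_line? is true at position zero or just after a newline; #eos? is true only at the end of the stored string.

```ruby
require 'strscan'

scanner = StringScanner.new("ab\ncd")
scanner.beginning_of_line?   # => true  (position 0)
scanner.scan(/ab/)           # => "ab"
scanner.beginning_of_line?   # => false (preceded by "b")
scanner.skip(/\n/)           # => 1
scanner.beginning_of_line?   # => true  (preceded by a newline)
scanner.eos?                 # => false
scanner.fixed_anchor?        # => false (the default)
scanner.terminate
scanner.eos?                 # => true
```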

Matching

StringScanner implements pattern matching via Ruby class [Regexp][6], and its matching behaviors are the same as Ruby's except for the [fixed-anchor property][10].

Matcher Methods

Each *matcher method* takes a single argument, pattern, and attempts to find a matching substring in the [target substring][3].

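| Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
|---|---|---|---|---|
| #check | Regexp or String. | At beginning. | Matched substring. | No. |
| #check_until | Regexp or String. | Anywhere. | Substring through end of match. | No. |
| #match? | Regexp or String. | At beginning. | Match size. | No. |
| #exist? | Regexp or String. | Anywhere. | Size of substring through end of match. | No. |
| #scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
| #scan_until | Regexp or String. | Anywhere. | Substring through end of match. | Yes. |
| #skip | Regexp or String. | At beginning. | Match size. | Yes. |
| #skip_until | Regexp or String. | Anywhere. | Size of substring through end of match. | Yes. |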

Which matcher you choose will depend on:

  • Where you want to find a match:

    • Only at the beginning of the target substring: #check, #match?, #scan, #skip.
    • Anywhere in the target substring: #check_until, #exist?, #scan_until, #skip_until.
  • Whether you want to:

    • Traverse, by advancing the positions: #scan, #scan_until, #skip, #skip_until.
    • Keep the positions unchanged: #check, #check_until, #match?, #exist?.
  • What you want for the return value:

    • The matched substring: #check, #scan.
    • The substring from the position through the end of the match: #check_until, #scan_until.
    • The match size: #match?, #skip.
    • The size of that substring: #exist?, #skip_until.
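As an illustration of choosing matchers, here is a minimal tokenizer sketch; the method tokenize and its token format are hypothetical, not part of StringScanner. It uses #skip where the matched text is not needed (whitespace) and #scan where it is, and #eos? to stop.

```ruby
require 'strscan'

# Hypothetical helper: splits a simple arithmetic expression into tokens.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)                 # Traverse; discard whitespace.
    if (number = scanner.scan(/\d+/))           # Traverse; keep the matched text.
      tokens << [:number, number.to_i]
    elsif (op = scanner.scan(%r{[-+*/()]}))
      tokens << [:op, op]
    else
      raise ArgumentError, "unexpected #{scanner.rest[0].inspect} at byte #{scanner.pos}"
    end
  end
  tokens
end

tokenize('12 + (3*4)')
# => [[:number, 12], [:op, "+"], [:op, "("], [:number, 3], [:op, "*"], [:number, 4], [:op, ")"]]
```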

Match Values

The *match values* in a StringScanner object generally contain the results of the most recent attempted match.

Each match value may be thought of as:

  • Clear: Initially, or after an unsuccessful match attempt: usually false, nil, or {}.
  • Set: After a successful match attempt: true, string, array, or hash.

Each of these methods clears match values:

  • ::new(string).
  • #reset.
  • #terminate.

Each of these methods attempts a match based on a pattern, and either sets match values (if successful) or clears them (if not):

  • #check(pattern)
  • #check_until(pattern)
  • #exist?(pattern)
  • #match?(pattern)
  • #scan(pattern)
  • #scan_until(pattern)
  • #skip(pattern)
  • #skip_until(pattern)
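A short sketch of that set-or-clear behavior: a successful #scan sets the match values, and a later failed #check clears them without moving the position.

```ruby
require 'strscan'

scanner = StringScanner.new('foobar')
scanner.scan(/foo/)      # => "foo"
scanner.matched?         # => true
scanner.matched          # => "foo"
scanner.check(/baz/)     # => nil (this failed attempt clears the match values)
scanner.matched?         # => false
scanner.matched          # => nil
scanner.pos              # => 3 (the failed attempt did not move the position)
```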

Basic Match Values

Basic match values are those not related to captures.

Each of these methods returns a basic match value:

| Method | Return After Match | Return After No Match |
|---|---|---|
| #matched? | `true`. | `false`. |
| #matched_size | Size of matched substring. | `nil`. |
| #matched | Matched substring. | `nil`. |
| #pre_match | Substring preceding matched substring. | `nil`. |
| #post_match | Substring following matched substring. | `nil`. |

See examples below.

Captured Match Values

Captured match values are those related to [captures][16].

Each of these methods returns a captured match value:

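| Method | Return After Match | Return After No Match |
|---|---|---|
| #size | Count of captured substrings, plus one for the whole match. | `nil`. |
| #[](n) | `n`th captured substring (`n` = 0 is the whole match). | `nil`. |
| #captures | Array of all captured substrings. | `nil`. |
| #values_at(*n) | Array of specified captured substrings. | `nil`. |
| #named_captures | Hash of named captures. | `{}`, or a hash of `nil` values if the last pattern had named groups. |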
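A sketch of the captured match values after a successful match with named groups; #[] accepts a group number, or a group name as a Symbol or String. The date string is just sample input.

```ruby
require 'strscan'

scanner = StringScanner.new('2024-06-30')
scanner.scan(/(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/)  # => "2024-06-30"
scanner.size              # => 4 (the whole match plus three groups)
scanner[0]                # => "2024-06-30"
scanner[:year]            # => "2024"
scanner['month']          # => "06"
scanner.captures          # => ["2024", "06", "30"]
scanner.values_at(1, 3)   # => ["2024", "30"]
```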

See examples below.

Match Values Examples

Successful basic match attempt (no captures):

```rb
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   3
#   pre_match:      "foo"
#   matched  :      "bar"
#   post_match:     "baz"
# Captured match values:
#   size:           1
#   captures:       []
#   named_captures: {}
#   values_at:      ["bar", nil]
#   []:
#     [0]:          "bar"
#     [1]:          nil
```

Failed basic match attempt (no captures):

```rb
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/nope/)
match_values_cleared?(scanner) # => true
```

Successful unnamed capture match attempt:

```rb
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
put_match_values(scanner)
# Basic match values:
#   matched?:       true
#   matched_size:   15
#   pre_match:      ""
#   matched  :      "foobarbazbatbam"
#   post_match:     ""
# Captured match values:
#   size:           4
#   captures:       ["foo", "baz", "bam"]
#   named_captures: {}
#   values_at:      ["foobarbazbatbam", "foo", "baz", "bam", nil]
#   []:
#     [0]:          "foobarbazbatbam"
#     [1]:          "foo"
#     [2]:          "baz"
#     [3]:          "bam"
#     [4]:          nil
```

Successful named capture match attempt; same as unnamed above, except for #named_captures:

```rb
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
```

Failed unnamed capture match attempt:

```rb
scanner = StringScanner.new('somestring')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
match_values_cleared?(scanner) # => true
```

Failed named capture match attempt; same as unnamed above, except for #named_captures, which returns a hash of nil values rather than {} (so match_values_cleared? returns false):

```rb
scanner = StringScanner.new('somestring')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
match_values_cleared?(scanner) # => false
scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
```

Fixed-Anchor Property

Pattern matching in StringScanner is the same as in Ruby, except for its fixed-anchor property, which determines the meaning of '\A':

  • false (the default): '\A' matches at the current byte position, that is, at the beginning of the target substring.

    ```rb
    scanner = StringScanner.new('foobar')
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "o"
    scanner.scan(/\A./) # => "b"
    ```

  • true: '\A' matches the beginning of the stored string, so it never matches unless the byte position is zero:

    ```rb
    scanner = StringScanner.new('foobar', fixed_anchor: true)
    scanner.scan(/\A./) # => "f"
    scanner.scan(/\A./) # => nil
    scanner.reset
    scanner.scan(/\A./) # => "f"
    ```

The fixed-anchor property is set when the StringScanner object is created and cannot be changed afterward (see StringScanner.new); method #fixed_anchor? returns the setting.