tcl7.6 User Commands - regexp






NAME

     regexp - Match a regular expression against a string


SYNOPSIS

     regexp ?switches? exp string  ?matchVar?  ?subMatchVar  sub-
     MatchVar ...?





DESCRIPTION

     Determines whether the regular expression exp  matches  part
     or all of string and returns 1 if it does, 0 if it doesn't.

     If additional arguments are specified after string then they
     are  treated  as  the  names of variables in which to return
     information about  which  part(s)  of  string  matched  exp.
     MatchVar will be set to the range of string that matched all
     of exp.  The first subMatchVar will contain  the  characters
     in string that matched the leftmost parenthesized subexpres-
     sion within exp, the next subMatchVar will contain the char-
     acters  that matched the next parenthesized subexpression to
     the right in exp, and so on.

     If the initial arguments to regexp start with  -  then  they
     are   treated  as  switches.   The  following  switches  are
     currently supported:

     -nocase    Causes upper-case  characters  in  string  to  be
               treated as lower case during the matching process.

     -indices   Changes  what  is  stored  in  the  subMatchVars.
               Instead  of  storing  the matching characters from
               string, each variable will contain a list  of  two
               decimal  strings  giving  the indices in string of
               the first and  last  characters  in  the  matching
               range of characters.

     --          Marks the end of switches.  The argument follow-
               ing  this  one  will  be treated as exp even if it
               starts with a -.

     If there are more subMatchVar's  than  parenthesized  subex-
     pressions  within  exp,  or if a particular subexpression in
     exp doesn't match the string (e.g. because it was in a  por-
     tion  of  the  expression  that  wasn't  matched),  then the
     corresponding subMatchVar will be set to ``- 1  - 1''  if  -
     indices has been specified or to an empty string otherwise.




REGULAR EXPRESSIONS

     Regular expressions are implemented  using  Henry  Spencer's
     package  (thanks,  Henry!),  and  much of the description of
     regular expressions below is copied verbatim from his manual
     entry.

     A regular expression is zero or more branches, separated  by
     ``|''.    It  matches  anything  that  matches  one  of  the
     branches.

     A branch is zero or more pieces, concatenated.  It matches a
     match  for  the  first,  followed by a match for the second,
     etc.

     A piece is an atom possibly followed  by  ``*'',  ``+'',  or
     ``?''.  An atom followed by ``*'' matches a sequence of 0 or
     more matches of the atom.  An atom followed by ``+'' matches
     a  sequence  of 1 or more matches of the atom.  An atom fol-
     lowed by ``?'' matches a match of  the  atom,  or  the  null
     string.

     An atom is a regular expression in parentheses  (matching  a
     match  for  the  regular  expression),  a range (see below),
     ``.''  (matching any single character), ``^'' (matching  the
     null  string  at  the  beginning of the input string), ``$''
     (matching the null string at the end of the input string), a
     ``\''  followed by a single character (matching that charac-
     ter), or a  single  character  with  no  other  significance
     (matching that character).

     A range is a sequence of characters enclosed in ``[]''.   It
     normally matches any single character from the sequence.  If
     the sequence begins with ``^'', it matches any single  char-
     acter  not from the rest of the sequence.  If two characters
     in the sequence are separated by ``-'',  this  is  shorthand
     for  the  full  list  of ASCII characters between them (e.g.
     ``[0-9]'' matches any decimal digit).  To include a  literal
     ``]''  in the sequence, make it the first character (follow-
     ing a possible ``^'').  To include a literal ``-'', make  it
     the first or last character.



CHOOSING AMONG ALTERNATIVE MATCHES

     In general there may be more than one way to match a regular
     expression  to  an  input string.  For example, consider the
     command
          regexp  (a*)b*  aabaaabb  x  y
     Considering only the rules given so far, x and y  could  end
     up  with  the values aabb and aa, aaab and aaa, ab and a, or
     any of several other combinations.  To resolve  this  poten-
     tial  ambiguity  regexp chooses among alternatives using the
     rule ``first then longest''.  In other words,  it  considers
     the  possible  matches  in  order working from left to right
     across the input string and the pattern, and it attempts  to
     match longer pieces of the input string before shorter ones.
     More specifically, the following rules apply  in  decreasing
     order of priority:

     [1]  If a regular expression could match two different parts
          of  an  input  string  then  it will match the one that
          begins earliest.

     [2]  If a regular expression contains | operators  then  the
          leftmost matching sub-expression is chosen.

     [3]  In *, +, and ? constructs, longer matches are chosen in
          preference to shorter ones.

     [4]  In sequences of expression  components  the  components
          are considered from left to right.

     In the example from above, (a*)b*  matches  aab:   the  (a*)
     portion  of the pattern is matched first and it consumes the
     leading aa; then the b* portion of the pattern consumes  the
     next b.  Or, consider the following example:
          regexp  (ab|a)(b*)c  abc  x  y  z
     After this command x will be abc, y will be ab, and  z  will
     be an empty string.  Rule 4 specifies that (ab|a) gets first
     shot at the input string and Rule 2 specifies  that  the  ab
     sub-expression is checked before the a sub-expression.  Thus
     the b has already been claimed before the (b*) component  is
     checked and (b*) must match an empty string.



KEYWORDS

     match, regular expression, string