CS 279 - Week 7 Lab - 10-3-12

*   if I wanted to match a line that contained
    nothing but white space (optionally) and
    a digit (exactly one),

    (and the white space can come before or after
    the digit),

    the bracket expression [[:blank:]] matches a space
    or a tab character (but this can vary w/your locale)

    I could use the pattern

    ^[[:blank:]]*[0-9][[:blank:]]*$
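
    a quick way to try that pattern out is to feed grep a
    few made-up lines and count the matches; this is just a
    sketch, and the sample lines and the variable name count
    are invented for illustration:

```shell
# Seven made-up lines: "7", "  7  ", "<tab>3", "42", "a1", "", " "
# (printf expands \t and \n in its format string)
count=$(printf '7\n  7  \n\t3\n42\na1\n\n \n' |
    grep -c '^[[:blank:]]*[0-9][[:blank:]]*$')

# Only the first three lines are optional white space around ONE digit
echo "$count"    # prints 3
```

    42 fails because it has two digits, a1 fails because a
    is not white space, and the last two lines fail because
    they contain no digit at all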

*   subexpressions in a regular expression give
    you a way to search for lines with repeated
    substrings

    in a BRE,
    you surround a desired subexpression with
    \(   and   \)

    after that, in the same BRE,
    \1   refers to the first subexpression,
    \2   refers to the second subexpression,
    ...
    \9   refers to the ninth subexpression
    (these are called backreferences)

    g\([a-z]*\)moo\1

    that would match a line containing
    g, followed by 0 or more lowercase letters,
    followed by moo,
    followed by the SAME 0 or more lowercase letters
 
    g\([a-z]*\)moo\([A-Z][0-9]\)\1\2

    g
    followed by 0 or more lowercase letters (expr1)
    followed by moo
    followed by an uppercase letter and a digit (expr2)
    followed by expr1
    followed by expr2
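
    both backreference patterns above can be tried against
    a few made-up lines; a minimal sketch, assuming a grep
    that supports BRE backreferences (the sample lines and
    the variable names first and second are invented here,
    and LC_ALL=C is set so [a-z] and [A-Z] mean plain ASCII
    ranges):

```shell
# One subexpression: the text after moo must repeat the text before it
first=$(printf '%s\n' 'gabmooab' 'gmoo' 'gabmooba' 'xgcatmoocatx' |
    LC_ALL=C grep 'g\([a-z]*\)moo\1')
printf '%s\n' "$first"     # gabmooab, gmoo, xgcatmoocatx

# Two subexpressions: expr1 and expr2 must both repeat, in that order
second=$(printf '%s\n' 'gabmooQ7abQ7' 'gabmooQ7abQ8' 'gmooZ1Z1' |
    LC_ALL=C grep 'g\([a-z]*\)moo\([A-Z][0-9]\)\1\2')
printf '%s\n' "$second"    # gabmooQ7abQ7, gmooZ1Z1
```

    gmoo matches because the subexpression is allowed to
    match 0 letters, so \1 is then the empty string as well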

*   a subexpression or backreference can also
    be followed by *, indicating 0 or more
    repetitions

    ^\([A-Z]\)\1*$

    a line containing either a single uppercase
    letter OR multiple of the same uppercase letter

    an uppercase letter followed by zero or more
    of the same uppercase letter!
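
    to see the starred backreference at work, here is a
    small sketch with made-up input (the variable name same
    is invented for illustration; LC_ALL=C keeps [A-Z] to
    the ASCII uppercase letters):

```shell
# Made-up lines: A, ZZZZ, AAB, aa, and an empty line
same=$(printf '%s\n' 'A' 'ZZZZ' 'AAB' 'aa' '' |
    LC_ALL=C grep '^\([A-Z]\)\1*$')

# AAB mixes two letters, aa is lowercase, the empty line has no letter
printf '%s\n' "$same"    # A and ZZZZ
```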

*   for a particular number of repetitions
    of some pattern, you can use
    an interval expression

    you can follow a single character, or a regular
    expression denoting a single character,
    by this notation;

    here's the notation:
    \{m\}      matches EXACTLY m occurrences of the
               preceding expression
    \{m,\}     matches m OR MORE occurrences of the
               preceding expression
    \{m,n\}    matches between m and n occurrences
               (inclusive) of the preceding expression

    I'd like to match JUST lines containing 5-digit
    integers   ^[0-9]\{5\}$
    ...for 5 or more, put a comma after the 5:
               ^[0-9]\{5,\}$
    ...for between 5 and 7 digits inclusive,
       put a 7 after the comma
               ^[0-9]\{5,7\}$

    ...and for between 0 and 3 digits inclusive
       (so an empty line matches, too):
               ^[0-9]\{0,3\}$
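
    the four interval patterns can be compared side by side
    by counting matches over the same made-up input; this
    is only a sketch, and the variable names (nums, exactly5,
    and so on) are invented for illustration:

```shell
# Made-up input: an empty line, then runs of 2, 3, 4, 5, 6 and 8 digits
nums=$(printf '%s\n' '' '12' '123' '1234' '12345' '123456' '12345678')

exactly5=$(printf '%s\n' "$nums" | grep -c '^[0-9]\{5\}$')      # 12345
five_up=$(printf '%s\n' "$nums" | grep -c '^[0-9]\{5,\}$')      # 5, 6, 8 digits
five_to7=$(printf '%s\n' "$nums" | grep -c '^[0-9]\{5,7\}$')    # 5, 6 digits
upto3=$(printf '%s\n' "$nums" | grep -c '^[0-9]\{0,3\}$')       # empty, 12, 123

echo "$exactly5 $five_up $five_to7 $upto3"    # prints 1 3 2 3
```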

*  now -- EXTENDED regular expressions
          ERE

   *   the command egrep can take extended
       regular expressions;
       so can grep with the -E option
       (but without the -E option grep expects BREs)

   *   now, a + is special,
       and after an ERE, it means match ONE or MORE
       instances of that ERE

       a+   <-- match one or more a's

   *   now, a ? is special,
       and after an ERE, it means match ZERO or ONE
       instances of that ERE
 
       ^a?$ <-- matches an empty line or a line with just 1 a

   *   unescaped parentheses are used for
       grouping subexpressions --

       and | means OR between the alternatives on
       either side of it; unescaped parentheses limit
       how far each alternative extends

       ex:  (cat|dog)  matches a line including either
                       cat or dog

       ex:  (cat|dog)(fish|fight)  matches a line including
                                   catfish, catfight, dogfish,
                                   or dogfight

   *   since    +  ? | ( )  are now special,
       you need to precede them with a backslash
       to match JUST + ? | ( )
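
   the ERE features above can be tried with grep -E; a small
   sketch over made-up input (the sample lines and the
   variable names are invented for illustration):

```shell
# Made-up input; the third line is empty
words=$(printf '%s\n' 'a' 'aaa' '' 'b' 'catfish' 'dogfight' 'catnap' '1+1')

plus=$(printf '%s\n' "$words" | grep -E -c '^a+$')     # a, aaa
opt=$(printf '%s\n' "$words" | grep -E -c '^a?$')      # a, empty line
echo "$plus $opt"                                       # prints 2 2

# Grouping with alternation: one of cat/dog followed by one of fish/fight
pets=$(printf '%s\n' "$words" | grep -E '(cat|dog)(fish|fight)')
printf '%s\n' "$pets"                                   # catfish, dogfight

# An escaped + matches a literal plus sign
literal=$(printf '%s\n' "$words" | grep -E '1\+1')
printf '%s\n' "$literal"                                # 1+1
```

   catnap is left out of the third result because cat is
   not followed by fish or fight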

*   Command-line arguments in bash shell scripts

*   bash sets certain local variables for you
    within a shell script;
    $0   is the name of the script
    $1   is the first command line argument
    $2   is the second command line argument
    ...
    $#   is the number of command line arguments

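    $@  and  $*  both expand to all the command line
    arguments; inside double quotes, "$@" keeps each
    argument as a separate word, while "$*" joins them
    into a single word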
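
    to watch these variables get set, the sketch below
    writes a tiny throwaway script to a temporary file and
    runs it with two arguments; the script contents, the
    arguments alpha and 'beta gamma', and the variable names
    script and output are all made up for illustration, and
    it assumes bash and mktemp are available:

```shell
# Write a small demo script to a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
echo "script name: $0"
echo "first: $1"
echo "second: $2"
echo "count: $#"
# "$@" in double quotes keeps each argument as one word
for arg in "$@"; do
    echo "arg: $arg"
done
EOF

# Run it with two arguments; the second one contains a space
output=$(bash "$script" alpha 'beta gamma')
printf '%s\n' "$output"
rm -f "$script"
```

    the script reports count: 2 and the loop prints two arg:
    lines, since the quoted 'beta gamma' arrives as a single
    argument; the script name line shows whatever temporary
    path mktemp chose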