CS 279 - Week 7 Lab - 10-3-12
* if I wanted to match a line that contained
nothing but exactly one digit, optionally with
white space before and/or after the digit:
the pattern [[:blank:]] matches a space or a tab
character (but this can vary w/your locale),
so I could use the pattern
^[[:blank:]]*[0-9][[:blank:]]*$
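a quick way to try this pattern out (just a sketch; the sample lines fed in by printf are made up for illustration):

```shell
# Six made-up test lines; only the first three are
# "optional blanks, exactly one digit, optional blanks".
matches=$(printf '%s\n' '7' '  7' '7  ' '77' 'a7' '' |
    grep -c '^[[:blank:]]*[0-9][[:blank:]]*$')

# grep -c prints the NUMBER of matching lines
echo "$matches"    # prints 3
```

the lines 77 and a7 are rejected because of the ^ and $ anchors, and the empty line is rejected because exactly one digit is required.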
* subexpressions in a regular expression give
you a way to search for lines with repeated
substrings
in a BRE,
you surround a desired subexpression with
\( and \)
after that, in the same BRE,
\1 refers to the first subexpression,
\2 refers to the second subexpression,
...
\9 refers to the ninth subexpression
(these are called backreferences)
g\([a-z]*\)moo\1
that would match a line containing
g, followed by 0 or more lowercase letters,
followed by moo,
followed by the SAME 0 or more lowercase letters
g\([a-z]*\)moo\([A-Z][0-9]\)\1\2
g
followed by 0 or more lowercase letters (expr1)
followed by moo
followed by an uppercase letter and a digit (expr2)
followed by expr1
followed by expr2
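a small sketch for trying the two backreference patterns above (the sample lines are made up; LC_ALL=C is my assumption, used so that [a-z] and [A-Z] mean plain ASCII ranges whatever your locale is):

```shell
# \1 must repeat exactly what \([a-z]*\) matched
first=$(printf '%s\n' 'gabmooab' 'gabmoocd' 'gmoo' |
    LC_ALL=C grep 'g\([a-z]*\)moo\1')
echo "$first"
# prints:
# gabmooab
# gmoo

# two subexpressions, each repeated by its own backreference
second=$(printf '%s\n' 'gabmooX1abX1' 'gabmooX1abY2' |
    LC_ALL=C grep 'g\([a-z]*\)moo\([A-Z][0-9]\)\1\2')
echo "$second"    # prints gabmooX1abX1
```

note that gmoo matches the first pattern: the subexpression matched ZERO letters, so \1 is the same zero letters.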
* a subexpression or backreference can also
be followed by *, indicating 0 or more
repetitions
^\([A-Z]\)\1*$
a line containing either a single uppercase
letter OR multiple of the same uppercase letter,
that is: an uppercase letter followed by zero or
more of the same uppercase letter, and nothing else!
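to see the backreference-followed-by-* pattern in action, here is a sketch with made-up sample lines (again assuming LC_ALL=C so [A-Z] is the plain ASCII range):

```shell
# Only lines made of ONE uppercase letter repeated should match.
repeated=$(printf '%s\n' 'A' 'ZZZZ' 'AAB' 'aa' '' |
    LC_ALL=C grep '^\([A-Z]\)\1*$')
echo "$repeated"
# prints:
# A
# ZZZZ
```

AAB fails because B is not the same letter that \1 remembers.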
* for a particular number of repetitions
of some pattern, you can use
an interval expression
you can follow a single character, or a regular
expression denoting a single character,
by this notation;
here's the notation:
\{m\} matches EXACTLY m occurrences of the
preceding
\{m,\} matches m OR MORE occurrences of the
preceding
\{m,n\} matches between m and n occurrences
(inclusive) of the preceding
I'd like to match JUST lines containing 5-digit
integers:
^[0-9]\{5\}$
...for 5 or more, put a comma after the 5:
^[0-9]\{5,\}$
...for between 5 and 7 digits inclusive,
put a 7 after the comma
^[0-9]\{5,7\}$
...and for between 0 and 3 digits inclusive
(so this also matches an empty line):
^[0-9]\{0,3\}$
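a sketch that runs all four interval patterns against the same made-up lines (count_matches is a hypothetical helper I made up for this example, not a standard command):

```shell
# Hypothetical helper: count how many of five sample lines
# (empty, 3 digits, 5 digits, 7 digits, 8 digits) match the BRE in $1.
count_matches() {
    printf '%s\n' '' '123' '12345' '1234567' '12345678' |
        grep -c "$1"
}

exactly5=$(count_matches '^[0-9]\{5\}$')       # 12345
atleast5=$(count_matches '^[0-9]\{5,\}$')      # 12345, 1234567, 12345678
from5to7=$(count_matches '^[0-9]\{5,7\}$')     # 12345, 1234567
upto3=$(count_matches '^[0-9]\{0,3\}$')        # the empty line, 123

echo "$exactly5 $atleast5 $from5to7 $upto3"    # prints 1 3 2 2
```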
* now -- EXTENDED regular expressions (EREs)
* the command egrep can take extended
regular expressions;
so can grep with the -E option
(but without the -E option grep expects BREs)
* now, a + is special,
and after an ERE, it means match ONE or MORE
instances of that ERE
a+ <-- match one or more a's
* now, a ? is special,
and after an ERE, it means match ZERO or ONE
instances of that ERE
^a?$ <-- matches an empty line or a line with just 1 a
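a sketch comparing + and ? under grep -E, plus what happens to + without -E (sample lines are made up):

```shell
# ERE: + means one or more of the preceding
plus=$(printf '%s\n' '' 'a' 'aa' 'b' | grep -E -c 'a+')
echo "$plus"        # prints 2  (a and aa)

# ERE: ? means zero or one of the preceding
optional=$(printf '%s\n' '' 'a' 'aa' 'b' | grep -E -c '^a?$')
echo "$optional"    # prints 2  (the empty line and a)

# BRE (no -E): + is an ordinary character, so a+ means
# "an a followed by a literal plus sign"
literal=$(printf '%s\n' 'a+' 'aa' | grep -c 'a+')
echo "$literal"     # prints 1  (only the line a+)
```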
* unescaped parentheses are used for
grouping subexpressions --
and | means OR between the alternatives on
either side of it (unescaped parentheses limit
how far the OR reaches)
ex: (cat|dog) matches a line including either
cat or dog
ex: (cat|dog)(fish|fight) matches a line including
catfish, catfight, dogfish, or dogfight
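to try the grouping and | examples above, here is a sketch with made-up sample lines:

```shell
# every sample line includes cat or dog somewhere
either=$(printf '%s\n' 'catfish' 'dogfight' 'catdog' 'fishcat' |
    grep -E -c '(cat|dog)')
echo "$either"    # prints 4

# two groups in a row: cat or dog, IMMEDIATELY followed by fish or fight
pairs=$(printf '%s\n' 'catfish' 'dogfight' 'catdog' 'fishcat' |
    grep -E '(cat|dog)(fish|fight)')
echo "$pairs"
# prints:
# catfish
# dogfight
```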
* since + ? | ( ) are now special,
you need to escape them to match JUST + ? | ( )
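a sketch showing the difference escaping makes under grep -E (sample_lines is a hypothetical helper that just prints some made-up test lines):

```shell
# Hypothetical helper: print four made-up test lines.
sample_lines() { printf '%s\n' '1+1=2' '11=2' 'f(x)' 'fx'; }

# \+ matches a literal plus sign
escaped=$(sample_lines | grep -E '1\+1')
echo "$escaped"      # prints 1+1=2

# unescaped + means "one or more 1's", then another 1
unescaped=$(sample_lines | grep -E '1+1')
echo "$unescaped"    # prints 11=2

# \( and \) match literal parentheses
parens=$(sample_lines | grep -E 'f\(x\)')
echo "$parens"       # prints f(x)
```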
* Command-line arguments in bash shell scripts
* bash sets certain special parameters (you use
them like variables) for you within a shell script;
$0 is the name of the script
$1 is the first command line argument
$2 is the second command line argument
...
$# is the number of command line arguments
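$@ and $* contain all the command line arguments
(they differ when double-quoted: "$@" keeps each
argument as a separate word, "$*" joins them all
into a single word)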
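a minimal sketch of a script that uses these (it writes a throwaway script to a temporary file with mktemp and runs it with bash; the temp-file approach and the argument values alpha, "beta gamma", delta are just assumptions for the demo):

```shell
# Write a tiny demo script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
echo "script: $0"
echo "first:  $1"
echo "second: $2"
echo "count:  $#"
# "$@" in double quotes keeps each argument as one word
for arg in "$@"; do
    echo "arg: $arg"
done
EOF

# Run it with three arguments; the second one contains a space.
output=$(bash "$script" alpha "beta gamma" delta)
echo "$output"

rm -f "$script"
```

here $# is 3, and the loop prints three arg: lines, with beta gamma staying together as a single argument; the script: line shows whatever temporary path mktemp chose.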