Regular Expressions

Flavio daCosta ~ n0p@n0p.net

Kyle Rankin ~ greenfly@greenfly.org

What? Why?

What are they?

Why would you use them?

Where/How do you use them?

Shell Globbing

[]$ ls /var/websites/site[35]/httpd/logs/*.log
/var/websites/site3/httpd/logs/access.log
/var/websites/site3/httpd/logs/error.log
/var/websites/site5/httpd/logs/access.log
/var/websites/site5/httpd/logs/error.log
[]$

Literals

/tom/

tomato
automobile
atom

Metacharacters: Wildcard

. Match any character

/chu.e/              /be.r/  

parachute            beer
schuler              beard
brochure             betray

Metacharacters: Anchors

/^p..ne$/

plane
paine
prune
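
The effect of the two anchors is easiest to see by running the same pattern with and without them. This is a minimal sketch using Python's re module as a stand-in for the /.../ notation on the slides; the helper name matching_words and the extra words in the list are made up for illustration.

```python
import re

def matching_words(pattern, words):
    """Return the words in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    return [word for word in words if regex.search(word)]

# Hypothetical word list: the three slide examples plus a few near misses.
words = ["plane", "paine", "prune", "planes", "airplane", "pine"]

anchored = matching_words(r"^p..ne$", words)   # must span the whole word
unanchored = matching_words(r"p..ne", words)   # may match anywhere inside

print(anchored)
print(unanchored)
```

Without ^ and $ the pattern also finds "plane" inside "planes" and "airplane"; "pine" fails either way because each dot must consume exactly one character.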

Metacharacters: Character Class

/^gr[ea]y$/

gray
grey

Character Class - Metacharacters

Note: . is a literal dot inside a character class

/p[^aeiouAEIOU][a-eA-E]r/

encipher
intercepter
atmospHere
APPARENT (no match: the pattern requires a lowercase p)
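
To see which part of each word the class pattern picks out, the sketch below prints the matched text. It uses Python's re module, which accepts this pattern unchanged; the helper name matched_text and the case-insensitive run are additions for illustration, not part of the slide.

```python
import re

PATTERN = r"p[^aeiouAEIOU][a-eA-E]r"

def matched_text(word, flags=0):
    """Return the part of word that the pattern matched, or None."""
    match = re.search(PATTERN, word, flags)
    return match.group(0) if match else None

for word in ["encipher", "intercepter", "atmospHere", "APPARENT"]:
    print(word, "->", matched_text(word))

# With case-insensitive matching the literal p and r also accept P and R.
print(matched_text("APPARENT", re.IGNORECASE))

# Inside a class the dot is literal: [.] matches "." but not "b".
print(bool(re.search(r"a[.]c", "a.c")), bool(re.search(r"a[.]c", "abc")))
```

The classes handle upper and lower case because both ranges are listed; the p and r outside the classes stay case-sensitive unless a flag says otherwise.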

Metacharacters: Alternation

/this|that/

get this thing
this presentation is cool
I need that
get that stuff

Note: Grouping with () is discussed later.

Metacharacters: Quantifiers

/^Subject: N?[0-9]+.*$/

Subject: 5 Cool Things
Subject: 5551212 is my number
Subject: N535GT

Metacharacters: Quantifiers

/^[0-9]{1,3},?[0-9]{3}$/

1,247
32,456
673419

Matches 4-6 digits with an optional comma before the last three.
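
The bounds of {1,3} and {3} can be checked by feeding the pattern numbers that are just too short or too long. A minimal sketch in Python's re module; the helper name is_number and the failing inputs are made up for illustration.

```python
import re

NUMBER = re.compile(r"^[0-9]{1,3},?[0-9]{3}$")

def is_number(text):
    """True if text is 4-6 digits with an optional comma before the last 3."""
    return NUMBER.search(text) is not None

# The three slide examples.
print([is_number(s) for s in ["1,247", "32,456", "673419"]])

# Hypothetical near misses: too few digits, too many digits,
# and a comma in the wrong place.
print([is_number(s) for s in ["123", "1234567", "12,34"]])
```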

Metacharacters: SubExpression Grouping

/gr(e|a)y/

grey
gray

/get (this|that)/

get this thing
get that stuff

Earlier Example

Note: using /gr[ea]y/ would be equivalent

Metacharacters: SubExpression Grouping

/^(anti)?social$/

antisocial
social

/^([0-9]{3}[-. ]?){1,2}[0-9]{4}$/

248-555-1212
313.555.1212
5865551212
734 2997650
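
A quantifier applied to a group repeats the whole group, which is what lets one pattern accept both separated and unseparated phone numbers. The sketch below tries it in Python's re module; the helper name is_phone and the inputs beyond the four slide examples are made up for illustration.

```python
import re

PHONE = re.compile(r"^([0-9]{3}[-. ]?){1,2}[0-9]{4}$")

def is_phone(text):
    """True if text is 3 digits (once or twice) followed by 4 digits."""
    return PHONE.search(text) is not None

samples = ["248-555-1212", "313.555.1212", "5865551212", "734 2997650"]
print([is_phone(s) for s in samples])

# The group may repeat once or twice, so a 7-digit local number also passes.
print(is_phone("555-1212"))

# A repeated group only remembers its last repetition.
print(PHONE.search("248-555-1212").group(1))
```

Because the group is captured, only its final repetition is available afterwards, which matters once backreferences come into play.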

Metacharacters: SubExpression Grouping

/([a-z]{3})\1/


assassin
counterterrorist
cringing
Tsutsutsi
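
The backreference \1 must match the exact text the first group captured, not just anything that fits [a-z]{3}. A minimal sketch in Python's re module, which uses the same \1 syntax; the helper name doubled_part and the word "banana" are made up for illustration.

```python
import re

DOUBLED = re.compile(r"([a-z]{3})\1")

def doubled_part(word):
    """Return (captured group, whole match) or None if nothing is doubled."""
    match = DOUBLED.search(word)
    return (match.group(1), match.group(0)) if match else None

for word in ["assassin", "counterterrorist", "cringing", "Tsutsutsi", "banana"]:
    print(word, "->", doubled_part(word))
```

"banana" fails even though it repeats letters, because no three-letter run is immediately followed by the same three letters.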

Metacharacters: Backslash

Shorthands

Shorthands - POSIX (non-exhaustive list)

Metacharacters: More Anchors

Metacharacters: Non Greedy Quantifiers (PCRE)
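
A quantifier is greedy by default and takes as much as it can; adding ? after it makes it take as little as it can. The sketch below illustrates the difference with Python's re module, which uses the same +? and *? syntax as PCRE; the HTML-like sample string is made up for illustration.

```python
import re

# Hypothetical sample text.
text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.+>", text).group(0)    # runs to the LAST >
lazy = re.search(r"<.+?>", text).group(0)     # stops at the FIRST >
all_tags = re.findall(r"<.+?>", text)         # every tag, one at a time

print(greedy)
print(lazy)
print(all_tags)
```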

Search for Perl comments

Problem:

Search through an IRC log file for all comments I have made containing a Perl one-liner.

Solution:

Perl one-liners always contain perl -e (in some form or another), so match that along with the usual regular expression delimiters, //. Then make sure that I'm the one who said the one-liner by anchoring on the timestamp at the beginning of each line and matching my nick right after it.

Sample line:
14:16 @     greenfly| tail -f /var/log/mail.log | 
perl -ne 'print scalar((split)[6]),"\n" if(/imapd.*LOGIN/);'


Parser:
egrep "^..:.. [@%+]? *greenfly.*perl -.?e.*/.*/" .irssi/irclogs/ars/#linux.log
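
The pattern can be checked against the sample line before it is turned loose on a whole log. This is a sketch in Python's re module, which accepts the egrep pattern unchanged; the helper name is_my_one_liner and the three extra log lines are made up for illustration.

```python
import re

# Same pattern as the egrep command, written as a Python raw string.
ONE_LINER = re.compile(r"^..:.. [@%+]? *greenfly.*perl -.?e.*/.*/")

def is_my_one_liner(line):
    """True if the line is a greenfly comment containing a Perl one-liner."""
    return ONE_LINER.search(line) is not None

# The sample line from the slide, joined back into one line.
sample = (r"""14:16 @     greenfly| tail -f /var/log/mail.log | """
          r"""perl -ne 'print scalar((split)[6]),"\n" if(/imapd.*LOGIN/);'""")

# Hypothetical lines that should be rejected.
other_nick = "14:20 @     someone| perl -e 'print 1 if /x/'"
no_one_liner = "14:21 @     greenfly| I like perl a lot"
no_regex = "14:22       greenfly| perl -e 'print 42'"

for line in [sample, other_nick, no_one_liner, no_regex]:
    print(is_my_one_liner(line))
```

The last line is rejected only because the pattern insists on a pair of slashes after perl -e, so one-liners without a regular expression are skipped.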

Grab Country Codes

Problem:

Grab the current list of assigned country codes from iana.org.

Solution:

Pipe the output of wget through a regular expression, grabbing the country and TLD

Sample line:        

>.bg  –  Bulgaria</A>

Parser:

[]$ wget -q -O - http://www.iana.org/cctld/cctld-whois.htm \
    | perl -ne 'print "$1\t$2\n" if />\.(\w\w).*?(\w+)</;'
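
The lazy .*? is what skips the separator between the TLD and the country name. The sketch below applies the same pattern to the sample line with Python's re module; the helper name parse_cctld and the New Zealand line are made up for illustration.

```python
import re

CCTLD = re.compile(r">\.(\w\w).*?(\w+)<")

def parse_cctld(line):
    """Return (tld, country) from one line of the page, or None."""
    match = CCTLD.search(line)
    return match.groups() if match else None

# The sample line from the slide (\u2013 is the dash between the fields).
sample = ">.bg  \u2013  Bulgaria</A>"
print(parse_cctld(sample))

# Hypothetical multi-word country: (\w+)< can only capture the last word.
print(parse_cctld(">.nz  \u2013  New Zealand</A>"))
```

The second call shows a limitation of the pattern as written: \w does not match spaces, so only the final word of a multi-word country name is captured.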

Check Terror Level

Problem:

Check the current Department of Homeland Security Terror Level from the command line.

Solution:

The front page for the Department of Homeland Security displays an image corresponding to the current Terror Level. wget the Homeland Security page and pipe it through a regular expression, grabbing the alt attribute of that image.

Sample line:
src="/homeland/images/threat/elevated.jpg" alt="Elevated" border="0"


Parser:
[]$ wget -q -O - http://www.whitehouse.gov/homeland/ \
    | perl -ne 'print "Terror Level is: $1\n"
    if /src=\"\/homeland\/images\/threat\/\w+\.jpg\" alt=\"(\w+)\"/;'

Monitor IMAP logins

Problem:

You want to monitor which users are currently logging into imap in real time.

Solution:

Tail the log file and, for each IMAP login line, print the user field (which happens to be the 7th whitespace-separated field).

Sample line:
Apr 11 20:35:34 napoleon imapd-ssl: LOGIN, user=greenfly,
ip=[::ffff:192.168.0.11]

Parser:
[]$ tail -f /var/log/mail.log \
    | perl -ne 'print scalar((split)[6]),"\n" if(/imapd.*LOGIN/);'
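
The field arithmetic is easy to get wrong by one, so it helps to split the sample line by hand. A minimal sketch in Python, whose str.split() with no arguments splits on runs of whitespace like Perl's bare split; the helper name login_user_field and the non-login line are made up for illustration.

```python
import re

def login_user_field(line):
    """Return the 7th whitespace-separated field of an IMAP login line."""
    if re.search(r"imapd.*LOGIN", line):
        fields = line.split()
        if len(fields) > 6:
            return fields[6]      # index 6 is the 7th field
    return None

# The sample line from the slide, joined back into one line.
sample = ("Apr 11 20:35:34 napoleon imapd-ssl: LOGIN, user=greenfly, "
          "ip=[::ffff:192.168.0.11]")
print(login_user_field(sample))

# Hypothetical line from another service: ignored.
print(login_user_field("Apr 11 20:35:40 napoleon postfix/smtpd: connect"))
```

As in the Perl version, the field comes back exactly as logged, including the user= prefix and the trailing comma.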
 

Keep Logs in Sequence

Problem:

You create a large Apache log file for the quarter by concatenating the weekly Apache logs, and you want to make sure that all of the logs went in in sequence.

Solution:

Match the date in each Apache log entry, and print it whenever it differs from the date on the previous line. For a correctly assembled log, the output is a list of dates in proper sequence. If you want to get more sophisticated, you could use something like Date::Calc and compare the dates in the program itself.

Sample line:
foo.example.com - - [11/Apr/2004:12:52:26 -0700] 
"GET /fujitsu/screenshot1_s.jpg HTTP/1.1" 200 37946 
"http://www.greenfly.org/fujitsu/" "Mozilla/5.0 (X11; U;
Linux i686; en-US; rv:1.6) Gecko/20040327 Firefox/0.8"

Parser:
[]$ cat Audit_2003_Qtr_1_full_log \
    | perl -ne 'm|\[(\d+/\w{3}/\d+)|; 
		print "$1\n" if($1 ne $last);
		$last = $1;'
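
The "print only when the date changes" logic can be sketched outside the one-liner as follows. This uses Python's re module with the same date pattern; the function name date_changes and the shortened log entries are made up for illustration.

```python
import re

DATE = re.compile(r"\[(\d+/\w{3}/\d+)")

def date_changes(lines):
    """Return each date that differs from the date on the previous entry."""
    changes = []
    last = None
    for line in lines:
        match = DATE.search(line)
        if not match:
            continue              # no date on this line: nothing to compare
        if match.group(1) != last:
            changes.append(match.group(1))
        last = match.group(1)
    return changes

# Hypothetical, shortened log entries; the last one is out of sequence.
log = [
    'a.example.com - - [11/Apr/2004:12:52:26 -0700] "GET / HTTP/1.1" 200 1',
    'b.example.com - - [11/Apr/2004:13:01:02 -0700] "GET / HTTP/1.1" 200 1',
    'c.example.com - - [12/Apr/2004:00:00:07 -0700] "GET / HTTP/1.1" 200 1',
    'd.example.com - - [11/Apr/2004:09:15:44 -0700] "GET / HTTP/1.1" 200 1',
]
print(date_changes(log))
```

A date that shows up twice in the output, as 11/Apr/2004 does here, means a log was concatenated out of order.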

Grep for Comments

Problem:

You want to grep an entire C source code tree for comments and dump them to a text file, along with where you found them:

Solution:

#!/usr/bin/perl

#########################################################
# this script will grep out any "C-style" comments      #
# from a file.  C-Style being /* */ or //               #
#                                                       #
# usage: comment_grep <filename>                        #
#                                                       #
# to recursively traverse a directory with this:        #
# find . -type f -exec comment_grep {} \; > /tmp/foo    #
#                                                       #
#########################################################

$infile = $ARGV[0]; # note the filename, but leave it in @ARGV so <> reads it

undef $/;       # no record separator: read the whole file in one go

$found = 0;
while(<>)
{
# this pattern will match /* */ comments
   while(
	 m{
	 (/\*              # match the /*
	 .*?               # match as little as possible, up to the first
	 \*/)              # closing */
	 }sgx              # s lets . match newlines, g finds every comment,
	                   # x allows whitespace and comments in the regex
     )
   {
      unless($found)    # list what file you found this in,
                        # if you find anything
      {
	 print "\nfrom $infile:\n";
      }
      print "$1\n";     # print /* */ and anything in between
      $found = 1;
   }
# this pattern will match // comments
   while(
	 m{
	 (//.*\n)          # match // then anything, up to a newline
	 }gx               # g finds every comment, x allows comments
     )
   {
      unless($found)    # list what file you found this in, if you
                        # find anything.
      {
	 print "\nfrom $infile:\n";
      }
      print "$1";       # print // and the rest of the line
      $found = 1;
   }
}

Add Backticks to SQL Queries

Problem:

Take a file full of SQL queries and surround the values in the first paren only with backticks. As the sample queries show, there aren't a set number of values in the query.

Solution:

Grab all the values out of the first set of parens, use Perl's split and join to surround each value with backticks, and print out the result.

Sample lines (each query is a single line in the file; wrapped here to fit):
INSERT INTO seen (nick, "when", "where", what) 
    VALUES ('RiSE', '2003-08-03 20:42:04-05', '#foo', 'Part');

INSERT INTO seen (nick, "when", "where", what, something) 
    VALUES ('RiSE', '2003-08-03 20:42:04-05', '#foo', 'Part');

INSERT INTO seen (nick, "when", "where", what, something, else) 
    VALUES ('RiSE', '2003-08-03 20:42:04-05', '#foo', 'Part');

Parser:

#!/usr/bin/perl

while(<>)
{
# if the line has parens then process it, otherwise just print the line out
   if(/^([^(]+\()	# match some number of nonparens, then a paren,
                        # and put it into the $1 variable

	    (.*?)	# match any character, but as few as possible, so you
                        # don't go past the very next closing paren.  Then put
			# what you match in the $2 variable

	    (\).*$)	# match a closing paren, and anything else up to the
			# end of the line.  put it in the $3 variable
	    /x)
	 {
	 $a=$1;		# store these temporary variables somewhere
	 $b=$2;
	 $c=$3;

	 $b =~ s/"//g;	# strip any quotes (could strip other 'bad' chars here)

	 $b = "`" . 	# start with an opening backtick
	 join("`, `", 	# join each item from the array output from the 
			# following split command into a single string, with	
			# each item separated with `, `

	    split(/,\s*/, $b))	# this split command takes each comma
				# delimited item and splits it into an
				# individual string in an array

	 . "`"; 		# put the closing ` on the string


	 print "$a$b$c\n";
	 }
   else
   {
      print;
   }
}
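
The capture, split, and join steps can be tried on a single query like this. The sketch is in Python rather than Perl, with re.split and str.join standing in for Perl's split and join; the function name add_backticks is made up for illustration and the query is the first sample written on one line.

```python
import re

# head up to the first "(", its contents (lazy), then ")" and the rest
FIRST_PARENS = re.compile(r"^([^(]+\()(.*?)(\).*)$")

def add_backticks(line):
    """Surround each value in the first set of parens with backticks."""
    match = FIRST_PARENS.search(line)
    if not match:
        return line                      # no parens: leave the line alone
    head, cols, tail = match.groups()
    cols = cols.replace('"', "")         # strip any double quotes
    cols = "`" + "`, `".join(re.split(r",\s*", cols)) + "`"
    return head + cols + tail

query = ("INSERT INTO seen (nick, \"when\", \"where\", what) "
         "VALUES ('RiSE', '2003-08-03 20:42:04-05', '#foo', 'Part');")
print(add_backticks(query))
```

Because the middle group is lazy, it stops at the first closing paren and the VALUES list passes through untouched.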

Create reverse DNS zone

Problem:

You have a forward DNS zone file and need to create the matching reverse zone file.

Solution:

s/^([^ \t]*?)(\.binaryservice\.com\.)?
 [ \t]*(IN)?[ \t]+A[ \t]+([0-9]{1,3})
 \.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})
 /$7.$6.$5.$4.in-addr.arpa. \tPTR \t$1.binaryservice.com./gx;

Before:
mail                    IN A    172.24.1.25
ns.binaryservice.com.   IN A    172.24.1.26

# Spaces
photon                    IN A    172.24.1.27
bunyip                    IN   A    172.24.1.28

# Tabs instead of spaces
meeker          IN A    172.24.1.29
meeker          A       172.24.1.30 

# Tabs and spaces
yuan            IN       A       172.24.1.31
crank           IN      A   172.24.1.32

After:
25.1.24.172.in-addr.arpa.       PTR     mail.binaryservice.com.
26.1.24.172.in-addr.arpa.       PTR     ns.binaryservice.com.
 
# Spaces
27.1.24.172.in-addr.arpa.       PTR     photon.binaryservice.com.
28.1.24.172.in-addr.arpa.       PTR     bunyip.binaryservice.com.
  
# Tabs instead of spaces
29.1.24.172.in-addr.arpa.       PTR     meeker.binaryservice.com.
30.1.24.172.in-addr.arpa.       PTR     meeker.binaryservice.com.
   
# Tabs and spaces
31.1.24.172.in-addr.arpa.       PTR     yuan.binaryservice.com.
32.1.24.172.in-addr.arpa.       PTR     crank.binaryservice.com.
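
The substitution works by capturing the host name and the four octets, then writing the octets back in reverse order. A minimal sketch of the same idea in Python's re module, where re.VERBOSE plays the role of the x flag and \1 to \7 replace $1 to $7; the function name to_ptr is made up and the output uses single tabs around PTR.

```python
import re

A_RECORD = re.compile(
    r"""^([^ \t]*?)(\.binaryservice\.com\.)?     # 1: host, 2: optional domain
        [ \t]*(IN)?[ \t]+A[ \t]+([0-9]{1,3})     # 3: optional IN, 4: octet 1
        \.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})  # 5-7: octets 2 to 4
    """,
    re.VERBOSE,
)

def to_ptr(line):
    """Rewrite one forward A record as a reverse PTR record."""
    return A_RECORD.sub(
        r"\7.\6.\5.\4.in-addr.arpa.\tPTR\t\1.binaryservice.com.", line)

print(to_ptr("mail                    IN A    172.24.1.25"))
print(to_ptr("ns.binaryservice.com.   IN A    172.24.1.26"))
print(to_ptr("meeker\t\tA\t172.24.1.30"))
print(to_ptr("# Spaces"))
```

Lines that are not A records, such as the comment lines, do not match and pass through unchanged.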