Wonderment of Regular Expressions

Regular expressions are awesome! And useful.

I first learned about regular expressions when I started learning to program. I didn’t immediately understand how useful and powerful they are; they were confusing and intimidating. MSDN.Microsoft.com states that “a regular expression is a pattern that the regular expression engine attempts to match in input text.” In short, regular expressions (often shortened to RegEx) are used for pattern matching and are invaluable for string searches. I’ve been using them with Python, but most programming languages support them, and the syntax differs slightly from one language to the next. They’re concise magic, like an ancient art that has been rediscovered from obscurity. As you can tell, I’ve come to really love regular expressions. They’re powerful and underestimated.

Let’s look at some common uses and examples:

 

Phone numbers!

\d\d\d-\d\d\d-\d\d\d\d

\d stands for any digit 0-9, so this pattern only matches numbers written in the format 555-555-5555.

\d{3}-\d{3}-\d{4}

This one is equivalent to the one above. Curly brackets { } specify how many times the preceding item repeats. In this case, \d (any digit) repeats {3} times, and so on.
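Here’s a minimal sketch that checks that claim with Python’s re module. The sample strings are made up for illustration, and I’m using re.fullmatch so that the whole string has to fit the pattern.

```python
import re

# The long-hand and quantifier versions of the same phone pattern.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

# Made-up sample inputs; only the first is in the 555-555-5555 format.
samples = ['555-555-5555', '555-5555', '5555555555', '555-555-555a']

for text in samples:
    # fullmatch() only succeeds when the entire string fits the pattern.
    long_ok = long_form.fullmatch(text) is not None
    short_ok = short_form.fullmatch(text) is not None
    print(text, long_ok, short_ok)
```

Both columns should agree on every line, since the two patterns describe the same thing.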

\(??(\d{3})\)??[-\.\s]??(\d{3})[-\.\s]??(\d{4})

This one accepts numbers in several formats, e.g. (555)-555-5555, 555-555-5555, 5555555555, and 555.555.5555. You get the idea. ?? is a quantifier that matches between 0 and 1 times, so whatever it follows may or may not be part of the match. (Strictly speaking, ?? is the lazy form of the optional quantifier ?; it tries zero occurrences first, and a plain ? would accept the same numbers here.) The parentheses around the area code are a good example: some people add them and others leave them out. Also important to note: a backslash in front of a symbol such as ( or . means that character is taken literally. In front of certain letters it does the opposite and creates a shorthand, as in \d for a digit or \s for whitespace. The backslash at the very beginning of the regular expression means that the first parenthesis is not a grouping; I’m telling regex to literally look for an opening parenthesis as part of my match. The same applies to the other symbols that I want taken literally.
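To see the flexible pattern at work, here’s a small sketch that runs it against the four formats listed above. The second pattern, with the plain ? quantifier, is my own variation added for comparison; it isn’t from the original project.

```python
import re

# The flexible pattern from above, unchanged (lazy ?? quantifiers).
phone_lazy = re.compile(r'\(??(\d{3})\)??[-\.\s]??(\d{3})[-\.\s]??(\d{4})')
# Variation for comparison: the same pattern with the greedy ? quantifier.
phone_greedy = re.compile(r'\(?(\d{3})\)?[-\.\s]?(\d{3})[-\.\s]?(\d{4})')

samples = ['(555)-555-5555', '555-555-5555', '5555555555', '555.555.5555']

for text in samples:
    lazy = phone_lazy.fullmatch(text)
    greedy = phone_greedy.fullmatch(text)
    # groups() returns the three captured digit runs as a tuple.
    print(text, lazy.groups(), greedy.groups())
```

The three unescaped pairs of parentheses are groupings, so each match hands back the area code, prefix, and line number separately.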

Email!
Perhaps you’re looking for email addresses or phone numbers on a long page of text. We know that an email address follows a basic pattern: “something @ something .com”. I looked at a lot of different regexes for email and didn’t like most of them. This one fits most criteria.

([_a-zA-Z0-9-.+]+)*@([_a-zA-Z0-9-.+]+)*(\.[a-zA-Z]{2,4})

Okay, I’ll try to quickly go through this crazy-looking thing. Parentheses are groupings. The characters between [ ] form a set: it accepts any one of lowercase a-z, capital A-Z, digits 0-9, and the symbols _ - . +. The + quantifier matches the preceding item between 1 and an unlimited number of times, and * matches it between 0 and an unlimited number of times. The last group, (\.[a-zA-Z]{2,4}), matches a literal dot followed by 2 to 4 letters, so endings such as .com or .org; longer top-level domains will not match in full.
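Here’s a short sketch of pulling addresses out of a block of text with that pattern. The sample sentence and addresses are made up. I’m using finditer() with group() rather than findall(), because with three groupings in the pattern findall() would return tuples of the group contents instead of the whole address.

```python
import re

# The email pattern from above, unchanged.
email = re.compile(r'([_a-zA-Z0-9-.+]+)*@([_a-zA-Z0-9-.+]+)*(\.[a-zA-Z]{2,4})')

# Made-up sample text with two addresses in it.
text = 'Mail bob@example.com or amy.lee@mail.org today.'

# finditer() yields one match object per address; group() with no
# arguments returns the whole matched text rather than the sub-groups.
found = [m.group() for m in email.finditer(text)]
print(found)  # ['bob@example.com', 'amy.lee@mail.org']
```

One caution: a quantified group inside another quantifier, like (…+)*, can make the engine backtrack heavily on long runs of text that don’t end up matching, so I’d try it on short inputs first.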

Let’s look at times I’ve used these in my projects

Searching for a specific word or set of words:

Grouping can come in handy when you want to search for groups of words. Parentheses ( ) signify a grouping.

In this program, I was searching a network for specific models of Cisco IP phones. I used a regular expression to find splash pages with “ip phone” somewhere on the page.

Link to lines in project.

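import re

verify_cisco = re.compile(r'(ip phone)', re.I)
mo = verify_cisco.search(r.text)  # r is assumed to be the response for the fetched page
if mo:  # search() returns None when nothing matches
    print('Cisco ' + mo.group())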

My regular expressions are often just basic word searches. In this one you see ‘re.I’. That’s a flag that tells the expression to ignore case. Remember, programs and computers aren’t smart and they’ll only do what you tell them to do! You have to think in a precise manner. In Python, successful calls to re.match() and re.search() return “match objects”. A match object has a group() method that returns the text that matched. Let’s quickly delineate the differences between re.match, re.search, and re.findall.

re.match() only succeeds if the pattern matches at the very beginning of the string.

re.match(pattern, string, flags=0)

re.match("sea", "Sally sells sea shells by the shiny sea shore") = FAIL
Returns None, so adding .group() would raise an error
re.match("Sally", "Sally sells sea shells by the shiny sea shore").group() = SUCCESS
Returns "Sally"

re.search() scans the whole string and returns only the first successful match it comes across.

re.search(pattern, string, flags=0)

re.search("sea", "Sally sells sea shells by the shiny sea shore").group() = SUCCESS
Returns "sea"

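re.findall() returns a list of every non-overlapping match in the string. If the pattern contains groupings, the list holds the text of the groups instead.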

re.findall(pattern, string, flags=0)

re.findall("sea", "Sally sells sea shells by the shiny sea shore")
Returns ["sea", "sea"]
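Here are the three functions side by side as one runnable sketch, using the same sentence. The last line is an extra pattern of my own, added only to show what findall() does when the pattern contains a grouping.

```python
import re

sentence = 'Sally sells sea shells by the shiny sea shore'

# match() only looks at the start of the string.
print(re.match('sea', sentence))            # None: "sea" is not at the start
print(re.match('Sally', sentence).group())  # Sally

# search() scans forward and stops at the first hit.
first = re.search('sea', sentence)
print(first.group(), first.start())         # sea 12

# findall() collects every non-overlapping hit as a list of strings.
print(re.findall('sea', sentence))          # ['sea', 'sea']

# With a grouping in the pattern, findall() returns the group's text instead.
print(re.findall(r's(ea|he)', sentence))    # ['ea', 'he', 'ea']
```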

If you want a match on any one of several words, the | operator functions as “or”.

Link to lines in project.

typeRegex = re.compile(r'(7942G|7962G|7911G|7925G)')
mo2 = typeRegex.search(r.text)

Here I’m performing a similar search, but in the grouping I use the “or” operator to signify a match if one of the patterns is present. I was looking for any of these phone models so if the text I was looking at contained one of these then the match was successful.
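A self-contained sketch of the same idea follows. The page snippets are hypothetical stand-ins for r.text, since the real pages came from the network I was scanning, and type_regex is just a renamed copy of the pattern.

```python
import re

# Same alternation as above: any one of these model numbers is a hit.
type_regex = re.compile(r'(7942G|7962G|7911G|7925G)')

# Hypothetical page snippets standing in for r.text.
pages = [
    'Cisco Unified IP Phone 7962G',
    'Model: 7911G, firmware up to date',
    'Generic printer status page',
]

for page in pages:
    mo = type_regex.search(page)
    # search() returns None when none of the alternatives is present.
    model = mo.group() if mo else 'no match'
    print(page, '->', model)
```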

Keep in mind that unsuccessful matches return None (Python’s null value). If you put the .group() method at the end of an unsuccessful re.match or re.search call, you’ll get an AttributeError, because you’re attempting to call a method on a None value, which has no such method. Remember that only the match object has the .group() method, so you’ll have to check for None or catch those kinds of errors.
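One way to guard against that is to check for None before calling .group(). This is a minimal sketch; describe() is a hypothetical helper I made up for the example, and the page text is invented.

```python
import re

verify_cisco = re.compile(r'(ip phone)', re.I)

def describe(page_text):
    """Return a label for the page, or None when nothing matched."""
    mo = verify_cisco.search(page_text)
    if mo is None:
        # No match object, so calling mo.group() here would raise
        # AttributeError: 'NoneType' object has no attribute 'group'.
        return None
    return 'Cisco ' + mo.group()

print(describe('Welcome to your Cisco IP Phone'))  # Cisco IP Phone
print(describe('Just a router login page'))        # None
```

Wrapping the call in try/except AttributeError works too, but the explicit check makes it obvious which case you’re handling.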

These are some easy RegEx searches. They obviously get much more complicated, but I’ll leave that for another post. I’ll leave some good regular expression resources below.

https://regex101.com/#python
http://pythex.org/
https://www.debuggex.com/
http://regexr.com/
https://regexcrossword.com/
http://regexone.com/
https://regex.netlify.com/cheat-sheet

 

I use Regex101 quite a bit. It has a unit test section that comes in really handy when you have a lot of possibilities you want to test a regular expression against. Instead of rewriting them over and over, you can save the tests and run your regexes against them in batches. It’s pretty great! Here’s a screenshot from my tests of the phone number example above.

 

I really liked “Automate the Boring Stuff with Python” by Al Sweigart. The book has a great chapter on regular expressions that helps shed light on the subject. Link to the chapter. Be sure to support the author.

If you have some good resources for regular expressions, want to share a project you used them in, or have any suggestions or questions, send me an email! Thanks for reading.
