Regular Expression (Regex) for Beginners

By Ameen Chowdhary, Network Security Engineer at Tec-Refresh, Inc.

Find and Replace

If you’re anything like me, you remember the day you found out how to search for text on a webpage or in a Word document. Someone was probably standing over your shoulder and told you, “You do realize that you can just press Ctrl+F, right?”

As amazing as the search function can be, computers and the programs that run on them are capable of far more than a one-to-one match. The replace feature is also a valuable tool, as it reduces repetitive labor. If programmers want to change the name of a variable, they can search for its name and replace it everywhere it is referenced. This saves time and lets writers, programmers, and engineers focus their time and energy on productive work. Still, as functional and time-saving as search and replace can be, it is hard to claim that a basic search and replace function is powerful.
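To make that limitation concrete, here is a minimal sketch in Python. The variable names count, counter, and total are hypothetical. A plain text replace changes every occurrence of the characters, while a regular expression using the word-boundary token \b changes only the whole word.

```python
import re

source = "count = 0\ncounter = count + 1"

# Plain replace: every occurrence of the characters "count" changes,
# including the one inside the unrelated name "counter".
plain = source.replace("count", "total")

# Regex replace: \b marks a word boundary, so only the whole word "count" changes.
precise = re.sub(r"\bcount\b", "total", source)

print(plain)
print(precise)
```

The plain replace turns counter into totaler, which is exactly the kind of mistake a pattern-aware search avoids.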

In the 1950s, the mathematician Stephen Cole Kleene developed regular expression notation to describe finite automata (FA). The University of Rochester describes finite automata as “a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.”
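As an illustration of the accept-or-reject idea in that definition, the sketch below hand-codes a tiny finite automaton in Python for the hypothetical pattern ab* (an “a” followed by zero or more “b”s) and checks it against Python’s re module. It is a teaching toy built on those assumptions, not a description of how Python’s own regex engine works internally.

```python
import re

# Transition table for a small DFA that accepts exactly the strings matched
# by the regular expression ab* ("a" followed by zero or more "b"s).
# Any (state, symbol) pair missing from the table means "reject".
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
}
ACCEPTING = {"seen_a"}

def dfa_accepts(text: str) -> bool:
    """Feed the input one symbol at a time and report accept or reject."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

for sample in ["a", "abbb", "ba", "", "abab"]:
    # The DFA and Python's regex engine should agree on every input.
    print(repr(sample), dfa_accepts(sample), re.fullmatch(r"ab*", sample) is not None)
```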

Professor Kleene created a set of notations that could represent a specific pattern of text. In the late 1960s, Ken Thompson, one of the creators of the UNIX system, took Professor Kleene’s regular expressions and implemented them in a text editor called ed. Regular expressions, commonly called Regex today, have since been integrated into popular programming languages such as Perl, Java, and Python. Regex is also supported by text editors such as Sublime Text, which we will use to explore the syntax of a search query.

Examples of Working with Regex

To give a practical and straightforward example of Regex, I will show you how I configured the switch ports in the VLAN “Computers” on my Extreme switch. Instead of manually issuing more than a dozen commands one at a time, I began by pasting the ports that needed to be configured from an Excel sheet into Sublime. With the EXOS configuration syntax on the first and last lines, all I needed was an expression that recognized the end of each line, which I achieved with the expression “\n” (the newline character). I then replaced the newline at the end of each port number with a comma. This put my command on one line with the ports separated by commas, so I could copy the text and configure nineteen ports with one command.

[Figure: configuring the switch ports on the VLAN “Computers” on my Extreme switch]
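The same newline-to-comma replacement can be sketched in Python. The port numbers below are hypothetical, and the command template configure vlan Computers add ports ... untagged is an assumed example of EXOS syntax rather than the exact command used above, so check it against your own switch’s documentation before use.

```python
import re

# Hypothetical port list pasted from a spreadsheet, one port per line.
# Text copied on Windows often uses \r\n line endings, so the pattern
# allows an optional \r before each \n.
ports_text = "1\r\n2\r\n3\n5\n8\n"

# Strip the trailing line break first so the result does not end in a comma,
# then replace every remaining line break with a comma.
port_list = re.sub(r"\r?\n", ",", ports_text.strip())

# Assumed EXOS command template; verify the syntax for your switch and firmware.
command = f"configure vlan Computers add ports {port_list} untagged"
print(command)
```

Running it prints configure vlan Computers add ports 1,2,3,5,8 untagged, the same one-line result the Sublime replace produces.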

Many other simple regular expressions are easy to learn and memorize for anyone who works with data. In the next example, we are working with a document that contains a list of website entries.

Suppose we want an expression that identifies only websites in the .com domain. We could use the expression “\.com\>”, which consists of a few distinct elements. First, we begin with a backslash, the escape character, which lets us search for a literal period. A period is a metacharacter: on its own, it has a special meaning (it matches any single character). Therefore, to search for a literal period, we must escape it by typing “\.” and then follow it with the text we want to find. We end with “\>”, which matches the end of a word, so the expression only matches when .com is not followed by another letter or digit. Notice that “.company” is not selected, because more letters follow .com there, so the match does not end at a word boundary.

The expressions we have explored so far are simple to learn; however, regular expressions can also be quite sophisticated. The following example should not be intimidating; rather, it should demonstrate how precisely regular expressions can describe intricate details. Suppose we want to identify IPv4 addresses in a text file. The expression must match exactly four octets and accept only values from 0 to 255 in each octet. A quick Google search for a commonly used expression is an excellent reminder that you don’t need to create expressions out of thin air to work with them.
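The .com search described above can be reproduced in Python, with one caveat: Python’s re module does not support the \< and \> word anchors that Sublime Text accepts, so this sketch uses \b (a word boundary) in their place. The site names are made up for illustration, and the second pattern deliberately leaves the period unescaped to show why the backslash matters.

```python
import re

# Hypothetical website entries.
entries = ["example.com", "shop.example.com/cart", "example.company", "dotcom"]

# Escaped period: matches a literal ".com" that ends at a word boundary.
escaped = re.compile(r"\.com\b")
# Unescaped period: "." matches ANY character, so "tcom" in "dotcom" also matches.
unescaped = re.compile(r".com\b")

escaped_hits = [e for e in entries if escaped.search(e)]
unescaped_hits = [e for e in entries if unescaped.search(e)]

print(escaped_hits)    # ['example.com', 'shop.example.com/cart']
print(unescaped_hits)  # ['example.com', 'shop.example.com/cart', 'dotcom']
```

Like \>, the \b here only requires that no letter, digit, or underscore follows .com, so an entry such as shop.example.com/cart still counts as a match.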

Creating Regular Expressions

Compared with the previous examples, this expression is quite sophisticated; however, it satisfies our criteria: it rejects any octet greater than 255 and any entry with five octets.
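The exact expression found through that search is not reproduced here, so the Python sketch below uses one commonly cited way of writing an octet, which may differ from it. Each octet is one of 250-255, 200-249, 100-199, or 0-99, and a leading zero such as 01 is rejected by design. The helper names is_ipv4 and find_ipv4 are hypothetical.

```python
import re

# One commonly cited octet pattern: 250-255 | 200-249 | 100-199 | 0-99.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Exactly four octets separated by literal (escaped) periods.
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"

def is_ipv4(candidate: str) -> bool:
    """True only if the entire string is a dotted-quad address."""
    return re.fullmatch(IPV4, candidate) is not None

# When scanning free text, lookarounds keep a match from starting or ending
# inside a longer run such as "1.2.3.4.5" or "256.1.1.1".
IPV4_IN_TEXT = re.compile(rf"(?<!\d)(?<!\d\.){IPV4}(?!\.?\d)")

def find_ipv4(text: str) -> list[str]:
    """Return every standalone IPv4 address found in the text."""
    return IPV4_IN_TEXT.findall(text)

print(is_ipv4("192.168.1.10"), is_ipv4("256.1.1.1"), is_ipv4("10.0.0.1.5"))
print(find_ipv4("Hosts: 192.168.1.10, 10.0.0.1. Bad: 256.1.1.1 and 1.2.3.4.5"))
```

Splitting the octet into its own string keeps the four-octet structure readable, which is one practical way to tame an expression of this size.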

Regex Cheat Sheet

There are many cheat sheets available online to help you understand the symbols used in regular expressions. I have reformatted portions of the cheat sheet posted on Stanford’s website to create an introductory cheat sheet.
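The cheat sheet itself is not reproduced here, but as an independent illustration (not taken from the Stanford sheet), the short Python sketch below pairs a few common symbols with a made-up sample string and the text each one finds.

```python
import re

# (pattern, sample text, what re.search should find)
examples = [
    (r"\d+", "Port 8080 open", "8080"),     # \d = a digit, + = one or more
    (r"\w+", "vlan_10 up", "vlan_10"),      # \w = letter, digit, or underscore
    (r"c.t", "cut", "cut"),                 # . = any single character
    (r"ab*c", "ac", "ac"),                  # * = zero or more of the previous item
    (r"colou?r", "color", "color"),         # ? = previous item is optional
    (r"[aeiou]", "sw-core", "o"),           # [...] = any one listed character
    (r"^sw", "sw-core-01", "sw"),           # ^ = start of the string
    (r"\d+$", "sw-core-01", "01"),          # $ = end of the string
]

# Run each search and keep (pattern, what was found, what we expected).
results = [(p, re.search(p, text).group(0), want) for p, text, want in examples]
for pattern, found, want in results:
    print(f"{pattern:10} -> {found!r} (expected {want!r})")
```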

One of my professors once asked the class, “Is it possible to eat an elephant? Well, it is if you take it one bite at a time.” In that spirit, I strongly recommend using the RegexOne website as a launching pad, as it is tailored to the absolute beginner. It provides bite-sized interactive exercises that gradually introduce users to Regex syntax.

Sources

  1. https://www.cs.rochester.edu/u/nelson/courses/csc_173/fa/fa.html
  2. http://stanford.edu/~wpmarble/webscraping_tutorial/regex_cheatsheet.pdf
  3. https://www.britannica.com/biography/Stephen-Cole-Kleene
  4. https://www.oreilly.com/library/view/beautiful-code/9780596510046/ch01.html

As a Managed Service Provider (MSP) and Managed Security Services Provider (MSSP), we manage your IT services, offer backup and disaster recovery, and provide network visibility and security. Our knowledgeable and customer-oriented staff engages to accelerate your network and simplify its security, visibility and automation. Questions? Contact Tec-Refresh, Inc.
