Starting Regular Expressions for Ruby Beginners

Published: 2014-08-12

If you work with strings in Ruby, regular expressions are the next step up.

When I first started with regular expressions, I didn't grasp the concept until I saw some simple examples that showed me the basics. I never did find a good resource to guide me through; I just stumbled my way along.
This post will hopefully be a 'first step in the right direction' if you are just getting started as well. You should leave here with a sense of what can be done with regular expressions, rather than being lost in the dark like I was.

Pro Tip: Let me just say right now that the Rubular website is the easiest way to work with Regular Expressions. You paste in your text and can then build and test your regular expression live in the browser.
As soon as you have an idea about what you want your regular expression to do, then head to the Rubular site and get testing.

Detecting Strings that match a Pattern

Standard: does this string have that string?

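/my string/ =~ "this is my string for testing" # 8 (the index where the match starts)
On the left is our regular expression, which begins and ends with a /. The =~ operator returns the position of the first match, or nil if there is none, so any number counts as true/yes.
In this case we are just searching for a simple text string, nothing fancy.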
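
Because =~ gives back either a position or nil, it can be used directly as the condition of an if. Here is a minimal sketch of that idea; the variable names are only illustrative.

```ruby
text = "this is my string for testing"

# =~ returns the index where the match starts, or nil when there is no match.
position = /my string/ =~ text
missing  = /your string/ =~ text

# Every integer (even 0) is truthy in Ruby, so the result works as a condition.
if /my string/ =~ text
  result = "found at index #{position}"
else
  result = "not found"
end

puts result
```

Only nil and false are falsy in Ruby, so a match at the very start of the string (index 0) still counts as a yes.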

Does the string match from the start or the end of the string?

/\Amy string/ =~ "this is my string for testing" # nil - no match
/my string\z/ =~ "this is my string for testing" # nil - no match
Once again this is just a simple text string, this time with a special anchor added.
The \A anchor marks the start of the string, so the pattern only matches strings that start that way.
The \z anchor marks the end of the string, so the pattern only matches strings that end that way.

Now let's change the search string so it will match the start or the end of our test string.

/\Athis is/ =~ "this is my string for testing" # 0 - match found
/for testing\z/ =~ "this is my string for testing" # 18 - match found
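
One everyday use of these anchors is checking how a file name starts or ends. The sketch below assumes a made-up file name and shows why the \z anchor matters.

```ruby
filename = "report_2014.rb"

# \A pins the pattern to the very start of the string.
starts_with_report = /\Areport/ =~ filename

# \z pins the pattern to the very end. The dot is escaped (\.) so it
# matches a literal "." rather than "any character".
ends_with_rb = /\.rb\z/ =~ filename

# Without \z the pattern is also found in the middle of a string.
loose  = /\.rb/ =~ "report.rb.bak"
strict = /\.rb\z/ =~ "report.rb.bak"

puts starts_with_report.inspect  # 0
puts ends_with_rb.inspect        # 11
puts loose.inspect               # 6
puts strict.inspect              # nil
```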

Looking for this or that

/(this|that)/ =~ "this is my string for testing" # 0 - match found
/(this|that)/ =~ "that is my string for testing" # 0 - match found
/(this|that)/ =~ "not my string for testing" # nil - no match
/(what|is|test)/ =~ "that is my string for testing" # 5 - match found
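Notice how the | symbol works as an OR, just like || in a regular if statement.
Make sure to put parentheses around your OR searches. A bare /this|that/ works on its own, but once the OR is part of a longer pattern the parentheses control what it applies to.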
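
To see why the grouping matters, here is a small sketch that combines an OR with the \A anchor. The test string is made up for the illustration.

```ruby
text = "what is that"

# With parentheses, \A applies to both alternatives:
# the string must START with "this" or "that".
grouped = /\A(this|that)/ =~ text

# Without parentheses, | splits the whole pattern in two:
# "\Athis" OR "that" anywhere in the string.
ungrouped = /\Athis|that/ =~ text

puts grouped.inspect    # nil
puts ungrouped.inspect  # 8
```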

Extracting part of a string that matches a Pattern

How to extract a specific section of text when you know the exact words around it

/My Name is (\w+)/.match("Hi, My Name is Adam!")[1] # "Adam"
/\AHi, My Name is (\w+)/.match("this is my string for testing") # nil
Here we are looking for the person's name, which we know is preceded by "My Name is ".
The expression \w+ means: one or more (this is the +) of any word character (the \w). Notice how it ignores the ! as it isn't a word character.
The .match method searches our string and returns a MatchData object. Index [0] holds the whole match, and the text matched by each expression in parentheses ( ) is stored from index [1] onwards. When nothing matches, .match returns nil.
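
Since .match can return nil, calling [1] straight on the result raises a NoMethodError when there is no match. Below is a minimal sketch of a safer approach; extract_name is a hypothetical helper name, not something built into Ruby.

```ruby
greeting = "Hi, My Name is Adam!"
match = /My Name is (\w+)/.match(greeting)

puts match[0]          # the whole match: "My Name is Adam"
puts match[1]          # the first captured group: "Adam"
puts match.pre_match   # text before the match: "Hi, "
puts match.post_match  # text after the match: "!"

# Hypothetical helper: returns the captured name, or nil when nothing matches.
def extract_name(text)
  found = /My Name is (\w+)/.match(text)
  found ? found[1] : nil
end

puts extract_name(greeting).inspect       # "Adam"
puts extract_name("nobody here").inspect  # nil
```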

Match all occurrences of a regular expression in a string

"Hi, My Name is Adam! Hi, My Name is Steve!".scan(/My Name is (\w+)/) # [["Adam"], ["Steve"]] 
"Hi, My Name is Adam! Hi, My Name is Steve!".scan(/My Name is (\w+)/)[1][0] # "Steve"
Here we go back to the String class and use .scan. This time the results are put into an array, with one inner array per match.
Once again, the parts of the expression surrounded by ( ) are what is captured.
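
The shape of the array that .scan returns depends on how many ( ) groups the expression has. This short sketch compares the cases; the "Adam=10, Steve=25" string is made up for the example.

```ruby
text = "Hi, My Name is Adam! Hi, My Name is Steve!"

# One group: each result is a one-element array, so flatten gives a plain list.
names = text.scan(/My Name is (\w+)/).flatten

# No groups: scan returns the full text of each match instead.
phrases = text.scan(/My Name is \w+/)

# Two groups: each result holds both captured parts.
pairs = "Adam=10, Steve=25".scan(/(\w+)=(\d+)/)

puts names.inspect    # ["Adam", "Steve"]
puts phrases.inspect  # ["My Name is Adam", "My Name is Steve"]
puts pairs.inspect    # [["Adam", "10"], ["Steve", "25"]]
```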

Easily Modifying Strings in a complex way

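When modifying strings with regular expressions, .gsub is your best friend. It returns a new string with every match replaced; .gsub! does the same but changes the original string in place.
The same pattern rules apply as in the examples above.
If you just need to modify the first match, then .sub will do just that.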
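
Here is a quick side-by-side sketch of the three variants, using a made-up string. The .dup calls are only there so the in-place examples work on a fresh, unfrozen copy of the string.

```ruby
original = "one, two, three"

# .sub replaces only the first match; .gsub replaces all of them.
first_only = original.sub(/,/, ";")
every_one  = original.gsub(/,/, ";")

# Both return a new string, so the original is left untouched.
puts first_only  # one; two, three
puts every_one   # one; two; three
puts original    # one, two, three

# .gsub! changes the string itself...
list = "one, two, three".dup
list.gsub!(/,/, ";")
puts list        # one; two; three

# ...and returns nil when there was nothing to replace.
unchanged = "no commas here".dup.gsub!(/,/, ";")
puts unchanged.inspect  # nil
```

That nil return value is worth remembering: chaining another method call onto .gsub! can fail when the string had no matches.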

Convert multiple spaces in a string into a single space

"This  is  my test   string   for    testing".gsub(/ +/, ' ') # "This is my test string for testing"
This says: find every run of one or more spaces (this is what + means) and replace each run with a single space.

Convert all , to ;

"10,123,1234,123,589,940".gsub(/,/, ';') # "10;123;1234;123;589;940" 

Fix inconsistent spacing in CSV-style lists

"10, 123,1234, 123,589, 940".gsub(/,\s*/, ', ') # "10, 123, 1234, 123, 589, 940"
"10, 123,1234, 123,589, 940".gsub(/,\s*/, ',') # "10,123,1234,123,589,940"
"10  , 123,1234, 123,589, 940".gsub(/,\s*/, ', ') # "10  , 123, 1234, 123, 589, 940"
"10  , 123,1234, 123,589, 940".gsub(/\s*,\s*/, ', ') # "10, 123, 1234, 123, 589, 940"
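Here we look for a , followed by 0 or more (this is what the * means) whitespace characters (this is what \s means). The last example puts \s* in front of the comma as well, so stray spaces before the comma are also cleaned up.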
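
If you need this clean-up in more than one place, the pattern can be wrapped in a small method. The sketch below uses a hypothetical normalize_list helper and also strips spaces from the ends of the line. Note that \s matches tabs and newlines as well as plain spaces.

```ruby
# Hypothetical helper: tidy a comma-separated line into "a, b, c" form.
def normalize_list(line)
  # strip removes whitespace at both ends; gsub fixes the spacing around each comma.
  line.strip.gsub(/\s*,\s*/, ", ")
end

messy = "  10  , 123,1234,\t123 ,589,  940 "
puts normalize_list(messy)  # 10, 123, 1234, 123, 589, 940
```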

Moving forward with Regular Expressions

It's always a great idea to test on as much sample data as you can.
While the examples here are simple, once you have the core concepts down, it shouldn't take long in Rubular to build your own custom regular expressions for most everyday situations.

Always write tests to make sure your regular expressions are working as designed and to catch any 'accidental' changes.
Writing the test first with your before and after strings for comparison will let you know that everything is working correctly.
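
One lightweight way to do this is to keep a table of before and after strings and check every pair. The sketch below is plain Ruby with no test framework; squeeze_spaces and CASES are hypothetical names chosen for the example.

```ruby
# Hypothetical method under test: collapse runs of spaces into one space.
def squeeze_spaces(text)
  text.gsub(/ +/, " ")
end

# Each key is a "before" string and each value is the expected "after" string.
CASES = {
  "This  is  my test" => "This is my test",
  "already clean"     => "already clean",
  ""                  => "",
  "   leading"        => " leading"
}

# Collect every pair where the actual result differs from the expected one.
failures = CASES.reject { |before, after| squeeze_spaces(before) == after }

puts failures.empty? ? "all cases pass" : "failing cases: #{failures.inspect}"
```

Adding a new pair to the table whenever you find a surprising input is a cheap way to stop the same bug from coming back.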

Resources

Rubular - a Ruby regular expression editor that lets you test and edit your regular expressions on the fly right in the browser.

Ruby Regexp Class - The Ruby Regexp Class methods.


