Python RE Module: Word Boundaries


In regular expressions, the token “\b” denotes a word boundary. It matches at the start or at the end of a word, and it is called an “anchor” because it matches a position rather than a character. To match a whole word, wrap it in boundaries: “\bcat\b” matches the word “cat” in the string “My cat is brown”, but it does not match the “cat” in “category”.
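
As a quick illustration of the behaviour described above, here is a minimal sketch that tries the pattern against a few sample strings. The sample strings other than “My cat is brown” and “category” are made up for this example, and the pattern is written as a raw string, which is discussed further down.

```python
import re

# Raw string, so the regex engine receives the two characters "\" and "b".
pattern = re.compile(r"\bcat\b")

# "cat" and "concat" are extra, made-up samples for illustration.
for text in ("My cat is brown", "category", "cat", "concat"):
    match = pattern.search(text)
    # Prints the matched word, or None when no whole-word match exists.
    print(repr(text), "->", match.group() if match else None)
```

Only the strings in which “cat” stands alone as a word produce a match.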

The regex itself is quite simple. In Python, however, running this code:

import re
patt = "\bcat\b"  # ordinary string: each \b becomes a backspace character
text = "My cat is brown"
print(re.search(patt, text).group())  # fails: re.search returned None

will raise an error: re.search finds no match and returns None, so calling .group() on the result fails with an AttributeError.

The fix is to escape the backslash in “\b”, because in an ordinary Python string literal “\b” is the escape sequence for the backspace character. This modified version of the code above finds the word cat:

import re
patt = "\\bcat\\b"  # \\ yields one backslash, so the regex engine sees \b
text = "My cat is brown"
print(re.search(patt, text).group())  # prints: cat
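
To see the difference between the two patterns, it can help to inspect the strings themselves. The following is a small sketch, with variable names chosen only for this example, that compares what Python actually stores for each literal.

```python
import re

unescaped = "\bcat\b"   # Python turns each \b into a backspace (U+0008)
escaped = "\\bcat\\b"   # each \\ becomes one backslash, leaving \b for the regex engine

print(repr(unescaped), len(unescaped))  # '\x08cat\x08' 5
print(repr(escaped), len(escaped))      # '\\bcat\\b' 7

text = "My cat is brown"
print(re.search(unescaped, text))         # None
print(re.search(escaped, text).group())   # cat

# The unescaped pattern only matches text that contains real backspaces.
print(re.search(unescaped, "My \bcat\b is brown") is not None)  # True
```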

If you would prefer to use raw string literals instead of escaping the backslashes in your pattern by hand, you don’t need to escape the “\b”, because raw strings use different rules for interpreting backslash escape sequences. Example:

import re
text = "My cat is brown"
print(re.search(r'\bcat\b', text).group())  # prints: cat
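
The raw string and the manually escaped string are the same pattern. A brief sketch, using a made-up sentence, that checks this and finds every whole-word occurrence with re.findall:

```python
import re

raw_pattern = r"\bcat\b"

# Both literals produce the same seven-character string.
print(raw_pattern == "\\bcat\\b")  # True

# Made-up sample sentence for illustration.
sentence = "A cat, a category, and another cat."
print(re.findall(raw_pattern, sentence))  # ['cat', 'cat']
```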

To define a word boundary using the re module in Python, remember to escape “\b” as “\\b” when the pattern is an ordinary string literal, especially if you are defining your pattern in a separate string variable. Otherwise Python converts “\b” to a backspace character before the regular expression is compiled, and the pattern matches a literal backspace instead of a word boundary. When writing regular expression patterns in Python, it is recommended to use raw string literals, as in the raw string example above.
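
When the word comes from a variable, the pattern has to be assembled at run time. One possible way to do that is sketched below. The helper name find_whole_word is hypothetical, and the sketch assumes the word begins and ends with a letter, digit, or underscore, since “\b” only marks the edge between word and non-word characters.

```python
import re

def find_whole_word(word, text):
    """Return the first whole-word occurrence of `word` in `text`, or None.

    Hypothetical helper for illustration; assumes `word` starts and ends
    with a word character (letter, digit, or underscore).
    """
    # re.escape protects any regex metacharacters inside `word`;
    # the raw strings keep \b as a word-boundary token.
    pattern = r"\b" + re.escape(word) + r"\b"
    match = re.search(pattern, text)
    # re.search returns None when nothing matches, so check before .group().
    return match.group() if match else None

print(find_whole_word("cat", "My cat is brown"))  # cat
print(find_whole_word("cat", "category"))         # None
```

Checking the result before calling .group() avoids the AttributeError seen in the first example.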

