Pattern matching and regular expressions
Check if a string matches a regular expression
Check if a string consists of exactly 8 digits:

    $ date=20150624
    $ [[ $date =~ ^[0-9]{8}$ ]] && echo "yes" || echo "no"
    yes
    $ date=hello
    $ [[ $date =~ ^[0-9]{8}$ ]] && echo "yes" || echo "no"
    no

Behaviour when a glob does not match anything
Preparation

    $ mkdir globbing
    $ cd globbing
    $ mkdir -p folder/{sub,another}folder/content/deepfolder/
    $ touch macy stacy tracy "file with space" folder/{sub,another}folder/content/deepfolder/file .hiddenfile
    $ shopt -u nullglob
    $ shopt -u failglob
    $ shopt -u dotglob
    $ shopt -u nocaseglob
    $ shopt -u extglob
    $ shopt -u globstar

If the glob does not match anything, the result is determined by the options nullglob and failglob. If neither of them is set, Bash returns the glob itself when nothing matches:

    $ echo no*match
    no*match

If nullglob is activated, nothing (null) is returned:

    $ shopt -s nullglob
    $ echo no*match

    $

If failglob is activated, an error message is returned:

    $ shopt -s failglob
    $ echo no*match
    bash: no match: no*match
    $

Note that the failglob option supersedes the nullglob option, i.e., if nullglob and failglob are both set, an error is returned when nothing matches.
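
These options matter most when a script loops over or counts the results of a glob that may match nothing. The following is a minimal sketch, assuming a scratch directory created with mktemp -d; the helper name count_matches is hypothetical and only serves the illustration. The option changes happen in a subshell, so they do not leak into the calling shell.

```shell
#!/usr/bin/env bash
# count_matches is a hypothetical helper: it prints how many paths a glob
# expands to inside a directory, without counting the unexpanded pattern.
count_matches() {
    local dir=$1 pattern=$2
    (
        cd "$dir" || exit 1
        shopt -u failglob      # failglob would turn "no match" into an error
        shopt -s nullglob      # an unmatched glob expands to nothing
        matches=( $pattern )   # unquoted on purpose so the glob is expanded
        echo "${#matches[@]}"
    )
}

workdir=$(mktemp -d)           # assumed scratch directory
touch "$workdir"/macy "$workdir"/stacy "$workdir"/tracy

count_matches "$workdir" '*acy'      # prints 3
count_matches "$workdir" 'no*match'  # prints 0

rm -rf "$workdir"
```

Without nullglob the second call would report 1, because the array would contain the literal string no*match.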
Get captured groups from a regex match against a string
    a='I am a simple string with digits 1234'
    pat='(.*) ([0-9]+)'
    [[ "$a" =~ $pat ]]
    echo "${BASH_REMATCH[0]}"
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"

Output:

    I am a simple string with digits 1234
    I am a simple string with digits
    1234

The * glob
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

The asterisk * is probably the most commonly used glob. It matches any string, including the empty string:

    $ echo *acy
    macy stacy tracy

A single * does not match files and folders that reside in subfolders:

    $ echo *
    file with space folder macy stacy tracy
    $ echo folder/*
    folder/anotherfolder folder/subfolder

The ** glob
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything”, but enable globstar instead of disabling it:

    $ shopt -s globstar

Bash is able to interpret two adjacent asterisks as a single glob. With the globstar option activated, this can be used to match files and folders that reside deeper in the directory structure:

    $ echo **
    file with space folder folder/anotherfolder folder/anotherfolder/content folder/anotherfolder/content/deepfolder folder/anotherfolder/content/deepfolder/file folder/subfolder folder/subfolder/content folder/subfolder/content/deepfolder folder/subfolder/content/deepfolder/file macy stacy tracy

The ** can be thought of as a path expansion, no matter how deep the path is. This example matches any file or folder that starts with deep, regardless of how deep it is nested:

    $ echo **/deep*
    folder/anotherfolder/content/deepfolder folder/subfolder/content/deepfolder

The ? glob
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

The ? matches exactly one character:

    $ echo ?acy
    macy
    $ echo ??acy
    stacy tracy

The [ ] glob
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

If there is a need to match specific characters, [] can be used. A [] glob matches exactly one character, which must be one of the characters listed inside it.

    $ echo [m]acy
    macy
    $ echo [st][tr]acy
    stacy tracy

The [] glob, however, is more versatile than just that. It also allows for a negative match and even for matching ranges of characters and character classes. A negative match is achieved by using ! or ^ as the first character following [. We can match stacy by

    $ echo [!t][^r]acy
    stacy

Here we are telling Bash that we want to match only files whose first character is not a t, whose second character is not an r, and whose name ends in acy.
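
The same bracket syntax applies wherever Bash performs pattern matching, for instance in a case statement. The sketch below reuses the patterns from above on plain strings instead of file names; classify_name is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# classify_name is a hypothetical helper; it applies the bracket patterns
# from the text to a string instead of to file names.
classify_name() {
    case $1 in
        [!t][^r]acy) echo "first is not t, second is not r" ;;
        [st][tr]acy) echo "matches [st][tr]acy only" ;;
        *)           echo "no match" ;;
    esac
}

classify_name stacy   # prints: first is not t, second is not r
classify_name tracy   # prints: matches [st][tr]acy only
classify_name macy    # prints: no match
```

macy falls through to the last branch because each [] consumes exactly one character, so both patterns require a five-character name.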
Ranges can be matched by separating a pair of characters with a hyphen (-). Any character that falls between those two enclosing characters, inclusive, will be matched. E.g., [r-t] is equivalent to [rst]:

    $ echo [r-t][r-t]acy
    stacy tracy

Character classes can be matched by [:class:], e.g., in order to match files whose names contain a blank (a space or a tab):

    $ echo *[[:blank:]]*
    file with space

Matching hidden files
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

The Bash built-in option dotglob makes globs also match hidden files and folders, i.e., files and folders whose names start with a dot (.):

    $ shopt -s dotglob
    $ echo *
    file with space folder .hiddenfile macy stacy tracy

Case insensitive matching
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

Setting the option nocaseglob makes Bash match the glob in a case-insensitive manner:

    $ echo M*
    M*
    $ shopt -s nocaseglob
    $ echo M*
    macy

Extended globbing
Preparation: repeat the setup shown under “Behaviour when a glob does not match anything” (same files, all listed options unset).

Bash’s built-in extglob option extends a glob’s matching capabilities:

    $ shopt -s extglob

The following sub-patterns comprise valid extended globs:

- ?(pattern-list) – Matches zero or one occurrence of the given patterns
- *(pattern-list) – Matches zero or more occurrences of the given patterns
- +(pattern-list) – Matches one or more occurrences of the given patterns
- @(pattern-list) – Matches one of the given patterns
- !(pattern-list) – Matches anything except one of the given patterns

The pattern-list is a list of globs separated by |.

    $ echo *([r-t])acy
    stacy tracy
    $ echo *([r-t]|m)acy
    macy stacy tracy
    $ echo ?([a-z])acy
    macy

The pattern-list itself can be another, nested extended glob. In the above example we have seen that we can match tracy and stacy with *([r-t]). This extended glob itself can be used inside the negated extended glob !(pattern-list) in order to match macy:

    $ echo !(*([r-t]))acy
    macy

It matches any name ending in acy whose leading part does not consist of zero or more occurrences of the letters r, s and t, which leaves only macy as a possible match.
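
As a further illustration, the following sketch collects extended-glob matches into arrays. It assumes a scratch directory created with mktemp -d and an extra file notes.txt that is not part of the preparation above; both are made up for this example.

```shell
#!/usr/bin/env bash
# extglob must be enabled before Bash parses any line that uses an extended glob.
shopt -s extglob

workdir=$(mktemp -d)           # assumed scratch directory
touch "$workdir"/macy "$workdir"/stacy "$workdir"/tracy "$workdir"/notes.txt
cd "$workdir" || exit 1

# Everything except names ending in acy.
others=( !(*acy) )
# Names made of exactly one of the listed prefixes followed by acy.
picked=( @(st|tr)acy )

echo "others: ${others[*]}"   # prints: others: notes.txt
echo "picked: ${picked[*]}"   # prints: picked: stacy tracy

cd - >/dev/null || exit 1
rm -rf "$workdir"
```

Enabling the option on its own line at the top matters: Bash parses a whole line (or a whole function body) before running it, so an extended glob on the same line as shopt -s extglob is reported as a syntax error.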
Regex matching
    pat='[^0-9]+([0-9]+)'
    s='I am a string with some digits 1024'
    [[ $s =~ $pat ]] # $pat must be unquoted
    echo "${BASH_REMATCH[0]}"
    echo "${BASH_REMATCH[1]}"

Output:

    I am a string with some digits 1024
    1024

Instead of assigning the regex to a variable ($pat) we could also do:

    [[ $s =~ [^0-9]+([0-9]+) ]]

Explanation

- The [[ $s =~ $pat ]] construct performs the regex matching
- The captured groups, i.e., the match results, are available in an array named BASH_REMATCH
- The 0th index in the BASH_REMATCH array is the total match
- The i’th index in the BASH_REMATCH array is the i’th captured group, where i = 1, 2, 3 …
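
Validation and capturing can be combined in one step. The following is a minimal sketch, not a complete date parser: parse_date is a hypothetical helper that accepts the 8-digit form used earlier and prints its parts from BASH_REMATCH. It does not check whether the month or day values are plausible.

```shell
#!/usr/bin/env bash
# parse_date is a hypothetical helper: it validates an 8-digit date string
# and prints its year, month and day parts taken from BASH_REMATCH.
parse_date() {
    local pat='^([0-9]{4})([0-9]{2})([0-9]{2})$'
    if [[ $1 =~ $pat ]]; then      # $pat stays unquoted so it is read as a regex
        echo "${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}"
    else
        echo "not a date: $1" >&2
        return 1
    fi
}

parse_date 20150624            # prints 2015-06-24
parse_date hello || echo "no"  # prints "not a date: hello" on stderr, then no
```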
Syntax
- $ shopt -u option # Deactivate Bash’s built-in ‘option’
- $ shopt -s option # Activate Bash’s built-in ‘option’
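
Because these options change how every later glob behaves, a script may want to switch one on only temporarily. The sketch below relies on shopt -p, which prints an option's current state as a reusable shopt command, and on shopt -q, which reports the state through its exit status. The helper name with_nullglob is hypothetical.

```shell
#!/usr/bin/env bash
# with_nullglob is a hypothetical helper that enables nullglob for one
# command and then restores whatever state the option had before.
with_nullglob() {
    local saved
    # shopt -p prints "shopt -s nullglob" or "shopt -u nullglob"; its exit
    # status is non-zero when the option is off, hence the "|| true".
    saved=$(shopt -p nullglob) || true
    shopt -s nullglob
    "$@"
    local status=$?
    eval "$saved"                # restore the previous setting
    return "$status"
}

shopt -u nullglob
with_nullglob shopt -q nullglob && echo "on inside"    # prints: on inside
shopt -q nullglob || echo "off again afterwards"       # prints: off again afterwards
```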
Remarks
Character Classes
Valid character classes for the [] glob are defined by the POSIX standard:
alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
Inside [] more than one character class or range can be used, e.g.,

    $ echo a[a-z[:blank:]0-9]*

will match any file that starts with an a and is followed by either a lowercase letter or a blank or a digit.

It should be kept in mind, though, that a [] glob can only be negated as a whole, not in part. The negating character must be the first character following the opening [, e.g., this expression matches all files that do not start with an a:

    $ echo [^a]*

The following matches all files that start with either a letter or a ^:

    $ echo [[:alpha:]^a]*

It does not mean "any file or folder that starts with a letter other than a", because a ^ that is not in first position is interpreted as a literal ^.
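
Both expressions can be tried on plain strings with the == operator of [[ ]], which uses the same pattern syntax as file name globbing. This is a small sketch; starts_like is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# starts_like is a hypothetical helper that tests a string against the
# combined bracket expression from the text: an a, then a lowercase letter,
# a blank or a digit, then anything.
starts_like() {
    [[ $1 == a[a-z[:blank:]0-9]* ]]
}

starts_like "ab.txt" && echo "ab.txt matches"
starts_like "a 1"    && echo "'a 1' matches"
starts_like "a_1"    || echo "a_1 does not match"

# Only a leading ^ (or !) negates; elsewhere ^ is an ordinary character.
[[ "^file" == [[:alpha:]^a]* ]] && echo "literal ^ matched"
[[ "1file" == [[:alpha:]^a]* ]] || echo "leading digit not matched"
```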
Escaping glob characters
It is possible that a file or folder contains a glob character as part of its name. In this case the glob character can be escaped with a preceding \ to get a literal match. Another approach is to enclose the name in double "" or single '' quotes.
Bash does not process globs that are enclosed within "" or ''.
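
The difference can be seen with a file whose name contains a literal *. The sketch below assumes a scratch directory created with mktemp -d; the file names are made up for the example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)           # assumed scratch directory
cd "$workdir" || exit 1
touch 'a*b' 'axb' 'ayyb'

unescaped=( a*b )     # * acts as a glob: matches all three names
escaped=( a\*b )      # the backslash makes * a literal character
quoted=( "a*b" )      # quoting has the same effect

echo "${#unescaped[@]}"   # prints 3
echo "${escaped[0]}"      # prints a*b
echo "${quoted[0]}"       # prints a*b

cd - >/dev/null || exit 1
rm -rf "$workdir"
```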
Difference to Regular Expressions
The most significant difference between globs and Regular Expressions is that
a Regular Expression separates what to match from how often to match it.
A qualifier identifies what to match and a quantifier tells how often
to match the qualifier. The equivalent RegEx to the * glob is .* where
. stands for any character and * stands for zero or more matches of the
preceding element. The equivalent RegEx for the ? glob is . or, written
out in full, .{1}. As before, the qualifier . matches any character and
the {1} indicates to match the preceding qualifier exactly once, which is
also what happens when the quantifier is left out. This should not be
confused with the ? quantifier, which matches zero or one time in a RegEx.
The [] glob can be used in the same way in a RegEx, where it matches exactly
one character unless it is followed by a quantifier. Note that a RegEx
accepts only ^ for negation inside [], not !.
Equivalent Regular Expressions
| Glob | RegEx |
|---|---|
| * | .* |
| ? | . |
| [] | [] |
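
The equivalences in the table can be checked on strings with [[ ]], where == performs glob matching and =~ performs regex matching. This is a minimal sketch; compare_match is a hypothetical helper name. The regexes are anchored with ^ and $ here because a glob always has to match the whole string, whereas =~ also succeeds when only part of the string matches.

```shell
#!/usr/bin/env bash
# compare_match is a hypothetical helper: it tests one string against a glob
# and against a regex and prints both verdicts.
compare_match() {
    local s=$1 glob=$2 re=$3 g=no r=no
    [[ $s == $glob ]] && g=yes   # unquoted right-hand side: treated as a glob
    [[ $s =~ $re ]]   && r=yes   # unquoted right-hand side: treated as a regex
    echo "$s glob=$g regex=$r"
}

compare_match macy  '?acy'     '^.acy$'      # prints: macy glob=yes regex=yes
compare_match stacy '?acy'     '^.acy$'      # prints: stacy glob=no regex=no
compare_match stacy '*acy'     '^.*acy$'     # prints: stacy glob=yes regex=yes
compare_match tracy '[st]racy' '^[st]racy$'  # prints: tracy glob=yes regex=yes
```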