Regular expressions (regex) match and parse text. The regex language is a powerful shorthand for describing patterns. Powershell makes use of regular expressions in several ways. Sometimes it is easy to forget that these commands are using regex becuase it is so tightly integrated. You may already be using some of these commands and not even realize it.
Image from xkcd.com, slightly altered
Index
- Index
- Scope of this article
- Select-String
- -match
- -replace
- -split
- Switch
- ValidatePattern
- $Matches
- .Net Regex
- Multiple matches per line
- Should match
- Putting it all together
Scope of this article
Teaching the regex syntax and language is beyond the scope of this article. I will just cover what I need in order to focus on the PowerShell. My regex examples will intentionally be very basic.
Regex quick start
You can use normal numbers and characters in your patterns for exact matches. This works when you know exactly what needs to match. Sometimes you need a pattern where any digit or letter should make the match valid. Here are some basic patterns that I may use in these examples.
\d digit [0-9]
\w alpha numeric [a-zA-Z0-9_]
\s whitespace character
. any character except newline
() sub-expression
\ escape the next character
So a pattern of \d\d\d-\d\d-\d\d\d\d
will match a USA social security number. Three digits, then a dash, two digits, then a dash and then 4 digits. There are better and more compact ways to represent that same pattern. But this will work for our examples today.
Regex resources
Here are some regular expression resources to help you find the right patterns for your task.
Interactive regex calculators:
Documentation and training:
Select-String
This cmdlet is great for searching files or strings for a text pattern.
Get-ChildItem -Path $logFolder | Select-String -Pattern 'Error'
This example searches all the files in the $logFolder
for lines that have the word Error
. The pattern parameter is a regular expression and in this case, the word Error
is valid regex. It will find any line that has the word error in it.
Get-ChildItem -Path $logFolder |
Select-String -Pattern '\d\d\d-\d\d-\d\d\d\d'
This one would search text documents for numbers that look like a USA Social Security Number.
-match
The -match
opperator takes a regular expression and returns $true
if the pattern matches.
PS> $message = 'there is an error with your file'
PS> $message -match 'error'
True
PS> '123-45-6789' -match '\d\d\d-\d\d-\d\d\d\d'
True
If you apply a match to an array, you will get a list of all the items that match the pattern.
PS> $data = @(
"General text without meaning"
"my ssn is 123-45-6789"
"some other string"
"another SSN 123-12-1234"
)
PS> $data -match '\d\d\d-\d\d-\d\d\d\d'
my ssn is 123-45-6789
another SSN 123-12-1234
Variations
-imatch
makes it explicit that you are doing a case insensitive operation (the default)
-cmatch
makes the operation case sensitive.
-notmatch
returns true when there is no match.
The i
and c
variants of an operator is available for all comparison operators.
-like
The -like
command is like -match
except it does not use regex. It uses a simpler wildcard pattern where ?
is any character and *
is multiple unknown characters.
$message -like '*error*'
One important difference is that the -like
command expects an exact match unless you include the wildcards. So if you are looking for a pattern within a larger string, you will need to add the wildcards on both ends. '*error*'
Sometimes all you need is a basic wildcard and that is where -like
comes in.
This operator has -ilike
, -clike
, -notlike
variants.
String.Contains()
If all you want to do is test to see if your string has a substring, you can use the string.contains($substring)
appraoch.
PS> $message = 'there is an error with your file'
PS> $message.contains('error')
True
string.contains()
is case sensitive. This will perform faster then using the other opperators for this substring scenario.
-replace
The replace command uses regex for it’s pattern matching.
PS> $message = "Hi, my name is Dave."
PS> $message -replace 'Dave','Kevin'
Hi, my name is Kevin.
PS> $message = "My SSN is 123-45-6789."
PS> $message -replace '\d\d\d-\d\d-\d\d\d\d', '###-##-####'
My SSN is ###-##-####.
The other variants of this command are -creplace
and -ireplace
.
String.Replace()
The .Net String.Replace($pattern,$replacement)
funciton does not use regex. I mention this because it performs faster than -replace
.
PS> $message = "Hi, my name is Dave."
PS> $message.replace('Dave','Kevin')
Hi, my name is Kevin.
This one is also case sensitive. Infact, all the string funtions are case sensitive.
-split
This command is very often overlooked as one that uses a regex. We are often splitting on simple patterns that happen to be regex compatible that we never even notice.
PS> 'CA,TX,NE' -split ','
CA
TX
NE
Every once and a while, we will try to use some other character that means something else in regex. This will lead to very unexpected results. If we change our comma to a period, we get a bunch of blank lines.
PS> 'CA.TX.NE' -split '.'
PS>
The reason is that .
will match any character, so in this case it matches every character. It ends up spliting at every character and giving us 9 empty values.
PS> ('CA.TX.NE' -split '.').count
9
This is why it is important to remember what commands use regex.
-isplit
and -csplit
are the variants on this command.
String.Split()
Like with the replace command, there is a String.Split()
function that does not use regex. It will be faster when splitting on a character (or substring) and give you the same results.
Switch
By default, the switch
statement does exact matches. But it does have an -regex
option to use regex matches instead.
switch -regex ($message)
{
'\d\d\d-\d\d-\d\d\d\d' {
Write-Warning 'message may contain a SSN'
}
'\d\d\d\d-\d\d\d\d-\d\d\d\d-\d\d\d\d' {
Write-Warning 'message may contain a credit card number'
}
'\d\d\d-\d\d\d-\d\d\d\d' {
Write-Warning 'message may contain a phone number'
}
}
This feature of switch
is often overlooked.
Multiple switch matches
The interesting thing about using regex in a switch is that it will test each pattern so you can have several matches to one switch.
Run this example with the above switch statement:
PS> $message = "Hey, call me at 123-456-1234, there is an issue with my 1234-5678-8765-4321 card"
WARNING: message may contain a credit card number
WARNING: message may contain a phone number
Even though we had one string in the $message
, 2 of the switch statements executed.
ValidatePattern
When creating an advanced function, you can add a [ValidatePattern()]
to your parameter. This will validate the incomming value has the pattern that you expect.
function Get-Data
{
[cmdletbinding()]
param(
[ValidatePattern('\d\d\d-\d\d-\d\d\d\d')]
[string]
$SSN
)
# ... #
}
This example requests a SSN from the user and it does the validation on the input. This will give the user an error message if not valid. My issue with this is that it does not give a good error message by default.
PS> Get-Data 'Kevin'
get-data : Cannot validate argument on parameter 'SSN'. The argument "Kevin" does not match
the "\d\d\d-\d\d-\d\d\d\d" pattern. Supply an argument that matches "\d\d\d-\d\d-\d\d\d\d"
and try the command again.
ValidateScript
One way around that is to use a [ValidateScript({...})]
instead that throws a custom error message.
[ValidateScript({
if( $_ -match '\d\d\d-\d\d-\d\d\d\d')
{
$true
}
else
{
throw 'Please provide a valid SSN (ex 123-45-5678)'
}
})]
Now we get this error message
PS> get-data 'Kevin'
get-data : Cannot validate argument on parameter 'SSN'.
Please provide a valid SSN (ex 123-45-5678)
It may complicate our parameter, but it is much easier for our users to understand.
Validate ErrorMessage in PS 6
The use of a validate script just to give a good error message is kind of ugly. A new feature in PS 6 is that you can specify a custom error message for the ValiatePattern by using the ErrorMessage parameter. Here is how you would specify the ErrorMessage
[ValidatePattern('\d\d\d-\d\d-\d\d\d\d',ErrorMessage = 'The pattern does not match a valid US SSN format.')]
When a value does not match, then the user is presented with the error below.
Get-Data : Cannot validate argument on parameter 'SSN'. The pattern does not match a valid US SSN format.
While this is a great new feature, it makes the code invalid for older versions of PowerShell. If I run that same code in Windows Powershell, I get this error message:
Property 'ErrorMessage' cannot be found for type 'System.Management.Automation.ValidatePatternAttribute'.
Validators on variables
We mostly think of validators as part of an advanced function but the reality is that they apply to the variable and can be used outside of an advanced function.
PS> [ValidatePattern('\d\d\d-\d\d-\d\d\d\d')]
PS> [string]$SSN = '123-45-6789'
PS> $SSN = "I don't know"
The variable cannot be validated because the value `I don't know`
is not a valid value for the SSN variable.
I can’t say that I realy ever do this, but this would be a good trick to know.
$Matches
When you use the -match
operator, an automatic variable called $matches
contains the results of the match. If you have any sub expressions in your regex, those sub matches are also listed.
$message = 'My SSN is 123-45-6789.'
$message -match 'My SSN is (\d\d\d-\d\d-\d\d\d\d)\.'
$Matches[0]
$Matches[1]
My SSN is 123-45-6789.
123-45-6789
Named matches
This is one of my favorite features that most people don’t know about.If you use a named regex match, then you can access that match by name on the matches.
$message = 'My Name is Kevin and my SSN is 123-45-6789.'
if($message -match 'My Name is (?<Name>.+) and my SSN is (?<SSN>\d\d\d-\d\d-\d\d\d\d)\.')
{
$Matches.Name
$Matches.SSN
}
In the example above, the (?<Name>.+)
is a named sub expression. This value is then placed in the $Matches.Name
property. Same goes for SSN.
.Net Regex
Because this is PowerShell, we have full access to the .net regex object. Most of them are covered by the functionality above. If you are getting into more advanced regex where you need custom options, then take a second look at this object.
[regex]::new($pattern) | Get-Member
All the .Net regex methods are case sensitive.
I’m going to touch on [regex]::Escape()
because there is not a PowerShell equivalent.
Escape regex
regex is a complex language with common symbols and a shorthand syntax. There are times where you may want to match a literal value instead of a pattern. The [regex]::Escape()
will escape out all the regex syntax for you.
Take this string for example (123)456-7890
. It contains regex syntax that may not be obvious to you.
PS> $message = $message = 'My phone is (123)456-7890'
PS> $message -match '(123)456-7890'
False
You may think this is matching a specific phone number but the thing it would match is 123456-7890
. My point is that when you use a literal string where a regex is expected, that you will get unexpected results. This is where the [regex]::Escape()
solves that issue.
PS> $message -match [regex]::Escape('(123)456-7890')
True
I don’t want to talk on this too much because this is an anti-pattern. If you are needing to regex escape your entire pattern before you match it, then you should use the String.Contains()
method instead.
The only time you should be escaping a regex is if you are placing that value inside a more complex regex. Even that is solved with a more complex regex pattern.
If you are using this in your code. Rethink why you need it because odds are, you are using the wrong operator or method.
Multiple matches per line
The -Match
operator will only match once per line so the $matches
variable only contains that first match. There are times where I want to grab every occurace of a pattern even if there are multiples per line. I have 2 ways that I approach this scenario.
-AllMatches
Select-String
offers support for this with the -AllMatches
parameter. In this case the returned object contains a Matches
property for every match.
PS> $data = 'The event runs from 2018-10-06 to 1018-10-09'
PS> $datePattern = '\d\d\d\d-\d\d-\d\d'
PS> $results = $data | Select-String $datePattern -AllMatches
PS> $results.Matches.Value
2018-10-06
2018-10-09
Regex Matches()
The [Regex]
object method Matches is the other option.
PS> $data = 'The event runs from 2018-10-06 to 1018-10-09'
PS> $datePattern = [Regex]::new('\d\d\d\d-\d\d-\d\d')
PS> $matches = $datePattern.Matches($data)
PS> $matches.Value
2018-10-06
2018-10-09
Should match
When using Pester tests, the Should Match
uses a regular expression.
It "contains a SSN"{
$message = Get-Data
$message | Should Match '\d\d\d-\d\d-\d\d\d\d'
}
When with Pester is the exception to the rule of not using [regex]::Escape()
. Pester does not have a substring match alternative.
It "contains $subString"{
$message = Get-Data
$message | Should Match ([regex]::Escape($subString))
}
Putting it all together
As you can see, there are a lot of places where you can use regex or may already using regex and not even know it. PowerShell did a good job of integrating these into the language. But be wary of using them if performance is a concern and you are not actually using regex pattern.
Let me know if you discover any other common ways to use regex in PowerShell. I would love to hear about them and add them to my list.