Loop search and replace two-part string over file using PowerShell while preserving one of the parts
Score: 2
I am new to PowerShell and have not found a Stack Overflow question or documentation reference that gets me all the way to a successful outcome. If I overlooked an existing question or documentation reference that answers this, I would be grateful to know.
In a text file is a string like this:
<span><span><span><span><span></span></span></span></span></span>
The number of <span> and </span> tags varies from file to file. For example, in some files it is like this:
<span></span>
Yet in others it is like this:
<span><span></span></span>
And so on. There will likely never be more than 24 of each in a string.
I want to eliminate all strings like this from the text file, yet preserve the </span> in strings like this:
<span style="font-weight:bold;">text</span>
There may be many variations on that kind of string in the text file, for example <span style="font-size: 10px; font-weight: 400;">text</span> or <span style="font-size: 10px; font-weight: 400;">text</span>, and I don't know beforehand which variations will be included in the text file.
This partially works...
$original_file = 'in.txt'
$destination_file = 'out.txt'
(Get-Content $original_file) | Foreach-Object {
$_ -replace '<span>', '' `
-replace '</span>', ''
} | Set-Content $destination_file
...but it obviously results in something like <span style="font-weight:bold;">text, because the closing </span> of the styled span is removed too.
In the PowerShell script above I can use
$_ -replace '<span></span>', ''
but it only removes the innermost <span></span> in the middle of the string because, as it is written now, it does not loop.
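You can check the one-layer-per-pass behaviour directly on an in-memory sample. The sample string is illustrative, not taken from the real files:

```powershell
# Sample input with two levels of nesting (illustrative only)
$sample = '<span><span></span></span>'

# -replace removes every non-overlapping match in one pass, but only the
# innermost pair matches, so exactly one layer is peeled off
$onePass = $sample -replace '<span></span>', ''
$onePass              # <span></span>

# A second pass removes the pair that the first pass exposed
$twoPasses = $onePass -replace '<span></span>', ''
"[$twoPasses]"        # []
```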
I know it is silly to do something like this, since it only handles as many levels of nesting as there are -replace operations:
$original_file = 'in.txt'
$destination_file = 'out.txt'
(Get-Content $original_file) | Foreach-Object {
$_ -replace '<span></span>', '' `
-replace '<span></span>', '' `
-replace '<span></span>', '' `
-replace '<span></span>', '' `
-replace '<span></span>', ''
} | Set-Content $destination_file
Because the nested <span> string collapses into itself each time the replacement runs, producing a new inner <span></span> that can then be removed, the best solution I can think of is to loop the replacement over the file until all instances of <span></span> are gone.
I feel like adding logic along these lines is necessary:
foreach($i in 1..24){
Write-Host $i
}
But I have not been able to successfully incorporate it into the script.
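Here is a minimal sketch of one way to wire in the `foreach ($i in 1..24)` idea. `Remove-EmptySpan` is a hypothetical helper name, and the limit of 24 comes from the question's estimate. The helper repeats the single `<span></span>` replacement up to that many times per line and stops early once a pass changes nothing:

```powershell
# Hypothetical helper: removes up to $MaxDepth levels of nested empty
# <span></span> pairs from one line of text
function Remove-EmptySpan {
    param(
        [string]$Line,
        [int]$MaxDepth = 24   # assumed upper bound on nesting depth
    )
    foreach ($i in 1..$MaxDepth) {
        $next = $Line -replace '<span></span>', ''
        # -ceq: stop as soon as a pass leaves the line unchanged
        if ($next -ceq $Line) { break }
        $Line = $next
    }
    $Line
}

# Illustrative sample lines
$lines = @(
    '<span><span><span><span><span></span></span></span></span></span>'
    'before<span><span></span></span>after'
    '<span style="font-weight:bold;">text</span>'
)
$cleaned = $lines | ForEach-Object { Remove-EmptySpan -Line $_ }
$cleaned
```

To apply it to the files from the question, the pipeline would read `(Get-Content $original_file) | ForEach-Object { Remove-EmptySpan -Line $_ } | Set-Content $destination_file`. Styled spans are untouched because only the literal empty pair is ever replaced.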
If this is entirely the wrong approach, I would be grateful to know.
The reason for PowerShell is that my team prefers it for scripts included in an Azure DevOps release pipeline.
Thanks for any ideas or help.
regex powershell
edited Nov 12 at 22:09 by LotPings
asked Nov 10 at 17:41 by hcdocs
5 Answers
Score: 1 (accepted answer)
If you just want to remove any number of nested empty spans, use a regular expression with a group and a quantifier:
$original_file = 'in.txt'
$destination_file = 'out.txt'
(Get-Content $original_file) -replace "(<span>)+(</span>)+" |
Set-Content $destination_file
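It may be worth testing this on input where an attributed span directly encloses the empty ones. `(</span>)+` is not tied to the number of `<span>` tags matched, so it can also swallow the attributed span's closing tag. The sketch below shows this on an illustrative sample string. It also shows an alternative that uses .NET regex balancing groups to keep the counts equal; that alternative is my substitution, not part of this answer:

```powershell
# Pattern from the answer: one or more <span> followed by one or more </span>
$greedy = '(<span>)+(</span>)+'

# Assumed alternative using .NET balancing groups: each </span> must pop a
# <span> captured earlier in the same match, and (?(open)(?!)) rejects the
# match if any <span> is left over, so removed tags always come in pairs
$balanced = '(?:(?<open><span>))+(?:(?<-open></span>))+(?(open)(?!))'

# An attributed span directly enclosing an empty pair (illustrative sample)
$sample = '<span style="font-weight:bold;"><span></span></span>'

$greedyResult   = $sample -replace $greedy     # styled span's </span> is consumed too
$balancedResult = $sample -replace $balanced   # only the inner empty pair is removed
$deepResult     = '<span><span><span></span></span></span>' -replace $balanced

$greedyResult      # <span style="font-weight:bold;">
$balancedResult    # <span style="font-weight:bold;"></span>
"[$deepResult]"    # []
```

A loop that repeatedly removes only the literal `<span></span>` pair, as in the other answers, avoids the problem without needing balancing groups.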
answered Nov 11 at 13:12 by LotPings
Score: 0
Try the following. I've added some comments to clarify things.
# always use absolute paths if possible
$original_file = 'c:\tmp\in.txt'
$destination_file = 'c:\tmp\out.txt'
$patternToBeRemoved = '<span></span>'
# store the file contents in a variable
$fileContent = Get-Content -Path $original_file
# save the result of these operations in a new variable and iterate through each line
$newContent = foreach($string in $fileContent) {
# while the pattern you don't want is found it will be removed
while($string.Contains($patternToBeRemoved)) {
$string = $string.Replace($patternToBeRemoved, '')
}
# when it's no longer found the new string is returned
$string
}
# save the new content in the destination file
Set-Content -Path $destination_file -Value $newContent
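`.Contains()` and `.Replace()` are .NET string methods that match literally and case-sensitively. PowerShell's `-replace` operator, by contrast, is a regex match that ignores case by default. This only matters if the files might contain upper-case tags, which is an assumption; the question only shows lower case. A quick comparison on an illustrative string:

```powershell
$upper = '<SPAN></SPAN>'

# String.Replace(string, string) compares ordinally, so case matters
$viaMethod = $upper.Replace('<span></span>', '')

# -replace is case-insensitive by default (-creplace is the case-sensitive form)
$viaOperator = $upper -replace '<span></span>', ''

"method:   [$viaMethod]"     # method:   [<SPAN></SPAN>]
"operator: [$viaOperator]"   # operator: []
```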
answered Nov 10 at 18:22 by Guenther Schmitz
Score: 0
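$original_file = 'in.txt'
$destination_file = 'out.txt'
# collect the cleaned lines so the file is written once instead of being overwritten per line
$newContent = ForEach ($Line in (Get-Content $original_file)) {
Do {
$Line = $Line -replace '<span></span>',''
} While ($Line -match '<span></span>')
# emit the cleaned line so it is collected into $newContent
$Line
}
Set-Content -Path $destination_file -Value $newContent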
answered Nov 10 at 18:24 by ErikW
Score: 0
You can use a regular expression together with the -replace operator to strip all <span>optional content</span> pairs from a string, that is, all pairs where the opening tag does not specify any attributes.
$content = '<span></span><span><span><span style="font-weight:bold;">Foo</span></span></span>'
$regex = '<span>(.*?)</span>'
while ($content -match $regex)
{
$content = $content -replace $regex,'$1'
}
Write-Output $content
The result will be:
<span style="font-weight:bold;">Foo</span>
The while loop takes care of your nested occurrences of the <span></span> pair.
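The lazy `(.*?)` pairs each attribute-free `<span>` with the nearest following `</span>`. That closing tag can belong to an inner attributed span. In the sketch below, a plain span wraps a styled span plus extra text; this input is an assumption, not from the question. The bold styling ends up wrapping `b` as well, and whether that matters depends on the real files. `Remove-PlainSpan` is a hypothetical wrapper around the loop above:

```powershell
# Hypothetical wrapper around the answer's loop
function Remove-PlainSpan([string]$Text) {
    $regex = '<span>(.*?)</span>'
    while ($Text -match $regex) {
        # keep the captured content, drop the attribute-free tags around it
        $Text = $Text -replace $regex, '$1'
    }
    $Text
}

# A plain <span> wrapping an attributed span plus trailing text
$result = Remove-PlainSpan '<span><span style="font-weight:bold;">a</span>b</span>'
$result    # <span style="font-weight:bold;">ab</span>
```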
edited Nov 10 at 18:49
answered Nov 10 at 18:19 by Manuel Batsching
Score: 0
$content = '<span></span><span><span><span style="font-weight:bold;">Foo</span></span></span>'
$regex = '<span\s+[^<]+</span>'
$null = $content -match $regex
$Matches[0]
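`-match` stores only the first match in `$Matches`, so this extracts one attributed span per line. If a line can contain several, which is an assumption about the input, `[regex]::Matches` returns all of them. A sketch on an illustrative line:

```powershell
# Illustrative sample with two attributed spans on one line
$content = '<span><span style="font-weight:bold;">Foo</span></span><span style="font-size: 10px;">Bar</span>'
$regex = '<span\s+[^<]+</span>'

# -match stops at the first hit and stores only that one in $Matches
$null = $content -match $regex
$first = $Matches[0]

# [regex]::Matches returns every non-overlapping hit; .Value is the matched text
$all = @([regex]::Matches($content, $regex) | ForEach-Object { $_.Value })

$first    # <span style="font-weight:bold;">Foo</span>
$all      # both attributed spans, one per line
```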
answered Nov 11 at 0:13 by walid (new contributor)
Welcome to Stack Overflow. While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. (See "How to Answer".)
(comment by Elletlar, Nov 11 at 0:19)