Perl Regex Find and Return Every Possible Match
I'm trying to create a while loop that will find every possible substring within a string, but so far all I can match is the longest instance or the shortest. For example, I have the string
EDIT: changed the string for demo purposes
"A.....B.....B......B......B......B"
and I want to find every possible sequence of the form "A.......B".
This code gives me the shortest possible match and then exits the while loop:
while ($string =~ m/(A(.*?)B)/gi) {
    print "found\n";
    my $substr = $1;
    print $substr."\n";
}
And this gives me the longest and then exits the while loop:
$string =~ m/(A(.*)B)/gi
But I want it to loop through the string returning every possible match. Does anyone know if Perl allows for this?
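Both snippets stop after a single match because of how //g works: each iteration resumes the search at pos(), just past the end of the previous match, so successive matches never overlap. With only one A in the demo string, there is nowhere left to start a second match. A minimal sketch illustrating this (variable names are illustrative, and /i is dropped because the demo string is all uppercase):

```perl
use strict;
use warnings;

my $string = "A.....B.....B......B......B......B";

# Lazy version: each //g iteration resumes at pos(), just after the previous match.
my @lazy;
while ($string =~ m/(A.*?B)/g) {
    push @lazy, [$1, pos($string)];
}
# Only one A exists, so after the first match (which ends at offset 7)
# there is no A left from which another match could start.
printf "lazy:   %s (pos=%d)\n", @$_ for @lazy;

# Greedy version: .* runs to the end, then backtracks only as far as the last B.
my @greedy = $string =~ m/(A.*B)/g;
print "greedy: $_\n" for @greedy;
```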
EDIT: added the desired output below
found
A.....B
found
A.....B.....B
found
A.....B.....B......B
found
A.....B.....B......B......B
found
A.....B.....B......B......B......B
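Read literally, the desired output is "every substring that starts at an A and ends at any later B". The sketch below is not from the thread. It spells out that definition without a regex by pairing each A with each later B using index() and substr(). The names $start, $end and @found are hypothetical.

```perl
use strict;
use warnings;

my $string = "A.....B.....B......B......B......B";
my ($start, $end) = ('A', 'B');   # hypothetical names for the two markers

my @found;
# Visit every occurrence of the start marker...
for (my $i = index($string, $start); $i >= 0; $i = index($string, $start, $i + 1)) {
    # ...and pair it with every later occurrence of the end marker.
    for (my $j = index($string, $end, $i + 1); $j >= 0; $j = index($string, $end, $j + 1)) {
        push @found, substr($string, $i, $j - $i + length($end));
    }
}
print "found\n$_\n" for @found;
```

On the demo string this prints the five "found" blocks listed above, shortest first.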
regex perl
Could you show what you mean by "every possible match"?
– Schwern
Nov 10 at 8:08
e.g. 1st match = start F F Q Q E R Q Q stop
– Philip Butler
Nov 10 at 8:08
2nd = start F F Q Q E R Q Q stop R Q R R H A G C R H W Y G C E R R Q R Q H V F R R A G S S A N A T A A A E Q H R L L R S G Q V R Y P F stop etc. (sorry, hit return by accident, that's why there are 2 comments)
– Philip Butler
Nov 10 at 8:09
It's best if you edit your full expected output into the answer. Perhaps using a shorter string.
– Schwern
Nov 10 at 8:10
Ah, now the problem makes sense. Thank you
– zdim
Nov 10 at 8:24
edited Nov 10 at 8:31
asked Nov 10 at 8:01
Philip Butler
376
1 Answer
Accepted answer
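There are various ways to parse the string so as to scoop up what you want.
For example, use a regex to step through all A...A substrings and process each capture:
use warnings;
use strict;
use feature 'say';
my $s = "A.....B.....B......B......B......B";
# Non-greedy .*? stops right before the next A (or at the end of the string)
while ($s =~ m/(A.*?)(?=A|$)/gi) {
    # The capture group in the split pattern keeps the B separators in the list
    my @seqs = split /(B)/, $1;
    for my $i (0..$#seqs) {
        # Odd $i: the slice @seqs[0..$i] ends with a B
        say @seqs[0..$i] if $i % 2 != 0;
    }
}
The (?=A|$) is a lookahead, so .*? matches everything up to the next A (or the end of the string), but that A is not consumed and so is there for the next match. The quantifier must be non-greedy: a greedy .* would run to the end of the string, where $ satisfies the lookahead, and swallow any later A. The split uses () in the separator pattern so that the separator, too, is returned (so we have all those B's). It only prints when the slice has an even number of elements, so only substrings ending with the separator (B here) are printed.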
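To see what a capturing split returns compared with a plain one, here is a small sketch on a shortened, made-up chunk:

```perl
use strict;
use warnings;

my $chunk = "A.....B.....B";

# A capture group in the split pattern makes split return the separators too.
my @with_sep    = split /(B)/, $chunk;
my @without_sep = split /B/,   $chunk;

# The trailing empty field after the final B is stripped by split,
# so every odd index holds a "B": a prefix @with_sep[0..$i] with odd $i ends in B.
print join('|', @with_sep), "\n";      # A.....|B|.....|B
print join('|', @without_sep), "\n";   # A.....|.....
```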
The above prints
A.....B
A.....B.....B
A.....B.....B......B
A.....B.....B......B......B
A.....B.....B......B......B......B
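The demo string has only one A, so it never exercises the lookahead. A sketch with a made-up two-A string shows how the lookahead splits the input into A-to-A chunks, and why the quantifier in front of it needs to be non-greedy:

```perl
use strict;
use warnings;

# Hypothetical input with two start markers.
my $s = "A..B...BA....B";

# Non-greedy: each chunk stops just before the next A (or at the end).
my @chunks_lazy   = $s =~ m/(A.*?)(?=A|$)/g;
# Greedy: .* runs to the end, where $ satisfies the lookahead, so one chunk swallows both A's.
my @chunks_greedy = $s =~ m/(A.*)(?=A|$)/g;

print "lazy:   $_\n" for @chunks_lazy;     # A..B...B, then A....B
print "greedy: $_\n" for @chunks_greedy;   # A..B...BA....B
```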
There may be bioinformatics modules that do this, but I am not familiar with them.
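A regex-only alternative, offered as a sketch and not taken from the answer: Perl 5.10+ lets an embedded code block (?{ ... }) record each match, after which the (*FAIL) verb forces the engine to backtrack and try every other way the pattern could match. The array name @found is illustrative. The number of matches grows with the number of A/B pairs, so this approach may be slow on long sequences.

```perl
use strict;
use warnings;

my $s = "A.....B.....B......B......B......B";

my @found;
# Each time the engine gets past (A.*?B), the code block records $1.
# (*FAIL) then makes the overall match fail, so the engine backtracks:
# .*? extends to the next B, and later start positions are tried,
# until every possible A...B match has been visited.
$s =~ m/(A.*?B)(?{ push @found, $1 })(*FAIL)/;

print "found\n$_\n" for @found;
```

Because .*? is non-greedy, matches are recorded shortest first, which is the order of the desired output in the question.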
perfect thank you very much
– Philip Butler
Nov 10 at 8:36
Super answer. very concise, exactly what I need
– Philip Butler
Nov 10 at 8:37
@PhilipButler Great :) Let me know if testing with real data shows problems, or if other questions pop up
– zdim
Nov 10 at 8:39
I learned something new with my @seqs = split /(B)/, $1; - didn't know about capturing the delimiter before.
– Automaton
Nov 11 at 0:31
To skip the redundant non-B-ending lines you can do say @seqs[0..$i] if ($seqs[$i] eq 'B');
– Automaton
Nov 11 at 0:43
edited Nov 11 at 2:57
answered Nov 10 at 8:32
zdim
30.9k32040