Perl Regex Find and Return Every Possible Match












I'm trying to create a while loop that will find every possible substring within a string. But so far all I can match is either the longest instance or the shortest. For example, I have the string



EDIT: changed the string for demo purposes



"A.....B.....B......B......B......B"


And I want to find every possible sequence of "A.......B"



This code gives me the shortest possible match and then exits the while loop



while ($string =~ m/(A(.*?)B)/gi) {
    print "found\n";
    my $substr = $1;
    print $substr."\n";
}


And this will give me the longest and exit the while loop.



$string =~ m/(A(.*)B)/gi


But I want it to loop through the string returning every possible match. Does anyone know if Perl allows for this?



EDIT: added desired output below



found
A.....B
found
A.....B.....B
found
A.....B.....B......B
found
A.....B.....B......B......B
found
A.....B.....B......B......B......B










  • Could you show what you mean by "every possible match"?
    – Schwern
    Nov 10 at 8:08










  • e.g. 1st match = start F F Q Q E R Q Q stop
    – Philip Butler
    Nov 10 at 8:08










  • 2nd = start F F Q Q E R Q Q stop R Q R R H A G C R H W Y G C E R R Q R Q H V F R R A G S S A N A T A A A E Q H R L L R S G Q V R Y P F stop etc. (sorry, hit return by accident, that's why there are 2 comments)
    – Philip Butler
    Nov 10 at 8:09










  • It's best if you edit your full expected output into the question. Perhaps using a shorter string.
    – Schwern
    Nov 10 at 8:10






  • Ah, now the problem makes sense. Thank you
    – zdim
    Nov 10 at 8:24

















regex perl






edited Nov 10 at 8:31

























asked Nov 10 at 8:01









Philip Butler

1 Answer
accepted










There are various ways to parse the string so as to scoop up what you want.

For example, use a regex to step through all A...A substrings and process each capture:



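use warnings;
use strict;
use feature 'say';

my $s = "A.....B.....B......B......B......B";

# Step through each chunk that starts at an A and runs up to the next A (or the end)
while ($s =~ m/(A.*?)(?=A|$)/gi) {
    # Capturing parentheses make split return the B separators as well
    my @seqs = split /(B)/, $1;
    for my $i (0..$#seqs) {
        # Separators sit at odd indices, so these slices all end with B
        say @seqs[0..$i] if $i % 2 != 0;
    }
}


The (?=A|$) is a lookahead, so the non-greedy .*? matches everything up to the next A (or the end of the string), but that A is not consumed and so is still there for the next match. (A greedy .* would instead run to the end of the string on the first match, since $ satisfies the lookahead there.) The split uses () in the separator pattern so that the separator, too, is returned (so we have all those B's). It only prints for odd $i, that is, for an even number of elements, so only substrings ending with the separator (B here) are printed.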



The above prints




A.....B
A.....B.....B
A.....B.....B......B
A.....B.....B......B......B
A.....B.....B......B......B......B
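
The separator-capturing behaviour of split that the loop relies on can be seen in isolation. This is only a small sketch on a made-up input string ("A..B...B" is an assumption chosen for illustration, not data from the question):

```perl
use strict;
use warnings;

# Without a capture group the separators are discarded.
my @plain = split /B/, "A..B...B";

# With a capture group each separator comes back as its own element,
# so the separators end up at the odd indices (1, 3, ...).
my @kept = split /(B)/, "A..B...B";

print join("|", @plain), "\n";   # A..|...
print join("|", @kept),  "\n";   # A..|B|...|B
```

Because split drops trailing empty fields, the list ends with the last B, and joining elements 0 through any odd index gives a substring that ends with B.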


There may be bioinformatics modules that do this but I am not familiar with them.
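
As one possible alternative without regex, the same substrings can be collected with index and substr. This is a sketch only: the helper name all_spans is hypothetical, it compares case-sensitively (unlike the /i match above), and it differs from the loop above in that a span is not cut off at the next A, so with several A's in the input it also returns spans that run across a later A.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring that starts at an occurrence
# of $start and ends at any later occurrence of $end.
sub all_spans {
    my ($string, $start, $end) = @_;
    my @found;
    my $from = 0;
    # Outer loop: each occurrence of the start marker
    while ((my $a_pos = index($string, $start, $from)) != -1) {
        my $b_from = $a_pos + length $start;
        # Inner loop: each occurrence of the end marker after that start
        while ((my $b_pos = index($string, $end, $b_from)) != -1) {
            push @found, substr($string, $a_pos, $b_pos + length($end) - $a_pos);
            $b_from = $b_pos + 1;
        }
        $from = $a_pos + 1;
    }
    return @found;
}

print "$_\n" for all_spans("A.....B.....B......B......B......B", "A", "B");
```

For the demo string, which contains a single A, this prints the same five lines as shown above.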







  • perfect thank you very much
    – Philip Butler
    Nov 10 at 8:36










  • Super answer. very concise, exactly what I need
    – Philip Butler
    Nov 10 at 8:37






  • @PhilipButler Great :) Let me know if testing with real data shows problems, or if other questions pop up
    – zdim
    Nov 10 at 8:39












  • I learned something new with my @seqs = split /(B)/, $1; - didn't know about capturing the delimiter before.
    – Automaton
    Nov 11 at 0:31










  • To skip the redundant non-B-ending lines you can do say @seqs[0..$i] if ($seqs[$i] eq 'B');
    – Automaton
    Nov 11 at 0:43











edited Nov 11 at 2:57

























answered Nov 10 at 8:32









zdim
