Perl Regex Find and Return Every Possible Match

I'm trying to create a while loop that will find every possible substring within a string, but so far all I can match is the longest instance or the shortest. For example, I have the string

(Edit: string changed for demo purposes.)

"A.....B.....B......B......B......B"

and I want to find every possible sequence of "A.......B".

This code gives me the shortest possible match and then exits the while loop:

while ($string =~ m/(A(.*?)B)/gi) {
    print "found\n";
    my $substr = $1;
    print $substr."\n";
}

And this gives me the longest match and then exits the while loop:

$string =~ m/(A(.*)B)/gi

But I want it to loop through the string, returning every possible match. Does anyone know if Perl allows for this?

(Edit: desired output added below.)

found
A.....B
found
A.....B.....B
found
A.....B.....B......B
found
A.....B.....B......B......B
found
A.....B.....B......B......B......B

edited Nov 10 at 8:31

asked Nov 10 at 8:01

Philip Butler


Could you show what you mean by "every possible match"?
– Schwern
Nov 10 at 8:08

E.g. 1st match = start F F Q Q E R Q Q stop
– Philip Butler
Nov 10 at 8:08

2nd = start F F Q Q E R Q Q stop R Q R R H A G C R H W Y G C E R R Q R Q H V F R R A G S S A N A T A A A E Q H R L L R S G Q V R Y P F stop etc. (sorry, hit return by accident, that's why there are 2 comments)
– Philip Butler
Nov 10 at 8:09

It's best if you edit your full expected output into the question. Perhaps using a shorter string.
– Schwern
Nov 10 at 8:10

Ah, now the problem makes sense. Thank you
– zdim
Nov 10 at 8:24


Tags: regex, perl

1 Answer
accepted

There are various ways to parse the string so as to scoop up what you want.

For example, use a regex to step through all A...A substrings and process each capture:

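use warnings;
use strict;
use feature 'say';

my $s = "A.....B.....B......B......B......B";

# (?=A|$): the capture must be followed by an A or by the end of the string
while ($s =~ m/(A.*)(?=A|$)/gi) {
    # Parentheses in the pattern make split return the B separators too
    my @seqs = split /(B)/, $1;
    for my $i (0..$#seqs) {
        # Odd $i means the slice ends on a B
        say @seqs[0..$i] if $i % 2 != 0;
    }
}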

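The (?=A|$) is a lookahead: it requires that the capture be followed by an A or by the end of the string, but it does not consume that A, so it is still there for the next match. Note that .* is greedy, so when the string holds more than one A the capture runs to the end of the string; write .*? instead if each capture should stop right before the next A. The split uses () in the separator pattern so that the separator is returned as well (so we keep all those B's). The loop prints only at odd indices $i, that is, for an even number of elements, so only substrings ending with the separator (B here) are printed.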

The above prints

A.....B
A.....B.....B
A.....B.....B......B
A.....B.....B......B......B
A.....B.....B......B......B......B
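To see the two regex details at work on a string with more than one A, here is a small sketch. The input string is a made-up example, not taken from the question; it only illustrates how greedy and lazy quantifiers differ in front of the lookahead, and how split keeps a captured separator.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input with two A's, to show where each capture ends.
my $s = "A..B..B.A.B";

# Greedy .* runs to the end of the string, so there is a single capture.
my @greedy = $s =~ /(A.*)(?=A|$)/gi;

# Lazy .*? stops right before the next A, giving one capture per A.
my @lazy = $s =~ /(A.*?)(?=A|$)/gi;

say "greedy: @greedy";   # greedy: A..B..B.A.B
say "lazy:   @lazy";     # lazy:   A..B..B. A.B

# Parentheses in the split pattern keep the separator in the result.
my @parts = split /(B)/, $lazy[0];
say join '|', @parts;    # A..|B|..|B|.
```

Which of the two is right depends on the data: the greedy form treats everything after the first A as one sequence, while the lazy form restarts at every A.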

There may be bioinformatics modules that do this but I am not familiar with them.
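As one of the "various ways" mentioned at the top, the same result can also be collected without a regex by scanning for positions with index. This is only a sketch under a few assumptions: all_spans is a hypothetical helper name, the match is case-sensitive (unlike the /i version above), and a span is allowed to run past a later A.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return every substring that starts at a $start
# marker and ends at any later $stop marker.
sub all_spans {
    my ($str, $start, $stop) = @_;
    my @found;
    my $from = -1;
    # Outer loop: each position where $start occurs.
    while (($from = index($str, $start, $from + 1)) >= 0) {
        my $to = $from;
        # Inner loop: each $stop position after that start.
        while (($to = index($str, $stop, $to + 1)) >= 0) {
            push @found, substr($str, $from, $to - $from + 1);
        }
    }
    return @found;
}

say for all_spans("A.....B.....B......B......B......B", 'A', 'B');
```

For the string in the question this prints the same five lines as the split version. With several A's it differs: every A is paired with every B that follows it.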

edited Nov 11 at 2:57

answered Nov 10 at 8:32

zdim


perfect thank you very much
– Philip Butler
Nov 10 at 8:36

Super answer. very concise, exactly what I need
– Philip Butler
Nov 10 at 8:37

@PhilipButler Great :) Let me know if testing with real data shows problems, or if other questions pop up
– zdim
Nov 10 at 8:39

I learned something new with my @seqs = split /(B)/, $1; - didn't know about capturing the delimiter before.
– Automaton
Nov 11 at 0:31

To skip the redundant non-B-ending lines you can do say @seqs[0..$i] if ($seqs[$i] eq 'B');
– Automaton
Nov 11 at 0:43

