regex lookbehind alternative for parser (js)


Good morning,

(I saw that this topic has a LOT of answers, but I couldn't find one that fits.)

I am writing a little parser in JavaScript that cuts text into sections like this:

var tex = "hello   this :word is apart"

var parsed = [
  "hello",
  "   ",
  "this",
  " ",
  // ":word" should not be there, and neither should "word"
  " ",
  "is",
  " ",
  "apart"
]

The regex that does exactly this is:

/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g

But it uses a positive lookbehind, which, as I read, was only added to JavaScript in 2018 (ES2018), so I expect compatibility problems in many browsers, and I would like it to have at least a little compatibility.
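Whether lookbehind is available can be checked at run time instead of guessed. The following is only a minimal sketch; the helper name supportsLookbehind is hypothetical and not part of the question.

```javascript
// Hypothetical helper: returns true when this engine can compile a
// lookbehind assertion, false otherwise.
// The pattern is built from a string on purpose: a regex literal with an
// unsupported construct would be a syntax error for the whole script,
// whereas the RegExp constructor throws an exception that can be caught.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```

A parser could use such a check to pick between a lookbehind-based pattern and a fallback, although a single pattern without lookbehind avoids the branching altogether.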

I considered:

trying a non-capturing group (?:...), but it consumes the space before the word;

just removing the space checks, but then ":word" comes in as "word";

parsing the text twice, once for words and once for spaces, but I fear putting the results back in the right order would be a pain.

Understand, I NEED the words AND ALL the spaces, and I need to exclude some words.
I am open to other methods, like not using regex.

My last option:

removing the space checks and arranging the alternatives of my regex in the right order, hoping that ":word" is caught by the "special words" group before anything else.

My question:

Would that work in JavaScript, and would it be reliable?

I tried:

/(((:[a-z]+)|([ ]+)|([a-z]*))/g

On https://regexr.com/ it seems to work. Will it work in every case?
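The "last option" described above can be sketched with a plain exec loop that inspects which capture group matched. This is an illustration under assumptions, not the asker's final code: the function name tokenize is hypothetical, the parentheses are balanced, and [a-z]+ replaces [a-z]* so that no empty matches are produced.

```javascript
// Hypothetical helper: splits `text` into word and space tokens and
// drops every ":word" token.
function tokenize(text) {
  // The scan moves left to right, so the ":" is reached before the letters
  // that follow it and group 1 consumes the whole ":word".
  var re = /(:[a-z]+)|([ ]+)|([a-z]+)/g;
  var tokens = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    // m[1] is only set when the ":word" alternative matched.
    if (!m[1]) {
      tokens.push(m[0]);
    }
  }
  return tokens;
}

console.log(tokenize("hello   this :word is apart"));
// [ 'hello', '   ', 'this', ' ', ' ', 'is', ' ', 'apart' ]
```

Characters that fit none of the three alternatives (digits, uppercase letters, punctuation) are silently skipped by this loop, so the character classes would need widening for real input.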

edited Nov 17 '18 at 3:28

asked Nov 17 '18 at 3:14

Gui3

288

Your second regex has one left parenthesis too many.

– Poul Bak
Nov 17 '18 at 3:59

I agree; it was meant to create groups, but I don't know if it's worth it.

– Gui3
Nov 17 '18 at 4:20

javascript regex parsing lookbehind


2 Answers

You said you're open to non-regex solutions, but I can give you one that combines both. Since you can't rely on lookbehind being supported, just match everything and then filter out what you don't want: words preceded by a colon.

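const text = 'hello   this :word is apart';

// Match runs of word characters, ":word" tokens, or runs of whitespace.
const regex = /(\w+)|(:\w+)|(\s+)/g;

// Keep every token except the ones that contain a colon.
const parsed = text.match(regex).filter(word => !word.includes(':'));

console.log(parsed);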

answered Nov 17 '18 at 3:44

AnonymousSB

2,239221

Thanks a lot! Exactly the solution I was working on, but much simpler.

– Gui3
Nov 17 '18 at 4:25

And does your solution show that, no matter the order, the regex will take... the longest group?

– Gui3
Nov 17 '18 at 4:39

Not at all; those are three different capture groups. The first matches just runs of word characters (letters, digits and underscore), the second matches a colon followed by a run of word characters, and the last one matches just whitespace.

– AnonymousSB
Nov 17 '18 at 4:50

After many attempts, it does seem to me that the regex tries the groups in order: /(some regex)|(.+?)/g puts everything that is not "some regex" into the second group, but if I swap the groups, everything ends up in the lazy group, even the parts that match "some regex".

– Gui3
Nov 17 '18 at 14:46
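The observation in the comment above matches how JavaScript alternation behaves: at a given position the alternatives are tried from left to right and the first one that succeeds is used, whether or not a later one would match more text. A small sketch with a made-up input string:

```javascript
// Same input and the same two alternatives; only their order differs.
var input = "foobar";

// "foo" is tried first and succeeds, so "foobar" is never attempted.
var shortFirst = input.match(/foo|foobar/)[0];

// "foobar" is tried first and succeeds.
var longFirst = input.match(/foobar|foo/)[0];

console.log(shortFirst); // "foo"
console.log(longFirst);  // "foobar"
```

In the tokenizing regexes of this thread the order happens not to matter, because each alternative starts with a different kind of character and the scan reaches the ":" before the letters after it.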


I would use two regexes. The first one matches the words you DON'T want, so that you can replace them with an empty string. This is the simple regex:

/:\w+/g

After replacing the matches with an empty string, you have a string that can be parsed with this regex:

/([ ]+)|([a-z]*)/g

which is a simplified version of your second regex, since the forbidden words are already gone.
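Put together, the two passes might look like the sketch below. It is an assumption-laden illustration rather than the answerer's code: the variable names are made up, and the second pattern uses [a-z]+ instead of [a-z]*, because the starred form also produces an empty match at the end of the string.

```javascript
var text = "hello   this :word is apart";

// Pass 1: delete every ":word" token.
var stripped = text.replace(/:\w+/g, "");

// Pass 2: split what is left into runs of spaces and runs of letters.
// `|| []` covers the case where match() returns null (nothing matched).
var parsed = stripped.match(/[ ]+|[a-z]+/g) || [];

console.log(parsed);
// [ 'hello', '   ', 'this', '  ', 'is', ' ', 'apart' ]
```

Note that the spaces on either side of the removed word end up in a single two-space token, which differs from the two separate " " entries listed in the question. If the tokens must stay separate, filtering after matching keeps them apart.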

answered Nov 17 '18 at 3:58

Poul Bak

5,49331233

Thanks! I'm working on something of this kind: first parsing the text to classify the words, then going through the words again to keep those I want... Thanks for the \w! Does it work everywhere?

– Gui3
Nov 17 '18 at 4:22

Yes, to my knowledge \w works everywhere; it's very basic.

– Poul Bak
Nov 17 '18 at 4:26

Yes, \w is fully supported in JavaScript, as are \d for digits and \s for whitespace.

– AnonymousSB
Nov 17 '18 at 4:28
