regex lookbehind alternative for parser (js)




































































Good morning



(I saw this topic has a LOT of answers but I couldn't find one that fits)



I am writing a small parser in JavaScript that splits text into sections like this:



var tex = "hello   this :word is apart"

var parsed = [
"hello",
"   ",
"this",
" ",
// ":word" should not be there, and neither should "word"
" ",
"is",
" ",
"apart"
]


The regex that does exactly this is:



/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g


But it uses a positive lookbehind which, as I read, was only added to JavaScript in 2018, so I expect browser compatibility problems, and I would like it to have at least some compatibility.



I considered:




  • using a non-capturing group (?:...) instead, but it consumes the preceding space

  • just removing the space checks, but then ":word" comes through as "word"

  • parsing the text twice, once for words and once for spaces, but I fear putting the results back in the right order would be a pain


To be clear, I NEED the words AND ALL the spaces, and I need to exclude some words.
I am open to other methods, including ones that do not use regex.



My last option:



removing the space checks and putting the alternatives of my regex in the right order, hoping that ":word" is captured by the "special words" group before anything else.



My question:



Would that work in JavaScript, and would it be reliable?



I tried



/(((:[a-z]+)|([ ]+)|([a-z]*))/g


It seems to work on https://regexr.com/. Will it work in every case?







javascript regex parsing lookbehind














asked Nov 17 '18 at 3:14 by Gui3, edited Nov 17 '18 at 3:28













  • Your second regex has one too many left parentheses.

    – Poul Bak
    Nov 17 '18 at 3:59











  • I agree, it was to make groups, but I don't know if it's worth it

    – Gui3
    Nov 17 '18 at 4:20































2 Answers





















































You said you're open to non-regex solutions, but I can give you one that combines both. Since you can't rely on lookbehind being supported, just capture everything and then filter out what you don't want: the words preceded by a colon.






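const text = 'hello   this :word is apart';
// Match plain words, ":word" tokens, and runs of whitespace as separate tokens.
const regex = /(\w+)|(:\w+)|(\s+)/g;
// Drop every token that contains a colon.
const parsed = text.match(regex).filter(word => !word.includes(':'));

console.log(parsed);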


















answered Nov 17 '18 at 3:44 by AnonymousSB
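The answer above assumes that lookbehind cannot be relied on. As a rough sketch (not part of the original answer), support can also be checked at run time before choosing a pattern. The function name `supportsLookbehind` is made up for this illustration.

```javascript
// Hypothetical helper: reports whether this engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    // The pattern is built from a string on purpose: an engine without
    // lookbehind then throws here at run time, instead of refusing to
    // parse the whole script because of a regex literal it cannot read.
    new RegExp('(?<=a)b');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```

The result depends on the engine, so a parser could use the lookbehind pattern when this returns true and fall back to the capture-and-filter approach otherwise.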













  • Thanks a lot! Exactly the solution I was working on, but much simpler.

    – Gui3
    Nov 17 '18 at 4:25











  • And does your solution show that, no matter the order, the regex will take ... the longest group?

    – Gui3
    Nov 17 '18 at 4:39











  • Not at all, those are three different capture groups. The first finds just groups of letters, the second finds a colon followed by a group of letters, and the last one matches just spaces.

    – AnonymousSB
    Nov 17 '18 at 4:50











  • After many attempts, it does seem to me that regex tries the groups in order: /(some regex)|(.+?)/g puts everything that is not "some regex" in the second group, but if I swap the groups, everything ends up in the lazy group, even the parts that match "some regex"...

    – Gui3
    Nov 17 '18 at 14:46
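The ordering question raised in these comments can be checked with a small experiment. In JavaScript, alternation is ordered: at each position the alternatives are tried from left to right and the first one that lets the match succeed is used, not the longest one. The sketch below is only an illustration; the function name `tokenize` and the exact pattern are assumptions, not code from the thread.

```javascript
// Ordered alternation: the first alternative that matches wins.
console.log('abc'.match(/a|ab/)[0]); // → "a"
console.log('abc'.match(/ab|a/)[0]); // → "ab"

// Hypothetical tokenizer: ":word" first, then spaces, then plain words.
function tokenize(text) {
  const pattern = /(:[a-z]+)|( +)|([a-z]+)/g;
  const tokens = [];
  let m;
  // Every alternative needs at least one character, so exec() always advances.
  while ((m = pattern.exec(text)) !== null) {
    if (m[1] !== undefined) continue; // group 1 matched: drop the ":word" token
    tokens.push(m[0]);
  }
  return tokens;
}

console.log(tokenize('hello   this :word is apart'));
// → ["hello", "   ", "this", " ", " ", "is", " ", "apart"]
```

For these three alternatives the order happens not to matter: only `:[a-z]+` can start at a colon, and the scan reaches the colon before the letters that follow it. Order does matter once one alternative can match text that another also matches, as with the `(.+?)` catch-all mentioned in the last comment.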


































I would use two regexes. The first one matches the words you DON'T want, so that you can replace them with an empty string. This is the simple regex:



/:\w+/g


After replacing the matches with an empty string, you have a string that can be parsed with this regex:



/([ ]+)|([a-z]*)/g


which is a simplified version of your second regex, since the forbidden words are already gone.

















answered Nov 17 '18 at 3:58 by Poul Bak
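The two-regex approach described in this answer could be sketched as follows. This is not the answerer's code: the function name `parseWithoutSpecialWords` is invented here, and the second pattern uses `[a-z]+` rather than `[a-z]*`, because with a global match the `*` version can also produce empty strings (for example at the end of the input).

```javascript
// Hypothetical helper illustrating the replace-then-match idea.
function parseWithoutSpecialWords(text) {
  // Step 1: delete every ":word" token.
  const cleaned = text.replace(/:\w+/g, '');
  // Step 2: split what is left into runs of spaces and runs of letters.
  // match() returns null when nothing matches, hence the fallback to [].
  return cleaned.match(/ +|[a-z]+/g) || [];
}

console.log(parseWithoutSpecialWords('hello   this :word is apart'));
// → ["hello", "   ", "this", "  ", "is", " ", "apart"]
```

One difference from the capture-and-filter approach is visible in the output: the spaces on either side of the removed word merge into a single two-space token, because the word is gone before the spaces are matched.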













  • Thanks! I'm working on something of this kind: first parsing the text to classify the words, then parsing the words again to keep the ones I want... Thanks for the \w! Does it work everywhere?

    – Gui3
    Nov 17 '18 at 4:22













  • Yes, to my knowledge, \w works everywhere, it's very basic

    – Poul Bak
    Nov 17 '18 at 4:26











  • Yes, \w is fully supported in JavaScript, as is \d for digits and \s for whitespace

    – AnonymousSB
    Nov 17 '18 at 4:28
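Since the question also asks about methods that do not use regex, here is a minimal hand-written scanner as a sketch. It is not taken from either answer, the name `scanTokens` is invented, and it assumes the same rules as the question: lowercase words, runs of spaces, and ":word" tokens that must be dropped.

```javascript
// Hypothetical regex-free scanner for the same token rules.
function scanTokens(text) {
  const isLetter = (ch) => ch >= 'a' && ch <= 'z';
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const start = i;
    if (text[i] === ' ') {
      // A run of spaces becomes one token.
      while (i < text.length && text[i] === ' ') i++;
      tokens.push(text.slice(start, i));
    } else if (text[i] === ':') {
      // Skip the colon and the letters after it; nothing is pushed.
      i++;
      while (i < text.length && isLetter(text[i])) i++;
    } else if (isLetter(text[i])) {
      // A run of lowercase letters becomes one token.
      while (i < text.length && isLetter(text[i])) i++;
      tokens.push(text.slice(start, i));
    } else {
      i++; // any other character is ignored in this sketch
    }
  }
  return tokens;
}

console.log(scanTokens('hello   this :word is apart'));
// → ["hello", "   ", "this", " ", " ", "is", " ", "apart"]
```

A scanner like this needs no lookbehind at all, since the preceding character is handled by the branch that consumed it.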


















