regex lookbehind alternative for parser (js)
Good morning,
(I know this topic has a LOT of answers, but I couldn't find one that fits.)
I am writing a little parser in JavaScript that cuts text into sections like this:
var tex = "hello this :word is apart";
var parsed = [
  "hello",
  " ",
  "this",
  " ",
  // neither ":word" nor "word" should appear here
  " ",
  "is",
  " ",
  "apart"
];
The ideal regex for this would be:
/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g
But it uses a positive lookbehind, which, from what I read, was only added to JavaScript in 2018 (ECMAScript 2018), so I expect browser-compatibility problems, and I would like the parser to be at least somewhat compatible.
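One way to keep a lookbehind regex where it is supported, and still run elsewhere, is to feature-detect it at runtime. This is a minimal sketch; the function name `supportsLookbehind` is hypothetical. The test pattern is built with the `RegExp` constructor on purpose: written as a `/.../` literal, an engine without lookbehind would reject the whole script at parse time, whereas the constructor throws a `SyntaxError` that `try`/`catch` can handle.

```javascript
// Hypothetical helper: true when the engine accepts lookbehind assertions.
function supportsLookbehind() {
  try {
    // Built from a string so an unsupported engine fails here, at run time.
    new RegExp('(?<= )[a-z]+');
    return true;
  } catch (e) {
    // Engines without ES2018 lookbehind throw a SyntaxError for this pattern.
    return false;
  }
}

if (supportsLookbehind()) {
  console.log('lookbehind available');
} else {
  console.log('use a lookbehind-free tokenizer');
}
```

For this to help, the lookbehind pattern itself must also be created with `new RegExp(...)` inside the supported branch, never as a literal.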
I considered:
- using a non-capturing group (?:), but it consumes the space before the word;
- just removing the space check, but then ":word" comes out as "word";
- parsing the text twice, once for words and once for spaces, but I fear putting the pieces back in the right order would be a pain.
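The third option (parsing twice) is less painful than it looks, because every `exec` match carries its `index`, so the two result lists can be merged and sorted by position. A minimal sketch, assuming words are runs of `[a-z]` and that words prefixed with `:` are the ones to drop; `collect` and `tokenizeTwoPass` are hypothetical names.

```javascript
// Collect every match of a global regex as { index, value } pairs.
function collect(text, regex) {
  var found = [];
  var m;
  regex.lastIndex = 0;
  while ((m = regex.exec(text)) !== null) {
    found.push({ index: m.index, value: m[0] });
  }
  return found;
}

// Pass 1 finds space runs, pass 2 finds words (optionally prefixed by ':').
// Sorting the merged list by index restores the original order.
function tokenizeTwoPass(text) {
  var spaces = collect(text, /[ ]+/g);
  var words = collect(text, /:?[a-z]+/g).filter(function (t) {
    return t.value.charAt(0) !== ':'; // drop the excluded ":word" tokens
  });
  return spaces.concat(words)
    .sort(function (a, b) { return a.index - b.index; })
    .map(function (t) { return t.value; });
}

console.log(tokenizeTwoPass('hello this :word is apart'));
```

The `:?` in the word pattern matters: it makes `:word` match as one token, so `word` is never picked up on its own.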
To be clear: I NEED the words AND ALL the spaces, and I need to exclude some words.
I am open to other methods, including ones that don't use regex.
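Since non-regex methods are on the table, a single left-to-right scan over the characters also does the job and needs nothing beyond ES3-era string methods. A minimal sketch, assuming a token is either a run of spaces or a run of non-space characters, and that tokens starting with `:` are the excluded ones; `tokenize` is a hypothetical name.

```javascript
function tokenize(text) {
  var tokens = [];
  var i = 0;
  while (i < text.length) {
    var isSpace = text.charAt(i) === ' ';
    var start = i;
    // Extend the current run while characters stay in the same class.
    while (i < text.length && (text.charAt(i) === ' ') === isSpace) {
      i++;
    }
    var token = text.slice(start, i);
    // Keep every space run; drop words that start with ':'.
    if (isSpace || token.charAt(0) !== ':') {
      tokens.push(token);
    }
  }
  return tokens;
}

console.log(tokenize('hello this :word is apart'));
```

Consecutive spaces come out as one token (for example `'a  b'` gives `'a'`, `'  '`, `'b'`), which matches the `[ ]+` behaviour of the regexes in the question.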
My last option:
remove the space check and order the alternatives in my regex so that, hopefully, ":word" is caught by the "special words" group before anything else.
My question:
would that work in JavaScript, and would it be reliable?
I tried
/(((:[a-z]+)|([ ]+)|([a-z]*))/g
in https://regexr.com/ and it seems to work. Will it work in every case?
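A quick way to probe the "last option" is to run the same alternatives in both orders. The sketch below assumes the extra `(` is removed and `[a-z]*` is changed to `[a-z]+` (so no empty matches appear). In this particular case the order does not change the result, because a `:` can only start the `:[a-z]+` alternative. It does not work "in every case", though: characters outside the listed classes are skipped silently.

```javascript
// The "special words first" regex, balanced and with '+' instead of '*'.
var specialFirst = /:[a-z]+|[ ]+|[a-z]+/g;
// The same alternatives in the opposite order.
var specialLast = /[a-z]+|[ ]+|:[a-z]+/g;

var text = 'hello this :word is apart';
var a = text.match(specialFirst);
var b = text.match(specialLast);
// Same tokens either way: only the :[a-z]+ alternative can start at a ':'.
console.log(a.join('|') === b.join('|'));

// Drop the excluded tokens afterwards.
var parsed = a.filter(function (t) { return t.charAt(0) !== ':'; });

// Caveat: characters outside the classes are skipped without warning,
// e.g. the capital 'H' below is not part of any token.
var skipped = 'Hello :x'.match(specialFirst); // ['ello', ' ', ':x']
```

If the input can contain capitals, digits or punctuation, those characters need to be added to the classes (or given their own alternative) before relying on this.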
javascript regex parsing lookbehind
edited Nov 17 '18 at 3:28 by Gui3
asked Nov 17 '18 at 3:14 by Gui3
Your second regex has one too many left parentheses.
– Poul Bak
Nov 17 '18 at 3:59
I agree; it was meant to create groups, but I don't know if it's worth it.
– Gui3
Nov 17 '18 at 4:20
2 Answers
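You said you're open to non-regex solutions; I can give you one that combines both. Since you can't rely on lookbehind being supported, just capture everything and then filter out what you don't want: words preceded by a colon.
const text = 'hello this :word is apart';
// runs of word characters, a colon followed by word characters, or runs of whitespace
const regex = /(\w+)|(:\w+)|(\s+)/g;
// match() returns null when nothing matches, so fall back to an empty array
const parsed = (text.match(regex) || []).filter(word => !word.includes(':'));
console.log(parsed); // ['hello', ' ', 'this', ' ', ' ', 'is', ' ', 'apart']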
answered Nov 17 '18 at 3:44 by AnonymousSB
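A variation on this answer, sketched under the same assumptions: `String.prototype.match` with the `g` flag throws the capture groups away, but a `RegExp.prototype.exec` loop keeps them, so tokens can be skipped according to which group matched instead of searching each string for `:`. It also avoids `String.prototype.matchAll`, which arrived in ECMAScript 2020, even later than lookbehind. The function name `parse` is hypothetical.

```javascript
function parse(text) {
  // Group 1: word characters, group 2: ':' + word characters, group 3: whitespace.
  var regex = /(\w+)|(:\w+)|(\s+)/g;
  var parsed = [];
  var m;
  // Every alternative uses '+', so no empty match can stall this loop.
  while ((m = regex.exec(text)) !== null) {
    // m[2] is undefined unless the ':word' alternative matched; skip those.
    if (m[2] === undefined) {
      parsed.push(m[0]);
    }
  }
  return parsed;
}

console.log(parse('hello this :word is apart'));
```

Classifying by group also makes it easy to return richer tokens later (for example `{ type: 'space', value: m[3] }`) without re-inspecting the strings.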
Thanks a lot! That's exactly the solution I was working on, but much simpler.
– Gui3
Nov 17 '18 at 4:25
And your solution shows that, no matter the order, the regex will take... the longest group?
– Gui3
Nov 17 '18 at 4:39
Not at all; those are three different capture groups. The first matches just runs of word characters, the second a colon followed by word characters, and the last one just whitespace.
– AnonymousSB
Nov 17 '18 at 4:50
After many attempts, it does seem to me that the regex tries the groups in order: with /(some regex)|(.+?)/g, everything that is not "some regex" ends up in the second group, but if I swap the groups, everything ends up in the lazy group, even the parts that match "some regex".
– Gui3
Nov 17 '18 at 14:46
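The behaviour described in the last comment is how alternation works in JavaScript regexes: at each position the engine tries the alternatives from left to right and keeps the first one that succeeds, not the longest one. A small check, with the placeholder `abc` standing in for "some regex" (an assumption for illustration):

```javascript
var text = 'xabcx';

// Specific alternative first: 'abc' is tried before the lazy catch-all.
var specificFirst = text.match(/(abc)|(.+?)/g); // ['x', 'abc', 'x']

// Lazy alternative first: (.+?) already succeeds with one character at
// every position, so (abc) is never reached.
var lazyFirst = text.match(/(.+?)|(abc)/g); // ['x', 'a', 'b', 'c', 'x']

console.log(specificFirst, lazyFirst);
```

So in a tokenizer regex, the most specific alternatives (such as `:[a-z]+`) should come before any catch-all ones.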
I would use 2 regexes. The first one matches the words you DON'T want, so you can replace them with an empty string; this is the simple regex:
/:\w+/g
After replacing those matches with an empty string, you have a string that can be parsed with this regex:
/([ ]+)|([a-z]+)/g
which is a simplified version of your second regex, since the forbidden words are already gone. (Using + instead of * keeps the second regex from returning empty matches.)
answered Nov 17 '18 at 3:58 by Poul Bak
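A minimal sketch of the two-regex approach from this answer, assuming `[a-z]+` (not `[a-z]*`) for words so the second pass never yields empty strings; `parseTwoRegexes` is a hypothetical name. One consequence worth knowing: deleting `:word` leaves its two neighbouring spaces adjacent, so `[ ]+` returns them as a single two-space token rather than two separate `' '` tokens.

```javascript
function parseTwoRegexes(text) {
  // Pass 1: delete the unwanted ':word' tokens.
  var cleaned = text.replace(/:\w+/g, '');
  // Pass 2: split what is left into space runs and words.
  // match() returns null on no match, hence the empty-array fallback.
  return cleaned.match(/([ ]+)|([a-z]+)/g) || [];
}

console.log(parseTwoRegexes('hello this :word is apart'));
```

If the two spaces around an excluded word must stay separate tokens, filtering after a single pass (as in the other answer) keeps them apart.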
Thanks! I'm working on something along those lines: first parsing the text to classify words, then parsing the words again to keep the ones I want. Thanks for the \w! Does it work everywhere?
– Gui3
Nov 17 '18 at 4:22
Yes, to my knowledge \w works everywhere; it's very basic.
– Poul Bak
Nov 17 '18 at 4:26
Yes, \w is fully supported in JavaScript, as are \d for digits and \s for whitespace.
– AnonymousSB
Nov 17 '18 at 4:28
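Supported everywhere is true, but it helps to pin down what these classes match, because it affects the answers above: in JavaScript, `\w` is the ASCII class `[A-Za-z0-9_]` (digits and `_` count, accented letters do not), `\d` is `[0-9]`, and `\s` matches any whitespace, including tabs and newlines, whereas the question's `[ ]+` only matches the space character. A few checks:

```javascript
// \w is [A-Za-z0-9_]: digits and '_' count, accented letters do not.
var wordOnly = /^\w+$/;
var plain = wordOnly.test('word_42'); // true
var accented = wordOnly.test('café'); // false: 'é' is not in \w

// \s matches any whitespace, so a tab or newline also acts as a separator.
var parts = 'a\tb\nc'.split(/\s+/); // ['a', 'b', 'c']

// \d is [0-9].
var digits = /^\d+$/.test('2018'); // true

console.log(plain, accented, parts, digits);
```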