Python regex - any substring matches
I want to find dates in the formats 18-05-2018 and 18-05-18, but not 2018-05-18. I want to use a regular expression so that I get True when such a date appears in a string.
So it should return True for these strings:
ggggg18-05-2018ggggg
ggggg18-05-2018ggggg12345678
ggggg18-05-18ggggg
ggggg18-05-18ggggg12345678
But it should return False for these strings:
ggggg2018-05-18ggggg
ggggg2018-05-18ggggg12345678
How can I do it? I found the findall() method and the pattern '\d{1,2}[-]\d{1,2}[-]\d{2,4}', but it returned True for the last two strings, as it found 18-05-18 in them.
python regex
edited Nov 11 at 10:54
das-g
asked Nov 11 at 10:33
Clyde Barrow
What should it return for e.g. 2018-12-12-12 or 01.01.01-01-01? And what for strings that contain dates for both formats like ggg18-05-18gggg2018-05-18ggg?
– das-g
Nov 11 at 10:58
It should return true if there is even one that matches the pattern
– Clyde Barrow
Nov 11 at 11:05
4 Answers
accepted
Use negative lookbehind and lookahead:
import re
s = 'sasdassdsadasdadas18-05-2018sdaq1213211214142'
print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# ['18-05-2018']
This ensures that no digit appears immediately before or immediately after the matched date.
To prove that it handles your error case:
import re
s = 'sasdassdsadasdadas2018-05-2018sdaq1213211214142'
print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# []
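Since the question asks for True/False rather than a list of matches, the same pattern can be wrapped in a small helper. This is only a sketch: the name contains_date is made up for illustration, and the test strings are the ones from the question and its comments, written out as separate strings.

```python
import re

# Same pattern as above: no digit directly before or after the date.
DATE_RE = re.compile(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)')

def contains_date(text):
    """Return True if text contains at least one matching date (hypothetical helper)."""
    return DATE_RE.search(text) is not None

should_match = [
    'ggggg18-05-2018ggggg',
    'ggggg18-05-2018ggggg12345678',
    'ggggg18-05-18ggggg',
    'ggggg18-05-18ggggg12345678',
    'ggg18-05-18gggg2018-05-18ggg',  # mixed case from the comments: one valid date is enough
]
should_not_match = [
    'ggggg2018-05-18ggggg',
    'ggggg2018-05-18ggggg12345678',
]

for s in should_match + should_not_match:
    print(s, contains_date(s))
```

re.search stops at the first hit, which is all a yes/no check needs; findall is only necessary when the matched dates themselves are wanted.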
edited Nov 11 at 10:44
answered Nov 11 at 10:39
Austin
One approach is to check that what comes before the start of the date match is either a non-digit or the start of the input, and that what comes after the date match is also a non-digit or the end of the input.
import re
text = "sasdassdsadasdadas18-05-2018sdaq1213211214142"
matches = re.findall(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)', text)
print(matches)
# ['18-05-2018']
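One difference from the lookaround version is worth knowing, although it does not affect a plain True/False check: here the neighbouring non-digit characters are part of the match and get consumed. The sketch below, using a made-up input with two dates separated by a single space, shows what that means for findall.

```python
import re

text = '18-05-18 19-05-18'  # two dates separated by a single space

# Consuming version: the \D after the first date uses up the space,
# so nothing is left to satisfy (?:\D|^) in front of the second date.
consuming = re.findall(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)', text)

# Lookaround version: the assertions inspect the neighbours without consuming them.
lookaround = re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', text)

print(consuming)   # ['18-05-18']
print(lookaround)  # ['18-05-18', '19-05-18']
```

If every date has to be extracted, the lookaround form is the safer choice; if only the presence of a date matters, either form works.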
answered Nov 11 at 10:42
Tim Biegeleisen
I'd suggest using a negative lookbehind (?<!...), which you can insert at any point in a regular expression to ensure that whatever comes right before that point does not match a certain expression (the ...). In your case, you want to ensure that what comes right before the beginning of the expression doesn't match a digit (\d), so you would insert (?<!\d) at the beginning of your regex.
If you would also like to exclude matches with the wrong number of digits at the end, as in aaaa18-05-181bbb, then you could also use a negative lookahead (?!...), which is similar to the negative lookbehind except that it ensures whatever comes after a certain point does not match an expression. In your case, to ensure that a digit does not come after the end of the match, you'd add (?!\d) at the end of your expression.
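A minimal sketch of the two assertions, assuming the question's pattern as the base; the variable names are only illustrative. One caveat: because that base pattern ends in \d{2,4}, a three-digit tail such as 181 is itself an acceptable year as far as the regex is concerned, so the lookahead only starts rejecting at five or more trailing digits. The sketch therefore uses a five-digit tail to show its effect.

```python
import re

base = r'\d{1,2}[-]\d{1,2}[-]\d{2,4}'   # pattern from the question
behind_only = r'(?<!\d)' + base          # no digit right before the match
both = behind_only + r'(?!\d)'           # ...and no digit right after it

# The lookbehind stops the match from starting inside the year 2018.
print(re.findall(base, 'ggggg2018-05-18ggggg'))         # ['18-05-18']
print(re.findall(behind_only, 'ggggg2018-05-18ggggg'))  # []

# The lookahead rejects a date that runs straight into further digits.
print(re.findall(behind_only, 'aaaa18-05-20181bbb'))    # ['18-05-2018']
print(re.findall(both, 'aaaa18-05-20181bbb'))           # []

# A three-digit tail still fits \d{2,4}, so it is matched either way.
print(re.findall(both, 'aaaa18-05-181bbb'))             # ['18-05-181']
```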
answered Nov 11 at 10:40
David Z
You could use a negative lookbehind and a negative lookahead to assert that there are no digits on the left and on the right side. To match either 2 or 4 digits at the end you could use an alternation:
(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)
import re
s = 'ggggg18-05-2018ggggg12345678'
print(re.findall(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)', s))
# ['18-05-2018']
Note that you can use the hyphen without the character class.
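To see how the alternation differs from the \d{1,2} and \d{2,4} ranges in the question, here is a small side-by-side sketch. The names strict and loose and the sample strings are made up for illustration.

```python
import re

strict = re.compile(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)')  # alternation from this answer
loose = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}(?!\d)')       # ranges from the question

for sample in ['x18-05-2018x', 'x18-05-18x', 'x18-05-181x', 'x1-5-18x']:
    print(sample, bool(strict.search(sample)), bool(loose.search(sample)))
# x18-05-2018x True True
# x18-05-18x True True
# x18-05-181x False True
# x1-5-18x False True
```

The stricter form rejects three-digit years, but it also requires two-digit days and months, so which one fits depends on whether dates like 1-5-18 should count.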
edited Nov 11 at 11:21
answered Nov 11 at 11:15
The fourth bird