Python regex - any substring matches

I want to find dates in the formats 18-05-2018 and 18-05-18, but not 2018-05-18. I want to use regular expressions such that I get True when such a date appears in a string.



So it should return True for these strings:




  • ggggg18-05-2018ggggg

  • ggggg18-05-2018ggggg12345678

  • ggggg18-05-18ggggg

  • ggggg18-05-18ggggg12345678


But it should return False for these strings:




  • ggggg2018-05-18ggggg

  • ggggg2018-05-18ggggg12345678


How to do it? I've found the findall() method and the pattern r'\d{1,2}[-]\d{1,2}[-]\d{2,4}', but it also reports a match for the last two strings, because it finds 18-05-18 inside 2018-05-18.
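To see the problem concretely, here is a minimal reproduction, assuming `re.search` is used to turn "found something" into True/False (an empty list from `findall()` is falsy, so the outcome is the same). The pattern is the one from the question with its backslashes restored.

```python
import re

# The question's pattern: no constraint on what surrounds the date.
naive = re.compile(r'\d{1,2}[-]\d{1,2}[-]\d{2,4}')

should_match = [
    'ggggg18-05-2018ggggg',
    'ggggg18-05-2018ggggg12345678',
    'ggggg18-05-18ggggg',
    'ggggg18-05-18ggggg12345678',
]
should_not_match = [
    'ggggg2018-05-18ggggg',
    'ggggg2018-05-18ggggg12345678',
]

for text in should_match + should_not_match:
    m = naive.search(text)
    # For the last two strings the match starts inside the year "2018".
    print(text, bool(m), m.group() if m else None)
```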







python regex






edited Nov 11 at 10:54 by das-g

asked Nov 11 at 10:33 by Clyde Barrow

  • What should it return for e.g. 2018-12-12-12 or 01.01.01-01-01? And what for strings that contain dates for both formats like ggg18-05-18gggg2018-05-18ggg?
    – das-g
    Nov 11 at 10:58










  • It should return True if there is at least one date that matches the pattern
    – Clyde Barrow
    Nov 11 at 11:05


















4 Answers
              Use negative lookbehind and lookahead:



              import re

              s = 'sasdassdsadasdadas18-05-2018sdaq1213211214142'

              print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
              # ['18-05-2018']


              This makes sure that no digit sits immediately before or after the match, so the date cannot begin or end in the middle of a longer run of digits such as the year 2018.
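Since the question asks for True/False rather than a list of matches, the same pattern can be wrapped in a small helper. This is a sketch: `has_date` is a hypothetical name, and `re.search` is used instead of `findall()` because it stops at the first match. The six strings are the ones from the question.

```python
import re

# This answer's pattern: no digit may directly precede or follow the date.
DATE_RE = re.compile(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)')


def has_date(text: str) -> bool:
    """Return True if text contains a day-month-year date like 18-05-18 or 18-05-2018."""
    return DATE_RE.search(text) is not None


cases = {
    'ggggg18-05-2018ggggg': True,
    'ggggg18-05-2018ggggg12345678': True,
    'ggggg18-05-18ggggg': True,
    'ggggg18-05-18ggggg12345678': True,
    'ggggg2018-05-18ggggg': False,
    'ggggg2018-05-18ggggg12345678': False,
}

for text, expected in cases.items():
    print(text, has_date(text), expected)
```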





              To prove that it handles your error case:



              import re

              s = 'sasdassdsadasdadas2018-05-2018sdaq1213211214142'

              print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
              # []
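The comments on the question ask about strings such as 2018-12-12-12, 01.01.01-01-01 and ggg18-05-18gggg2018-05-18ggg, and the asker replied that one match anywhere is enough. Under that reading, this sketch shows what this answer's pattern finds in each; whether a match here is actually wanted is for the asker to decide.

```python
import re

pattern = re.compile(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)')

# Edge cases taken from the comment thread under the question.
for text in ['2018-12-12-12', '01.01.01-01-01', 'ggg18-05-18gggg2018-05-18ggg']:
    print(text, pattern.findall(text))
```

Each string contains a substring the pattern accepts (12-12-12, 01-01-01 and 18-05-18 respectively), so a True/False check built on it would return True for all three.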






              edited Nov 11 at 10:44

              answered Nov 11 at 10:39 by Austin (accepted answer)

                      One approach is to check that what comes before the start of the date match is either a non-digit or the start of the input, and that what comes after the date match is also a non-digit or the end of the input.



                      import re

                      text = "sasdassdsadasdadas18-05-2018sdaq1213211214142"
                      matches = re.findall(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)', text)
                      print(matches)
                      # ['18-05-2018']
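One difference from the lookaround version is worth knowing: when the character after a date is a non-digit, `(?:\D|$)` consumes it, so it is no longer available as the `\D` before a following date. Two dates separated by a single character therefore cannot both be returned by `findall()`. The input below is a made-up example; for the question's plain True/False check this makes no difference, since the first match is enough.

```python
import re

text = "18-05-18x18-05-18"  # hypothetical input: two dates, one separator

consuming = r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)'
lookaround = r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)'

# The first match swallows the "x", so the second date has no \D left in front of it.
print(re.findall(consuming, text))   # ['18-05-18']
# Lookarounds are zero-width, so the "x" can serve as the boundary for both dates.
print(re.findall(lookaround, text))  # ['18-05-18', '18-05-18']
```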






                      answered Nov 11 at 10:42 by Tim Biegeleisen

                              I'd suggest using a negative lookbehind (?<!...), which you can insert at any point in a regular expression to ensure that whatever comes right before that point does not match a certain expression (the ...). In your case, you want to ensure that what comes right before the beginning of the expression doesn't match a digit (\d), so you would insert (?<!\d) at the beginning of your regex.



                              If you would also like to exclude matches with too many digits at the end, as in aaaa18-05-20181bbb, then you could also use a negative lookahead (?!...), which is similar to the negative lookbehind except that it ensures whatever comes after a certain point does not match an expression. In your case, to ensure that a digit does not come after the end of the match, you'd add (?!\d) at the end of your expression. Note that with \d{2,4} a three-digit year, as in aaaa18-05-181bbb, is still accepted.
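A small comparison makes the division of labour visible: the lookbehind alone already rejects the question's 2018-05-18 strings, while the lookahead is what rejects extra trailing digits. The last two samples are made-up inputs, not from the question.

```python
import re

behind_only = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}')
behind_and_ahead = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}(?!\d)')

samples = ['ggggg2018-05-18ggggg', 'aaaa18-05-20181bbb', 'aaaa18-05-181bbb']
for text in samples:
    print(text, behind_only.findall(text), behind_and_ahead.findall(text))
```

The last sample shows that `\d{2,4}` by itself still accepts a three-digit year even with both lookarounds; the `(?:\d{4}|\d{2})` alternation suggested in another answer closes that gap.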







                              answered Nov 11 at 10:40 by David Z

                                      You could use a negative lookbehind and a negative lookahead to assert that there are no digits on the left and on the right side. To match either 2 or 4 digits at the end you could use an alternation:



                                      (?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)






                                      import re
                                      s = 'ggggg18-05-2018ggggg12345678'  # avoid shadowing the built-in str
                                      print(re.findall(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)', s))
                                      # ['18-05-2018']


                                      Note that the hyphen does not need to be wrapped in a character class; outside [...] a plain - already matches a literal hyphen.
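To see what this stricter pattern changes compared with `\d{1,2}-\d{1,2}-\d{2,4}`, here is a small comparison. The first two inputs are made up for illustration, and the sketch assumes days and months are always written with two digits, which is what `\d{2}` enforces.

```python
import re

alternation = re.compile(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)')
ranged = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}(?!\d)')

samples = ['aaaa18-05-181bbb', 'aaaa8-5-18bbb', 'ggggg18-05-18ggggg12345678']
for text in samples:
    # alternation rejects the 3-digit year and the 1-digit day/month; ranged accepts both
    print(text, alternation.findall(text), ranged.findall(text))
```

Because of the `(?!\d)` lookahead, the order of the alternatives does not matter here: if `\d{4}` cannot be followed by a non-digit, the engine falls back to `\d{2}`, and vice versa.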



                                      edited Nov 11 at 11:21

                                      answered Nov 11 at 11:15 by The fourth bird
