Python + regex: How to extract values between two underscores in Python?





I am trying to extract the values between two underscores in a set of file names. For that I have written this code:

    import re

    patient_ids = []
    for file in files:
        print(file)
        patient_id = re.findall("_(.*?)_", file)
        patient_ids.append(patient_id)

    print(patient_ids)


Output:



PT_112_NIM 26-04-2017_merged.csv
PT_114_NIM_merged.csv
PT_115_NIM_merged.csv
PT_116_NIM_merged.csv
PT_117_NIM_merged.csv
PT_118_NIM_merged.csv
PT_119_NIM_merged.csv
[['112'], ['114'], ['115'], ['116'], ['117'], ['118'], ['119'], ['120'], ['121'], ['122'], ['123'], ['124'], ['125'], ['126'], ['127'], ['128'], ['129'], ['130'], ['131'], ['132'], ['133'], ['134'], ['135'], ['136'], ['137'], ['138'], ['139'], ['140'], ['141'], ['142'], ['143'], ['144'], ['145'], ['146'], ['147'], ['150'], ['151'], ['152'], ['153'], ['154'], ['155'], ['156'], ['157'], ['158'], ['159'], ['160'], ['161'], ['162'], ['163'], ['165']]


So the extracted values are in this form: ['121']. I want them in this form: 121, i.e., just the number between the two underscores.



What change should I make to my code?







python regex






asked Nov 16 '18 at 12:47
Debbie













  • How about int(patient_id[0])?

    – zipa
    Nov 16 '18 at 12:50











  • Try int(re.search(r"_(\d*?)_", file).group(1))

    – schwobaseggl
    Nov 16 '18 at 12:50











    @usr2564301 I think you misunderstood...

    – schwobaseggl
    Nov 16 '18 at 12:51
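
The `re.search` suggestion from the comments can be sketched as follows. This is a minimal example, assuming every ID is a run of digits between the first pair of underscores. The `files` list is copied from the question's sample output, and `extract_patient_id` is a hypothetical helper name that does not appear in the original post.

```python
import re

# Sample file names taken from the question's output.
files = [
    "PT_112_NIM 26-04-2017_merged.csv",
    "PT_114_NIM_merged.csv",
    "PT_115_NIM_merged.csv",
]

# Compile once; \d+ requires at least one digit between the underscores.
ID_PATTERN = re.compile(r"_(\d+)_")


def extract_patient_id(name):
    """Return the first underscore-delimited number as an int, or None."""
    match = ID_PATTERN.search(name)
    # re.search returns None when nothing matches, so guard before .group().
    return int(match.group(1)) if match else None


patient_ids = [extract_patient_id(name) for name in files]
print(patient_ids)  # [112, 114, 115]
```

Returning `None` for a name without a numeric ID is a design choice made here for illustration; raising an exception instead may suit stricter pipelines.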



















4 Answers
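An easy fix: re.findall already returns a list, so instead of appending that list to patient_ids (which nests it), extend patient_ids with it so that the matches are added individually:

    patient_ids = []
    for file in files:
        print(file)
        patient_ids.extend(re.findall("_(.*?)_", file))

    print(patient_ids)

This gives a flat list of strings; wrap each match in int() if you need numbers.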
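
To see why `extend` helps, here is a small sketch contrasting it with `append`. The two file names come from the question's sample output, and the variable names `nested` and `flat` are only illustrative.

```python
import re

files = ["PT_112_NIM 26-04-2017_merged.csv", "PT_114_NIM_merged.csv"]

nested = []  # built with append: one inner list per file
flat = []    # built with extend: matches added one by one

for name in files:
    matches = re.findall("_(.*?)_", name)  # e.g. ['112']
    nested.append(matches)
    flat.extend(matches)

print(nested)  # [['112'], ['114']]
print(flat)    # ['112', '114']
```

Note that the elements of `flat` are still strings; converting them to integers is a separate step.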






edited Nov 16 '18 at 13:02
answered Nov 16 '18 at 13:01
connectyourcharger

























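Just replace the last line of your for loop with:

    patient_ids.extend(int(x) for x in patient_id)

extend flattens your results, and int(x) converts each matched string to an int. Since patient_id is the list returned by re.findall, int() has to be applied to its elements rather than to the list itself.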
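
The same result can also be built with a single list comprehension. This is only a sketch: it uses `\d+` instead of `.*?` so that only runs of digits are captured and `int()` cannot fail on them, and the file names are sample values from the question.

```python
import re

files = ["PT_116_NIM_merged.csv", "PT_117_NIM_merged.csv"]

# For every file, convert each captured ID to int and collect
# the results in one flat list.
patient_ids = [
    int(match)
    for name in files
    for match in re.findall(r"_(\d+)_", name)
]

print(patient_ids)  # [116, 117]
```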







answered Nov 16 '18 at 12:52
Matina G























You need to flatten your results, e.g. like this:

    patient_ids = [item for sublist in patient_ids for item in sublist]
    print(patient_ids)
    # => ['112', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '165']
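
As an alternative to the nested comprehension, the standard library's `itertools.chain.from_iterable` performs the same flattening. The sketch below assumes the nested list has the shape shown in the question (shortened here), and `flat_ids` is just an illustrative name.

```python
from itertools import chain

# Nested result in the shape produced by the question's loop (shortened).
patient_ids = [['112'], ['114'], ['115']]

# chain.from_iterable walks the inner lists in order and yields their items.
flat_ids = list(chain.from_iterable(patient_ids))

print(flat_ids)  # ['112', '114', '115']
```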






answered Nov 16 '18 at 12:52
mrzasa























You have a list of findall results (it seems there is only ever one result per file). You can either just convert the strings to integers, or convert them and also flatten the result:

    patient_ids = [['112'], ['114', '4711'], ['115'], ['116'], ['117'], ['118'], ['119']]
    # the second entry was modified to hold 2 ids for demo purposes

    # if you want to keep the boxing
    numms = [list(map(int, m)) for m in patient_ids]

    # converted and flattened
    numms2 = [x for y in [list(map(int, m)) for m in patient_ids] for x in y]

    print(numms)
    print(numms2)

Output:

    # this keeps the findall results together in inner lists
    [[112], [114, 4711], [115], [116], [117], [118], [119]]

    # this flattens all results
    [112, 114, 4711, 115, 116, 117, 118, 119]

Documentation:

• You can find the documentation for map() and int() in the overview of built-in functions.







answered Nov 16 '18 at 13:01
Patrick Artner





























