Finding strings that are a certain length and contain specific characters












Sample data:

a <- c("hour", "four", "ruoh", "six", "high", "our")


I want to find all strings that contain o, u, and h, in any order, and are exactly 4 characters long.



I want to return "hour", "four", "ruoh".
This is my attempt:



grepl("o+u+r", a) nchar(a)==4






r string grepl






asked Nov 14 '18 at 23:22 by bvowe, edited Nov 14 '18 at 23:34













  • What about testing each letter separately? First test (with grep) which elements of the vector contain "o"; then test those that pass for "u"; then test those that pass for "h".

    – Cris
    Nov 14 '18 at 23:26











  • @Cris is this the simplest approach?

    – bvowe
    Nov 14 '18 at 23:28






  • "four" does not contain o & u & h.

    – neilfws
    Nov 14 '18 at 23:28











  • @neilfws I have now edited the question.

    – bvowe
    Nov 14 '18 at 23:34






  • See "Regular Expressions: Is there an AND operator?"; grepl("(?=.*h)(?=.*o)(?=.*u)", a, perl = TRUE)

    – Henrik
    Nov 14 '18 at 23:43



















3 Answers


Using grepl with your edited method (r instead of h):

a <- c("hour", "four", "ruoh", "six", "high", "our")

a[grepl(pattern = "o", x = a) & grepl(pattern = "u", x = a) & grepl(pattern = "r", x = a) & nchar(a) == 4]

Returns:

[1] "hour" "four" "ruoh"
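
The chained test above can be wrapped in a small helper so that the letters and the length become parameters. This is only a sketch: has_all_chars is a hypothetical name that does not appear in the answer, and it assumes fixed = TRUE is wanted so that each letter is matched literally rather than as a regular expression.

```r
# Hypothetical helper (not from the original answer): TRUE for strings of
# exactly `len` characters that contain every character in `chars`.
has_all_chars <- function(x, chars, len) {
  # One logical vector per required character; fixed = TRUE matches literally.
  hits <- lapply(chars, function(ch) grepl(ch, x, fixed = TRUE))
  # Start from the length test and AND in each per-character result.
  Reduce(`&`, hits, nchar(x) == len)
}

a <- c("hour", "four", "ruoh", "six", "high", "our")
a[has_all_chars(a, c("o", "u", "r"), 4)]
# [1] "hour" "four" "ruoh"
```

Passing c("o", "u", "h") instead would apply the rule as the question text words it.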






answered Nov 14 '18 at 23:42 by Cris

























To match strings of length 4 containing the characters h, o, and u, use:

grepl("(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)",
      c("hour", "four", "ruoh", "six", "high", "our"),
      perl = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

• (?=^.{4}$): the string has length 4.
• (?=.*x): x occurs at any position in the string.
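
The two building blocks listed above can also be assembled programmatically. The sketch below uses a hypothetical helper, lookahead_pattern (not part of the answer), and assumes the required characters are plain letters or digits, since regex metacharacters would need escaping first.

```r
# Hypothetical helper: build a PCRE pattern from a set of characters and a
# required length. Assumes `chars` contains no regex metacharacters.
lookahead_pattern <- function(chars, len) {
  paste0(
    "(?=^.{", len, "}$)",                        # exactly `len` characters
    paste0("(?=.*", chars, ")", collapse = "")   # one lookahead per character
  )
}

pat <- lookahead_pattern(c("h", "o", "u"), 4)
pat
# [1] "(?=^.{4}$)(?=.*h)(?=.*o)(?=.*u)"

a <- c("hour", "four", "ruoh", "six", "high", "our")
a[grepl(pat, a, perl = TRUE)]
# [1] "hour" "ruoh"
```

Lookaheads are not supported by R's default regex engine, so perl = TRUE is required here.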







answered Nov 15 '18 at 0:36 by Florian, edited Nov 15 '18 at 1:39























You could use strsplit and setdiff. I added an edge case ("oouh") to your sample data:

a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh")
a[nchar(a) == 4 &
    lengths(lapply(strsplit(a, ""), function(x) setdiff(x, c("o", "u", "h")))) == 1]
# [1] "hour" "ruoh"

Or with grepl:

a[nchar(a) == 4 & !rowSums(sapply(c("o", "u", "h"), Negate(grepl), a))]
# [1] "hour" "ruoh" "oouh"

sapply(c("o", "u", "h"), Negate(grepl), a) returns a logical matrix with one row per word and one column per letter, TRUE where the word does not contain that letter. rowSums then counts the missing letters in each word, and ! coerces that count to logical and negates it, so the result is TRUE only for words that are missing none of the letters (like a row-wise any followed by negation).
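
The setdiff test only checks that exactly one distinct character is left over, so a word such as "hoor" (no "u", with only "r" left over) would also pass it. A stricter variant, sketched here as an alternative rather than as the answer's method, tests membership directly with %in%. The extra word "hoor" is an illustrative test case, and whether "oouh" should count depends on how repeated letters are to be treated.

```r
a <- c("hour", "four", "ruoh", "six", "high", "our", "oouh", "hoor")
required <- c("o", "u", "h")

# Split each string into single characters, then require that every
# letter in `required` appears at least once.
has_all <- vapply(strsplit(a, ""), function(x) all(required %in% x), logical(1))

a[nchar(a) == 4 & has_all]
# [1] "hour" "ruoh" "oouh"
```

vapply with logical(1) is used instead of sapply so the result is always a logical vector, even for empty input.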







answered Nov 14 '18 at 23:28 by Moody_Mudskipper, edited Nov 14 '18 at 23:39













                      • This might have to be tweaked depending on how you want to treat some edge cases (multiple "h" for example)

                        – Moody_Mudskipper
                        Nov 14 '18 at 23:31











                      • Thanks a bunch, @Moody_Mudskipper. Do you have a grepl solution?

                        – bvowe
                        Nov 14 '18 at 23:35











                      • See the edited answer.

                        – Moody_Mudskipper
                        Nov 14 '18 at 23:39


















