How to find all comments with Beautiful Soup











This question was asked four years ago, but the answer is now out of date for BS4.



I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 represents each comment as a special type of NavigableString, I thought this code would work:



    for comments in soup.find_all('comment'):
        comments.decompose()


That didn't work. How do I find all comments using BS4?


































  • This answer should still work I suppose.
    – alecxe
    Oct 15 '15 at 3:12












  • I'm getting "global name 'comment' is not defined"
    – Joseph
    Oct 15 '15 at 3:21






  • I realize this is old, but @Joseph, if you import Comment from bs4 it should work
    – atarw
    Jul 12 '16 at 1:44










  • It does... The accepted answer is correct.
    – Joseph
    Jul 14 '16 at 12:15















python html beautifulsoup comments bs4














edited May 23 '17 at 11:53 by Community
asked Oct 15 '15 at 2:43 by Joseph












2 Answers
Accepted answer (11 votes)






You can pass a function to find_all() as the string argument. Beautiful Soup calls it on each string in the document, so the function can check whether that string is a Comment.



For example, given the HTML below:



    <body>
    <!-- Branding and main navigation -->
    <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
    <div class="l-branding">
    <p>Just a brand</p>
    </div>
    <!-- test comment here -->
    <div class="block_content">
    <a href="https://www.google.com">Google</a>
    </div>
    </body>


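Code:

    from bs4 import BeautifulSoup as BS
    from bs4 import Comment

    # html is assumed to hold the markup shown above
    soup = BS(html, 'html.parser')
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for c in comments:
        print(c)
        print("===========")
        # extract() is defined for every node type; decompose() may be missing on strings in older bs4 releases
        c.extract()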


The output would be:

    Branding and main navigation
    ===========
    test comment here
    ===========


I think find_all('comment') doesn't work because a plain string argument is treated as a tag name. From the Beautiful Soup documentation:




    Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
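To make the difference concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration; it only assumes that a positional string is matched against tag names while the string filter is applied to text nodes, as the quoted documentation describes.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, used only for this illustration.
html = "<div><!-- a note --><p>text</p><comment>a real tag</comment></div>"
soup = BeautifulSoup(html, "html.parser")

# A positional string is a tag name, so this only finds <comment> elements.
by_name = soup.find_all("comment")

# The string filter is called on text nodes, so this finds HTML comments.
by_type = soup.find_all(string=lambda s: isinstance(s, Comment))

print(by_name)
print(by_type)
```

The first call never looks at comment nodes at all, which is why the loop in the question had nothing to iterate over.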







answered Oct 15 '15 at 3:39 by Flickerlight, edited Oct 15 '15 at 4:00












    • I'm glad I found your answer, thanks! Any idea how we could write it without using lambda?
      – JinSnow
      Jan 4 '17 at 20:43
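One possible way to avoid the lambda, sketched under the assumption that any callable is accepted as the string filter: pass a named function instead, or skip find_all() and filter soup.descendants by type. The function name is_comment and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, Comment


def is_comment(node):
    """Return True when node is an HTML comment (hypothetical helper)."""
    return isinstance(node, Comment)


# Hypothetical sample markup, used only for this illustration.
html = "<p>keep<!-- drop me --></p><!-- and me -->"
soup = BeautifulSoup(html, "html.parser")

# Option 1: a named function in place of the lambda.
found = soup.find_all(string=is_comment)

# Option 2: walk every node in document order and filter by type.
also_found = [node for node in soup.descendants if isinstance(node, Comment)]

print(found)
print(also_found)
```

Both options should yield the same comment nodes; the second one does not rely on the string filter at all.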






























Answer (10 votes)













Two things I needed to do:

First, import Comment along with BeautifulSoup:

    from bs4 import BeautifulSoup, Comment

Second, here's the code to extract the comments:

    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
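The two steps can be wrapped into a small reusable function. This is only a sketch: strip_comments is a hypothetical name, and the choice of html.parser follows the accepted answer rather than anything required by this approach.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Return the markup with every HTML comment removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so removing nodes while looping over it is safe.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


cleaned = strip_comments("<div><!-- note --><p>Hello</p></div>")
print(cleaned)
```

Returning str(soup) keeps the helper independent of how the caller wants to use the result; return the soup object instead if further parsing is needed.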





answered Oct 15 '15 at 3:26 by Joseph





























