How to find all comments with Beautiful Soup

This question was already asked four years ago, but that answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 represents each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
    comments.decompose()

That didn't work. How do I find all comments using BS4?

edited May 23 '17 at 11:53

Community♦

asked Oct 15 '15 at 2:43

Joseph

431311

This answer should still work I suppose.
– alecxe
Oct 15 '15 at 3:12

I'm getting "global name 'comment' is not defined"
– Joseph
Oct 15 '15 at 3:21

I realize this is old, but @Joseph, if you import Comment from bs4 it should work
– atarw
Jul 12 '16 at 1:44

It does... The accepted answer is correct.
– Joseph
Jul 14 '16 at 12:15

python html beautifulsoup comments bs4

2 Answers

accepted

You can pass a function to find_all() to help it check whether the string is a Comment.

For example, given the HTML below:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
   <!-- test comment here -->
   <div class="block_content">
      <a href="https://www.google.com">Google</a>
   </div>
</body>

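Code:

from bs4 import BeautifulSoup as BS
from bs4 import Comment

....

soup = BS(html, 'html.parser')
# A function passed as string= is called on each text node; keep only Comment nodes.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("============")
    # extract() detaches any node from the tree; decompose() has historically been a Tag method.
    c.extract()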

The output would be:

Branding and main navigation
============
test comment here
============

BTW, I think find_all('Comment') doesn't work because a plain string argument is treated as a tag name (from the Beautiful Soup documentation):

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
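To make the difference concrete, here is a minimal sketch. The markup is a made-up sample (not from the question) that deliberately contains a tag literally named comment, so you can see what each call matches:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, chosen only for illustration.
html = "<p>Hello<!-- a note --></p><comment>I am a tag</comment>"
soup = BeautifulSoup(html, "html.parser")

# A plain string is interpreted as a tag name, so this matches the
# <comment> element and never looks at comment nodes.
by_name = soup.find_all("comment")

# A function passed as string= is applied to text nodes instead,
# which is where Comment objects live.
by_type = soup.find_all(string=lambda s: isinstance(s, Comment))

print(by_name)
print(by_type)
```

So a name-based search can only ever return tags, and a type check on the text nodes is what picks out the comments.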

edited Oct 15 '15 at 4:00

answered Oct 15 '15 at 3:39

Flickerlight

494411

I'm glad I found your answer, thanks! Any idea how we could write it without using lambda?
– JinSnow
Jan 4 '17 at 20:43
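Regarding the lambda: one possible way to avoid it is to pass a named function, or to skip the filter and walk the tree yourself. This is only a sketch; is_comment and the sample markup are names made up for the illustration:

```python
from bs4 import BeautifulSoup, Comment


def is_comment(node):
    """Return True when a text node is an HTML comment."""
    return isinstance(node, Comment)


# Hypothetical sample markup for the illustration.
html = "<div><!-- first --><p>text</p><!-- second --></div>"
soup = BeautifulSoup(html, "html.parser")

# Option 1: a named function in place of the lambda.
comments = soup.find_all(string=is_comment)

# Option 2: iterate over every node and keep the Comment instances.
also_comments = [n for n in soup.descendants if isinstance(n, Comment)]

print(comments)
print(also_comments)
```

Both options should give the same nodes in document order; the named function is handy if you need the check in more than one place.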

Two things I needed to do:

First, import Comment along with BeautifulSoup:

from bs4 import BeautifulSoup, Comment

Second, here's the code to extract the comments:

# find_all and string= are the current BS4 names for the older findAll and text=.
for comments in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comments.extract()
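Put together, a self-contained run might look like the sketch below. The markup is a shortened, made-up variant of the sample in the other answer, and cleaned is just an illustrative variable name:

```python
from bs4 import BeautifulSoup, Comment

# Shortened sample markup, assumed for the illustration.
html = """<body>
<!-- Branding and main navigation -->
<div class="Branding">Brand</div>
<!-- test comment here -->
</body>"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so detaching nodes while looping is safe.
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

# Serialize the tree again; the comments should be gone.
cleaned = str(soup)
print(cleaned)
```

Note that extract() only detaches the comment nodes, so the whitespace that surrounded them stays in the output.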

answered Oct 15 '15 at 3:26

Joseph

431311
