Extracting specific page links from a <a href tag using BeautifulSoup











I am using BeautifulSoup to extract all the links from this page: http://kern.humdrum.org/search?s=t&keyword=Haydn



I am getting all these links this way:



# -*- coding: utf-8 -*-

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://kern.humdrum.org/search?s=t&keyword=Haydn'

# open the connection and grab the page
uClient = uReq(my_url)

# put all the content in a variable
page_html = uClient.read()

# close the connection
uClient.close()

# parse the HTML
page_soup = soup(page_html, "html.parser")

# grab all of the links
containers = page_soup.findAll('a', href=True)
#print(type(containers))

for container in containers:
    link = container
    #start_index = link.index('href="')
    print(link)
    print("---")
    #print(start_index)


part of my output is:
[screenshot of the printed anchor tags]



Notice that it is returning several links, but I really only want the ones whose anchor text is something like ">Something" (for example, ">Allegro", ">Allegro vivace", and so forth).



I am having a hard time getting the following type of output (as in the example from the image):
"Allegro - http://kern.ccarh.org/cgi-bin/ksdata?location=users/craig/classical/beethoven/piano/sonata&file=sonata01-1.krn&format=info"



In other words, at this point I have a bunch of anchor tags (roughly 1000). A bunch of these are just "trash", and roughly 350 are tags that I would like to extract. All these tags look almost the same; the only difference is that the tags I need end with something like "> Somebody's name</a>". I would like to extract only the links of the anchor tags with this characteristic.










python beautifulsoup

edited Nov 11 at 4:38
asked Nov 11 at 4:08
Fabio Soares

  • Do you need to use BeautifulSoup? If you use an HTML parser that allows XPath expressions, this can be much easier. See here
    – bunji
    Nov 11 at 4:20

  • @bunji, I don't need to use it. I just saw that most people online use BeautifulSoup, which is why I followed that approach. I will check alternative ways, thanks.
    – Fabio Soares
    Nov 11 at 4:25
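
For reference, a minimal sketch of the XPath route bunji suggests, assuming the third-party lxml and requests packages are available; the XPath expression simply keeps anchors whose href contains format=info, the pattern visible in the desired output above:

import lxml.html
import requests

url = "http://kern.humdrum.org/search?s=t&keyword=Haydn"
tree = lxml.html.fromstring(requests.get(url).content)

# keep only anchors whose href contains the query parameter format=info
for a in tree.xpath('//a[contains(@href, "format=info")]'):
    print(a.text_content().strip(), "-", a.get("href"))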















4 Answers






accepted, 3 votes
answered Nov 11 at 5:16 (edited Nov 11 at 13:41) by QHarr

From what I can see in the image, the ones with info have an href attribute containing format=info, so you could use an attribute "contains" CSS selector, [href*="format=info"], where the * indicates "contains": the attribute value must contain the quoted substring.



import bs4, requests

res = requests.get("http://kern.humdrum.org/search?s=t&keyword=Haydn")
soup = bs4.BeautifulSoup(res.text, "html.parser")
for link in soup.select('[href*="format=info"]'):
    print(link.getText(), link['href'])
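
As a follow-up, a minimal variation that prints each match in the "Title - URL" form the question asks for; this assumes the matched hrefs are already absolute URLs, as in the example output above (if any turned out to be relative, urllib.parse.urljoin could resolve them against the page URL):

for link in soup.select('[href*="format=info"]'):
    # anchor text, stripped of surrounding whitespace, joined with the href
    print(f"{link.get_text(strip=True)} - {link['href']}")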





1 vote
answered Nov 11 at 5:22 by Ali Kargar

The best and easiest way is to use the text attribute when printing the link, like this:

print(link.text)
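
Applied to the loop from the question, that might look like the short sketch below; note it prints every anchor found, so some filter (for example on the href, as in the accepted answer) still has to be added:

for container in containers:
    # container is a bs4 Tag: .text is the anchor text, ['href'] is the link target
    print(container.text.strip(), "-", container['href'])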






0 votes
answered Nov 11 at 4:21 (edited Nov 11 at 4:52) by eicksl

      Assuming you already have a list of the substrings you need to search for, you can do something like:



for link in containers:
    text = link.get_text().lower()
    if any(text.endswith(substr) for substr in substring_list):
        print(link)
        print('---')
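
For example, with a small hypothetical substring_list (the entries below are placeholders, not an exhaustive list of the movement names on the page), the filter keeps only the anchors whose lowercased text ends with one of them:

# hypothetical endings to match; lowercase, since the answer lowercases the anchor text
substring_list = ['allegro', 'presto', 'adagio']

matches = [link['href'] for link in containers
           if any(link.get_text().lower().endswith(s) for s in substring_list)]
print(len(matches), "matching links")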





  • This would not do what I need. The whole point of me scraping the page is to try to automatically grab all the links. If I filter for "allegro", this would be only one link.
    – Fabio Soares
    Nov 11 at 4:26

  • Not sure what you're trying to do then. That loop printed a whole bunch of anchor tags for me. Could you provide more context about your problem?
    – eicksl
    Nov 11 at 4:30

  • Sure. At this point, I have a bunch of anchor tags (roughly 1000). A bunch of these are just "trash", and roughly 350 are tags that I would like to extract. All these tags look almost the same; the only difference is that the tags I need end with something like "> Somebody's name</a>". I would like to extract only the links of the anchor tags with this characteristic.
    – Fabio Soares
    Nov 11 at 4:37

  • So in other words, you need to find all tags whose text ends with a certain substring? Or are there multiple different substrings you need to search for?
    – eicksl
    Nov 11 at 4:40

  • There are multiple substrings that I need to search for. For example, it could be "Allegro con brio" or "Presto" or... The only way to identify these tags is that at the end of the tag I would have something like "> Somebody's name</a>" (e.g. "> Presto</a>").
    – Fabio Soares
    Nov 11 at 4:45

0 votes
answered Nov 11 at 10:12 by ewwink

You want to extract links with specified anchor text?

for container in containers:
    link = container
    # exact match:
    # if 'Allegro di molto' == link.text:
    if 'Allegro' in link.text:  # substring ("contains") match
        print(link)
        print("---")
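
A possible extension of the same idea, matching several movement names at once with a regular expression; the names in the pattern are illustrative only, not taken from the page:

import re

# illustrative pattern; extend it with whatever movement names actually occur
pattern = re.compile(r'\b(Allegro|Presto|Adagio|Andante)\b')

for container in containers:
    if container.text and pattern.search(container.text):
        print(container.text.strip(), "-", container['href'])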




