How to get li titles using beautiful soup

I'm trying to scrape the list of universities in the United States. I've tried looking around for hours but nothing is working (i.e. other methods just crash the console). Here's what I have so far.

The HTML is formatted as follows:

<ol>
<a name="A"><b>A</b></a><br/>
<p>
<li><a href="http://www.acu.edu/">
    Abilene Christian University</a> (acu.edu)

<li><a href="http://www.adelphi.edu/">
    Adelphi University</a> (adelphi.edu)

<li><a href="http://www.scottlan.edu/">
    Agnes Scott College</a> (scottlan.edu)

<li><a href="http://www.afit.af.mil/">
    Air Force Institute of Technology</a> (afit.af.mil)

This is my code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

# Site for list scraping
my_url = "http://doors.stanford.edu/~sr/universities.html"

# Open connection and grab the page
uClient = uReq(my_url)

# Save contents to variable
page_html = uClient.read()

# Close connection
uClient.close()

# HTML parsing
page_soup = soup(page_html, "html.parser")

# Checking the list
page_soup.ol

I've tried page_soup.findChildren("li") as well as page_soup.find("li", {"class":"text"}) and countless others to no avail.

Help?

asked Nov 12 at 2:06

handavidbang


python-3.x web-scraping beautifulsoup


2 Answers

accepted

I simply tried page_soup.find_all("li") and got all the <li> tags.

I don't know why the <li> tags inside the <ol> can't be retrieved with "ol.getChildren()". There is also a related post about it: "Unable to scrape <li> tag inside the <ol> tag using beautiful soup".
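
As a rough sketch of the find_all approach: the sample_html string below is only the snippet from the question with a closing </ol> added, not the real page, and the variable names are illustrative. find_all searches every descendant rather than only direct children, so it should find each <li> however the parser nests the unclosed tags.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the question's HTML stands in for the page.
sample_html = """
<ol>
<a name="A"><b>A</b></a><br/>
<p>
<li><a href="http://www.acu.edu/">
    Abilene Christian University</a> (acu.edu)
<li><a href="http://www.adelphi.edu/">
    Adelphi University</a> (adelphi.edu)
</ol>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# find_all is recursive by default, so nesting depth does not matter here.
items = page_soup.find_all("li")
print(len(items))
```

On the real page you would pass page_html from the question instead of sample_html.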

answered Nov 12 at 2:32

Ha Bom


After reading the documentation and experimenting, I figured it out. The output is a bit messy, though, so you'll have to clean it up.

# Get the list
listofuni = [li.text for li in page_soup.find_all('li')]
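
One possible way to clean it up, as a sketch rather than a definitive fix: each university name sits inside the <a> tag of its <li>, so reading that tag's text and stripping whitespace leaves out the trailing "(acu.edu)" part. Because the <li> tags are never closed, html.parser likely nests each one inside the previous one, which would explain why li.text drags in extra text. The sample_html string and the names variable are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the question's HTML stands in for the page.
sample_html = """
<ol>
<a name="A"><b>A</b></a><br/>
<p>
<li><a href="http://www.acu.edu/">
    Abilene Christian University</a> (acu.edu)
<li><a href="http://www.adelphi.edu/">
    Adelphi University</a> (adelphi.edu)
</ol>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# li.a is the first <a> inside each <li>; get_text(strip=True) trims the
# surrounding whitespace. Items without a link are skipped.
names = [li.a.get_text(strip=True) for li in page_soup.find_all("li") if li.a]
print(names)
```

If the domain in parentheses is also needed, the href attribute of the same tag (li.a["href"]) carries the full URL.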

answered Nov 12 at 2:19

handavidbang


