How to get li titles using Beautiful Soup
I'm trying to scrape the list of universities in the United States. I've looked around for hours, but nothing works (the other methods I tried just crash the console). Here's what I have so far.
The HTML is formatted as follows:
<ol>
<a name="A"><b>A</b></a><br/>
<p>
<li><a href="http://www.acu.edu/">
Abilene Christian University</a> (acu.edu)
<li><a href="http://www.adelphi.edu/">
Adelphi University</a> (adelphi.edu)
<li><a href="http://www.scottlan.edu/">
Agnes Scott College</a> (scottlan.edu)
<li><a href="http://www.afit.af.mil/">
Air Force Institute of Technology</a> (afit.af.mil)
This is my code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
# Site for list scraping
my_url = "http://doors.stanford.edu/~sr/universities.html"
# Open connection and grab the page
uClient = uReq(my_url)
# Save contents to variable
page_html = uClient.read()
# Close connection
uClient.close()
# HTML parsing
page_soup = soup(page_html, "html.parser")
# Checking the list
page_soup.ol
I've tried page_soup.findChildren("li") as well as page_soup.find("li", {"class":"text"}) and countless others to no avail.
Help?
python-3.x web-scraping beautifulsoup
asked Nov 12 at 2:06
handavidbang
2 Answers
accepted
I simply tried page_soup.find_all("li") and got all the <li> tags.
I don't know why the <li> tags inside the <ol> can't be retrieved with "ol.getChildren()". There is also a post about this: Unable to scrape <li> tag inside the <ol> tag using beautiful soup.
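A possible explanation, offered as an assumption rather than something confirmed in this thread: the page never closes its <p> and <li> tags, and html.parser does not close them implicitly, so each <li> ends up nested inside the <p> (and inside the previous <li>) instead of sitting directly under <ol>. The sketch below uses the fragment from the question to compare a whole-tree search with a direct-children search. SAMPLE_HTML, all_items and direct_items are illustrative names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question; <p> and <li> are never closed.
SAMPLE_HTML = """<ol>
<a name="A"><b>A</b></a><br/>
<p>
<li><a href="http://www.acu.edu/">
Abilene Christian University</a> (acu.edu)
<li><a href="http://www.adelphi.edu/">
Adelphi University</a> (adelphi.edu)
<li><a href="http://www.scottlan.edu/">
Agnes Scott College</a> (scottlan.edu)
<li><a href="http://www.afit.af.mil/">
Air Force Institute of Technology</a> (afit.af.mil)
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching the whole tree finds every <li>, however deeply it is nested.
all_items = page_soup.find_all("li")

# Restricting the search to direct children of <ol> finds none here,
# because html.parser places the <li> tags inside the unclosed <p>.
direct_items = page_soup.ol.find_all("li", recursive=False)

print(len(all_items), len(direct_items))
```

A more lenient parser such as html5lib or lxml (each a separate install) may build a flatter tree, so the result of the direct-children search can differ by parser.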
answered Nov 12 at 2:32
Ha Bom
After looking at the documentation and experimenting, I figured it out. The result is a bit messy, though, so you'll have to clean it up.
# Get the list
listofuni = [li.text for li in page_soup.find_all('li')]
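One way to do that cleanup, as a sketch rather than the only approach: read the text of each item's link instead of the whole <li>. That keeps the university name and drops the trailing "(domain)" text, the stray newlines, and any text from later items that html.parser may have nested inside the same <li>. The shortened SAMPLE_HTML and the name universities are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Shortened fragment in the same shape as the question's HTML.
SAMPLE_HTML = """<ol>
<p>
<li><a href="http://www.acu.edu/">
Abilene Christian University</a> (acu.edu)
<li><a href="http://www.adelphi.edu/">
Adelphi University</a> (adelphi.edu)
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# li.a is the first <a> inside the item, which is the item's own link.
# get_text(strip=True) removes the leading newline around the name.
universities = [
    (li.a.get_text(strip=True), li.a["href"])
    for li in page_soup.find_all("li")
    if li.a is not None  # skip any item that has no link
]

for name, url in universities:
    print(name, url)
```

Each entry is a (name, url) pair, so the names alone are [name for name, _ in universities].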
answered Nov 12 at 2:19
handavidbang