How to deal with bad HTML when scraping with BeautifulSoup
When I access the table through BeautifulSoup, it ends in the middle. When I view the page source with Ctrl+U, I find the complete table.
Screenshots are attached below:
Accessing through Soup
and
Accessing through Ctrl+U or inspecting the element
I am accessing it in Soup like this:
from bs4 import BeautifulSoup

# r is assumed to be the requests response for the page
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find_all('table', {'align': 'center', 'border': '1', 'cellpadding': '1', 'cellspacing': '0', 'width': '800'})
print(table)
Website Link is
web-scraping python-requests
I solved it by changing the parser to lxml: soup = BeautifulSoup(r.text, 'lxml')
– aftab qaisrani
13 hours ago
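Since switching parsers made the difference here, one way to check which parser recovers the whole table is to parse the same markup with each installed parser and compare the row counts. The following is only a sketch: SAMPLE_HTML is a hypothetical, well-formed stand-in for the real page (the link is not given), and count_rows_per_parser is an illustrative helper, not part of BeautifulSoup. On well-formed markup the parsers should agree; on malformed markup the counts may differ, and a lower count would suggest that parser ended the table early.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup standing in for the real page (its URL is not given).
SAMPLE_HTML = """
<table align="center" border="1" width="800">
  <tr><td>row 1</td></tr>
  <tr><td>row 2</td></tr>
  <tr><td>row 3</td></tr>
</table>
"""


def count_rows_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <tr> tags found}.

    Parsers that are not installed are skipped instead of raising.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages.
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


counts = count_rows_per_parser(SAMPLE_HTML)
print(counts)
```

With the real page, r.text would be passed in place of SAMPLE_HTML. Note that lxml and html5lib have to be installed separately; html.parser ships with Python.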
edited 15 hours ago
asked 16 hours ago
aftab qaisrani