Selenium loading, but not printing all HTML
I am attempting to use Python and Selenium to scrape dynamically loaded data from a website. The problem is that only about half of the data is reported as present, when all of it should be there. Adding pauses before printing the page content, or running simple find-element-by-class searches, does not help. The URL of the site is https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909. There are 13 main sections, but I am only able to retrieve data from the first four games. To show the problem, here is the code that prints the rendered text (innerText) of the entire page, which exposes the discrepancy between the loaded and non-loaded data.
from selenium import webdriver
import requests
url = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
driver = webdriver.Chrome()
driver.get(url)
print(driver.execute_script("return document.documentElement.innerText;"))
EDIT:
The problem is not the wait time: I am running the script line by line and waiting for the page to load fully. The problem appears to boil down to Selenium not grabbing all of the JS-loaded text on the page, as seen in the console output in the answer below.
python selenium selenium-webdriver web-scraping webdriverwait
Are you sure you don't want document.documentElement.innerHTML? Or maybe document.body.innerHTML?
– pguardiario
Nov 12 at 6:01
@pguardiario Why are you posting two exact duplicate comments?
– U9-Forward
Nov 12 at 6:03
Those are actually slightly different. I recommend using document.body.innerHTML.
– pguardiario
Nov 12 at 11:09
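To make the difference between these properties concrete, here is a minimal sketch that measures three views of the same page from inside the browser. The helper name `snapshot_lengths` is hypothetical and not part of Selenium; it only relies on `driver.execute_script`. `innerText` is rendering-aware and leaves out text in hidden elements, while `textContent` and `innerHTML` include everything in the DOM, so a large gap between them would suggest the data is present but not rendered as visible text. Whether that is the case on this particular site is not confirmed in this thread.

```python
def snapshot_lengths(driver):
    """Return the length of three views of the current page.

    `driver` is any object with an execute_script(js) method,
    for example a selenium webdriver.Chrome() instance.
    """
    scripts = {
        # Rendered text only; hidden elements are skipped.
        "innerText": "return document.documentElement.innerText;",
        # All text nodes in the DOM, visible or not.
        "textContent": "return document.documentElement.textContent;",
        # Full markup of the body.
        "innerHTML": "return document.body.innerHTML;",
    }
    # `or ""` guards against a null result from the browser.
    return {name: len(driver.execute_script(js) or "")
            for name, js in scripts.items()}
```

With a live session this would be called as `snapshot_lengths(driver)` after `driver.get(url)`.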
edited Nov 12 at 21:37
asked Nov 12 at 5:45
Kyle Lagerquist
153
2 Answers
@sudonym's analysis was in the right direction. You need to use WebDriverWait to wait until the desired elements are visible before you extract them through the execute_script() method, as follows:
Code Block:
# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
url = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
driver = webdriver.Chrome()
driver.get(url)
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2[contains(.,'USA - National Football League')]//following::section//span[3]")))
print(driver.execute_script("return document.documentElement.innerText;"))
Console Output:
SPORTSBOOK REVIEW
Home
Best Sportsbooks
Rating Guide
Blacklist
Bonuses
BETTING ODDS
FREE PICKS
Sports Picks
NFL
College Football
NBA
NCAAB
MLB
NHL
More Sports
How to Bet
Tools
FORUM
Home
Players Talk
Sportsbooks & Industry
Newbie Forum
Handicapper Think Tank
David Malinsky's Point Blank
Service Plays
Bitcoin Sports Betting
NBA Betting
NFL Betting
NCAAF Betting
MLB Betting
NHL Betting
CONTESTS
EARN BETPOINTS
What Are Betpoints?
SBR Sportsbook
SBR Casino
SBR Racebook
SBR Poker
SBR Store
Today
NFL
NBA
NHL
MLB
College Football
NCAA Basketball
Soccer
Soccer Odds
Major League Soccer
UEFA Champions League
UEFA Nations League
UEFA Europa League
English Premier League
World Cup 2022
Tennis
Tennis Odds
ATP
WTA
UFC
Boxing
More Sports
CFL
WNBA
AFL
Betting Odds/NFL Odds/Consensus
TODAY
|
YESTERDAY
|
DATE
?
Login
?
Settings
?
Bet Tracker
?
Bet Card
?
Favorites
NFL Consensus for Sep 09, 2018
USA - National Football League
Sunday Sep 09, 2018
01:00 PM
/
Pittsburgh vs Cleveland
453
Pittsburgh
454
Cleveland
Current Line
-3½+105
+3½-115
Wagers Placed
10040
54.07%
8530
45.93%
Amount Wagered
$381,520.00
56.10%
$298,550.00
43.90%
Average Bet Size
$38.00
$35.00
SBR Contest Best Bets
22
9
01:00 PM
/
San Francisco vs Minnesota
455
San Francisco
456
Minnesota
Current Line
+6-102
-6-108
Wagers Placed
6250
41.25%
8900
58.75%
Amount Wagered
$175,000.00
29.50%
$418,300.00
70.50%
Average Bet Size
$28.00
$47.00
SBR Contest Best Bets
5
19
01:00 PM
/
Cincinnati vs Indianapolis
457
Cincinnati
458
Indianapolis
Current Line
-1-104
+1-106
Wagers Placed
11640
66.36%
5900
33.64%
Amount Wagered
$1,338,600.00
85.65%
$224,200.00
14.35%
Average Bet Size
$115.00
$38.00
SBR Contest Best Bets
23
12
01:00 PM
/
Buffalo vs Baltimore
459
Buffalo
460
Baltimore
Current Line
+7½-103
-7½-107
Wagers Placed
5220
33.83%
10210
66.17%
Amount Wagered
$78,300.00
16.79%
$387,980.00
83.21%
Average Bet Size
$15.00
$38.00
SBR Contest Best Bets
5
17
01:00 PM
/
Jacksonville vs N.Y. Giants
461
Jacksonville
462
N.Y. Giants
01:00 PM
/
Tampa Bay vs New Orleans
463
Tampa Bay
464
New Orleans
01:00 PM
/
Houston vs New England
465
Houston
466
New England
01:00 PM
/
Tennessee vs Miami
467
Tennessee
468
Miami
04:05 PM
/
Kansas City vs L.A. Chargers
469
Kansas City
470
L.A. Chargers
04:25 PM
/
Seattle vs Denver
471
Seattle
472
Denver
04:25 PM
/
Dallas vs Carolina
473
Dallas
474
Carolina
04:25 PM
/
Washington vs Arizona
475
Washington
476
Arizona
08:20 PM
/
Chicago vs Green Bay
477
Chicago
478
Green Bay
Media
Site Map
Terms of use
Contact Us
Privacy Policy
DMCA
18+. Gamble Responsibly.
© Sportsbook Review. All Rights Reserved.
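The explicit wait in this answer can also be written as a custom condition, since `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value when done. The sketch below waits for a minimum number of matching elements rather than for visibility. The helper name `at_least_n_elements` and the `"section"` selector are assumptions for illustration; the real selector for a game block would have to be read from the page.

```python
def at_least_n_elements(locator, n):
    """Build a wait condition that is truthy once `locator` matches >= n elements.

    `locator` is a (by, value) tuple such as (By.CSS_SELECTOR, "section").
    """
    def condition(driver):
        elements = driver.find_elements(*locator)
        # Returning False tells WebDriverWait to keep polling.
        return elements if len(elements) >= n else False
    return condition

# Hypothetical usage with a live driver (selector is an assumption):
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#   WebDriverWait(driver, 30).until(
#       at_least_n_elements((By.CSS_SELECTOR, "section"), 13))
```

Waiting on a count makes the script fail with a timeout when fewer than 13 blocks appear, instead of silently printing a partial page.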
Actually this is the way to go for OP's setup. My reason for not doing it this way was that, for my use case, there were so many WebDriverWait.until() calls that the execution time went off into infinity. I decided to just get driver.page_source once and do the rest with BS/lxml.
– sudonym
Nov 12 at 8:31
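The "fetch once, parse offline" approach from this comment might look like the sketch below. The commenter used BeautifulSoup or lxml; the standard-library `html.parser` is swapped in here only to keep the example dependency-free. The names `TextCollector` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-empty text chunks, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks found in an HTML string, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With a live session, `extract_text(driver.page_source)` would parse a single snapshot of the DOM, so no further round trips to the browser are needed. Note that this collects all text in the markup, including text the browser would not render.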
Not sure if you noticed, but the problem is still present in the console output. For the first 4 games, all of the wager data is retrieved; for the next 9 games, only the team names and game start times are returned. On the actual page, all elements are present and loaded by the JS.
– Kyle Lagerquist
Nov 12 at 18:52
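The gap described in this comment can be checked mechanically against the printed text. The following sketch scans innerText output like the console output above and lists the games whose block has no "Wagers Placed" line. The function name `games_missing_wagers` is hypothetical, and the parsing assumes the layout shown in that output (a line containing " vs " starts each game).

```python
def games_missing_wagers(page_text):
    """Return the 'A vs B' headings whose block lacks a 'Wagers Placed' line."""
    missing = []
    current = None       # heading of the game block being scanned
    has_wagers = False
    for line in page_text.splitlines():
        line = line.strip()
        if " vs " in line:
            # A new game starts: close out the previous one.
            if current is not None and not has_wagers:
                missing.append(current)
            current, has_wagers = line, False
        elif line == "Wagers Placed":
            has_wagers = True
    # Close out the final game block.
    if current is not None and not has_wagers:
        missing.append(current)
    return missing
```

Running this after each attempted fix gives a quick pass/fail signal: an empty list means every game block carried wager data.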
This solution is only worth considering if there are lots of WebDriverWait calls and reduced runtime matters; otherwise go for DebanjanB's approach.
You need to wait some time to let your HTML load completely. You can also set a timeout for script execution. To bound how long driver.get(URL) may take in Selenium, call driver.set_page_load_timeout(n), with n in seconds, and retry in a loop:
import logging
import random
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

logger = logging.getLogger(__name__)
URL = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
n = 30  # Assumed value; the original answer leaves n unspecified

driver = webdriver.Chrome()
driver.set_page_load_timeout(n)              # Set timeout of n seconds for page load
loading_finished = False
while not loading_finished:                  # Repeat until the page has loaded
    try:
        sleep(random.uniform(0.1, 0.5))      # Wait some time
        driver.get(URL)                      # Try to load for n seconds
        loading_finished = True              # Exit the while loop
        logger.info("website loaded")        # Indicate load success
    except TimeoutException:
        logger.warning("timeout - retry")    # Indicate load fail

driver.set_script_timeout(n)                 # Set timeout of n seconds for script
script_finished = False
while not script_finished:                   # Second loop
    try:
        print(driver.execute_script("return document.documentElement.innerText;"))
        script_finished = True
        logger.info("script done")           # Indicate script done
    except TimeoutException:
        logger.warning("script timeout")
logger.info("if you're still missing html here, increase timeout")
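The loops in this answer retry without limit, so a page that never loads would hang the script. A bounded variant can be sketched as a small generic helper. The name `retry` and its parameters are hypothetical, not part of Selenium.

```python
import time


def retry(action, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call action() until it succeeds or `attempts` tries are used up.

    Returns whatever action() returns; re-raises the last error
    once the final attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise            # Out of tries: surface the last error
            time.sleep(delay)    # Pause before the next try
```

With a live driver this could be used as `retry(lambda: driver.get(URL), attempts=3, exceptions=(TimeoutException,))`, where `TimeoutException` comes from `selenium.common.exceptions`.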
use WebDriverWait, not this.
– Corey Goldberg
Nov 12 at 13:17
Not sure if you noticed, but the problem is still present in the console output. For the first 4 games, all of the wager data is retrieved; for the next 9 games, only the team names and game start times are returned. On the actual page, all elements are present and loaded by the JS.
– Kyle Lagerquist
Nov 12 at 20:15
Taking into account that you are executing a script, you could use driver.set_script_timeout(n). I have updated my answer.
– sudonym
Nov 13 at 1:06
Great thought, but still not returning the same data for each div.
– Kyle Lagerquist
Nov 13 at 1:14
Admittedly, my attempt now looks lengthy and ugly. Did you try the solution of @DebanjanB?
– sudonym
Nov 13 at 1:17
edited Nov 12 at 13:18 by Corey Goldberg
answered Nov 12 at 7:52 by DebanjanB (the WebDriverWait answer above)
up vote
1
down vote
This solution is only worth considering if there are many WebDriverWait calls and you want to reduce runtime; otherwise go with DebanjanB's approach.
You need to wait some time to let your HTML load completely. You can also set a timeout for script execution. To add an unconditional wait to driver.get(URL) in Selenium, call driver.set_page_load_timeout(n), where n is the timeout in seconds, and retry in a loop:
import logging
import random
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

logger = logging.getLogger(__name__)
URL = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
n = 30  # timeout in seconds (example value, adjust as needed)

driver = webdriver.Chrome()
driver.set_page_load_timeout(n)          # Set timeout of n seconds for page load
loading_finished = False
while not loading_finished:              # Repeat until the page has loaded
    try:
        sleep(random.uniform(0.1, 0.5))  # Wait some time
        driver.get(URL)                  # Try to load for n seconds
        loading_finished = True          # Leave the loop
        logger.info("website loaded")    # Indicate load success
    except TimeoutException:
        logger.warning("timeout - retry")  # Indicate load failure

driver.set_script_timeout(n)             # Set timeout of n seconds for script
script_finished = False
while not script_finished:               # Second loop
    try:
        print(driver.execute_script("return document.documentElement.innerText;"))
        script_finished = True
        logger.info("script done")       # Indicate script done
    except TimeoutException:
        logger.warning("script timeout")
logger.info("if you're still missing html here, increase timeout")
use WebDriverWait, not this.
– Corey Goldberg
Nov 12 at 13:17
Not sure if you noticed, but the problem is still present in the console output. For the first 4 games, all of the wager data is retrieved; for the next 9 games, however, only the team names and game start times are returned, even though on the actual page all elements are present and loaded by the JS.
– Kyle Lagerquist
Nov 12 at 20:15
Taking into account that you are executing a script, you could call driver.set_script_timeout(n) (link) - I have updated my answer.
– sudonym
Nov 13 at 1:06
Great thought, but still not returning the same data for each div.
– Kyle Lagerquist
Nov 13 at 1:14
Admittedly, my attempt now looks lengthy and ugly. Did you try @DebanjanB's solution?
– sudonym
Nov 13 at 1:17
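Corey Goldberg's suggestion above to use WebDriverWait could look roughly like the sketch below. It waits until a minimum number of matching elements exist instead of sleeping and retrying. The helper names and the default of 13 elements (the number of games the question mentions) are assumptions for illustration, and the CSS selector must be supplied by the caller because the thread does not give the site's real class names.

```python
def at_least_n_elements(by, value, n):
    """Build a wait condition that is truthy once n or more elements match."""
    def condition(driver):
        elements = driver.find_elements(by, value)
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return elements if len(elements) >= n else False
    return condition


def wait_for_games(driver, css_selector, expected=13, timeout=30):
    """Block until `expected` elements match, or raise TimeoutException."""
    # Imported here so the condition above can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    condition = at_least_n_elements(By.CSS_SELECTOR, css_selector, expected)
    return WebDriverWait(driver, timeout).until(condition)
```

A call such as wait_for_games(driver, "div.some-game-row"), where the selector is a placeholder, returns the list of matched elements once enough of them are present. Note that this only confirms the elements exist; as the comments in this thread show, it did not resolve the missing wager data for the asker.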
edited Nov 13 at 1:34
answered Nov 12 at 6:01
sudonym
1,291924
Are you sure you don't want document.documentElement.innerHTML? Or maybe document.body.innerHTML?
– pguardiario
Nov 12 at 6:01
@pguardiario Why are you posting two exact duplicate comments?
– U9-Forward
Nov 12 at 6:03
Those are actually slightly different. I recommend using document.body.innerHTML
– pguardiario
Nov 12 at 11:09
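To compare the properties pguardiario mentions, one option is to run all three scripts against the same loaded page and look at what each returns. innerText yields only the rendered text, so content hidden by CSS is left out, while innerHTML returns the markup regardless of visibility. Whether that difference explains the missing wager data is not established in this thread; the sketch below, with hypothetical helper names, only makes the comparison easy.

```python
SCRIPTS = {
    "rendered_text": "return document.documentElement.innerText;",
    "full_markup": "return document.documentElement.innerHTML;",
    "body_markup": "return document.body.innerHTML;",
}


def snapshot(driver):
    """Run each script once and return {label: result}."""
    return {label: driver.execute_script(js) for label, js in SCRIPTS.items()}


def report_sizes(results):
    """Return {label: character count} so the three outputs can be compared."""
    return {label: len(text or "") for label, text in results.items()}
```

Calling report_sizes(snapshot(driver)) after the page has loaded gives a quick indication of how much more the markup contains than the visible text.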