Excel VBA HTML Nested QuerySelector
Consider this extract of an HTML page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<div class="BoxBody">
<span class="txt">20 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] 1 , <a class="page" href="javascript:paginacao('paginar','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar','last');">Last</a>]</span></p>
<br>
<span class="txt">25 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] 1 , <a class="page" href="javascript:paginacao('paginar2','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar2','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar2','last');">Last</a>]</span></p>
</div>
</body>
</html>
I am trying to get the anchor tag that has the "next" page href (if it has one).
I tried this in the console using Firefox and it works:
document.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")
I put together some sample VBA code using querySelector as well, but it fails with "Invalid argument".
    Sub test()
        Dim oFSO As Object, paginator As Object
        Dim oFS As Object, sText As String

        ' Read the sample page from the workbook's folder (note the path separator)
        Set oFSO = CreateObject("Scripting.FileSystemObject")
        Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\example.html")
        Do Until oFS.AtEndOfStream
            sText = oFS.ReadAll()
        Loop

        ' Load the markup into an in-memory document
        Dim html As HTMLDocument, html2 As Object
        Set html = New HTMLDocument
        Set html2 = html
        html2.Write sText

        ' This is the line that fails with "Invalid argument"
        Set paginator = html.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")
    End Sub
What is causing this? Is it the p:nth-child(2) selector?
How should I go about extracting that element using VBA?
html excel vba web-scraping css-selectors
edited Nov 13 at 5:42 by BoltClock♦
asked Nov 12 at 16:38 by drec4s
1 Answer
:nth-child(2) is not supported by the HTML library used from VBA here, and it is indeed what causes the error message. You can't use :nth-child() or :nth-of-type(); very few pseudo-classes are implemented in the libraries available to you. Interestingly, you can use :first-child. You will also find that you are limited in which objects you can chain querySelector on.
    Dim ele As Object, iText As String
    Set ele = html.querySelector(".BoxBody > p > span:first-child > a[title='Next page']")

    On Error Resume Next
    iText = ele.href
    On Error GoTo 0

    ' This assumes that the href has a value; otherwise use an On Error GoTo
    ' handler that catches the error and prints "No href".
    If iText = vbNullString Then
        Debug.Print "No href"
    Else
        Debug.Print "href"
    End If
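If the page holds more than one paginator and a specific one is needed, one possible workaround is to select every paginator paragraph with a plain child selector and pick the wanted one by index in VBA, instead of relying on :nth-child(). The following is only a sketch under assumptions: FindNextLink and blockIndex are hypothetical names that do not come from the question, the html variable is the early-bound HTMLDocument from the question's code, and querySelectorAll is assumed to behave in that document the same way querySelector does above.

```vba
' Sketch only: FindNextLink and blockIndex are hypothetical names.
' Assumes the early-bound HTMLDocument (Microsoft HTML Object Library)
' that the question's code already uses.
Function FindNextLink(ByVal html As HTMLDocument, ByVal blockIndex As Long) As Object
    Dim paragraphs As Object, anchors As Object
    Dim i As Long

    ' Plain child selector, no :nth-child(): every paginator <p> in document order
    Set paragraphs = html.querySelectorAll(".BoxBody > p")
    If blockIndex < 0 Or blockIndex >= paragraphs.Length Then Exit Function

    ' Pick the wanted block by zero-based index, then scan only its own anchors
    Set anchors = paragraphs.Item(blockIndex).getElementsByTagName("a")
    For i = 0 To anchors.Length - 1
        If anchors.Item(i).getAttribute("title") = "Next page" Then
            Set FindNextLink = anchors.Item(i)
            Exit Function
        End If
    Next i
    ' Falls through and returns Nothing when the block has no "Next page" link
End Function
```

With the sample page, Set paginator = FindNextLink(html, 0) would target the first paginator (the one the question addresses as p:nth-child(2)), and If paginator Is Nothing Then covers the case where no such link exists.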
That was my first solution, but since there are two similar paginated tables in the page (with that same title attribute), I really need to check if that element exists inside that .BoxBody > p:nth-child(2) > span:nth-child(1) span:nth-child(1) element.
– drec4s Nov 12 at 17:00
No, but I can edit my question to exemplify that.
– drec4s Nov 12 at 17:05
Ok. If there is enough to demonstrate the choice that must be made.
– QHarr Nov 12 at 17:05
No, I want one match only (whether the 'next' button has an href, or not)
– drec4s Nov 12 at 17:14
Well, that first-child is a nice catch, thanks!
– drec4s Nov 12 at 17:29
edited Nov 13 at 5:42 by BoltClock♦
answered Nov 12 at 16:56 by QHarr