How to define a parser when using BS4 in Python
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/channel/UCaKt8dvEIPnEHWSbLYhzrxg/videos"
response = requests.get(url)
# parse html
page = str(BeautifulSoup(response.content))

def getURL(page):
    """
    :param page: html of web page (here: Python home page)
    :return: urls in that page
    """
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1: end_quote]
    return url, end_quote

while True:
    url, n = getURL(page)
    page = page[n:]
    if url:
        print(url)
    else:
        break
I am using the above code to get a list of all the YouTube videos on that page. When I run it, I get the following warning:
The code that caused this warning is on line 9 of the file C:/Users/PycharmProjects/ReadCSVFile/venv/Links.py. To get rid of this warning, change code that looks like this:
I did that and started passing 'html', but then a different error came up.
I am using Python 3 with the PyCharm IDE.
Can someone please help me with this?
python-3.x beautifulsoup
asked Nov 11 at 0:54 by NewtoPython
edited Nov 11 at 11:13 by ewwink
1 Answer
It is not an error but a warning: you didn't set a parser, which can be 'html.parser', 'lxml', or 'xml'. Change the line to something like
page = BeautifulSoup(response.content, 'html.parser')
Your code above isn't actually using BeautifulSoup to do the parsing, but here is an example that does.
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

def getURL(url):
    """
    :param url: url of web page
    :return: urls in that page
    """
    response = requests.get(url)
    # parse html with an explicit parser to silence the warning
    page = BeautifulSoup(response.content, 'html.parser')
    # collect the href attribute of every <a> tag, skipping tags without one
    link_tags = page.find_all('a')
    urls = [x.get('href') for x in link_tags if x.get('href')]
    return urls

url = "https://www.youtube.com/channel/UCaKt8dvEIPnEHWSbLYhzrxg/videos"
all_url = getURL(url)
print('\n'.join(all_url))
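Since the goal is a list of the videos specifically, a minimal follow-up sketch could filter the collected hrefs before printing. This assumes YouTube video pages live under '/watch' and that the channel page serves those links in the static HTML; the helper name is just for illustration.

# hypothetical helper: keep only video links from the hrefs returned by getURL
def filter_video_urls(urls):
    # video pages look like /watch?v=...; turn relative links into absolute URLs
    video_urls = []
    for href in urls:
        if '/watch' in href:
            if href.startswith('/'):
                href = 'https://www.youtube.com' + href
            video_urls.append(href)
    return video_urls

video_urls = filter_video_urls(all_url)
print('\n'.join(video_urls))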
answered Nov 11 at 11:05 by ewwink