Is it possible to access https pages through a proxy with Scrapy?

I can successfully access http pages through a proxy in Scrapy, but I cannot access https sites. I've researched the topic, but it's still unclear to me. Is it possible to access https pages through a proxy with Scrapy? Do I need to patch anything? Or add some custom code? If it can be confirmed this is a standard feature, I can follow up with more details. Hopefully this is an easy one.



Edited:



Here is what I added to the settings file:



DOWNLOADER_MIDDLEWARES = {'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110, 'test_website.middlewares.ProxyMiddleware': 100}
PROXIES = [{'ip_port': 'us-il.proxymesh.com:31280', 'user_pass': 'username:password'}]


Here is the code for my spider:



import scrapy

class TestSpider(scrapy.Spider):
    name = "test_spider"
    allowed_domains = "ipify.org"
    start_urls = ["https://api.ipify.org"]

    def parse(self, response):
        with open('test.html', 'wb') as f:
            f.write(response.body)


Here is the middlewares file:



import base64
import random
from settings import PROXIES

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        proxy = random.choice(PROXIES)
        if proxy['user_pass'] is not None:
            request.meta['proxy'] = "http://%s" % proxy['ip_port']
            encoded_user_pass = base64.encodestring(proxy['user_pass'])
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
        else:
            request.meta['proxy'] = "http://%s" % proxy['ip_port']


Here is the log file:



2015-08-12 20:15:50 [scrapy] INFO: Scrapy 1.0.3 started (bot: test_website)
2015-08-12 20:15:50 [scrapy] INFO: Optional features available: ssl, http11
2015-08-12 20:15:50 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'test_website.spiders', 'SPIDER_MODULES': ['test_website.spiders'], 'LOG_STDOUT': True, 'LOG_FILE': 'log.txt', 'BOT_NAME': 'test_website'}
2015-08-12 20:15:51 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-08-12 20:15:53 [scrapy] INFO: Enabled downloader middlewares: ProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-08-12 20:15:53 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-08-12 20:15:53 [scrapy] INFO: Enabled item pipelines:
2015-08-12 20:15:53 [scrapy] INFO: Spider opened
2015-08-12 20:15:53 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-08-12 20:15:53 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-08-12 20:15:53 [scrapy] DEBUG: Retrying <GET https://api.ipify.org> (failed 1 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 20:15:53 [scrapy] DEBUG: Retrying <GET https://api.ipify.org> (failed 2 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 20:15:53 [scrapy] DEBUG: Gave up retrying <GET https://api.ipify.org> (failed 3 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 20:15:53 [scrapy] ERROR: Error downloading <GET https://api.ipify.org>: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2015-08-12 20:15:53 [scrapy] INFO: Closing spider (finished)
2015-08-12 20:15:53 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 3,
'downloader/request_bytes': 819,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 8, 13, 2, 15, 53, 943000),
'log_count/DEBUG': 4,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2015, 8, 13, 2, 15, 53, 38000)}
2015-08-12 20:15:53 [scrapy] INFO: Spider closed (finished)


My spider works if I remove the 's' from 'https' or disable the proxy. I can also access the https page through the same proxy from my browser.

Tags: python, https, scrapy, proxies

asked Aug 12 '15 at 19:37 by patrick.s, edited Aug 13 '15 at 2:16 by patrick.s

3 Answers

I got this error because I was using base64.encodestring instead of base64.b64encode in the proxy middleware. Try changing it.
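
The difference matters because base64.encodestring (named base64.encodebytes in Python 3, with the old name removed in Python 3.9) appends a trailing newline to its output, and that newline ends up inside the Proxy-Authorization header value. A minimal sketch of the difference, assuming Python 3 and the placeholder credentials from the question:

```python
import base64

user_pass = b"username:password"  # placeholder credentials, as in the question

# encodebytes() is the Python 3 name for the legacy encodestring();
# it appends a trailing newline (and wraps long output into lines).
legacy = base64.encodebytes(user_pass)

# b64encode() returns the encoded value with no newline at all.
strict = base64.b64encode(user_pass)

print(repr(legacy))
print(repr(strict))

# The newline-free value is the one that is safe to place in a header.
header_value = b"Basic " + strict
```

For a short input like this one, the two results differ only by that final newline, which is easy to miss when printing the value without repr().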







answered Sep 14 '16 at 1:32 by Aminah Nuraini

I think it's possible.

If you're setting the proxy through Request.meta it should just work.
If you're setting the proxy with the http_proxy environment variable, you might also need to set https_proxy.

It might be the case, however, that your proxy does not support HTTPS.

It would be easier to help you if you posted the error you are getting.
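
As a rough illustration of the environment-variable route: Scrapy's built-in HttpProxyMiddleware reads proxies from the environment through the standard library's getproxies(), which keeps a separate entry per URL scheme. The sketch below uses only the standard library to show that http_proxy alone does not cover https URLs. The proxy address is a made-up placeholder, and proxies_from_env is a hypothetical helper written for this example.

```python
import os
from urllib.request import getproxies

PROXY_URL = "http://proxy.example.com:31280"  # hypothetical proxy address


def proxies_from_env(env_updates):
    """Reset the proxy variables, apply env_updates, and return what getproxies() sees.

    Note: this mutates os.environ for the current process.
    """
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ.pop(name, None)
    os.environ.update(env_updates)
    return getproxies()


# Only http_proxy set: there is no entry for the https scheme.
only_http = proxies_from_env({"http_proxy": PROXY_URL})
print("https" in only_http)

# Both variables set: https URLs now have a proxy as well.
both = proxies_from_env({"http_proxy": PROXY_URL, "https_proxy": PROXY_URL})
print(both.get("https"))
```

Both variables point at the same http:// proxy URL here, on the assumption that the proxy accepts plain HTTP connections and tunnels the HTTPS traffic itself.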







answered Aug 12 '15 at 22:46 by Artur Gaspar, edited Aug 12 '15 at 23:54

Scrapy handles the HTTPS/SSL side automatically.
As @Aminah Nuraini said, just use base64.b64encode instead of base64.encodestring in the proxy middleware.

1. Add the following code to middlewares.py:

import base64

class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy (a full URL such as http://host:port)
        request.meta['proxy'] = "<PROXY_SERVER>:<PROXY_PORT>"
        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "<PROXY_USERNAME>:<PROXY_PASSWORD>"
        # Set up basic authentication for the proxy
        # (b64encode needs bytes on Python 3 and adds no trailing newline)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8'))
        request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass

2. Enable the middleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    '<YOUR_PROJECT>.middlewares.ProxyMiddleware': 100,
}







                                    share|improve this answer















                                    The Scrapy is automatically bypass the https ssl.
                                    like @Aminah Nuraini said, just using base64.encodestring instead of base64.b64encode in the proxy middleware





                                    1. add following code in middlewares.py



                                      import base64
                                      class ProxyMiddleware(object):
                                      # overwrite process request
                                      def process_request(self, request, spider):
                                      # Set the location of the proxy
                                      request.meta['proxy'] = "<PROXY_SERVER>:<PROXY_PROT>"
                                      # Use the following lines if your proxy requires authentication
                                      proxy_user_pass = "<PROXY_USERNAME>:<PROXY_PASSWORD>"
                                      # setup basic authentication for the proxy
                                      encoded_user_pass = base64.b64encode(proxy_user_pass)
                                      request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass



                                    2. Add the proxy in settings.py



                                      DOWNLOADER_MIDDLEWARES = {

                                      'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,

                                      '<YOUR_PROJECT>.middlewares.ProxyMiddleware': 100,

                                      }








                                    share|improve this answer














                                    share|improve this answer



                                    share|improve this answer








                                    edited Nov 13 '18 at 12:18









                                    Suraj Rao

                                    22.8k75469














                                    answered Nov 13 '18 at 12:13









                                    KenKen

                                    214

































