Twitter Streaming API - urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead
I'm running a Python script using Tweepy that streams (via the Twitter Streaming API) a random sample of English tweets for a minute, then alternates to searching (via the Twitter Search API) for a minute, and then returns. The issue I've found is that after about 40+ seconds the streaming crashes and gives the following error:



Full Error:




urllib3.exceptions.ProtocolError: ('Connection broken:
IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))




The number of bytes read varies from 0 to well into the thousands.



The first time this happens, the streaming cuts out prematurely and the search function starts early; after the search function is done, control comes back to the stream, and on the second occurrence of this error the code crashes.



The code I'm running is:



# Handles date-time calculation
def calculateTweetDateTime(tweet):
    tweetDateTime = str(tweet.created_at)
    tweetDateTime = ciso8601.parse_datetime(tweetDateTime)
    time.mktime(tweetDateTime.timetuple())
    return tweetDateTime

# Checks whether the permitted time has passed.
def hasTimeThresholdPast():
    global startTime
    return time.clock() - startTime > 60

# Override tweepy.StreamListener to add logic to on_status
class StreamListener(StreamListener):

    def on_status(self, tweet):
        if hasTimeThresholdPast():
            return False

        if hasattr(tweet, 'lang') and tweet.lang == 'en':
            try:
                tweetText = tweet.extended_tweet["full_text"]
            except AttributeError:
                tweetText = tweet.text

            tweetDateTime = calculateTweetDateTime(tweet)

            entityList = DataProcessing.identifyEntities(True, tweetText)
            DataStorage.storeHotTerm(entityList, tweetDateTime)
            DataStorage.storeTweet(tweet)

    def on_error(self, status_code):
        if status_code == 420:
            # returning False in on_data disconnects the stream
            return False


def startTwitterStream():
    myStreamListener = StreamListener()
    twitterStream = Stream(auth=api.auth, listener=myStreamListener)
    global geoGatheringTag
    if not geoGatheringTag:
        twitterStream.filter(track=['the', 'this', 'is', 'their', 'though', 'a', 'an'],
                             async=True, stall_warnings=True)
    else:
        twitterStream.filter(track=['the', 'this', 'is', 'their', 'though', 'a', 'an', "they're"],
                             async=False, locations=[-4.5091, 55.7562, -3.9814, 55.9563],
                             stall_warnings=True)



# ----------------------- Twitter API Functions ------------------------
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# --------------------------- Main Function ----------------------------

startTime = 0


def main():
    global startTime
    userInput = ""
    while userInput != "-1":
        userInput = input("Type ACTIVATE to activate the Crawler, or DATABASE to access the data analytics option (-1 to exit): \n")
        if userInput.lower() == 'activate':
            while True:
                startTime = time.clock()
                startTwitterStream()

                startTime = time.clock()
                startTwitterSearchAPI()


if __name__ == '__main__':
    main()


I've trimmed out the search function and the database-handling aspects, given that they're separate, to avoid cluttering up the code.



If anyone has any ideas about why this is happening and how I might solve it, please let me know; I'd be grateful for any insight.





Solutions I have tried:

A try/except block catching http.client.IncompleteRead:

As per Error-while-fetching-tweets-with-tweepy

Setting stall_warnings=True:

As per Incompleteread-error-when-retrieving-twitter-data-using-python

Removing the English language filter.
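For reference, the try/except attempt was along these lines (a rough sketch; the wrapper function name and retry logic are mine, not from the linked post). One caveat: the traceback shows urllib3 wrapping the underlying http.client.IncompleteRead inside a ProtocolError, which may be why catching IncompleteRead alone did not help.

```python
from http.client import IncompleteRead

def runStreamWithRetry(start_stream, max_retries=3):
    # Hypothetical wrapper: restart the stream if the HTTP response is cut
    # off mid-read, rather than letting the exception kill the crawler.
    for attempt in range(max_retries):
        try:
            start_stream()   # e.g. startTwitterStream()
            return True      # stream finished cleanly
        except IncompleteRead:
            continue         # reconnect and try again
    return False             # gave up after max_retries broken connections
```

In practice one would likely also need to catch urllib3.exceptions.ProtocolError, since that is the exception type actually raised in the traceback above.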
python twitter tweepy twitter-streaming-api
edited Nov 15 '18 at 19:51 by Chris Cookman
      asked Nov 15 '18 at 19:45
Chris Cookman
          1 Answer
          Solved.



To those curious or experiencing a similar issue: after some experimentation I discovered that the backlog of incoming tweets was the problem. Every time the system receives a tweet, it runs a process of entity identification and storage, which costs a small slice of time; over the course of gathering several hundred to a thousand tweets, this backlog grew larger and larger until the API couldn't keep up and threw that error.



Solution: strip your on_status/on_data/on_success function to the bare essentials and handle any computations, e.g. storing or entity identification, separately after the streaming session has closed. Alternatively, you could make your computations efficient enough that the per-tweet time gap becomes insubstantial; it's up to you.
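To illustrate, here is a minimal sketch of the buffering approach, independent of tweepy (the class, queue, and function names here are illustrative, not part of the original code): the listener only enqueues each tweet, and the expensive entity identification and storage run after the stream disconnects.

```python
import queue
import time

tweetBuffer = queue.Queue()
startTime = time.monotonic()

def hasTimeThresholdPast(limit=60):
    return time.monotonic() - startTime > limit

class LeanStreamListener:
    """Stands in for the tweepy StreamListener subclass; it only buffers."""
    def on_status(self, tweet):
        tweetBuffer.put(tweet)   # O(1) per tweet: no parsing, no DB writes
        if hasTimeThresholdPast():
            return False         # disconnect the stream

def processBacklog():
    """Run the expensive steps (entity identification, storage) once the
    streaming session has closed, instead of inside on_status."""
    processed = []
    while not tweetBuffer.empty():
        processed.append(tweetBuffer.get())  # replace with DataProcessing / DataStorage calls
    return processed
```

The design point is that on_status must keep pace with the stream: if the per-tweet work is slower than the arrival rate, the connection eventually falls behind and drops, surfacing as the IncompleteRead above.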
• This helped me a lot; I was having the same issues. Basically the solution is to just dump the data and do the processing separately, as you rightly mention.

  – tezzaaa
  2 days ago
          answered Nov 16 '18 at 23:50
Chris Cookman