Pandas Dataframes.to_csv truncates long values



























Problem: I'm trying to store big datasets using Pandas dataframes in Python. My trouble is that when I save a dataframe to CSV, chunks of my data are truncated, like this:




e+12



and



[value1 value2 value3 . . . value1853 value1854]




Explanation:
I need to store lots of data in single cells, and some of the values I need to store are long (time) values. I wrote a short script to reproduce the errors I'm getting:



import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])  # empty float64 array
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr]  # the whole array goes into a single cell
print(dframe['elements'][0][999])  # prints correct values, e.g. 1234567892229.0
dframe.to_csv('temp.csv', index=False)


In the example above, each of the 1000 stored values (1234567891230 to 1234567892229) appears in the file as:




1.23456789e+12




This drops the four least significant digits of every value. If you extend the array to 1001 values, even more is lost:



dframe = pd.DataFrame()
arr = np.array([])  # empty float64 array
for x in range(1234567891230, 1234567892231):
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999])  # still prints correct values, e.g. 1234567892229.0
dframe.to_csv('temp.csv', index=False)


And the full csv file finally looks like this:




elements



"[1.23456789e+12 1.23456789e+12 1.23456789e+12 ... 1.23456789e+12
1.23456789e+12 1.23456789e+12]"




Almost all of the 1001 elements have been removed and replaced by an ellipsis (...).



Does anyone know any workaround for these problems or how to solve them?



This is not just truncation for display (such as when Pandas to_html() truncates string contents); it actually corrupts the data stored in the CSV file.
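The behaviour described above can be reproduced without pandas, which suggests where it comes from. The following is a minimal sketch, assuming that to_csv writes an array stored in a single cell as its string form: that string is whatever str() gives for the NumPy array, and it is governed by NumPy's print options (by default 8 digits of precision, and a summary once an array has more than 1000 elements). The names short and long_ are illustrative only, and np.arange is used in place of the append loop.

```python
import numpy as np

# Float arrays equivalent to the ones built with np.append in the question.
short = np.arange(1234567891230, 1234567892230, dtype=np.float64)  # 1000 values
long_ = np.arange(1234567891230, 1234567892231, dtype=np.float64)  # 1001 values

short_text = str(short)
long_text = str(long_)

# With the default print precision every value is rendered in the same
# shortened scientific notation.
print(short_text[:45])

# With more than 1000 elements NumPy summarises the middle of the array.
print("..." in long_text)  # True
```

The values held in memory are untouched (short[999] is still 1234567892229.0); only the text representation loses information.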










      python pandas dataframe
















asked Nov 15 '18 at 9:42 by Jens Martinsson
























          3 Answers









































                    Try setting the dtype of your numpy array to an integer.



import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([], dtype='int64')  # int64: the values are far too large for int16
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999])  # prints the correct value, 1234567892229
dframe.to_csv('temp.csv', index=False)


                    Elements



                    "[1234567891230 1234567891231 1234567891232 ... 1234567891233 1234567891234]"

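As a minimal sketch of the same idea (not the answerer's code): building the array directly with a 64-bit integer dtype avoids the conversion to float, so the string form keeps every digit. np.arange is swapped in here for the append loop, and int64 is assumed because the values do not fit into a 16-bit integer.

```python
import numpy as np

# Build the 1000 timestamps directly as 64-bit integers.
arr = np.arange(1234567891230, 1234567892230, dtype=np.int64)

text = str(arr)  # the same kind of text that ends up in the CSV cell

print(arr.dtype)       # int64
print(arr[999])        # 1234567892229
print("e+12" in text)  # False: integers are not shown in scientific notation
```

This only addresses the loss of digits; arrays with more than 1000 elements would still be summarised with an ellipsis under the default print options.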















answered Nov 15 '18 at 9:53 by Jacob Tomlinson

























Replicating your code on my machine, I see the rounding but not the truncation of the list.

I do not know the best solution, but here are some suggestions:

Do you need the file on disk to be human-readable?
What system will read it later?

• If the file will just go into another Python step, consider using pickle instead.

• Consider turning your list into a string; you then have full control over the string (e.g. the number of explicit decimal places for each value). If you keep the list structure intact internally but just wrap it in double quotes, you can easily unpack it with just about any tool out there.
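One way the second suggestion could look, as a sketch rather than a definitive recipe: serialise each array to a string yourself before writing, so that NumPy's print options never touch the data, and parse it back after reading. The column name elements comes from the question; using JSON as the string format and writing to a temporary directory are assumptions of this example.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd

arr = np.arange(1234567891230, 1234567892231, dtype=np.int64)  # 1001 values

# Store the cell as an explicit JSON string instead of a raw ndarray.
dframe = pd.DataFrame({"elements": [json.dumps(arr.tolist())]})

path = os.path.join(tempfile.mkdtemp(), "temp.csv")
dframe.to_csv(path, index=False)

# Reading back: the cell is a plain string, so parse it into an array again.
loaded = pd.read_csv(path)
restored = np.array(json.loads(loaded["elements"][0]), dtype=np.int64)

print(len(restored), restored[0], restored[-1])
```

If the file only feeds another Python step, dframe.to_pickle and pd.read_pickle would round-trip the arrays without any string conversion at all.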

















answered Nov 15 '18 at 10:03 by dozyaustin























Changing the data type, as @Jacob Tomlinson suggested, solves one problem; looking into NumPy's array2string solved the other.

Adding np.set_printoptions(threshold=sys.maxsize) stops to_csv from truncating the output strings. (Older NumPy versions also accepted threshold=np.nan, but newer versions reject it.)

import sys
import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr.astype('uint64')]
print(dframe['elements'][0][999])  # prints the correct value, 1234567892229

np.set_printoptions(threshold=sys.maxsize)  # never summarise with "..."
dframe.to_csv('temp.csv', index=False)
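A variation on this, sketched under the assumption that you would rather not change NumPy's global print settings: np.array2string accepts a threshold argument of its own, so each array can be converted to a complete string explicitly before it goes into the dataframe. The helper name to_full_string is hypothetical.

```python
import sys

import numpy as np


def to_full_string(arr):
    """Render every element of arr on one line, without the '...' summary."""
    return np.array2string(
        arr, threshold=sys.maxsize, max_line_width=sys.maxsize
    )


arr = np.arange(1234567891230, 1234567892231, dtype=np.uint64)  # 1001 values
text = to_full_string(arr)

print("..." in text)                   # False
print(len(text.strip("[]").split()))   # 1001
```

The resulting string could then be stored in the cell in place of the array, for example dframe['elements'] = [to_full_string(arr)].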
















answered Nov 15 '18 at 11:10 by Jens Martinsson





























