Parsing large string values in Pandas



























I have a .csv file from which I've generated a DataFrame. The CSV contains raw data output from a system, in this format:



{"DataType1":"Value","DataType2":"Value","DataType3":"Value",.....}


Each row in the DataFrame holds just this string in a single column. I'm trying to break it out so that the data types become column headers and the values populate the rows. Not all rows have the same data types: some have additional data types that are absent from other rows. For example, row 1 may have DataType1, DataType2, and DataType3, while row 2 may have DataType2, DataType4, and DataType5. Ideally, the output's column headers would include every data type, whether or not a given row has a value for it. So the final DataFrame would have this structure:



-------------------------------------------------------------
| DataType1 | DataType2 | DataType3 | DataType4 | DataType5 |
-------------------------------------------------------------
| Value     | Value     | Value     | NaN       | NaN       |
-------------------------------------------------------------
| NaN       | Value     | NaN       | Value     | Value     |
-------------------------------------------------------------


































  • Hi, welcome to Stack Overflow. Please look around SO for similar problems, e.g. stackoverflow.com/questions/14745022/…, stackoverflow.com/questions/29370211/…, stackoverflow.com/questions/39553392/… etc.

    – Evan
    Nov 16 '18 at 6:02











  • Possible duplicate of Split strings in tuples into columns, in Pandas

    – Evan
    Nov 16 '18 at 6:02











  • If you know, is the data JSON, or a Python dictionary? What have you tried so far?

    – Evan
    Nov 16 '18 at 6:03











  • The data is in a CSV table as listed above. Each row has just 1 column containing 1 string, and that string follows the dictionary format shown.

    – Danny
    Nov 17 '18 at 20:26
















python pandas csv dataframe






edited Nov 16 '18 at 7:40 by Aqueous Carlos

asked Nov 16 '18 at 5:58 by Danny













1 Answer
DataFrames built from a dictionary expect this format:

data = {'column 1': [1, 2], 'column 2': [3, 4], ...}

Note that every key must hold a list of the same length; otherwise

pd.DataFrame(data)

will raise a ValueError.

To avoid the error, wrap each value in a pd.Series while building the DataFrame, so that shorter columns are padded with NaN:

pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

*Assuming 'data' is the name of your dictionary (naming it 'dict' would shadow the built-in of the same name).
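As a rough illustration of the two cases above, the sketch below first triggers the length error and then applies the Series-wrapping workaround. The sample dictionary and its values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: the two columns have different lengths.
data = {"column 1": [1, 2, 3], "column 2": [4, 5]}

# Passing unequal-length lists directly is rejected by pandas.
try:
    pd.DataFrame(data)
    raised = False
except ValueError:
    raised = True

# Wrapping each list in a Series aligns the values on the index,
# so the shorter column is padded with NaN.
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
print(df)
```

Note that padding with NaN turns an integer column into a float column, which may or may not matter for your data.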



This way you'll have the desired output.
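For the layout described in the question (one string per row, in a single column), one possible approach is to parse each string into a dictionary and hand the list of dictionaries to pandas. This is only a sketch: it assumes the strings are valid JSON, as the double-quoted sample suggests, and the column name "raw" and the sample values are hypothetical. If the strings turn out to be Python dict literals instead, ast.literal_eval could be swapped in for json.loads.

```python
import json
import pandas as pd

# Hypothetical stand-in for the single-column DataFrame read from the CSV;
# the column name "raw" is an assumption.
df = pd.DataFrame({
    "raw": [
        '{"DataType1":"a","DataType2":"b","DataType3":"c"}',
        '{"DataType2":"d","DataType4":"e","DataType5":"f"}',
    ]
})

# Parse each JSON string into a dict, then let pandas build one column
# per key; keys missing from a row become NaN.
records = df["raw"].apply(json.loads).tolist()
wide = pd.DataFrame(records, index=df.index)
print(wide)
```

Building the frame from a list of per-row dictionaries means the columns are the union of all keys, so rows do not need to share the same data types.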






answered Nov 16 '18 at 6:12 by gauravtolani































