Parsing large string values in Pandas
I have a .csv which I've generated a dataframe from. This csv has raw data outputs from a system that follows this format:
{"DataType1":"Value","DataType2":"Value","DataType3":"Value",.....}
Each row in the dataframe has just this in 1 column. I'm trying to break this out so that the data types become column headers and the values populate the rows. One other aspect is that not all rows have the same data types; some have additional data types that might not be present in other rows. For example, row 1 may have DataType1, DataType2, and DataType3, and row 2 may have DataType2, DataType4, and DataType5. Ideally I'd like the output's column headers to incorporate all data types, whether or not a given row has a value for them. So the final dataframe would have this structure:
-------------------------------------------------------------
| DataType1 | DataType2 | DataType3 | DataType4 | DataType5 |
-------------------------------------------------------------
| Value     | Value     | Value     | NaN       | NaN       |
-------------------------------------------------------------
| NaN       | Value     | NaN       | Value     | Value     |
-------------------------------------------------------------
python pandas csv dataframe
Hi, welcome to Stack Overflow. Please look around SO for similar problems, e.g. stackoverflow.com/questions/14745022/…, stackoverflow.com/questions/29370211/…, stackoverflow.com/questions/39553392/… etc.
– Evan
Nov 16 '18 at 6:02
Possible duplicate of Split strings in tuples into columns, in Pandas
– Evan
Nov 16 '18 at 6:02
If you know, is the data JSON, or a Python dictionary? What have you tried so far?
– Evan
Nov 16 '18 at 6:03
The data is in a csv table as listed above. Each row has just 1 column containing 1 string, and that string follows the dictionary format shown.
– Danny
Nov 17 '18 at 20:26
edited Nov 16 '18 at 7:40 by Aqueous Carlos
asked Nov 16 '18 at 5:58 by Danny
1 Answer
A DataFrame built from a dictionary expects this format:
data = {'column 1': [1, 2], 'column 2': [3, 4], ...}
Note that the list under every key must have the same length, otherwise
pd.DataFrame(data)
will raise a ValueError.
To get around the error, you can iterate over the dictionary and wrap each value in a pd.Series while building the DataFrame, so that shorter columns are padded with NaN:
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in data.items()]))
*Assuming 'data' is the name of your dictionary. Avoid calling it 'dict': that shadows the built-in dict() used in the line above, and the call would then fail.
This way you'll have the desired output.
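As a minimal sketch applied to the question's data, and assuming each cell holds valid JSON (double-quoted keys and values, as in the sample row), the strings could be parsed with json.loads and passed to pd.DataFrame as a list of dicts. This is a different route from the pd.Series approach above. The column name "raw" and the two sample rows are made up for illustration. If the cells were Python dict literals rather than JSON, ast.literal_eval from the standard library would be the analogous parser.

```python
import json

import pandas as pd

# Hypothetical stand-in for the single-column frame read from the CSV;
# the column name "raw" is an assumption, not taken from the question.
df = pd.DataFrame({
    "raw": [
        '{"DataType1":"Value","DataType2":"Value","DataType3":"Value"}',
        '{"DataType2":"Value","DataType4":"Value","DataType5":"Value"}',
    ]
})

# Parse every string into a dict (assumes valid JSON in each cell).
records = [json.loads(s) for s in df["raw"]]

# A list of dicts becomes one row per dict; the columns are the union of
# all keys, and keys missing from a row are filled with NaN.
wide = pd.DataFrame(records)

print(wide)
```

Building the frame once from the full list of parsed rows lets pandas work out the union of keys itself, so no row needs to know which data types appear elsewhere.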
answered Nov 16 '18 at 6:12 by gauravtolani