Pandas datatype change within a function
General background
I've written a function which incorporates a MySQL query, with some munging on the returned data (pulled into a pandas df).
import pandas as pd
from sqlalchemy import create_engine

enginedb = create_engine("mysql+mysqlconnector://user:pswd@10.0.10.26:3306/db",
                         encoding='latin1')
query = ("""Select blah blah""")
df = pd.read_sql(query, enginedb)
This works fine; the query is a significant one with multiple joins etc. However, it transpired that for a certain lot within the db the datatypes were off: for almost all 'normal' lots, the columns were mostly int64, some object, and one datetime64[ns], but for one lot (so far), all but the datetime were returning as object.
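A quick way to see the difference between lots is to put their df.dtypes side by side. The sketch below is only an illustration: the frames are made up, and the column names id, qty and ts are hypothetical stand-ins for the real query columns. It shows how values that arrive as strings end up with the generic object dtype.

```python
import pandas as pd

# Hypothetical stand-in for a 'normal' lot: numeric columns hold real integers.
normal = pd.DataFrame({
    "id": [1, 2],
    "qty": [10, 20],
    "ts": pd.to_datetime(["2018-11-13", "2018-11-14"]),
})

# Hypothetical stand-in for the 'odd' lot: the same values arrive as strings.
# dtype=object is forced here to mimic what the post describes.
odd = pd.DataFrame({
    "id": pd.Series(["1", "2"], dtype=object),
    "qty": pd.Series(["10", "20"], dtype=object),
    "ts": pd.to_datetime(["2018-11-13", "2018-11-14"]),
})

# One row per column, one column per lot, so mismatches are easy to spot.
comparison = pd.DataFrame({"normal": normal.dtypes, "odd": odd.dtypes})
print(comparison)
```

Printing a few raw values with their Python type, for example type(df['id'].iloc[0]), can also help show what is actually stored in an object column.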
Issue
I need to do a stack: one of the columns holds a list, and I've got some code to take each item of the list and stack them down row by row:
cols = list(df)
cols = cols[:-1]
df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
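For readers unfamiliar with the idiom, here is a minimal sketch of what the three lines above do on a toy frame. The column names lot and id and the values are invented for illustration; only the trailing data column of lists mirrors the post.

```python
import pandas as pd

# Toy frame: 'lot' and 'id' are hypothetical, 'data' holds lists as in the post.
df = pd.DataFrame({
    "lot": [100, 100],
    "id": [1, 2],
    "data": [[10, 11], [20, 21]],
})

cols = list(df)[:-1]  # every column except the trailing 'data'

# apply(pd.Series) turns each list into its own row of columns 0, 1, ...
wide = df.set_index(cols)["data"].apply(pd.Series)

# stack() then pivots those columns down into rows, one list item per row.
df_stack = wide.stack().reset_index()
print(df_stack)
```

Note that .stack() is called on the result of .apply(pd.Series), so it only works when that intermediate result is a DataFrame.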
The problem is that this doesn't work for this 'odd' lot with the non-standard datatypes (the non-standard datatypes come from an upstream ETL process which I can't affect).
The exact error is:
'Series' object has no attribute 'stack'
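The message says that .stack() was called on a Series, which has no such method, rather than on a DataFrame. One situation where .apply(pd.Series) hands back a Series is an empty column. This is offered as a possibility to rule out, not a diagnosis, since the post does not show the data for the odd lot. A minimal sketch with invented values:

```python
import pandas as pd

# Non-empty column of lists: apply(pd.Series) expands it into a DataFrame.
non_empty = pd.Series([[1, 2], [3, 4]])
expanded = non_empty.apply(pd.Series)
print(type(expanded).__name__)

# Empty column (for instance, a query that matched no rows): apply has no
# elements to call the function on and returns an empty Series instead.
empty = pd.Series([], dtype=object)
not_expanded = empty.apply(pd.Series)
print(type(not_expanded).__name__)

# Only the DataFrame has a .stack() method.
print(hasattr(expanded, "stack"), hasattr(not_expanded, "stack"))
```

If this is what is happening, it would also fit the dtype observation, because a result set with zero rows gives pandas nothing to infer numeric dtypes from.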
Consequently I had incorporated an if/else statement, checking to see if the dtype of one of the cols is incorrect, and if so, change it:
if df['id'].dtype == 'int64':
    df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
    df_stack = df_stack.reset_index()
else:
    # it can't be more specific than for all the columns, as there are a LOT
    df_stack = df.apply(pd.to_numeric, errors='coerce')
But this is having no effect. Inside the function (containing the query and subsequent munging) I've included print-outs of df.dtypes and df_stack.dtypes, and the conversion changes nothing.
Why is this?
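One way to narrow this down is to compare how df.apply(pd.to_numeric, errors='coerce') behaves on a frame with rows and on a frame without. The sketch below uses invented data with hypothetical columns id and qty. The empty-frame case is an assumption about what might be going on for this lot, not something the post confirms.

```python
import pandas as pd

# Frame of numeric strings stored as object: the conversion works.
filled = pd.DataFrame({
    "id": pd.Series(["1", "2"], dtype=object),
    "qty": pd.Series(["3", "x"], dtype=object),
})
converted = filled.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)  # 'x' cannot be parsed, so that cell becomes NaN

# Same columns but zero rows: apply has nothing to convert and returns
# a copy of the frame with its dtypes unchanged.
empty = filled.iloc[0:0]
still_object = empty.apply(pd.to_numeric, errors="coerce")
print(empty.empty, still_object.dtypes.tolist())
```

Printing df.empty and len(df) next to the dtypes inside the function would show whether this case applies. Note also that the else branch only converts dtypes and never performs the stack, so a second step would be needed there either way.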
EDIT
I've added this picture to show the code (at right) which is attempting to catch the incorrectly-dtyped lot (12384), and the print-outs before and after the pd.to_numeric function (which both show just objects, no numeric cols).
My underlying question has two parts:
- What would cause 'Series' object has no attribute 'stack'? (More fundamentally than a wrong datatype, or at least, why is the datatype an issue?)
- Why would pd.to_numeric not cause any change here?
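As a side note, the list expansion can also be written without .apply(pd.Series).stack(). The sketch below is a swapped-in alternative, not the code from the post: it relies on DataFrame.explode, which was added in pandas 0.25 and so postdates the question. The helper name stack_list_column and the toy columns are hypothetical.

```python
import pandas as pd


def stack_list_column(df, list_col="data"):
    """Return one row per list element; an empty frame is passed through."""
    if df.empty:
        # Nothing to expand, so hand back an empty copy with the same columns.
        return df.copy()
    # explode repeats the other columns for every item in the list.
    return df.explode(list_col).reset_index(drop=True)


# Toy input with invented values.
toy = pd.DataFrame({"id": [1, 2], "data": [[10, 11], [20, 21]]})
out = stack_list_column(toy)
print(out)

# The same call on a frame with no rows does not raise.
out_empty = stack_list_column(toy.iloc[0:0])
print(out_empty.shape)
```

The exploded column keeps the object dtype, so a pd.to_numeric call on that single column may still be wanted afterwards.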
python pandas
Could you provide some input data and expected output?
– Franco Piccolo
Nov 13 '18 at 14:30
I've added some outputs - it's hard to give example input data (privacy issues aside), but hopefully this shows the issue
– BAC83
Nov 13 '18 at 14:46
Ideally you should provide an input dataframe with an expected output, otherwise it's impossible to reproduce your issue. Regarding your last question, you can't apply stack to a Series.
– Franco Piccolo
Nov 13 '18 at 14:58
It works with the others because they are probably dataframes.
– Franco Piccolo
Nov 13 '18 at 15:14
It will work as long as it is a DataFrame. Not when it is a Series.
– Franco Piccolo
Nov 13 '18 at 17:04
asked Nov 13 '18 at 14:25 by BAC83, edited Nov 13 '18 at 14:53