Pandas datatype change within a function



























General background



I've written a function that runs a MySQL query and then does some munging on the returned data (pulled into a pandas DataFrame).



import pandas as pd
from sqlalchemy import create_engine

# connection details are placeholders
enginedb = create_engine("mysql+mysqlconnector://user:pswd@10.0.10.26:3306/db",
                         encoding='latin1')

query = """Select blah blah"""

df = pd.read_sql(query, enginedb)


This works fine; the query is a substantial one with multiple joins. However, it turned out that for one lot in the db the datatypes were off: for almost all 'normal' lots the columns come back as int64, some object, and one datetime64[ns], but for one lot (so far) every column except the datetime comes back as object.



Issue



I need to do a stack: one of the columns holds a list, and I've got some code that takes each item of the list and stacks them down row by row:



cols = list(df)
cols = cols[:-1]
df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
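
For reference, here is a minimal sketch of what that idiom does on well-formed input. The column names and values are hypothetical toy data (the real query returns many more columns), and it assumes the list column is the last one, as in the snippet above.

```python
import pandas as pd

# Hypothetical toy data: 'data' is the last column and holds Python lists.
df = pd.DataFrame({
    "lot": [1, 2],
    "id": [10, 20],
    "data": [[0.1, 0.2], [0.3, 0.4]],
})

cols = list(df)[:-1]  # every column except the trailing 'data' column

# apply(pd.Series) expands each list into one row of a wide DataFrame,
# with one column per list position.
wide = df.set_index(cols)["data"].apply(pd.Series)

# stack() folds those columns back into a single long Series,
# one entry per list item, indexed by (lot, id, position).
df_stack = wide.stack().reset_index()
df_stack.columns = cols + ["position", "value"]

print(df_stack)
```

Each list item ends up on its own row, with the original key columns repeated alongside it.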


The problem is that this doesn't work for the 'odd' lot with the non-standard datatypes (they come from an upstream ETL process that I can't change).
The exact error is:
'Series' object has no attribute 'stack'



Consequently I added an if/else statement that checks whether the dtype of one of the columns is incorrect and, if so, changes it:



if df['id'].dtype == 'int64':
    df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
    df_stack = df_stack.reset_index()
else:
    # it can't be more specific than all the columns, as there are a LOT
    df_stack = df.apply(pd.to_numeric, errors='coerce')


But this has no effect: inside the function (which contains the query and the subsequent munging) I print df.dtypes and df_stack.dtypes, and the dtypes are unchanged.



Why is this?



EDIT



I've added this picture to show the code (at right) which is attempting to catch the incorrectly-dtyped lot (12384), and the print-outs before and after the pd.to_numeric function (which both show just objects, no numeric cols).



[screenshot: code and dtypes print-outs]



My underlying question has two parts:




  1. What would cause 'Series' object has no attribute 'stack'? (More fundamentally than a wrong datatype, or at least, why is the datatype an issue?)

  2. Why would pd.to_numeric not cause any change here?
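
On the second point, one case that reproduces the symptom is a frame with zero rows: DataFrame.apply does not run the function column by column on an empty frame and hands back an unchanged copy, so the object dtypes survive. Whether lot 12384 really comes back empty is an assumption; the frames below are hypothetical stand-ins, not the real query output.

```python
import pandas as pd

# Hypothetical stand-in for a query that returned zero rows:
# every column of such a frame has object dtype.
empty = pd.DataFrame(columns=["lot", "id", "data"])
converted = empty.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)  # the dtypes are still object

# For contrast, a non-empty frame of numeric-looking strings is converted.
filled = pd.DataFrame({"lot": ["1", "2"], "id": ["10", "x"]}, dtype=object)
numeric = filled.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)  # 'x' is coerced to NaN, so 'id' becomes a float column
```

Printing len(df) next to df.dtypes inside the function would show quickly whether this is the situation for the odd lot.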






























  • Could you provide some input data and expected output?

    – Franco Piccolo
    Nov 13 '18 at 14:30











  • I've added some outputs - it's hard to give example input data (privacy issues aside), but hopefully this shows the issue

    – BAC83
    Nov 13 '18 at 14:46











  • Ideally you should provide an input DataFrame with an expected output, otherwise it's impossible to reproduce your issue. Regarding your last question, you can't apply stack to a Series.

    – Franco Piccolo
    Nov 13 '18 at 14:58






  • It works with the others because they are probably DataFrames.

    – Franco Piccolo
    Nov 13 '18 at 15:14






  • It will work as long as it is a DataFrame. Not when it is a Series.

    – Franco Piccolo
    Nov 13 '18 at 17:04
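
The distinction drawn in these comments can be checked directly. The sketch below assumes, hypothetically, that the problem lot yields a frame with no rows; in that case apply(pd.Series) has nothing to expand and returns an empty Series rather than a DataFrame, which is enough to trigger the reported error. The column names are placeholders.

```python
import pandas as pd

# Hypothetical frame mimicking a query result with zero rows.
empty = pd.DataFrame(columns=["lot", "id", "data"])
cols = list(empty)[:-1]

expanded = empty.set_index(cols)["data"].apply(pd.Series)

# With no elements to expand, apply() returns an empty Series,
# and stack() is a DataFrame method that Series does not have.
print(type(expanded).__name__)
print(hasattr(expanded, "stack"))

# A defensive guard: only stack when the expansion produced a DataFrame.
if isinstance(expanded, pd.DataFrame):
    df_stack = expanded.stack().reset_index()
else:
    df_stack = pd.DataFrame(columns=cols + ["position", "value"])
```

The guard tests the type of the intermediate result instead of a column dtype, so it does not depend on what the upstream ETL process did to the datatypes.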
















python pandas














edited Nov 13 '18 at 14:53







BAC83

















asked Nov 13 '18 at 14:25









BAC83













