Pandas datatype change within a function

General background

I've written a function which incorporates a MySQL query, with some munging on the returned data (pulled into a pandas df).

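```python
import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy engine for the MySQL database (mysql-connector driver)
enginedb = create_engine("mysql+mysqlconnector://user:pswd@10.0.10.26:3306/db",
                         encoding='latin1')

query = """Select blah blah"""  # the real query has multiple joins

df = pd.read_sql(query, enginedb)
```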

This works fine; the query is a substantial one with multiple joins etc. However, it turned out that for one particular lot in the database the dtypes were off. For almost all 'normal' lots the columns come back as int64, some object, and a datetime64[ns], but for one lot (so far) every column except the datetime comes back as object.
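A quick check worth making before looking at individual dtypes (an assumption, since the rows for the odd lot aren't shown): when a query returns no rows, pd.read_sql has no values to infer dtypes from, so the columns come back as object. The sketch below uses an in-memory SQLite database as a stand-in for MySQL; the `lots` table, its columns and the lot numbers are made up.

```python
import sqlite3

import pandas as pd

# In-memory SQLite as a stand-in for MySQL; the table, columns and lot
# numbers are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot INTEGER, id INTEGER, data TEXT)")
conn.executemany("INSERT INTO lots VALUES (?, ?, ?)", [(1, 10, "a"), (1, 11, "b")])

# Lot 1 has rows, so pandas can infer int64 for the integer columns.
normal = pd.read_sql("SELECT lot, id, data FROM lots WHERE lot = 1", conn)

# Lot 2 has no rows: there is nothing to infer from, so every column is object.
empty = pd.read_sql("SELECT lot, id, data FROM lots WHERE lot = 2", conn)

print(len(normal), normal.dtypes.to_dict())
print(len(empty), empty.dtypes.to_dict())
```

Printing `len(df)` next to `df.dtypes` inside the function is a cheap way to tell this case apart from a genuine type problem coming from the ETL.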

Issue

I need to do a stack: one of the columns (data) holds a list, and I have some code that takes each item of the list and stacks the items down, one per row:

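```python
cols = list(df)
cols = cols[:-1]  # every column except the last one, the list column 'data'

# expand each list into columns, then stack those columns into rows
df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
```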
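For reference, this is what the stacking line does on a small made-up frame (the lot, id and data values are hypothetical): apply(pd.Series) turns each list into a row of a DataFrame with columns 0, 1, ..., and stack() folds those columns back into the index, giving one row per list item. The lists here have equal length; with uneven lists, the shorter ones are padded with NaN by apply(pd.Series).

```python
import pandas as pd

# Made-up stand-in for a normal lot: 'data' is the last column and holds lists.
df = pd.DataFrame({
    "lot": [1, 1],
    "id": [10, 11],
    "data": [[100, 200], [300, 400]],
})

cols = list(df)[:-1]  # ['lot', 'id']

# Series of lists -> DataFrame with one column per list position.
expanded = df.set_index(cols)["data"].apply(pd.Series)

# DataFrame -> Series with one row per list item, then back to columns.
df_stack = expanded.stack().reset_index()
print(df_stack)
```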

The problem is that this doesn't work for the 'odd' lot with the non-standard dtypes (those come from an upstream ETL process that I can't change). The exact error is:

'Series' object has no attribute 'stack'
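One way to reproduce exactly this message (a hypothesis to test against the real data, not a confirmed diagnosis): Series.apply returns a DataFrame only when the function produces Series results, and on a zero-row Series it never calls the function and returns an empty Series instead. Series has no stack method, hence the error. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical zero-row result with the same columns as a normal lot.
empty = pd.DataFrame(columns=["lot", "id", "data"])
cols = list(empty)[:-1]

expanded = empty.set_index(cols)["data"].apply(pd.Series)
print(type(expanded).__name__)  # Series, not DataFrame

try:
    expanded.stack()
except AttributeError as err:
    print(err)
```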

So I added an if/else statement that checks whether the dtype of one of the columns is wrong and, if so, converts it:

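```python
if df['id'].dtype == 'int64':
    # normal lot: expand the list column and stack it, one row per item
    df_stack = df.set_index(cols)['data'].apply(pd.Series).stack()
    df_stack = df_stack.reset_index()
else:
    # 'odd' lot: convert every column to numeric; errors='coerce' turns
    # values that can't be parsed as numbers into NaN.
    # Note: this branch only converts dtypes, it does not stack.
    # It can't be more specific than all the columns, as there are a LOT
    df_stack = df.apply(pd.to_numeric, errors='coerce')
```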

But this has no effect. Inside the function (which contains the query and the subsequent munging) I print df.dtypes and df_stack.dtypes, and the conversion doesn't change anything.

Why is this?
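Two things that can make the conversion look like a no-op, sketched on hypothetical frames. First, df.apply(pd.to_numeric, ...) returns a new DataFrame and leaves df itself untouched, so df.dtypes still shows object. Second, when the frame has zero rows, DataFrame.apply does not run the function on the actual columns; it returns an unchanged copy, so even the returned frame keeps its object dtypes.

```python
import pandas as pd

# Hypothetical lot whose numeric columns arrived as strings (object dtype).
df = pd.DataFrame({"id": ["1", "2"], "qty": ["3", "n/a"]}, dtype=object)
converted = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes.to_dict())         # df is unchanged: both columns still object
print(converted.dtypes.to_dict())  # the returned frame: id int64, qty float64

# Hypothetical zero-row lot: apply() hands back an unchanged copy.
empty = pd.DataFrame(columns=["id", "qty"])
empty_converted = empty.apply(pd.to_numeric, errors="coerce")
print(empty_converted.dtypes.to_dict())  # still object
```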

EDIT

I've added this picture to show the code (at right) that tries to catch the incorrectly typed lot (12384), and the print-outs before and after the pd.to_numeric call (both show only object columns, no numeric ones).


My underlying question has two parts:

What would cause 'Series' object has no attribute 'stack'? More fundamentally than "wrong dtype": why is the dtype an issue at all?

Why would pd.to_numeric not cause any change here?
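If the odd lot turns out to return zero rows (an assumption; Series.apply on an empty Series returns a Series, which has no stack method), a guard before the stacking step avoids the AttributeError. A minimal sketch under that assumption: the helper name stack_lists and the sample frames are hypothetical, and returning an empty frame is just one choice (raising or logging would work too).

```python
import pandas as pd


def stack_lists(df: pd.DataFrame, list_col: str = "data") -> pd.DataFrame:
    """Hypothetical helper: one row per list item, as in the question's code.

    Assumes `list_col` is the last column of `df`.
    """
    if df.empty:
        # Zero rows: apply(pd.Series) would return a Series, which has no
        # .stack(), so return an empty frame with the original columns instead.
        return df.iloc[0:0]
    cols = list(df)[:-1]
    expanded = df.set_index(cols)[list_col].apply(pd.Series)
    return expanded.stack().reset_index()


normal = pd.DataFrame({"lot": [1, 1], "id": [10, 11], "data": [[100, 200], [300, 400]]})
empty = pd.DataFrame(columns=["lot", "id", "data"])
print(stack_lists(normal))
print(stack_lists(empty))
```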

asked Nov 13 '18 at 14:25 by BAC83 (1159 reputation), edited Nov 13 '18 at 14:53

Could you provide some input data and expected output?

– Franco Piccolo
Nov 13 '18 at 14:30

I've added some outputs - it's hard to give example input data (privacy issues aside), but hopefully this shows the issue

– BAC83
Nov 13 '18 at 14:46

Ideally you should provide an input dataframe with an expected output; otherwise it's impossible to reproduce your issue. Regarding your last question, you can't apply stack to a Series.

– Franco Piccolo
Nov 13 '18 at 14:58


It works with the others because they are probably dataframes.

– Franco Piccolo
Nov 13 '18 at 15:14


It will work as long as it is a DataFrame. Not when it is a Series.

– Franco Piccolo
Nov 13 '18 at 17:04

Tags: python pandas
