Replace NaN with 0 where a feature is missing in a DataFrame

-2

I am working on a dataset with missing values. The head of the dataset looks like this:

+1 1:0.2 2:0.7 3:-1.2 4:0.5
-1 1:0.9 3:0.1 4:0.8
-1 1:-0.1 2:0.1 4:1.0
+1 2:0.6 3:-1.0

The first column is the label, and the number before each colon is the index of the feature. Some features are missing in some rows. When I import the data with the following code,

df = pandas.read_csv('dataset', header=None, sep=r'\s+|:', engine='python', dtype=float)

I get a dataframe that looks like this:

    0       1       2       3       4       5       6       7       8
0   1.0     1.0     0.2     2.0     0.7     3.0     -1.2    4.0     0.5
1   -1.0    1.0     0.9     3.0     0.1     4.0     0.8     NaN     NaN
2   -1.0    1.0     -0.1    2.0     0.1     4.0     1.0     NaN     NaN
3   1.0     2.0     0.6     3.0     -1.0    NaN     NaN     NaN     NaN

I want to replace the NaNs with 0s in the correct places. But if I use df.fillna(0), only the NaNs at the end of each row are replaced, which gives:

    0       1       2       3       4       5       6       7       8
0   1.0     1.0     0.2     2.0     0.7     3.0     -1.2    4.0     0.5
1   -1.0    1.0     0.9     3.0     0.1     4.0     0.8     0.0     0.0
2   -1.0    1.0     -0.1    2.0     0.1     4.0     1.0     0.0     0.0
3   1.0     2.0     0.6     3.0     -1.0    0.0     0.0     0.0     0.0

What I really want is a dataframe that looks like this:

    0       1       2       3       4       5       6       7       8
0   1.0     1.0     0.2     2.0     0.7     3.0     -1.2    4.0     0.5
1   -1.0    1.0     0.9     0.0     0.0     3.0     0.1     4.0     0.8
2   -1.0    1.0     -0.1    2.0     0.1     0.0     0.0     4.0     1.0
3   1.0     0.0     0.0     2.0     0.6     3.0     -1.0    0.0     0.0

So after I drop the feature-index columns, I should have:

    0       1       2       3       4
0   1.0     0.2     0.7     -1.2    0.5
1   -1.0    0.9     0.0     0.1     0.8
2   -1.0    -0.1    0.1     0.0     1.0
3   1.0     0.0     0.6     -1.0    0.0

edited Nov 15 '18 at 17:59

asked Nov 15 '18 at 17:49

Neyo Yang


3

Your question is confusing. You say you want to replace the NaNs with 0, but you say that fillna(0) replaces the NaNs with 0, and you don't want that. Are you instead looking for dropna(axis=1)?

– G. Anderson
Nov 15 '18 at 17:52

1

Can you double-check the df you posted under "What I really want"? I'm not sure how you went from 9 to 5 columns.

– Capn Jack
Nov 15 '18 at 17:52

1

@CapnJack, also different values in some of the columns

– G. Anderson
Nov 15 '18 at 17:53

1

@BrianJoseph, that sounds like dropna() with extra steps. Looking at the values, it seems like OP wants to shift values from the ends of the rows into earlier columns... flagged for being unclear.

– G. Anderson
Nov 15 '18 at 17:57

1

I think in each row of input the number in front of the : is supposed to be the correct column for the value after it, so pandas.read_csv is probably the problem here.

– BurningKarl
Nov 15 '18 at 18:09
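A small sketch of the point made in this comment, using only the standard library. Splitting a row on whitespace or colons gives a flat token list with no gaps, so a missing feature shifts every later token to the left and the padding lands at the end of the row. Pairing each index with its value keeps the feature in its intended column. The two rows are copied from the question; the helper name `parse_row` is made up for illustration.

```python
import re

# Two sample rows from the question; the second one has no feature 2.
rows = [
    "+1 1:0.2 2:0.7 3:-1.2 4:0.5",
    "-1 1:0.9 3:0.1 4:0.8",
]


def parse_row(line):
    """Return (label, {feature_index: value}) for one 'label idx:val ...' line."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return float(label), features


# Flat split on whitespace or ':' : tokens are packed to the left,
# so nothing marks the place where feature 2 should have been.
flat = [float(tok) for tok in re.split(r"\s+|:", rows[1])]
print(flat)

# Index-aware parse: feature 2 is simply absent; 3 and 4 keep their own keys.
print(parse_row(rows[1]))
```

With the index-aware form, the missing feature is a missing key rather than a shifted column, which is what makes a later fill with 0 land in the right place.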


python pandas


1 Answer
1


The problem isn't with filling N/A values. As @BurningKarl suggested in the comments, the problem is using read_csv to read a file that isn't a CSV or CSV-like file. You will likely need to parse this file differently.

If it helps you get started, the snippet below shows how the data needs to be shaped so that it loads into a proper dataframe, according to what you say you need. If you can parse your file (for example with file.readlines) into a list of dictionaries, one dictionary per row, you can pass that list straight to the DataFrame constructor. (Note: the parsing will likely take some effort to get exactly right.)

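import pandas as pd

# Each dict is one row: key 0 is the label, keys 1-4 are the feature indices.
x = [{0: 1, 1: 0.2, 2: 0.7, 3: -1.2, 4: 0.5},
     {0: -1, 1: 0.9, 3: 0.1, 4: 0.8},
     {0: -1, 1: -0.1, 2: 0.1, 4: 1.0},
     {0: 1, 2: 0.6, 3: -1.0}]

pd.DataFrame(x)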

gives you

    0    1       2      3       4
0   1    0.2     0.7    -1.2    0.5
1   -1   0.9     NaN    0.1     0.8
2   -1   -0.1    0.1    NaN     1.0
3   1    NaN     0.6    -1.0    NaN

After that, you can call fillna(0) on the result, as you tried before.
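The parsing step described above could be sketched roughly as follows. This is only an illustration under some assumptions: the file is whitespace-separated, every token after the label has the form index:value, and an in-memory `io.StringIO` stands in for the real file so the example runs on its own. The names `sample`, `records` and `df` are made up here.

```python
import io

import pandas as pd

# Stand-in for the real file; the rows are the sample from the question.
sample = io.StringIO(
    "+1 1:0.2 2:0.7 3:-1.2 4:0.5\n"
    "-1 1:0.9 3:0.1 4:0.8\n"
    "-1 1:-0.1 2:0.1 4:1.0\n"
    "+1 2:0.6 3:-1.0\n"
)

records = []
for line in sample:
    tokens = line.split()
    if not tokens:                          # skip blank lines
        continue
    record = {0: float(tokens[0])}          # column 0 holds the label
    for token in tokens[1:]:
        index, value = token.split(":")
        record[int(index)] = float(value)   # feature index becomes the column
    records.append(record)

# Missing keys become NaN in the constructor; sort the columns, then fill with 0.
df = pd.DataFrame(records).sort_index(axis=1).fillna(0)
print(df)
```

For a real file, replacing `sample` with an `open('dataset')` handle should be enough, assuming the format matches the sample.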

edited Nov 15 '18 at 19:14

answered Nov 15 '18 at 19:08

G. Anderson


I used csv.reader instead of readlines and followed your suggestion, and it works.

– Neyo Yang
Nov 15 '18 at 20:46

I'm glad I was able to help. Don't forget to accept the answer if you feel like it's warranted.

– G. Anderson
Nov 15 '18 at 21:37
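The csv.reader variant mentioned in the comments was not posted, so the following is only a guess at what it might look like. It assumes a single space as the delimiter and a known highest feature index (`N_FEATURES`, a made-up constant set to 4 to match the sample). Each row is pre-filled with zeros, so no fillna call is needed afterwards.

```python
import csv
import io

import pandas as pd

# Stand-in for the real file; the rows are the sample from the question.
sample_file = io.StringIO(
    "+1 1:0.2 2:0.7 3:-1.2 4:0.5\n"
    "-1 1:0.9 3:0.1 4:0.8\n"
    "-1 1:-0.1 2:0.1 4:1.0\n"
    "+1 2:0.6 3:-1.0\n"
)

N_FEATURES = 4  # assumption: highest feature index that occurs in the data

dense_rows = []
for fields in csv.reader(sample_file, delimiter=" "):
    fields = [f for f in fields if f]       # drop empties from repeated spaces
    if not fields:                          # skip blank lines
        continue
    row = [0.0] * (N_FEATURES + 1)          # slot 0 is the label, 1..N the features
    row[0] = float(fields[0])
    for field in fields[1:]:
        index, value = field.split(":")
        row[int(index)] = float(value)      # write the value into its own slot
    dense_rows.append(row)

df_csv = pd.DataFrame(dense_rows)
print(df_csv)
```

Pre-allocating each row trades flexibility for simplicity: a feature index larger than `N_FEATURES` would raise an IndexError instead of silently adding a column.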
