Pandas create new column based on first unique values of existing column

I'm trying to add a new column to a DataFrame that keeps only the first occurrence of each value from an existing column. The new column will hold fewer values, with np.nan wherever a duplicate would have been.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})
df

    a   b
0   1   3
1   2   4
2   3   3
3   4   4
4   5   5

Goal:

    a   b   c
0   1   3   3
1   2   4   4
2   3   3   nan
3   4   4   nan
4   5   5   5

I've tried:

df['c'] = np.where(df['b'].unique(), df['b'], np.nan)

It throws: operands could not be broadcast together with shapes (3,) (5,) ()

edited Nov 14 '18 at 17:43 by jpp
asked Nov 14 '18 at 17:36 by Derek_P

python python-3.x pandas numpy unique


3 Answers

`mask` + `duplicated`

You can use Pandas methods for masking a series:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

answered Nov 14 '18 at 17:42 by jpp
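One side effect of this approach: NaN is a float, so the new column comes back as float64 (3.0 rather than 3). The following is a minimal sketch, assuming a pandas version that provides the nullable `Int64` dtype, that keeps the values as integers and shows the masked rows as `<NA>`. The names `is_repeat` and `c_int` are illustrative only and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# True for every occurrence of a value after its first one.
is_repeat = df['b'].duplicated()

# Default behaviour: masked rows become NaN, so the column is upcast to float64.
df['c'] = df['b'].mask(is_repeat)

# Assumption: casting to the nullable Int64 dtype first keeps integers
# and represents the masked rows as <NA> instead of NaN.
df['c_int'] = df['b'].astype('Int64').mask(is_repeat)

print(df.dtypes)
```

Whether the nullable dtype is worth it depends on what consumes the column afterwards; plain float64 with NaN is the more widely supported choice.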

Use duplicated with np.where:

df['c'] = np.where(df['b'].duplicated(), np.nan, df['b'])

Or:

df['c'] = df['b'].where(~df['b'].duplicated(), np.nan)

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

edited Nov 14 '18 at 17:48
answered Nov 14 '18 at 17:43 by Sandeep Kadapa
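For context on the error in the question: `unique()` returns only the distinct values (three of them here), so using it as the `np.where` condition gives a length-3 array that cannot be lined up with the length-5 column. `duplicated()` returns one boolean per row, which is what `np.where` needs. A small sketch that reproduces both cases; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() collapses the column to its distinct values: shape (3,).
uniques = df['b'].unique()

# Reproduce the failure from the question: condition and values differ in length.
error_message = None
try:
    np.where(uniques, df['b'], np.nan)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# duplicated() yields one boolean per row: shape (5,), same as the column.
is_repeat = df['b'].duplicated()
c = np.where(is_repeat, np.nan, df['b'])
print(c)
```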

ppg wrote:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

I like the code, but the last column should also give NaN

    0  1  3  3.0
    1  2  4  4.0
    2  3  3  NaN
    3  4  4  NaN
    4  5  5  NaN

answered Nov 14 '18 at 18:03 by Michael G.

I don't understand your answer / point. Can you explain further? – jpp, Jan 13 at 14:12
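As a quick check on which rows get masked, the sketch below runs `duplicated()` with each of its `keep` options on the question's column `b`. In this data the value 5 occurs only once, so none of the options marks the last row as a duplicate, which matches the goal stated in the question. The variable names are illustrative.

```python
import pandas as pd

b = pd.Series([3, 4, 3, 4, 5], name='b')

# How often each value occurs; 5 appears exactly once.
counts = b.value_counts()

# keep='first' (the default): later repeats are masked.
first_kept = b.mask(b.duplicated())

# keep='last': earlier repeats are masked instead.
last_kept = b.mask(b.duplicated(keep='last'))

# keep=False: every value that occurs more than once is masked.
none_kept = b.mask(b.duplicated(keep=False))

print(pd.DataFrame({'first': first_kept, 'last': last_kept, 'none': none_kept}))
```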
