Pandas merge handling duplicates in join output

Is there a clean way to keep only one matching row (preferably a random one) for one-to-many matches during a left join in pandas?

For example:

import numpy as np
import pandas as pd

left = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]]
right = [[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]]

left = pd.DataFrame(np.asarray(left))
right = pd.DataFrame(np.asarray(right))

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

This is what we get (left, right, and joined_left, in that order):

left:

   0  1  2
0  1  1  1
1  2  2  2
2  3  3  3
3  9  9  9
4  1  3  2

right:

   0  1  2
0  1  2  2
1  1  2  3
2  3  2  2
3  3  2  9
4  3  2  2

joined_left:

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0
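To see where the extra rows come from, here is a small sketch that counts, for each row of left, how many rows of right share its key. A left join emits one output row per match, or a single NaN-filled row when a key has no match at all. The names matches_per_key and rows_per_left_row are illustrative only, not from the post.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.asarray([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]]))
right = pd.DataFrame(np.asarray([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]]))

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

# How many rows of `right` share each key value in column 0.
matches_per_key = right[0].value_counts()

# One output row per match; keys with no match (2 and 9 here) still
# produce exactly one row, filled with NaN on the right-hand side.
rows_per_left_row = left[0].map(matches_per_key).fillna(1).astype(int)

print(rows_per_left_row.tolist())                 # [2, 1, 3, 1, 2]
print(rows_per_left_row.sum(), len(joined_left))  # 9 9
```

Keys 1 and 3 are duplicated in right, which is why the five left rows grow to nine.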

I want the output to have the same number of rows as my left DataFrame, and when there is more than one match in the right DataFrame, I want to bring in only a single, randomly chosen row.

Is there a concise way of doing this with pandas?

Thank you!

edited Nov 11 at 1:26 by coldspeed

asked Nov 11 at 0:36 by YohanRoth

Tags: python, pandas, dataframe, random, merge


1 Answer

Accepted answer:

You can shuffle right and then call drop_duplicates on the key column before merging. drop_duplicates keeps the first occurrence of each key by default (keep='first').

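# Shuffle all rows of right, then keep only the first row for each key in column 0
right2 = right.sample(frac=1).drop_duplicates(subset=[0])

# right2 now has unique keys, so the left join keeps the size of left
left.merge(right2, how='left', left_on=[0], right_on=[0])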



One possible result (it changes from run to run because of the shuffle):

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  2    2    2  NaN  NaN
2  3    3    3  2.0  2.0
3  9    9    9  NaN  NaN
4  1    3    2  2.0  2.0

We shuffle right first and then drop every duplicate except the first row, considering only column 0. Because the row order is random, the row that survives for each key is effectively a random pick among that key's matches.
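Here is a self-contained, minimal sketch of the approach above. Two details are additions, not part of the answer: random_state=0 fixes the shuffle so the run is reproducible (drop it for a fresh pick each time), and validate='many_to_one' asks pandas to raise a MergeError if right2 ever still contains duplicate keys.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.asarray([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]]))
right = pd.DataFrame(np.asarray([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]]))

# Shuffle every row of `right`, then keep the first row seen for each key
# in column 0 (drop_duplicates uses keep='first' by default).
right2 = right.sample(frac=1, random_state=0).drop_duplicates(subset=[0])

# Keys in right2 are unique, so the left join cannot add rows.
# validate='many_to_one' makes pandas raise MergeError if that is violated.
result = left.merge(right2, how="left", left_on=[0], right_on=[0],
                    validate="many_to_one")

assert len(result) == len(left)
print(result)
```

Because the random pick happens once per key, every left row that shares a key receives the same right-hand row.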

answered Nov 11 at 0:39 by coldspeed

I see, so you drop duplicates on the merge-key column of right. Ingenious! Thank you
– YohanRoth, Nov 11 at 0:43

@YohanRoth In this case, if the first row of the output is 1 1 1 2.0 2.0, I think that guarantees the last row is 1 3 2 2.0 2.0 as well, since the row 1 2 3 has been dropped from right. Because your question asks for a random choice, I'm a bit concerned this may not be the behavior you want: the random pick is made once per key, not once per left row. Perhaps that's fine, but it's worth making sure it's consistent with what you want.
– Joel, Nov 11 at 4:47
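If you need an independent random pick for every left row rather than one per key, one option (a sketch of my own, not something proposed in the thread) is to tag each left row with a helper column, do the full join, and then apply the same shuffle-then-drop_duplicates trick on that helper column instead of the key. The column name row_id and the function random_match_per_row are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.asarray([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]]))
right = pd.DataFrame(np.asarray([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]]))


def random_match_per_row(left, right, key, seed=None):
    """Left-join `right` onto `left`, keeping one random match per left row."""
    # Hypothetical helper column identifying each original left row.
    tagged = left.assign(row_id=np.arange(len(left)))
    # Full join: each left row is repeated once per match in `right`.
    joined = tagged.merge(right, how="left", left_on=[key], right_on=[key])
    # Shuffle, keep the first copy of each left row, then restore left's order.
    return (joined.sample(frac=1, random_state=seed)
                  .drop_duplicates(subset="row_id")
                  .sort_values("row_id")
                  .drop(columns="row_id")
                  .reset_index(drop=True))


result = random_match_per_row(left, right, key=0, seed=0)
print(result)
```

The trade-off is that this materializes the full one-to-many join before trimming it, so it needs more memory than deduplicating right up front when keys have many matches.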
