Keeping only categories that contain both female and male observations [closed]

-2

I'm working with a dataframe where a "job_title" column contains hundreds of titles, and a different column, "gender", specifies whether the person represented by the observation is male or female.

I'm struggling to figure out how to drop all rows whose job title isn't shared by both a male and a female.

In other words, I want to keep a row if and only if its "job_title" value is recorded for at least one other row which has the other "gender" value. If only males have a specific job title, I want to drop all the rows with that job title; if only females have a job title, I'm looking to drop all rows with that job title too.
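To make the requirement concrete, here is a minimal sketch on a made-up data frame. The job titles and values below are hypothetical and only illustrate which rows should survive; the column names follow the question.

```r
# Hypothetical toy data: only "analyst" is held by both a male and a female
df <- data.frame(
  job_title = c("analyst", "analyst", "engineer", "engineer", "nurse", "pilot"),
  gender    = c("male", "female", "male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Job titles observed for each gender
titles_m <- unique(df$job_title[df$gender == "male"])
titles_f <- unique(df$job_title[df$gender == "female"])

# Titles that appear for both genders
shared <- intersect(titles_m, titles_f)

# Keep a row only if its job title is in the shared set
result <- df[df$job_title %in% shared, ]
print(result)
```

In this toy case only the two "analyst" rows are kept; "engineer", "nurse" and "pilot" are dropped because each is recorded for a single gender.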

asked Nov 16 '18 at 5:58

tres14

closed as off-topic by Mitch Wheat, MLavoie, Umair, Rob, VDWWD Nov 16 '18 at 14:24

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Mitch Wheat, MLavoie, Umair, Rob, VDWWD

If this question can be reworded to fit the rules in the help center, please edit the question.

1

Please provide a reproducible example of your data using dput().

– Hunaidkhan
Nov 16 '18 at 6:03

r dataframe plyr tidyverse


1 Answer

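Assuming df is your data frame, here's a base R solution:

# Count distinct genders per job title; integer codes keep the result numeric
# even when gender is stored as a factor
df[ave(as.integer(factor(df$gender)), df$job_title, FUN = function(x) length(unique(x))) > 1, ]

With the tidyverse (dplyr):

library(dplyr)

df %>%
  group_by(job_title) %>%
  filter(n_distinct(gender) > 1)  # keep titles held by more than one gender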
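As a rough check of the idea above, the following sketch runs the per-title count on a small made-up data frame in which gender is stored as a factor. The data are hypothetical, and the dplyr comparison only runs if that package happens to be installed.

```r
# Hypothetical toy data; gender is deliberately a factor
df <- data.frame(
  job_title = c("analyst", "analyst", "engineer", "engineer",
                "nurse", "pilot", "pilot"),
  gender    = factor(c("male", "female", "male", "male",
                       "female", "female", "male")),
  stringsAsFactors = FALSE
)

# Number of distinct genders within each job title, repeated for every row.
# as.integer() turns the factor into numeric codes so ave() returns numbers.
n_gender <- ave(as.integer(df$gender), df$job_title,
                FUN = function(x) length(unique(x)))

# Keep rows whose job title has more than one gender
kept_base <- df[n_gender > 1, ]
print(kept_base)

# Optional cross-check against the grouped dplyr filter
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::filter(dplyr::group_by(df, job_title),
                              dplyr::n_distinct(gender) > 1)
  stopifnot(identical(kept_dplyr$job_title, kept_base$job_title))
}
```

Here the "analyst" and "pilot" rows are kept, while "engineer" and "nurse" are dropped.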

answered Nov 16 '18 at 6:13

Shree

3,516

The latter of the two options works for my particular case!

– tres14
Nov 16 '18 at 6:22

@tres14 both should work but I can test them only if you provide a sample data set.

– Shree
Nov 16 '18 at 6:24
