Keeping only categories that contain both female and male observations [closed]
I'm working with a dataframe where a "job_title" column contains hundreds of titles, and a different column, "gender", specifies whether the person represented by the observation is male or female.
I'm struggling to figure out how to drop every row whose job title isn't shared by at least one male and at least one female.
In other words, I want to keep a row if and only if its "job_title" value is recorded for at least one other row which has the other "gender" value. If only males have a specific job title, I want to drop all the rows with that job title; if only females have a job title, I'm looking to drop all rows with that job title too.
r dataframe plyr tidyverse
closed as off-topic by Mitch Wheat, MLavoie, Umair, Rob, VDWWD Nov 16 '18 at 14:24
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Mitch Wheat, MLavoie, Umair, Rob, VDWWD
If this question can be reworded to fit the rules in the help center, please edit the question.
please provide a reproducible output using dput()
– Hunaidkhan
Nov 16 '18 at 6:03
asked Nov 16 '18 at 5:58
tres14
1 Answer
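Assuming df is your data frame, here's a base R solution:
# Count distinct genders per job title; as.integer(factor()) keeps ave() numeric even when gender is a factor
df[ave(as.integer(factor(df$gender)), df$job_title, FUN = function(x) length(unique(x))) > 1, ]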
With tidyverse (dplyr):
library(dplyr)
df %>%
  group_by(job_title) %>%
  filter(n_distinct(gender) > 1) %>%  # keep titles recorded for more than one gender
  ungroup()
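Since the question has no sample data, here is a minimal sketch of the same idea on a made-up data frame. The job titles and values are hypothetical, and the code uses tapply() with %in% as a step-by-step variant of the filter, not the exact code from the answer, so each intermediate result can be inspected.

```r
# Hypothetical toy data; the titles and values are invented for illustration.
df <- data.frame(
  job_title = c("analyst", "analyst", "engineer", "engineer", "nurse", "pilot"),
  gender    = c("male", "female", "male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Number of distinct genders recorded for each job title.
genders_per_title <- tapply(df$gender, df$job_title,
                            function(x) length(unique(x)))

# Titles that are held by more than one gender.
shared_titles <- names(genders_per_title)[genders_per_title > 1]

# Keep only the rows whose title is in that set.
kept <- df[df$job_title %in% shared_titles, ]
print(kept)
```

In this toy data only "analyst" appears with both genders, so the two analyst rows are the ones kept; "engineer" is dropped even though it has two rows, because both are male.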
The latter of the two options works for my particular case!
– tres14
Nov 16 '18 at 6:22
@tres14 both should work but I can test them only if you provide a sample data set.
– Shree
Nov 16 '18 at 6:24
answered Nov 16 '18 at 6:13
Shree