Creating a new data frame based on different data in rows

I'm moving an operation from Excel Power Query to R, which is much faster. As a result I have a data frame with thousands of rows. However, I'm looking to create a sample data frame that includes one row for every different option (factor level) in columns 5:10 of 15 columns, so that people can manually test every option (like a truth table?).

I could do this manually, but I wondered if it can be done automatically. For example, this data frame:

    col1     col2       col3
    name     option1    option2
    name2    option1    option2
    name3    option1    option2
    name4    option2    option1

would be converted into a data frame like this:

    col1     col2       col3
    name     option1    option2
    name4    option2    option1

Any help would be greatly appreciated.

Chris

asked Nov 14 '18 at 13:42

Chris


see ?duplicated

– Bastien
Nov 14 '18 at 13:46
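
To illustrate the `?duplicated` hint, here is a minimal base R sketch. It assumes the data frame is called `d` and holds the sample rows from the question; `sample_rows` is just an illustrative name.

```r
# Sample data from the question
d <- data.frame(
  col1 = c("name", "name2", "name3", "name4"),
  col2 = c("option1", "option1", "option1", "option2"),
  col3 = c("option2", "option2", "option2", "option1"),
  stringsAsFactors = FALSE
)

# duplicated() flags rows whose (col2, col3) combination has appeared before;
# negating it keeps the first row of each combination.
sample_rows <- d[!duplicated(d[, c("col2", "col3")]), ]
print(sample_rows)
```

For the real data, the column subset would presumably be `d[, 5:10]` instead of `c("col2", "col3")`.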

r dataframe rstudio tidyverse

1 Answer

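With dplyr:

    library(dplyr)

    # keep the first row for each distinct combination of col2 and col3
    d %>% distinct(col2, col3, .keep_all = TRUE)

    #    col1    col2    col3
    # 1  name option1 option2
    # 2 name4 option2 option1

If you want distinct rows over only a subset of columns, you can first select those columns with a regex on their names:

    d %>%
        # keeps only the columns named col1 and col5 to col10;
        # the pattern is anchored so that e.g. col11 is not matched
        select(matches("^col(1|[5-9]|10)$")) %>%
        distinct()

The result has your first column "col1" and all the columns "col5" to "col10". Because distinct() then compares every selected column, rows that differ only in col1 are all kept.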
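
The question asks about columns 5 to 10 of 15, so here is a minimal sketch that picks them by position rather than by name pattern. The toy data frame, its column names and `option_cols` are assumptions made up for illustration, and `across()` needs dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical 15-column data frame: col1 is a label, col5 and col6 vary,
# and the remaining columns are constant filler.
d <- data.frame(col1 = paste0("name", 1:8), stringsAsFactors = FALSE)
for (j in 2:15) d[[paste0("col", j)]] <- "k"
d$col5 <- rep(c("a", "b"), times = 4)
d$col6 <- rep(c("x", "y"), each = 4)

# Names of columns 5 to 10, taken by position
option_cols <- names(d)[5:10]

# Keep the first row for each distinct combination of the option columns,
# while retaining all 15 columns in the result.
sample_rows <- d %>%
  distinct(across(all_of(option_cols)), .keep_all = TRUE)

print(sample_rows[, c("col1", "col5", "col6")])
```

Unlike selecting col1 before calling `distinct()`, this leaves the label column out of the comparison, so rows that differ only in col1 collapse to one.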

edited Nov 14 '18 at 14:05

answered Nov 14 '18 at 13:45

RLave


Thanks for the quick reply. In my data frame I still end up with 19 thousand rows (from 39,000). I'm after a table with about 100-200 lines, but perhaps my maths is way off. Rather than having a new row for every distinct option, I'd like rows that each take care of many options, so reducing the number of lines. Does that make any sense?

– Chris
Nov 14 '18 at 14:09

Mm, no. I'm sorry, but you need to be clearer. Try updating your question with a reproducible example that shows what you expect.

– RLave
Nov 14 '18 at 14:15
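
One possible reading of the follow-up comment is that every level of each column only needs to appear at least once, rather than every combination of levels. Under that assumption, a rough base R sketch is to keep, for each column, the rows where a value shows up for the first time. The helper `cover_levels` and the toy data `grid` are hypothetical names, not from the thread.

```r
# Keep the rows in which some value of some column in `cols` is seen for the
# first time, so every level of every listed column appears at least once.
cover_levels <- function(df, cols) {
  first_seen <- lapply(cols, function(cn) which(!duplicated(df[[cn]])))
  df[sort(unique(unlist(first_seen))), , drop = FALSE]
}

# Toy data: all 27 combinations of three columns with three levels each
grid <- expand.grid(
  a = c("a1", "a2", "a3"),
  b = c("b1", "b2", "b3"),
  c = c("c1", "c2", "c3")
)

covered <- cover_levels(grid, c("a", "b", "c"))
print(covered)
```

The result has at most as many rows as the total number of levels across the chosen columns (7 here instead of 27), but it is not guaranteed to be the smallest possible set.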
