Creating a new data frame based on different data in rows












I'm moving an operation from Excel Power Query to R, which is much faster. The result is a data frame with thousands of rows. From it, I'd like to create a sample data frame that includes one row for every different option (factor level) in columns 5:10 of the 15 columns, so that people can manually test every option (like a truth table?).



I could manually do this, but I wondered if I could do it automatically.



    col1     col2       col3
    name     option1    option2
    name2    option1    option2
    name3    option1    option2
    name4    option2    option1


would be converted into a data frame like this:



    col1     col2       col3
    name     option1    option2
    name4    option2    option1


Any help would be greatly appreciated.



Chris










  • see ?duplicated

    – Bastien
    Nov 14 '18 at 13:46
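
As a minimal sketch of what this pointer to `?duplicated` could look like in base R: the toy data frame `d` below just mirrors the example in the question, and the names `option_cols` and `sample_d` are made up for illustration. It assumes that only col2 and col3 define an "option" and that col1 is a label.

```r
# Toy data mirroring the example in the question (values are illustrative)
d <- data.frame(
  col1 = c("name", "name2", "name3", "name4"),
  col2 = c("option1", "option1", "option1", "option2"),
  col3 = c("option2", "option2", "option2", "option1"),
  stringsAsFactors = FALSE
)

# Columns that define an option combination; col1 is only a label (assumption)
option_cols <- c("col2", "col3")

# duplicated() flags each row whose combination was already seen,
# so negating it keeps the first row of every combination
sample_d <- d[!duplicated(d[option_cols]), ]
sample_d
```

On the real data the same idea would use the names of columns 5 to 10 in place of `option_cols`.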


















r dataframe rstudio tidyverse






asked Nov 14 '18 at 13:42









Chris

1 Answer

With dplyr:



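library(dplyr)

# keep the first row of each distinct (col2, col3) combination,
# carrying the other columns along
d %>% distinct(col2, col3, .keep_all = TRUE)

# col1 col2 col3
# 1 name option1 option2
# 2 name4 option2 option1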


If you want to apply distinct to only a subset of columns, you can first select them with a regex:



d %>%
  select(matches("^col(1|[5-9]|10)$")) %>% # keeps only the columns named col1 and col5 to col10
  distinct(.keep_all = TRUE)


This keeps your first column, "col1", and all the columns "col5" to "col10".
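
One thing to watch with the second snippet: if col1 holds a unique label, keeping it among the selected columns makes almost every row distinct. A possible alternative, sketched here under the assumption that the option columns really are named col5 to col10, is to list only those columns in `distinct()` and let `.keep_all = TRUE` carry col1 along. The toy data and the name `sample_d` are invented for illustration.

```r
library(dplyr)

# Toy data: col1 is a unique label, col5..col10 hold the options (illustrative values)
d <- data.frame(
  col1  = paste0("name", 1:6),
  col5  = c("a", "a", "b", "b", "a", "b"),
  col6  = c("x", "x", "y", "y", "x", "y"),
  col7  = "k",
  col8  = "k",
  col9  = "k",
  col10 = c("p", "p", "p", "q", "p", "q"),
  stringsAsFactors = FALSE
)

# De-duplicate on the option columns only; col1 is kept but not compared
sample_d <- d %>%
  distinct(col5, col6, col7, col8, col9, col10, .keep_all = TRUE)

sample_d
```

Here three of the six rows survive, one per distinct combination of col5 to col10.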







  • Thanks for the quick reply. In my data frame I still end up with 19 thousand rows (down from 39,000). I'm after a table with about 100-200 lines, but perhaps my maths is way off. Rather than having a new row for every distinct option, I'd like rows that each take care of many options, so reducing the number of lines... does that make any sense?

    – Chris
    Nov 14 '18 at 14:09











  • mm no, I'm sorry but you need to be more clear. Try updating your question with a reproducible example where you show what you expect.

    – RLave
    Nov 14 '18 at 14:15
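
One possible reading of the follow-up comment is that every value of every option column should appear at least once, rather than every combination of values. Under that assumption (which the thread does not confirm), a greedy pass can pick rows that each cover as many not-yet-seen values as possible. The helper `cover_levels` and the toy data are hypothetical, and a greedy choice is not guaranteed to give the smallest possible table.

```r
# Hypothetical helper: greedily pick rows until every value of every
# option column has appeared at least once
cover_levels <- function(d, option_cols) {
  # One "column=value" tag per cell, arranged as a rows x columns matrix
  tag_mat <- do.call(cbind, lapply(option_cols, function(col) {
    paste(col, as.character(d[[col]]), sep = "=")
  }))
  uncovered <- unique(as.vector(tag_mat))
  picked <- integer(0)
  while (length(uncovered) > 0) {
    # For each row, count how many still-uncovered tags it would add
    hits <- matrix(tag_mat %in% uncovered, nrow = nrow(tag_mat))
    best <- which.max(rowSums(hits))
    picked <- c(picked, best)
    uncovered <- setdiff(uncovered, tag_mat[best, ])
  }
  d[sort(picked), , drop = FALSE]
}

# Toy data (illustrative): four distinct combinations, but only two values per column
d <- data.frame(
  col1 = paste0("name", 1:4),
  col5 = c("a", "a", "b", "b"),
  col6 = c("x", "y", "x", "y"),
  stringsAsFactors = FALSE
)

sample_d <- cover_levels(d, c("col5", "col6"))
sample_d
```

On this toy input, de-duplicating by combination would keep all four rows, while the greedy pass keeps two, because the first and last rows together already show every value of col5 and col6.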











edited Nov 14 '18 at 14:05

























answered Nov 14 '18 at 13:45









RLave
