Break a column at regular intervals into multiple rows [duplicate]
This question already has an answer here:
Convert Vector to Matrix without Recycling (2 answers)
I have a column of numbers in a csv file and I want to break the column at regular intervals and transpose them into multiple rows. For example:
Dummy input file:
10
25
09
04
14
100
01
10
100
04
04
01
04
Expected output (Breaking at regular intervals of 3):
10 25 09
04 14 100
01 10 100
04 04 01
04
I am trying to do this in R with a for loop but haven't succeeded. Besides not getting the desired output, the column holds more than 10 million points like these, so I am not sure a loop is an efficient approach. I have searched and seen similar questions on Stack Exchange, such as "split string at regular intervals" and "How to split a string into substrings of a given length?", but they haven't solved my problem.
Nevertheless, any help with this is appreciated.
r split rows
marked as duplicate by Jaap
Nov 16 '18 at 8:42
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
edited Nov 16 '18 at 5:09
Dark_Knight
asked Nov 16 '18 at 4:57
Dark_Knight
4 Answers
Here's a dynamic tidyverse approach. It should work for any breaks value.
library(dplyr)
library(tidyr)
set.seed(1)
df <- tibble(x = sample(20, 10))  # tibble() replaces the deprecated data_frame()
breaks <- 3                       # number of values per output row
df %>%
  mutate(
    id = rep(paste0("col", 1:breaks), length.out = nrow(.)),  # column label for each value
    rn = ave(x, id, FUN = seq_along)                          # output row for each value
  ) %>%
  spread(id, x) %>%
  select(-rn)
# Output as posted (the sampled values depend on the R version):
# A tibble: 4 x 3
#    col1  col2  col3
#   <int> <int> <int>
# 1     6     8    11
# 2    16     4    14
# 3    15     9    19
# 4     1    NA    NA
# another example with breaks at 6
breaks <- 6
df %>%
  mutate(
    id = rep(paste0("col", 1:breaks), length.out = nrow(.)),
    rn = ave(x, id, FUN = seq_along)
  ) %>%
  spread(id, x) %>%
  select(-rn)
# A tibble: 2 x 6
#    col1  col2  col3  col4  col5  col6
#   <int> <int> <int> <int> <int> <int>
# 1     6     8    11    16     4    14
# 2    15     9    19     1    NA    NA
edited Nov 16 '18 at 15:35
answered Nov 16 '18 at 6:01
Shree
Thanks. It's almost working. I am encountering the error "Duplicate identifiers for rows (600, 653,...)" while working on the actual data. For small dummy data it works perfectly fine.
– Dark_Knight
Nov 16 '18 at 11:23
Is your breaks > 26? If so, you need to adjust the letters[1:breaks] to something more appropriate. Seems like you are breaking at intervals of 52. Also, this question has been marked as a duplicate, so check out the original question for other answers.
– Shree
Nov 16 '18 at 13:17
Yes. Originally I am breaking at intervals of 11446. What modifications need to be made to letters[1:breaks]?
– Dark_Knight
Nov 16 '18 at 13:52
I have updated the answer to make it scalable to any breaks value. Try it and let me know.
– Shree
Nov 16 '18 at 15:35
It worked perfectly. Thank you.
– Dark_Knight
Nov 16 '18 at 16:29
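The comments above suggest that an earlier version of the answer labelled columns with letters[1:breaks]. As a small sketch of why that would stop working beyond 26 columns, assuming that reading of the thread: the built-in letters vector has only 26 elements, so larger indices return NA, whereas paste0("col", ...) yields a distinct label for any width. The names old_ids and new_ids are illustrative only.

```r
breaks <- 30

old_ids <- letters[1:breaks]          # built-in vector of the 26 lowercase letters
new_ids <- paste0("col", 1:breaks)    # labelling scheme used in the updated answer

sum(is.na(old_ids))       # 4: positions 27 to 30 have no letter
anyDuplicated(new_ids)    # 0: every generated label is unique
```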
Here is one base R option. We can pad your input vector/column with NA so that its length becomes a multiple of three. Then generate an index series for each of the three columns and create the desired data frame.
# input is the vector to reshape; pad it with NA to a multiple of 3
rem <- length(input) %% 3
input <- c(input, rep(NA, ifelse(rem == 0, 0, 3 - rem)))
# positions that feed each of the three columns
idx1 <- seq(1, length(input), 3)
idx2 <- seq(2, length(input), 3)
idx3 <- seq(3, length(input), 3)
df <- data.frame(v1 = input[idx1], v2 = input[idx2], v3 = input[idx3])
answered Nov 16 '18 at 5:20
Tim Biegeleisen
Not for production use, but here is a small demo showing that the logic works.
– Tim Biegeleisen
Nov 16 '18 at 5:26
This works when we take the input as a vector c(1,2,..). However, when I import a CSV file containing these numbers, it doesn't work.
– Dark_Knight
Nov 16 '18 at 5:38
@Dark_Knight Then my code would just require a slight modification. We can replace input with the data frame/data table column.
– Tim Biegeleisen
Nov 16 '18 at 5:42
read.csv( "my_data.csv" )[ ,1 ] would give you a vector
– vaettchen
Nov 16 '18 at 5:46
Elegant solution. But if length(input) is a very large number (millions) and the break needs to be done at intervals on the order of thousands, then it won't be possible to generate the idx sequences manually.
– Dark_Knight
Nov 16 '18 at 6:09
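The scaling concern in the last comment can be avoided without writing one index vector per column. As a minimal sketch (not part of the original answer), the same NA-padding idea can be combined with matrix(..., byrow = TRUE), which fills row by row for any width. The helper name to_rows and its arguments are hypothetical.

```r
# Hypothetical helper: reshape a vector into rows of width n, padding with NA.
to_rows <- function(x, n) {
  pad <- (n - length(x) %% n) %% n    # number of NAs needed to complete the last row
  matrix(c(x, rep(NA, pad)), ncol = n, byrow = TRUE)
}

input <- c("10", "25", "09", "04", "14", "100", "01",
           "10", "100", "04", "04", "01", "04")
m <- to_rows(input, 3)
dim(m)     # 5 3
m[5, ]     # "04" NA NA
```

For a CSV column, read.csv("my_data.csv")[, 1] from the comment above would supply x. Passing colClasses = "character" to read.csv is one way to keep leading zeros such as 09.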
You can use base R's cut function inside a dplyr pipeline.
library(dplyr)
# bin the values of column into quartiles (cut() groups by value, not by row position)
dataframe %>%
  mutate(new_variable = cut(column, breaks = quantile(column, c(0, 0.25, 0.5, 0.75, 1)),
                            labels = FALSE, include.lowest = TRUE))
or
# breaks into the intervals you require
new_variable <- cut(as.numeric(dataset$column), breaks = 3)
And then use the melt function in the reshape package to transpose the column to rows.
edited Nov 16 '18 at 5:57
Shree
answered Nov 16 '18 at 5:17
john doe
If your data is in the form of a vector, you can do the following:
data <- c('10', '25', '09', '04', '14', '100', '01',
          '10', '100', '04', '04', '01', '04')
# group consecutive values into chunks of 3; returns a list of vectors
split(data, ceiling(seq_along(data) / 3))
If it is in a data frame, this should do it:
library(dplyr)
library(tidyr)
data <- data.frame(
  value = c('10', '25', '09', '04', '14', '100', '01',
            '10', '100', '04', '04', '01', '04'),
  stringsAsFactors = FALSE)
data %>%
  mutate(key = rep_len(c('a', 'b', 'c'), length.out = nrow(.))) %>%  # target column
  group_by(idx = as.integer((row_number() - 1) / 3)) %>%             # target row
  spread(key, value) %>%
  ungroup() %>%  # ungroup first; select() would otherwise keep the grouping column
  select(-idx)
answered Nov 16 '18 at 6:09
dmca
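To get from the list returned by split() to the space-separated rows shown in the question, one possible follow-up (a sketch, not part of the answer above) is to paste each chunk together. The variable names chunks and rows are illustrative.

```r
data <- c("10", "25", "09", "04", "14", "100", "01",
          "10", "100", "04", "04", "01", "04")

# Group consecutive values into chunks of 3, then join each chunk with spaces.
chunks <- split(data, ceiling(seq_along(data) / 3))
rows <- vapply(chunks, paste, character(1), collapse = " ")

writeLines(rows)
# 10 25 09
# 04 14 100
# 01 10 100
# 04 04 01
# 04
```

writeLines() also accepts a file path or connection as its second argument, should the rows need to go to disk rather than the console.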