Get indexes of a vector of numbers in another vector












27















Let's suppose we have the following vector:



v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)


Given a sequence of numbers, for instance c(2,3,5,8), I am trying to find the positions at which this sequence occurs in the vector v. The result I expect is something like:



FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE 


I am trying to use which(v == c(2,3,5,8)) but it doesn't give me what I am looking for.



Thanks beforehand.






























  • 1





    @akrun I want to find the exact beginning and end of the sequence. The first element is 2, followed by 3, and so on... Does it clarify?

    – eirvine
    Feb 7 '18 at 9:54






  • 4





    Keep in mind this will not be possible with floats due to the usual binary representation limits. You might be able to modify any of the given solutions, replacing == with all.equal or cgwtools::approxeq (tooting my own horn there)

    – Carl Witthoft
    Feb 7 '18 at 14:34






  • 1





    Seems like you just need a string search algorithm: en.wikipedia.org/wiki/String_searching_algorithm

    – Alexander
    Feb 8 '18 at 4:20






  • 2





    @Alexander A string search algorithm isn't the most efficient solution in this case. See the benchmarks for examples.

    – Jaap
    Feb 10 '18 at 7:08






  • 1





    @Alexander 989's answer is the string search answer.

    – Jaap
    Feb 12 '18 at 7:13
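To illustrate the floating-point caveat raised in the comments, here is a minimal base R sketch. The helper name find_seq_approx and the tolerance tol are assumptions made for illustration and do not come from any of the answers; the sketch also assumes v contains no NA values.

```r
# Hypothetical helper: start positions of x in v, treating two values
# as equal when they differ by less than `tol`.
find_seq_approx <- function(v, x, tol = 1e-8) {
  n <- length(v)
  l <- length(x)
  if (l == 0L || l > n) return(integer(0))
  # every start position whose window of length l fits inside v
  starts <- seq_len(n - l + 1L)
  for (j in seq_len(l)) {
    # keep only the starts whose j-th window element is close to x[j]
    starts <- starts[abs(v[starts + j - 1L] - x[j]) < tol]
  }
  starts
}

v <- c(0.1 + 0.2, 0.5, 1, 0.3, 0.5)
x <- c(0.3, 0.5)

which(v == x[1])        # exact comparison only finds position 4
find_seq_approx(v, x)   # tolerant comparison finds positions 1 and 4
```

The right tolerance depends on the data, so treat 1e-8 as a placeholder rather than a recommendation.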





































r vector














edited Nov 14 '18 at 15:55 by Henrik

asked Feb 7 '18 at 9:48 by eirvine




















9 Answers


















21














Using base R you could do the following:



v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

idx <- which(v == x[1])
idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
# [1] 2 12


This tells you that the exact sequence appears twice, starting at positions 2 and 12 of your vector v.



It first finds the possible starting positions, i.e. where v equals the first value of x, and then loops through these positions to check whether the values that follow also equal the remaining values of x.
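The same idea can be wrapped in a function. The sketch below is only one possible variant, and the name find_seq is made up for illustration. Compared with the snippet above, it discards candidate starts whose window would run past the end of v (which would otherwise compare against NA) and uses vapply() so that an empty candidate set still yields a logical index. It assumes v itself contains no NA values.

```r
# Hypothetical wrapper around the approach described above
find_seq <- function(v, x) {
  l <- length(x)
  if (l == 0L || l > length(v)) return(integer(0))
  # candidate starts: first value matches and the full window fits inside v
  idx <- which(v == x[1])
  idx <- idx[idx <= length(v) - l + 1L]
  # keep the candidates whose whole window equals x
  keep <- vapply(idx, function(i) all(v[i:(i + l - 1L)] == x), logical(1))
  idx[keep]
}

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
find_seq(v, c(2,3,5,8))
# [1]  2 12
find_seq(v, c(33,1,7))
# integer(0)
```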
































  • I was going to suggest something like which(colSums(t(embed(v, length(x))[, length(x):1]) == x) == length(x)), but I think this is easy to follow....

    – A5C1D2H2I1M1N2O1R2T1
    Feb 7 '18 at 10:06













  • @A5C1D2H2I1M1N2O1R2T1, that looks indeed a little hard to follow

    – docendo discimus
    Feb 7 '18 at 10:10






  • 2





    Worth noting, idx <- which(v == x[1]) is an important step. While other answers are going through all 1:4 shift variations 14 times, this answer does it in 3 steps.

    – zx8754
    Feb 7 '18 at 10:51











  • @zx8754, ...but the "data.table" approach still manages to win in terms of speed in a couple of tests I did with larger vecs....

    – A5C1D2H2I1M1N2O1R2T1
    Feb 7 '18 at 10:52



















16














Two other approaches using the shift function from data.table:



library(data.table)

# option 1
which(rowSums(mapply('==',
shift(v, type = 'lead', n = 0:(length(x) - 1)),
x)
) == length(x))

# option 2
which(Reduce("+", Map('==',
shift(v, type = 'lead', n = 0:(length(x) - 1)),
x)
) == length(x))


both give:




[1]  2 12



To get a full vector of the matching positions:



l <- length(x)
w <- which(Reduce("+", Map('==',
shift(v, type = 'lead', n = 0:(l - 1)),
x)
) == l)
rep(w, each = l) + 0:(l-1)


which gives:




[1]  2  3  4  5 12 13 14 15
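If the logical vector from the question is wanted rather than the positions, the positions can be converted with %in%. This is a small base R sketch; the start positions are hard-coded here (taken from the output above) so that the snippet runs without data.table.

```r
v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

w <- c(2L, 12L)            # start positions, as returned by the code above
l <- length(x)
pos <- rep(w, each = l) + 0:(l - 1)   # every position covered by a match

# TRUE wherever an element of v belongs to a matched sequence
in_match <- seq_along(v) %in% pos
in_match
```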



The benchmark which was included earlier in this answer has been moved to a separate community wiki answer.





Used data:



v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)


























  • 1





    Many of these solutions don't give the desired output, the extra step is not cost free

    – Moody_Mudskipper
    Feb 8 '18 at 11:29






  • 4





    @989 I will update, but didn't have the time yet. What I do not understand is that you downvote me but don't downvote the invalid answers. What's the reason for that? Furthermore: why didn't you comment under the invalid answers so that they get a chance to improve?

    – Jaap
    Feb 9 '18 at 10:46








  • 2





    @989 You could always suggest Edit to this post or provide your own benchmark on your own post with explanation why Jaap's is wrong. No need for this kind of tone.

    – zx8754
    Feb 9 '18 at 10:57








  • 1





    Not sure whose idea it was, but getting a full vector of matching positions concatenated together seems like a bad idea. If I test with x = c(1,1,1) then I may find positions appearing multiple times. Besides, it's redundant -- the informational content of the first position is enough... Anyway, not a big deal, just my two cents ... noticed it all over the benchmarks.

    – Frank
    Feb 9 '18 at 22:11






  • 1





    @Frank I don't think it is a bad idea necessarily. It depends on what you want to do with it. I included it in the benchmarks to make sure every solution returns the same and thus get a fair comparison.

    – Jaap
    Feb 10 '18 at 7:04



















15














You can use rollapply() from zoo



v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

library("zoo")
searchX <- function(x, X) all(x==X)
rollapply(v, FUN=searchX, X=x, width=length(x))


The result TRUE shows you the beginning of the sequence.

The code could be simplified to rollapply(v, length(x), identical, x) (thanks to G. Grothendieck):



set.seed(2)
vl <- as.numeric(sample(1:10, 1e6, TRUE))
# vm <- vl[1:1e5]
# vs <- vl[1:1e4]
x <- c(2,3,5)

library("zoo")
searchX <- function(x, X) all(x==X)
i1 <- rollapply(vl, FUN=searchX, X=x, width=length(x))
i2 <- rollapply(vl, width=length(x), identical, y=x)

identical(i1, i2)


To use identical(), both arguments must be of the same type (num and int are not the same).

Where needed, == coerces int to num; identical() does not do any coercion.
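A short base R illustration of that difference. The variable names v_int and x_dbl are only for this example.

```r
identical(5L, 5)    # FALSE: integer vs double
5L == 5             # TRUE: == coerces before comparing

v_int <- c(2L, 3L, 5L)   # integer, as produced by e.g. sample(1:10, ...)
x_dbl <- c(2, 3, 5)      # double, as produced by c(2, 3, 5)

identical(v_int, x_dbl)               # FALSE, the types differ
identical(as.numeric(v_int), x_dbl)   # TRUE once the types agree
all(v_int == x_dbl)                   # TRUE
```

So when the vector is built from 1:10 and the pattern from c(2,3,5), converting one of them (as.numeric() or as.integer()) before calling identical() avoids the mismatch.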
































  • Could you check your 2nd solution? As you can see in the benchmark answer it doesn't return the same output as the other answers.

    – Jaap
    Feb 9 '18 at 15:04








  • 1





    I tried (also unsuccessfully) to repair it as well. I will remove it from the benchmarks.

    – Jaap
    Feb 9 '18 at 16:30






  • 1





    The code could be simplified to rollapply(v, length(x), identical, x) where v and x must be of the same type, e.g. both integer or both double, since for example identical(5L, 5) is FALSE.

    – G. Grothendieck
    Feb 10 '18 at 3:18








  • 1





    @G.Grothendieck Thx, that was indeed the issue. When both are of the same type, the solution with identical works.

    – Jaap
    Feb 10 '18 at 7:16



















10














I feel like looping should be efficient:



w = seq_along(v)
for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]

w
# [1] 2 12


This should be writable in C++ following @SymbolixAU's approach for extra speed.
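To see how the candidate set shrinks, the loop can be traced step by step. The sketch below adds two things that are not in the loop above: a cat() call for the trace, and a which() around the comparison. The which() is my own addition; it drops the NA that would otherwise appear when w + i - 1 runs past the end of v.

```r
v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

w <- seq_along(v)
for (i in seq_along(x)) {
  # keep the candidate starts whose i-th element equals x[i]
  w <- w[which(v[w + i - 1L] == x[i])]
  cat("after x[", i, "]: ", paste(w, collapse = " "), "\n", sep = "")
}
w
# [1]  2 12
```

After the first pass only positions 1, 2 and 12 remain, so each later pass compares just a handful of elements.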



A basic comparison:



# create functions for selected approaches
redjaap <- function(v,x)
which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x))
loop <- function(v,x){
w = seq_along(v)
for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
w
}

# check consistency
identical(redjaap(v,x), loop(v,x))
# [1] TRUE

# check speed
library(microbenchmark)
vv <- rep(v, 1e4)
microbenchmark(redjaap(vv,x), loop(vv,x), times = 100)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# redjaap(vv, x) 5.883809 8.058230 17.225899 9.080246 9.907514 96.35226 100 b
# loop(vv, x) 3.629213 5.080816 9.475016 5.578508 6.495105 112.61242 100 a

# check consistency again
identical(redjaap(vv,x), loop(vv,x))
# [1] TRUE


























  • 1





    this method is really efficient in terms of the amount of code to achieve the objective...can use compiler::cmpfun(frank) for a slight speedup

    – chinsoon12
    Feb 22 '18 at 8:03



















10














Here are two Rcpp solutions. The first one returns a vector as long as v in which 1 marks each starting position of the sequence.



library(Rcpp)

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

int vecSize = myVector.size();
int seqSize = mySequence.size();
NumericVector comparison(seqSize);
NumericVector res(vecSize);

for (int i = 0; i + seqSize <= vecSize; i++ ) { // stop early so that myVector[i + j] stays in bounds

for (int j = 0; j < seqSize; j++ ) {
comparison[j] = mySequence[j] == myVector[i + j];
}

if (sum(comparison) == seqSize) {
res[i] = 1;
}else{
res[i] = 0;
}
}

return res;

}')

SeqInVec(v, x)
#[1] 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0




This second one returns the index values (as per the other answers) of every matched entry in the sequence.



cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

int vecSize = myVector.size();
int seqSize = mySequence.size();
NumericVector comparison(seqSize);
NumericVector res(vecSize);
int foundCounter = 0;

for (int i = 0; i + seqSize <= vecSize; i++ ) { // stop early so that myVector[i + j] stays in bounds

for (int j = 0; j < seqSize; j++ ) {
comparison[j] = mySequence[j] == myVector[i + j];
}

if (sum(comparison) == seqSize) {
for (int j = 0; j < seqSize; j++ ) {
res[foundCounter] = i + j + 1;
foundCounter++;
}
}
}

IntegerVector idx = seq(0, (foundCounter-1));
return res[idx];
}')

SeqInVec(v, x)
# [1] 2 3 4 5 12 13 14 15




Optimising



As @MichaelChirico points out in their comment, further optimisations can be made. For example, if the first entry in the sequence doesn't match the current value in the vector, we don't need to do the rest of the comparison at that position:



cppFunction('NumericVector SeqInVecOpt(NumericVector myVector, NumericVector mySequence) {

int vecSize = myVector.size();
int seqSize = mySequence.size();
NumericVector comparison(seqSize);
NumericVector res(vecSize);
int foundCounter = 0;

for (int i = 0; i + seqSize <= vecSize; i++ ) { // stop early so that myVector[i + j] stays in bounds

if (myVector[i] == mySequence[0]) {
for (int j = 0; j < seqSize; j++ ) {
comparison[j] = mySequence[j] == myVector[i + j];
}

if (sum(comparison) == seqSize) {
for (int j = 0; j < seqSize; j++ ) {
res[foundCounter] = i + j + 1;
foundCounter++;
}
}
}
}

IntegerVector idx = seq(0, (foundCounter-1));
return res[idx];
}')


The answer with benchmarks shows the performance of these approaches.
































  • Could you update your solution such that it returns the same output as the others? I can then include it in the benchmark.

    – Jaap
    Feb 8 '18 at 20:14











  • Thx. Included in the separate benchmark answer now.

    – Jaap
    Feb 9 '18 at 15:14






  • 2





    since you're examining subsequent elements, shouldn't there be a way to optimize by skipping elements we already know don't start the sequence? e.g. in OPs example when checking at the second 2 we already know the 3rd element is not 2 so we can skip checking the elements after 3

    – MichaelChirico
    Feb 11 '18 at 11:18






  • 1





    2-3x speed-up, nice! I guess the improvement depends on the length of the "search" string and its density (% of TRUE values).

    – MichaelChirico
    Feb 12 '18 at 0:09






  • 1





    @MichaelChirico - yes that will likely be a factor. I've also tested a variation where it will increment i by the size of the search string, rather than one each time. In this example I didn't see any improvement, however.

    – SymbolixAU
    Feb 12 '18 at 0:13



















8














A benchmark on the posted answers:



Load the needed packages:



library(data.table)
library(microbenchmark)
library(Rcpp)
library(zoo)


Create the vectors on which the benchmarks will be run:



set.seed(2)
vl <- sample(1:10, 1e6, TRUE)
vm <- vl[1:1e5]
vs <- vl[1:1e4]
x <- c(2,3,5)


Testing whether all solutions give the same outcome on the small vector vs:



> all.equal(jaap1(vs,x), jaap2(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), docendo(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), a5c1(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), jogo1(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), moody(vs,x))
[1] "Numeric: lengths (24, 873) differ"
> all.equal(jaap1(vs,x), cata1(vs,x))
[1] "Numeric: lengths (24, 0) differ"
> all.equal(jaap1(vs,x), u989(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), frank(vs,x))
[1] TRUE
> all.equal(jaap1(vs,x), symb(vs,x))
[1] TRUE
> all.equal(jaap1(vs, x), symbOpt(vs, x))
[1] TRUE


Further inspection of the cata1 and moody solutions shows that they don't give the desired output. They are therefore not included in the benchmarks.



The benchmark for the smallest vector vs:



mbs <- microbenchmark(jaap1(vs,x), jaap2(vs,x), docendo(vs,x), a5c1(vs,x),
jogo1(vs,x), u989(vs,x), frank(vs,x), symb(vs,x), symbOpt(vs, x),
times = 100)


gives:




 print(mbs, order = "median")

Unit: microseconds
expr min lq mean median uq max neval
symbOpt(vs, x) 40.658 47.0565 78.47119 51.5220 56.2765 2170.708 100
symb(vs, x) 106.208 112.7885 151.76398 117.0655 123.7450 1976.360 100
frank(vs, x) 121.303 129.0515 203.13616 132.1115 137.9370 6193.837 100
jaap2(vs, x) 187.973 218.7805 322.98300 235.0535 255.2275 6287.548 100
jaap1(vs, x) 306.944 341.4055 452.32426 358.2600 387.7105 6376.805 100
a5c1(vs, x) 463.721 500.9465 628.13475 516.2845 553.2765 6179.304 100
docendo(vs, x) 1139.689 1244.0555 1399.88150 1313.6295 1363.3480 9516.529 100
u989(vs, x) 8048.969 8244.9570 8735.97523 8627.8335 8858.7075 18732.750 100
jogo1(vs, x) 40022.406 42208.4870 44927.58872 43733.8935 45008.0360 124496.190 100



The benchmark for the medium vector vm:



mbm <- microbenchmark(jaap1(vm,x), jaap2(vm,x), docendo(vm,x), a5c1(vm,x),
jogo1(vm,x), u989(vm,x), frank(vm,x), symb(vm,x), symbOpt(vm, x),
times = 100)


gives:




print(mbm, order = "median")

Unit: microseconds
expr min lq mean median uq max neval
symbOpt(vm, x) 357.452 405.0415 974.9058 763.0205 1067.803 7444.126 100
symb(vm, x) 1032.915 1117.7585 1923.4040 1422.1930 1753.044 17498.132 100
frank(vm, x) 1158.744 1470.8170 1829.8024 1826.1330 1935.641 6423.966 100
jaap2(vm, x) 1622.183 2872.7725 3798.6536 3147.7895 3680.954 14886.765 100
jaap1(vm, x) 3053.024 4729.6115 7325.3753 5607.8395 6682.814 87151.774 100
a5c1(vm, x) 5487.547 7458.2025 9612.5545 8137.1255 9420.684 88798.914 100
docendo(vm, x) 10780.920 11357.7440 13313.6269 12029.1720 13411.026 21984.294 100
u989(vm, x) 83518.898 84999.6890 88537.9931 87675.3260 90636.674 105681.313 100
jogo1(vm, x) 471753.735 512979.3840 537232.7003 534780.8050 556866.124 646810.092 100



The benchmark for the largest vector vl:



mbl <- microbenchmark(jaap1(vl,x), jaap2(vl,x), docendo(vl,x), a5c1(vl,x),
jogo1(vl,x), u989(vl,x), frank(vl,x), symb(vl,x), symbOpt(vl, x),
times = 100)


gives:




  print(mbl, order = "median")

Unit: milliseconds
expr min lq mean median uq max neval
symbOpt(vl, x) 4.679646 5.768531 12.30079 6.67608 11.67082 118.3467 100
symb(vl, x) 11.356392 12.656124 21.27423 13.74856 18.66955 149.9840 100
frank(vl, x) 13.523963 14.929656 22.70959 17.53589 22.04182 132.6248 100
jaap2(vl, x) 18.754847 24.968511 37.89915 29.78309 36.47700 145.3471 100
jaap1(vl, x) 37.047549 52.500684 95.28392 72.89496 138.55008 234.8694 100
a5c1(vl, x) 54.563389 76.704769 116.89269 89.53974 167.19679 248.9265 100
docendo(vl, x) 109.824281 124.631557 156.60513 129.64958 145.47547 296.0214 100
u989(vl, x) 1380.886338 1413.878029 1454.50502 1436.18430 1479.18934 1632.3281 100
jogo1(vl, x) 4067.106897 4339.005951 4472.46318 4454.89297 4563.08310 5114.4626 100





The functions used for each solution:



jaap1 <- function(v,x) {
l <- length(x);
w <- which(rowSums(mapply('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x) ) == length(x));
rep(w, each = l) + 0:(l-1)
}

jaap2 <- function(v,x) {
l <- length(x);
w <- which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x));
rep(w, each = l) + 0:(l-1)
}

docendo <- function(v,x) {
l <- length(x);
idx <- which(v == x[1]);
w <- idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))];
rep(w, each = l) + 0:(l-1)
}

a5c1 <- function(v,x) {
l <- length(x);
w <- which(colSums(t(embed(v, l)[, l:1]) == x) == l);
rep(w, each = l) + 0:(l-1)
}

jogo1 <- function(v,x) {
l <- length(x);
searchX <- function(x, X) all(x==X);
w <- which(rollapply(v, FUN=searchX, X=x, width=l));
rep(w, each = l) + 0:(l-1)
}

moody <- function(v,x) {
l <- length(x);
v2 <- as.numeric(factor(c(v,NA),levels = x));
v2[is.na(v2)] <- l+1;
which(diff(v2) == 1)
}

cata1 <- function(v,x) {
l <- length(x);
w <- which(sapply(lapply(seq(length(v)-l)-1, function(i) v[seq(x)+i]), identical, x));
rep(w, each = l) + 0:(l-1)
}

u989 <- function(v,x) {
l <- length(x);
s <- paste(v, collapse = '-');
p <- paste0('\\b', paste(x, collapse = '-'), '\\b');
i <- c(1, unlist(gregexpr(p, s)));
m <- substring(s, head(i,-1), tail(i,-1));
ln <- lengths(strsplit(m, '-'));
w <- cumsum(c(ln[1], ln[-1]-1));
rep(w, each = l) + 0:(l-1)
}

frank <- function(v,x) {
l <- length(x);
w = seq_along(v);
for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]];
rep(w, each = l) + 0:(l-1)
}

cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

int vecSize = myVector.size();
int seqSize = mySequence.size();
NumericVector comparison(seqSize);
NumericVector res(vecSize);
int foundCounter = 0;

for (int i = 0; i < vecSize; i++ ) {

for (int j = 0; j < seqSize; j++ ) {
comparison[j] = mySequence[j] == myVector[i + j];
}

if (sum(comparison) == seqSize) {
for (int j = 0; j < seqSize; j++ ) {
res[foundCounter] = i + j + 1;
foundCounter++;
}
}
}

IntegerVector idx = seq(0, (foundCounter-1));
return res[idx];
}')

symb <- function(v,x) {SeqInVec(v, x)}

cppFunction('NumericVector SeqInVecOpt(NumericVector myVector, NumericVector mySequence) {

int vecSize = myVector.size();
int seqSize = mySequence.size();
NumericVector comparison(seqSize);
NumericVector res(vecSize);
int foundCounter = 0;

for (int i = 0; i < vecSize; i++ ) {

if (myVector[i] == mySequence[0]) {
for (int j = 0; j < seqSize; j++ ) {
comparison[j] = mySequence[j] == myVector[i + j];
}

if (sum(comparison) == seqSize) {
for (int j = 0; j < seqSize; j++ ) {
res[foundCounter] = i + j + 1;
foundCounter++;
}
}
}
}

IntegerVector idx = seq(0, (foundCounter-1));
return res[idx];
}')

symbOpt <- function(v,x) {SeqInVecOpt(v,x)}




Since this is a cw-answer I'll add my own benchmark of some of the answers.



library(data.table)
library(microbenchmark)

set.seed(2); v <- sample(1:100, 5e7, TRUE); x <- c(2,3,5)

jaap1 <- function(v, x) {
which(rowSums(mapply('==',shift(v, type = 'lead', n = 0:(length(x) - 1)),
x)) == length(x))
}

jaap2 <- function(v, x) {
which(Reduce("+", Map('==',shift(v, type = 'lead', n = 0:(length(x) - 1)),
x)) == length(x))
}

dd1 <- function(v, x) {
idx <- which(v == x[1])
idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
}

dd2 <- function(v, x) {
idx <- which(v == x[1L])
xl <- length(x) - 1L
idx[sapply(idx, function(i) all(v[i:(i+xl)] == x))]
}

frank <- function(v, x) {
w = seq_along(v)
for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
w
}

all.equal(jaap1(v, x), dd1(v, x))
all.equal(jaap2(v, x), dd1(v, x))
all.equal(dd2(v, x), dd1(v, x))
all.equal(frank(v, x), dd1(v, x))

bm <- microbenchmark(jaap1(v, x), jaap2(v, x), dd1(v, x), dd2(v, x), frank(v, x),
unit = "relative", times = 25)

plot(bm)


[Figure: plot(bm) of the relative benchmark timings; image not preserved]



bm
Unit: relative
expr min lq mean median uq max neval
jaap1(v, x) 4.487360 4.591961 4.724153 4.870226 4.660023 3.9361093 25
jaap2(v, x) 2.026052 2.159902 2.116204 2.282644 2.138106 2.1133068 25
dd1(v, x) 1.078059 1.151530 1.119067 1.257337 1.201762 0.8646835 25
dd2(v, x) 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000 25
frank(v, x) 1.400735 1.376405 1.442887 1.427433 1.611672 1.3440097 25


Bottom line: without knowing the real data, all these benchmarks don't tell the whole story.



























  • 1





    @docendodiscimus - could you update with the data you've used in your benchmarks?

    – SymbolixAU
    Feb 11 '18 at 21:33











  • @SymbolixAU, yes of course. Sorry, I thought I had done that already.

    – docendo discimus
    Feb 12 '18 at 7:52











  • My answer in base R is (on average) 4x faster than jogo's answer with the help of a library. I have got +2/-2 votes and his answer +15. Hmmm :-/

    – 989
    Feb 12 '18 at 9:47








  • 2





    @989 - I wouldn't take it personally; after the initial flurry of activity & votes, people don't often re-visit questions, which also means down-votes often won't get removed even if you improve the answer.

    – SymbolixAU
    Feb 13 '18 at 3:46





















4














Here's a solution that leverages binary search on secondary indices in data.table. (Great vignette here)



This method has quite a bit of overhead so it's not particularly competitive on the 1e4 length vector in the benchmark, but it hangs near the top of the pack as the size increases.



Hats off to everyone else posting solutions, learning a lot from this question.



matt <- function(v,x){
l <- length(x);
SL <- seq_len(l-1);
DT <- data.table(Seq_0 = v);
for (i in SL) set(DT, j = eval(paste0("Seq_",i)), value = shift(DT[["Seq_0"]],n = i, type = "lead"));
w <- DT[as.list(x),on = paste0("Seq_",c(0L,SL)), which = TRUE];
rep(w, each = l) + 0:(l-1)
}




Benchmarking



library(data.table)
library(microbenchmark)
library(Rcpp)
library(zoo)

set.seed(2)
vl <- sample(1:10, 1e6, TRUE)
vm <- vl[1:1e5]
vs <- vl[1:1e4]
x <- c(2,3,5)


Vector Length 1e4





Unit: microseconds
expr min lq mean median uq max neval
symb(vs, x) 138.342 143.048 161.6681 153.1545 159.269 259.999 10
frank(vs, x) 176.634 184.129 198.8060 193.2850 200.701 257.050 10
jaap2(vs, x) 282.231 299.025 342.5323 316.5185 337.760 524.212 10
jaap1(vs, x) 490.013 528.123 568.6168 538.7595 547.268 731.340 10
a5c1(vs, x) 706.450 742.270 751.3092 756.2075 758.859 793.446 10
dd2(vs, x) 1319.098 1348.082 2061.5579 1363.2265 1497.960 7913.383 10
docendo(vs, x) 1427.768 1459.484 1536.6439 1546.2135 1595.858 1696.070 10
dd1(vs, x) 1377.502 1406.272 2217.2382 1552.5030 1706.131 8084.474 10
matt(vs, x) 1928.418 2041.597 2390.6227 2087.6335 2430.470 4762.909 10
u989(vs, x) 8720.330 8821.987 8935.7188 8882.0190 9106.705 9163.967 10
jogo1(vs, x) 47123.615 47536.700 49158.2600 48449.2390 50957.035 52496.981 10


Vector Length 1e5





Unit: milliseconds
expr min lq mean median uq max neval
symb(vm, x) 1.319921 1.378801 1.464972 1.423782 1.577006 1.682156 10
frank(vm, x) 1.671155 1.739507 1.806548 1.760738 1.844893 2.097404 10
jaap2(vm, x) 2.298449 2.380281 2.683813 2.432373 2.566581 4.310258 10
matt(vm, x) 3.195048 3.495247 3.577080 3.607060 3.687222 3.844508 10
jaap1(vm, x) 4.079117 4.179975 4.776989 4.496603 5.206452 6.295954 10
a5c1(vm, x) 6.488621 6.617709 7.366226 6.720107 6.877529 12.500510 10
dd2(vm, x) 12.595699 12.812876 14.990739 14.058098 16.758380 20.743506 10
docendo(vm, x) 13.635357 13.999721 15.296075 14.729947 16.151790 18.541582 10
dd1(vm, x) 13.474589 14.177410 15.676348 15.446635 17.150199 19.085379 10
u989(vm, x) 94.844298 95.026733 96.309658 95.134400 97.460869 100.536654 10
jogo1(vm, x) 575.230741 581.654544 621.824297 616.474265 628.267155 723.010738 10


Vector Length 1e6





Unit: milliseconds
expr min lq mean median uq max neval
symb(vl, x) 13.34294 13.55564 14.01556 13.61847 14.78210 15.26076 10
frank(vl, x) 17.35628 17.45602 18.62781 17.56914 17.88896 25.38812 10
matt(vl, x) 20.79867 21.07157 22.41467 21.23878 22.56063 27.12909 10
jaap2(vl, x) 22.81464 22.92414 22.96956 22.99085 23.02558 23.10124 10
jaap1(vl, x) 40.00971 40.46594 43.01407 41.03370 42.81724 55.90530 10
a5c1(vl, x) 65.39460 65.97406 69.27288 66.28000 66.72847 83.77490 10
dd2(vl, x) 127.47617 132.99154 161.85129 134.63168 157.40028 342.37526 10
dd1(vl, x) 140.06140 145.45085 154.88780 154.23280 161.90710 171.60294 10
docendo(vl, x) 147.07644 151.58861 162.20522 162.49216 165.49513 183.64135 10
u989(vl, x) 2022.64476 2041.55442 2055.86929 2054.92627 2066.26187 2088.71411 10
jogo1(vl, x) 5563.31171 5632.17506 5863.56265 5872.61793 6016.62838 6244.63205 10




































    2














    Here is a string-based approach in base R:



    str <- paste(v, collapse = '-')
    # "2-2-3-5-8-0-32-1-3-12-5-2-3-5-8-33-1"

    pattern <- paste0('\\b', paste(x, collapse = '-'), '\\b')
    # "\\b2-3-5-8\\b"

    inds <- unlist(gregexpr(pattern, str)) # (1)
    # 3 25
    sapply(inds, function(i) lengths(strsplit(substr(str, 1, i),'-'))) # (2)

    # [1] 2 12




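    • \b (a regex word boundary, written '\\b' in an R string) ensures that whole numbers are matched, so that 2-3-5-8 is not found inside 12-3-5-8.

    • (1) Finds the character positions at which pattern occurs within str.

    • (2) Maps those character positions back to the respective indices within the original vector v.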
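Put together as a function, the steps above might look like the sketch below. The name find_seq_str is hypothetical. Two caveats, as far as I can tell: gregexpr() does not report overlapping matches, so a pattern such as c(1,1) in c(1,1,1) is only found once, and using '-' as the separator assumes there are no negative numbers.

```r
# Hypothetical wrapper around the string-based approach described above
find_seq_str <- function(v, x) {
  s <- paste(v, collapse = "-")
  p <- paste0("\\b", paste(x, collapse = "-"), "\\b")
  starts <- gregexpr(p, s)[[1]]            # character positions, or -1 if none
  if (starts[1] == -1L) return(integer(0))
  # count the elements up to and including each match start
  vapply(starts,
         function(i) lengths(strsplit(substr(s, 1, i), "-", fixed = TRUE)),
         integer(1))
}

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)
find_seq_str(v, x)
# [1]  2 12
```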




    UPDATE



    Regarding the discussion of running time, here is a much faster solution than my first one:



    str <- paste(v, collapse = '-')
    pattern <- paste0('\\b', paste(x, collapse = '-'), '\\b')

    inds <- c(1, unlist(gregexpr(pattern, str)))

    m <- substring(str, head(inds,-1), tail(inds,-1))
    ln <- lengths(strsplit(m, '-'))
    cumsum(c(ln[1], ln[-1]-1))


























    • 2





      I've updated the benchmarks and only included your fastest solution.

      – Jaap
      Feb 8 '18 at 19:35











    • I looked at what they return and then adjusted the solutions such that all would give the same result (didn't programmatically check it though)

      – Jaap
      Feb 8 '18 at 19:55











    • included now :-)

      – Jaap
      Feb 8 '18 at 20:11











    • thx for notifying, changed the construction of the vectors a bit; now it should return a normal vector :-)

      – Jaap
      Feb 8 '18 at 20:26











    • please leave a note under the respective answers so they can improve; could you check my benchmarking code? It could well be that I made a mistake somewhere

      – Jaap
      Feb 8 '18 at 20:39





















    1














    EDIT: some have noted that my answer doesn't always give the desired output. I might fix it later; use with caution in the meantime!



    We can convert v to a factor and keep only consecutive values in the transformed vector:



    v2 <- as.numeric(factor(c(v,NA),levels = x)) # [1]  1  1  2  3  4 NA NA NA ...
    v2[is.na(v2)] <- length(x)+1 # [1] 1 1 2 3 4 5 5 5 ...
    output <- diff(v2) ==1
    # [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE


    data



    v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
    x <- c(2,3,5,8)
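To make the caveat from the EDIT concrete, here is one input (my own, not from the answer) on which the factor approach flags a position even though the full sequence never occurs. The variable names `v_partial`, `x_seq` and `out` are only illustrative.

```r
x_seq     <- c(2, 3, 5, 8)
v_partial <- c(2, 3, 9)   # contains only the prefix 2, 3 of x_seq

v2 <- as.numeric(factor(c(v_partial, NA), levels = x_seq))
v2[is.na(v2)] <- length(x_seq) + 1
out <- diff(v2) == 1
out
# [1]  TRUE FALSE FALSE
```

Position 1 is reported as part of a match because the method only checks that neighbouring values are consecutive levels, not that a whole run of `length(x_seq)` values is present.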


























    • 1





      that's pretty computationally intensive.

      – Carl Witthoft
      Feb 7 '18 at 14:30











    • Is it? I don't know; it's the only fully vectorized solution so far. Too many copies?

      – Moody_Mudskipper
      Feb 7 '18 at 14:33











    • I plead guilty to not having run microbenchmark on the various answers here. It's just a gut feeling because of the number of class coercions going on there.

      – Carl Witthoft
      Feb 7 '18 at 14:40











    • @CarlWitthoft, I guess that the answers by catastrophic-failure, which both utilise nested loops, will be much slower. But I too haven't tested any.

      – docendo discimus
      Feb 7 '18 at 14:51






    • 1





      @docendodiscimus see my latest benchmarks

      – Carl Witthoft
      Feb 7 '18 at 15:32




















    21














    Using base R you could do the following:



    v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
    x <- c(2,3,5,8)

    idx <- which(v == x[1])
    idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
    # [1] 2 12


    This tells you that the exact sequence appears twice, starting at positions 2 and 12 of your vector v.



    It first finds the possible starting positions, i.e. where v equals the first value of x, and then loops through these positions to check whether the following values also equal the remaining values of x.
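One edge case worth keeping in mind: if the first value of x occurs so close to the end of v that the window runs past the last element, the comparison contains NA and the one-liner returns NA for that candidate. A minimal sketch that drops such candidates up front (the function name `find_seq` is made up for illustration):

```r
# Hypothetical wrapper around the approach above, with a bounds check.
find_seq <- function(v, x) {
  n <- length(x)
  # candidate starts: first value matches ...
  idx <- which(v == x[1])
  # ... and the full window of length n still fits inside v
  idx <- idx[idx <= length(v) - n + 1]
  keep <- vapply(idx, function(i) all(v[i:(i + n - 1)] == x), logical(1))
  idx[keep]
}

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)
starts      <- find_seq(v, x)
starts_tail <- find_seq(c(1, 2), x)   # candidate at the very end, no room for x
```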
































    • I was going to suggest something like which(colSums(t(embed(v, length(x))[, length(x):1]) == x) == length(x)), but I think this is easy to follow....

      – A5C1D2H2I1M1N2O1R2T1
      Feb 7 '18 at 10:06













    • @A5C1D2H2I1M1N2O1R2T1, that looks indeed a little hard to follow

      – docendo discimus
      Feb 7 '18 at 10:10






    • 2





      Worth noting, idx <- which(v == x[1]) is an important step. While other answers are going through all 1:4 shift variations 14 times, this answer does it in 3 steps.

      – zx8754
      Feb 7 '18 at 10:51











    • @zx8754, ...but the "data.table" approach still manages to win in terms of speed in a couple of tests I did with larger vecs....

      – A5C1D2H2I1M1N2O1R2T1
      Feb 7 '18 at 10:52
























    edited Feb 7 '18 at 10:08

























    answered Feb 7 '18 at 10:05









    docendo discimus


























    16














    Two other approaches using the shift function from data.table:



    library(data.table)

    # option 1
    which(rowSums(mapply('==',
                         shift(v, type = 'lead', n = 0:(length(x) - 1)),
                         x)
                  ) == length(x))

    # option 2
    which(Reduce("+", Map('==',
                          shift(v, type = 'lead', n = 0:(length(x) - 1)),
                          x)
                 ) == length(x))


    both give:




    [1]  2 12



    To get a full vector of the matching positions:



    l <- length(x)
    w <- which(Reduce("+", Map('==',
                               shift(v, type = 'lead', n = 0:(l - 1)),
                               x)
                      ) == l)
    rep(w, each = l) + 0:(l-1)


    which gives:




    [1]  2  3  4  5 12 13 14 15
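If the logical vector from the question is what is needed, the matching positions can be turned into one with `%in%`. This is a small base R sketch; the start positions `w` are typed in by hand here so that the snippet runs without data.table, and `mask` is just an illustrative name.

```r
v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

w <- c(2, 12)                        # start positions, as found above
l <- length(x)
pos <- rep(w, each = l) + 0:(l - 1)  # 2 3 4 5 12 13 14 15

# TRUE wherever an element of v belongs to an occurrence of x
mask <- seq_along(v) %in% pos
mask
```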



    The benchmark which was included earlier in this answer has been moved to a separate community wiki answer.





    Used data:



    v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
    x <- c(2,3,5,8)


























    • 1





      Many of these solutions don't give the desired output, the extra step is not cost free

      – Moody_Mudskipper
      Feb 8 '18 at 11:29






    • 4





      @989 I will update, but didn't have the time yet. What I do not understand is that you downvote me but don't downvote the invalid answers. What's the reason for that? Furthermore: why didn't you comment under the invalid answers so that they get a chance to improve?

      – Jaap
      Feb 9 '18 at 10:46








    • 2





      @989 You could always suggest Edit to this post or provide your own benchmark on your own post with explanation why Jaap's is wrong. No need for this kind of tone.

      – zx8754
      Feb 9 '18 at 10:57








    • 1





      Not sure whose idea it was, but getting a full vector of matching positions concatenated together seems like a bad idea. If I test with x = c(1,1,1) then I may find positions appearing multiple times. Besides, it's redundant -- the informational content of the first position is enough... Anyway, not a big deal, just my two cents ... noticed it all over the benchmarks.

      – Frank
      Feb 9 '18 at 22:11






    • 1





      @Frank I don't think it is a bad idea necessarily. It depends on what you want to do with it. I included it in the benchmarks to make sure every solution returns the same and thus get a fair comparison.

      – Jaap
      Feb 10 '18 at 7:04
























    edited Feb 9 '18 at 15:02

























    answered Feb 7 '18 at 10:24









    Jaap



















    15














    You can use rollapply() from zoo:



    v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
    x <- c(2,3,5,8)

    library("zoo")
    searchX <- function(x, X) all(x==X)
    rollapply(v, FUN=searchX, X=x, width=length(x))


    A TRUE in the result marks the beginning of the sequence.

    The code could be simplified to rollapply(v, length(x), identical, x) (thanks to G. Grothendieck):



    set.seed(2)
    vl <- as.numeric(sample(1:10, 1e6, TRUE))
    # vm <- vl[1:1e5]
    # vs <- vl[1:1e4]
    x <- c(2,3,5)

    library("zoo")
    searchX <- function(x, X) all(x==X)
    i1 <- rollapply(vl, FUN=searchX, X=x, width=length(x))
    i2 <- rollapply(vl, width=length(x), identical, y=x)

    identical(i1, i2)


    To use identical(), both arguments must be of the same type (num and int are not the same).

    Where needed, == coerces int to num; identical() does not do any coercion.
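A quick base R illustration of that type issue (variable names are only for this example): a double window compared against an integer pattern matches with `==` but not with `identical()`, unless one side is converted first.

```r
window <- c(2, 3, 5)      # double, like the windows taken from a numeric vector
x_int  <- c(2L, 3L, 5L)   # integer pattern

same_type_needed <- identical(window, x_int)              # FALSE: double vs integer
coerced_compare  <- all(window == x_int)                  # TRUE: == coerces
after_conversion <- identical(window, as.numeric(x_int))  # TRUE: same type now

typeof(sample(1:10, 3, TRUE))  # "integer", hence the as.numeric() in the code above
```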
































    • Could you check your 2nd solution? As you can see in the benchmark answer it doesn't return the same output as the other answers.

      – Jaap
      Feb 9 '18 at 15:04








    • 1





      I tried (also unsuccessfully) to repair it as well. I will remove it from the benchmarks.

      – Jaap
      Feb 9 '18 at 16:30






    • 1





      The code could be simplified to rollapply(v, length(x), identical, x) where v and x must be of the same type, e.g. both integer or both double, since for example identical(5L, 5) is FALSE.

      – G. Grothendieck
      Feb 10 '18 at 3:18








    • 1





      @G.Grothendieck Thx, that was indeed the issue. When both are of the same type, the solution with identical works.

      – Jaap
      Feb 10 '18 at 7:16
























    edited Feb 10 '18 at 9:08

























    answered Feb 7 '18 at 10:03









    jogo
























    10














    I feel like looping should be efficient:



    w = seq_along(v)
    for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]

    w
    # [1] 2 12


    This should be writable in C++, following @SymbolixAU's approach, for extra speed.
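One edge case worth noting: if v ends with a partial match (for example v = c(1, 2) and x = c(2, 3)), the loop above indexes past the end of v and the result contains NA. The variant below is a minimal sketch, not part of the original answer, that only considers start positions where x still fits. The name find_starts is made up for illustration, and v is assumed to contain no NA values.

```r
# Hypothetical helper: same narrowing loop, restricted to positions where x fits.
find_starts <- function(v, x) {
  n <- length(v) - length(x) + 1L          # last possible start position
  if (length(x) == 0L || n < 1L) return(integer(0))
  w <- seq_len(n)                          # candidate start positions
  for (i in seq_along(x)) {
    w <- w[v[w + i - 1L] == x[i]]          # keep candidates whose i-th element matches
  }
  w
}

v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)
find_starts(v, x)
# [1]  2 12

find_starts(c(1, 2), c(2, 3))
# integer(0)
```

Each pass shrinks the candidate set, so later elements of x are compared against fewer and fewer positions.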



    A basic comparison:



    # create functions for selected approaches
    library(data.table)  # provides shift()

    redjaap <- function(v, x)
      which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x))

    loop <- function(v, x) {
      w = seq_along(v)
      for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
      w
    }

    # check consistency
    identical(redjaap(v,x), loop(v,x))
    # [1] TRUE

    # check speed
    library(microbenchmark)
    vv <- rep(v, 1e4)
    microbenchmark(redjaap(vv,x), loop(vv,x), times = 100)
    # Unit: milliseconds
    #           expr      min       lq      mean   median       uq       max neval cld
    # redjaap(vv, x) 5.883809 8.058230 17.225899 9.080246 9.907514  96.35226   100   b
    #    loop(vv, x) 3.629213 5.080816  9.475016 5.578508 6.495105 112.61242   100  a

    # check consistency again
    identical(redjaap(vv,x), loop(vv,x))
    # [1] TRUE


























    • 1





      this method is really efficient in terms of the amount of code to achieve the objective...can use compiler::cmpfun(frank) for a slight speedup

      – chinsoon12
      Feb 22 '18 at 8:03
























    edited Feb 7 '18 at 21:38

























    answered Feb 7 '18 at 21:31









    Frank




















    10














    Here are two Rcpp solutions. The first one returns a 0/1 vector of the same length as v, with a 1 at each position where the sequence starts.



    library(Rcpp)

    v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
    x <- c(2,3,5,8)

    cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

      int vecSize = myVector.size();
      int seqSize = mySequence.size();
      NumericVector comparison(seqSize);
      NumericVector res(vecSize);

      // stop once the sequence no longer fits, so myVector[i + j] stays in bounds
      for (int i = 0; i + seqSize <= vecSize; i++ ) {

        for (int j = 0; j < seqSize; j++ ) {
          comparison[j] = mySequence[j] == myVector[i + j];
        }

        if (sum(comparison) == seqSize) {
          res[i] = 1;
        }else{
          res[i] = 0;
        }
      }

      return res;

    }')

    SeqInVec(v, x)
    #[1] 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0




    This second one returns the index values (as per the other answers) of every matched entry in the sequence.



    cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

      int vecSize = myVector.size();
      int seqSize = mySequence.size();
      NumericVector comparison(seqSize);
      // assumes matches do not overlap, so at most vecSize indices are stored
      NumericVector res(vecSize);
      int foundCounter = 0;

      // stop once the sequence no longer fits, so myVector[i + j] stays in bounds
      for (int i = 0; i + seqSize <= vecSize; i++ ) {

        for (int j = 0; j < seqSize; j++ ) {
          comparison[j] = mySequence[j] == myVector[i + j];
        }

        if (sum(comparison) == seqSize) {
          for (int j = 0; j < seqSize; j++ ) {
            res[foundCounter] = i + j + 1;  // 1-based index for R
            foundCounter++;
          }
        }
      }

      // no match: return an empty vector
      if (foundCounter == 0) return NumericVector(0);

      IntegerVector idx = seq(0, (foundCounter-1));
      return res[idx];
    }')

    SeqInVec(v, x)
    # [1] 2 3 4 5 12 13 14 15
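For readers working in plain R, the relationship between the output shapes used in this thread (start positions, all matched indices, and the logical vector the question asks for) can be sketched as follows. This is an illustration rather than part of the original answer: the names starts, idx and mask are made up here, and starts is simply hard-coded to the positions of the 1s in the first function's output.

```r
v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

starts <- c(2L, 12L)   # positions where the sequence begins
l <- length(x)

# expand each start into the l consecutive positions it covers
idx <- rep(starts, each = l) + 0:(l - 1L)
idx
# [1]  2  3  4  5 12 13 14 15

# logical vector of the same length as v, TRUE wherever the sequence occurs
mask <- seq_along(v) %in% idx
```

The rep(..., each = l) + 0:(l - 1) idiom relies on recycling of the offsets, which is the same expansion the benchmark functions use to make their outputs comparable.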




    Optimising



    As @MichaelChirico points out in their comment, further optimisations can be made. For example, if the element of the vector at the current position doesn't match the first entry of the sequence, we don't need to do the rest of the comparison:



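    cppFunction('NumericVector SeqInVecOpt(NumericVector myVector, NumericVector mySequence) {

      int vecSize = myVector.size();
      int seqSize = mySequence.size();
      NumericVector comparison(seqSize);
      // assumes matches do not overlap, so at most vecSize indices are stored
      NumericVector res(vecSize);
      int foundCounter = 0;

      // stop once the sequence no longer fits, so myVector[i + j] stays in bounds
      for (int i = 0; i + seqSize <= vecSize; i++ ) {

        // only compare the full sequence when the first entry matches
        if (myVector[i] == mySequence[0]) {
          for (int j = 0; j < seqSize; j++ ) {
            comparison[j] = mySequence[j] == myVector[i + j];
          }

          if (sum(comparison) == seqSize) {
            for (int j = 0; j < seqSize; j++ ) {
              res[foundCounter] = i + j + 1;  // 1-based index for R
              foundCounter++;
            }
          }
        }
      }

      // no match: return an empty vector
      if (foundCounter == 0) return NumericVector(0);

      IntegerVector idx = seq(0, (foundCounter-1));
      return res[idx];
    }')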


    The answer with benchmarks shows the performance of these approaches.
































    • Could you update your solution such that it returns the same output as the others? I can then include it in the benchmark.

      – Jaap
      Feb 8 '18 at 20:14











    • Thx. Included in the separate benchmark answer now.

      – Jaap
      Feb 9 '18 at 15:14






    • 2





      since you're examining subsequent elements, shouldn't there be a way to optimize by skipping elements we already know don't start the sequence? e.g. in OPs example when checking at the second 2 we already know the 3rd element is not 2 so we can skip checking the elements after 3

      – MichaelChirico
      Feb 11 '18 at 11:18






    • 1





      2-3x speed-up, nice! I guess the improvement depends on the length of the "search" string and its density (% of TRUE values).

      – MichaelChirico
      Feb 12 '18 at 0:09






    • 1





      @MichaelChirico - yes that will likely be a factor. I've also tested a variation where it will increment i by the size of the search string, rather than one each time. In this example I didn't see any improvement, however.

      – SymbolixAU
      Feb 12 '18 at 0:13
























    edited Jul 16 '18 at 22:23

























    answered Feb 7 '18 at 21:21









    SymbolixAU

























    8














    A benchmark on the posted answers:



    Load the needed packages:



    library(data.table)
    library(microbenchmark)
    library(Rcpp)
    library(zoo)


    Create the vectors on which the benchmarks will be run:



    set.seed(2)
    vl <- sample(1:10, 1e6, TRUE)
    vm <- vl[1:1e5]
    vs <- vl[1:1e4]
    x <- c(2,3,5)


    Test whether all solutions give the same outcome on the small vector vs:



    > all.equal(jaap1(vs,x), jaap2(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), docendo(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), a5c1(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), jogo1(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), moody(vs,x))
    [1] "Numeric: lengths (24, 873) differ"
    > all.equal(jaap1(vs,x), cata1(vs,x))
    [1] "Numeric: lengths (24, 0) differ"
    > all.equal(jaap1(vs,x), u989(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), frank(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs,x), symb(vs,x))
    [1] TRUE
    > all.equal(jaap1(vs, x), symbOpt(vs, x))
    [1] TRUE


    Further inspection of the cata1 and moody solutions shows that they don't give the desired output. They are therefore not included in the benchmarks.
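One likely reason for the empty cata1 result, in line with the identical(5L, 5) caveat raised in the comments, is a type mismatch: vs comes from sample(1:10, ...) and is therefore an integer vector, while x is double, so identical() never returns TRUE. The snippet below is a small base-R illustration with made-up variable names; it does not address the moody result.

```r
window <- c(2L, 3L, 5L)   # what a matching slice of vs looks like: integer
x      <- c(2, 3, 5)      # the search sequence: double

typeof(window)                     # "integer"
typeof(x)                          # "double"
identical(window, x)               # FALSE: same values, different types
all(window == x)                   # TRUE: == compares by value
identical(window, as.integer(x))   # TRUE once the types agree
```

Solutions built on == are unaffected by this, which is consistent with them agreeing with each other in the checks above.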



    The benchmark for the smallest vector vs:



    mbs <- microbenchmark(jaap1(vs,x), jaap2(vs,x), docendo(vs,x), a5c1(vs,x),
                          jogo1(vs,x), u989(vs,x), frank(vs,x), symb(vs,x), symbOpt(vs, x),
                          times = 100)


    gives:



    print(mbs, order = "median")

    Unit: microseconds
              expr       min         lq        mean     median         uq        max neval
    symbOpt(vs, x)    40.658    47.0565    78.47119    51.5220    56.2765   2170.708   100
       symb(vs, x)   106.208   112.7885   151.76398   117.0655   123.7450   1976.360   100
      frank(vs, x)   121.303   129.0515   203.13616   132.1115   137.9370   6193.837   100
      jaap2(vs, x)   187.973   218.7805   322.98300   235.0535   255.2275   6287.548   100
      jaap1(vs, x)   306.944   341.4055   452.32426   358.2600   387.7105   6376.805   100
       a5c1(vs, x)   463.721   500.9465   628.13475   516.2845   553.2765   6179.304   100
    docendo(vs, x)  1139.689  1244.0555  1399.88150  1313.6295  1363.3480   9516.529   100
       u989(vs, x)  8048.969  8244.9570  8735.97523  8627.8335  8858.7075  18732.750   100
      jogo1(vs, x) 40022.406 42208.4870 44927.58872 43733.8935 45008.0360 124496.190   100



    The benchmark for the medium vector vm:



    mbm <- microbenchmark(jaap1(vm,x), jaap2(vm,x), docendo(vm,x), a5c1(vm,x),
                          jogo1(vm,x), u989(vm,x), frank(vm,x), symb(vm,x), symbOpt(vm, x),
                          times = 100)


    gives:



    print(mbm, order = "median")

    Unit: microseconds
              expr        min          lq        mean      median         uq        max neval
    symbOpt(vm, x)    357.452    405.0415    974.9058    763.0205   1067.803   7444.126   100
       symb(vm, x)   1032.915   1117.7585   1923.4040   1422.1930   1753.044  17498.132   100
      frank(vm, x)   1158.744   1470.8170   1829.8024   1826.1330   1935.641   6423.966   100
      jaap2(vm, x)   1622.183   2872.7725   3798.6536   3147.7895   3680.954  14886.765   100
      jaap1(vm, x)   3053.024   4729.6115   7325.3753   5607.8395   6682.814  87151.774   100
       a5c1(vm, x)   5487.547   7458.2025   9612.5545   8137.1255   9420.684  88798.914   100
    docendo(vm, x)  10780.920  11357.7440  13313.6269  12029.1720  13411.026  21984.294   100
       u989(vm, x)  83518.898  84999.6890  88537.9931  87675.3260  90636.674 105681.313   100
      jogo1(vm, x) 471753.735 512979.3840 537232.7003 534780.8050 556866.124 646810.092   100



    The benchmark for the largest vector vl:



    mbl <- microbenchmark(jaap1(vl,x), jaap2(vl,x), docendo(vl,x), a5c1(vl,x),
                          jogo1(vl,x), u989(vl,x), frank(vl,x), symb(vl,x), symbOpt(vl, x),
                          times = 100)


    gives:



    print(mbl, order = "median")

    Unit: milliseconds
              expr         min          lq       mean     median         uq       max neval
    symbOpt(vl, x)    4.679646    5.768531   12.30079    6.67608   11.67082  118.3467   100
       symb(vl, x)   11.356392   12.656124   21.27423   13.74856   18.66955  149.9840   100
      frank(vl, x)   13.523963   14.929656   22.70959   17.53589   22.04182  132.6248   100
      jaap2(vl, x)   18.754847   24.968511   37.89915   29.78309   36.47700  145.3471   100
      jaap1(vl, x)   37.047549   52.500684   95.28392   72.89496  138.55008  234.8694   100
       a5c1(vl, x)   54.563389   76.704769  116.89269   89.53974  167.19679  248.9265   100
    docendo(vl, x)  109.824281  124.631557  156.60513  129.64958  145.47547  296.0214   100
       u989(vl, x) 1380.886338 1413.878029 1454.50502 1436.18430 1479.18934 1632.3281   100
      jogo1(vl, x) 4067.106897 4339.005951 4472.46318 4454.89297 4563.08310 5114.4626   100





    The functions used for each solution:



    jaap1 <- function(v,x) {
      l <- length(x)
      w <- which(rowSums(mapply('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x))
      rep(w, each = l) + 0:(l-1)
    }

    jaap2 <- function(v,x) {
      l <- length(x)
      w <- which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)), x)) == length(x))
      rep(w, each = l) + 0:(l-1)
    }

    docendo <- function(v,x) {
      l <- length(x)
      idx <- which(v == x[1])
      w <- idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
      rep(w, each = l) + 0:(l-1)
    }

    a5c1 <- function(v,x) {
      l <- length(x)
      w <- which(colSums(t(embed(v, l)[, l:1]) == x) == l)
      rep(w, each = l) + 0:(l-1)
    }

    jogo1 <- function(v,x) {
      l <- length(x)
      searchX <- function(x, X) all(x==X)
      w <- which(rollapply(v, FUN=searchX, X=x, width=l))
      rep(w, each = l) + 0:(l-1)
    }

    moody <- function(v,x) {
      l <- length(x)
      v2 <- as.numeric(factor(c(v,NA),levels = x))
      v2[is.na(v2)] <- l+1
      which(diff(v2) == 1)
    }

    cata1 <- function(v,x) {
      l <- length(x)
      w <- which(sapply(lapply(seq(length(v)-l)-1, function(i) v[seq(x)+i]), identical, x))
      rep(w, each = l) + 0:(l-1)
    }

    u989 <- function(v,x) {
      l <- length(x)
      s <- paste(v, collapse = '-')
      # '\\b' is the regex word boundary
      p <- paste0('\\b', paste(x, collapse = '-'), '\\b')
      i <- c(1, unlist(gregexpr(p, s)))
      m <- substring(s, head(i,-1), tail(i,-1))
      ln <- lengths(strsplit(m, '-'))
      w <- cumsum(c(ln[1], ln[-1]-1))
      rep(w, each = l) + 0:(l-1)
    }

    frank <- function(v,x) {
      l <- length(x)
      w = seq_along(v)
      for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
      rep(w, each = l) + 0:(l-1)
    }

    cppFunction('NumericVector SeqInVec(NumericVector myVector, NumericVector mySequence) {

      int vecSize = myVector.size();
      int seqSize = mySequence.size();
      NumericVector comparison(seqSize);
      NumericVector res(vecSize);
      int foundCounter = 0;

      for (int i = 0; i < vecSize; i++ ) {

        for (int j = 0; j < seqSize; j++ ) {
          comparison[j] = mySequence[j] == myVector[i + j];
        }

        if (sum(comparison) == seqSize) {
          for (int j = 0; j < seqSize; j++ ) {
            res[foundCounter] = i + j + 1;
            foundCounter++;
          }
        }
      }

      IntegerVector idx = seq(0, (foundCounter-1));
      return res[idx];
    }')

    symb <- function(v,x) {SeqInVec(v, x)}

    cppFunction('NumericVector SeqInVecOpt(NumericVector myVector, NumericVector mySequence) {

      int vecSize = myVector.size();
      int seqSize = mySequence.size();
      NumericVector comparison(seqSize);
      NumericVector res(vecSize);
      int foundCounter = 0;

      for (int i = 0; i < vecSize; i++ ) {

        if (myVector[i] == mySequence[0]) {
          for (int j = 0; j < seqSize; j++ ) {
            comparison[j] = mySequence[j] == myVector[i + j];
          }

          if (sum(comparison) == seqSize) {
            for (int j = 0; j < seqSize; j++ ) {
              res[foundCounter] = i + j + 1;
              foundCounter++;
            }
          }
        }
      }

      IntegerVector idx = seq(0, (foundCounter-1));
      return res[idx];
    }')

    symbOpt <- function(v,x) {SeqInVecOpt(v,x)}
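
    The functions above return every index covered by a match, not only the start positions. As a minimal sketch (base R only, repeating the frank definition and using the example vector from the question), this is one way those indices can be mapped to the logical vector the question asks for:

```r
# Example data taken from the question
v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
x <- c(2,3,5,8)

# Start with all positions as candidates, then keep only those
# whose (i-1)-th following element equals x[i]
frank <- function(v, x) {
    l <- length(x)
    w <- seq_along(v)
    for (i in seq_along(x)) w <- w[v[w + i - 1L] == x[i]]
    rep(w, each = l) + 0:(l - 1)
}

idx <- frank(v, x)            # 2 3 4 5 12 13 14 15
res <- seq_along(v) %in% idx  # TRUE at positions 2-5 and 12-15, FALSE elsewhere
```

    Note that v[w+i-1L] can index past the end of v and return NA when a candidate sits close to the end of the vector; the example data does not trigger this, but real data might need an extra guard.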




    Since this is a community wiki answer, I'll add my own benchmark of some of the answers.



    library(data.table)
    library(microbenchmark)

    set.seed(2); v <- sample(1:100, 5e7, TRUE); x <- c(2,3,5)

    jaap1 <- function(v, x) {
        which(rowSums(mapply('==', shift(v, type = 'lead', n = 0:(length(x) - 1)),
                             x)) == length(x))
    }

    jaap2 <- function(v, x) {
        which(Reduce("+", Map('==', shift(v, type = 'lead', n = 0:(length(x) - 1)),
                              x)) == length(x))
    }

    dd1 <- function(v, x) {
        idx <- which(v == x[1])
        idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
    }

    dd2 <- function(v, x) {
        idx <- which(v == x[1L])
        xl <- length(x) - 1L
        idx[sapply(idx, function(i) all(v[i:(i+xl)] == x))]
    }

    frank <- function(v, x) {
        w = seq_along(v)
        for (i in seq_along(x)) w = w[v[w+i-1L] == x[i]]
        w
    }

    all.equal(jaap1(v, x), dd1(v, x))
    all.equal(jaap2(v, x), dd1(v, x))
    all.equal(dd2(v, x), dd1(v, x))
    all.equal(frank(v, x), dd1(v, x))

    bm <- microbenchmark(jaap1(v, x), jaap2(v, x), dd1(v, x), dd2(v, x), frank(v, x),
                         unit = "relative", times = 25)

    plot(bm)


    [Image: plot of the benchmark results in bm]



    bm
    Unit: relative
            expr      min       lq     mean   median       uq       max neval
     jaap1(v, x) 4.487360 4.591961 4.724153 4.870226 4.660023 3.9361093    25
     jaap2(v, x) 2.026052 2.159902 2.116204 2.282644 2.138106 2.1133068    25
       dd1(v, x) 1.078059 1.151530 1.119067 1.257337 1.201762 0.8646835    25
       dd2(v, x) 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000    25
     frank(v, x) 1.400735 1.376405 1.442887 1.427433 1.611672 1.3440097    25


    Bottom line: without knowing the real data, all these benchmarks don't tell the whole story.
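
    One plausible reason the rankings differ between the two benchmarks is the data itself: the first benchmark samples from 1:10, this one from 1:100. Approaches such as dd1 and dd2 only loop over positions where v == x[1], so their cost depends on how often that first value occurs. A rough sketch of that effect (the vector length 1e5 and the names v10, v100, frac10 and frac100 are chosen for illustration only, and no timings are claimed):

```r
set.seed(2)
n <- 1e5
v10  <- sample(1:10,  n, TRUE)  # value range used in the first benchmark
v100 <- sample(1:100, n, TRUE)  # value range used in this benchmark
x <- c(2, 3, 5)

# Share of positions that pass a first-element filter and need a full comparison
frac10  <- mean(v10  == x[1])   # roughly 1 in 10
frac100 <- mean(v100 == x[1])   # roughly 1 in 100

c(range_1_10 = frac10, range_1_100 = frac100)
```

    With fewer candidate positions, a loop over the candidates has far less work to do, which is consistent with dd1 and dd2 looking better here than docendo does in the first benchmark.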






    • 1





      @docendodiscimus - could you update with the data you've used in your benchmarks?

      – SymbolixAU
      Feb 11 '18 at 21:33











    • @SymbolixAU, yes of course. Sorry, I thought I had done that already.

      – docendo discimus
      Feb 12 '18 at 7:52











    • My answer in base R is (on average) 4x faster than jogo's answer, which uses a library. I have got +2/-2 votes and his answer +15. Hmmm :-/

      – 989
      Feb 12 '18 at 9:47








    • 2





      @989 - I wouldn't take it personally; after the initial flurry of activity & votes, people don't often re-visit questions, which also means down-votes often won't get removed even if you improve the answer.

      – SymbolixAU
      Feb 13 '18 at 3:46










































    edited Jul 16 '18 at 22:22


























    community wiki





    9 revs, 3 users 68%
    Jaap









    • 1





      @docendodiscimus - could you update with the data you've used in your benchmarks?

      – SymbolixAU
      Feb 11 '18 at 21:33











    • @SymbolixAU, yes of course. Sorry, I thought I had done that already.

      – docendo discimus
      Feb 12 '18 at 7:52











    • My answer in base R is (on average) 4x times faster than jogo's answer with the help of a library. I have got +2/-2 votes and his answer +15. Hmmm :-/

      – 989
      Feb 12 '18 at 9:47








    • 2





      @989 - I wouldn't take it personally; after the initial flurry of activity & votes, people don't often re-visit questions, which also means down-votes often won't get removed even if you improve the answer.

      – SymbolixAU
      Feb 13 '18 at 3:46





























    4














    Here's a solution that leverages binary search on secondary indices in data.table (the package has a great vignette on the topic).



    This method has quite a bit of overhead so it's not particularly competitive on the 1e4 length vector in the benchmark, but it hangs near the top of the pack as the size increases.



    Hats off to everyone else posting solutions, learning a lot from this question.



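    matt <- function(v, x) {
      l  <- length(x)
      SL <- seq_len(l - 1)
      # Seq_0 holds v; Seq_i holds v shifted forward (lead) by i positions
      DT <- data.table(Seq_0 = v)
      for (i in SL) set(DT, j = paste0("Seq_", i), value = shift(DT[["Seq_0"]], n = i, type = "lead"))
      # Join on all Seq_* columns; which = TRUE returns the rows where x starts
      w <- DT[as.list(x), on = paste0("Seq_", c(0L, SL)), which = TRUE]
      # Expand each start position to the indices of the whole matched run
      rep(w, each = l) + 0:(l - 1)
    }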
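
The shifted-column layout that this function builds can be illustrated without data.table. The sketch below is only an approximation of the idea: it constructs the same lead-shifted columns in base R and replaces the indexed join with plain vector comparisons, so it says nothing about the binary-search performance. The variable names are illustrative.

```r
# Data from the question
v <- c(2, 2, 3, 5, 8, 0, 32, 1, 3, 12, 5, 2, 3, 5, 8, 33, 1)
x <- c(2, 3, 5, 8)

# Column k holds v shifted forward by k positions, padded with NA at the end
# (indexing past the end of a vector yields NA)
shifted <- lapply(seq_along(x) - 1L, function(k) v[seq_along(v) + k])

# Row i is a match when every shifted column equals the matching element of x
hits <- Reduce(`&`, Map(function(col, xi) col == xi, shifted, x))

# which() ignores rows that evaluate to NA
starts <- which(hits)
starts
```

Each row of the shifted columns is a window of length(x) consecutive values of v, so finding x reduces to finding rows equal to x; the data.table version does that lookup with a join instead of elementwise comparison.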




    Benchmarking



    library(data.table)
    library(microbenchmark)
    library(Rcpp)
    library(zoo)

    set.seed(2)
    vl <- sample(1:10, 1e6, TRUE)
    vm <- vl[1:1e5]
    vs <- vl[1:1e4]
    x <- c(2,3,5)


    Vector Length 1e4





    Unit: microseconds
               expr       min        lq       mean     median        uq       max neval
        symb(vs, x)   138.342   143.048   161.6681   153.1545   159.269   259.999    10
       frank(vs, x)   176.634   184.129   198.8060   193.2850   200.701   257.050    10
       jaap2(vs, x)   282.231   299.025   342.5323   316.5185   337.760   524.212    10
       jaap1(vs, x)   490.013   528.123   568.6168   538.7595   547.268   731.340    10
        a5c1(vs, x)   706.450   742.270   751.3092   756.2075   758.859   793.446    10
         dd2(vs, x)  1319.098  1348.082  2061.5579  1363.2265  1497.960  7913.383    10
     docendo(vs, x)  1427.768  1459.484  1536.6439  1546.2135  1595.858  1696.070    10
         dd1(vs, x)  1377.502  1406.272  2217.2382  1552.5030  1706.131  8084.474    10
        matt(vs, x)  1928.418  2041.597  2390.6227  2087.6335  2430.470  4762.909    10
        u989(vs, x)  8720.330  8821.987  8935.7188  8882.0190  9106.705  9163.967    10
       jogo1(vs, x) 47123.615 47536.700 49158.2600 48449.2390 50957.035 52496.981    10


    Vector Length 1e5





    Unit: milliseconds
               expr        min         lq       mean     median         uq        max neval
        symb(vm, x)   1.319921   1.378801   1.464972   1.423782   1.577006   1.682156    10
       frank(vm, x)   1.671155   1.739507   1.806548   1.760738   1.844893   2.097404    10
       jaap2(vm, x)   2.298449   2.380281   2.683813   2.432373   2.566581   4.310258    10
        matt(vm, x)   3.195048   3.495247   3.577080   3.607060   3.687222   3.844508    10
       jaap1(vm, x)   4.079117   4.179975   4.776989   4.496603   5.206452   6.295954    10
        a5c1(vm, x)   6.488621   6.617709   7.366226   6.720107   6.877529  12.500510    10
         dd2(vm, x)  12.595699  12.812876  14.990739  14.058098  16.758380  20.743506    10
     docendo(vm, x)  13.635357  13.999721  15.296075  14.729947  16.151790  18.541582    10
         dd1(vm, x)  13.474589  14.177410  15.676348  15.446635  17.150199  19.085379    10
        u989(vm, x)  94.844298  95.026733  96.309658  95.134400  97.460869 100.536654    10
       jogo1(vm, x) 575.230741 581.654544 621.824297 616.474265 628.267155 723.010738    10


    Vector Length 1e6





    Unit: milliseconds
               expr        min         lq       mean     median         uq        max neval
        symb(vl, x)   13.34294   13.55564   14.01556   13.61847   14.78210   15.26076    10
       frank(vl, x)   17.35628   17.45602   18.62781   17.56914   17.88896   25.38812    10
        matt(vl, x)   20.79867   21.07157   22.41467   21.23878   22.56063   27.12909    10
       jaap2(vl, x)   22.81464   22.92414   22.96956   22.99085   23.02558   23.10124    10
       jaap1(vl, x)   40.00971   40.46594   43.01407   41.03370   42.81724   55.90530    10
        a5c1(vl, x)   65.39460   65.97406   69.27288   66.28000   66.72847   83.77490    10
         dd2(vl, x)  127.47617  132.99154  161.85129  134.63168  157.40028  342.37526    10
         dd1(vl, x)  140.06140  145.45085  154.88780  154.23280  161.90710  171.60294    10
     docendo(vl, x)  147.07644  151.58861  162.20522  162.49216  165.49513  183.64135    10
        u989(vl, x) 2022.64476 2041.55442 2055.86929 2054.92627 2066.26187 2088.71411    10
       jogo1(vl, x) 5563.31171 5632.17506 5863.56265 5872.61793 6016.62838 6244.63205    10
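
To put a number on the overhead remark, the median timings of matt can be compared with those of the fastest entry, symb, at each vector length. The sketch below only re-uses the medians printed in the three tables above (the 1e4 values are converted from microseconds to milliseconds); the variable names are illustrative.

```r
# Median timings copied from the tables above, in milliseconds
median_symb <- c("1e4" = 153.1545 / 1000,  "1e5" = 1.423782, "1e6" = 13.61847)
median_matt <- c("1e4" = 2087.6335 / 1000, "1e5" = 3.607060, "1e6" = 21.23878)

# How many times slower matt is than symb at each vector length
ratio <- median_matt / median_symb
round(ratio, 2)
```

On these numbers the gap shrinks from roughly 13.6 times at length 1e4 to about 2.5 times at 1e5 and about 1.6 times at 1e6, which is consistent with a fixed setup cost that is amortized as the input grows.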















        answered Feb 9 '18 at 22:23









        Matt Summersgill
























            2














            Here is a string-based approach in base R:



            str <- paste(v, collapse = '-')
            # "2-2-3-5-8-0-32-1-3-12-5-2-3-5-8-33-1"

            pattern <- paste0('\\b', paste(x, collapse = '-'), '\\b')
            # "\\b2-3-5-8\\b"

            inds <- unlist(gregexpr(pattern, str)) # (1)
            # 3 25
            sapply(inds, function(i) lengths(strsplit(substr(str, 1, i),'-'))) # (2)

            # [1] 2 12




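            • \b (a regex word boundary, written '\\b' in an R string) anchors the pattern to whole numbers, so that, for example, the 2 at the end of 12 cannot start a match.

            • (1) Finds the character positions at which pattern occurs within str.

            • (2) Maps those character positions back to the corresponding indices in the original vector v.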
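
The role of the word boundary is easiest to see on an input where a number merely ends in the first digit of the pattern. The following is a small sketch under stated assumptions: the input vector is made up for illustration, to_index is a hypothetical helper name, and perl = TRUE is a choice made for this sketch only (the answer itself uses the default regex engine).

```r
v <- c(12, 3, 5, 8, 2, 3, 5, 8)   # hypothetical input: 12 ends in the digit 2
x <- c(2, 3, 5, 8)

str     <- paste(v, collapse = "-")      # "12-3-5-8-2-3-5-8"
plain   <- paste(x, collapse = "-")      # "2-3-5-8"
bounded <- paste0("\\b", plain, "\\b")   # word boundaries on both sides

# Character positions of the matches, with and without the boundaries
hits_plain   <- as.integer(unlist(gregexpr(plain, str, perl = TRUE)))
hits_bounded <- as.integer(unlist(gregexpr(bounded, str, perl = TRUE)))

# Map a character position back to an index in v by counting the
# "-"-separated pieces up to that position
to_index <- function(i) lengths(strsplit(substr(str, 1, i), "-"))
idx <- vapply(hits_bounded, to_index, integer(1))
```

Without the boundaries the pattern also matches inside "12-3-5-8", which would report a sequence that is not in v. Because "-" doubles as the separator, this string-based approach presumably needs extra care with negative or non-integer values.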




            UPDATE



            Regarding the discussion of run-time efficiency, here is a much faster variant of my first solution:



            str <- paste(v, collapse = '-')
            pattern <- paste0('\\b', paste(x, collapse = '-'), '\\b')

            inds <- c(1, unlist(gregexpr(pattern, str)))

            m <- substring(str, head(inds,-1), tail(inds,-1))
            ln <- lengths(strsplit(m, '-'))
            cumsum(c(ln[1], ln[-1]-1))


























            • 2





              I've updated the benchmarks and only included your fastest solution.

              – Jaap
              Feb 8 '18 at 19:35











            • I looked at what they return and then adjusted the solutions such that all would give the same result (didn't programmatically check it though)

              – Jaap
              Feb 8 '18 at 19:55











            • included now :-)

              – Jaap
              Feb 8 '18 at 20:11











            • thx for notifying, changed the construction of the vectors a bit; now it should return a normal vector :-)

              – Jaap
              Feb 8 '18 at 20:26











            • please leave a note under the respective answers so they can improve; could you check my benchmarking code? It could just as well be that I made a mistake somewhere

              – Jaap
              Feb 8 '18 at 20:39


























            edited Feb 8 '18 at 19:55

























            answered Feb 7 '18 at 15:51









            989






















            1














            EDIT: some have noted that my answer doesn't always give the desired output. I might fix it later; use it with caution in the meantime!



            We can convert v to a factor whose levels are the elements of x, and keep only consecutive values in the transformed vector:



            v2 <- as.numeric(factor(c(v,NA),levels = x)) # [1]  1  1  2  3  4 NA NA NA ...
            v2[is.na(v2)] <- length(x)+1 # [1] 1 1 2 3 4 5 5 5 ...
            output <- diff(v2) == 1
            # [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE


            data



            v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
            x <- c(2,3,5,8)
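
The caveat in the EDIT can be reproduced with a small input in which only the beginning of x occurs. This is a sketch on a made-up vector (v_partial is a hypothetical name); the three processing lines are the ones from the answer.

```r
x <- c(2, 3, 5, 8)
v_partial <- c(2, 3, 9)   # hypothetical input: x never occurs in full

# Same steps as in the answer
v2 <- as.numeric(factor(c(v_partial, NA), levels = x))  # 1 2 NA NA
v2[is.na(v2)] <- length(x) + 1                          # 1 2 5 5
output <- diff(v2) == 1
output
```

The first element is flagged even though the full sequence is absent, because any two neighbouring values whose factor codes differ by one pass the diff test on their own. A complete solution has to check all length(x) elements of a window together.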


























            • 1





              that's pretty computationally intensive.

              – Carl Witthoft
              Feb 7 '18 at 14:30











            • Is it? I don't know; it's the only fully vectorized solution so far. Too many copies?

              – Moody_Mudskipper
              Feb 7 '18 at 14:33











            • I plead guilty to not having run microbenchmark on the various answers here. It's just a gut feeling because of the number of class coercions going on there.

              – Carl Witthoft
              Feb 7 '18 at 14:40











            • @CarlWitthoft, I guess that the answers by catastrophic-failure, which both utilise nested loops, will be much slower. But I too haven't tested any.

              – docendo discimus
              Feb 7 '18 at 14:51






            • 1





              @docendodiscimus see my latest benchmarks

              – Carl Witthoft
              Feb 7 '18 at 15:32
















            1














            EDIT: some have noted that my answer doesn't always give the desired output, I might fix it later, caution meanwhile!



            We can convert v to factors and keep only consecutive values in our transformed vector:



            v2 <- as.numeric(factor(c(v,NA),levels = x)) # [1]  1  1  2  3  4 NA NA NA ...
            v2[is.na(v2)] <- length(x)+1 # [1] 1 1 2 3 4 5 5 5 ...
            output <- diff(v2) ==1
            # [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE


            data



            v <- c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1)
            x <- c(2,3,5,8)













            edited Feb 12 '18 at 15:36

























            answered Feb 7 '18 at 13:03









            Moody_Mudskipper








            • 1

              That's pretty computationally intensive.

              – Carl Witthoft
              Feb 7 '18 at 14:30

            • Is it? I don't know; it's the only fully vectorized solution so far. Too many copies?

              – Moody_Mudskipper
              Feb 7 '18 at 14:33

            • I plead guilty to not having run microbenchmark on the various answers here. It's just a gut feeling because of the number of class coercions going on there.

              – Carl Witthoft
              Feb 7 '18 at 14:40

            • @CarlWitthoft, I guess that the answers by catastrophic-failure, which both utilise nested loops, will be much slower. But I too haven't tested any.

              – docendo discimus
              Feb 7 '18 at 14:51

            • 1

              @docendodiscimus See my latest benchmarks.

              – Carl Witthoft
              Feb 7 '18 at 15:32
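
            The speed question raised in these comments can be checked rather than guessed. The following is only a minimal sketch using base R's system.time. The enlarged input, the repetition count, and the wrapper name factor_diff are assumptions made for illustration, and no timings are quoted because they depend on the machine and R version.

```r
# Illustrative input only: a longer vector built by repeating the example data.
v <- rep(c(2,2,3,5,8,0,32,1,3,12,5,2,3,5,8,33,1), times = 1000)
x <- c(2,3,5,8)

# The factor/diff approach from the answer, wrapped in a function
# (the name factor_diff is hypothetical).
factor_diff <- function(v, x) {
  v2 <- as.numeric(factor(c(v, NA), levels = x))
  v2[is.na(v2)] <- length(x) + 1
  diff(v2) == 1
}

# Time 100 repetitions with base R and report the elapsed seconds.
timing <- system.time(for (i in 1:100) res <- factor_diff(v, x))
print(timing[["elapsed"]])
```

            If the microbenchmark package mentioned above is installed, microbenchmark::microbenchmark(factor_diff(v, x), times = 100) should give a finer-grained distribution of timings, and other candidate functions can be passed as further arguments to compare them side by side.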































