How does agrep matching work?
The agrep function gives some puzzling results and I'd like to understand its behavior better. For example:
agrep("abcd", c("abc", "abcde", "abcef"), value = TRUE, max.distance = 1)
Returns:
[1] "abc" "abcde" "abcef"
But the distance between "abcd" and "abcef" is 2, so I'm not sure why the third match shows up.
levenshteinDist("abcd", "abcef")  # gives 2; levenshteinDist() comes from the RecordLinkage package
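A sketch that may make the discrepancy visible, using base R's adist() from utils (the function the ?agrep Note points to): levenshteinDist() compares whole strings, whereas adist() with partial = TRUE measures the distance from the pattern to the closest substring of each element, which is the kind of match agrep() looks for. The variable names x, full and partial are just illustrative.

```r
x <- c("abc", "abcde", "abcef")

# Whole-string Levenshtein distance (the same quantity levenshteinDist() reports)
full <- adist("abcd", x)

# Distance from "abcd" to the closest substring of each element of x
partial <- adist("abcd", x, partial = TRUE)

full     # distances 1, 1, 2
partial  # distances 1, 0, 1
```

With max.distance = 1, every element has a partial distance of at most 1, which would explain why all three come back even though the whole-string distance to "abcef" is 2.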
I also assumed that the function would return only exact matches if the distance cap were set to 0:
agrep("abcd", c("abc", "abcde", "abcef"), value = TRUE, max.distance = 0)
However, I get [1] "abcde" as a match.
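With max.distance = 0 no edits are allowed, but the search is still a substring search, so it should behave like an exact fixed-string grep(). A minimal check, assuming the same three strings, alongside the whole-element comparison that "only exact matches" usually implies:

```r
x <- c("abc", "abcde", "abcef")

# Zero-edit approximate search: "abcd" must occur verbatim somewhere inside an element
approx0 <- agrep("abcd", x, max.distance = 0, value = TRUE)

# Exact fixed-string substring search with grep(), for comparison
exact_sub <- grep("abcd", x, fixed = TRUE, value = TRUE)

# Whole-element exact match
whole <- x[x == "abcd"]

approx0    # "abcde"
exact_sub  # "abcde"
whole      # character(0)
```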
It would be really helpful if someone could explain how the matching in agrep works.
r fuzzy-comparison agrep
asked May 15 '15 at 16:06
xyy
I suspect that the rather testily written Note section in ?agrep might apply here. ;)
– joran
May 15 '15 at 16:21
@joran are you referring to this: "Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See also adist in package utils, which optionally returns the offsets of the matched substrings." I read it but I don't fully understand it; I'm not familiar with how grep works either.
– xyy
May 15 '15 at 16:25
Yes, "this matches substrings of each element of x (just as grep does) and not whole elements". So "abcd" needs only to be within 1 of a substring of the comparison strings. It is looking for matches within (that is the word used in the Description section).
– joran
May 15 '15 at 16:28
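If whole-element approximate matching is what is actually wanted, one option (a sketch, not the only approach) is to skip agrep() and filter on the full-string distance from adist(). The names x, d and whole_matches are illustrative.

```r
x <- c("abc", "abcde", "abcef")

# Full-string edit distance from the pattern to every element of x
d <- adist("abcd", x)[1, ]

# Keep only elements whose whole-string distance is at most 1
whole_matches <- x[d <= 1]
whole_matches  # "abc" "abcde"
```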
@joran hm interesting, thanks for the response! So to clarify, the reason that "abcd" is matched to "abcef" in the first example is that if "d" is deleted from "abcd", it would be a match to the substring "abc" in "abcef"? Does this also mean that the transformations are always performed on the pattern argument?
– xyy
May 15 '15 at 16:40
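The direction matters if, as described here, only the pattern is edited and then compared against substrings of x. A quick sketch that swaps the roles of the two strings:

```r
# "abcd" -> delete "d" -> "abc", which occurs inside "abcef": one edit, so a match
forward <- agrep("abcd", "abcef", max.distance = 1)

# "abcef" is longer than every substring of "abcd", and the closest ones
# ("abcd" and "abc") still need two edits, so there is no match within 1
reverse <- agrep("abcef", "abcd", max.distance = 1)

forward  # 1
reverse  # integer(0)
```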
I believe so, yes. I would describe it as "can I transform pattern into a substring of an element of x?" If yes, it matches. The source for agrep is here, which would be the definitive answer, provided you know C.
– joran
May 15 '15 at 16:44
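That rule of thumb can be written out directly. The helper below, min_substring_dist(), is hypothetical (not part of base R): it brute-forces the smallest edit distance between the pattern and any contiguous substring of each element. It is only meant to illustrate the semantics with default unit costs, not to reproduce agrep()'s actual C implementation or its cost options.

```r
# Hypothetical helper, not part of base R: the smallest Levenshtein distance
# between `pattern` and any contiguous substring of `text` (brute force).
min_substring_dist <- function(pattern, text) {
  n <- nchar(text)
  best <- nchar(pattern)  # the empty substring: delete every pattern character
  for (i in seq_len(n)) {
    for (j in i:n) {
      best <- min(best, adist(pattern, substr(text, i, j))[1, 1])
    }
  }
  best
}

x <- c("abc", "abcde", "abcef")
d <- sapply(x, min_substring_dist, pattern = "abcd")

# An element "matches" when the pattern can be edited into one of its
# substrings within the allowed distance
names(d)[d <= 1]  # "abc" "abcde" "abcef"
names(d)[d <= 0]  # "abcde"
```

Under these assumptions the two filters line up with the agrep() results reported in the question for max.distance = 1 and max.distance = 0.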