How to extract sentences containing citations from scientific articles in PDF with R [duplicate]




























This question already has an answer here:




  • How to extract sentences that contains citation mark with R

    2 answers




For example:



url <- "https://ac.els-cdn.com/S000145751000254X/1-s2.0-S000145751000254X-main.pdf _tid=9209e0fa-6c39-4cb5-a2b9 49135bd88c91&acdnat=1542194335_14cca2d44bdb5aed4199fb5ba4f2451a" # it is a PDF file

library(pdftools)
data <- pdf_text(url) # import scientific article


From this scientific article (data), I need to extract all the sentences that contain citations. For example, if the text of my PDF is this:




We have analyzed the association between the rider related factors and
the risk of being responsible for an injury accident. A number of our
findings merit discussion. First, we found the factor with the
strongest association with accident responsibility to be alcohol
consumption: we identified an increase in the risk of injury accidents
after alcohol consumption, with a dose effect. Two other studies have
measured this association among PTW riders and also found an increase
in the risk of responsibility (Lardelli-Claret et al., 2005; Williams
et al., 1985). In common with the majority of studies, we identified
an excess risk of accident involvement among novice PTW riders (Chang
and Yeh, 2006; Evans, 2004; Harrison and Christie, 2005;
Lardelli-Claret et al., 2005; Lin et al., 2003; Mullin et al., 2000;
Reeder et al., 1995; Rutter and Quine, 1996; Ryan et al., 1998;
Skalkidou et al., 1999; Yannis et al., 2005). This may be due to a
combination of young people’s inexperience and risk-taking (Chesham et
al., 1993; Ryan et al., 1998) as well as risk exposure (Harrison and
Christie, 2005; Yannis et al., 2005).




The result should be like this:




[1] Two other studies have measured
this association among PTW riders and also found an increase in the
risk of responsibility (Lardelli-Claret et al., 2005; Williams et al.,
1985).



[2] In common with the majority of studies, we identified an excess
risk of accident involvement among novice PTW riders (Chang and Yeh,
2006; Evans, 2004; Harrison and Christie, 2005; Lardelli-Claret et
al., 2005; Lin et al., 2003; Mullin et al., 2000; Reeder et al., 1995; Rutter and Quine, 1996; Ryan et al., 1998; Skalkidou et al., 1999;
Yannis et al., 2005).



[3] This may be due to a combination of young people’s inexperience and risk-taking (Chesham et al., 1993; Ryan et al., 1998)




etc

























marked as duplicate by hrbrmstr, 42- r
Nov 14 '18 at 19:56





















  • Why are you asking the same question over and over and over again?

    – hrbrmstr
    Nov 14 '18 at 19:35











  • that is not even a valid URL

    – hrbrmstr
    Nov 14 '18 at 19:37











  • This appears to be homework and the OP is also trying to subvert intellectual property rights (not that I like paywalled journals but the law is the law) and get the SO community to do that for them. Anyone coming to this should review the previous questions to see the progression.

    – hrbrmstr
    Nov 14 '18 at 19:43











  • @Alfonso You should go back to the first question and undo the errors in posting. That "answer" you put in was not an answer but rather a clarification and should have been done as an edit to your original question.

    – 42-
    Nov 14 '18 at 19:58











  • I apologize to everyone; I'm new to the community. I messed everything up. I humbly apologize.

    – Alfonso Sorrentino
    Nov 14 '18 at 20:13

















r regex pdf







edited Nov 14 '18 at 19:16









Jake Kaupp


asked Nov 14 '18 at 19:11









Alfonso Sorrentino


1 Answer

Please note that the regular expression used in grepl is rather crude: it matches any sentence that contains a four-digit number somewhere between an opening and a closing parenthesis.
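A quick check shows how permissive that pattern is: any four-digit number inside parentheses counts, so a sample size is flagged just like an author-year reference. This is a minimal sketch; the example strings are made up for illustration, and the pattern is written with doubled backslashes, as an R string literal requires.

```r
# The answer's pattern, with backslashes doubled for an R string literal
crude <- "\\(.*[0-9]{4}.*\\)"

examples <- c(
  "found an increase in risk (Evans, 2004)",  # real author-year citation
  "we surveyed riders (n = 1200)",            # sample size, not a citation
  "no parentheses at all"
)

crude_hits <- grepl(crude, examples)
print(crude_hits)
#> [1]  TRUE  TRUE FALSE
```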





data <- "We have analyzed the association between the rider related factors and the risk of being responsible for an injury accident. A number of our findings merit discussion. First, we found the factor with the strongest association with accident responsibility to be alcohol consumption: we identified an increase in the risk of injury accidents after alcohol consumption, with a dose effect. Two other studies have measured this association among PTW riders and also found an increase in the risk of responsibility (Lardelli-Claret et al., 2005; Williams et al., 1985). In common with the majority of studies, we identified an excess risk of accident involvement among novice PTW riders (Chang and Yeh, 2006; Evans, 2004; Harrison and Christie, 2005; Lardelli-Claret et al., 2005; Lin et al., 2003; Mullin et al., 2000; Reeder et al., 1995; Rutter and Quine, 1996; Ryan et al., 1998; Skalkidou et al., 1999; Yannis et al., 2005). This may be due to a combination of young people’s inexperience and risk-taking (Chesham et al., 1993; Ryan et al., 1998) as well as risk exposure (Harrison and Christie, 2005; Yannis et al., 2005)."

split_txt <- unlist(strsplit(data, "\\. "))  # split into sentences at ". "

split_txt[grepl("(\\(.*[0-9]{4}.*\\))", split_txt)]  # keep sentences with 4 digits inside (...)
#> [1] "Two other studies have measured this association among PTW riders and also found an increase in the risk of responsibility (Lardelli-Claret et al., 2005; Williams et al., 1985)"
#> [2] "In common with the majority of studies, we identified an excess risk of accident involvement among novice PTW riders (Chang and Yeh, 2006; Evans, 2004; Harrison and Christie, 2005; Lardelli-Claret et al., 2005; Lin et al., 2003; Mullin et al., 2000; Reeder et al., 1995; Rutter and Quine, 1996; Ryan et al., 1998; Skalkidou et al., 1999; Yannis et al., 2005)"
#> [3] "This may be due to a combination of young people’s inexperience and risk-taking (Chesham et al., 1993; Ryan et al., 1998) as well as risk exposure (Harrison and Christie, 2005; Yannis et al., 2005)."


Created on 2018-11-14 by the reprex package (v0.2.1)
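The answer works on one clean string. Text returned by pdftools::pdf_text() is a character vector with one element per page, and each element keeps the PDF's hard line breaks, so some preprocessing is likely needed before splitting into sentences. Below is a minimal sketch of that, assuming an author-year citation style like the example. The helper name citation_sentences(), the sentence-splitting rule and the stricter citation pattern are illustrative assumptions, not part of the answer above. Splitting on a period followed by a capital letter will still break at abbreviations such as "e.g. The", and words hyphenated across line breaks are left as they are.

```r
# Hypothetical helper: return the sentences that contain an author-year
# citation, given pdf_text()-style input (one string per page).
citation_sentences <- function(pages) {
  # Join the pages and collapse line breaks and runs of spaces into one space
  txt <- gsub("\\s+", " ", paste(pages, collapse = " "))

  # Split where ".", "?" or "!" is followed by whitespace and a capital letter
  sentences <- unlist(strsplit(txt, "(?<=[.?!])\\s+(?=[A-Z])", perl = TRUE))

  # Stricter than the answer's pattern: parentheses that contain a capital
  # letter and end in a 19xx/20xx year (optionally 2005a), e.g. "(Evans, 2004)"
  cite <- "\\([^()]*[A-Z][^()]*\\b(19|20)[0-9]{2}[a-z]?\\)"
  sentences[grepl(cite, sentences, perl = TRUE)]
}

# Made-up stand-in for pdf_text() output: two "pages" with hard line breaks
pages <- c(
  "First, we found the factor with the strongest association\nwith accident responsibility to be alcohol consumption. Two other\nstudies found an increase in the risk of responsibility\n(Lardelli-Claret et al., 2005; Williams et al., 1985).",
  "We surveyed a large sample of riders (n = 1200). This may be due to\nrisk-taking (Chesham et al., 1993; Ryan et al., 1998)."
)

# Keeps the two citation sentences; the "(n = 1200)" sentence is dropped
hits <- citation_sentences(pages)
print(hits)
```

With the question's own input this would be called as citation_sentences(pdf_text(url)). The "(n = 1200)" sentence, which the answer's cruder pattern would keep, is excluded because the stricter pattern asks for a capital letter and a trailing year inside the parentheses.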






answered Nov 14 '18 at 19:28

Jake Kaupp
