Find string then print what comes next until another string


Here's my input.file (thousands of lines):

FN545816.1  EMBL    CDS 9450    9857    .   +   0   ID=cds-CBE01461.1;Parent=gene-CDR20291_3551;Dbxref=EnsemblGenomes-Gn:CDR20291_3551,EnsemblGenomes-Tr:CBE01461,GOA:C9YHF8,InterPro:IPR003594,UniProtKB/TrEMBL:C9YHF8,NCBI_GP:CBE01461.1;Name=CBE01461.1;gbkey=CDS;gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1;transl_table=11

I want to extract only what comes after product= up to the next ;
So, in this case, I want to get "anti-sigma-B factor (serine-protein kinase)"

I tried this:

awk '{for(i=1; i<=NF; i++) if($i~/*product=/) print $(i+1)}' input.file > output.file

but it prints only "factor" (presumably because there's no space between "product=" and "anti-sigma-B"). It doesn't print the rest either.

I tried many previous solutions but none gave what I want.

Thank you.

asked Nov 17 '18 at 1:03

ThePresident

888

Give it some time, then try to select one of the answers as the correct one to complete the thread; see also stackoverflow.com/help/someone-answers

– RavinderSingh13
Nov 17 '18 at 2:24

awk


1 Answer

Could you please try the following:

awk 'match($0,/product=[^;]*/){print substr($0,RSTART+8,RLENGTH-8)}' Input_file

Explanation: the same command, annotated.

awk '                                     ## Start the awk program.
match($0,/product=[^;]*/){                ## match() searches the line for "product=" followed by any run of characters other than ";". On success it sets RSTART and RLENGTH.
  print substr($0,RSTART+8,RLENGTH-8)     ## RSTART is the position where the match begins and RLENGTH is its length. "product=" is 8 characters long, so start 8 characters past RSTART and take RLENGTH-8 characters, which leaves only the value.
}' Input_file                             ## Read from Input_file.
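To make the RSTART and RLENGTH arithmetic concrete, here is a small sketch that prints both values next to the extracted text. The sample line is a shortened, hypothetical version of the record in the question (most attributes dropped), and the variable names line and result are only for illustration.

```shell
# Shortened sample record (an assumption for illustration; the real lines carry more attributes).
line='ID=cds-CBE01461.1;gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1'

# Print what match() sets, then the substring that skips the 8 characters of "product=".
result=$(printf '%s\n' "$line" | awk 'match($0,/product=[^;]*/){
  print "RSTART=" RSTART
  print "RLENGTH=" RLENGTH
  print substr($0,RSTART+8,RLENGTH-8)
}')

printf '%s\n' "$result"
```

The last line of the output should be the product value alone. RSTART points at the "p" of "product=" and RLENGTH covers "product=" plus the value, which is why both are adjusted by 8.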

edited Nov 17 '18 at 2:13

answered Nov 17 '18 at 2:07

RavinderSingh13

31k41639

Yes, this works perfectly, thank you! Honestly, I did not get the substr($0,RSTART+8,RLENGTH-8) part.

– ThePresident
Nov 17 '18 at 2:28

@ThePresident, I have added an explanation to my answer: RSTART is the starting index of the text matched by the regex, and RLENGTH is the length of that match. substr() takes a string, a starting position, and a length, so I start at RSTART+8 (just past "product=") and take RLENGTH-8 characters. Is it clear now? Please let me know if you have any questions.

– RavinderSingh13
Nov 17 '18 at 2:30

Yeah makes sense now. Thank you.

– ThePresident
Nov 17 '18 at 2:32
