Find string then print what comes next until another string




















Here's my input.file (thousands of lines):



FN545816.1  EMBL    CDS 9450    9857    .   +   0   ID=cds-CBE01461.1;Parent=gene-CDR20291_3551;Dbxref=EnsemblGenomes-Gn:CDR20291_3551,EnsemblGenomes-Tr:CBE01461,GOA:C9YHF8,InterPro:IPR003594,UniProtKB/TrEMBL:C9YHF8,NCBI_GP:CBE01461.1;Name=CBE01461.1;gbkey=CDS;gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1;transl_table=11


I want to extract only what comes after product= up to the next ;
So, in this case, I want to get "anti-sigma-B factor (serine-protein kinase)"



I tried this:



awk '{for(i=1; i<=NF; i++) if($i~/*product=/) print $(i+1)}' input.file > output.file 


but it prints only "factor" (presumably because there's no space between "product=" and "anti-sigma-B"). It doesn't print the rest either.



I tried many previous solutions but none gave what I want.



Thank you.










awk
















asked Nov 17 '18 at 1:03









ThePresident














  • Give it some time, then try to select one of the answers as the correct one to complete the thread; see also stackoverflow.com/help/someone-answers

    – RavinderSingh13
    Nov 17 '18 at 2:24































1 Answer














Could you please try the following?



awk 'match($0,/product=[^;]*/){print substr($0,RSTART+8,RLENGTH-8)}' Input_file


Explanation: the same command, annotated.



awk '                                           ##Start of the awk program.
match($0,/product=[^;]*/){ ##match() searches the line for the regex: the literal product= followed by any run of characters other than ;
print substr($0,RSTART+8,RLENGTH-8) ##substr() takes a start position and a length. RSTART (where the match begins) and RLENGTH (how long it is) are built-in variables set by match(). Adding 8 to RSTART skips the 8 characters of product=, and subtracting 8 from RLENGTH leaves only the value.
}' Input_file ##Name of the input file.
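
To make the offsets concrete, here is a minimal sketch that runs the same match on a shortened copy of the attribute string. The `line`, `offsets` and `value` variables and the trimmed sample are assumptions for illustration only; they are not part of the original answer or of the asker's file.

```shell
# Shortened, hypothetical sample of the attribute column (trimmed for readability).
line='gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1'

# Where does the match start, and how long is it?
offsets=$(printf '%s\n' "$line" | awk 'match($0, /product=[^;]*/) { print RSTART, RLENGTH }')

# Skip the 8 characters of "product=" and keep the remaining RLENGTH-8 characters.
value=$(printf '%s\n' "$line" | awk 'match($0, /product=[^;]*/) { print substr($0, RSTART + 8, RLENGTH - 8) }')

echo "$offsets"   # 11 51
echo "$value"     # anti-sigma-B factor (serine-protein kinase)
```

In this sample the match starts at character 11 and is 51 characters long, so the value begins at character 19 and is 43 characters long.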













edited Nov 17 '18 at 2:13

























answered Nov 17 '18 at 2:07









RavinderSingh13














  • Yes, this works perfectly, thank you! Honestly, I did not get the substr($0,RSTART+8,RLENGTH-8) part.

    – ThePresident
    Nov 17 '18 at 2:28











  • @ThePresident, I have added an explanation to my answer too. RSTART is the starting index of the matched text and RLENGTH is the length of that match. substr extracts a substring given a starting position and a length, so I pass RSTART+8 as the start and RLENGTH-8 as the length, which skips the 8 characters of product=. Is it clear now? Please let me know if you have any questions.

    – RavinderSingh13
    Nov 17 '18 at 2:30






  • Yeah makes sense now. Thank you.

    – ThePresident
    Nov 17 '18 at 2:32
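
Since the +8/-8 arithmetic was the confusing part, one possible variation is to let awk compute the key length instead of hard-coding it. This is only a sketch, not the answerer's method: it swaps match() for index(), and the names `key`, `start`, `rest`, `stop` and `product` as well as the shortened sample line are hypothetical.

```shell
# A shortened, hypothetical sample line standing in for one record of input.file.
line='gene=rsbW;product=anti-sigma-B factor (serine-protein kinase);protein_id=CBE01461.1'

# key is passed in with -v, so length(key) replaces the hard-coded 8.
# index() matches the key literally rather than as a regex.
product=$(printf '%s\n' "$line" | awk -v key='product=' '
{
    start = index($0, key)                   # 1-based position of key, 0 if absent
    if (start == 0) next                     # key not found on this line: print nothing
    rest = substr($0, start + length(key))   # everything after the key
    stop = index(rest, ";")                  # position of the next semicolon, 0 if none
    print (stop ? substr(rest, 1, stop - 1) : rest)
}')

echo "$product"   # anti-sigma-B factor (serine-protein kinase)
```

Changing the value passed to `-v key=` (for example to `gene=`) would extract a different attribute without touching the offsets.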


















