How do I fix/edit this regular expression?
lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS]
ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG
lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS]
ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA
I want to extract the word after locus_tag= (only LBPC_RS14705 and LBPC_RS14710). How do I fix this regular expression?
[locus_tag][=]\w+
javascript regex
asked 21 hours ago by Glufflix (new contributor); edited 20 hours ago by quant
What exactly do you want as your desired output: LBPC_RS14705, or the whole text after it?
– rv7
21 hours ago
@rv7 Only LBPC_RS14705 and LBPC_RS14710.
– Glufflix
21 hours ago
You need a capturing group around your \w+.
– rv7
21 hours ago
@rv7 Thanks a lot! It's very helpful for me.
– Glufflix
20 hours ago
@Glufflix I updated my answer to retrieve both tags.
– Nick Parsons
20 hours ago
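To see why the pattern in the question misbehaves, it may help to run it next to an escaped version. The sketch below is only an illustration: the header string is a shortened copy of the first record from the question, and the variable names are made up for this example. Unescaped square brackets form a character class, so [locus_tag] matches a single character rather than the literal text.

```javascript
// Shortened copy of the first header line from the question.
const header =
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580]";

// [locus_tag] is a character class: it matches ONE character out of
// l, o, c, u, s, _, t, a, g. The match therefore starts at the final "g".
const original = /[locus_tag][=]\w+/;
const originalMatch = header.match(original)[0];
console.log(originalMatch); // "g=LBPC_RS14705"

// Escaping the brackets matches them literally, and the capturing group
// around \w+ isolates the tag itself.
const fixed = /\[locus_tag=(\w+)\]/;
const fixedTag = header.match(fixed)[1];
console.log(fixedTag); // "LBPC_RS14705"
```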
2 Answers
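You can use the following regular expression to match the locus_tag:
/\[locus_tag=(\w+)\]/g
The brackets are escaped so that they match literally, and the capturing group (\w+) captures the word characters after "locus_tag=". Because the expression has the g flag, each call to .exec(str) resumes where the previous match ended, so calling .exec(str)[1] twice returns both tags in turn.
See a working example below: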
const str =
`lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG
lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA`;
const regex = /\[locus_tag=(\w+)\]/g;
console.log(regex.exec(str)[1]); // First call: returns the first tag
console.log(regex.exec(str)[1]); // Second call: resumes from regex.lastIndex and returns the second tag
answered 21 hours ago by Nick Parsons; edited 18 hours ago
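When the input holds an unknown number of records, calling exec a fixed number of times no longer fits. One possible variant, sketched here under the assumption that String.prototype.matchAll is available (ES2020 and later), collects every tag in one pass. The helper name extractLocusTags and the shortened sample records are hypothetical and only serve the illustration.

```javascript
// Hypothetical helper: returns every locus_tag value found in `text`.
function extractLocusTags(text) {
  const pattern = /\[locus_tag=(\w+)\]/g; // matchAll requires the g flag
  // Each match is an array-like result; index 1 holds the capturing group.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Shortened records modelled on the data in the question.
const sample = [
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [gbkey=CDS]",
  "ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGC",
  "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [gbkey=CDS]",
  "ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCT",
].join("\n");

const tags = extractLocusTags(sample);
console.log(tags); // [ 'LBPC_RS14705', 'LBPC_RS14710' ]
```

Building a fresh regular expression inside the function avoids carrying lastIndex state from one call to the next, which is an easy mistake to make with a shared global regex.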
You can also try either of the following approaches.
Here I've assumed that the locus tag consists only of word characters (letters, digits, and underscores), which is what \w+ matches.
Helpful link: https://javascript.info/regexp-groups
1st way
var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";
var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";
const regEx = /(locus_tag=(\w+))/; // group 1: "locus_tag=" plus the tag, group 2: the tag alone
var locus_tag1 = s1.match(regEx)[2];
var locus_tag2 = s2.match(regEx)[2];
console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710
2nd way
var s1 = "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [db_xref=GeneID:31583580] [protein=RepB family plasmid replication initiator protein] [protein_id=WP_003600377.1] [location=1..780] [gbkey=CDS] ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGCAGGCGTTGCTTAAACGCCAAGACTATCTTG";
var s2 = "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [db_xref=GeneID:31583581] [protein=DUF536 domain-containing protein] [protein_id=WP_016377574.1] [location=complement(1459..1956)] [gbkey=CDS] ATGAGTAAGACCATCAAAGAACTTGCAGAGGAATTGAGCTTATCTAAATCTGGTATTCGTAAATATCTAA";
const regEx = /(locus_tag=\w+)/; // match[0] is "locus_tag=" plus the tag; splitting on "=" keeps the tag
var locus_tag1 = s1.match(regEx)[0].split('=')[1];
var locus_tag2 = s2.match(regEx)[0].split('=')[1];
console.log(locus_tag1); // LBPC_RS14705
console.log(locus_tag2); // LBPC_RS14710
answered 18 hours ago by hygull; edited 18 hours ago
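Both approaches index straight into the result of match, which is null for a line without a locus_tag (for example a sequence line), so the lookup would throw a TypeError there. A guarded version could look like the sketch below. The helper name locusTagOf and the shortened sample lines are hypothetical, and a single capturing group is used instead of the nested one.

```javascript
// Hypothetical helper: returns the locus tag on one line, or null if the line has none.
function locusTagOf(line) {
  const m = line.match(/locus_tag=(\w+)/);
  return m ? m[1] : null; // match() returns null when nothing matches
}

// Shortened lines modelled on the data in the question.
const lines = [
  "lcl|NZ_AP012542.1_cds_WP_003600377.1_1 [locus_tag=LBPC_RS14705] [gbkey=CDS]",
  "ATGGCAAATACAATCAACAAAAAACAAAATCTGGCGATGC",
  "lcl|NZ_AP012542.1_cds_WP_016377574.1_2 [locus_tag=LBPC_RS14710] [gbkey=CDS]",
];

// Keep only the lines that actually carry a tag.
const found = lines.map(locusTagOf).filter((tag) => tag !== null);
console.log(found); // [ 'LBPC_RS14705', 'LBPC_RS14710' ]
```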