Create JSON with XML file using BeautifulSoup
up vote
0
down vote
favorite
I am using Jupyer notebook, running python 3. My task is to extract data from XML file and convert it to json format (perhaps even save the json in an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>One km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>
This is the following output that I am expecting in JSON:
{
"thesaurus": [
{
"Description": "The standard airgun calibre for international target shooting.",
"RelatedTerms": [
{
"Relationship": "Narrower Term",
"Title": "Shooting sport equipment"
}
],
"Title": ".177 (4.5mm) Airgun"
},
{
"Description": "test2",
"RelatedTerms": [
{
"Relationship": "Used For",
"Title": "1 Kilometre TT"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km TT"
},
{
"Relationship": "Used For",
"Title": "One km Time Trial"
}
],
"Title": "1 Kilometre Time Trial"
},
I am navigating through the tags so that I can create dictionaries as seen in the output example. Since I am new to text scraping, this is quite frustrating.
I was able to extract the "Description" tag with the following code:
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, encoding="utf8"),"xml")
elements = btree.find_all('Description')
descriptionTag =
for element in elements:
descriptionTag.append(element.text)
Like the above Description tag, I am not sure how to create a list of dictionaries for the information stored between the "RelatedTerms" tag.
Ideally, I would parse all the tags to a dataframe which would then convert the data to JSON format.
So, can someone please help in determining how to extract the information from "RelatedTerms" tag.
json xml beautifulsoup
add a comment |
up vote
0
down vote
favorite
I am using Jupyer notebook, running python 3. My task is to extract data from XML file and convert it to json format (perhaps even save the json in an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>One km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>
This is the following output that I am expecting in JSON:
{
"thesaurus": [
{
"Description": "The standard airgun calibre for international target shooting.",
"RelatedTerms": [
{
"Relationship": "Narrower Term",
"Title": "Shooting sport equipment"
}
],
"Title": ".177 (4.5mm) Airgun"
},
{
"Description": "test2",
"RelatedTerms": [
{
"Relationship": "Used For",
"Title": "1 Kilometre TT"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km TT"
},
{
"Relationship": "Used For",
"Title": "One km Time Trial"
}
],
"Title": "1 Kilometre Time Trial"
},
I am navigating through the tags so that I can create dictionaries as seen in the output example. Since I am new to text scraping, this is quite frustrating.
I was able to extract the "Description" tag with the following code:
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, encoding="utf8"),"xml")
elements = btree.find_all('Description')
descriptionTag =
for element in elements:
descriptionTag.append(element.text)
Like the above Description tag, I am not sure how to create a list of dictionaries for the information stored between the "RelatedTerms" tag.
Ideally, I would parse all the tags to a dataframe which would then convert the data to JSON format.
So, can someone please help in determining how to extract the information from "RelatedTerms" tag.
json xml beautifulsoup
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am using Jupyer notebook, running python 3. My task is to extract data from XML file and convert it to json format (perhaps even save the json in an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>One km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>
This is the following output that I am expecting in JSON:
{
"thesaurus": [
{
"Description": "The standard airgun calibre for international target shooting.",
"RelatedTerms": [
{
"Relationship": "Narrower Term",
"Title": "Shooting sport equipment"
}
],
"Title": ".177 (4.5mm) Airgun"
},
{
"Description": "test2",
"RelatedTerms": [
{
"Relationship": "Used For",
"Title": "1 Kilometre TT"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km TT"
},
{
"Relationship": "Used For",
"Title": "One km Time Trial"
}
],
"Title": "1 Kilometre Time Trial"
},
I am navigating through the tags so that I can create dictionaries as seen in the output example. Since I am new to text scraping, this is quite frustrating.
I was able to extract the "Description" tag with the following code:
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, encoding="utf8"),"xml")
elements = btree.find_all('Description')
descriptionTag =
for element in elements:
descriptionTag.append(element.text)
Like the above Description tag, I am not sure how to create a list of dictionaries for the information stored between the "RelatedTerms" tag.
Ideally, I would parse all the tags to a dataframe which would then convert the data to JSON format.
So, can someone please help in determining how to extract the information from "RelatedTerms" tag.
json xml beautifulsoup
I am using Jupyer notebook, running python 3. My task is to extract data from XML file and convert it to json format (perhaps even save the json in an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>One km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>
This is the following output that I am expecting in JSON:
{
"thesaurus": [
{
"Description": "The standard airgun calibre for international target shooting.",
"RelatedTerms": [
{
"Relationship": "Narrower Term",
"Title": "Shooting sport equipment"
}
],
"Title": ".177 (4.5mm) Airgun"
},
{
"Description": "test2",
"RelatedTerms": [
{
"Relationship": "Used For",
"Title": "1 Kilometre TT"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km Time Trial"
},
{
"Relationship": "Used For",
"Title": "1km TT"
},
{
"Relationship": "Used For",
"Title": "One km Time Trial"
}
],
"Title": "1 Kilometre Time Trial"
},
I am navigating through the tags so that I can create dictionaries as seen in the output example. Since I am new to text scraping, this is quite frustrating.
I was able to extract the "Description" tag with the following code:
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, encoding="utf8"),"xml")
elements = btree.find_all('Description')
descriptionTag =
for element in elements:
descriptionTag.append(element.text)
Like the above Description tag, I am not sure how to create a list of dictionaries for the information stored between the "RelatedTerms" tag.
Ideally, I would parse all the tags to a dataframe which would then convert the data to JSON format.
So, can someone please help in determining how to extract the information from "RelatedTerms" tag.
json xml beautifulsoup
json xml beautifulsoup
asked 16 hours ago
Timetraveller
117114
117114
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
to extract RelatedTerms first you have to extract top Term element using btree.select('Terms > Term') now you can loop it and extract Term inside RelatedTerms using term.select('RelatedTerms > Term')
import json
from bs4 import BeautifulSoup
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, 'r'), "xml")
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": }
for term in Terms:
termDetail = {
"Description": term.find('Description').text,
"Title": term.find('Title').text
}
RelatedTerms = term.select('RelatedTerms > Term')
if RelatedTerms:
termDetail["RelatedTerms"] =
for rterm in RelatedTerms:
termDetail["RelatedTerms"].append({
"Title": rterm.find('Title').text,
"Relationship": rterm.find('Relationship').text
})
jsonObj["thesaurus"].append(termDetail)
print json.dumps(jsonObj, indent=4)
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
to extract RelatedTerms first you have to extract top Term element using btree.select('Terms > Term') now you can loop it and extract Term inside RelatedTerms using term.select('RelatedTerms > Term')
import json
from bs4 import BeautifulSoup
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, 'r'), "xml")
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": }
for term in Terms:
termDetail = {
"Description": term.find('Description').text,
"Title": term.find('Title').text
}
RelatedTerms = term.select('RelatedTerms > Term')
if RelatedTerms:
termDetail["RelatedTerms"] =
for rterm in RelatedTerms:
termDetail["RelatedTerms"].append({
"Title": rterm.find('Title').text,
"Relationship": rterm.find('Relationship').text
})
jsonObj["thesaurus"].append(termDetail)
print json.dumps(jsonObj, indent=4)
add a comment |
up vote
0
down vote
to extract RelatedTerms first you have to extract top Term element using btree.select('Terms > Term') now you can loop it and extract Term inside RelatedTerms using term.select('RelatedTerms > Term')
import json
from bs4 import BeautifulSoup
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, 'r'), "xml")
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": }
for term in Terms:
termDetail = {
"Description": term.find('Description').text,
"Title": term.find('Title').text
}
RelatedTerms = term.select('RelatedTerms > Term')
if RelatedTerms:
termDetail["RelatedTerms"] =
for rterm in RelatedTerms:
termDetail["RelatedTerms"].append({
"Title": rterm.find('Title').text,
"Relationship": rterm.find('Relationship').text
})
jsonObj["thesaurus"].append(termDetail)
print json.dumps(jsonObj, indent=4)
add a comment |
up vote
0
down vote
up vote
0
down vote
to extract RelatedTerms first you have to extract top Term element using btree.select('Terms > Term') now you can loop it and extract Term inside RelatedTerms using term.select('RelatedTerms > Term')
import json
from bs4 import BeautifulSoup
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, 'r'), "xml")
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": }
for term in Terms:
termDetail = {
"Description": term.find('Description').text,
"Title": term.find('Title').text
}
RelatedTerms = term.select('RelatedTerms > Term')
if RelatedTerms:
termDetail["RelatedTerms"] =
for rterm in RelatedTerms:
termDetail["RelatedTerms"].append({
"Title": rterm.find('Title').text,
"Relationship": rterm.find('Relationship').text
})
jsonObj["thesaurus"].append(termDetail)
print json.dumps(jsonObj, indent=4)
to extract RelatedTerms first you have to extract top Term element using btree.select('Terms > Term') now you can loop it and extract Term inside RelatedTerms using term.select('RelatedTerms > Term')
import json
from bs4 import BeautifulSoup
xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, 'r'), "xml")
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": }
for term in Terms:
termDetail = {
"Description": term.find('Description').text,
"Title": term.find('Title').text
}
RelatedTerms = term.select('RelatedTerms > Term')
if RelatedTerms:
termDetail["RelatedTerms"] =
for rterm in RelatedTerms:
termDetail["RelatedTerms"].append({
"Title": rterm.find('Title').text,
"Relationship": rterm.find('Relationship').text
})
jsonObj["thesaurus"].append(termDetail)
print json.dumps(jsonObj, indent=4)
edited 13 hours ago
answered 14 hours ago
ewwink
5,33922231
5,33922231
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53237663%2fcreate-json-with-xml-file-using-beautifulsoup%23new-answer', 'question_page');
}
);
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password