Create JSON with XML file using BeautifulSoup











I am using a Jupyter notebook running Python 3. My task is to extract data from an XML file and convert it to JSON format (perhaps even save the JSON to an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:



<?xml version='1.0' encoding='UTF-8'?> 
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>One km Time Trial</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>


This is the output that I am expecting in JSON:



{
    "thesaurus": [
        {
            "Description": "The standard airgun calibre for international target shooting.",
            "RelatedTerms": [
                {
                    "Relationship": "Narrower Term",
                    "Title": "Shooting sport equipment"
                }
            ],
            "Title": ".177 (4.5mm) Airgun"
        },
        {
            "Description": "test2",
            "RelatedTerms": [
                {
                    "Relationship": "Used For",
                    "Title": "1 Kilometre TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "One km Time Trial"
                }
            ],
            "Title": "1 Kilometre Time Trial"
        },


I am navigating through the tags so that I can create dictionaries as seen in the output example. Since I am new to text scraping, this is quite frustrating.



I was able to extract the "Description" tag with the following code:



from bs4 import BeautifulSoup

xml_file = './xml.xml'
btree = BeautifulSoup(open(xml_file, encoding="utf8"), "xml")
elements = btree.find_all('Description')
descriptionTag = []
for element in elements:
    descriptionTag.append(element.text)


As with the Description tag above, I am not sure how to create a list of dictionaries for the information stored inside the "RelatedTerms" tag.
Ideally, I would parse all the tags into a DataFrame, which would then convert the data to JSON format.



So, can someone please help me determine how to extract the information from the "RelatedTerms" tag?










json xml beautifulsoup

asked 16 hours ago
Timetraveller
























1 Answer

To extract RelatedTerms, first select the top-level Term elements with btree.select('Terms > Term'). You can then loop over them and extract the Term elements inside each RelatedTerms using term.select('RelatedTerms > Term').



import json
from bs4 import BeautifulSoup

xml_file = './xml.xml'
# The "xml" parser needs lxml installed.
with open(xml_file, encoding="utf8") as f:
    btree = BeautifulSoup(f, "xml")

# Only the top-level Term elements, not the ones nested in RelatedTerms.
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": []}

for term in Terms:
    termDetail = {
        # Collapse the line break inside the first Description into a space.
        "Description": " ".join(term.find('Description').text.split()),
        "Title": term.find('Title').text
    }
    RelatedTerms = term.select('RelatedTerms > Term')
    if RelatedTerms:
        termDetail["RelatedTerms"] = []
        for rterm in RelatedTerms:
            termDetail["RelatedTerms"].append({
                "Title": rterm.find('Title').text,
                "Relationship": rterm.find('Relationship').text
            })
    jsonObj["thesaurus"].append(termDetail)

# Python 3: print is a function.
print(json.dumps(jsonObj, indent=4))
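
The question also mentions saving the result to an output.dat file. As a rough sketch of that step, the example below repeats the same two-level traversal (top-level Term, then Term inside RelatedTerms) and writes the result with json.dump. Note that it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so it runs without third-party packages. The sample XML is a shortened copy of the question's data with a closing </Terms> added, and the helper names clean, terms_to_dict and save_json, as well as writing into the temp directory, are assumptions made for illustration.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the question's data (closing </Terms> added).
SAMPLE_XML = """<?xml version='1.0' encoding='UTF-8'?>
<Terms>
<Term>
<Title>.177 (4.5mm) Airgun</Title>
<Description>The standard airgun calibre for international target
shooting.</Description>
<RelatedTerms>
<Term>
<Title>Shooting sport equipment</Title>
<Relationship>Narrower Term</Relationship>
</Term>
</RelatedTerms>
</Term>
<Term>
<Title>1 Kilometre Time Trial</Title>
<Description>test2</Description>
<RelatedTerms>
<Term>
<Title>1 Kilometre TT</Title>
<Relationship>Used For</Relationship>
</Term>
<Term>
<Title>1km TT</Title>
<Relationship>Used For</Relationship>
</Term>
</RelatedTerms>
</Term>
</Terms>"""


def clean(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join((text or "").split())


def terms_to_dict(xml_bytes):
    """Build the {"thesaurus": [...]} structure from the XML document."""
    root = ET.fromstring(xml_bytes)  # root is the <Terms> element
    thesaurus = []
    for term in root.findall("Term"):  # direct children of <Terms> only
        detail = {
            "Description": clean(term.findtext("Description")),
            "Title": clean(term.findtext("Title")),
        }
        related = term.findall("RelatedTerms/Term")
        if related:
            detail["RelatedTerms"] = [
                {
                    "Title": clean(r.findtext("Title")),
                    "Relationship": clean(r.findtext("Relationship")),
                }
                for r in related
            ]
        thesaurus.append(detail)
    return {"thesaurus": thesaurus}


def save_json(obj, path):
    """Write obj to path as indented JSON."""
    with open(path, "w", encoding="utf8") as fh:
        json.dump(obj, fh, indent=4)


data = terms_to_dict(SAMPLE_XML.encode("utf-8"))
# Written to the temp directory here; use any path you like, e.g. "output.dat".
out_path = os.path.join(tempfile.gettempdir(), "output.dat")
save_json(data, out_path)
print(json.dumps(data, indent=4))
```

With the BeautifulSoup version above, the same save_json call should work on jsonObj, since it is an ordinary dictionary by the time it is dumped.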





edited 13 hours ago
answered 14 hours ago
ewwink