Java: unknown characters ン □ in XML response























I am currently trying to extract a list of airports from an XML file (accessible at this address: http://mobilite.euroairport.com/services/getDepartureAirports?language=French).
My problem is that the 'Ü' that should appear in "DÜSSELDORF" cannot be read (not even directly in IE or Firefox).
I get something like this:
D□SSELDORF or D SSELDORF or D?SSELDORF



The following is the code I used to try to determine the encoding of this file (n is the string that contains "DÜSSELDORF"):



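    // n is the String that should contain "DÜSSELDORF".
    // getBytes(String) throws the checked UnsupportedEncodingException.
    byte[] bytes = n.getBytes(); // platform default charset (UTF-8 on Android)
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
    bytes = n.getBytes("ASCII");
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
    bytes = n.getBytes("Cp1252");
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
    bytes = n.getBytes("UTF-8");
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
    bytes = n.getBytes("ISO8859_1");
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
    bytes = n.getBytes("ISO8859_2");
    Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);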


And this is the result (in Logcat on Android):



    10-08 09:41:30.557: W/tagtag(1506): D □ ン
    10-08 09:41:30.557: W/tagtag(1506): D ? S
    10-08 09:41:30.567: W/tagtag(1506): D ン S
    10-08 09:41:30.567: W/tagtag(1506): D □ ン
    10-08 09:41:30.577: W/tagtag(1506): D ン S
    10-08 09:41:30.637: W/tagtag(1506): D ン S


My question is: am I making a mistake while reading this string, or is the problem on the server side?
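As an aside on the diagnostic code above: casting a byte straight to char sign-extends it, so the byte 0x9D (-99 as a signed byte) becomes U+FF9D, HALFWIDTH KATAKANA LETTER N (ﾝ), which is very likely the ン in the Logcat output (the C2 byte similarly becomes U+FFC2, presumably drawn as □ for lack of a glyph). The sketch below prints code points and masks bytes with 0xFF instead. The class and method names are hypothetical, and the sample value "D\u009DSS" is an assumption inferred from the Logcat lines, where the UTF-8 bytes start 44 C2 9D and the Cp1252 bytes start 44 9D 53. It needs Java 7+ (Android API 19+) for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class InspectString {

    // Lists every UTF-16 char of s as a code point, e.g. "U+0044 U+009D".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    // Lists the UTF-8 bytes of s in hex. "b & 0xFF" turns the signed byte
    // into 0..255, so 0x9D prints as "9d" instead of being sign-extended.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: U+009D sits where U+00DC (the umlaut U) should be.
        String n = "D\u009DSS";
        System.out.println(codePoints(n)); // U+0044 U+009D U+0053 U+0053
        System.out.println(utf8Hex(n));    // 44 c2 9d 53 53

        // What the original Log.w calls did: (char) of the byte 0x9D (-99)
        // sign-extends to U+FF9D, HALFWIDTH KATAKANA LETTER N.
        byte raw = (byte) 0x9D;
        System.out.println(String.format("U+%04X", (int) (char) raw)); // U+FF9D
    }
}
```

Printing code points rather than raw chars makes an invisible control character show up as an explicit value instead of a blank, a box, or an unrelated glyph.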










java html xml encoding utf-8














edited Nov 11 at 3:35 by Cœur
asked Oct 8 '12 at 14:59 by sam








  • Seems like an encoding issue. I believe it could be useful to look at the related links in the sidebar.
    – Johan Sjöberg
    Oct 8 '12 at 15:05

  • On the provided link, it returns DSSELDORF for DUS? Apart from that, I inspected the traffic with Wireshark, and the server returns "Content-Type: application/xml;charset=UTF-8\r\n", which AFAIK means that the server is misconfigured.
    – linski
    Oct 8 '12 at 15:10

  • Yes, it returns DSSELDORF. That's why I said that even IE or Firefox can't read it. About your advice, Johan: I don't understand where you want me to look.
    – sam
    Oct 8 '12 at 15:12

  • Hmm... yes, but DSSELDORF is not the same as "something like this: D□SSELDORF or D SSELDORF or D?SSELDORF". I think @linski is right in that the file does not have any character for that letter, and the general upper-case nature of the letters makes me think whoever designed the format intended the field to be a unique and identifiable representation of an airport, rather than the correct name in the local text.
    – Andrew Thompson
    Oct 8 '12 at 15:16

  • Sorry, I didn't think of that.
    – sam
    Oct 9 '12 at 15:11


























2 Answers
Accepted answer




















Definitely a server-side (data service) misconfiguration issue or bug.



The server returns this header line in its HTTP response for the XML:

    Content-Type: application/xml;charset=UTF-8\r\n


I inspected a byte dump of the XML; this is how Wireshark represents "DSSELDORF":

    D..SSELDORF

In the hex dump (see a UTF-8 code table for the hex values c2 9d):

    44 c2 9d 53 53


which would be:

    44 - D
    53 - S

and C2 9D is the UTF-8 encoding of U+009D, which gets interpreted as a control character, also known as a non-printable character. Hence the "missing" Ü, which also explains your Logcat output.
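To double-check the dump, the short sketch below (class and method names are made up for illustration) decodes the bytes 44 c2 9d 53 53 as UTF-8, the charset the header declares. The result has four characters, and the second one is U+009D, which Java's Character.isISOControl reports as a control character. A correctly encoded Ü (U+00DC) would instead appear as c3 9c.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDump {

    // Turns a space-separated hex dump such as "44 c2 9d" into bytes and
    // decodes them as UTF-8, the charset the Content-Type header declares.
    static String decodeUtf8Hex(String hexDump) {
        String[] parts = hexDump.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = decodeUtf8Hex("44 c2 9d 53 53");
        char second = received.charAt(1);
        System.out.println(received.length());                     // 4
        System.out.println(String.format("U+%04X", (int) second)); // U+009D
        System.out.println(Character.isISOControl(second));        // true

        // For comparison, U+00DC encoded correctly in UTF-8 is c3 9c.
        String expected = decodeUtf8Hex("44 c3 9c 53 53");
        System.out.println(expected.equals("D\u00DCSS"));          // true
    }
}
```

Since these bytes are valid UTF-8 and match the declared charset, a client that honors the header decodes them faithfully, which suggests the wrong character is already in the data the service sends.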















edited Oct 8 '12 at 15:30
answered Oct 8 '12 at 15:24 by linski












  • Thanks! It was exactly what I guessed, but I don't know much about servers and encoding.
    – sam
    Oct 8 '12 at 17:56

  • np, glad to help! :)
    – linski
    Oct 8 '12 at 18:02






























I think I found the problem.
I managed to get access to the web service code, and I discovered that the file used to update the web service's database is encoded in ANSI.
This file is read using this code:

    InputStreamReader input = new InputStreamReader(new FileInputStream("vols"), "UTF-8");
    BufferedReader buffer = new BufferedReader(input);

I guess the problem is here, so I will ask the client to change the encoding of this file, but I am not sure it is the only cause of my problem.
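If the file cannot be re-encoded, the service could instead read it with the charset it was saved in. Below is a minimal sketch, assuming "ANSI" here means Windows-1252 (the usual ANSI code page on Western European Windows; this is an assumption, the answer does not say which code page). The firstLine helper and the temporary file standing in for "vols" are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAnsiFile {

    // Assumption: "ANSI" means Windows-1252. Adjust if the file uses another code page.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Same reading pattern as the web service, but with an explicit charset
    // that matches how the file was actually saved.
    static String firstLine(String path, Charset charset) throws IOException {
        try (BufferedReader buffer = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            return buffer.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // In Windows-1252, U+00DC is the single byte 0xDC.
        byte[] ansiBytes = {0x44, (byte) 0xDC, 0x53, 0x53};

        // Decoded with the matching charset, the letter survives.
        System.out.println(new String(ansiBytes, ANSI).equals("D\u00DCSS")); // true

        // Decoded as UTF-8, the lone 0xDC is malformed and Java substitutes U+FFFD.
        System.out.println(new String(ansiBytes, StandardCharsets.UTF_8).equals("D\uFFFDSS")); // true

        // Round trip through a temporary file standing in for "vols".
        Path tmp = Files.createTempFile("vols", ".txt");
        Files.write(tmp, ansiBytes);
        System.out.println(firstLine(tmp.toString(), ANSI).equals("D\u00DCSS")); // true
        Files.delete(tmp);
    }
}
```

In this sketch the UTF-8 decode turns the lone 0xDC into U+FFFD rather than the U+009D seen in the response, which fits the doubt above that this may not be the only cause.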



Thanks for your help, guys.

















answered Oct 17 '12 at 12:55 by sam






























             
