Java, Unknown characters ﾝ □ Xml response


I am trying to extract a list of airports from an XML file (accessible at this address: http://mobilite.euroairport.com/services/getDepartureAirports?language=French).
My problem is that the 'Ü' that should appear in "DÜSSELDORF" is unreadable (even IE or Firefox cannot display it when opening the URL directly).
I get something like this:
D□SSELDORF or D SSELDORF or D?SSELDORF

The following is the code that I used to try to find the encoding of this file (n is the string that contains "DÜSSELDORF"):

        // Encode n with several charsets and log the first three bytes of each result.
        byte[] bytes = n.getBytes();  // platform default charset
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

        bytes = n.getBytes("ASCII");
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

        bytes = n.getBytes("Cp1252");
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

        bytes = n.getBytes("UTF-8");
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

        bytes = n.getBytes("ISO8859_1");
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

        bytes = n.getBytes("ISO8859_2");
        Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);

And this is the result (in Logcat on Android):

        10-08 09:41:30.557: W/tagtag(1506): D □ ﾝ
        10-08 09:41:30.557: W/tagtag(1506): D ? S
        10-08 09:41:30.567: W/tagtag(1506): D ﾝ S
        10-08 09:41:30.567: W/tagtag(1506): D □ ﾝ
        10-08 09:41:30.577: W/tagtag(1506): D ﾝ S
        10-08 09:41:30.637: W/tagtag(1506): D ﾝ S

My question is: am I making a mistake when reading this string, or is the problem on the server side?

edited Nov 11 at 3:35 by Cœur

asked Oct 8 '12 at 14:59 by sam

Seems like an encoding issue. I believe it could be useful to look at the related links ----->
– Johan Sjöberg
Oct 8 '12 at 15:05

On the provided link, it returns DSSELDORF for DUS? Apart from that, I inspected the traffic with Wireshark, and the server returns "Content-Type: application/xml;charset=UTF-8\r\n", which AFAIK means that the server is misconfigured.
– linski
Oct 8 '12 at 15:10

Yes, it returns DSSELDORF. That's why I said that even IE or Firefox can't read it. About your advice, Johan, I don't understand where you want me to look?
– sam
Oct 8 '12 at 15:12

Hmm.. yes, but DSSELDORF is not the same as "something like this : D□SSELDORF or D SSELDORF or D?SSELDORF". I think @linski is right in that the file does not have any character for that letter, and the general upper-case nature of the letters makes me think whoever designed the format intended the field to be a unique and identifiable representation of an airport, rather than the correct name in the local text.
– Andrew Thompson
Oct 8 '12 at 15:16

Sorry, I didn't think.
– sam
Oct 9 '12 at 15:11

java html xml encoding utf-8


2 Answers

accepted

Definitely a server (data service) misconfiguration issue or bug.

The server returns this header line in the HTTP response:

Content-Type: application/xml;charset=UTF-8\r\n

I just inspected a byte dump of the XML; this is how Wireshark represents "DSSELDORF":

D..SSELDORF

In the hex dump (see a UTF-8 code table for the hex value c2 9d):

44 c2 9d 53 53

which would be:

44 - D
53 - S

and c2 9d is the UTF-8 encoding of U+009D, which gets interpreted as a control character, also known as a non-printable character. Hence the "missing" Ü, which also explains your logcat output.
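To make the link to the logcat output concrete, here is a minimal sketch in plain Java. It assumes the parsed string holds U+009D where the Ü should be (consistent with the hex dump above); the class name `ByteDump` and the helper `toHex` are made up for illustration. It shows two things: masking each byte with `0xFF` gives a readable hex dump, and casting a negative byte straight to `char` sign-extends it, which is likely where the ﾝ (U+FF9D) in the log comes from.

```java
import java.nio.charset.StandardCharsets;

public class ByteDump {
    // Formats each byte as two unsigned hex digits.
    // The "& 0xFF" mask undoes the sign extension of Java's signed bytes.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this mimics the string the parser produced from the response.
        String n = "D\u009DSSELDORF";
        byte[] utf8 = n.getBytes(StandardCharsets.UTF_8);

        // Hex dump of the UTF-8 bytes; starts with 44 C2 9D 53 53.
        System.out.println(toHex(utf8));

        // Byte 0x9D is -99 as a signed byte; casting it to char sign-extends to U+FF9D.
        char misleading = (char) utf8[2];
        System.out.println(Integer.toHexString(misleading)); // ff9d
    }
}
```

Logging `toHex(n.getBytes("UTF-8"))` instead of casting bytes to `char` avoids the misleading glyphs when diagnosing this kind of problem.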

edited Oct 8 '12 at 15:30

answered Oct 8 '12 at 15:24 by linski

Thanks! It was exactly what I guessed, but I don't know much about servers and encoding.
– sam
Oct 8 '12 at 17:56

np, glad to help! :)
– linski
Oct 8 '12 at 18:02

I think that I found the problem.
I managed to get access to the web service code and discovered that the file used to update the web service's database is encoded in ANSI.
This file is read using this code:

InputStreamReader input = new InputStreamReader(new FileInputStream("vols"), "UTF-8");

BufferedReader buffer = new BufferedReader(input);

I guess that the problem is here, so I will ask the client to change the encoding of this file, but I am not sure that it is the only cause of my problem.

Thanks for your help, guys.
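As a rough illustration of this kind of mismatch, the sketch below decodes the same bytes with the wrong and the right charset. It is only a sketch under assumptions: the bytes are built in memory instead of being read from the "vols" file, ISO-8859-1 stands in for "ANSI" (Ü is the byte 0xDC in both ISO-8859-1 and windows-1252), and the class and method names are invented. Note that this produces U+FFFD rather than the c2 9d seen on the wire, so it shows the general failure mode and does not necessarily reproduce the exact server behaviour.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Decodes raw bytes with the given charset; malformed input becomes U+FFFD.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: the file content as single-byte "ANSI" text, where Ü is 0xDC.
        byte[] raw = "D\u00DCSSELDORF".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: 0xDC followed by 'S' is not valid UTF-8, so the Ü is lost.
        String wrong = decode(raw, StandardCharsets.UTF_8);

        // Matching charset: the Ü survives.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals("D\uFFFDSSELDORF")); // true
        System.out.println(right.equals("D\u00DCSSELDORF")); // true
    }
}
```

The same idea applies to the reader above: the charset passed to `InputStreamReader` has to match the encoding the file was actually saved in, whichever side ends up being changed.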

answered Oct 17 '12 at 12:55 by sam
