Java: unknown characters ン □ in XML response
I am currently trying to extract a list of airports from an XML file available at this address: http://mobilite.euroairport.com/services/getDepartureAirports?language=French
My problem is that the 'Ü' that should appear in "DÜSSELDORF" cannot be read, not even when I open the file directly in IE or Firefox.
Instead I get something like D□SSELDORF, D SSELDORF or D?SSELDORF.
This is the code I used to try to work out the encoding of the file (n is the string that contains "DÜSSELDORF"):
byte[] bytes = n.getBytes(); // platform default charset
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
bytes = n.getBytes("ASCII");
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
bytes = n.getBytes("Cp1252");
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
bytes = n.getBytes("UTF-8");
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
bytes = n.getBytes("ISO8859_1");
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
bytes = n.getBytes("ISO8859_2");
Log.w("tagtag", (char) bytes[0] + " " + (char) bytes[1] + " " + (char) bytes[2]);
And this is the result (in Logcat on Android):
10-08 09:41:30.557: W/tagtag(1506): D □ ン
10-08 09:41:30.557: W/tagtag(1506): D ? S
10-08 09:41:30.567: W/tagtag(1506): D ン S
10-08 09:41:30.567: W/tagtag(1506): D □ ン
10-08 09:41:30.577: W/tagtag(1506): D ン S
10-08 09:41:30.637: W/tagtag(1506): D ン S
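A byte dump like the one above is easier to interpret if each byte is printed in hex instead of being cast to char: casting a byte of 0x80 or above to char sign-extends it into an unrelated character in the U+FF80 to U+FFFF range. The following is a minimal sketch (the class and method names are made up for illustration, and System.out stands in for Log.w) that prints the UTF-16 code units and the UTF-8 bytes of a string. For a correct "DÜS" it shows U+00DC, which UTF-8 encodes as c3 9c.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDump {
    // Formats bytes as two-digit hex. Masking with 0xFF keeps each value in
    // 0..255 instead of letting a negative byte sign-extend.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Lists each UTF-16 code unit of the string as U+XXXX; no charset is involved.
    static String toCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expected = "D\u00DCS"; // first three letters of "DÜSSELDORF"
        System.out.println(toCodeUnits(expected)); // U+0044 U+00DC U+0053
        System.out.println(toHex(expected.getBytes(StandardCharsets.UTF_8))); // 44 c3 9c 53
    }
}
```

Running toCodeUnits on the parsed n (and logging the result with Log.w on Android) would show directly which character ended up in place of the Ü.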
My question is: am I making a mistake while reading this string, or is the problem on the server side?
java html xml encoding utf-8
edited Nov 11 at 3:35 by Cœur
asked Oct 8 '12 at 14:59 by sam
Seems like an encoding issue. I believe it could be useful to look at the related links in the sidebar.
– Johan Sjöberg
Oct 8 '12 at 15:05
On the provided link, it returns DSSELDORF for DUS? Apart from that, I inspected the traffic with Wireshark, and the server returns "Content-Type: application/xml;charset=UTF-8\r\n", which AFAIK means that the server is misconfigured.
– linski
Oct 8 '12 at 15:10
Yes, it returns DSSELDORF. That's why I said that even IE or Firefox can't read it. About your advice, Johan: I don't understand where you want me to look.
– sam
Oct 8 '12 at 15:12
Hmm, yes, but DSSELDORF is not the same as "something like this: D□SSELDORF or D SSELDORF or D?SSELDORF". I think @linski is right that the file does not contain any character for that letter, and the all-upper-case letters make me think whoever designed the format intended the field to be a unique, identifiable representation of an airport, rather than its correct name in the local script.
– Andrew Thompson
Oct 8 '12 at 15:16
Sorry, I didn't think.
– sam
Oct 9 '12 at 15:11
2 Answers
This is definitely a server-side (data service) problem: a misconfiguration or a bug.
The server sends this header with the XML response:
Content-Type: application/xml;charset=UTF-8\r\n
I inspected a byte dump of the XML; this is how Wireshark shows "DSSELDORF":
D..SSELDORF
In the hex dump (see a UTF-8 code table for the value c2 9d):
44 c2 9d 53 53
which breaks down as:
44 - D
53 - S
c2 9d - the UTF-8 encoding of U+009D
U+009D is a control character, also known as a non-printable character. Hence the "missing" Ü, which also explains your logcat output.
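The byte sequence from the dump can be checked in a few lines of Java. This is a sketch: DUMP holds only the first five bytes shown above, and the class name is made up. It shows that c2 9d is well-formed UTF-8 for U+009D (so no replacement character appears), that US-ASCII cannot encode that character and substitutes '?', and that the (char) cast in the question turns the byte 0x9D into U+FF9D, HALFWIDTH KATAKANA LETTER N, which is likely the ン seen in the logcat output.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDump {
    // First five bytes of "D..SSELDORF" from the Wireshark hex dump.
    static final byte[] DUMP = {0x44, (byte) 0xC2, (byte) 0x9D, 0x53, 0x53};

    public static void main(String[] args) {
        // c2 9d is well-formed UTF-8, so it decodes without a replacement
        // character, to U+009D, a C1 control character with no glyph.
        String s = new String(DUMP, StandardCharsets.UTF_8);
        System.out.println(s.length());                          // 4
        System.out.println(Integer.toHexString(s.charAt(1)));    // 9d
        System.out.println(Character.isISOControl(s.charAt(1))); // true

        // US-ASCII cannot encode U+009D, so getBytes substitutes '?'.
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        System.out.println((char) ascii[1]);                     // ?

        // (byte) 0x9D is -99; casting it to char sign-extends it to U+FF9D.
        char cast = (char) DUMP[2];
        System.out.println(Integer.toHexString(cast));           // ff9d
    }
}
```

Because the decoded string really contains U+009D, every charset in the question's code either writes it back as the byte 0x9D or replaces it with '?', which is the pattern visible in the six log lines.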
edited Oct 8 '12 at 15:30
answered Oct 8 '12 at 15:24 by linski
Thanks! That was exactly what I guessed, but I don't know much about servers and encoding.
– sam
Oct 8 '12 at 17:56
np, glad to help! :)
– linski
Oct 8 '12 at 18:02
I think I found the problem.
I managed to get access to the web service code and discovered that the file used to update the web service's database (bdd) is encoded in ANSI.
This file is read with the following code:
InputStreamReader input = new InputStreamReader(new FileInputStream("vols"), "UTF-8");
BufferedReader buffer = new BufferedReader(input);
I guess the problem is here, so I will ask the client to change the encoding of this file, but I am not sure it is the only cause of my problem.
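If "ANSI" here means Windows-1252 (an assumption: it is the usual ANSI code page on Western European Windows systems), the mismatch can be reproduced without the real file. In this sketch the class and method names are made up and a byte array stands in for "vols": decoding with the charset the file was saved in restores the Ü, while decoding it as UTF-8 yields U+FFFD. On its own, that path would put U+FFFD (ef bf bd in UTF-8) rather than the c2 9d from the dump into the data, so there may indeed be another step involved.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiVsUtf8 {
    // Assumption: the "ANSI" file is Windows-1252, where Ü is the single byte 0xDC.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Reads the first line of a stream using the given charset.
    static String firstLine(InputStream in, Charset cs) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the "vols" file: the name saved as Windows-1252 bytes.
        byte[] fileBytes = "D\u00DCSSELDORF\n".getBytes(ANSI);

        // Decoding with the charset the file was written in restores the Ü.
        System.out.println(firstLine(new ByteArrayInputStream(fileBytes), ANSI));

        // As UTF-8, 0xDC starts a two-byte sequence, but 'S' (0x53) is not a
        // continuation byte, so InputStreamReader substitutes U+FFFD for 0xDC.
        String wrong = firstLine(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(wrong.charAt(1))); // fffd
    }
}
```

Applied to the server code above, this would mean passing "Cp1252" (or "windows-1252") instead of "UTF-8" to the InputStreamReader, or re-saving the file as UTF-8 as planned, provided the file really is Windows-1252.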
Thanks for your help, guys.
answered Oct 17 '12 at 12:55 by sam