Emacs displays chinese character if I open xml file

I have an xml-file. When I open it with Emacs it displays chinese characters (see attachment). This happens on my Windows 7 PC with Emacs and Notepad and also on my Windows XP (see figure A). Figure B is the hexl-mode of A.

enter image description here

If I use the Windows XP PC of a collegue and open the file with Notepad there are no chinese characters but a strange character character. I saved it as txt-file and sent it by email to my Windows7-PC (see figure C). The strange character was replaced with "?". (Due to restriction I could not use the PC of my collegue and reproduce the notepad file with the strange character).

My questions: it seems that there are characters in the XML-file which creates problems. I don't know how to cope with that. Does anybody has an idea how I can manage this problem? Does it have something to do with encoding? Thanks for hints.

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

add a comment |

enter image description here

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

add a comment |

enter image description here

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

enter image description here

emacs encoding

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

edited Nov 15 '18 at 9:26

asked Aug 19 '13 at 10:19

giordano

1,03821634

asked Aug 19 '13 at 10:19

giordano

1,03821634

asked Aug 19 '13 at 10:19

giordano

1,03821634

add a comment |

2 Answers
2

active

oldest

votes

By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.

Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.

However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.

If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.

Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

1

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

add a comment |

The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:

cut the part to convert

paste in a new file and save it

open it with an editor which can convert encodings

convert the file and save it

copy the converted string and add (paste) to the original file where you cut the part to convert

In my case it worked with Atom, but not with Notepad++.

PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.

Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18311399%2femacs-displays-chinese-character-if-i-open-xml-file%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.

However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

1

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

add a comment |

Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.

However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

1

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

add a comment |

Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.

However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.

However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

answered Aug 19 '13 at 11:17

legoscia

29.4k1185114

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

1

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

add a comment |

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

1

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

– giordano
Aug 19 '13 at 16:30

The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

– giordano
Aug 20 '13 at 6:29

add a comment |

The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:

cut the part to convert

paste in a new file and save it

open it with an editor which can convert encodings

convert the file and save it

copy the converted string and add (paste) to the original file where you cut the part to convert

In my case it worked with Atom, but not with Notepad++.

PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

add a comment |

The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:

cut the part to convert

paste in a new file and save it

open it with an editor which can convert encodings

convert the file and save it

copy the converted string and add (paste) to the original file where you cut the part to convert

In my case it worked with Atom, but not with Notepad++.

PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

add a comment |

The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:

cut the part to convert

paste in a new file and save it

open it with an editor which can convert encodings

convert the file and save it

copy the converted string and add (paste) to the original file where you cut the part to convert

In my case it worked with Atom, but not with Notepad++.

PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:

cut the part to convert

paste in a new file and save it

open it with an editor which can convert encodings

convert the file and save it

copy the converted string and add (paste) to the original file where you cut the part to convert

In my case it worked with Atom, but not with Notepad++.

PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

edited 13 hours ago

answered Jan 24 at 9:27

giordano

1,03821634

answered Jan 24 at 9:27

giordano

1,03821634

answered Jan 24 at 9:27

giordano

1,03821634

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vfrdtyky