Emacs displays chinese character if I open xml file












3















I have an xml-file. When I open it with Emacs it displays chinese characters (see attachment). This happens on my Windows 7 PC with Emacs and Notepad and also on my Windows XP (see figure A). Figure B is the hexl-mode of A.



enter image description here



If I use the Windows XP PC of a collegue and open the file with Notepad there are no chinese characters but a strange character character. I saved it as txt-file and sent it by email to my Windows7-PC (see figure C). The strange character was replaced with "?". (Due to restriction I could not use the PC of my collegue and reproduce the notepad file with the strange character).



My questions: it seems that there are characters in the XML-file which creates problems. I don't know how to cope with that. Does anybody has an idea how I can manage this problem? Does it have something to do with encoding? Thanks for hints.










share|improve this question





























    3















    I have an xml-file. When I open it with Emacs it displays chinese characters (see attachment). This happens on my Windows 7 PC with Emacs and Notepad and also on my Windows XP (see figure A). Figure B is the hexl-mode of A.



    enter image description here



    If I use the Windows XP PC of a collegue and open the file with Notepad there are no chinese characters but a strange character character. I saved it as txt-file and sent it by email to my Windows7-PC (see figure C). The strange character was replaced with "?". (Due to restriction I could not use the PC of my collegue and reproduce the notepad file with the strange character).



    My questions: it seems that there are characters in the XML-file which creates problems. I don't know how to cope with that. Does anybody has an idea how I can manage this problem? Does it have something to do with encoding? Thanks for hints.










    share|improve this question



























      3












      3








      3


      2






      I have an xml-file. When I open it with Emacs it displays chinese characters (see attachment). This happens on my Windows 7 PC with Emacs and Notepad and also on my Windows XP (see figure A). Figure B is the hexl-mode of A.



      enter image description here



      If I use the Windows XP PC of a collegue and open the file with Notepad there are no chinese characters but a strange character character. I saved it as txt-file and sent it by email to my Windows7-PC (see figure C). The strange character was replaced with "?". (Due to restriction I could not use the PC of my collegue and reproduce the notepad file with the strange character).



      My questions: it seems that there are characters in the XML-file which creates problems. I don't know how to cope with that. Does anybody has an idea how I can manage this problem? Does it have something to do with encoding? Thanks for hints.










      share|improve this question
















      I have an xml-file. When I open it with Emacs it displays chinese characters (see attachment). This happens on my Windows 7 PC with Emacs and Notepad and also on my Windows XP (see figure A). Figure B is the hexl-mode of A.



      enter image description here



      If I use the Windows XP PC of a collegue and open the file with Notepad there are no chinese characters but a strange character character. I saved it as txt-file and sent it by email to my Windows7-PC (see figure C). The strange character was replaced with "?". (Due to restriction I could not use the PC of my collegue and reproduce the notepad file with the strange character).



      My questions: it seems that there are characters in the XML-file which creates problems. I don't know how to cope with that. Does anybody has an idea how I can manage this problem? Does it have something to do with encoding? Thanks for hints.







      emacs encoding






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 15 '18 at 9:26







      giordano

















      asked Aug 19 '13 at 10:19









      giordanogiordano

      1,03821634




      1,03821634
























          2 Answers
          2






          active

          oldest

          votes


















          6














          By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.



          Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.



          However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.



          If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.



          Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.






          share|improve this answer
























          • Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

            – giordano
            Aug 19 '13 at 16:30






          • 1





            The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

            – giordano
            Aug 20 '13 at 6:29





















          0














          The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:




          1. cut the part to convert

          2. paste in a new file and save it

          3. open it with an editor which can convert encodings

          4. convert the file and save it

          5. copy the converted string and add (paste) to the original file where you cut the part to convert


          In my case it worked with Atom, but not with Notepad++.



          PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.



          Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.






          share|improve this answer

























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18311399%2femacs-displays-chinese-character-if-i-open-xml-file%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            6














            By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.



            Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.



            However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.



            If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.



            Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.






            share|improve this answer
























            • Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

              – giordano
              Aug 19 '13 at 16:30






            • 1





              The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

              – giordano
              Aug 20 '13 at 6:29


















            6














            By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.



            Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.



            However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.



            If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.



            Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.






            share|improve this answer
























            • Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

              – giordano
              Aug 19 '13 at 16:30






            • 1





              The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

              – giordano
              Aug 20 '13 at 6:29
















            6












            6








            6







            By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.



            Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.



            However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.



            If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.



            Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.






            share|improve this answer













            By figure B, it looks like this file is encoded with a mixture of big-endian and little-endian UTF-16. It starts with fe ff, which is the byte order mark for big-endian UTF-16, and the XML declaration (<?xml version=...) is also big-endian, but the part starting with <report is little-endian. You can tell because the letters appear on even positions in the first part of the hexl display, but on odd positions further down.



            Also, there is a null character (encoded as two bytes, 00 00) right before <report. Null characters are not allowed in XML documents.



            However, since some of the XML elements appear correctly in figure A, it seems that the confusion goes on through the file. The file is corrupt, and this probably needs to be resolved manually.



            If there are no non-ASCII characters in the file, I would try to open the file in Emacs as binary (M-x revert-buffer-with-coding-system and specify binary), remove all null bytes (M-% C-q C-@ RET RET), save the file and hope for the best.



            Another possible solution is to mark each region appearing with Chinese characters and recode it with M-x recode-region, giving "Text was really in" as utf-16-le and "But was interpreted as" as utf-16-be.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Aug 19 '13 at 11:17









            legoscialegoscia

            29.4k1185114




            29.4k1185114













            • Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

              – giordano
              Aug 19 '13 at 16:30






            • 1





              The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

              – giordano
              Aug 20 '13 at 6:29





















            • Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

              – giordano
              Aug 19 '13 at 16:30






            • 1





              The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

              – giordano
              Aug 20 '13 at 6:29



















            Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

            – giordano
            Aug 19 '13 at 16:30





            Thanks a lot for the answer. The first try (with revert buffer) gives me a lot of ^@,the second (recode region) does it. That means that I try to get an non corrupted file since. Thanks to your input I can gives some hints to the one from whom I get the file.

            – giordano
            Aug 19 '13 at 16:30




            1




            1





            The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

            – giordano
            Aug 20 '13 at 6:29







            The producer of the xml-file said that the encoding utf-16 would create this problem (as legoscia observed). After change to utf-8 the xml-file was readable. Obviously, enceoding is a major issue. It is a pity that something which has such an effect is not commuincated when files are shared.

            – giordano
            Aug 20 '13 at 6:29















            0














            The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:




            1. cut the part to convert

            2. paste in a new file and save it

            3. open it with an editor which can convert encodings

            4. convert the file and save it

            5. copy the converted string and add (paste) to the original file where you cut the part to convert


            In my case it worked with Atom, but not with Notepad++.



            PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.



            Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.






            share|improve this answer






























              0














              The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:




              1. cut the part to convert

              2. paste in a new file and save it

              3. open it with an editor which can convert encodings

              4. convert the file and save it

              5. copy the converted string and add (paste) to the original file where you cut the part to convert


              In my case it worked with Atom, but not with Notepad++.



              PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.



              Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.






              share|improve this answer




























                0












                0








                0







                The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:




                1. cut the part to convert

                2. paste in a new file and save it

                3. open it with an editor which can convert encodings

                4. convert the file and save it

                5. copy the converted string and add (paste) to the original file where you cut the part to convert


                In my case it worked with Atom, but not with Notepad++.



                PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.



                Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.






                share|improve this answer















                The solution of legoscia using the possibility of Emacs to change encoding within a file solved my problem. An other possibility is:




                1. cut the part to convert

                2. paste in a new file and save it

                3. open it with an editor which can convert encodings

                4. convert the file and save it

                5. copy the converted string and add (paste) to the original file where you cut the part to convert


                In my case it worked with Atom, but not with Notepad++.



                PS: The reason why I used this way is that Emacs could not open anymore this kind of corrupted files. I don't know why but this is another issue.



                Edit 1: Since copy, paste and merge is cumbersome I found the solution how to open currupted files with emacs: emacs -q xmlfile.xml. Using emacs like legoscia suggested is the best way to repair such files.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited 13 hours ago

























                answered Jan 24 at 9:27









                giordanogiordano

                1,03821634




                1,03821634






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18311399%2femacs-displays-chinese-character-if-i-open-xml-file%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    List item for chat from Array inside array React Native

                    Thiostrepton

                    Caerphilly