Why does an emoji have two different UTF-8 encodings? How do I convert emoji from UTF-8 using NSString on iOS?



























We have found that some emoji appear to have two different UTF-8 encodings, for example:



emoji   unicode   utf-8              another utf-8
😁      U+1F601   \xf0\x9f\x98\x81   \xed\xa0\xbd\xed\xb8\x81


But iOS can't decode the second form, so I get an error when I decode the string from UTF-8.



[image: iOS code]





In all the documents I found, only one UTF-8 encoding is listed for each emoji; I can't find the other one anywhere.



The documents I referenced include:



emoji code link



whole utf-8 code link



But in a web tool, bianma, both UTF-8 forms are converted into the emoji correctly.



[image: input code]

[image: output]





So, my questions are:




  1. Why are there two UTF-8 encodings for one emoji?

  2. Is there a document that lists both UTF-8 encodings?

  3. How do I correctly decode a string from UTF-8 using NSString on iOS?


































  • This had me intrigued as my first thought was that the long UTF-8 representation was two UTF-8 blocks. It turns out that there are two variations of UTF-8, CESU-8 and Modified UTF-8, which encode supplementary characters UTF-16 style, as surrogate pairs. You may be able to use this article iphonedevsdk.com/forum/iphone-sdk-development/… to write a decoder if there's no suitable iOS/Objective-C native decoder.

    – Alastair McCormack
    Dec 22 '15 at 11:33











  • @AlastairMcCormack That's the answer I think. You should post that as an answer.

    – roeland
    Dec 22 '15 at 22:23











  • @user692793 Please never post text as images, especially not code or output.

    – roeland
    Dec 22 '15 at 22:24











  • Thanks @roeland. I think a proper answer should contain some working code, but as I'm not an Objective-C coder I'll leave it to someone else to pick up the glory :)

    – Alastair McCormack
    Dec 22 '15 at 22:26
















ios unicode utf-8 nsstring emoji






asked Dec 22 '15 at 5:34 by pinchwang













2 Answers

































0xF0, 0x9F, 0x98, 0x81




Is the correct UTF-8 encoding for U+1F601 😁.




0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81




Is not a valid UTF-8 sequence(*). It should really be rejected; iOS is correct to do so.



This is a bug in the bianma tool: the convertUtf8BytesToUnicodeCodePoints function is more lenient about what input it accepts than the algorithm specified in, e.g., RFC 3629.
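To make the rejection concrete, here is a minimal C++ sketch of a strict well-formedness check in the spirit of RFC 3629. The names isStrictUtf8 and checkQuestionSequences are made up for illustration, and this is not how iOS or the bianma tool is actually implemented; it only shows why a strict decoder accepts the four-byte sequence and refuses the six-byte one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8 in the
// RFC 3629 sense (no overlong forms, no surrogates, nothing above U+10FFFF).
inline bool isStrictUtf8(const std::string& s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = p[i];
        if (b0 < 0x80) { ++i; continue; }                 // ASCII
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1Fu; }
        else if ((b0 & 0xF0) == 0xE0)      { len = 3; cp = b0 & 0x0Fu; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07u; }
        else return false;                 // stray continuation, C0/C1, F5..FF
        if (n - i < len) return false;     // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation
            cp = (cp << 6) | (p[i + k] & 0x3Fu);
        }
        if (len == 3 && cp < 0x800) return false;         // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += len;
    }
    return true;
}

// Usage sketch with the two byte sequences from the question.
inline void checkQuestionSequences() {
    assert(isStrictUtf8("\xF0\x9F\x98\x81"));            // U+1F601
    assert(!isStrictUtf8("\xED\xA0\xBD\xED\xB8\x81"));   // surrogate pair
}
```

The six-byte sequence fails at its first three bytes: ED A0 BD decodes to U+D83D, a surrogate code point, which RFC 3629 prohibits in UTF-8.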



This happens to return a working string only because the tool is written in JavaScript. Having decoded the above byte sequence to the bogus surrogate code point sequence U+D83D,U+DE01, it then converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. As this is the correct way to encode 😁 in a UTF-16 string, it appears to have worked.
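The surrogate arithmetic behind that accidental success can be sketched as follows. This is C++ rather than the tool's JavaScript, and combineSurrogates is a hypothetical helper name; the formula itself is the standard UTF-16 one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combines a UTF-16 high/low surrogate pair into the
// supplementary code point it represents.
inline std::uint32_t combineSurrogates(std::uint16_t hi, std::uint16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF);  // high (lead) surrogate
    assert(lo >= 0xDC00 && lo <= 0xDFFF);  // low (trail) surrogate
    return 0x10000u
         + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}
```

For the pair above, 0xD83D contributes 0x3D << 10 = 0xF400 and 0xDE01 contributes 0x201, so the result is 0x10000 + 0xF400 + 0x201 = 0x1F601.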



(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)
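For illustration, the sketch below encodes one code point both ways, which reproduces the two byte sequences from the question. The helper names (appendUtf8Bytes, encodeUtf8, encodeCesu8) are invented for this example and are not taken from any particular library. It is meant only to show where the six-byte form comes from, not as a recommendation to produce CESU-8.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends the UTF-8 style byte sequence for `cp` to `out`. It deliberately
// does not reject surrogates, because CESU-8 reuses it for surrogate halves.
inline void appendUtf8Bytes(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Standard UTF-8: one sequence per code point (4 bytes above U+FFFF).
inline std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    appendUtf8Bytes(out, cp);
    return out;
}

// CESU-8: a code point above U+FFFF is first split into a UTF-16 surrogate
// pair, and each half is then written as its own 3-byte sequence.
inline std::string encodeCesu8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x10000) {
        appendUtf8Bytes(out, cp);
    } else {
        const std::uint32_t v = cp - 0x10000;
        appendUtf8Bytes(out, 0xD800 + (v >> 10));    // high surrogate
        appendUtf8Bytes(out, 0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

With cp = 0x1F601, encodeUtf8 yields F0 9F 98 81 and encodeCesu8 yields ED A0 BD ED B8 81. For code points up to U+FFFF the two functions produce identical bytes, which is why the difference only shows up for characters such as emoji.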



You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data which you don't have the power to get fixed.
































  • Thank you very much for the answer. We read string data from our server, which is written in C++; the issue occurs after the server converts a Unicode string to UTF-8. One more thing to mention: when our client receives the data as a string value cstr, printf("%s", cstr) shows it correctly. But when we convert the string to an NSString with NSString *ocstr = [[NSString alloc] initWithBytes:cstr.c_str() length:cstr.length() encoding:NSUTF8StringEncoding]; ocstr is nil. Why doesn't Apple support CESU-8 sequences? Is there a function to resolve the issue?

    – pinchwang
    Dec 23 '15 at 2:12











  • I would first look at the C++ server UTF-8 encoder, to see if it can be fixed properly at source. CESU-8 is considered an undesirable anomaly that you'd never deliberately want to use; most systems don't support it. If you have to accept it you'll need to write your own CESU-8 decoder walking through the input byte array (or use an existing library, eg ICU though that would be a really heavy dependency just for this).

    – bobince
    Dec 23 '15 at 11:36











  • Just as a side note, there is one particularly bothersome source of encoding like this: JNI (Java Native Interface). If you attempt to retrieve "UTF-8" bytes from a Java string you will receive the "modified UTF-8" variant. That is a rather large source of malformed data that cannot be fixed, unfortunately.

    – borrrden
    Jul 12 '18 at 17:02
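If the server really cannot be fixed, the "walk through the input byte array" approach suggested in the comments above might look like this minimal C++ sketch. cesu8ToUtf8 is a hypothetical name, and the sketch makes several assumptions: it only repairs well-formed surrogate pairs, copies everything else through untouched (including lone surrogates, which a strict decoder would presumably still reject), and does not handle the C0 80 encoding of NUL used by Modified UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites each CESU-8 surrogate pair
// (ED A0..AF xx  ED B0..BF xx) as the equivalent 4-byte UTF-8 sequence.
// All other bytes are copied unchanged, so input that is already valid
// UTF-8 passes through as-is.
inline std::string cesu8ToUtf8(const std::string& in) {
    const auto* p = reinterpret_cast<const unsigned char*>(in.data());
    const std::size_t n = in.size();
    std::string out;
    out.reserve(n);
    std::size_t i = 0;
    while (i < n) {
        const bool isPair =
            n - i >= 6 &&
            p[i] == 0xED && (p[i + 1] & 0xF0) == 0xA0 &&
            (p[i + 2] & 0xC0) == 0x80 &&
            p[i + 3] == 0xED && (p[i + 4] & 0xF0) == 0xB0 &&
            (p[i + 5] & 0xC0) == 0x80;
        if (!isPair) {
            out += in[i++];
            continue;
        }
        // Decode the two 3-byte sequences back to UTF-16 surrogates.
        const unsigned hi = 0xD000u | ((p[i + 1] & 0x3Fu) << 6) | (p[i + 2] & 0x3Fu);
        const unsigned lo = 0xD000u | ((p[i + 4] & 0x3Fu) << 6) | (p[i + 5] & 0x3Fu);
        // Combine them into the supplementary code point.
        const unsigned cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
        assert(cp >= 0x10000u && cp <= 0x10FFFFu);
        // Emit the standard 4-byte UTF-8 form.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        i += 6;
    }
    return out;
}
```

In an Objective-C++ file, the converted bytes could then be passed to the same initWithBytes:length:encoding: call with NSUTF8StringEncoding that the asker already uses. Fixing the encoder on the server remains the cleaner option.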

































This worked for me in PHP to send a message with an emoji to a Telegram bot:

$message_text = " \xf0\x9f\x98\x81 ";




























First answer: answered Dec 22 '15 at 23:03 by bobince (edited Dec 22 '15 at 23:08)













Second answer: answered Jun 12 '18 at 9:41 by Polina





























