Why does an emoji have two different UTF-8 encodings? How do I decode emoji from UTF-8 using NSString on iOS?
We have found that some emoji appear to have two different UTF-8 encodings, for example:
emoji   Unicode   UTF-8              another "UTF-8"
😁      U+1F601   \xf0\x9f\x98\x81   \xed\xa0\xbd\xed\xb8\x81
But iOS cannot decode the second form, so we get an error when decoding the string as UTF-8.
In all the documentation I found, only one UTF-8 encoding is listed for each emoji; the other form appears nowhere.
The documents I consulted include:
emoji code link
whole utf-8 code link
However, the web tool bianma converts both byte sequences into the emoji correctly.
So, my questions are:
Why are there two UTF-8 encodings for one emoji?
Is there a document that lists both encodings?
How do I correctly decode such a string from UTF-8 using NSString on iOS?
ios unicode utf-8 nsstring emoji
asked Dec 22 '15 at 5:34
pinchwang
This had me intrigued, as my first thought was that the long UTF-8 representation was two UTF-8 blocks. It turns out that there are two variations of UTF-8, CESU-8 and Modified UTF-8, which encode supplementary characters UTF-16 style: as a surrogate pair, with each surrogate written as three bytes. You may be able to use this article iphonedevsdk.com/forum/iphone-sdk-development/… to write a decoder if there's no suitable iOS/Objective-C native decoder.
– Alastair McCormack
Dec 22 '15 at 11:33
@AlastairMcCormack That's the answer I think. You should post that as an answer.
– roeland
Dec 22 '15 at 22:23
@user692793 Please never post text as images, especially not code or output.
– roeland
Dec 22 '15 at 22:24
Thanks @roeland. I think a proper answer should contain some working code, but as I'm not an Objective-C coder I'll leave it to someone else to pick up the glory :)
– Alastair McCormack
Dec 22 '15 at 22:26
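As a rough illustration of what "encode UTF-16 style" means in the comments above, the following C sketch splits a supplementary code point into its UTF-16 surrogate pair and then writes each surrogate with the ordinary three-byte UTF-8 bit pattern. The helper names `put3` and `cesu8_encode_supplementary` are hypothetical and do not come from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one 16-bit value (0x0800..0xFFFF) using the
   three-byte UTF-8 bit pattern 1110xxxx 10xxxxxx 10xxxxxx. */
static void put3(uint16_t u, unsigned char *out)
{
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
}

/* Hypothetical helper: CESU-8 style encoding of a supplementary code point
   (0x10000..0x10FFFF). Writes 6 bytes and returns 6, or returns 0 if cp is
   outside that range. */
size_t cesu8_encode_supplementary(uint32_t cp, unsigned char out[6])
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;

    uint32_t v = cp - 0x10000;                      /* 20 payload bits   */
    uint16_t hi = (uint16_t)(0xD800 | (v >> 10));   /* high surrogate    */
    uint16_t lo = (uint16_t)(0xDC00 | (v & 0x3FF)); /* low surrogate     */

    put3(hi, out);       /* first surrogate, three bytes  */
    put3(lo, out + 3);   /* second surrogate, three bytes */
    return 6;
}
```

For U+1F601 the surrogates are U+D83D and U+DE01, so the function produces ED A0 BD ED B8 81, the six bytes from the question. A conforming UTF-8 encoder emits the four bytes F0 9F 98 81 for the same code point.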
2 Answers
0xF0, 0x9F, 0x98, 0x81
is the correct UTF-8 encoding for U+1F601 😁.
0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81
is not a valid UTF-8 sequence(*). It should really be rejected; iOS is correct to do so.
This is a bug in the bianma tool: its convertUtf8BytesToUnicodeCodePoints function is more lenient about the input it accepts than the algorithm specified in, e.g., RFC 3629.
It happens to return a working string only because the tool is written in JavaScript. Having decoded the above byte sequence to the bogus surrogate code point sequence U+D83D, U+DE01, it then converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. As this is the correct way to encode 😁 in a UTF-16 string, it appears to have worked.
(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)
You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data that you don't have the power to get fixed.
edited Dec 22 '15 at 23:08
answered Dec 22 '15 at 23:03
bobince
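To see why a strict decoder rejects the six-byte form, it may help to look at what a strict validity check involves. The following is a minimal sketch in C, assuming the byte ranges from the UTF-8 syntax in RFC 3629; `utf8_is_valid` is a hypothetical name, and this is not Apple's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true only if s[0..n) is well-formed UTF-8
   under the RFC 3629 byte ranges (no overlong forms, no surrogates,
   nothing above U+10FFFF). */
bool utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;                          /* length of this sequence   */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range, 2nd byte   */

        if (b <= 0x7F) { i++; continue; }                /* ASCII          */
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b == 0xE0) { len = 3; lo = 0xA0; }      /* no overlongs   */
        else if (b == 0xED) { len = 3; hi = 0x9F; }      /* no surrogates  */
        else if (b >= 0xE1 && b <= 0xEF) len = 3;
        else if (b == 0xF0) { len = 4; lo = 0x90; }      /* no overlongs   */
        else if (b >= 0xF1 && b <= 0xF3) len = 4;
        else if (b == 0xF4) { len = 4; hi = 0x8F; }      /* <= U+10FFFF    */
        else return false;                   /* 0x80..0xC1, 0xF5..0xFF     */

        if (n - i < len) return false;       /* truncated sequence         */
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (size_t k = 2; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;
        i += len;
    }
    return true;
}
```

With this check, F0 9F 98 81 passes, while ED A0 BD ED B8 81 fails at the second byte: after a lead byte of 0xED only 0x80..0x9F may follow, because 0xA0..0xBF would encode a surrogate (U+D800..U+DFFF).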
Thank you very much for the answer. We read string data from our server, which is written in C++; the issue occurs after the server converts a Unicode string to UTF-8. One more thing worth mentioning: when our client receives the data as a string value cstr, printf("%s", cstr) prints it correctly. But when we convert the string to an NSString with NSString *ocstr = [[NSString alloc] initWithBytes:cstr.c_str() length:cstr.length() encoding:NSUTF8StringEncoding]; ocstr is nil. Why doesn't Apple support CESU-8 sequences? Is there a function that resolves this issue?
– pinchwang
Dec 23 '15 at 2:12
I would first look at the C++ server's UTF-8 encoder, to see if it can be fixed properly at the source. CESU-8 is considered an undesirable anomaly that you'd never deliberately want to use; most systems don't support it. If you have to accept it, you'll need to write your own CESU-8 decoder that walks through the input byte array (or use an existing library, e.g. ICU, though that would be a really heavy dependency just for this).
– bobince
Dec 23 '15 at 11:36
Just as a side note, there is one particularly bothersome source of encoding like this: JNI (Java Native Interface). If you attempt to retrieve "UTF-8" bytes from a Java string you will receive the "modified UTF-8" variant. That is a rather large source of malformed data that cannot be fixed, unfortunately.
– borrrden
Jul 12 '18 at 17:02
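If the server cannot be fixed, the "walk through the input byte array" approach suggested above could look roughly like the following C sketch, which can be compiled as part of an Objective-C or Objective-C++ project. The function name `cesu8_to_utf8` is hypothetical. The sketch assumes the input is otherwise well-formed UTF-8 and only rewrites each six-byte surrogate pair into the equivalent four-byte sequence. Unpaired surrogates are copied through unchanged and would still be rejected by a strict decoder, and the C0 80 encoding of U+0000 used by Modified UTF-8 is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy in[0..n) to out, replacing every CESU-8
   surrogate pair (ED A0..AF xx ED B0..BF xx) with the four-byte UTF-8
   form of the code point it represents. out must have room for n bytes.
   Returns the number of bytes written. */
size_t cesu8_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        if (n - i >= 6 &&
            in[i]     == 0xED && (in[i + 1] & 0xF0) == 0xA0 &&
            (in[i + 2] & 0xC0) == 0x80 &&
            in[i + 3] == 0xED && (in[i + 4] & 0xF0) == 0xB0 &&
            (in[i + 5] & 0xC0) == 0x80) {
            /* Each surrogate carries 10 payload bits:
               4 from its second byte and 6 from its third. */
            uint32_t hi = ((uint32_t)(in[i + 1] & 0x0F) << 6) | (in[i + 2] & 0x3F);
            uint32_t lo = ((uint32_t)(in[i + 4] & 0x0F) << 6) | (in[i + 5] & 0x3F);
            uint32_t cp = 0x10000 + ((hi << 10) | lo);

            /* Standard four-byte UTF-8 encoding of cp. */
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            i += 6;
        } else {
            out[o++] = in[i++];   /* everything else is copied as-is */
        }
    }
    return o;
}
```

The output is never longer than the input, so a buffer of n bytes is enough. The converted bytes can then be passed to initWithBytes:length:encoding: with NSUTF8StringEncoding, as in the comment above; for the sequence in the question, ED A0 BD ED B8 81 becomes F0 9F 98 81.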
This worked for me in PHP to send a message with an emoji to a Telegram bot:
$message_text = " \xf0\x9f\x98\x81 ";
answered Jun 12 '18 at 9:41
Polina