Why is rune in golang an alias for int32 and not uint32?
The type rune in Go is defined as an alias for int32 and is equivalent to int32 in all ways. It is used, by convention, to distinguish character values from integer values.

If the intention is to use this type to represent character values, why did the authors of the Go language not use uint32 instead of int32? How do they expect a rune value to be handled in a program when it is negative? The other similar type, byte, is an alias for uint8 (and not int8), which seems reasonable.
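For reference, a minimal sketch of what the aliasing described above means in practice. The helper name typeNames is made up for this illustration; everything else is plain Go.

```go
package main

import "fmt"

// typeNames (a hypothetical helper) reports the type names Go prints
// for a rune value and a byte value.
func typeNames() (string, string) {
	var r rune = 'é'
	var b byte = 'A'
	return fmt.Sprintf("%T", r), fmt.Sprintf("%T", b)
}

func main() {
	var r rune = 'é'
	var i int32 = r // no conversion needed: rune and int32 are the same type
	var b byte = 'A'
	var u uint8 = b // likewise, byte and uint8 are the same type

	var neg rune = -1 // compiles: nothing stops a rune from being negative

	fmt.Println(typeNames()) // int32 uint8
	fmt.Println(i, u, neg)   // 233 65 -1
}
```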
Tags: go
edited Nov 15 '18 at 7:28 by Rene Knop
asked Jul 12 '14 at 15:55 by Tapan Karecha
Note: byte is an alias for uint8, not uint. – Filipe Gonçalves, Aug 26 '15 at 23:11
You selected the right answer before, what has changed? – VonC, May 17 '18 at 18:44
3 Answers
I googled and found this:
https://groups.google.com/forum/#!topic/golang-nuts/d3_GPK8bwBg

"This has been asked several times. rune occupies 4 bytes and not just one because it is supposed to store unicode codepoints and not just ASCII characters. Like array indices, the datatype is signed so that you can easily detect overflows or other errors while doing arithmetic with those types."
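A small sketch of the kind of arithmetic the quote seems to have in mind, assuming a simple digit-parsing case. The function digitValue and its -1 convention are made up for this example and do not come from any library.

```go
package main

import "fmt"

// digitValue (hypothetical) returns the numeric value of an ASCII digit,
// or -1 if r is not a digit.
func digitValue(r rune) int {
	d := r - '0' // signed: a rune below '0' yields a negative difference
	if d < 0 || d > 9 {
		return -1
	}
	return int(d)
}

func main() {
	fmt.Println(digitValue('7')) // 7
	fmt.Println(digitValue(' ')) // -1

	// The same subtraction in unsigned arithmetic wraps around instead
	// of going negative.
	var u uint32 = ' '
	fmt.Println(u - '0') // 4294967280
}
```

With the signed type, the out-of-range case shows up as an obviously wrong negative number; with uint32 it shows up as a very large positive one.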
edited Sep 6 '16 at 15:22 by Trevor Hickey
answered Jul 12 '14 at 16:08 by chendesheng
All answers in that thread argue that there is enough space to reference all code points of Unicode in a signed 32-bit integer. Hence, I do understand how rune is big enough to address the Unicode range. The question still remains about the choice of type. Why not uint16, which has a comparable range of values for positive integers but uses only half the space of int32? – Tapan Karecha, Jul 12 '14 at 16:20
@TapanKarecha: uint16 doesn’t fit all of Unicode, though. It fits a really big chunk of it, but Unicode ends at 0x10fffd. – Ry-♦, Jul 12 '14 at 16:21
Christoph Hack: "This has been asked several times. rune occupies 4 bytes and not just one because it is supposed to store unicode codepoints and not just ASCII characters. Like array indices, the datatype is signed so that you can easily detect overflows or other errors while doing arithmetic with those types." – chendesheng, Jul 12 '14 at 16:27
@chendesheng, please add your comment into your answer. It is the most important part, in my opinion. – andybalholm, Jul 12 '14 at 17:55
Yes: uint can have hard-to-debug behavior like a-b > 1000 when a=1 and b=2 (play). So Go uses int where it can. – twotwotwo, Jul 13 '14 at 2:21
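To illustrate the uint16 point raised in these comments, here is a short sketch using a code point above the 16-bit range. The helper truncate16 is a made-up name for this example; utf16.EncodeRune is from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// truncate16 (hypothetical) shows what is left of a rune after it is
// forced into 16 bits.
func truncate16(r rune) uint16 {
	return uint16(r)
}

func main() {
	r := '😀' // U+1F600, outside the 16-bit range

	fmt.Printf("%U\n", r)              // U+1F600
	fmt.Printf("%#x\n", truncate16(r)) // 0xf600: the high bits are lost

	// UTF-16 needs two 16-bit units (a surrogate pair) for this code point.
	hi, lo := utf16.EncodeRune(r)
	fmt.Printf("%#x %#x\n", hi, lo) // 0xd83d 0xde00
}
```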
It doesn’t become negative. There are currently 1,114,112 codepoints in Unicode, which is far from 2,147,483,647 (0x7fffffff), even considering all the reserved blocks.
answered Jul 12 '14 at 16:00 by Ry-♦
Thanks! Though a rune may address a range much larger than needed by Unicode at this time, the question is about the fact that a negative value can be assigned to a rune. This could have been avoided if it were an unsigned integer. But there may be other considerations that make it sensible for a rune to still be a signed type, and I wonder what those are. – Tapan Karecha, Jul 12 '14 at 16:10
@TapanKarecha: Sure, but you could also assign a positive value outside of Unicode’s range. Neither one would be valid Unicode. (Negative numbers might be more obvious to check for as an error condition, as a habit taken from C?) – Ry-♦, Jul 12 '14 at 16:23
@false: Yes, there will be invalid values on the positive end of the type range, but having invalid values on both ends of the type range is something I am having trouble dealing with as a concept. As you said, if the type were unsigned, I wouldn't have to worry about checking for the negative value, which is one less check during validation. – Tapan Karecha, Jul 12 '14 at 16:32
@TapanKarecha: No, I was saying that a negative return value on something that ought to return Unicode would be an obvious error (not something that Go needs, but something that you might commonly do in other languages), but checking the positive isn’t convenient at all. Judging by Unicode’s stability policy, it might not even be possible. – Ry-♦, Jul 12 '14 at 16:35
I think chendesheng's quote gets at the root cause best: Go uses a lot of signed values, not just for runes but array indices, Read/Write byte counts, etc. That's because uints, in any language, behave confusingly unless you guard every piece of arithmetic against overflow (for example, if var a, b uint = 1, 2, then a-b > 0 and a-b > 1000000: play.golang.org/p/lsdiZJiN7V). ints behave more like numbers in everyday life, which is a compelling reason to use them, and there is no equally compelling reason not to use them. – twotwotwo, Jul 13 '14 at 2:03
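The wraparound described in the last comment can be reproduced locally. This is a sketch of the same idea, not the code behind the playground link; diffUnsigned and diffSigned are made-up helper names.

```go
package main

import "fmt"

// diffUnsigned and diffSigned are hypothetical helpers that perform the
// same subtraction in unsigned and signed arithmetic.
func diffUnsigned(a, b uint) uint { return a - b }
func diffSigned(a, b int) int     { return a - b }

func main() {
	var a, b uint = 1, 2

	// 1 - 2 cannot be represented as a uint, so the result wraps around
	// to the largest uint value.
	fmt.Println(a-b > 0)       // true
	fmt.Println(a-b > 1000000) // true

	// The signed version gives the everyday answer.
	fmt.Println(diffSigned(1, 2)) // -1
}
```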
"Golang, Go : what is rune by the way?" mentioned:

"With the recent Unicode 6.3, there are over 110,000 symbols defined. This requires at least 21-bit representation of each code point, so a rune is like int32 and has plenty of bits."

But regarding the overflow or negative value issues, note that the implementation of some of the unicode functions, like unicode.IsGraphic, includes this comment:

"We convert to uint32 to avoid the extra test for negative"

Code:
```go
// Excerpt from the standard library's unicode package; properties, pg, In,
// and GraphicRanges are defined elsewhere in that package.

const MaxLatin1 = '\u00FF' // maximum Latin-1 value.

// IsGraphic reports whether the rune is defined as a Graphic by Unicode.
// Such characters include letters, marks, numbers, punctuation, symbols, and
// spaces, from categories L, M, N, P, S, Zs.
func IsGraphic(r rune) bool {
	// We convert to uint32 to avoid the extra test for negative,
	// and in the index we convert to uint8 to avoid the range check.
	if uint32(r) <= MaxLatin1 {
		return properties[uint8(r)]&pg != 0
	}
	return In(r, GraphicRanges...)
}
```
That may be because a rune is supposed to be a constant (as mentioned in "Go rune type explanation", where a rune could be stored in an int32, a uint32, or even a float32, and so on: its constant value allows it to be stored in any of those numeric types).
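The uint32 conversion used in IsGraphic can be illustrated in isolation. The sketch below is not the standard library's code: maxLatin1, inLatin1TwoTests and inLatin1OneTest are names invented for this example. It shows that converting a possibly negative rune to uint32 turns it into a large value, so a single comparison rejects both negative inputs and inputs above the limit.

```go
package main

import "fmt"

// maxLatin1 mirrors the value of unicode.MaxLatin1 for this illustration.
const maxLatin1 = '\u00FF'

// inLatin1TwoTests is the straightforward check with an explicit sign test.
func inLatin1TwoTests(r rune) bool {
	return r >= 0 && r <= maxLatin1
}

// inLatin1OneTest relies on the conversion: a negative rune becomes a
// uint32 of at least 2^31, which is always greater than maxLatin1.
func inLatin1OneTest(r rune) bool {
	return uint32(r) <= maxLatin1
}

func main() {
	for _, r := range []rune{'A', 0xFF, 0x100, -1} {
		fmt.Println(r, inLatin1TwoTests(r), inLatin1OneTest(r))
	}
}
```

Both functions should agree for every int32 input; the second form just folds the sign test into the upper-bound comparison.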
edited May 23 '17 at 12:25 by Community♦
answered Jul 12 '14 at 18:21 by VonC