Why is rune in golang an alias for int32 and not uint32?

The type rune in Go is defined as

an alias for int32 and is equivalent to int32 in all ways. It is
used, by convention, to distinguish character values from integer
values.

If the intention is to use this type to represent character values, why did the authors of the Go language not use uint32 instead of int32? How do they expect a rune value to be handled in a program when it is negative? The other similar type, byte, is an alias for uint8 (and not int8), which seems reasonable.

edited Nov 15 '18 at 7:28 by Rene Knop

asked Jul 12 '14 at 15:55 by Tapan Karecha

Note: byte is an alias for uint8, not uint.

– Filipe Gonçalves
Aug 26 '15 at 23:11

You selected the right answer before, what has changed?

– VonC
May 17 '18 at 18:44

3 Answers

I googled and found this:
https://groups.google.com/forum/#!topic/golang-nuts/d3_GPK8bwBg

This has been asked several times. rune occupies 4 bytes and not just one because it is supposed to store unicode codepoints and not just ASCII characters. Like array indices, the datatype is signed so that you can easily detect overflows or other errors while doing arithmetic with those types.
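To illustrate the point about detecting errors in arithmetic, here is a minimal sketch. The helper name digitValue is hypothetical and not part of the Go standard library. It shows how a subtraction on a signed rune produces a visibly negative result when the input is out of range:

```go
package main

import "fmt"

// digitValue is a hypothetical helper: it returns the numeric value of an
// ASCII digit, or -1 when r is not a digit.
func digitValue(r rune) int {
	d := r - '0' // signed subtraction: negative when r sorts before '0'
	if d < 0 || d > 9 {
		return -1
	}
	return int(d)
}

func main() {
	fmt.Println(digitValue('7')) // 7
	fmt.Println(digitValue('/')) // -1, because '/' - '0' == -1
	fmt.Println(digitValue('a')) // -1, because 'a' - '0' == 49
}
```

With an unsigned type, the same subtraction for '/' would wrap around to a very large positive value instead of -1.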

edited Sep 6 '16 at 15:22 by Trevor Hickey

answered Jul 12 '14 at 16:08 by chendesheng

All answers in that thread argue that there is enough space to reference all code points of Unicode in a signed 32-bit integer. Hence, I do understand how rune is big enough to address the Unicode range. The question still remains about the choice of type. Why not uint16, which has a comparable range of positive values but uses only half the space of int32?

– Tapan Karecha
Jul 12 '14 at 16:20

@TapanKarecha: uint16 doesn’t fit all of Unicode, though. It fits a really big chunk of it, but Unicode code points run up to 0x10FFFF.

– Ry-♦
Jul 12 '14 at 16:21

Christoph Hack: "This has been asked several times. rune occupies 4 bytes and not just one because it is supposed to store unicode codepoints and not just ASCII characters. Like array indices, the datatype is signed so that you can easily detect overflows or other errors while doing arithmetic with those types."

– chendesheng
Jul 12 '14 at 16:27

@chendesheng, please add your comment into your answer. It is the most important part, in my opinion.

– andybalholm
Jul 12 '14 at 17:55

Yes: uint can have hard-to-debug behavior like a-b > 1000 when a=1 and b=2. So Go uses int where it can.

– twotwotwo
Jul 13 '14 at 2:21

It doesn’t become negative. There are currently 1,114,112 codepoints in Unicode, which is far from 2,147,483,647 (0x7fffffff) – even considering all the reserved blocks.
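As a rough sketch of the ranges involved, the following uses unicode.MaxRune and utf8.ValidRune from the standard library. The helper name describe is hypothetical and only there for illustration:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// describe is a hypothetical helper that reports why a rune value is or is
// not a valid Unicode code point.
func describe(r rune) string {
	switch {
	case r < 0:
		return "negative"
	case r > unicode.MaxRune:
		return "beyond U+10FFFF"
	case !utf8.ValidRune(r):
		// The only invalid values left in range are the surrogate halves.
		return "surrogate half"
	default:
		return "valid"
	}
}

func main() {
	fmt.Println(int(unicode.MaxRune) + 1) // 1114112 code points in the code space
	fmt.Println(describe(-1))             // negative
	fmt.Println(describe(0x110000))       // beyond U+10FFFF
	fmt.Println(describe(0xD800))         // surrogate half
	fmt.Println(describe('A'))            // valid
}
```

Both the negative range and the positive range above U+10FFFF hold values that are not code points, so a validity check is needed regardless of signedness.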

answered Jul 12 '14 at 16:00 by Ry-♦

Thanks! Though a rune may address a range much larger than needed by unicode at this time, the question is about the fact that a negative value can be assigned to a rune. This could have been avoided if it was an unsigned integer. But there may be other considerations that make sense for a rune to still be a signed type, and I wonder what those are.

– Tapan Karecha
Jul 12 '14 at 16:10

@TapanKarecha: Sure, but you could also assign a positive value outside of Unicode’s range. Neither one would be valid Unicode. (Negative numbers might be more obvious to check for as an error condition, as a habit taken from C?)

– Ry-♦
Jul 12 '14 at 16:23

@false: Yes, there will be invalid values on the positive end of the type range, but having invalid values on both ends of the type range is something I am having trouble dealing with as a concept. As you said, if the type were unsigned, I wouldn't have to worry about checking for negative values, which is one less check during validation.

– Tapan Karecha
Jul 12 '14 at 16:32

@TapanKarecha: No, I was saying that a negative return value on something that ought to return Unicode would be an obvious error (not something that Go needs, but something that you might commonly do in other languages), but checking the positive isn’t convenient at all. Judging by Unicode’s stability policy, it might not even be possible.

– Ry-♦
Jul 12 '14 at 16:35

I think chendesheng's quote gets at the root cause best: Go uses a lot of signed values, not just for runes but array indices, Read/Write byte counts, etc. That's because uints, in any language, behave confusingly unless you guard every piece of arithmetic against overflow (for example if var a, b uint = 1, 2, a-b > 0 and a-b > 1000000: play.golang.org/p/lsdiZJiN7V). ints behave more like numbers in everyday life, which is a compelling reason to use them, and there is no equally compelling reason not to use them.

– twotwotwo
Jul 13 '14 at 2:03
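The wraparound described in this comment can be reproduced with a short sketch. It uses uint32 and int32 rather than uint and int so that the result does not depend on the platform word size, and the helper names are hypothetical:

```go
package main

import "fmt"

// diffUnsigned and diffSigned are hypothetical helpers used only to compare
// the two kinds of subtraction.
func diffUnsigned(a, b uint32) uint32 { return a - b }

func diffSigned(a, b int32) int32 { return a - b }

func main() {
	fmt.Println(diffUnsigned(1, 2))           // 4294967295: wrapped around
	fmt.Println(diffUnsigned(1, 2) > 1000000) // true
	fmt.Println(diffSigned(1, 2))             // -1
}
```

The signed result is the one most readers would expect from 1 - 2, which is the behavior the comment argues for.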

"Golang, Go : what is rune by the way?" mentioned:

With the recent Unicode 6.3, there are over 110,000 symbols defined. This requires at least 21-bit representation of each code point, so a rune is like int32 and has plenty of bits.

But regarding the overflow or negative-value issues, note that the implementation of some unicode functions, such as unicode.IsGraphic, includes this comment:

We convert to uint32 to avoid the extra test for negative

Code:

    const MaxLatin1 = '\u00FF' // maximum Latin-1 value.

    // IsGraphic reports whether the rune is defined as a Graphic by Unicode.
    // Such characters include letters, marks, numbers, punctuation, symbols, and
    // spaces, from categories L, M, N, P, S, Zs.
    func IsGraphic(r rune) bool {
        // We convert to uint32 to avoid the extra test for negative,
        // and in the index we convert to uint8 to avoid the range check.
        if uint32(r) <= MaxLatin1 {
            return properties[uint8(r)]&pg != 0
        }
        return In(r, GraphicRanges...)
    }
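The conversion trick in that comment can be tried in isolation. The sketch below is not the standard library code: the constant maxLatin1 and both helper names are hypothetical stand-ins. It compares a two-sided range check with the single comparison after converting to uint32:

```go
package main

import "fmt"

// maxLatin1 mirrors the value of unicode.MaxLatin1 for this sketch.
const maxLatin1 = '\u00FF'

// inLatin1TwoTests checks the range with explicit lower and upper bounds.
func inLatin1TwoTests(r rune) bool { return r >= 0 && r <= maxLatin1 }

// inLatin1OneTest relies on the conversion: a negative rune becomes a
// uint32 of at least 0x80000000, so one comparison rejects it.
func inLatin1OneTest(r rune) bool { return uint32(r) <= maxLatin1 }

func main() {
	for _, r := range []rune{'A', '\u00FF', '\u0100', -1} {
		fmt.Println(r, inLatin1TwoTests(r), inLatin1OneTest(r))
	}
}
```

For every input in the loop the two functions agree, which is why the single unsigned comparison can replace the explicit test for negative values.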

That may be because a rune literal is an untyped constant (as mentioned in "Go rune type explanation"): its constant value can be stored in an int32, a uint32, or even a float32 variable, as long as the value is representable in that type.
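A small sketch of that behavior, assuming nothing beyond the language itself (the constant name letter and the helper typeNames are hypothetical):

```go
package main

import "fmt"

// letter is an untyped rune constant with the value 97.
const letter = 'a'

// typeNames is a hypothetical helper: it stores the same constant in
// variables of several numeric types and reports each variable's type.
func typeNames() []string {
	var i int32 = letter
	var u uint32 = letter
	var f float32 = letter
	d := letter // no explicit type: the default type of a rune constant is rune
	return []string{
		fmt.Sprintf("%T", i),
		fmt.Sprintf("%T", u),
		fmt.Sprintf("%T", f),
		fmt.Sprintf("%T", d),
	}
}

func main() {
	fmt.Println(typeNames()) // [int32 uint32 float32 int32]
}
```

The last entry prints int32 because rune is an alias for int32, so the two names denote the same type.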

edited May 23 '17 at 12:25 by Community♦

answered Jul 12 '14 at 18:21 by VonC
