Maximum cache misses possible from using Thread Local Variables
This refers to an already asked and answered question (How are the fs/gs registers used in Linux AMD64?) and to a document referenced in one of its answers (https://akkadia.org/drepper/tls.pdf).
According to the document, the FS register points to the TCB (thread control block), which points to the DTV (dynamic thread vector), which ultimately leads to the thread-local data.
Is it then right to assume that loading a thread-local variable can incur up to 3 cache misses (1 for the TCB, 1 for the DTV, and 1 for the data itself)?
c++ linux x86-64 cpu-cache thread-local-storage
edited Nov 15 '18 at 16:44 by Peter Cordes
asked Nov 15 '18 at 15:48 by SubliminalBroccoli
1 Answer
According to Godbolt, the following code:
    thread_local int t;
    int get_t () {
        return t;
    }
generates the following assembly:
    mov eax, DWORD PTR fs:t@tpoff
    ret
So I count that as one memory access. In fact, an answer in the post you link to says the same thing.
Yup, exactly. The linker resolves the symbol to an offset added to the segment base, with no extra levels of indirection. (Well, the FS base is kind of an extra level of indirection, but it is stored inside the CPU in the descriptor cache, not loaded from memory on every use. On Intel CPUs, using a non-zero segment base adds 1 cycle of load latency. So that and a bit of extra code size are the only costs of TLS.)
– Peter Cordes
Nov 15 '18 at 16:46
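The single `mov` discussed above is what compilers typically emit for the local-exec TLS access model, which applies when the variable is defined in the main executable. As far as I know, position-independent shared-library code generally uses the global-dynamic model instead, which can go through `__tls_get_addr` and the DTV that the question asks about. The following is a minimal sketch for comparing the two on Godbolt, assuming GCC or Clang on Linux (`tls_model` is a compiler extension, and every variable and function name here is hypothetical). When linked into an ordinary executable the linker may relax both accesses to the cheap form, so the difference is mainly visible in the compiler output.

```cpp
#include <cassert>
#include <thread>

// Assumption: GCC/Clang on Linux. tls_model is a compiler-specific attribute,
// and all names below are hypothetical, chosen only for illustration.
thread_local int t_local_exec __attribute__((tls_model("local-exec"))) = 0;
thread_local int t_global_dynamic __attribute__((tls_model("global-dynamic"))) = 0;

// Typically compiles to a single fs-relative load.
int get_local_exec() { return t_local_exec; }

// In -fPIC compiler output this usually asks the runtime for the address
// first (the linker may relax that when building an executable).
int get_global_dynamic() { return t_global_dynamic; }

// Runs f on a fresh thread and returns what that thread observed.
template <typename F>
int run_on_new_thread(F f) {
    int result = -1;
    std::thread worker([&result, f] { result = f(); });
    worker.join();
    return result;
}

// Whatever the access model, each thread gets its own zero-initialized copy.
void demo() {
    t_local_exec = 1;
    t_global_dynamic = 2;
    assert(get_local_exec() == 1);
    assert(get_global_dynamic() == 2);
    assert(run_on_new_thread(get_local_exec) == 0);
    assert(run_on_new_thread(get_global_dynamic) == 0);
}
```

Calling `demo()` from `main` only checks the per-thread semantics; to see the cost difference, compare the assembly of the two getters compiled with `-O2 -fPIC`.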
answered Nov 15 '18 at 16:36 by Paul Sanders