What are R10-R15 registers used for in the Windows x64 calling convention?



























From Intel's introduction to x64 assembly at https://software.intel.com/en-us/articles/introduction-to-x64-assembly,




  • RCX, RDX, R8, R9 are used for integer and pointer arguments in that order left to right.

  • Registers RAX, RCX, RDX, R8, R9, R10, and R11 are considered volatile and must be considered destroyed on function calls.

  • RBX, RBP, RDI, RSI, R12, R13, R14, and R15 must be saved in any function using them.


While I understand how RCX, RDX, R8, R9 are used as function arguments, I've seen functions that take more than 4 arguments revert to using the stack like 32-bit code. An example is below:



sub_18000BF10   proc near 
lpDirectory = qword ptr -638h
nShowCmd = dword ptr -630h
Parameters = word ptr -628h

sub rsp, 658h
mov r9, rcx
mov r8, rdx
lea rdx, someCommand ; "echo "Hello""...
lea rcx, [rsp+658h+Parameters] ; LPWSTR
call cs:wsprintfW
xor r11d, r11d
lea r9, [rsp+658h+Parameters] ; lpParameters
mov [rsp+658h+nShowCmd], r11d ; nShowCmd
lea r8, aCmdExe ; "cmd.exe"
lea rdx, Operation ; "open"
xor ecx, ecx ; hwnd
mov [rsp+658h+lpDirectory], r11 ; lpDirectory
call cs:ShellExecuteW
mov eax, 1
add rsp, 658h
retn
sub_18000BF10 endp


This is an excerpt from IDA, and you can see the nShowCmd and lpDirectory arguments to ShellExecuteW are on the stack. Why can't we use the extra registers after R9 for fast-call behavior?



Or if we can do that in user-defined functions and the system API functions don't do that, is there a reason for it? I imagine fast-call arguments in registers would be more efficient than checking, offsetting the stack.



































  • Yes in your own convention you can use them. Microsoft decided to only allow 4 registers. SysV uses 6, including r9.

    – Jester
    Nov 13 '18 at 23:33





































assembly x86-64 calling-convention














edited Nov 13 '18 at 23:23 by Peter Cordes

asked Nov 13 '18 at 23:19 by deefunkt

























1 Answer














The Windows x64 calling convention is designed to make it easy to implement variadic functions (like printf and scanf) by dumping the 4 register args into the shadow space, creating a contiguous array of all args. Args larger than 8 bytes are passed by reference, so each arg always takes exactly 1 arg-passing slot.
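To make the variadic case concrete, here is a minimal C sketch. The function name sum_ints is made up for illustration. The C source is portable; the point is what a Windows x64 compiler can do underneath: spill RCX, RDX, R8 and R9 into the shadow space, then treat va_list as a plain pointer that steps through one contiguous array of 8-byte slots.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example function: sums `count` int arguments.
   Under the Windows x64 convention the callee can store its four register
   args into the shadow space, after which all arguments sit in one
   contiguous array of 8-byte slots that va_arg simply walks. */
long sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* each int occupies one arg slot */
    }
    va_end(ap);
    return total;
}
```

For a call like sum_ints(6, 1, 2, 3, 4, 5, 6), the count and the first three values would travel in RCX, RDX, R8, R9 on Windows x64, and the last three in stack slots right above the shadow space.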



Given this design constraint, more register args would require a larger shadow space, which wastes more stack space for small functions that don't have a lot of args.
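The slot arithmetic behind that trade-off can be sketched in a few lines of C. This is only a model, and the function names are hypothetical: it assumes every argument takes one 8-byte slot and that slot i lives at [rsp + 8*i] at the call site, whether or not the argument itself is passed in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SLOT_SIZE = 8 };   /* every argument takes one 8-byte slot */

/* Bytes of shadow space the caller must reserve if `reg_args` arguments
   are passed in registers (4 in the real Windows x64 convention). */
size_t shadow_space_bytes(size_t reg_args)
{
    assert(reg_args <= SIZE_MAX / SLOT_SIZE);   /* guard the multiplication */
    return reg_args * SLOT_SIZE;
}

/* Offset from RSP, just before the CALL instruction, of the slot for
   argument `index` (0-based). For index < 4 this is the shadow-space home
   of a register argument; for index >= 4 it is where the caller actually
   stores the argument. */
size_t arg_slot_offset(size_t index)
{
    assert(index <= SIZE_MAX / SLOT_SIZE);      /* guard the multiplication */
    return index * SLOT_SIZE;
}
```

This matches the IDA listing in the question: lpDirectory (5th arg, index 4) is at 658h - 638h = 20h, and nShowCmd (6th arg, index 5) is at 658h - 630h = 28h. With 4 register args the shadow space is 32 bytes; a hypothetical 6-register variant of the same design would need 48 bytes on every call.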



Yes, more register args would normally be more efficient. But if the callee wants to make another function call right away with different args, it would then have to store all its register args to the stack, so there's a limit on how many register args are useful.



You want a good mix of call-preserved and call-clobbered registers, regardless of how many are used for arg-passing. R10 and R11 are call-clobbered scratch regs. A transparent wrapper function written in asm might use them for scratch space without disturbing any of the args in RCX, RDX, R8, R9, and without needing to save/restore a call-preserved register anywhere.



R12..R15 are call-preserved registers you can use for whatever you want, as long as you save/restore them before returning.
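As a quick reference, the split of the integer registers can be written down as a small lookup. This is a sketch in C with made-up helper names; the two lists reflect the Windows x64 convention as I understand it (RSP is included as call-preserved), and they cover only the general-purpose registers, not XMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Integer registers a callee may overwrite freely in the Windows x64
   convention (call-clobbered / volatile). */
static const char *const clobbered_regs[] = {
    "rax", "rcx", "rdx", "r8", "r9", "r10", "r11"
};

/* Integer registers a callee must restore before returning
   (call-preserved / nonvolatile). */
static const char *const preserved_regs[] = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"
};

/* Linear search of a small list of lowercase register names. */
static bool in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, list[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: true if `reg` (lowercase) is call-clobbered. */
bool is_call_clobbered(const char *reg)
{
    assert(reg != NULL);
    return in_list(reg, clobbered_regs,
                   sizeof clobbered_regs / sizeof clobbered_regs[0]);
}

/* Hypothetical helper: true if `reg` (lowercase) is call-preserved. */
bool is_call_preserved(const char *reg)
{
    assert(reg != NULL);
    return in_list(reg, preserved_regs,
                   sizeof preserved_regs / sizeof preserved_regs[0]);
}
```

So is_call_clobbered("r11") is true, which is why the compiler in the question's listing could zero R11 without saving it, while is_call_preserved("r12") is true and using R12 would cost a push/pop pair.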






Or if we can do that in user-defined functions




Yes, you can freely make up your own calling conventions when calling from asm to asm, subject to constraints imposed by the OS. But if you want exceptions to be able to unwind the stack through such a call (e.g. if one of the child functions calls back into some C++ that can throw), you have to follow more restrictions, such as creating unwind metadata. If not, you can do nearly anything.



See my answer "Choose your calling convention to put args where you want them" on the CodeGolf Q&A "Tips for golfing in x86/x64 machine code".



You can also return in whatever register(s) you want, and return multiple values. (e.g. an asm strcmp or memcmp function can return the -/0/+ difference in the mismatch in EAX, and return the mismatch position in RDI, so the caller can use either or both.)
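C can't name return registers, but the idea of a two-value return can be approximated with a struct. The sketch below is only an illustration: memcmp_pos and struct cmp_result are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: in hand-written asm these two values could
   come back in two registers (EAX and RDI in the example above). */
struct cmp_result {
    int    diff;   /* negative, zero, or positive, like memcmp */
    size_t pos;    /* index of the first mismatch, or n if none */
};

/* Compares n bytes and reports both the difference and where it occurred. */
struct cmp_result memcmp_pos(const void *a, const void *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    const unsigned char *pa = a;
    const unsigned char *pb = b;
    struct cmp_result r = { 0, n };

    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i]) {
            r.diff = (int)pa[i] - (int)pb[i];
            r.pos = i;
            break;
        }
    }
    return r;
}
```

Note that a C compiler following the Windows x64 convention would return this 16-byte struct through a hidden pointer rather than in two registers, which is exactly the kind of overhead a custom asm-to-asm convention lets you avoid.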





A useful exercise in evaluating a design is to compare it to other actual or possible designs



By comparison, the x86-64 System V ABI passes the first 6 integer args in registers, and the first 8 FP args in XMM0..7. (Windows x64 passes the 5th arg on the stack, even if it's FP and the first 4 args were all integer.)
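The difference in how the two conventions pick a register can be modelled in a few lines of C. This is a simplified sketch with hypothetical function names: it handles only scalar integer/pointer ('i') and float/double ('f') arguments of a non-variadic function, and ignores structs, vectors, and variadic calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Windows x64: the register is chosen purely by argument position.
   Slots 0..3 map to RCX/RDX/R8/R9 or XMM0..3; everything else is stack. */
const char *win64_arg_location(const char *kinds, size_t index)
{
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const fp_regs[4]  = { "xmm0", "xmm1", "xmm2", "xmm3" };

    assert(index < strlen(kinds));
    assert(kinds[index] == 'i' || kinds[index] == 'f');

    if (index >= 4) {
        return "stack";
    }
    return kinds[index] == 'f' ? fp_regs[index] : int_regs[index];
}

/* x86-64 System V: integer and FP arguments are counted independently,
   with 6 integer registers and 8 XMM registers available. */
const char *sysv_arg_location(const char *kinds, size_t index)
{
    static const char *const int_regs[6] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    static const char *const fp_regs[8] = {
        "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
    };
    size_t ints = 0, fps = 0;

    assert(index < strlen(kinds));
    assert(kinds[index] == 'i' || kinds[index] == 'f');

    /* Count how many registers of each class earlier args already used. */
    for (size_t i = 0; i < index; i++) {
        if (kinds[i] == 'f') {
            fps++;
        } else {
            ints++;
        }
    }
    if (kinds[index] == 'f') {
        return fps < 8 ? fp_regs[fps] : "stack";
    }
    return ints < 6 ? int_regs[ints] : "stack";
}
```

For the signature "iiiif" (four integers, then a double), the fifth argument is on the stack under Windows x64 but in XMM0 under System V. For "ifif", Windows x64 uses RCX, XMM1, R8, XMM3, while System V uses RDI, XMM0, RSI, XMM1.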



So the other major x86-64 calling convention does use more arg-passing registers. It doesn't use shadow-space; instead it defines a 128-byte red-zone below RSP that's safe from being asynchronously clobbered. Small leaf functions can still avoid manipulating RSP to reserve space.



Fun fact: R10 and R11 are also non-arg-passing call-clobbered registers in x86-64 SysV. Fun fact #2: syscall destroys R11 (and RCX), so Linux uses R10 instead of RCX for passing arguments to system calls, but otherwise uses the same register-arg passing convention as user-space function calls.



See also "Why does Windows64 use a different calling convention from all other OSes on x86-64?" for more guesswork and info about why Microsoft made the design choices they did with their calling convention.



x86-64 System V makes it more complex to implement variadic functions (more code to index args), but they're generally rare. Most code doesn't bottleneck on sscanf throughput. Shadow space is usually worse than a red-zone. The original Windows x64 convention doesn't pass vector args (__m128) by value, so there's a 2nd 64-bit calling convention on Windows called vectorcall that allows efficient vector args. (Not usually a big deal because most functions that take vector args are inline, but SIMD math library functions would benefit.)



Having more args passed in the low 8 registers (the original rax..rdi registers, which don't need a REX prefix), and having more call-clobbered registers that don't need a REX prefix, is probably good for code-size in code that inlines enough to not make a huge number of function calls. You could say that Windows' choice of having more of the non-REX registers be call-preserved is better for code with loops containing function calls, but if you're making lots of function calls to short callees, then they'd benefit from more call-clobbered scratch registers that didn't need REX prefixes. I wonder how much thought MS put into this, or if they just mostly kept things similar to 32-bit calling conventions when choosing which of the low-8 registers would be call-preserved.



One of x86-64 System V's weaknesses is having no call-preserved XMM registers, though. So any function call requires spilling/reloading any FP vars. Having a couple, like the low 128 or 64 bits of xmm6 and xmm7, would probably have been good.



























  • "Yes, you can freely make up your own calling conventions when calling from asm to asm." Within limits. The Windows platform does impose some restrictions in order to support asynchronous exceptions. For example, you cannot use memory below esp beyond the red zone. (Of course, if you are code golfing, you probably don't care that your program doesn't behave properly in the face of asynchronous exceptions. But if you're writing production code, you need to be aware of platform requirements.)

    – Raymond Chen
    Nov 14 '18 at 0:34













  • @RaymondChen: I meant as far as arg-passing, not how you use stack space within a function, but yeah good point about wording. I didn't just mean for golf; as long as a function respects its caller's ABI, its internal implementation can include call instructions however you like. (And BTW, Windows x64 doesn't officially have a red-zone at all below rsp. In practice there might not be anything that asynchronously clobbers that space but it's not guaranteed. Perhaps you're thinking about Itanium again, like you were last time you mentioned something about a red-zone for x64.)

    – Peter Cordes
    Nov 14 '18 at 0:42











  • I was speaking more generally. There are parts of the software conventions that apply throughout the entire lifetime of a function, not just at points where you call into the operating system. For example, you must register unwind codes if the return address is not in its default place (top of stack for x64, default return address register for RISC), or if you use nonvolatile registers. "There might not be anything that asynchronously clobbers that space." Any in-page error could clobber that space. And you cannot prevent in-page errors.

    – Raymond Chen
    Nov 14 '18 at 2:03






































1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














The Windows x64 calling convention is designed to make it easy to implement variadic functions (like printf and scanf) by dumping the 4 register args into the shadow space, creating a contiguous array of all args. Args larger than 8 bytes are passed by reference, so each arg always takes exactly 1 arg-passing slot.



Given this design constraint, more register args would require a larger shadow space, which wastes more stack space for small functions that don't have a lot of args.



Yes, more register args would normally be more efficient. But if the callee wants to make another function call right away with different args, it would then have to store all its register args to the stack, so there's a limit on how many register args are useful.



You want a good mix of call-preserved and call-clobbered registers, regardless of how many are used for arg-passing. R10 and R11 are call-clobbered scratch regs. A transparent wrapper function written in asm might use them for scratch space without disturbing any of the args in RCX,RDX,R8,R9, and without needing to save/restore a call-preserved register anywhere.



R12..R15 are call-preserved registers you can use for whatever you want, as long as your save/restore them before returning.






Or if we can do that in user-defined functions




Yes, you can freely make up your own calling conventions when calling from asm to asm, subject to constraints imposed by the OS. But if you want exceptions to be able to unwind the stack through such a call (e.g. if one of the child functions calls back into some C++ that can throw), you have to follow more restrictions, such as creating unwind metadata. If not, you can do nearly anything.



See my Choose your calling convention to put args where you want them. answer on the CodeGolf Q&A "Tips for golfing in x86/x64 machine code".



You can also return in whatever register(s) you want, and return multiple values. (e.g. an asm strcmp or memcmp function can return the -/0/+ difference in the mismatch in EAX, and return the mismatch position in RDI, so the caller can use either or both.)





A useful exercise in evaluating a design is to compare it to other actual or possible designs



By comparison, the x86-64 System V ABI passes the first 6 integer args in registers, and the first 8 FP args in XMM0..7. (Windows x64 passes the 5th arg on the stack, even if it's FP and the first 4 args were all integer.)



So the other major x86-64 calling convention does use more arg-passing registers. It doesn't use shadow-space; it defines a red-zone below RSP that's safe from being asynchronously clobbered. Small leaf functions can still avoid manipulating RSP to reserve space.



Fun fact: R10 and R11 are also non-arg-passing call-clobbered registers in x86-64 SysV. Fun fact #2: syscall destroys R11 (and RCX), so Linux uses R10 instead of RCX for passing arguments to system calls, but otherwise uses the same register-arg passing convention as user-space function calls.



See also Why does Windows64 use a different calling convention from all other OSes on x86-64? for more guesswork and info about why Microsoft made the design choices they did with their calling convention.



x86-64 System V makes it more complex to implement variadic functions (more code to index args), but they're generally rare. Most code doesn't bottleneck on sscanf throughput. Shadow space is usually worse than a red-zone. The original Windows x64 convention doesn't pass vector args (__m128) by value, so there's a 2nd 64-bit calling convention on Windows called vectorcall that allows efficient vector args. (Not usually a big deal because most functions that take vector args are inline, but SIMD math library functions would benefit.)



Having more args passed in the low 8 (rax..rdi original registers that don't need a REX prefix), and having more call-clobbered registers that don't need a REX prefix, is probably good for code-size in code that inlines enough to not make a huge amount of function calls. You could say that Window's choice of having more of the non-REX registers be call-preserved is better for code with loops containing function calls, but if you're making lots of function calls to short callees, then they'd benefit from more call-clobbered scratch registers that didn't need REX prefixes. I wonder how much thought MS put into this, or if they just mostly kept things similar to 32-bit calling conventions when choosing which of the low-8 registers would be call-preserved.



One of x86-64 System V's weaknesses is having no call-preserved XMM registers, though. So any function call requires spilling/reloading any FP vars. Having a couple, like the low 128 or 64 bits of xmm6 and xmm7, would have been maybe good.






share|improve this answer





















  • 2





    "Yes, you can freely make up your own calling conventions when calling from asm to asm." Within limits. The Windows platform does impose some restrictions in order to support asynchronous exceptions. For example, you cannot use memory below esp beyond the red zone. (Of course, if you are code golfing, you probably don't care that your program doesn't behave properly in the face of asynchronous exceptions. But if you're writing production code, you need to be aware of platform requirements.)

    – Raymond Chen
    Nov 14 '18 at 0:34













  • @RaymondChen: I meant as far as arg-passing, not how you use stack space within a function, but yeah good point about wording. I didn't just mean for golf; as long as a function respects its caller's ABI, its internal implementation can include call instructions however you like. (And BTW, Windows x64 doesn't officially have a red-zone at all below rsp. In practice there might not be anything that asynchronously clobbers that space but it's not guaranteed. Perhaps you're thinking about Itanium again, like you were last time you mentioned something about a red-zone for x64.)

    – Peter Cordes
    Nov 14 '18 at 0:42











  • I was speaking more generally. There are parts of the software conventions that apply throughout the entire lifetime of a function, not just at points where you call into the operating system. For example, you must register unwind codes if the return address is not in its default place (top of stack for x64, default return address register for RISC), or if you use nonvolatile registers. "There might not be anything that asynchronously clobbers that space." Any in-page error could clobber that space. And you cannot prevent in-page errors.

    – Raymond Chen
    Nov 14 '18 at 2:03


















1














The Windows x64 calling convention is designed to make it easy to implement variadic functions (like printf and scanf) by dumping the 4 register args into the shadow space, creating a contiguous array of all args. Args larger than 8 bytes are passed by reference, so each arg always takes exactly 1 arg-passing slot.



Given this design constraint, more register args would require a larger shadow space, which wastes more stack space for small functions that don't have a lot of args.



Yes, more register args would normally be more efficient. But if the callee wants to make another function call right away with different args, it would then have to store all its register args to the stack, so there's a limit on how many register args are useful.



You want a good mix of call-preserved and call-clobbered registers, regardless of how many are used for arg-passing. R10 and R11 are call-clobbered scratch regs. A transparent wrapper function written in asm might use them for scratch space without disturbing any of the args in RCX,RDX,R8,R9, and without needing to save/restore a call-preserved register anywhere.



R12..R15 are call-preserved registers you can use for whatever you want, as long as your save/restore them before returning.






Or if we can do that in user-defined functions




Yes, you can freely make up your own calling conventions when calling from asm to asm, subject to constraints imposed by the OS. But if you want exceptions to be able to unwind the stack through such a call (e.g. if one of the child functions calls back into some C++ that can throw), you have to follow more restrictions, such as creating unwind metadata. If not, you can do nearly anything.



See my Choose your calling convention to put args where you want them. answer on the CodeGolf Q&A "Tips for golfing in x86/x64 machine code".



You can also return in whatever register(s) you want, and return multiple values. (e.g. an asm strcmp or memcmp function can return the -/0/+ difference in the mismatch in EAX, and return the mismatch position in RDI, so the caller can use either or both.)





A useful exercise in evaluating a design is to compare it to other actual or possible designs



By comparison, the x86-64 System V ABI passes the first 6 integer args in registers, and the first 8 FP args in XMM0..7. (Windows x64 passes the 5th arg on the stack, even if it's FP and the first 4 args were all integer.)



So the other major x86-64 calling convention does use more arg-passing registers. It doesn't use shadow-space; it defines a red-zone below RSP that's safe from being asynchronously clobbered. Small leaf functions can still avoid manipulating RSP to reserve space.



Fun fact: R10 and R11 are also non-arg-passing call-clobbered registers in x86-64 SysV. Fun fact #2: syscall destroys R11 (and RCX), so Linux uses R10 instead of RCX for passing arguments to system calls, but otherwise uses the same register-arg passing convention as user-space function calls.



See also Why does Windows64 use a different calling convention from all other OSes on x86-64? for more guesswork and info about why Microsoft made the design choices they did with their calling convention.



x86-64 System V makes it more complex to implement variadic functions (more code to index args), but they're generally rare. Most code doesn't bottleneck on sscanf throughput. Shadow space is usually worse than a red-zone. The original Windows x64 convention doesn't pass vector args (__m128) by value, so there's a 2nd 64-bit calling convention on Windows called vectorcall that allows efficient vector args. (Not usually a big deal because most functions that take vector args are inline, but SIMD math library functions would benefit.)



Having more args passed in the low 8 (rax..rdi original registers that don't need a REX prefix), and having more call-clobbered registers that don't need a REX prefix, is probably good for code-size in code that inlines enough to not make a huge amount of function calls. You could say that Window's choice of having more of the non-REX registers be call-preserved is better for code with loops containing function calls, but if you're making lots of function calls to short callees, then they'd benefit from more call-clobbered scratch registers that didn't need REX prefixes. I wonder how much thought MS put into this, or if they just mostly kept things similar to 32-bit calling conventions when choosing which of the low-8 registers would be call-preserved.



One of x86-64 System V's weaknesses is having no call-preserved XMM registers, though. So any function call requires spilling/reloading any FP vars. Having a couple, like the low 128 or 64 bits of xmm6 and xmm7, would have been maybe good.






share|improve this answer





















  • 2





    "Yes, you can freely make up your own calling conventions when calling from asm to asm." Within limits. The Windows platform does impose some restrictions in order to support asynchronous exceptions. For example, you cannot use memory below esp beyond the red zone. (Of course, if you are code golfing, you probably don't care that your program doesn't behave properly in the face of asynchronous exceptions. But if you're writing production code, you need to be aware of platform requirements.)

    – Raymond Chen
    Nov 14 '18 at 0:34













  • @RaymondChen: I meant as far as arg-passing, not how you use stack space within a function, but yeah good point about wording. I didn't just mean for golf; as long as a function respects its caller's ABI, its internal implementation can include call instructions however you like. (And BTW, Windows x64 doesn't officially have a red-zone at all below rsp. In practice there might not be anything that asynchronously clobbers that space but it's not guaranteed. Perhaps you're thinking about Itanium again, like you were last time you mentioned something about a red-zone for x64.)

    – Peter Cordes
    Nov 14 '18 at 0:42











  • I was speaking more generally. There are parts of the software conventions that apply throughout the entire lifetime of a function, not just at points where you call into the operating system. For example, you must register unwind codes if the return address is not in its default place (top of stack for x64, default return address register for RISC), or if you use nonvolatile registers. "There might not be anything that asynchronously clobbers that space." Any in-page error could clobber that space. And you cannot prevent in-page errors.

    – Raymond Chen
    Nov 14 '18 at 2:03
















1












1








1







The Windows x64 calling convention is designed to make it easy to implement variadic functions (like printf and scanf) by dumping the 4 register args into the shadow space, creating a contiguous array of all args. Args larger than 8 bytes are passed by reference, so each arg always takes exactly 1 arg-passing slot.



Given this design constraint, more register args would require a larger shadow space, which wastes more stack space for small functions that don't have a lot of args.



Yes, more register args would normally be more efficient. But if the callee wants to make another function call right away with different args, it would then have to store all its register args to the stack, so there's a limit on how many register args are useful.



You want a good mix of call-preserved and call-clobbered registers, regardless of how many are used for arg-passing. R10 and R11 are call-clobbered scratch regs. A transparent wrapper function written in asm might use them for scratch space without disturbing any of the args in RCX,RDX,R8,R9, and without needing to save/restore a call-preserved register anywhere.



R12..R15 are call-preserved registers you can use for whatever you want, as long as your save/restore them before returning.






Or if we can do that in user-defined functions




Yes, you can freely make up your own calling conventions when calling from asm to asm, subject to constraints imposed by the OS. But if you want exceptions to be able to unwind the stack through such a call (e.g. if one of the child functions calls back into some C++ that can throw), you have to follow more restrictions, such as creating unwind metadata. If not, you can do nearly anything.



See my Choose your calling convention to put args where you want them. answer on the CodeGolf Q&A "Tips for golfing in x86/x64 machine code".



You can also return in whatever register(s) you want, and return multiple values. (e.g. an asm strcmp or memcmp function can return the -/0/+ difference in the mismatch in EAX, and return the mismatch position in RDI, so the caller can use either or both.)





A useful exercise in evaluating a design is to compare it to other actual or possible designs



By comparison, the x86-64 System V ABI passes the first 6 integer args in registers, and the first 8 FP args in XMM0..7. (Windows x64 passes the 5th arg on the stack, even if it's FP and the first 4 args were all integer.)
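The difference in how the two conventions assign registers can be modelled roughly as follows. This is a simplified sketch with invented names: it handles only scalar integer/pointer and FP args, and ignores structs, `long double`, variadic calls and vectorcall.

```c
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } ArgKind;

/* Windows x64: the slot is chosen by argument position alone, so an FP
 * arg in position 4 or later goes on the stack regardless of what the
 * earlier args were. */
const char *win64_arg_location(const ArgKind *kinds, size_t index)
{
    static const char *const int_regs[4] = { "RCX", "RDX", "R8", "R9" };
    static const char *const fp_regs[4]  = { "XMM0", "XMM1", "XMM2", "XMM3" };

    if (index >= 4) return "stack";
    return kinds[index] == ARG_FP ? fp_regs[index] : int_regs[index];
}

/* x86-64 System V: integer and FP args are counted separately. */
const char *sysv_arg_location(const ArgKind *kinds, size_t index)
{
    static const char *const int_regs[6] = {
        "RDI", "RSI", "RDX", "RCX", "R8", "R9"
    };
    static const char *const fp_regs[8] = {
        "XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5", "XMM6", "XMM7"
    };
    size_t ints = 0, fps = 0;

    /* Count how many registers of each class the earlier args used. */
    for (size_t i = 0; i < index; i++) {
        if (kinds[i] == ARG_FP) fps++; else ints++;
    }
    if (kinds[index] == ARG_FP) return fps < 8 ? fp_regs[fps] : "stack";
    return ints < 6 ? int_regs[ints] : "stack";
}
```

For a signature like `f(int, int, int, int, double)`, the model puts the `double` on the stack for Windows x64 and in XMM0 for System V.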



So the other major x86-64 calling convention does use more arg-passing registers. It doesn't use shadow-space; it defines a red-zone below RSP that's safe from being asynchronously clobbered. Small leaf functions can still avoid manipulating RSP to reserve space.



Fun fact: R10 and R11 are also non-arg-passing call-clobbered registers in x86-64 SysV. Fun fact #2: syscall destroys R11 (and RCX), so Linux uses R10 instead of RCX for passing arguments to system calls, but otherwise uses the same register-arg passing convention as user-space function calls.



See also Why does Windows64 use a different calling convention from all other OSes on x86-64? for more guesswork and info about why Microsoft made the design choices they did with their calling convention.



x86-64 System V makes it more complex to implement variadic functions (more code to index args), but they're generally rare. Most code doesn't bottleneck on sscanf throughput. Shadow space is usually worse than a red-zone. The original Windows x64 convention doesn't pass vector args (__m128) by value, so there's a 2nd 64-bit calling convention on Windows called vectorcall that allows efficient vector args. (Not usually a big deal because most functions that take vector args are inline, but SIMD math library functions would benefit.)



Having more args passed in the low 8 registers (the original RAX..RDI, which don't need a REX prefix), and having more call-clobbered registers that don't need a REX prefix, is probably good for code size in code that inlines enough not to make a huge number of function calls. You could say that Windows' choice of having more of the non-REX registers be call-preserved is better for code with loops containing function calls, but if you're making lots of function calls to short callees, those callees would benefit from more call-clobbered scratch registers that don't need REX prefixes. I wonder how much thought MS put into this, or if they just mostly kept things similar to 32-bit calling conventions when choosing which of the low-8 registers would be call-preserved.



One of x86-64 System V's weaknesses is having no call-preserved XMM registers, though. So any function call requires spilling/reloading any FP vars. Having a couple, like the low 128 or 64 bits of xmm6 and xmm7, might have been good.














edited Nov 14 '18 at 2:34

























answered Nov 13 '18 at 23:37









Peter Cordes









  • 2





    "Yes, you can freely make up your own calling conventions when calling from asm to asm." Within limits. The Windows platform does impose some restrictions in order to support asynchronous exceptions. For example, you cannot use memory below esp beyond the red zone. (Of course, if you are code golfing, you probably don't care that your program doesn't behave properly in the face of asynchronous exceptions. But if you're writing production code, you need to be aware of platform requirements.)

    – Raymond Chen
    Nov 14 '18 at 0:34













  • @RaymondChen: I meant as far as arg-passing, not how you use stack space within a function, but yeah good point about wording. I didn't just mean for golf; as long as a function respects its caller's ABI, its internal implementation can include call instructions however you like. (And BTW, Windows x64 doesn't officially have a red-zone at all below rsp. In practice there might not be anything that asynchronously clobbers that space but it's not guaranteed. Perhaps you're thinking about Itanium again, like you were last time you mentioned something about a red-zone for x64.)

    – Peter Cordes
    Nov 14 '18 at 0:42











  • I was speaking more generally. There are parts of the software conventions that apply throughout the entire lifetime of a function, not just at points where you call into the operating system. For example, you must register unwind codes if the return address is not in its default place (top of stack for x64, default return address register for RISC), or if you use nonvolatile registers. "There might not be anything that asynchronously clobbers that space." Any in-page error could clobber that space. And you cannot prevent in-page errors.

    – Raymond Chen
    Nov 14 '18 at 2:03



































