F# Breakable Array Iteration With Bounds Checking Elided?











up vote
2
down vote

favorite












I am interested in trying F# in a high-performance application. I do not want to have a large array's bounds checked during iteration and the lack of break/return statements is concerning.



This is a contrived example that will break upon finding a value, but can someone tell me if bounds checking is also elided?



let innerExists (item: Char) (items: Char array): bool = 
let mutable state = false
let mutable i = 0
while not state && i < items.Length do
state <- item = items.[i]
i <- i + 1

state

let exists (input: Char array)(illegalChars: Char array): bool =
let mutable state = false
let mutable i = 0
while not state && i < input.Length do
state <- innerExists input.[i] illegalChars
i <- i + 1

state

exists [|'A'..'z'|] [|'.';',';';'|]


Here is the relevant disassembly:



    while not state && i < input.Length do
000007FE6EB4237A cmp dword ptr [rbp-14h],0
000007FE6EB4237E jne 000007FE6EB42383
000007FE6EB42380 nop
000007FE6EB42381 jmp 000007FE6EB42386
000007FE6EB42383 nop
000007FE6EB42384 jmp 000007FE6EB423A9
000007FE6EB42386 nop
000007FE6EB42387 mov r8d,dword ptr [rbp-18h]
000007FE6EB4238B mov rdx,qword ptr [rbp+18h]
000007FE6EB4238F cmp r8d,dword ptr [rdx+8]
000007FE6EB42393 setl r8b
000007FE6EB42397 movzx r8d,r8b
000007FE6EB4239B mov dword ptr [rbp-24h],r8d
000007FE6EB4239F mov r8d,dword ptr [rbp-24h]
000007FE6EB423A3 mov dword ptr [rbp-1Ch],r8d
000007FE6EB423A7 jmp 000007FE6EB423B1
000007FE6EB423A9 nop
000007FE6EB423AA xor r8d,r8d
000007FE6EB423AD mov dword ptr [rbp-1Ch],r8d
000007FE6EB423B1 cmp dword ptr [rbp-1Ch],0
000007FE6EB423B5 je 000007FE6EB42409
state <- innerExists input.[i] illegalChars
000007FE6EB423B7 mov r8d,dword ptr [rbp-18h]
000007FE6EB423BB mov rdx,qword ptr [rbp+18h]
000007FE6EB423BF cmp r8,qword ptr [rdx+8]
000007FE6EB423C3 jb 000007FE6EB423CA
000007FE6EB423C5 call 000007FECD796850
000007FE6EB423CA lea rdx,[rdx+r8*2+10h]
000007FE6EB423CF movzx r8d,word ptr [rdx]
000007FE6EB423D3 mov rdx,qword ptr [rbp+10h]
000007FE6EB423D7 mov rdx,qword ptr [rdx+8]
000007FE6EB423DB mov r9,qword ptr [rbp+20h]
000007FE6EB423DF mov rcx,7FE6EEE0640h
000007FE6EB423E9 call 000007FE6EB41E40
000007FE6EB423EE mov dword ptr [rbp-20h],eax
000007FE6EB423F1 mov eax,dword ptr [rbp-20h]
000007FE6EB423F4 movzx eax,al
000007FE6EB423F7 mov dword ptr [rbp-14h],eax
i <- i + 1
000007FE6EB423FA mov eax,dword ptr [rbp-18h]









share|improve this question




















  • 1




    That's not CIL, it looks like x86 assembly language
    – Fyodor Soikin
    Nov 10 at 18:12










  • It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
    – Just another metaprogrammer
    Nov 11 at 0:37










  • @FyodorSoikin You're correct. I changed the label on the second code block.
    – EricP
    Nov 11 at 1:16






  • 1




    Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
    – Just another metaprogrammer
    Nov 11 at 1:27















up vote
2
down vote

favorite












I am interested in trying F# in a high-performance application. I do not want to have a large array's bounds checked during iteration and the lack of break/return statements is concerning.



This is a contrived example that will break upon finding a value, but can someone tell me if bounds checking is also elided?



let innerExists (item: Char) (items: Char array): bool = 
let mutable state = false
let mutable i = 0
while not state && i < items.Length do
state <- item = items.[i]
i <- i + 1

state

let exists (input: Char array)(illegalChars: Char array): bool =
let mutable state = false
let mutable i = 0
while not state && i < input.Length do
state <- innerExists input.[i] illegalChars
i <- i + 1

state

exists [|'A'..'z'|] [|'.';',';';'|]


Here is the relevant disassembly:



    while not state && i < input.Length do
000007FE6EB4237A cmp dword ptr [rbp-14h],0
000007FE6EB4237E jne 000007FE6EB42383
000007FE6EB42380 nop
000007FE6EB42381 jmp 000007FE6EB42386
000007FE6EB42383 nop
000007FE6EB42384 jmp 000007FE6EB423A9
000007FE6EB42386 nop
000007FE6EB42387 mov r8d,dword ptr [rbp-18h]
000007FE6EB4238B mov rdx,qword ptr [rbp+18h]
000007FE6EB4238F cmp r8d,dword ptr [rdx+8]
000007FE6EB42393 setl r8b
000007FE6EB42397 movzx r8d,r8b
000007FE6EB4239B mov dword ptr [rbp-24h],r8d
000007FE6EB4239F mov r8d,dword ptr [rbp-24h]
000007FE6EB423A3 mov dword ptr [rbp-1Ch],r8d
000007FE6EB423A7 jmp 000007FE6EB423B1
000007FE6EB423A9 nop
000007FE6EB423AA xor r8d,r8d
000007FE6EB423AD mov dword ptr [rbp-1Ch],r8d
000007FE6EB423B1 cmp dword ptr [rbp-1Ch],0
000007FE6EB423B5 je 000007FE6EB42409
state <- innerExists input.[i] illegalChars
000007FE6EB423B7 mov r8d,dword ptr [rbp-18h]
000007FE6EB423BB mov rdx,qword ptr [rbp+18h]
000007FE6EB423BF cmp r8,qword ptr [rdx+8]
000007FE6EB423C3 jb 000007FE6EB423CA
000007FE6EB423C5 call 000007FECD796850
000007FE6EB423CA lea rdx,[rdx+r8*2+10h]
000007FE6EB423CF movzx r8d,word ptr [rdx]
000007FE6EB423D3 mov rdx,qword ptr [rbp+10h]
000007FE6EB423D7 mov rdx,qword ptr [rdx+8]
000007FE6EB423DB mov r9,qword ptr [rbp+20h]
000007FE6EB423DF mov rcx,7FE6EEE0640h
000007FE6EB423E9 call 000007FE6EB41E40
000007FE6EB423EE mov dword ptr [rbp-20h],eax
000007FE6EB423F1 mov eax,dword ptr [rbp-20h]
000007FE6EB423F4 movzx eax,al
000007FE6EB423F7 mov dword ptr [rbp-14h],eax
i <- i + 1
000007FE6EB423FA mov eax,dword ptr [rbp-18h]









share|improve this question




















  • 1




    That's not CIL, it looks like x86 assembly language
    – Fyodor Soikin
    Nov 10 at 18:12










  • It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
    – Just another metaprogrammer
    Nov 11 at 0:37










  • @FyodorSoikin You're correct. I changed the label on the second code block.
    – EricP
    Nov 11 at 1:16






  • 1




    Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
    – Just another metaprogrammer
    Nov 11 at 1:27













up vote
2
down vote

favorite









up vote
2
down vote

favorite











I am interested in trying F# in a high-performance application. I do not want to have a large array's bounds checked during iteration and the lack of break/return statements is concerning.



This is a contrived example that will break upon finding a value, but can someone tell me if bounds checking is also elided?



let innerExists (item: Char) (items: Char array): bool = 
let mutable state = false
let mutable i = 0
while not state && i < items.Length do
state <- item = items.[i]
i <- i + 1

state

let exists (input: Char array)(illegalChars: Char array): bool =
let mutable state = false
let mutable i = 0
while not state && i < input.Length do
state <- innerExists input.[i] illegalChars
i <- i + 1

state

exists [|'A'..'z'|] [|'.';',';';'|]


Here is the relevant disassembly:



    while not state && i < input.Length do
000007FE6EB4237A cmp dword ptr [rbp-14h],0
000007FE6EB4237E jne 000007FE6EB42383
000007FE6EB42380 nop
000007FE6EB42381 jmp 000007FE6EB42386
000007FE6EB42383 nop
000007FE6EB42384 jmp 000007FE6EB423A9
000007FE6EB42386 nop
000007FE6EB42387 mov r8d,dword ptr [rbp-18h]
000007FE6EB4238B mov rdx,qword ptr [rbp+18h]
000007FE6EB4238F cmp r8d,dword ptr [rdx+8]
000007FE6EB42393 setl r8b
000007FE6EB42397 movzx r8d,r8b
000007FE6EB4239B mov dword ptr [rbp-24h],r8d
000007FE6EB4239F mov r8d,dword ptr [rbp-24h]
000007FE6EB423A3 mov dword ptr [rbp-1Ch],r8d
000007FE6EB423A7 jmp 000007FE6EB423B1
000007FE6EB423A9 nop
000007FE6EB423AA xor r8d,r8d
000007FE6EB423AD mov dword ptr [rbp-1Ch],r8d
000007FE6EB423B1 cmp dword ptr [rbp-1Ch],0
000007FE6EB423B5 je 000007FE6EB42409
state <- innerExists input.[i] illegalChars
000007FE6EB423B7 mov r8d,dword ptr [rbp-18h]
000007FE6EB423BB mov rdx,qword ptr [rbp+18h]
000007FE6EB423BF cmp r8,qword ptr [rdx+8]
000007FE6EB423C3 jb 000007FE6EB423CA
000007FE6EB423C5 call 000007FECD796850
000007FE6EB423CA lea rdx,[rdx+r8*2+10h]
000007FE6EB423CF movzx r8d,word ptr [rdx]
000007FE6EB423D3 mov rdx,qword ptr [rbp+10h]
000007FE6EB423D7 mov rdx,qword ptr [rdx+8]
000007FE6EB423DB mov r9,qword ptr [rbp+20h]
000007FE6EB423DF mov rcx,7FE6EEE0640h
000007FE6EB423E9 call 000007FE6EB41E40
000007FE6EB423EE mov dword ptr [rbp-20h],eax
000007FE6EB423F1 mov eax,dword ptr [rbp-20h]
000007FE6EB423F4 movzx eax,al
000007FE6EB423F7 mov dword ptr [rbp-14h],eax
i <- i + 1
000007FE6EB423FA mov eax,dword ptr [rbp-18h]









share|improve this question















I am interested in trying F# in a high-performance application. I do not want to have a large array's bounds checked during iteration and the lack of break/return statements is concerning.



This is a contrived example that will break upon finding a value, but can someone tell me if bounds checking is also elided?



let innerExists (item: Char) (items: Char array): bool = 
let mutable state = false
let mutable i = 0
while not state && i < items.Length do
state <- item = items.[i]
i <- i + 1

state

let exists (input: Char array)(illegalChars: Char array): bool =
let mutable state = false
let mutable i = 0
while not state && i < input.Length do
state <- innerExists input.[i] illegalChars
i <- i + 1

state

exists [|'A'..'z'|] [|'.';',';';'|]


Here is the relevant disassembly:



    while not state && i < input.Length do
000007FE6EB4237A cmp dword ptr [rbp-14h],0
000007FE6EB4237E jne 000007FE6EB42383
000007FE6EB42380 nop
000007FE6EB42381 jmp 000007FE6EB42386
000007FE6EB42383 nop
000007FE6EB42384 jmp 000007FE6EB423A9
000007FE6EB42386 nop
000007FE6EB42387 mov r8d,dword ptr [rbp-18h]
000007FE6EB4238B mov rdx,qword ptr [rbp+18h]
000007FE6EB4238F cmp r8d,dword ptr [rdx+8]
000007FE6EB42393 setl r8b
000007FE6EB42397 movzx r8d,r8b
000007FE6EB4239B mov dword ptr [rbp-24h],r8d
000007FE6EB4239F mov r8d,dword ptr [rbp-24h]
000007FE6EB423A3 mov dword ptr [rbp-1Ch],r8d
000007FE6EB423A7 jmp 000007FE6EB423B1
000007FE6EB423A9 nop
000007FE6EB423AA xor r8d,r8d
000007FE6EB423AD mov dword ptr [rbp-1Ch],r8d
000007FE6EB423B1 cmp dword ptr [rbp-1Ch],0
000007FE6EB423B5 je 000007FE6EB42409
state <- innerExists input.[i] illegalChars
000007FE6EB423B7 mov r8d,dword ptr [rbp-18h]
000007FE6EB423BB mov rdx,qword ptr [rbp+18h]
000007FE6EB423BF cmp r8,qword ptr [rdx+8]
000007FE6EB423C3 jb 000007FE6EB423CA
000007FE6EB423C5 call 000007FECD796850
000007FE6EB423CA lea rdx,[rdx+r8*2+10h]
000007FE6EB423CF movzx r8d,word ptr [rdx]
000007FE6EB423D3 mov rdx,qword ptr [rbp+10h]
000007FE6EB423D7 mov rdx,qword ptr [rdx+8]
000007FE6EB423DB mov r9,qword ptr [rbp+20h]
000007FE6EB423DF mov rcx,7FE6EEE0640h
000007FE6EB423E9 call 000007FE6EB41E40
000007FE6EB423EE mov dword ptr [rbp-20h],eax
000007FE6EB423F1 mov eax,dword ptr [rbp-20h]
000007FE6EB423F4 movzx eax,al
000007FE6EB423F7 mov dword ptr [rbp-14h],eax
i <- i + 1
000007FE6EB423FA mov eax,dword ptr [rbp-18h]






arrays performance f# break cil






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 11 at 1:14

























asked Nov 10 at 17:28









EricP

204




204








  • 1




    That's not CIL, it looks like x86 assembly language
    – Fyodor Soikin
    Nov 10 at 18:12










  • It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
    – Just another metaprogrammer
    Nov 11 at 0:37










  • @FyodorSoikin You're correct. I changed the label on the second code block.
    – EricP
    Nov 11 at 1:16






  • 1




    Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
    – Just another metaprogrammer
    Nov 11 at 1:27














  • 1




    That's not CIL, it looks like x86 assembly language
    – Fyodor Soikin
    Nov 10 at 18:12










  • It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
    – Just another metaprogrammer
    Nov 11 at 0:37










  • @FyodorSoikin You're correct. I changed the label on the second code block.
    – EricP
    Nov 11 at 1:16






  • 1




    Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
    – Just another metaprogrammer
    Nov 11 at 1:27








1




1




That's not CIL, it looks like x86 assembly language
– Fyodor Soikin
Nov 10 at 18:12




That's not CIL, it looks like x86 assembly language
– Fyodor Soikin
Nov 10 at 18:12












It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
– Just another metaprogrammer
Nov 11 at 0:37




It's easier to read the x64 code if you annotate it with comments and point out what you think is the problem. Also, it is important that you grab the x64 code without the VS debugger attached as jitter don't optimize with VS debugger attached. With that said tail recursion is the idiom for loops that needs break in idioms.
– Just another metaprogrammer
Nov 11 at 0:37












@FyodorSoikin You're correct. I changed the label on the second code block.
– EricP
Nov 11 at 1:16




@FyodorSoikin You're correct. I changed the label on the second code block.
– EricP
Nov 11 at 1:16




1




1




Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
– Just another metaprogrammer
Nov 11 at 1:27




Since you are thinking about high-performance F# perhaps you find a blog post I wrote a while ago about optimizing mandelbrot in F# interesting? gist.github.com/mrange/20fa976388167e294aa01a1266ad0a8c
– Just another metaprogrammer
Nov 11 at 1:27












3 Answers
3






active

oldest

votes

















up vote
2
down vote



accepted










Others pointed out to use existing function FSharp.Core to achieve the same result but I think that OP asks if in loops like the boundary check of arrays are elided (as it is checked in the loop condition).



For simple code like above the jitter should be able to elide the checks. To see this it is correct to check the assembly code but it is important to not run with the VS debugger attached as the jitter don't optimize the code then. The reason that it can be impossible to show correct values in the debugger.



First let's look at exists optimized x64:



; not state?
00007ff9`1cd37551 85c0 test eax,eax
; if state is true then exit the loop
00007ff9`1cd37553 7521 jne 00007ff9`1cd37576
; i < input.Length?
00007ff9`1cd37555 395e08 cmp dword ptr [rsi+8],ebx
; Seems overly complex but perhaps this is as good as it gets?
00007ff9`1cd37558 0f9fc1 setg cl
00007ff9`1cd3755b 0fb6c9 movzx ecx,cl
00007ff9`1cd3755e 85c9 test ecx,ecx
; if we have reached end of the array then exit
00007ff9`1cd37560 7414 je 00007ff9`1cd37576
; mov i in ebx to rcx, unnecessary but moves like these are very cheap
00007ff9`1cd37562 4863cb movsxd rcx,ebx
; input.[i] (note we don't check the boundary again)
00007ff9`1cd37565 0fb74c4e10 movzx ecx,word ptr [rsi+rcx*2+10h]
; move illegalChars pointer to rdx
00007ff9`1cd3756a 488bd7 mov rdx,rdi
; call innerExists
00007ff9`1cd3756d e8ee9affff call 00007ff9`1cd31060
; i <- i + 1
00007ff9`1cd37572 ffc3 inc ebx
; Jump top of loop
00007ff9`1cd37574 ebdb jmp 00007ff9`1cd37551
; We are done!
00007ff9`1cd37576


So the code looks a bit too complex for what it should need to be but it seems it only checks the array condition once.



Now let's look at innerExists optimized x64:



# let mutable state = false
00007ff9`1cd375a0 33c0 xor eax,eax
# let mutable i = 0
00007ff9`1cd375a2 4533c0 xor r8d,r8d
; not state?
00007ff9`1cd375a5 85c0 test eax,eax
; if state is true then exit the loop
00007ff9`1cd375a7 752b jne 00007ff9`1cd375d4
; i < items.Length
00007ff9`1cd375a9 44394208 cmp dword ptr [rdx+8],r8d
; Seems overly complex but perhaps this is as good as it gets?
00007ff9`1cd375ad 410f9fc1 setg r9b
00007ff9`1cd375b1 450fb6c9 movzx r9d,r9b
00007ff9`1cd375b5 4585c9 test r9d,r9d
; if we have reached end of the array then exit
00007ff9`1cd375b8 741a je 00007ff9`1cd375d4
; mov i in r8d to rax, unnecessary but moves like these are very cheap
00007ff9`1cd375ba 4963c0 movsxd rax,r8d
; items.[i] (note we don't check the boundary again)
00007ff9`1cd375bd 0fb7444210 movzx eax,word ptr [rdx+rax*2+10h]
; mov item in cx to r9d, unnecessary but moves like these are very cheap
00007ff9`1cd375c2 440fb7c9 movzx r9d,cx
; item = items.[i]?
00007ff9`1cd375c6 413bc1 cmp eax,r9d
00007ff9`1cd375c9 0f94c0 sete al
; state <- ?
00007ff9`1cd375cc 0fb6c0 movzx eax,al
; i <- i + 1
00007ff9`1cd375cf 41ffc0 inc r8d
; Jump top of loop
00007ff9`1cd375d2 ebd1 jmp 00007ff9`1cd375a5
; We are done!
00007ff9`1cd375d4 c3 ret


So looks overly complex for what it should be but at least it looks like it only checks the array condition once.



So finally, it looks like the jitter eliminates the array boundary checks because it can prove this has already been checked successfully in the loop condition which I believe is what the OP wondered.



The x64 code doesn't look as clean as it could but from my experimentation x64 code that is cleaned up doesn't perform that much better, I suspect the CPU vendors optimize the CPU for the crappy code jitters produce.



An interesting exercise would be to code up an equivalent program in C++ and run it through https://godbolt.org/, choose x86-64 gcc (trunk) (gcc seems to do best right now) and specify the options -O3 -march=native and see the resulting x64 code.



Update



The code rewritten in https://godbolt.org/ to allow us seeing the assembly code generated by a c++ compiler:



template<int N>
bool innerExists(char item, char const (&items)[N]) {
for (auto i = 0; i < N; ++i) {
if (item == items[i]) return true;
}
return false;
}

template<int N1, int N2>
bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
for (auto i = 0; i < N1; ++i) {
if (innerExists(input[i], illegalCharacters)) return true;
}
return false;
}

char const separators = { '.', ',', ';' };
char const str[58] = { };

bool test() {
return exists(str, separators);
}


x86-64 gcc (trunk) with options -O3 -march=native the following code is generated



; Load the string to test into edx
mov edx, OFFSET FLAT:str+1
.L2:
; Have we reached the end?
cmp rdx, OFFSET FLAT:str+58
; If yes, then jump to the end
je .L7
; Load a character
movzx ecx, BYTE PTR [rdx]
; Comparing the 3 separators are encoded in the assembler
; because the compiler detected the array is always the same
mov eax, ecx
and eax, -3
cmp al, 44
sete al
cmp cl, 59
sete cl
; increase outer i
inc rdx
; Did we find a match?
or al, cl
; If no then loop to .L2
je .L2
; We are done!
ret
.L7:
; No match found, clear result
xor eax, eax
; We are done!
ret


Looks pretty good but what I missing in the code above is using AVX to test multiple characters at once.






share|improve this answer






























    up vote
    3
    down vote













    Bound check are eliminated by the JIT compiler, so it works the same for F# as for C#. You can expect elimination for code as in your example, as well as for



    for i = 0 to data.Lenght - 1 do
    ...


    and also for tail recursive functions, which compiles down to loops.



    The built in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate bounds checks.






    share|improve this answer





















    • I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
      – EricP
      Nov 11 at 1:02












    • Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
      – Just another metaprogrammer
      Nov 11 at 1:21


















    up vote
    1
    down vote













    What's wrong with the Array.contains and Array.exists functions ?



    let exists input illegalChars = 
    input |> Array.exists (fun c -> illegalChars |> Array.contains c)





    share|improve this answer

















    • 1




      Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
      – EricP
      Nov 11 at 1:07






    • 1




      In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
      – Just another metaprogrammer
      Nov 11 at 1:23













    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241571%2ff-breakable-array-iteration-with-bounds-checking-elided%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote



    accepted










    Others pointed out to use existing function FSharp.Core to achieve the same result but I think that OP asks if in loops like the boundary check of arrays are elided (as it is checked in the loop condition).



    For simple code like above the jitter should be able to elide the checks. To see this it is correct to check the assembly code but it is important to not run with the VS debugger attached as the jitter don't optimize the code then. The reason that it can be impossible to show correct values in the debugger.



    First let's look at exists optimized x64:



    ; not state?
    00007ff9`1cd37551 85c0 test eax,eax
    ; if state is true then exit the loop
    00007ff9`1cd37553 7521 jne 00007ff9`1cd37576
    ; i < input.Length?
    00007ff9`1cd37555 395e08 cmp dword ptr [rsi+8],ebx
    ; Seems overly complex but perhaps this is as good as it gets?
    00007ff9`1cd37558 0f9fc1 setg cl
    00007ff9`1cd3755b 0fb6c9 movzx ecx,cl
    00007ff9`1cd3755e 85c9 test ecx,ecx
    ; if we have reached end of the array then exit
    00007ff9`1cd37560 7414 je 00007ff9`1cd37576
    ; mov i in ebx to rcx, unnecessary but moves like these are very cheap
    00007ff9`1cd37562 4863cb movsxd rcx,ebx
    ; input.[i] (note we don't check the boundary again)
    00007ff9`1cd37565 0fb74c4e10 movzx ecx,word ptr [rsi+rcx*2+10h]
    ; move illegalChars pointer to rdx
    00007ff9`1cd3756a 488bd7 mov rdx,rdi
    ; call innerExists
    00007ff9`1cd3756d e8ee9affff call 00007ff9`1cd31060
    ; i <- i + 1
    00007ff9`1cd37572 ffc3 inc ebx
    ; Jump top of loop
    00007ff9`1cd37574 ebdb jmp 00007ff9`1cd37551
    ; We are done!
    00007ff9`1cd37576


    So the code looks a bit too complex for what it should need to be but it seems it only checks the array condition once.



    Now let's look at innerExists optimized x64:



    # let mutable state = false
    00007ff9`1cd375a0 33c0 xor eax,eax
    # let mutable i = 0
    00007ff9`1cd375a2 4533c0 xor r8d,r8d
    ; not state?
    00007ff9`1cd375a5 85c0 test eax,eax
    ; if state is true then exit the loop
    00007ff9`1cd375a7 752b jne 00007ff9`1cd375d4
    ; i < items.Length
    00007ff9`1cd375a9 44394208 cmp dword ptr [rdx+8],r8d
    ; Seems overly complex but perhaps this is as good as it gets?
    00007ff9`1cd375ad 410f9fc1 setg r9b
    00007ff9`1cd375b1 450fb6c9 movzx r9d,r9b
    00007ff9`1cd375b5 4585c9 test r9d,r9d
    ; if we have reached end of the array then exit
    00007ff9`1cd375b8 741a je 00007ff9`1cd375d4
    ; mov i in r8d to rax, unnecessary but moves like these are very cheap
    00007ff9`1cd375ba 4963c0 movsxd rax,r8d
    ; items.[i] (note we don't check the boundary again)
    00007ff9`1cd375bd 0fb7444210 movzx eax,word ptr [rdx+rax*2+10h]
    ; mov item in cx to r9d, unnecessary but moves like these are very cheap
    00007ff9`1cd375c2 440fb7c9 movzx r9d,cx
    ; item = items.[i]?
    00007ff9`1cd375c6 413bc1 cmp eax,r9d
    00007ff9`1cd375c9 0f94c0 sete al
    ; state <- ?
    00007ff9`1cd375cc 0fb6c0 movzx eax,al
    ; i <- i + 1
    00007ff9`1cd375cf 41ffc0 inc r8d
    ; Jump top of loop
    00007ff9`1cd375d2 ebd1 jmp 00007ff9`1cd375a5
    ; We are done!
    00007ff9`1cd375d4 c3 ret


    So looks overly complex for what it should be but at least it looks like it only checks the array condition once.



    So finally, it looks like the jitter eliminates the array boundary checks because it can prove this has already been checked successfully in the loop condition which I believe is what the OP wondered.



    The x64 code doesn't look as clean as it could but from my experimentation x64 code that is cleaned up doesn't perform that much better, I suspect the CPU vendors optimize the CPU for the crappy code jitters produce.



    An interesting exercise would be to code up an equivalent program in C++ and run it through https://godbolt.org/, choose x86-64 gcc (trunk) (gcc seems to do best right now) and specify the options -O3 -march=native and see the resulting x64 code.



    Update



    The code rewritten in https://godbolt.org/ to allow us seeing the assembly code generated by a c++ compiler:



    template<int N>
    bool innerExists(char item, char const (&items)[N]) {
    for (auto i = 0; i < N; ++i) {
    if (item == items[i]) return true;
    }
    return false;
    }

    template<int N1, int N2>
    bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
    for (auto i = 0; i < N1; ++i) {
    if (innerExists(input[i], illegalCharacters)) return true;
    }
    return false;
    }

    char const separators = { '.', ',', ';' };
    char const str[58] = { };

    bool test() {
    return exists(str, separators);
    }


    x86-64 gcc (trunk) with options -O3 -march=native the following code is generated



    ; Load the string to test into edx
    mov edx, OFFSET FLAT:str+1
    .L2:
    ; Have we reached the end?
    cmp rdx, OFFSET FLAT:str+58
    ; If yes, then jump to the end
    je .L7
    ; Load a character
    movzx ecx, BYTE PTR [rdx]
    ; Comparing the 3 separators are encoded in the assembler
    ; because the compiler detected the array is always the same
    mov eax, ecx
    and eax, -3
    cmp al, 44
    sete al
    cmp cl, 59
    sete cl
    ; increase outer i
    inc rdx
    ; Did we find a match?
    or al, cl
    ; If no then loop to .L2
    je .L2
    ; We are done!
    ret
    .L7:
    ; No match found, clear result
    xor eax, eax
    ; We are done!
    ret


    Looks pretty good but what I missing in the code above is using AVX to test multiple characters at once.






    share|improve this answer



























      up vote
      2
      down vote



      accepted










      Others pointed out to use existing function FSharp.Core to achieve the same result but I think that OP asks if in loops like the boundary check of arrays are elided (as it is checked in the loop condition).



      For simple code like above the jitter should be able to elide the checks. To see this it is correct to check the assembly code but it is important to not run with the VS debugger attached as the jitter don't optimize the code then. The reason that it can be impossible to show correct values in the debugger.



      First let's look at exists optimized x64:



      ; not state?
      00007ff9`1cd37551 85c0 test eax,eax
      ; if state is true then exit the loop
      00007ff9`1cd37553 7521 jne 00007ff9`1cd37576
      ; i < input.Length?
      00007ff9`1cd37555 395e08 cmp dword ptr [rsi+8],ebx
      ; Seems overly complex but perhaps this is as good as it gets?
      00007ff9`1cd37558 0f9fc1 setg cl
      00007ff9`1cd3755b 0fb6c9 movzx ecx,cl
      00007ff9`1cd3755e 85c9 test ecx,ecx
      ; if we have reached end of the array then exit
      00007ff9`1cd37560 7414 je 00007ff9`1cd37576
      ; mov i in ebx to rcx, unnecessary but moves like these are very cheap
      00007ff9`1cd37562 4863cb movsxd rcx,ebx
      ; input.[i] (note we don't check the boundary again)
      00007ff9`1cd37565 0fb74c4e10 movzx ecx,word ptr [rsi+rcx*2+10h]
      ; move illegalChars pointer to rdx
      00007ff9`1cd3756a 488bd7 mov rdx,rdi
      ; call innerExists
      00007ff9`1cd3756d e8ee9affff call 00007ff9`1cd31060
      ; i <- i + 1
      00007ff9`1cd37572 ffc3 inc ebx
      ; Jump top of loop
      00007ff9`1cd37574 ebdb jmp 00007ff9`1cd37551
      ; We are done!
      00007ff9`1cd37576


      So the code looks a bit too complex for what it should need to be but it seems it only checks the array condition once.



      Now let's look at innerExists optimized x64:



      # let mutable state = false
      00007ff9`1cd375a0 33c0 xor eax,eax
      # let mutable i = 0
      00007ff9`1cd375a2 4533c0 xor r8d,r8d
      ; not state?
      00007ff9`1cd375a5 85c0 test eax,eax
      ; if state is true then exit the loop
      00007ff9`1cd375a7 752b jne 00007ff9`1cd375d4
      ; i < items.Length
      00007ff9`1cd375a9 44394208 cmp dword ptr [rdx+8],r8d
      ; Seems overly complex but perhaps this is as good as it gets?
      00007ff9`1cd375ad 410f9fc1 setg r9b
      00007ff9`1cd375b1 450fb6c9 movzx r9d,r9b
      00007ff9`1cd375b5 4585c9 test r9d,r9d
      ; if we have reached end of the array then exit
      00007ff9`1cd375b8 741a je 00007ff9`1cd375d4
      ; mov i in r8d to rax, unnecessary but moves like these are very cheap
      00007ff9`1cd375ba 4963c0 movsxd rax,r8d
      ; items.[i] (note we don't check the boundary again)
      00007ff9`1cd375bd 0fb7444210 movzx eax,word ptr [rdx+rax*2+10h]
      ; mov item in cx to r9d, unnecessary but moves like these are very cheap
      00007ff9`1cd375c2 440fb7c9 movzx r9d,cx
      ; item = items.[i]?
      00007ff9`1cd375c6 413bc1 cmp eax,r9d
      00007ff9`1cd375c9 0f94c0 sete al
      ; state <- ?
      00007ff9`1cd375cc 0fb6c0 movzx eax,al
      ; i <- i + 1
      00007ff9`1cd375cf 41ffc0 inc r8d
      ; Jump top of loop
      00007ff9`1cd375d2 ebd1 jmp 00007ff9`1cd375a5
      ; We are done!
      00007ff9`1cd375d4 c3 ret


      So looks overly complex for what it should be but at least it looks like it only checks the array condition once.



      So finally, it looks like the jitter eliminates the array boundary checks because it can prove this has already been checked successfully in the loop condition which I believe is what the OP wondered.



      The x64 code doesn't look as clean as it could but from my experimentation x64 code that is cleaned up doesn't perform that much better, I suspect the CPU vendors optimize the CPU for the crappy code jitters produce.



      An interesting exercise would be to code up an equivalent program in C++ and run it through https://godbolt.org/, choose x86-64 gcc (trunk) (gcc seems to do best right now) and specify the options -O3 -march=native and see the resulting x64 code.



      Update



      The code rewritten in https://godbolt.org/ to allow us seeing the assembly code generated by a c++ compiler:



      template<int N>
      bool innerExists(char item, char const (&items)[N]) {
      for (auto i = 0; i < N; ++i) {
      if (item == items[i]) return true;
      }
      return false;
      }

      template<int N1, int N2>
      bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
      for (auto i = 0; i < N1; ++i) {
      if (innerExists(input[i], illegalCharacters)) return true;
      }
      return false;
      }

      char const separators = { '.', ',', ';' };
      char const str[58] = { };

      bool test() {
      return exists(str, separators);
      }


      x86-64 gcc (trunk) with options -O3 -march=native the following code is generated



      ; Load the string to test into edx
      mov edx, OFFSET FLAT:str+1
      .L2:
      ; Have we reached the end?
      cmp rdx, OFFSET FLAT:str+58
      ; If yes, then jump to the end
      je .L7
      ; Load a character
      movzx ecx, BYTE PTR [rdx]
      ; Comparing the 3 separators are encoded in the assembler
      ; because the compiler detected the array is always the same
      mov eax, ecx
      and eax, -3
      cmp al, 44
      sete al
      cmp cl, 59
      sete cl
      ; increase outer i
      inc rdx
      ; Did we find a match?
      or al, cl
      ; If no then loop to .L2
      je .L2
      ; We are done!
      ret
      .L7:
      ; No match found, clear result
      xor eax, eax
      ; We are done!
      ret


      Looks pretty good but what I missing in the code above is using AVX to test multiple characters at once.






      share|improve this answer

























        up vote
        2
        down vote



        accepted







        up vote
        2
        down vote



        accepted






        Others pointed out to use existing function FSharp.Core to achieve the same result but I think that OP asks if in loops like the boundary check of arrays are elided (as it is checked in the loop condition).



        For simple code like above the jitter should be able to elide the checks. To see this it is correct to check the assembly code but it is important to not run with the VS debugger attached as the jitter don't optimize the code then. The reason that it can be impossible to show correct values in the debugger.



        First let's look at exists optimized x64:



        ; not state?
        00007ff9`1cd37551 85c0 test eax,eax
        ; if state is true then exit the loop
        00007ff9`1cd37553 7521 jne 00007ff9`1cd37576
        ; i < input.Length?
        00007ff9`1cd37555 395e08 cmp dword ptr [rsi+8],ebx
        ; Seems overly complex but perhaps this is as good as it gets?
        00007ff9`1cd37558 0f9fc1 setg cl
        00007ff9`1cd3755b 0fb6c9 movzx ecx,cl
        00007ff9`1cd3755e 85c9 test ecx,ecx
        ; if we have reached end of the array then exit
        00007ff9`1cd37560 7414 je 00007ff9`1cd37576
        ; mov i in ebx to rcx, unnecessary but moves like these are very cheap
        00007ff9`1cd37562 4863cb movsxd rcx,ebx
        ; input.[i] (note we don't check the boundary again)
        00007ff9`1cd37565 0fb74c4e10 movzx ecx,word ptr [rsi+rcx*2+10h]
        ; move illegalChars pointer to rdx
        00007ff9`1cd3756a 488bd7 mov rdx,rdi
        ; call innerExists
        00007ff9`1cd3756d e8ee9affff call 00007ff9`1cd31060
        ; i <- i + 1
        00007ff9`1cd37572 ffc3 inc ebx
        ; Jump top of loop
        00007ff9`1cd37574 ebdb jmp 00007ff9`1cd37551
        ; We are done!
        00007ff9`1cd37576


        So the code looks a bit too complex for what it should need to be but it seems it only checks the array condition once.



        Now let's look at innerExists optimized x64:



        # let mutable state = false
        00007ff9`1cd375a0 33c0 xor eax,eax
        # let mutable i = 0
        00007ff9`1cd375a2 4533c0 xor r8d,r8d
        ; not state?
        00007ff9`1cd375a5 85c0 test eax,eax
        ; if state is true then exit the loop
        00007ff9`1cd375a7 752b jne 00007ff9`1cd375d4
        ; i < items.Length
        00007ff9`1cd375a9 44394208 cmp dword ptr [rdx+8],r8d
        ; Seems overly complex but perhaps this is as good as it gets?
        00007ff9`1cd375ad 410f9fc1 setg r9b
        00007ff9`1cd375b1 450fb6c9 movzx r9d,r9b
        00007ff9`1cd375b5 4585c9 test r9d,r9d
        ; if we have reached end of the array then exit
        00007ff9`1cd375b8 741a je 00007ff9`1cd375d4
        ; mov i in r8d to rax, unnecessary but moves like these are very cheap
        00007ff9`1cd375ba 4963c0 movsxd rax,r8d
        ; items.[i] (note we don't check the boundary again)
        00007ff9`1cd375bd 0fb7444210 movzx eax,word ptr [rdx+rax*2+10h]
        ; mov item in cx to r9d, unnecessary but moves like these are very cheap
        00007ff9`1cd375c2 440fb7c9 movzx r9d,cx
        ; item = items.[i]?
        00007ff9`1cd375c6 413bc1 cmp eax,r9d
        00007ff9`1cd375c9 0f94c0 sete al
        ; state <- ?
        00007ff9`1cd375cc 0fb6c0 movzx eax,al
        ; i <- i + 1
        00007ff9`1cd375cf 41ffc0 inc r8d
        ; Jump top of loop
        00007ff9`1cd375d2 ebd1 jmp 00007ff9`1cd375a5
        ; We are done!
        00007ff9`1cd375d4 c3 ret


        So looks overly complex for what it should be but at least it looks like it only checks the array condition once.



        So finally, it looks like the jitter eliminates the array boundary checks because it can prove this has already been checked successfully in the loop condition which I believe is what the OP wondered.



        The x64 code doesn't look as clean as it could but from my experimentation x64 code that is cleaned up doesn't perform that much better, I suspect the CPU vendors optimize the CPU for the crappy code jitters produce.



        An interesting exercise would be to code up an equivalent program in C++ and run it through https://godbolt.org/, choose x86-64 gcc (trunk) (gcc seems to do best right now) and specify the options -O3 -march=native and see the resulting x64 code.



        Update



        The code rewritten in https://godbolt.org/ to allow us seeing the assembly code generated by a c++ compiler:



        template<int N>
        bool innerExists(char item, char const (&items)[N]) {
        for (auto i = 0; i < N; ++i) {
        if (item == items[i]) return true;
        }
        return false;
        }

        template<int N1, int N2>
        bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
        for (auto i = 0; i < N1; ++i) {
        if (innerExists(input[i], illegalCharacters)) return true;
        }
        return false;
        }

        char const separators = { '.', ',', ';' };
        char const str[58] = { };

        bool test() {
        return exists(str, separators);
        }


        x86-64 gcc (trunk) with options -O3 -march=native the following code is generated



        ; Load the string to test into edx
        mov edx, OFFSET FLAT:str+1
        .L2:
        ; Have we reached the end?
        cmp rdx, OFFSET FLAT:str+58
        ; If yes, then jump to the end
        je .L7
        ; Load a character
        movzx ecx, BYTE PTR [rdx]
        ; Comparing the 3 separators are encoded in the assembler
        ; because the compiler detected the array is always the same
        mov eax, ecx
        and eax, -3
        cmp al, 44
        sete al
        cmp cl, 59
        sete cl
        ; increase outer i
        inc rdx
        ; Did we find a match?
        or al, cl
        ; If no then loop to .L2
        je .L2
        ; We are done!
        ret
        .L7:
        ; No match found, clear result
        xor eax, eax
        ; We are done!
        ret


        Looks pretty good but what I missing in the code above is using AVX to test multiple characters at once.






        share|improve this answer














        Others pointed out to use existing function FSharp.Core to achieve the same result but I think that OP asks if in loops like the boundary check of arrays are elided (as it is checked in the loop condition).



        For simple code like above the jitter should be able to elide the checks. To see this it is correct to check the assembly code but it is important to not run with the VS debugger attached as the jitter don't optimize the code then. The reason that it can be impossible to show correct values in the debugger.



        First let's look at exists optimized x64:



        ; not state?
        00007ff9`1cd37551 85c0 test eax,eax
        ; if state is true then exit the loop
        00007ff9`1cd37553 7521 jne 00007ff9`1cd37576
        ; i < input.Length?
        00007ff9`1cd37555 395e08 cmp dword ptr [rsi+8],ebx
        ; Seems overly complex but perhaps this is as good as it gets?
        00007ff9`1cd37558 0f9fc1 setg cl
        00007ff9`1cd3755b 0fb6c9 movzx ecx,cl
        00007ff9`1cd3755e 85c9 test ecx,ecx
        ; if we have reached end of the array then exit
        00007ff9`1cd37560 7414 je 00007ff9`1cd37576
        ; mov i in ebx to rcx, unnecessary but moves like these are very cheap
        00007ff9`1cd37562 4863cb movsxd rcx,ebx
        ; input.[i] (note we don't check the boundary again)
        00007ff9`1cd37565 0fb74c4e10 movzx ecx,word ptr [rsi+rcx*2+10h]
        ; move illegalChars pointer to rdx
        00007ff9`1cd3756a 488bd7 mov rdx,rdi
        ; call innerExists
        00007ff9`1cd3756d e8ee9affff call 00007ff9`1cd31060
        ; i <- i + 1
        00007ff9`1cd37572 ffc3 inc ebx
        ; Jump top of loop
        00007ff9`1cd37574 ebdb jmp 00007ff9`1cd37551
        ; We are done!
        00007ff9`1cd37576


        So the code looks a bit too complex for what it should need to be but it seems it only checks the array condition once.



        Now let's look at innerExists optimized x64:



        # let mutable state = false
        00007ff9`1cd375a0 33c0 xor eax,eax
        # let mutable i = 0
        00007ff9`1cd375a2 4533c0 xor r8d,r8d
        ; not state?
        00007ff9`1cd375a5 85c0 test eax,eax
        ; if state is true then exit the loop
        00007ff9`1cd375a7 752b jne 00007ff9`1cd375d4
        ; i < items.Length
        00007ff9`1cd375a9 44394208 cmp dword ptr [rdx+8],r8d
        ; Seems overly complex but perhaps this is as good as it gets?
        00007ff9`1cd375ad 410f9fc1 setg r9b
        00007ff9`1cd375b1 450fb6c9 movzx r9d,r9b
        00007ff9`1cd375b5 4585c9 test r9d,r9d
        ; if we have reached end of the array then exit
        00007ff9`1cd375b8 741a je 00007ff9`1cd375d4
        ; mov i in r8d to rax, unnecessary but moves like these are very cheap
        00007ff9`1cd375ba 4963c0 movsxd rax,r8d
        ; items.[i] (note we don't check the boundary again)
        00007ff9`1cd375bd 0fb7444210 movzx eax,word ptr [rdx+rax*2+10h]
        ; mov item in cx to r9d, unnecessary but moves like these are very cheap
        00007ff9`1cd375c2 440fb7c9 movzx r9d,cx
        ; item = items.[i]?
        00007ff9`1cd375c6 413bc1 cmp eax,r9d
        00007ff9`1cd375c9 0f94c0 sete al
        ; state <- ?
        00007ff9`1cd375cc 0fb6c0 movzx eax,al
        ; i <- i + 1
        00007ff9`1cd375cf 41ffc0 inc r8d
        ; Jump top of loop
        00007ff9`1cd375d2 ebd1 jmp 00007ff9`1cd375a5
        ; We are done!
        00007ff9`1cd375d4 c3 ret


        So looks overly complex for what it should be but at least it looks like it only checks the array condition once.



        So finally, it looks like the jitter eliminates the array boundary checks because it can prove this has already been checked successfully in the loop condition which I believe is what the OP wondered.



        The x64 code doesn't look as clean as it could but from my experimentation x64 code that is cleaned up doesn't perform that much better, I suspect the CPU vendors optimize the CPU for the crappy code jitters produce.



        An interesting exercise would be to code up an equivalent program in C++ and run it through https://godbolt.org/, choose x86-64 gcc (trunk) (gcc seems to do best right now) and specify the options -O3 -march=native and see the resulting x64 code.



        Update



        The code rewritten in https://godbolt.org/ to allow us seeing the assembly code generated by a c++ compiler:



        template<int N>
        bool innerExists(char item, char const (&items)[N]) {
        for (auto i = 0; i < N; ++i) {
        if (item == items[i]) return true;
        }
        return false;
        }

        template<int N1, int N2>
        bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
        for (auto i = 0; i < N1; ++i) {
        if (innerExists(input[i], illegalCharacters)) return true;
        }
        return false;
        }

        char const separators = { '.', ',', ';' };
        char const str[58] = { };

        bool test() {
        return exists(str, separators);
        }


        x86-64 gcc (trunk) with options -O3 -march=native the following code is generated



        ; Load the string to test into edx
        mov edx, OFFSET FLAT:str+1
        .L2:
        ; Have we reached the end?
        cmp rdx, OFFSET FLAT:str+58
        ; If yes, then jump to the end
        je .L7
        ; Load a character
        movzx ecx, BYTE PTR [rdx]
        ; Comparing the 3 separators are encoded in the assembler
        ; because the compiler detected the array is always the same
        mov eax, ecx
        and eax, -3
        cmp al, 44
        sete al
        cmp cl, 59
        sete cl
        ; increase outer i
        inc rdx
        ; Did we find a match?
        or al, cl
        ; If no then loop to .L2
        je .L2
        ; We are done!
        ret
        .L7:
        ; No match found, clear result
        xor eax, eax
        ; We are done!
        ret


        Looks pretty good but what I missing in the code above is using AVX to test multiple characters at once.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 11 at 9:16

























        answered Nov 11 at 1:18









        Just another metaprogrammer

        8,42522140




        8,42522140
























            up vote
            3
            down vote













            Bound check are eliminated by the JIT compiler, so it works the same for F# as for C#. You can expect elimination for code as in your example, as well as for



            for i = 0 to data.Lenght - 1 do
            ...


            and also for tail recursive functions, which compiles down to loops.



            The built in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate bounds checks.






            share|improve this answer





















            • I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
              – EricP
              Nov 11 at 1:02












            • Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
              – Just another metaprogrammer
              Nov 11 at 1:21















            up vote
            3
            down vote













            Bound check are eliminated by the JIT compiler, so it works the same for F# as for C#. You can expect elimination for code as in your example, as well as for



            for i = 0 to data.Lenght - 1 do
            ...


            and also for tail recursive functions, which compiles down to loops.



            The built in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate bounds checks.






            share|improve this answer





















            • I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
              – EricP
              Nov 11 at 1:02












            • Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
              – Just another metaprogrammer
              Nov 11 at 1:21













            up vote
            3
            down vote










            up vote
            3
            down vote









            Bound check are eliminated by the JIT compiler, so it works the same for F# as for C#. You can expect elimination for code as in your example, as well as for



            for i = 0 to data.Lenght - 1 do
            ...


            and also for tail recursive functions, which compiles down to loops.



            The built in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate bounds checks.






            share|improve this answer












            Bound check are eliminated by the JIT compiler, so it works the same for F# as for C#. You can expect elimination for code as in your example, as well as for



            for i = 0 to data.Lenght - 1 do
            ...


            and also for tail recursive functions, which compiles down to loops.



            The built in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate bounds checks.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 10 at 18:44









            nilekirk

            4121




            4121












            • I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
              – EricP
              Nov 11 at 1:02












            • Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
              – Just another metaprogrammer
              Nov 11 at 1:21


















            • I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
              – EricP
              Nov 11 at 1:02












            • Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
              – Just another metaprogrammer
              Nov 11 at 1:21
















            I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
            – EricP
            Nov 11 at 1:02






            I used BenchmarkRunner and confirmed my version was just as fast as the builtin methods @gileCAD posted. Then I tried to force it to perform poorly by creating the input array with Random.Next() length and passing the length in as a parameter. It was unexpectedly, marginally faster than the other two (local var vs. querying the array length?). I even tried iterating from i = len -1; i >= 0 with no change. Does the JIT compiler do that much better of a job than it used to? blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/…
            – EricP
            Nov 11 at 1:02














            Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
            – Just another metaprogrammer
            Nov 11 at 1:21




            Looping towards 0 is a good trick to save a register but in this case there are enough registers to avoid pushing them to the stack. Also because how F# loops works in order to benefit looping towards 0 you have to do tail recursion.
            – Just another metaprogrammer
            Nov 11 at 1:21










            up vote
            1
            down vote













            What's wrong with the Array.contains and Array.exists functions ?



            let exists input illegalChars = 
            input |> Array.exists (fun c -> illegalChars |> Array.contains c)





            share|improve this answer

















            • 1




              Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
              – EricP
              Nov 11 at 1:07






            • 1




              In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
              – Just another metaprogrammer
              Nov 11 at 1:23

















            up vote
            1
            down vote













            What's wrong with the Array.contains and Array.exists functions ?



            let exists input illegalChars = 
            input |> Array.exists (fun c -> illegalChars |> Array.contains c)





            share|improve this answer

















            • 1




              Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
              – EricP
              Nov 11 at 1:07






            • 1




              In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
              – Just another metaprogrammer
              Nov 11 at 1:23















            up vote
            1
            down vote










            up vote
            1
            down vote









            What's wrong with the Array.contains and Array.exists functions ?



            let exists input illegalChars = 
            input |> Array.exists (fun c -> illegalChars |> Array.contains c)





            share|improve this answer












            What's wrong with the Array.contains and Array.exists functions ?



            let exists input illegalChars = 
            input |> Array.exists (fun c -> illegalChars |> Array.contains c)






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 10 at 18:03









            gileCAD

            1,28656




            1,28656








            • 1




              Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
              – EricP
              Nov 11 at 1:07






            • 1




              In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
              – Just another metaprogrammer
              Nov 11 at 1:23
















            • 1




              Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
              – EricP
              Nov 11 at 1:07






            • 1




              In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
              – Just another metaprogrammer
              Nov 11 at 1:23










            1




            1




            Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
            – EricP
            Nov 11 at 1:07




            Nothing. I really want to use F#, but before I sink a ton of time into this project I need to make sure all the "things I hate about F#" google results aren't going to bit me in the rear.
            – EricP
            Nov 11 at 1:07




            1




            1




            In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
            – Just another metaprogrammer
            Nov 11 at 1:23






            In my experience as both C# expert (well not so much anymore perhaps) and a F# expert I find that I can make F# perform better than C#. However, the code is not idiomatic F# anymore. Looks more like C code with tail recursion instead of loops.
            – Just another metaprogrammer
            Nov 11 at 1:23




















             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241571%2ff-breakable-array-iteration-with-bounds-checking-elided%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Xamarin.iOS Cant Deploy on Iphone

            Glorious Revolution

            Dulmage-Mendelsohn matrix decomposition in Python