[ACCEPTED]-clearing a small integer array: memset vs. for loop-benchmarking

Accepted answer
Score: 47

In all likelihood, memset() will be inlined by your compiler (most compilers treat it as an 'intrinsic', which basically means it's inlined, except maybe at the lowest optimization levels or unless explicitly disabled).

For example, here are some release notes from GCC 4.3:

Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. This results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints. The heuristic choosing the particular algorithm can be overwritten via -mstringop-strategy. Newly also memset of values different from 0 is inlined.

It might be possible for the compiler to do something similar with the alternative examples you gave, but I'd bet it's less likely to.

And it's grep-able and more immediately obvious at a glance what the intent is to boot (not that the loop is particularly difficult to grok either).
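To make the comparison concrete, here is a minimal sketch of the two approaches being discussed. The function names and the array length are made up for illustration; they are not from the question. Whether the compiler inlines either version depends on the compiler, its version, and the optimization flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clear n ints with memset; the length argument is in bytes, not elements. */
void clear_with_memset(int *arr, size_t n)
{
    memset(arr, 0, n * sizeof *arr);
}

/* Clear n ints with an explicit loop, one element at a time. */
void clear_with_loop(int *arr, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        arr[i] = 0;
}

/* Check that both variants leave the array in the same all-zero state. */
void clear_demo(void)
{
    int a[5] = {1, 2, 3, 4, 5};   /* hypothetical sample data */
    int b[5] = {1, 2, 3, 4, 5};

    clear_with_memset(a, 5);
    clear_with_loop(b, 5);

    assert(memcmp(a, b, sizeof a) == 0);
}
```

Both functions produce the same result, so any difference is in the generated code. Compiling with something like `gcc -O2 -S` and reading the assembly is the direct way to see what your compiler does with each.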

Score: 23

As Michael already noted, gcc and, I guess, most other compilers already optimize this very well. For example, gcc turns this

char arr[5];
memset(arr, 0, sizeof arr);

into

movl  $0x0, <arr+0x0>
movb  $0x0, <arr+0x4>

It doesn't get any better than that...

Score: 9

There's no way of answering the question without measuring. It will depend entirely on the compiler, CPU and runtime library implementations.

memset() can be a bit of a "code smell", because it is prone to buffer overflows and parameter reversals, and it can only set memory 'byte-wise'. However, it's a safe bet that it will be 'fastest' in all but extreme cases.
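The 'byte-wise' limitation is easy to trip over with non-zero values. The following sketch (function names are hypothetical) contrasts a memset-based fill with an element-wise loop: memset writes the given value into every byte, so filling an int array with 1 does not produce ints equal to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise fill: every byte of every element becomes (unsigned char)byte. */
void fill_bytes(int *arr, size_t n, int byte)
{
    assert(arr != NULL);
    memset(arr, byte, n * sizeof *arr);
}

/* Element-wise fill: every element becomes value. */
void fill_elements(int *arr, size_t n, int value)
{
    assert(arr != NULL);
    for (size_t i = 0; i < n; ++i)
        arr[i] = value;
}
```

With a 4-byte int, `fill_bytes(arr, n, 1)` leaves each element as 0x01010101, while `fill_elements(arr, n, 1)` leaves each element as 1. Zero is the one common case where the two agree, which is why memset works fine for clearing.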

I tend to use a macro to wrap this to avoid some of the issues:

#define CLEAR(s) memset(&(s), 0, sizeof(s))

This sidesteps the size calculations and removes the problem of swapping the length and value parameters.
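As a rough illustration of how the macro might be used, here is a small sketch. The struct type and the demo function are invented for this example only.

```c
#include <assert.h>
#include <string.h>

#define CLEAR(s) memset(&(s), 0, sizeof(s))

struct point { int x; int y; };  /* hypothetical type, for illustration only */

/* Returns 0 when both objects were fully cleared. */
int clear_macro_demo(void)
{
    int counts[4] = {1, 2, 3, 4};
    struct point p = {5, 6};

    CLEAR(counts);  /* sizeof(counts) is the size of the whole array */
    CLEAR(p);       /* sizeof(p) is the size of the whole struct */

    assert(counts[3] == 0);
    return counts[0] | counts[1] | counts[2] | counts[3] | p.x | p.y;
}
```

One caveat: the macro only works on the object itself. If `s` is a pointer (for instance an array that was passed as a function parameter), `sizeof(s)` is the size of the pointer and `&(s)` is the address of the pointer variable, so only the pointer gets zeroed, not the data it points to.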

In short, use memset() "under the hood". Write what you intend, and let the compiler worry about optimizations. Most are incredibly good at it.

Score: 1

Considering this code per se, everything has already been said. But if you consider it in the context of its program, about which I know nothing, something else can be done. For example, if this code is executed periodically to clear an array, you could run a thread that constantly allocates a new zero-filled array and assigns it to a global variable; your code, when it needs the array cleared, simply points to that new array.

This is a third option. Of course, it only makes sense if you plan to run your code on a processor with at least two cores. Also, the code must run more than once to see the benefits. For a one-time run, you could declare an array filled with zeros and then point to it when needed.

Hope this may help someone.
