Some of the honest feedback I received criticized how this technique
interacts with compiler optimization. To that end I am going to analyze
the technique against both gcc and cl. First I need to narrow down the
possible compiler optimizations that could affect functions.
I'm not an expert on compiler optimizations at this time, so I am
going to have to defer to the Wikipedia listings. I went through
the 61 listed optimizations and tried to pick out the ones that
might cause trouble and that deal with function manipulation rather
than instruction, variable, or precomputed-value manipulation. The final
list I came up with was as follows:
The remainder of the optimizations listed on that page have to do with
the deduplication of work, loop optimization, or some other
rearrangement of basic blocks of code. The five articles listed
above have to do with effectively inlining either the
instructions or the result of a function. If the 'namespace' is
const, then by definition the compiler is free to do direct
value replacement in the resulting function.
I was going to make an example but the previous exposition took me
too long to write and someone beat me to the punch (in my own
comments no less)! So in the comments of the post that started all this Hraban writes:
...
Not unless by "vastly" you mean "not at all". Compile with -O3, look at the assembly:
https://gist.github.com/3033687#file_test_o3.s
Line 11 and 12 are equal. Thus is the power of const and -O3.
Hraban
The gist linked above matches my own tests. Even now, realizing how
little I know about compiler optimization tricks, the answer is
pretty clear: once you can guarantee a value is constant, you are
free to substitute it anywhere. The const modifier applied to the
'namespace' gives the C compiler the green light to drop in the
address of the appropriate function, as if you had called it
directly without the 'namespace.'
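For readers who have not seen the technique, here is a minimal sketch of a const 'namespace'. The names Demo, demo_value and demo_twice are hypothetical and only for illustration; they are not from the original project. Whether the compiler turns the calls into direct calls depends on the compiler and flags, so treat this as something to feed to gcc -S -O2 yourself rather than as a guarantee.

```c
/* Hypothetical functions that the 'namespace' will expose. */
int demo_value(void)
{
    return 5;
}

int demo_twice(int x)
{
    return 2 * x;
}

/* The 'namespace' is just a struct of function pointers. */
struct Demo_namespace
{
    int (*value)(void);
    int (*twice)(int);
};

/* const promises the pointers never change after initialization,
 * which is what lets an optimizer substitute the function addresses. */
static const struct Demo_namespace Demo = { &demo_value, &demo_twice };

/* Calls through the namespace read like Demo.twice(...). */
int use_namespace(void)
{
    return Demo.twice(Demo.value());
}
```

Calling use_namespace() goes through both pointers; comparing the assembly with and without const is a quick way to see what your compiler does with them.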
The only thing left to check is whether the common theme of the five
remaining articles applies: if a function could normally be inlined,
would it still be inlined using this namespace technique on both
gcc and cl? The first thing to do is to identify a code sample
that, with the appropriate options, is inlined as a normal function
call. While it's definitely possible for the function to be inlined,
it is also possible for the inline optimization pass to happen AFTER
the constant replacement/injection manipulation.
The inline keyword only became an official part of the language with
the C99 standard, so while some compilers might do the expected thing
anyway, C89 effectively 'doesn't inline.' If we were compiling strictly
to the C89 spec we would have already 'won.' Still, we are interested
in what the compiler actually does.
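As a rough illustration of the C89/C99 split, the sketch below only uses the inline keyword when the compiler reports C99 or later. DEMO_INLINE and caller are hypothetical names introduced here; the function body matches the inline_value() used in this post.

```c
/* __STDC_VERSION__ is 199901L or greater for C99 and later;
 * a strict C89 compiler does not define it at all. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define DEMO_INLINE static inline
#else
#define DEMO_INLINE static /* C89 has no inline keyword */
#endif

/* Same body as the test function used in this post. */
DEMO_INLINE int inline_value(void)
{
    return 5;
}

/* A plain caller; whether the call is inlined is up to the optimizer. */
int caller(void)
{
    return inline_value();
}
```

Either way the keyword is only a hint: the optimizer decides, which is why the tests in this post look at the generated assembly.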
I'm starting with gcc first just because I happen to be working
on Linux at the time of writing this post; I will do cl after.
Here is the very simple and short code sample, built with the
command line gcc -S inlinetest.c:
#include <stdio.h>
int inline_value()
{
    return 5;
}
int main( int argc, char **argv )
{
    printf( "%d", inline_value() );
    return 0;
}
Since an optimization setting wasn't picked, gcc did not inline by
default. The resulting assembly from main() is as follows:
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
call inline_value
movl %eax, %edx
movl $.LC0, %eax
movl %edx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
.cfi_endproc
Now even if you aren't familiar with assembler at all, you should be
able to recognize 'call inline_value' as an instruction that
occurs. Next let's ask gcc to optimize our code with
gcc -S -O2 inlinetest.c:
.globl main
.type main, @function
main:
.LFB12:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $5, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
.cfi_endproc
From this example it is clear the inlining occurred as one would
expect: there is no call to inline_value(). Now, let's
revise the code to use namespacing. I am going to keep it in the
same file for no particularly good reason other than to make it
short to post:
#include <stdio.h>
int inline_value()
{
    return 5;
}
struct A_namespace
{
    int (*value) ( void );
};
const struct A_namespace A = { &inline_value };
int main( int argc, char **argv )
{
    printf( "%d", inline_value() );
    printf( "%d", A.value() );
    return 0;
}
First let's analyze the unoptimized output for main():
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
call inline_value
movl %eax, %edx
movl $.LC0, %eax
movl %edx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
movq A(%rip), %rax
call *%rax
movl %eax, %edx
movl $.LC0, %eax
movl %edx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
.cfi_endproc
Now for simplicity's sake I am going to pare it down to the actual
'namespace' call:
movq A(%rip), %rax
call *%rax
This sort of output is actually one of the main drawbacks
that I expected right away. There is an extra instruction to load the
function pointer into a register before calling the function. This
is just one extra line of assembly, but a lot could go on behind
it. For instance, if there was a page fault, not only would you
effectively have to 'fetch' the namespace struct and load a value,
you would also have a 'fetch' to make sure the function is loaded
before calling it. This double page fault could really add
up, especially if a lot of functions used this technique. Now let's
try an optimized version from gcc -S -O2 inlinetest.c:
.globl main
.type main, @function
main:
.LFB12:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $5, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
call *A(%rip)
movl $.LC0, %edi
movl %eax, %esi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
.cfi_endproc
Now the two lines were reduced to a single line: call *A(%rip).
This is actually still a 'bad result', since the truly
right result is to eschew the function call entirely and just load
the value directly, the way the direct call became movl $5, %esi.
I did try gcc with -O3 but I got the same result as -O2.
This is a rather heartbreaking result for me personally
because it proves me wrong.
I do have a few long shot ideas though:
- I did not set the code to be static in the original
  source. Nope. Still the same.
- Now to provide an inline hint to inline_value()? Nope!
- With -std=c99? Nope!
To be fair I did expect this as a result, since it seemed
reasonable to me that this sort of technique would not be
inline-friendly.
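For reference, the three long shots above can be combined into one sketch like the following. It reuses inline_value() and A_namespace from the earlier listing; call_direct and call_namespaced are hypothetical helper names added here so the two call styles can be compared side by side in the output of gcc -S -O2 -std=c99. What the assembly looks like will depend on your compiler version, so no particular result is claimed.

```c
/* Long shots 1 and 2: the function is static and carries an inline hint. */
static inline int inline_value(void)
{
    return 5;
}

struct A_namespace
{
    int (*value)(void);
};

/* Long shot 1 applied to the namespace as well: static and const. */
static const struct A_namespace A = { &inline_value };

/* Direct call: the baseline that gcc inlined at -O2 in the tests above. */
int call_direct(void)
{
    return inline_value();
}

/* Namespaced call: the case under test. */
int call_namespaced(void)
{
    return A.value();
}
```

Both functions return the same value; the interesting part is whether call_namespaced still contains an indirect call in the generated assembly.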
There still is one final phase to this technique. While inlining
does not happen within the same file, what if I use it in the way
I described above, where the functions are effectively external?
/* in 'inlinetest2.h' */
#ifndef __INLINETEST2_H__
#define __INLINETEST2_H__
struct A_namespace
{
    int (*value) ( void );
};
extern const struct A_namespace A;
#endif /* __INLINETEST2_H__ */
/* in 'inlinetest2.c' */
#include "inlinetest2.h"
inline int inline_value()
{
    return 5;
}
const struct A_namespace A = { &inline_value };
/* in 'test2.c' */
#include <stdio.h>
#include "inlinetest2.h"
int main( int argc, char **argv )
{
    printf( "%d", A.value() );
    return 0;
}
The first thing we want to do is compile by parts. First let's
compile inlinetest2.c with gcc -c -O2 inlinetest2.c. Next we need
to compile test2.c and examine its assembly after gcc -S -O2 test2.c.
(We have to use -c or -S, or gcc attempts to link in both these
cases.) Unfortunately we still have the function being called:
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call *A(%rip)
movl $.LC0, %edi
movl %eax, %esi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
.cfi_endproc
Now let me try -O3... still the same. In this example inlining did
not occur under any setting I tried with gcc. I was going to try this
with cl as well, but if it doesn't work in one place it's not
reliable anywhere else.
I think the last thing we need to do is replicate Hraban's test case
and its result. In my testing with the namespace example project I
was not getting the 'pointer to function' call statement; I was
getting a literal call statement, just as Hraban was. Let me use that
project again. I suspect that declaring the functions and defining
the namespace in the NS.h header file allowed the compiler to
optimize in a direct function call. Here is the NS.h file for
reference:
#ifndef __NS_H__
#define __NS_H__
#include <stdio.h>
void func1 ( void );
int func2( char *word );
struct NS_namespace
{
    void (*func1) ( void );
    int (*func2) ( char* );
};
static const struct NS_namespace NS = { &func1,
                                        &func2 };
#endif /* __NS_H__ */
After running this through gcc -S -O2 main.c I got the same
result Hraban got:
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call func1
xorl %edi, %edi
call func2
testl %eax, %eax
js .L6
Notice that both calls to func1 and func2 are now direct calls rather
than calls through pointers? This is a surprising result. Let me
investigate by trying to make them pointer-to-function calls like
the other tests.
The first thing I think to try would be to annotate both func1() and
func2() as extern. No change.
Now I am going to use the technique where I define the struct and
make the actual namespace struct extern. This changes our two
NS.h and NS.c files to be:
#ifndef __NS_H__
#define __NS_H__
#include <stdio.h>
struct NS_namespace
{
    void (*func1) ( void );
    int (*func2) ( char* );
};
extern const struct NS_namespace NS;
#endif /* __NS_H__ */
/* in 'NS.c' */
#include "NS.h"
void func1 ( void )
{
    printf( "I am function 1.\n" );
}
int func2( char *word )
{
    /* Check Parameters */
    if ( word == NULL ) {
        printf ( "NULL word passed to func2.\n" );
        return -1;
    }
    printf( "The word of the day is %s.\n", word );
    return 0;
}
const struct NS_namespace NS = { &func1,
                                 &func2 };
Note I had to drop the static specifier from the namespace
declaration, since as soon as the definition was no longer in a
shared header the declaration became 'non-static'. I know this is an
awful thing to say, but at this point I am getting a little
frustrated and just want to finish this blog post, so I just made
'em const. Now let's see the assembly:
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
call *NS(%rip)
xorl %edi, %edi
movq NS+8(%rip), %rbx
call *%rbx
testl %eax, %eax
js .L6
Aha! Well, at least I know where it comes from. It seems that as soon
as you lose the 'static' context, gcc at least is unable to do a 1:1
replacement, which makes sense. By my interpretation, static
means that there is one canonical value visible in the current file,
and const implies that the value doesn't change. As soon as static is
removed from the equation and the initializer lives in another file,
this replacement can't occur. I'm going to take a second to look
around quickly to make sure I am interpreting the bits right. Now,
straight from the horse's mouth in the C99 specification, page 140:
4 The storage-class specifier, if any, in the declaration specifiers shall be either extern or static.
So what this means is that you can't have both extern and static
as storage-class specifiers on the same declaration (such as our
namespace struct); it's got to be one or the other. Since extern
causes the regression and static does not, I will have to say the
naive approach I started with is ironically the most correct (even
though inlining doesn't work).
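To make the two storage-class choices concrete, here is a small single-file sketch. The names demo_func2, Demo_namespace, NS_static, NS_extern and the two call_via_* helpers are hypothetical and exist only for this illustration. In a real project the extern declaration would sit in the header and its definition in one .c file, which is the situation where gcc fell back to indirect calls in the test above.

```c
#include <stddef.h>

/* Hypothetical stand-in for func2: -1 on NULL, 0 otherwise. */
int demo_func2(const char *word)
{
    return (word == NULL) ? -1 : 0;
}

struct Demo_namespace
{
    int (*func2)(const char *);
};

/* Form 1: static const, as in the original NS.h. Every file that
 * includes the header gets its own copy with a visible initializer. */
static const struct Demo_namespace NS_static = { &demo_func2 };

/* Form 2: extern declaration (header) plus one definition (.c file).
 * Other files only see the declaration, not the initializer. */
extern const struct Demo_namespace NS_extern;
const struct Demo_namespace NS_extern = { &demo_func2 };

/* Not valid C, two storage-class specifiers on one declaration:
 *     extern static const struct Demo_namespace NS_bad;
 */

int call_via_static(const char *word)
{
    return NS_static.func2(word);
}

int call_via_extern(const char *word)
{
    return NS_extern.func2(word);
}
```

Both forms behave the same at run time; the difference discussed in this post is only in what the optimizer can see when it compiles the caller.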