This particular post will focus on the 'namespace technique' that I mentioned in a prior post, which received a lot of negative feedback both in the comments on that post and on reddit. There is a separate topic that I would like to address on its own merits, and I will focus on that in the next post in this pair. If any new concerns are raised I will respond to them in future posts. After re-reading the original post and the influx of comments I've received, I feel it would be reasonable for me to sit down and express in more detail why this solves a problem for me, along with some more detailed examples and some new analysis of the costs of what I've been doing.
Afterthoughts
I highly recommend skipping to the Conclusions section at the end. If the results interest you, then feel free to come back and examine my reasoning and how I got there. I write these posts as a learning experience, so what I know at the beginning may be different from what I know at the end; if you only care about the answer, skip ahead.
1 Environments
There are a lot of code samples and assembly samples that I am going
to be providing in this post. If I mention cl
I am referring
specifically to Microsoft's compiler version "15.00.30729.01 for
x64" with setenv /Vista /x64 /Release
set. I use the Microsoft SDK
and not Visual Studio out of personal preference as it lets me use
my ASUS Eee 1005HA as a coding workstation when I am killing a bit
of time or wanting to test out a new idea. Visual Studio works
fairly well on it but I've started to become accustomed to using
org-mode with Emacs to write in a literate-esque coding style by
embedding and discussing the merits of snippets (to myself) for
future reference and just because it helps me learn better.
The other compiler I am using is gcc version 4.4.5 targeting "x86_64-linux-gnu." Depending on how much information I get back from these two compilers I might explore clang, but I run Debian stable on my laptop so I would have to compile it from scratch to make sure I gave it a fair analysis. I would also like to spend some time exploring other platforms and the costs I am incurring with my namespace technique. Some future work might be analyzing the AVR C compiler and maybe some ARM output, but I don't have a good test environment for either of those targets.
2 Preconceptions and Reasoning
The fundamental question I feel wasn't explained was why I even conceived of such a wacky technique in the first place. A while ago I was considering how to make C a more interactive coding environment like you would find in a lot of other languages such as Haskell, Lisp, Scheme and Python. By 'interactive' I am referring to the idea presented fairly well on Wikipedia, but my most direct experience with the idea was spawned by "Writing Interactive Compilers and Interpreters" by P. J. Brown. I admit I inherited (stole) this book from my father, but it was published at a very interesting time in programming (1979). Smalltalk (1972) and C (1972) had only been out for a handful of years, GNU (1983) hadn't even started yet, and it was still kind of a wild time. All of this was before my time, so it's very interesting to look back in retrospect and realize that some things I grew up with just didn't exist. Another interesting fact is that this particular book is very much out of print, and it is a fascinating realization that there are probably hundreds of books like this that didn't carry on. Computer Science as a field has progressed to the point that we have separated 'classics' like Knuth's works and K&R from pieces of the day like the book I just referenced. But just because a book is out of print doesn't mean the information contained within it exists anywhere else.
Anyway, back to the point at hand: I was trying to conceive of a proper interactive programming environment in C with the same kind of workflow I had with Haskell or Python. I'd write a bit of code, then C-c C-l and hop over to a REPL that had just loaded the code to try it out. I never progressed particularly far on this project because I got distracted by other things and never got back to it, but some of the techniques I explored while attempting it stayed with me. The first problem I had was how to create a mapping from function name to function, which is a reasonable problem. In C, though, there really isn't a good way to abstract this away, so in my tests I started building some rather static function lists like:
/* Assume foo, bar and foobar are functions defined elsewhere with
   matching function signatures. The functions aren't really important
   for this demonstration. */
struct fp_map {
    void  (*foo)    ( void );
    int   (*bar)    ( int, int );
    char* (*foobar) ( void );
};

static struct fp_map FP = { &foo, &bar, &foobar };
The idea here is that I now had a set of function pointers, so I could write a test program that called through them, in a fashion like the following code:
int main( int argc, char **argv )
{
    FP.foo();
    printf( "bar() called with result %d\n", FP.bar( 1, 2 ) );
    printf( "%s\n", FP.foobar() );
    return 0;
}
So this allowed me to do some dynamic swapping of functions. Let's list the output of the existing code first:
foo() called.
bar() called with result 3
foobar() called.
Let's then define a new code snippet that replaces the call to foobar():
char* new_foobar ( void )
{
    return "new_foobar() called.";
}
Then we change our main() to swap in new_foobar():
int main( int argc, char **argv )
{
    FP.foobar = &new_foobar;

    FP.foo();
    printf( "bar() called with result %d\n", FP.bar( 1, 2 ) );
    printf( "%s\n", FP.foobar() );
    return 0;
}
After running the program our output is as expected:
foo() called.
bar() called with result 3
new_foobar() called.
So now I had a technique to hot-swap in functions. The next step would have been to flesh out the fully interactive programming environment and do some magic with Position-Independent Code (PIC) to essentially make a Just-in-Time compiler (JIT) and hot-load in functions. I never fully explored this concept because I needed to learn a lot more to pull it off in a reasonable way (mostly finding a JIT, or borrowing LLVM's codebase). I also needed to learn elisp a bit better because half of the project was UI. So all in all an interesting idea, but I needed to learn a lot more before diving in headfirst.
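I never built that JIT, but for what it's worth, a much simpler route to a similar hot-loading effect on POSIX systems is dlopen/dlsym. Here is a minimal sketch against the fp_map example above; the shared-object path, the hotload_foobar helper and the exported symbol name are all hypothetical, and you would link with -ldl:

#include <stdio.h>
#include <dlfcn.h>   /* dlopen, dlsym, dlerror; link with -ldl */

/* Same shape as the fp_map example above. */
struct fp_map {
    void  (*foo)    ( void );
    int   (*bar)    ( int, int );
    char* (*foobar) ( void );
};

/* Load a replacement foobar from a freshly compiled shared object and
   swap it into the function-pointer map at runtime. */
int hotload_foobar ( struct fp_map *fp, const char *so_path )
{
    void *handle = dlopen( so_path, RTLD_NOW );
    if ( handle == NULL ) {
        fprintf( stderr, "dlopen failed: %s\n", dlerror() );
        return -1;
    }

    char* (*fn) ( void ) = (char* (*)( void )) dlsym( handle, "new_foobar" );
    if ( fn == NULL ) {
        fprintf( stderr, "dlsym failed: %s\n", dlerror() );
        return -1;
    }

    fp->foobar = fn;   /* hot-swap, exactly like the static example above */
    return 0;
}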
So what did I take away from this? I found that it was actually pretty handy to have the functionality in one place, especially if I didn't make it const, since that let me swap in functions. I explored this concept a bit more in other small programs where I was trying to create a cross-platform implementation of some system-specific calls. I created a struct containing the functions I wanted to use and then a 'make' function that returned that struct filled out with the system-specific functions (a rough sketch follows the build commands below). The way I would do this would be to have two separate implementation files, one per platform:
# On Windows
cl main.c somelib_win32.c /Femain.exe

# On Linux
gcc main.c somelib_linux.c -o main
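To make that concrete, here is a rough sketch of the shape I mean. All of the names here (somelib.h, somelib_make, do_thing) are hypothetical, and somelib_win32.c would mirror the Linux file with Win32 calls:

/* somelib.h -- declares the struct of function pointers and the 'make'
   function; this is the only header main.c needs. */
#ifndef __SOMELIB_H__
#define __SOMELIB_H__

struct somelib {
    int  (*do_thing) ( const char *arg );
    void (*teardown) ( void );
};

extern struct somelib somelib_make ( void );

#endif /* __SOMELIB_H__ */

/* somelib_linux.c -- fills the struct out with the Linux implementation. */
#include <stdio.h>
#include "somelib.h"

static int linux_do_thing ( const char *arg )
{
    printf( "linux do_thing( %s )\n", arg );
    return 0;
}

static void linux_teardown ( void )
{
    printf( "linux teardown\n" );
}

struct somelib somelib_make ( void )
{
    struct somelib lib = { &linux_do_thing, &linux_teardown };
    return lib;
}

/* main.c -- platform independent; never mentions Linux or Windows. */
#include "somelib.h"

int main ( void )
{
    struct somelib lib = somelib_make();
    lib.do_thing( "hello" );
    lib.teardown();
    return 0;
}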
So the header declares the external 'make' function for this library and defines the struct's format. In the main program you'd call the make function to provide the main functionality, and then you could have cross-platform code without using a ton of preprocessor tricks to get it to work the same way. When I am just experimenting on small code ideas it's a lot easier for me personally to write a simple makefile that works with both nmake and make, with a setup like this:
all:
	echo "use nmake win32 or make linux"

win32:
	cl main.c somelib_win32.c /Femain.exe

linux:
	gcc main.c somelib_linux.c -o main
Yes, this makefile is not good because I don't compile separately to object files, but I have used the abbreviated version to avoid being too verbose with the code listings. I am assuming most people reading this have compiled software on the command line and have used sensible makefiles. What this results in is a makefile that works on Windows and Linux without having to define platform-specific options or pull in CMake, scons or some other tool.
Since I'd abstracted away from pointing directly to function names, this 'namespace technique' allowed me to write the main logic once, worry about the actual implementation separately, and test the platform-specific stuff in a contained environment. I find reading code with a lot of #ifdef preprocessor action to be a bit awkward, since you end up having to interpret two different execution trees that might intersect. It also becomes a lot less modular, since trying to rip out the Linux-only code or the Windows-only code for another project is a challenging and error-prone proposition. The technique I presented lets me avoid that a little better.
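For contrast, here is a minimal sketch of the #ifdef style I am trying to avoid; sleep_ms is a hypothetical helper, and both execution trees have to live in the same function body:

#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Both platforms' logic is interleaved in one body; ripping out either
   half for reuse elsewhere means untangling the #ifdef tree. */
static void sleep_ms ( unsigned int ms )
{
#ifdef _WIN32
    Sleep( ms );             /* Win32: milliseconds */
#else
    usleep( ms * 1000 );     /* POSIX: microseconds */
#endif
}

int main ( void )
{
    printf( "sleeping briefly...\n" );
    sleep_ms( 100 );
    return 0;
}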
What about duplicating work? Won't I be writing the same functions twice? These are very reasonable questions, but you can solve them in a rather clean and preprocessor-free manner.
[insert image#1]
By putting the platform-independent code in a separate file entirely, you really pare down the platform-dependent code. To this end I feel there is very limited duplication of effort, since most platform-dependent code I want to use has wildly different interfaces or requirements for setting things up and tearing things down responsibly and correctly. Using this technique keeps that from even being an issue. Here is an example project demonstrating the method up to this point.
Now at this point I'd already started to use this pattern regularly in my code, but I realized there was a benefit to this particular technique that went beyond just giving a replaceable interface: you could actually form different entry points into a complex library, and you could create an implicit tree of what the library provided. I'm going to walk through a theoretical example regarding parts of speech, without implementing the functions, just to demonstrate what I mean.
2.1 English Parser
Let's say you wrote a library to parse the English language. This is a hard task in general just because English is very complex, meaning there are probably a lot of functions that belong together but have wildly different results. I'm going to present a subset of those theoretical functions and pretend this project is called 'epar', for English parser, with an engl_ function prefix of the kind you would normally have in C. These functions won't be commented either, because in most cases that's what you actually get in libraries.
/* Text Processing */
extern int    engl_next_word ( char *src, char *buff, size_t buffsz );
extern int    engl_next_sentence ( char *src, char *buff, size_t buffsz );
extern char** engl_break_words ( char *src );
extern char** engl_break_sentences ( char *src );

/* Sentence Operators */
extern int engl_sentence_tense_cons ( char *src );
extern struct engl_sentence_diagram* engl_get_sentence_diagram( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram( char *src );
extern char* engl_get_subject ( char *src );
extern struct engl_partial_diagram*  engl_get_predicate_diagram( char *src );
extern char* engl_get_predicate ( char *src );

/* Word Analysis */
extern int engl_word_is_verb ( char *src );
extern int engl_word_is_noun ( char *src );
extern int engl_word_is_adjective ( char *src );

/* Word Tense Checker */
extern int engl_word_past_tense ( char *src );
extern int engl_word_present_tense ( char *src );
extern int engl_word_future_tense ( char *src );

/* Spellcheck */
extern char** engl_spellcheck_word ( char *src );
extern struct spellcheck_list* engl_spellcheck_sentence ( char *src );

/* Bad Words */
extern int   engl_is_badword ( char *src );
extern int   engl_contains_badword ( char *src );
extern char* engl_replace_badwords ( char *src );
I don't feel this is an unreasonable representation of actual header files you would see in the field. Some examples are ECL's external.h, Lua's lua.h and Linux's include/linux/console.h. In fact, in hindsight I think it was browsing the Linux kernel, or watching an old talk aimed at new Linux kernel developers, that clued me into this idea, since that console.h file uses a technique similar to the one I described above. I vaguely remember the talk covering why the kernel doesn't bother with C++, and the counter-argument was roughly "You can still overload functions and use object-oriented programming techniques, and the kernel already does, so why bother with C++?" I can't find the video, and that quotation is from memory. Anyway, the point I wanted to make by linking to those real-world source files is that they are listings of functions loosely categorized with a single-line comment and whitespace. It's a common thing to see.
Now let's categorize the functions in that English code sample into some simple classifications:
/* Linear Retrieval */
extern int engl_next_word ( char *src, char *buff, size_t buffsz );
extern int engl_next_sentence ( char *src, char *buff, size_t buffsz );

/* Full String Processing */
extern char** engl_break_words ( char *src );
extern char** engl_break_sentences ( char *src );

/* Value Retrieval */
extern char* engl_get_subject ( char *src );
extern char* engl_get_predicate ( char *src );

/* Predicates (Functions that return a true or false value) */
extern int engl_sentence_tense_cons ( char *src );
extern int engl_word_is_verb ( char *src );
extern int engl_word_is_noun ( char *src );
extern int engl_word_is_adjective ( char *src );
extern int engl_is_badword ( char *src );
extern int engl_contains_badword ( char *src );
extern int engl_word_past_tense ( char *src );
extern int engl_word_present_tense ( char *src );
extern int engl_word_future_tense ( char *src );

/* Full String Replacement */
extern char* engl_replace_badwords ( char *src );

/* List of Unrelated Values */
extern char** engl_spellcheck_word ( char *src );

/* Returns Custom Struct with new Data */
extern struct engl_sentence_diagram* engl_get_sentence_diagram( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram( char *src );
extern struct engl_partial_diagram*  engl_get_predicate_diagram( char *src );
extern struct spellcheck_list*       engl_spellcheck_sentence ( char *src );
Now, looking at the second listing compared to the first, it is obvious that some of the categorization is difficult to interpret properly. In fact, someone trying to use this header might miss that engl_sentence_tense_cons is a predicate because of an artificial shortening of the name. In this example I am defining this hypothetical function as a 'function that returns a true or false value based on whether the passed sentence has a consistent tense.' I didn't even mean to make this an example of a terrible function name; I did it automatically, just as a C programmer, because it feels natural to shorten a name after a 'certain number of characters.' Assuredly, though, the next person who looks at this code a month from now couldn't figure it out without assistance.
Also, some of these functions could be used in a standalone context. What if I just wanted a library that would analyze a sentence and remove all the bad words for a blog filter? Are you sure you could extract engl_replace_badwords from the listing above and apply it properly? Would you have to go read the source, or pray that there was some documentation elsewhere? I find myself having to resort to those options just to figure out what return values I should expect and what would be reasonable.
We could even list these same functions in a third way:
/* String Operations */
extern int    engl_next_sentence ( char *src, char *buff, size_t buffsz );
extern char** engl_break_sentences ( char *src );

/* Sentence Operations */
extern int    engl_next_word ( char *src, char *buff, size_t buffsz );
extern char** engl_break_words ( char *src );
extern char*  engl_get_predicate ( char *src );
extern int    engl_sentence_tense_cons ( char *src );
extern char*  engl_get_subject ( char *src );
extern int    engl_contains_badword ( char *src );

/* Word Operations */
extern int engl_word_future_tense ( char *src );
extern int engl_word_past_tense ( char *src );
extern int engl_word_present_tense ( char *src );
extern int engl_word_is_adjective ( char *src );
extern int engl_word_is_verb ( char *src );
extern int engl_word_is_noun ( char *src );
extern int engl_is_badword ( char *src );

/* Language Detail Functions */
extern struct engl_sentence_diagram* engl_get_sentence_diagram ( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram ( char *src );
extern struct engl_partial_diagram*  engl_get_predicate_diagram( char *src );

/* Language Filter Operations */
extern char** engl_spellcheck_word ( char *src );
extern char*  engl_replace_badwords ( char *src );
extern struct spellcheck_list* engl_spellcheck_sentence ( char *src );
This is a completely valid way to list the same functions, but depending on what you are trying to do, the ordering could make it really easy to find what you want or nearly impossible without help. So, back to the previous post: here is what I presented as an alternative:
#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int  func2 ( char *word );

struct NS_namespace {
    void (*func1) ( void );
    int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, &func2 };

#endif /* __NS_H__ */
This structure allows you to call func1() with NS.func1(), as was discussed earlier. What I am trying to express is that if you use my terrible technique you can create a structured tree of your code, which gives one true place to use as a reference and also lets you divide the code up into things with the same return types. While you can kind of create this with a naming scheme, you can easily end up in a situation where the nomenclature just doesn't line up, simply out of organic growth of the code. I'm going to focus just on the word predicates and propose a namespace using the technique I have described above:
struct Word_namespace {
    struct {
        /* Returns 0 if true, 1 if false */
        int (*verb)          ( char* );
        int (*noun)          ( char* );
        int (*adjective)     ( char* );
        int (*future_tense)  ( char* );
        int (*past_tense)    ( char* );
        int (*present_tense) ( char* );
    } is;
};

const struct Word_namespace Word = { {
    &engl_word_is_verb,
    &engl_word_is_noun,
    &engl_word_is_adjective,
    &engl_word_future_tense,
    &engl_word_past_tense,
    &engl_word_present_tense
} };
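To show how the call sites read, here is a hedged usage sketch against the Word namespace above; word.h is a hypothetical header holding that definition, and note the listing's convention that these predicates return 0 for true:

#include <stdio.h>
#include "word.h"   /* hypothetical header containing the Word namespace above */

int main ( void )
{
    /* Per the comment in the struct, these predicates return 0 for true. */
    if ( Word.is.verb( "run" ) == 0 )
        printf( "'run' is a verb.\n" );

    if ( Word.is.past_tense( "ran" ) == 0 )
        printf( "'ran' is past tense.\n" );

    return 0;
}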
Now I think it would be a good time to express my normal workflow, which is:
- Get an idea
- Write code, Test for Correctness
- Build a 'namespace'
- Map the functions
As was mentioned in the comments, this could easily be error prone, but it is something I usually do at the end to tie everything up in a nice bow for the next time I need it. Something to note is that in the initial function list I made the tenses follow a logically different tree, but in reality they were the same sort of predicate as the word-type checks: they inspect the word passed in and return true if it is in the set of verbs, nouns or adjectives, or in the set of words with a future, past or present tense.
I was able to do this after the fact at a very high level while
still keeping the internal source consistent. Maybe I used
engl_word_past_tense
all over the place in my tests, in my
source files and in my internal documentation. Replacing the
function's name everywhere it appears just can't be assumed to be
safe, because it's possible that you miss one file and break
everything. Additionally, the name made sense to me at one point in time. Part of coding is expressing the ideas you have the best way you can in a language a compiler understands, so maybe in the context of what you wrote the name makes sense, and changing it after the fact makes the code confusing to read or less true to your original intent.
3 New Information
*The information in this 'New Information' section is problematic and should not be followed. It is left in because it was a thing I wanted to try. Move on to the Conclusions for more details.*
Now, in the reddit comment thread, the original post's comments and even a contribution on github, it was pointed out to me that the function declarations didn't need to linger in the header file: defining only the struct type there and informing the compiler that the namespace struct itself is defined elsewhere was actually a superior option. I never thought of this personally because the old way never really hurt or bothered me, but it does bring some additional benefits.
I will bring back the original NS.h header file to demonstrate:
#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int  func2 ( char *word );

struct NS_namespace {
    void (*func1) ( void );
    int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, &func2 };

#endif /* __NS_H__ */
Now, with this new information, it becomes clear that defining NS and declaring both func1 and func2 in the header file is not only redundant but also hurts the modularity gain I saw when I was using function pointers in the fp_map example. It is much more reasonable and sane to write the following:
In NS.h
#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

struct NS_namespace {
    void (*func1) ( void );
    int  (*func2) ( char* );
};

extern const struct NS_namespace NS;

#endif /* __NS_H__ */
In NS.c
#include "NS.h" void func1 ( void ) { printf( "I am function 1.\n" ); } int func2( char *word ) { /* Check Parameters */ if ( word == NULL ) { printf ( "NULL word passed to func2.\n" ); return -1; } printf( "The word of the day is %s.\n", word ); return 0; } /* Define NS here */ static const struct NS_namespace NS = { &func1, &func2 };
Now, after all of what I've learned, this is the superior option compared to both the const-in-header technique and the variant that allowed you to swap source files for cross-platform antics. Anyway, my bad idea just got a little bit better.
4 Experimentation
Some honest feedback I received was criticism about optimization problems. To this end I am going to analyze this technique against both gcc and cl. First I need to segment the possible compiler optimizations that could affect functions.
I'm not an expert on compiler optimizations at this time, so I am going to defer to the Wikipedia listings. I went through the 61 listed optimizations and tried to pick out the ones that might have trouble here, i.e. the ones that have to do with function manipulation rather than instruction/variable/precomputed-value manipulation. The final list I came up with was as follows:
- Enabling Transformation
- Inline Caching
- Inline Expansion
- Interprocedural Optimization
- Return Value Optimization
The remainder of the optimizations listed on that page have to do with deduplication of work, loop optimization or some other rearrangement of basic blocks. The five articles listed above all effectively involve inlining either the instructions or the result of a function. If the 'namespace' is const, then by definition the compiler is free to do direct value replacement in the resulting code.
I was going to make an example but the previous exposition took me too long to write and someone beat me to the punch (in my own comments no less)! So in the comments of the post that started all this Hraban writes:
... Not unless by "vastly" you mean "not at all". Compile with -O3, look at the assembly: https://gist.github.com/3033687#file_test_o3.s Line 11 and 12 are equal. Thus is the power of const and -O3. Hraban
The relevant link to the gist is here, and it matches my own tests. Even while realizing how little I know about compiler optimization tricks, the answer is pretty clear: once you can guarantee a value is constant, the compiler is free to substitute it as-is anywhere. The const modifier applied to the 'namespace' gives the compiler the green light to drop in the address of the appropriate function as if you had called it directly, without the 'namespace.'
The only thing left to check is whether the common theme of those five articles applies: if a function could normally be inlined, would it still be inlined using this namespace technique on both gcc and cl? The first thing to do is identify a code sample that, with the appropriate options, gets inlined when written as a normal function call. While it's definitely possible for the function to be inlined, it is also possible for the inline optimization pass to happen AFTER the constant replacement/injection manipulation.
The inline keyword became official as part of the C99 standard, so while some compilers might do the expected thing anyway, C89 effectively 'doesn't inline.' So if we were compiling against the C89 spec we would have already 'won.' Still, we are interested in what the compiler will actually do.
I'm starting with gcc first, just because I happen to be working on Linux at the time of writing this post; I will do cl after. Here is the very simple and short code sample, built with the command line gcc -S inlinetest.c:
#include <stdio.h>

int inline_value()
{
    return 5;
}

int main( int argc, char **argv )
{
    printf( "%d", inline_value() );
    return 0;
}
Since no optimization setting was picked, gcc did not inline by default. The resulting assembly for main() is as follows:
.globl main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movl    $0, %eax
    call    inline_value
    movl    %eax, %edx
    movl    $.LC0, %eax
    movl    %edx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    leave
    ret
    .cfi_endproc
Now, even if you aren't familiar with assembly at all, you should be able to spot the call inline_value instruction. Next let's ask gcc to optimize our code with gcc -S -O2 inlinetest.c:
.globl main
    .type   main, @function
main:
.LFB12:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $5, %esi
    movl    $.LC0, %edi
    xorl    %eax, %eax
    call    printf
    xorl    %eax, %eax
    addq    $8, %rsp
    ret
    .cfi_endproc
Now from this example it is clear the inlining occurred as one would expect; as you can see there is no call to inline_value(). Now let's revise the code to use namespacing. I am going to keep it in the same file, for no particularly good reason other than to keep the listing short:
#include <stdio.h>

int inline_value()
{
    return 5;
}

struct A_namespace {
    int (*value) ( void );
};

const struct A_namespace A = { &inline_value };

int main( int argc, char **argv )
{
    printf( "%d", inline_value() );
    printf( "%d", A.value() );
    return 0;
}
First let's analyze the unoptimized output for main()
:
.globl main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movl    $0, %eax
    call    inline_value
    movl    %eax, %edx
    movl    $.LC0, %eax
    movl    %edx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movq    A(%rip), %rax
    call    *%rax
    movl    %eax, %edx
    movl    $.LC0, %eax
    movl    %edx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    leave
    ret
    .cfi_endproc
Now for simplicity's sake I am going to pare it down to the actual 'namespace' call:
movq    A(%rip), %rax
call    *%rax
This is actually one of the main drawbacks I expected right away: there is an extra instruction to load the function pointer into a register before calling through it. It's just one extra line of code, but there is a lot that could go on behind it. For instance, if there were a page fault, not only would you effectively have to 'fetch' the namespace struct and load a value, you would also have a 'fetch' to make sure the function itself is loaded before calling it. This double page fault could really add up, especially if a lot of functions used this technique. Now let's try an optimized version with gcc -S -O2 inlinetest.c:
.globl main
    .type   main, @function
main:
.LFB12:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $5, %esi
    movl    $.LC0, %edi
    xorl    %eax, %eax
    call    printf
    call    *A(%rip)
    movl    $.LC0, %edi
    movl    %eax, %esi
    xorl    %eax, %eax
    call    printf
    xorl    %eax, %eax
    addq    $8, %rsp
    ret
    .cfi_endproc
Now the two lines have been reduced to a single line: call *A(%rip). This is actually still a 'bad result', since the truly right result would be to eschew the function call entirely and just load the constant value directly, as in the fully inlined version above. I did try gcc with -O3, but I got the same result as -O2. This is a rather heartbreaking result for me personally because it proves me wrong.
I do have a few long-shot ideas though:
- Making the code static in the original source? Nope, still the same.
- Providing an inline hint to inline_value()? Nope!
- Compiling with -std=c99? Nope!
To be fair I did expect this as a result, since it seemed reasonable to me that this sort of technique would not be inline-friendly.
There is still one final phase to this technique. While inlining does not happen within the same file, what if I use it in the method I described above, where the functions are effectively external?
In inlinetest2.h
#ifndef __INLINETEST2_H__
#define __INLINETEST2_H__

struct A_namespace {
    int (*value) ( void );
};

extern const struct A_namespace A;

#endif /* __INLINETEST2_H__ */
In inlinetest2.c
#include "inlinetest2.h" inline int inline_value() { return 5; } const struct A_namespace A = { &inline_value };
In test2.c
#include <stdio.h>
#include "inlinetest2.h"

int main( int argc, char **argv )
{
    printf( "%d", A.value() );
    return 0;
}
The first thing we want to do is compile by parts. First let's compile inlinetest2.c with gcc -c -O2 inlinetest2.c. Next we compile test2.c and examine its assembly after gcc -S -O2 test2.c (we have to use -c or -S here, or gcc attempts to link in both cases). Unfortunately the function is still being called through the pointer:
.globl main
    .type   main, @function
main:
.LFB11:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    call    *A(%rip)
    movl    $.LC0, %edi
    movl    %eax, %esi
    xorl    %eax, %eax
    call    printf
    xorl    %eax, %eax
    addq    $8, %rsp
    ret
    .cfi_endproc
Now let me try -O3... still the same. In this example inlining will not occur under any of these circumstances with gcc. I was going to try this with cl as well, but if it doesn't work in one place it's not reliable anywhere else.
I think the last thing we need to do is replicate the original test case and its result. In my testing with the namespace example project I was not getting the 'pointer to function' call statement; I was getting a literal call statement, just as Hraban was. Let me use that project again. I suspect that defining both the functions and the namespace in the NS.h header file allowed the compiler to optimize into a direct function call. Here is the NS.h file for reference:
#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int  func2 ( char *word );

struct NS_namespace {
    void (*func1) ( void );
    int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, &func2 };

#endif /* __NS_H__ */
After running this through gcc -S -O2 main.c
I got the same
result Hraban got:
.globl main
    .type   main, @function
main:
.LFB11:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    call    func1
    xorl    %edi, %edi
    call    func2
    testl   %eax, %eax
    js      .L6
Notice how both calls to func1 and func2 are direct rather than through pointers? This is a very odd result. Let me investigate by trying to make them pointer-to-function calls like the other tests.
The first thing I tried was annotating both func1() and func2() as extern. No change.
Now I am going to use the technique where I define the struct and
make the actual namespace struct extern
. This changes our two
NS.h and NS.c files to be:
#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

struct NS_namespace {
    void (*func1) ( void );
    int  (*func2) ( char* );
};

extern const struct NS_namespace NS;

#endif /* __NS_H__ */
#include "NS.h" void func1 ( void ) { printf( "I am function 1.\n" ); } int func2( char *word ) { /* Check Parameters */ if ( word == NULL ) { printf ( "NULL word passed to func2.\n" ); return -1; } printf( "The word of the day is %s.\n", word ); return 0; } const struct NS_namespace NS = { &func1, &func2 };
Note I had to drop the static specifier from the NS definition, since as soon as it was no longer in a shared header the declaration became 'non-static'. I know this is an awful thing to say, but at this point I am getting a little frustrated and just want to finish this blog post, so I just made it plain const. Now let's see the assembly:
.globl main
    .type   main, @function
main:
.LFB11:
    .cfi_startproc
    pushq   %rbx
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    call    *NS(%rip)
    xorl    %edi, %edi
    movq    NS+8(%rip), %rbx
    call    *%rbx
    testl   %eax, %eax
    js      .L6
Aha! Well, at least I know where it comes from. It seems that as soon as you lose the 'static' context, gcc at least is unable to do a 1:1 replacement, which makes sense. By my interpretation, static says there is one canonical value within the translation unit and const implies the value doesn't change; put another way, once NS is defined in a different translation unit, the compiler building main.c can no longer see its initializer, so (without link-time optimization) it has nothing to substitute. As soon as static is removed from the equation this replacement can't occur. I'm going to take a second to look around to make sure I am interpreting the bits right. Now, straight from the horse's mouth in the C99 specification, page 140:
4 The storage-class specifier, if any, in the declaration specifiers shall be either extern or static.
So what this means is that you can't combine extern and static in the same declaration; it's got to be one or the other. Since extern causes the regression and static does not, I will have to say the naive approach I started with is ironically the most correct (even though inlining doesn't work).
5 Conclusions
This turned out to be a much more monumental process than I thought it would be. I intended to reach out to multiple compilers and explore more of the options, but at this point I need to put a stake in this and call it done. In the end this is one of those posts where you start off in a direction and walk so far that you start doubting everything, only to find that you were right all along. That feeling at the peak of enlightenment is often a humbling and almost depressing experience. In seeking the truth in this case I felt like my own antagonist, to the point that I was disappointed I was right.
My naive, gut-feeling response to earnest criticism actually turned out to be the right result; suggestions like this one, which implied there was a better way to do it, were in fact the wrong path even though they seemed better to both parties at the time. I also kind of wanted this to be an example of 'why not to be crazy and stay within the painted lines' that I could quote to others to try and stave off their mistakes, but that is a blog post for another time.
In the end, the way to get this to work correctly, at the cost of some functions not being inlined, is the same as it was before, as seen in the first revision of my 'namespace example.'
I hope I gave this idea a fair shake and a passable, if crude, analysis. If people think otherwise, please let me know where I went wrong, either in the comments here, the eventual reddit comment threads, or just harass me on twitter. I plan on writing another post expressing the right way to do this technique, separate from all this fumbling around.
6 Future Work
I do have one more serious grievance posed by a reddit comment to take a look at and thoroughly analyze, and that has to do with the Python array slicer I threw together in a couple of hours. I want to give that as much thought as I gave this, to make sure I am approaching the language correctly and not spreading falsehoods around in my wake. I do teach other people, so I feel it's my responsibility to make sure I know it 110% right.
I did this sort of analysis for gcc
only. I intended to use cl
but the time got away from me.
Next, there was a particular idea in the middle of this post about using my namespace technique to sort functions and classify the complexity of code. Now that I have shown, at least with gcc, that this technique is effectively 'free' compared to calling a function directly, I want to quantify this idea of complexity coming from misnamed functions. I planned to insert that analysis somewhere in this post, but it will have to wait.
Also, I mentioned Hraban a few times near the end of this post; he actually submitted a really great idea on how to extend my idea and make it work a little differently and more intelligently. You can see that change in this github pull request. I'm not going to merge it right away until I take a look at it and really examine the side effects, but it definitely seems like a way to use the preprocessor for good and not for evil. I had been contemplating a similar idea, but his seems a hundred times more sensible.