Friday, March 10, 2017

Check a Certificate Key Identifier

As part of the signing process you submit a CSR (certificate signing request) to a CA (certificate authority) to be signed, but because expiration is a core concept in PKI, CA certificates expire too. In the case of the SHA-1 deprecation you might have a private key with a refreshed certificate associated with it, and in the case of a secure token or smart card you may have a single key with multiple certificates associated with it. This leads to the concept of Key Identifiers, which are encoded in certificates.

Here's an example of how they appear in a certificate:
echo "Q" | openssl s_client -connect google.com:443 2>&1 | openssl x509 -text -noout | grep -A1 'Key Identifier'
            X509v3 Subject Key Identifier:
                EB:27:08:F6:93:E5:92:F2:DE:06:FD:1F:9A:89:9F:F6:E4:97:51:30
--
            X509v3 Authority Key Identifier:
                keyid:4A:DD:06:16:1B:BC:F6:68:B5:76:F5:81:B6:BB:62:1A:BA:5A:81:2F
Breaking down this command:
  1. echo "Q" - send the 'Q' command to close down the connection as described in man s_client.
  2. openssl s_client -connect google.com:443 2>&1 - connect to the host google.com on port 443, and redirect stderr to stdout so that it goes down the pipe and is dropped from the final output.
  3. openssl x509 -text -noout - The openssl s_client command emits the certificate information to stdout, and we use a pipe to forward that to openssl x509, which reads it from stdin. We ask for the decoded certificate with -text and suppress the encoded certificate itself with -noout.
  4. grep -A1 'Key Identifier' - just grab the X509v3 Key Identifier lines and the line after each, which is where the value is displayed.

RFC 5280 § 4.2.1.1 says the authority key identifier provides a means of identifying the public key corresponding to the private key used to sign the certificate, and RFC 5280 § 4.2.1.2 describes common methods for generating these key identifiers from the public key. The caveat here is that, as that last section mentions, other methods of generating a unique value are also acceptable.

To check what the key identifier of a key would be when you only have the private key and you are working with certificates signed by openssl, the fastest way is to self-sign a certificate. You can do that as follows:
openssl req -new -x509 -key CA.key -sha256 -subj '/CN=test'| openssl x509 -text -noout | grep -A2 'Key Identifier'
            X509v3 Subject Key Identifier:
                B3:1B:00:4C:10:55:73:D9:91:66:36:1C:4B:4F:07:98:74:68:DE:09
            X509v3 Authority Key Identifier:
                keyid:B3:1B:00:4C:10:55:73:D9:91:66:36:1C:4B:4F:07:98:74:68:DE:09
By using this you now can do a direct comparison, and then look at an openssl x509 output to investigate who signed this certificate based on what you now know.

Monday, January 30, 2017

OpenSSL Parameterized Configuration

This is a common question that comes up in ##openssl with regard to handling openssl req, and here is a strategy for parameterizing values in the configuration:
openssl genrsa -out ca.key 2048
config_file=some_file
cn="example.com"
echo "
[ req ]
prompt = no
distinguished_name = req_distinguished_name
x509_extensions = v3_ca

[ req_distinguished_name ]
C                = AB
ST               = CD
L                = EF
O                = G
OU               = HI
CN               = $cn

[ v3_ca ]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer:always
basicConstraints = CA:true
" > $config_file
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt -config $config_file

How to generate a matching CA with a different signing algorithm

This is a bit past its time ever since Google decided to deprecate SHA-1, but there is still a possibility for a site to need a CA with a different signing algorithm to resolve compatibility with some cryptography in an environment. This should not be taken as a general recommendation; it is an unfortunate circumstance you may need to accept in order to support some piece of software or hardware in an environment.

The first thing to remember is that a certificate can be simplified to (certificate_info, public_key, signature). The first step of verification is to look through the CA data to match the issuer and subject strings, and only then start doing the cryptographic validation. This lookup is the piece you are trying to take advantage of to have differing CAs. This can be seen as part of the following steps:

openssl genrsa -out CA.key 2048
openssl req -new -x509 -key CA.key -out CA.sha256.crt -sha256 -subj /C=AB/ST=CD/L=EF/O=G/OU=HI/CN=JK
openssl req -new -x509 -key CA.key -out CA.sha512.crt -sha512 -subj /C=AB/ST=CD/L=EF/O=G/OU=HI/CN=JK
openssl req -new -x509 -key CA.key -out CA.sha512b.crt -sha512 -subj /C=ED/ST=RFR/L=GHH/O=URU/OU=DS/CN=OL

These openssl req invocations can be replaced by however your CA is normally signed, or by some other CA signing infrastructure; the particular invocation above is only for illustration. In the end we have certificates that look like this:
$ find -name 'CA.*.crt'  -print -exec openssl x509 -in {} -subject -noout \;
./CA.sha256.crt
subject= /C=AB/ST=CD/L=EF/O=G/OU=HI/CN=JK
./CA.sha512.crt
subject= /C=AB/ST=CD/L=EF/O=G/OU=HI/CN=JK
./CA.sha512b.crt
subject= /C=ED/ST=RFR/L=GHH/O=URU/OU=DS/CN=OL
Then let's sign a new host key with just one certificate from the CA, but using the unique private key for the CA.
openssl genrsa -out host.key
openssl req -new -key host.key -out host.csr
openssl x509 -req -days 3650 -in host.csr -out host.crt -CAkey CA.key -CA CA.sha256.crt  -sha256 -set_serial 2
This is how a normal signing works, and now let's see how it verifies:
$ find -name 'CA.*.crt'  -print -exec openssl verify -CAfile {} host.crt \;
./CA.sha256.crt
host.crt: OK
./CA.sha512.crt
host.crt: OK
./CA.sha512b.crt
host.crt: C = QA, ST = WSD, L = ED, O = RGFG, OU = YHJ, CN = UJ
error 20 at 0 depth lookup:unable to get local issuer certificate
As you can see from the steps above we are using the same private key material in CA.key with several certificates, but at the moment of verification we are only using one of the CA certificates to verify the signature of the host certificate. In practice you should make sure only one is available in an environment, as the verification process will generally take the first match, even if it can't process it. There have been instances in the past where an MD5 based certificate and a SHA-1 certificate were both in a CA trust store, but because of natural ordering the MD5 certificate was selected first and was rejected on policy grounds since MD5 is deprecated.

Wednesday, January 25, 2017

Using openssl s_server and openssl s_client to test client certificates

Working out from openssl's man pages how to invoke openssl s_server to experiment with client certificates can be challenging, as there are not enough examples on that man page compared to others. A good understanding of how to set up a CAfile that validates with openssl s_client is helpful here, with the general logic being PEM-format certificates joined in a single file. On Unix this is easy with cat rootCA.crt intermediateCA.crt > caFile.crt, and we will be using this caFile.crt throughout this example. It is expected that this file has enough information to validate both the client and the server.

This is easiest to do with two separate terminals with one terminal running the following command:
openssl s_server -accept 10000 -cert server.crt -key server.key -verify 10 -CAfile caFile.crt

And the other terminal running this command:
openssl s_client -connect localhost:10000 -cert client.crt -key client.key -CAfile caFile.crt

For both commands we are using certificates, so we need the certificate piece with -cert and the key piece with -key. We already described that we need a file containing the CA information to verify certificates (caFile.crt). It is required for verification on the server side; on the client side it is less critical, since s_client proceeds whether or not the certificate validates.

-accept indicates what port to listen on, which is reflected in the -connect parameter to s_client but is otherwise uninteresting.

The last and most critical piece is -verify, which comes in two versions: -verify and -Verify. Without this parameter s_server does not request a certificate. With -verify it requests a certificate but proceeds if one is not sent (something that I describe as 'want'), and with -Verify it requests a certificate and does not proceed if one is not sent (something that I describe as 'need'). The parameter's value is just the depth of the certificate chain, which is something you would know from working with the CA where you are generating the certificates. If you aren't worrying about verification chain depth for this testing just pick a big number.

After you have a mutual connection or otherwise, you can type into s_client or s_server and then hit return to send a command as if you had connected with netcat or telnet to a non-TLS port. This is something you can use for other situations, such as sending GET / then hitting return twice to send an HTTP GET request to a remote server.

Thursday, December 15, 2016

Conversion to Microsoft Compatible .p12 from a PEM certificate and key

The openssl shell commands work great on PEM certificates, but both Java (via keytool) and Windows (via certmgr.msc) work better with PKCS#12 certificates. This is the command you have to construct to join the entire keypair, which isn't obvious in the EXAMPLES section in openssl's man pkcs12:

openssl pkcs12 -export -out user.p12 -in user.crt -inkey user.key

Wednesday, August 26, 2015

Straightforward Generation of Self-Signed CA for Testing

A common and straightforward question on ##openssl is that someone needs to invent some certificates for testing. For a single host a self-signed certificate is acceptable, which can quickly be made in two steps:

openssl genrsa -out selfsign.key 2048
openssl req -new -x509 -key selfsign.key -out selfsign.crt -sha256
# ... then follow the interactive prompts 
 
When you are doing things like TLS Client Authentication, or generally need a certificate that will verify, the fastest way is to invent your own CA (certificate authority).

This takes five steps (and filling out the interactive prompts after each 'req'):

openssl genrsa -out ca.key 2048
openssl req -new -x509 -key ca.key -days 3650 -out ca.crt -sha256

openssl genrsa -out host.key 2048
openssl req -new -key host.key -sha256 -out host.csr

openssl x509 -req -days 3650 -in host.csr -out host.crt -CAkey ca.key -CA ca.crt -sha256 -set_serial 2 
 
The first two generate the self-signed certificate that will be the CA. A root CA by definition is self-signed; you just choose as a user that this particular certificate is a trusted root CA.

The second two generate a CSR (certificate signing request) that we want the CA to sign.

The last step takes the CSR and signs it with the root CA, which sets the 'issuer' attribute in the certificate to reflect the CA.

To sign more certificates, change the -set_serial to be the next number, and change from host.key, host.csr and host.crt to the new files.

To inspect any certificate use: openssl x509 -in certificate-file.crt -text -noout if it is in PEM format, which is the default for the above commands. Certificate file extensions are for people, not for the cryptographic libraries. You may name these files whatever you want, but it is up to you to understand the application and usage to make sure they are in the proper format. If they are not in the right format, seek out how to convert them to the correct format.

After you understand this a little better you will want to revise how each certificate is generated and probably introduce an intermediate CA for signing. This will allow you to set the root CA to have a much longer lifespan but still be able to manage. A great way to see what options you might want to set is to look in the wild with openssl s_client -connect google.com:443 | openssl x509 -text -noout and research attributes you think are important.

Notes:
  • '-sha256' is added because Google (among other companies and groups) is pushing to sunset SHA-1 signed certificates.
  • These commands work as-is on openssl 0.9.8, 1.0.0 and 1.0.1
  • ##openssl on irc.freenode.net

Monday, July 2, 2012

Bad Coders Doing Bad Things, 1 of 2

This particular post will focus on the 'namespace technique' that I made mention of in a prior post, for which I received a lot of negative feedback both in the comments on that post and on reddit. There is a separate topic that I would like to go into on its own merits, and I will focus on that in the next post in this pair. If any new concerns are raised I will have to respond to them in future posts. Now after re-reading the post and the influx of comments I've received, I feel it would be reasonable for me to sit down and express in more detail why this solves a problem for me, along with some more detailed examples and some new analysis of the costs of what I've been doing/done.

afterthoughts

I highly recommend skipping to the Conclusions section at the end. If the results interest you then feel free to examine my reasoning and how I got there. I write these posts as a learning experience so what I know at the beginning may be different than what I know at the end so if you only care about the answer skip ahead.

1 Environments

There are a lot of code samples and assembly samples that I am going to be providing in this post. If I mention cl I am referring specifically to Microsoft's compiler version "15.00.30729.01 for x64" with setenv /Vista /x64 /Release set. I use the Microsoft SDK and not Visual Studio out of personal preference as it lets me use my ASUS Eee 1005HA as a coding workstation when I am killing a bit of time or wanting to test out a new idea. Visual Studio works fairly well on it but I've started to become accustomed to using org-mode with Emacs to write in a literate-esque coding style by embedding and discussing the merits of snippets (to myself) for future reference and just because it helps me learn better.

The other compiler I am using is gcc version 4.4.5 with the target "x86_64-linux-gnu". Depending on how much information I get back from these two compilers I might explore clang, but I run Debian stable on my laptop so I would have to compile it from scratch to make sure I gave it a fair analysis. I also would like to spend some time exploring other platforms and the costs I am incurring with my namespace technique. Some future work might be analyzing the AVR C compiler and maybe some ARM output, but I don't have a good test environment for either of those outputs.

2 Preconceptions and Reasoning

The fundamental question I feel wasn't explained was why I even conceived of such a wacky technique in the first place. A while ago I was considering how to make C a more interactive coding environment like you would find in a lot of other languages like Haskell, Lisp, Scheme and Python. By 'interactive' I am referring to the idea presented fairly well on Wikipedia, but my most direct experience with the idea was spawned by "Writing Interactive Compilers and Interpreters" by P. J. Brown. I admit I inherited (stole) this book from my father, but it was published at a very interesting time in programming (1979). SmallTalk (1972) and C (1979) had only been out for a few years. GNU (1983) hadn't even started yet and it was still kind of a wild time. All of these were before my time so it's very interesting to look back in retrospect and realize that some things I grew up with just didn't exist. Another interesting fact is that this particular book is very much out of print, and it is just a fascinating realization that there are probably hundreds of books like this that didn't carry on. Computer Science as a field has progressed to the point that we have separated 'classics' like Knuth's works and K&R from pieces of the day like the book I just referenced. Just because a book is out of print doesn't mean there isn't information contained within the text that doesn't exist anywhere else.

Anyway, back to the point at hand: I was trying to conceive of a proper interactive programming environment in C with the same kind of workflow I had with Haskell or Python. I'd write a bit of code, then C-c C-l and hop over to a REPL that had just loaded the code to try it out. I never progressed particularly far on this project because I got distracted by other things going on and never got back to it, but some of the techniques I explored to accomplish the project stayed with me. The first problem I had was how to create a mapping from function name to function, which is a reasonable problem. In C though there really isn't a good way to abstract away from function lists, so in my tests I started building some rather static function lists like:

/* Assume foo, bar and foobar are functions defined elsewhere with
   matching function signatures. The functions aren't really important
   for this demonstration. */

struct fp_map {
  
  void  (*foo)    (void);
  int   (*bar)    (int, int);
  char* (*foobar) (void);
  
};

static struct fp_map FP = { &foo, &bar, &foobar };

The idea here is now that I had a set of function pointers, so I could write a test program that mapped to the function pointers in a fashion like the following code:

int main( int argc, char **argv )
{

  FP.foo();
  printf( "bar() called with result %d\n", FP.bar( 1, 2 ) );
  printf( "%s\n", FP.foobar() );

  return 0;
}

So this allowed me to do some dynamic swapping of functions. Let's list the output of the existing code first:

foo() called.
bar() called with result 3
foobar() called.

Let's then define a new code snippet that replaces the call to foobar():

char* new_foobar ( void )
{
  return "new_foobar() called.";
}

Then we change our main() to swap in new_foobar():

int main( int argc, char **argv )
{
  FP.foobar = &new_foobar;
  FP.foo();
  printf( "bar() called with result %d\n", FP.bar( 1, 2 ) );
  printf( "%s\n", FP.foobar() );

  return 0;
}   

After running the program our output is as expected:

foo() called.
bar() called with result 3
new_foobar() called.

So now I had a technique to hotswap in functions. The next step would be to flesh out the fully interactive programming environment and do some magic with Position-Independent Code (PIC) to essentially make a Just-in-Time Compiler (JIT) and hotload in functions. I never fully explored this concept because I needed to learn a lot more things that would allow me to pull this off in a reasonable way (mostly just finding a JIT, or stealing LLVM's codebase). I also needed to learn elisp a bit better because half of the project was UI. So all in all an interesting idea, but I just needed to learn a lot more before I dived in headfirst.

So what did I take away from this? I found that it was actually pretty handy to have the functionality in one place, especially if I didn't make it const, which allowed me to swap in functions. I explored this concept a bit more in other small programs I made where I was trying to create a cross-platform implementation of some system specific calls. I created a struct containing the functions I wanted to use and then a 'make' function that returned that struct filled out with the system-specific functions. I would do this with one implementation file per platform:

# On Windows
cl main.c somelib_win32.c /Femain.exe

# On Linux
gcc main.c somelib_linux.c -o main

So the header would define the external 'make' function for this library and then define the struct's format. In the main program you'd call the make function to provide the main functionality and then you could have cross-platform code without using a ton of preprocessor tricks to get it to work the same way. When I am just experimenting on small code ideas it's a lot easier for me personally to write a simple makefile that works both with nmake and make that has a setup like this:

all:
        echo "use nmake win32 or make linux"

win32: 
        cl main.c somelib_win32.c /Femain.exe

linux: 
        gcc main.c somelib_linux.c -o main

Yes this makefile is not good because I don't compile separately to object files, but I have used the more abbreviated versions to try and not just be too verbose with the code listings. I am assuming most people reading this have compiled software on the command line and have used sensible makefiles. What this results in is having a makefile that works on Windows and Linux without having to define platform specific options, use CMake, scons or some other tool.

Since I'd abstracted away from pointing directly to function names this 'namespace technique' allowed me to write the main logic once and worry about the actual implementation separately and also allow myself to test the platform specific stuff in a contained environment. I find reading code with a lot of #ifdef preprocessor action to be a bit awkward since you end up having to interpret two different code execution trees that might intersect and also become a lot less modular since trying to rip out the Linux-only code or the Windows-only code for another project is a challenging and error prone proposition. The technique I presented lets me avoid that a little better.

What about duplicating work? Won't I be writing the same functions twice? These are very reasonable questions, but you can solve these in a rather clean and preprocessor free manner.

[insert image#1]

By putting the platform independent code in a separate file entirely you really pare down to the platform dependent code. To this end I feel there is very limited duplication of effort, since the platform dependent code I want to use normally has wildly different interfaces or requirements to set things up and tear things down in a responsible and correct manner. By using this technique I can prevent this from even being an issue. Here is an example project demonstrating the method up until this point.

Now at this point I'd already started to use this pattern regularly in my code, but I realized there was a benefit to this particular technique that went beyond just giving a replaceable interface: you could actually form different entry points to a complex library and you could create an implicit tree of what a library provided. I'm going to create a theoretical code example for a while regarding parts of speech that I am not going to provide a code sample for, just to demonstrate what I mean.

2.1 English Parser

Let's say you wrote a library to parse the English language. This is a hard task in general just because English is very complex, meaning there are probably a lot of functions that belong together but have wildly different results. I'm going to present a subset of those theoretical functions and pretend this project is called 'engl' (for English), which is the sort of name you would normally use in C as a function prefix. These functions won't be commented either, because in most cases that's what you actually get in libraries.

/* Text Processing */
extern int engl_next_word          ( char *src, char *buff, size_t buffsz );
extern int engl_next_sentence      ( char *src, char *buff, size_t buffsz );
extern char** engl_break_words     ( char *src );
extern char** engl_break_sentences ( char *src );

/* Sentence Operators */
extern int engl_sentence_tense_cons ( char *src );
extern struct engl_sentence_diagram* engl_get_sentence_diagram( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram( char *src );
extern char* engl_get_subject   ( char *src );
extern struct engl_partial_diagram* engl_get_predicate_diagram( char *src);
extern char* engl_get_predicate ( char *src );

/* Word Analysis */
extern int engl_word_is_verb      ( char *src );
extern int engl_word_is_noun      ( char *src );
extern int engl_word_is_adjective ( char *src );

/* Word Tense Checker */
extern int engl_word_past_tense    ( char *src );
extern int engl_word_present_tense ( char *src );
extern int engl_word_future_tense  ( char *src );

/* Spellcheck */
extern char** engl_spellcheck_word     ( char *src );
extern struct spellcheck_list* engl_spellcheck_sentence ( char *src );

/* Bad Words */
extern int   engl_is_badword       ( char *src );
extern int   engl_contains_badword ( char *src );
extern char* engl_replace_badwords ( char *src );

I don't feel this is an unreasonable representation of actual header files you would see in the field. Some examples of this are ECL's external.h, Lua's lua.h and Linux's include/linux/console.h. In fact now in hindsight I think it was browsing the Linux kernel or watching an old talk aimed at new Linux kernel developers that clued me into this idea, since they use a technique similar to the one I described above in that console.h file. I vaguely remember the talk describing why not to bother with C++ in the kernel, and the argument was "You can still overload functions and do object oriented programming techniques, and the kernel has it so why bother with C++?" I can't remember the video and that quotation is from memory. Anyway, the demonstration I wanted to make by linking to those real-world source files is that they are a listing of functions loosely categorized with a single-line comment and whitespace. It's a common thing to see.

Now let's categorize the functions in that English code sample into some simple classifications:

/* Linear Retrieval */
extern int engl_next_word         ( char *src, char *buff, size_t buffsz );
extern int engl_next_sentence     ( char *src, char *buff, size_t buffsz );

/* Full String Processing */
extern char** engl_break_words     ( char *src );
extern char** engl_break_sentences ( char *src );

/* Value Retrieval */
extern char* engl_get_subject   ( char *src );
extern char* engl_get_predicate ( char *src );

/* Predicates (Functions that return a true or false value) */
extern int engl_sentence_tense_cons ( char *src );
extern int engl_word_is_verb        ( char *src );
extern int engl_word_is_noun        ( char *src );
extern int engl_word_is_adjective   ( char *src );
extern int engl_is_badword          ( char *src );
extern int engl_contains_badword    ( char *src );
extern int engl_word_past_tense     ( char *src );
extern int engl_word_present_tense  ( char *src );
extern int engl_word_future_tense   ( char *src );

/* Full String Replacement */
extern char* engl_replace_badwords ( char *src );

/* List of Unrelated Values */
extern char** engl_spellcheck_word     ( char *src );

/* Returns Custom Struct with new Data */
extern struct engl_sentence_diagram* engl_get_sentence_diagram( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram( char *src );
extern struct engl_partial_diagram* engl_get_predicate_diagram( char *src);
extern struct spellcheck_list* engl_spellcheck_sentence ( char *src );

Now looking at the second listing compared to the first, it is obvious that the categorization of some of these functions is difficult to interpret properly. In fact someone trying to use this header might miss that engl_sentence_tense_cons is a predicate because of an artificial shortening of the name. In this example I am defining this hypothetical function as a 'function that returns a true or false value based on whether the passed sentence has a consistent tense.' I didn't even mean to make this an example of a terrible name for a function; I did it automatically just as a C programmer because it makes sense to shorten the name after a 'certain number of characters.' Assuredly though the next one who looks at this code a month from now couldn't figure it out without assistance.

Also some of these functions could be used in a standalone context. What if I just wanted a library that would analyze a sentence and remove all the bad words for a blog filter? Are you sure that you could extract engl_replace_badwords from the listing above and apply it properly? Would you have to go read the source or pray that there was some documentation elsewhere? I find myself having to exercise the other options to figure out what return values I should expect and what would be reasonable.

We could even list these same functions in a third way:

/* String Operations */
extern int    engl_next_sentence   ( char *src, char *buff, size_t buffsz );
extern char** engl_break_sentences ( char *src );

/* Sentence Operations */
extern int engl_next_word              ( char *src, char *buff, size_t buffsz );
extern char** engl_break_words         ( char *src );
extern char*  engl_get_predicate       ( char *src );
extern int    engl_sentence_tense_cons ( char *src );
extern char*  engl_get_subject         ( char *src );
extern int    engl_contains_badword    ( char *src );

/* Word Operations */
extern int engl_word_future_tense   ( char *src );
extern int engl_word_past_tense     ( char *src );
extern int engl_word_present_tense  ( char *src );
extern int engl_word_is_adjective   ( char *src );
extern int engl_word_is_verb        ( char *src );
extern int engl_word_is_noun        ( char *src );
extern int engl_is_badword          ( char *src );

/* Language Detail Functions */
extern struct engl_sentence_diagram* engl_get_sentence_diagram ( char *src );
extern struct engl_partial_diagram*  engl_get_subject_diagram  ( char *src );
extern struct engl_partial_diagram*  engl_get_predicate_diagram( char *src);

/* Language Filter Operations */
extern char**                  engl_spellcheck_word     ( char *src );
extern char*                   engl_replace_badwords    ( char *src );
extern struct spellcheck_list* engl_spellcheck_sentence ( char *src );    

This is a completely valid way to list the same functions but depending on what you are trying to do, the order presented could make it really easy to find what you want or nearly impossible without help. So back to what I presented last time, I'll cut and paste what I presented as an alternative:

#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int func2( char *word );

struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, 
                                        &func2 };

#endif /* __NS_H__ */

This structure allows you to call func1() with NS.func1() as was discussed earlier. Now what I am trying to express is that if you use my terrible technique you can create a structured tree of your code which gives one true place to use as reference but also can divide up the code into things with the same return types. While you can kind of create this with a naming scheme you can easily have a situation where the nomenclature just doesn't line up just out of organic creation of the code. I'm going to focus just on the word predicates and propose a namespace using the technique I have described above:

struct Word_namespace
{
  
  struct {
    /* Returns 0 if true, 1 if false */
    int (*verb)          ( char* );
    int (*noun)          ( char* );
    int (*adjective)     ( char* );
    int (*future_tense)  ( char* );
    int (*past_tense)    ( char* );
    int (*present_tense) ( char* );
  } is;
  
};

const struct Word_namespace Word =  { { &engl_word_is_verb,
                                        &engl_word_is_noun,
                                        &engl_word_is_adjective,
                                        &engl_word_future_tense,
                                        &engl_word_past_tense,
                                        &engl_word_present_tense }};

Now I think would be a good time to express my normal workflow which is:

  • Get an idea
  • Write code, Test for Correctness
  • Build a 'namespace'
  • Map the functions

As was mentioned in the comments it is something that could easily be error prone, but it is something I usually do at the end to tie everything up in a nice bow for the next time I need it. Something to note is that in the initial function list I made the tenses follow a logically different tree, but in reality they were the same sort of predicate as the types of words. They inspected the word passed in and returned true if it was in the set of verbs, nouns, adjectives or in the set of words with a future tense, past tense or present tense.

I was able to do this after the fact at a very high level while still keeping the internal source consistent. Maybe I used engl_word_past_tense all over the place in my tests, in my source files and in my internal documentation. Replacing the function's name everywhere it appears just can't be assumed to be safe, because it's possible to miss one file and break everything. Additionally, at one point in time it made sense to me to call it that. Part of coding is expressing the ideas you have the best way you can in a language a compiler understands, so maybe in the context of what you wrote the name makes sense, and changing it after the fact makes the code confusing to read or less revealing of your original intent.

3 New Information

*The information in this 'New Information' section is problematic and should not be followed. It is left in because it was a thing I wanted to try. Move on to the Conclusions for more details.*

Now in both the reddit comment thread, the original post and even a contribution on github it was revealed to me that the function definitions didn't need to linger in the header files and that by defining the struct and informing the compiler that the namespace struct was defined elsewhere was actually a superior option. I never thought of this personally because it never really hurt me or bothered me, but in fact this gives some additional benefits.

I will bring back the original NS.h header file to demonstrate:

#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int func2( char *word );

struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, 
                                        &func2 };

#endif /* __NS_H__ */

Now with this new information it becomes clear that defining NS and both func1 and func2 in the header file is not only redundant but it hurts the modularity gain that I saw when I was using function pointers in the fp_map example. It is much more reasonable and sane to write the following:

In NS.h

#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

extern const struct NS_namespace NS;

#endif /* __NS_H__ */

In NS.c

#include "NS.h"

void func1 ( void )
{
  printf( "I am function 1.\n" );
}

int func2( char *word )
{
  
  /* Check Parameters */
  if ( word == NULL ) {
    
    printf ( "NULL word passed to func2.\n" );
    return -1;
  }
  
  printf( "The word of the day is %s.\n", word );

  return 0;
}

/* Define NS here; no 'static', because NS.h declares NS as extern */
const struct NS_namespace NS = { &func1, 
                                 &func2 };

Now after all of what I've learned this is the superior option of both the const technique and the variant that allowed you to do replacement of source files for cross-platform antics. Anyway, my bad idea just got a little bit better.

4 Experimentation

Some of the honest feedback I received was criticism about optimization problems. To this end I am going to analyze this technique against both gcc and cl. First I need to narrow down the possible compiler optimizations that could affect functions.

I'm not an expert on compiler optimizations at this time so I am going to have to defer to the Wikipedia listings. I went through the 61 listed optimizations and tried to target what optimizations might have trouble and have to do with function manipulation rather than instruction/variable/precomputed value manipulation. The final list I came up with was as follows:

Enabling Transformation
Inline Caching
Inline Expansion
Interprocedural Optimization
Return Value Optimization

The remainder of the optimizations listed on that page have to do with the deduplication of work, loop optimization or some other rearrangement of basic blocks of code. The five articles listed above have to do with effectively inlining either the instructions or the result of a function. If the 'namespace' is const then by definition the compiler is free to do direct value replacement in the resulting function.

I was going to make an example but the previous exposition took me too long to write and someone beat me to the punch (in my own comments no less)! So in the comments of the post that started all this Hraban writes:

...
Not unless by "vastly" you mean "not at all". Compile with -O3, look at the assembly:

https://gist.github.com/3033687#file_test_o3.s

Line 11 and 12 are equal. Thus is the power of const and -O3.

Hraban

The relevant link to the gist is here and it matches my own tests. Even now in realizing how little I know about compiler optimization tricks the answer is pretty clear: once you can guarantee a value to be constant you are free to replace it as is anywhere. This const modifier applied to the 'namespace' gives C the greenlight to drop the address to the appropriate function in as if you had called it directly without the 'namespace.'

The only thing left to check is if the common theme of the five remaining articles applies: if a function could normally be inlined, would it still be inlined using this namespace technique on both gcc and cl? The first thing to do is to identify a code sample that, with the appropriate options, is inlined as a normal function call. While it's definitely possible for the function to be inlined, it's also possible for the inline optimization pass to happen AFTER the constant replacement/injection manipulation.

The act of inlining functions became official as part of the C99 standard and so while some compilers might do the expected thing C89 effectively 'doesn't inline.' So if we were compiling based on the C89 spec we already 'won.' Still we are interested in what the compiler would do.

I'm starting with gcc just because I happen to be working on Linux at the time of writing this post; I will do cl after. Here is the very simple and short code sample, built with the command line gcc -S inlinetest.c:

#include <stdio.h>

int inline_value()
{
  return 5;
}


int main( int argc, char **argv )
{
  printf( "%d", inline_value() );
  return 0;
}

Since an optimization setting wasn't picked I did not get gcc to inline by default. The resulting assembly from main() is as follows:

.globl main
        .type   main, @function
main:
.LFB1:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        movl    $0, %eax
        call    inline_value
        movl    %eax, %edx
        movl    $.LC0, %eax
        movl    %edx, %esi
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        ret
        .cfi_endproc

Now even if you aren't familiar with assembler at all you should be able to recognize 'call inline_value' as an instruction that occurs. Next let's ask gcc to optimize our code with gcc -S -O2 inlinetest.c:

.globl main
        .type   main, @function
main:
.LFB12:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $5, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
        .cfi_endproc

From this example it is clear the inlining occurred as one would expect: there is no call to inline_value(), and the constant 5 is loaded directly. Now, let's revise the code to use namespacing. I am going to keep it in the same file for no particularly good reason other than to make it short to post:

#include <stdio.h>

int inline_value()
{
  return 5;
}

struct A_namespace
{
  int (*value) ( void );
};

const struct A_namespace A = { &inline_value };


int main( int argc, char **argv )
{
  printf( "%d", inline_value() );
  printf( "%d", A.value() );
  return 0;
}

First let's analyze the unoptimized output for main():

.globl main
        .type   main, @function
main:
.LFB1:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        movl    $0, %eax
        call    inline_value
        movl    %eax, %edx
        movl    $.LC0, %eax
        movl    %edx, %esi
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        movq    A(%rip), %rax
        call    *%rax
        movl    %eax, %edx
        movl    $.LC0, %eax
        movl    %edx, %esi
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        ret
        .cfi_endproc

Now for simplicity's sake I am going to pare it down to the actual 'namespace' call:

movq    A(%rip), %rax
call    *%rax

This sort of code is actually one of the main drawbacks that I expected right away. There is an extra instruction to load the function pointer into a register before the function is called. This is just one extra line of assembly, but there is a lot that could go on behind it. For instance, if there was a page fault, not only would you effectively have to 'fetch' the namespace struct and load a value, you would also have a 'fetch' to make sure the function is loaded before calling it. This double page fault could really add up, especially if a lot of functions used this technique. Now let's try an optimized version from gcc -S -O2 inlinetest.c:

.globl main
        .type   main, @function
main:
.LFB12:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $5, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        call    *A(%rip)
        movl    $.LC0, %edi
        movl    %eax, %esi
        xorl    %eax, %eax
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
        .cfi_endproc

Now the two lines were reduced to a single line: call *A(%rip). This is actually still a 'bad result', since the truly right result is to eschew the function call entirely and just load the value 5 directly, as the optimized direct call does with movl $5, %esi.

I did try gcc with -O3 but I got the same result as -O2. This is a rather heartbreaking result for me personally because it proves me wrong.

I do have a few long shot ideas though:

  • I did not set the code to be static in the original source. Nope. Still the same.

  • Now to provide an inline hint to inline_value()? Nope!
  • With -std=c99? Nope!

To be fair I did expect this as a result, since it seemed reasonable to me that this sort of technique would not be inline-friendly.
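
For reference, here is roughly what the first two long shots look like when combined in a single file (the third is just the -std=c99 flag on the command line). This is a reconstruction rather than the exact source that was tested, call_both_ways is a hypothetical helper added for comparison, and the sketch makes no claim about what any particular compiler version emits; inspect the output of gcc -S -O2 yourself.

```c
#include <assert.h>

/* Long shots 1 and 2: internal linkage plus an inline hint */
static inline int inline_value( void )
{
  return 5;
}

struct A_namespace
{
  int (*value) ( void );
};

/* static const: this translation unit can see the initializer */
static const struct A_namespace A = { &inline_value };

/* Hypothetical helper so the two call styles sit side by side
   in the generated assembly */
static int call_both_ways( void )
{
  int direct    = inline_value();  /* plain call */
  int via_names = A.value();       /* call through the 'namespace' */

  assert( direct == via_names );
  return via_names;
}
```

Both calls return the same value; the only open question is whether the second one survives as an indirect call in the assembly.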

There is still one final phase to this technique. While inlining does not happen within the same file, what if I use it in the way I described above, where the functions are effectively external?

/in 'inlinetest2.h'/

#ifndef __INLINETEST2_H__
#define __INLINETEST2_H__

struct A_namespace
{
  int (*value) ( void );
};

extern const struct A_namespace A;

#endif /* __INLINETEST2_H__ */

/in 'inlinetest2.c'/

#include "inlinetest2.h"

inline int inline_value()
{
  return 5;
}

const struct A_namespace A = { &inline_value };

/in 'test2.c'/

#include <stdio.h>
#include "inlinetest2.h"

int main( int argc, char **argv )
{
  printf( "%d", A.value() );
  return 0;
}

The first thing we want to do is compile by parts. First let's compile inlinetest2.c with gcc -c -O2 inlinetest2.c. Next we need to compile test2.c and examine its assembly with gcc -S -O2 test2.c. (Neither command links: -c stops after producing the object file and -S stops after producing the assembly. Without one of them gcc attempts to link.) Unfortunately we have the function still being called:

.globl main
        .type   main, @function
main:
.LFB11:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    *A(%rip)
        movl    $.LC0, %edi
        movl    %eax, %esi
        xorl    %eax, %eax
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
        .cfi_endproc   

Now let me try -O3... still the same. In this example inlining will not occur under any circumstance with gcc. I was going to try this with cl as well, but if it doesn't work in one place it's not reliable anywhere else.

I think the last thing we need to do is replicate the original test case and its result. In my testing using the namespace example project I was not getting the 'pointer to function' call statement; I was getting a literal call statement, just as Hraban was. Let me use that project again. I suspect that defining both the functions and the namespace in NS.h allowed the compiler to optimize in a direct function call. Here is the NS.h file for reference:

#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

void func1 ( void );
int func2( char *word );

struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

static const struct NS_namespace NS = { &func1, 
                                        &func2 };

#endif /* __NS_H__ */

After running this through gcc -S -O2 main.c I got the same result Hraban got:

.globl main
        .type   main, @function
main:
.LFB11:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    func1
        xorl    %edi, %edi
        call    func2
        testl   %eax, %eax
        js      .L6

Notice now both calls to func1 and func2 are not pointers? This is a very odd response. Let me investigate trying to make them pointer-to-functions like the other tests.

The first thing I think would be to annotate both func1() and func2() as extern. No change.

Now I am going to use the technique where I define the struct and make the actual namespace struct extern. This changes our two NS.h and NS.c files to be:

#ifndef __NS_H__
#define __NS_H__

#include <stdio.h>

struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

extern const struct NS_namespace NS;

#endif /* __NS_H__ */

In NS.c

#include "NS.h"

void func1 ( void )
{
  printf( "I am function 1.\n" );
}

int func2( char *word )
{
  
  /* Check Parameters */
  if ( word == NULL ) {
    
    printf ( "NULL word passed to func2.\n" );
    return -1;
  }
  
  printf( "The word of the day is %s.\n", word );

  return 0;
}

const struct NS_namespace NS = { &func1, 
                                 &func2 };   

Note I had to drop the static specifier from the NS definition: once the header declares NS as extern, the definition in NS.c can no longer be static. I know this is an awful thing to say but at this point I am getting a little frustrated and just want to finish this blog post, so I just made 'em const. Now let's see the assembly:

.globl main
        .type   main, @function
main:
.LFB11:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        call    *NS(%rip)
        xorl    %edi, %edi
        movq    NS+8(%rip), %rbx
        call    *%rbx
        testl   %eax, %eax
        js      .L6        

Aha! Well at least I know where it comes from. It seems as soon as you lose the 'static' context gcc at least is unable to do a 1:1 replacement, which makes sense. By my interpretation static represents that there is one canonical value and const implies that a value doesn't change. More precisely, static gives NS internal linkage, so every file that includes NS.h gets its own copy with the initializer in plain view; with only an extern declaration, main.c never sees the initializer, so gcc has nothing to substitute there. As soon as static is removed from the equation this replacement can't occur. I'm going to take a second to look around quick to make sure I am interpreting the bits right. Now straight from the horse's mouth in the C99 specification, page 140:

4 The storage-class specifier, if any, in the declaration specifiers shall be either extern or static.

So what this means is that you can't put both extern and static on the same declaration; it's got to be one or the other. Since extern causes the regression and static does not, I will have to say the naive approach I started with is ironically the most correct (even though inlining doesn't work).

5 Conclusions

This turned out to be a much more monumental process than I thought it would be. I intended on reaching out to multiple compilers and trying to explore some of the options, but at this point I need to just put a stake in this and call it done. In the end this is one of those posts where you start off in a direction and you walk really far, to the point of your own uncertainty in existence itself, to find that you were right all along. This feeling at the peak of enlightenment, though, is often a humbling and almost depressing experience. In seeking the truth in this case I felt like a true antagonist to myself, to the point that I was disappointed I was right.

My naive response to earnest criticism, based on just a gut feeling, was actually the right result. Suggestions like this one, where there were implications on how to do it better, were in fact the wrong path even though it seemed better to both parties at the time. I kind of also wanted this to be an example of 'why not to be crazy and stay within the painted lines' that I could quote to others to try and stave off their mistakes, but that is a blog post for another time.

In the end though the way to get this to work correctly and right at the cost of not having some functions inline is the same as it was before as seen in the first revision of my 'namespace example.'

In the end though I hope I gave this idea a fair shake and a passably crude analysis. If people think otherwise please let me know where I went wrong, either in the comments here, in the eventual reddit comment threads, or just harass me on twitter. I plan on writing another post expressing the right way to do this technique, separate from all this fumbling around.

6 Future Work

I do have one more serious grievance, posed by a reddit comment, to take a look at and thoroughly analyze, and that has to do with the Python array slicer I threw together in a couple of hours. I want to give that as much thought as I did this, to really make sure I am approaching the language correctly and not spreading false truths around in my wake. I do teach other people, so I feel it's my responsibility to make sure I know it 110% right.

I did this sort of analysis for gcc only. I intended to use cl but the time got away from me.

Next there was a particular piece of information in the middle of this post about using my namespace technique to offload/sort and to classify the complexity of code. Now that I have proven at least with gcc that this technique is effectively 'free' compared to calling a function itself I want to quantify this idea of complexity coming from misnaming stuff. I planned to insert this somewhere in this post but oops.

Also I mentioned Hraban in this post a few times near the end and he actually submitted a really great idea on how to extend my idea and make it work a little differently/intelligently. You can see that change in this github pull request. I'm not going to merge it right away until I take a look at it and really try to examine the side effects but it definitely seems like a way to use preprocessors for good and not for evil. I had a similar idea I was contemplating but his seems like a hundred times more sensible.