Friday, June 29, 2012

"How Do I Get C Namespaces?", "How do I get Python Slices in C?" and other questions

I admit I haven't been on proggit for a while because trying to enjoy the articles and comments there has often been very frustrating lately. One good example was a link to an O'Reilly article & video that spawned a lot of really uninformed and mostly awful comments, which either glorified C as some sort of arcane magic beyond the common man or pushed problematic viewpoints that aren't helpful to anybody. The main comment thread I am referencing is here. I've broken down most of the posts into subsections since the post got rather long.

C is either Rocket Science or Magic
First off there is this post by carpalDebris:
... now i like C a lot. but when i program C i am constantly considering how many instructions i am generating, calculating my run time in terms of nano seconds, calculating my memory usage to the byte, considering what the throughput of my bus architecture is. this is NOT what the average programmer wants to do. heck it's not even what i want to do, it's what i have to do to accomplish my mission...
This is an insane perspective on using C. Unless the final executable size or the throughput of the hardware under your code is the actual goal of your software, it just isn't important. You can code in C just like in any other language without worrying about such things, and in any case you can't really account for this sort of stuff while you are coding anyway. The Knuth quote "premature optimization is the root of all evil" is fitting here.

Next up is a rather long comment from aMindWellArmed:
A word of warning regarding C: learn assembly first. The reason is that C isn't actually to be thought of as a high-level language, but as a more expressive assembly language. The problem is that it does this in a manner which can only be described as arcane. With assembly, all you have to keep in your head are the assumptions about the details of the machine architecture you are using and the details you have about your own code. And since assembly is far from easy, you will be very diligent writing your code. With C however, you not only have to keep in your head what you have to for writing in assembly, but also intricate knowledge of different versions of the C standard AND the idiosyncrasies of pretty much all popular compilers, and sometimes even specific version of them. At the same time C looks and feels like a high-level language, so you will let down your guard down and write code with less diligence, introducing bugs which often only appear once you want to switch architecture.

Ignore people who say that C is easy. Claiming that C is easy is like claiming that brain surgery is easy: of course, anyone can perform brain surgery, the only difference will be with the patients' survival rate. C is harder than assembly.
...

I tried a few times to cut out some of the problems I had with this comment but I found it hard not to distort what he was trying to say. The C to assembly analogy is one of the worst things to bring up to new programmers. When you are compiling a C program you don't really care about the assembly at all unless you are doing math with large numbers, and at that point you should be using a proper math library like GMP anyway. Yes, if you try to do craziness with packing structs in a certain way or really testing the limits of pointers, you have to worry about the specifics of compilers and architectures, but in general code it just isn't a problem. Since a student won't jump to doing these sorts of tricks, you can definitely put this off until later.

With a bit of discipline it's fairly easy to write C code without significant problems as long as you just have in the back of your head that when some numbers get really big things could explode.
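As a small illustration of the "numbers get really big" case: signed integer overflow is undefined behavior in C, so the check has to happen before the addition rather than after it. The helper below is a minimal sketch, and `checked_add` is a hypothetical name made up for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Adds a and b. On success stores the sum in *out and returns 0.
   Returns -1 if out is NULL or the sum would not fit in an int. */
int checked_add( int a, int b, int *out )
{
    if ( out == NULL ) {
        return -1;
    }

    /* Test against the limits BEFORE adding, so the overflowing
       addition itself is never executed. */
    if ( ( b > 0 && a > INT_MAX - b ) ||
         ( b < 0 && a < INT_MIN - b ) ) {
        return -1;
    }

    *out = a + b;
    return 0;
}
```

A caller checks the return value and only trusts `*out` when it is 0.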

My feelings are best summed up by Whisper:
It makes me sad that this is the modern attitude. Those of us who learned on C don't consider it so... instead, we consider high-level languages to be easy and convenient, when you can get away with using them.

I have always thought that one should learn to program using the lowest level language possible, then move up the stack when convenient.
On Namespaces

There are quite a few people including myself who do miss namespaces in C. They are a great way to rein in complexity and make things a lot more readable and sensible. Xanny is right with "Big projects get ridiculously out of hand with crazy syntax like foo_bar_function() to duplicate having namespaces." There were a few other posts regarding this sort of thing as well.
I love having a good module system, and I'm definitely spoiled when I use C#, Java, Haskell or one of the other languages I know with namespace functionality. I just wish people didn't forget that namespaces are easy and possible with C. As you can see in this GitHub repository, namespaces are not only doable but rather straightforward.

I won't go into NS.c since it is just a simple definition of functions that someone familiar with C should read and understand easily. Instead I am going to focus on NS.h.

After the include guard there are two function declarations, func1 and func2, which are very straightforward and link up to the definitions in NS.c. Next up is a simple struct that has two function pointers inside:
struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

This, as one would expect, matches up to the declarations of func1 and func2. Now we need a static instance of this to reference:
static const struct NS_namespace NS = { &func1,
                                        &func2 };

This essentially creates the namespace of "NS" to any file that includes NS.h. Let's see how we use it in main.c:
/* Let's try all the possible outcomes. */

NS.func1(); /* Since func1() returns void you can assume it to be right */

/* Call func2 and trigger error condition. */
if ( NS.func2( NULL ) < 0 ) {

  printf( "Error detected in calling func2!\n" );
}

/* Call func2 with a word. */
if ( NS.func2( "salt" ) < 0 ) {

  printf( "Error detected: Best Practice is to check return values.\n " );
}
If this doesn't remind you of namespaces I don't know what will. Plus, with the function pointers you can still accommodate a traditional C prefix, but alias it to something simple:
/* Given a function */
void long_prefix_function_foo( void );

/* And a namespace struct */
struct Bar_namespace
{
   void (*foo) ( void );
};

/* Map the namespace */
static const struct Bar_namespace Bar = { &long_prefix_function_foo };

/* Use it! */
Bar.foo();

Easy to do, easy to write and not much more painful than explicitly labeling exports in a module system. I haven't done amazing amounts of research into the costs of doing this, but for readability it's great.
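To tie the pieces together, here is a minimal single-file sketch of the same pattern. The names `str_util_length`, `str_util_is_empty` and `Str` are hypothetical and are not taken from the linked repository; in a real project the prototypes, the struct and the static instance would live in a header and the function definitions in a .c file.

```c
#include <stddef.h>
#include <string.h>

/* Traditional prefixed functions (hypothetical names for illustration). */
int str_util_length( const char *s )
{
    /* Return -1 on a NULL argument so callers can detect the error. */
    if ( s == NULL ) {
        return -1;
    }
    return (int) strlen( s );
}

int str_util_is_empty( const char *s )
{
    return ( s == NULL || s[0] == '\0' );
}

/* The "namespace": a struct of function pointers. */
struct Str_namespace
{
    int (*length)   ( const char * );
    int (*is_empty) ( const char * );
};

/* One constant instance maps the short names to the prefixed functions. */
static const struct Str_namespace Str = { &str_util_length,
                                          &str_util_is_empty };
```

With that in place a caller writes `Str.length( "salt" )` instead of `str_util_length( "salt" )`, and still checks the return value as before.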

Let's Just Bolt a Spoiler to It

This is a set of honest-to-goodness good ideas for handling C programs better: ralin suggests embedding Lua, and greenspans suggests embedding Guile. These are both reasonable options; I have just found that getting these embedded languages working across systems in a portable and reasonable manner is a lot of trouble. You really need to know how Lua and Guile work under the hood to port them quickly and easily, as you usually have to find some contorted way to wrap and lift your C data into Lua's and Guile's formats. It's not impossible, but it is pretty time consuming, and it makes it hard to 'get things done.'

Furthermore in my opinion most of these embedded languages don't really work 'side-by-side' with C but really turn into a case of you writing a program in another language with a lot of C extensions. I feel this is alright to be honest but my use cases in general don't often fit that. I have tried using Lua, Guile, and ECL at length (and probably some other options that I have forgotten about) and I never really feel like I nail the right feel of just having a common space where I can call functions with some functions being from the embedded language and some functions being in C. I think I need to go back to trying to have Haskell and C side by side a bit but I'm not entirely sure that would be as nice as one would hope.

Actual Issues

After you get past the initial molehills and start climbing the mountain of what you can do in C there are some actual issues. A decent roundup was done by zvrba:
  • implicit conversions (also mixing signed/unsigned arithmetic and comparisons),
  • no built-in data structures nor a standard library providing them,
  • undefined behavior lurking everywhere (unsafe integer arithmetic, hooray!),
  • the built-in string datatype is a joke,
  • you must use vendor-specific extensions when you REALLY want to use all capabilities of the hardware (e.g., specialized bit instructions)
  • no module or namespace system (uses primitive text-substituting preprocessor)
  • no macro system (ditto, primitive text substitution)
  • no nested functions [no penalty if you don't use them]
  • no pattern matching / tuples / generics
...
  • safety (checked access to dynamically allocated blocks)
  • decent array / array slice type, also dynamically allocatable when needed
  • checked integer arithmetic
  • a better error handling system than manual checking of return codes (not necessarily exception-based)

From that list there are some that I completely agree with and some that I take issue with. The first few I agree with are the "no module or namespace system", "implicit conversions", and the "no macro system". These are huge problems with C in the modern world. While things like module and namespace systems can be haphazardly programmed into the language, as in the method described previously, undefined math behaviour and the lack of a macro system really can cause a lot of problems. The next thing is the lack of pattern matching, or even predicate matching, in a language-supported way. It's stuff you can halfheartedly program in, but there is a lot of heavy lifting to be done. Good pattern matching is so ubiquitous now that it is one of the first things you miss when working in C.
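To show why the implicit conversions item earns its place on the list, the sketch below compares a negative int against an unsigned value. The function names are made up for this example, and many compilers will warn about the first comparison if you ask them to.

```c
/* Naive comparison: a is implicitly converted to unsigned int before
   the comparison, so a negative a turns into a very large value. */
int naive_less( int a, unsigned int b )
{
    return a < b;
}

/* Safer comparison: deal with the negative case before converting. */
int safe_less( int a, unsigned int b )
{
    if ( a < 0 ) {
        return 1;   /* a negative int is below every unsigned value */
    }
    return (unsigned int) a < b;
}
```

Calling `naive_less( -1, 1u )` reports that -1 is not less than 1, while `safe_less( -1, 1u )` gives the answer most people expect.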

The checked access to dynamically allocated blocks is a huge issue that doesn't need much debate, but at the same time some of the things you need C to do might include situations where checking access is something you can't afford. It would be good if there was a standardized alternative to malloc that did this sort of thing, but at the same time there has to be at least ONE good language that doesn't. It is almost a chicken and the egg sort of problem and it just so happens C is the egg where checked dynamic memory comes from.

Array slicing is absolutely a great tool in a lot of other languages, but it isn't impossible to do in C, and you could do it without macros. See my array slice library.
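
For a rough idea of how this can look without macros, here is a minimal sketch of a bounds-checked slice. This is not the API of the library mentioned above: `int_slice`, `slice_make`, `slice_sub` and `slice_get` are hypothetical names, and the half-open [start, end) range is borrowed from Python's convention.

```c
#include <stddef.h>

/* A slice is a non-owning view: a pointer plus a length. */
struct int_slice
{
    const int *data;
    size_t     len;
};

/* View an entire array of n elements (an empty view if data is NULL). */
struct int_slice slice_make( const int *data, size_t n )
{
    struct int_slice s;
    s.data = data;
    s.len  = ( data != NULL ) ? n : 0;
    return s;
}

/* Sub-slice [start, end), clamped to the parent like Python's a[start:end].
   Negative indices are not supported in this sketch. */
struct int_slice slice_sub( struct int_slice s, size_t start, size_t end )
{
    struct int_slice out = { NULL, 0 };

    if ( end > s.len ) {
        end = s.len;
    }
    if ( start >= end ) {
        return out;   /* empty slice */
    }
    out.data = s.data + start;
    out.len  = end - start;
    return out;
}

/* Checked element access: returns 0 on success, -1 if out of range. */
int slice_get( struct int_slice s, size_t i, int *value )
{
    if ( value == NULL || i >= s.len ) {
        return -1;
    }
    *value = s.data[i];
    return 0;
}
```

Since the slice only borrows the memory, whoever owns the underlying array is still responsible for keeping it alive and freeing it.
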
The things I start to disagree with are the "built-in string datatype is a joke," as I feel it's not really as much of a problem. Yes, it's tricky and hard to use, but if you play by the rules everything works out fine. It's not impossibly hard to do strings right, and most people write a small utility library for strings that works for them early enough in their C experience that it ceases to be a problem.

All in all, while these are valid complaints, most of the missing pieces are things you can program into the language rather easily.

Why C is for me

Most languages can interface with C with limited problems. Why? Because they are written in C/C++ to begin with, or at least have some kernel that's in C. If you write a good function or set of functions in C you can carry them with you pretty much anywhere you go. I point to this very relevant post from the guy who wrote 0MQ. If you are trying to build a core feature or component and you want people to be able to use it, you should heavily consider writing it in C, and that's something I do more often than not.

The key thing that C provides is that it has some really straightforward best practices to try to make code survivable. They are:

  • Check every parameter
  • Check every return value
  • Make sure to free() at the same level you malloc()

That's it. That's all you have to do and you can easily rein in most of the possible outcomes of a particular function. It's easy to remember and actually do. Did you get an OOM problem from malloc? Not a problem because you checked the return value and did something about it if it went south. Did you pass in a NULL pointer by accident? Doesn't matter because you checked it. What about memory leaks? I tread carefully and know when malloc() is called and I try to free() it at the same level I allocated it. No return value? It should work 100% every time and handle all of the loose ends of memory, forming a complete thought. If you want some other good ideas check out the CERT C Secure Coding Standard as it gives you a fairly explicit list of things not to do in your C program (and probably some good habits to get into).
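
The three rules fit comfortably into one short function. The sketch below uses `is_palindrome`, a hypothetical helper invented for this example, which needs a temporary buffer and therefore exercises all three habits.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if s reads the same in both directions, 0 if it does not,
   and -1 on a NULL argument or an allocation failure. */
int is_palindrome( const char *s )
{
    size_t len, i;
    char  *reversed;
    int    result;

    /* Rule 1: check every parameter. */
    if ( s == NULL ) {
        return -1;
    }

    len = strlen( s );

    /* Rule 2: check every return value. */
    reversed = malloc( len + 1 );
    if ( reversed == NULL ) {
        return -1;
    }

    /* Build a reversed copy of s in the temporary buffer. */
    for ( i = 0; i < len; i++ ) {
        reversed[i] = s[len - 1 - i];
    }
    reversed[len] = '\0';

    result = ( strcmp( s, reversed ) == 0 );

    /* Rule 3: free() at the same level as the malloc(). */
    free( reversed );
    return result;
}
```

The caller never sees the temporary buffer, so there is nothing for it to leak; it only has to check for the -1.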

Anyway my moral of the story is that C isn't magic, it gets the job done and is not some sort of insurmountable challenge to use reasonably well.