Friday, June 29, 2012

"How Do I Get C Namespaces?", "How do I get Python Slices in C?" and other questions

I admit I haven't been on proggit for a while, because lately it is often frustrating to try to enjoy the articles and comments there. One good example was a link to an O'Reilly article and video that spawned a lot of uninformed and mostly awful comments, which either glorified C as some sort of arcane magic beyond the common man or pushed problematic viewpoints that aren't helpful to anybody. The main comment thread I am referencing is here. I've grouped most of the posts into subsections since this post got rather long.

C is either Rocket Science or Magic
First off there is this post by carpalDebris:
... now i like C a lot. but when i program C i am constantly considering how many instructions i am generating, calculating my run time in terms of nano seconds, calculating my memory usage to the byte, considering what the throughput of my bus architecture is. this is NOT what the average programmer wants to do. heck it's not even what i want to do, it's what i have to do to accomplish my mission...
This is an insane perspective on using C. Unless your final executable size is an actual issue, or the throughput of the hardware under your code is the true goal of your software, none of this is important. You can code in C just like in any other language without worrying about such things, and in any case you can't realistically account for this sort of stuff while you are coding. The Knuth quote "premature optimization is the root of all evil" is fitting here.

Next up is a rather long comment from aMindWellArmed:
A word of warning regarding C: learn assembly first. The reason is that C isn't actually to be thought of as a high-level language, but as a more expressive assembly language. The problem is that it does this in a manner which can only be described as arcane. With assembly, all you have to keep in your head are the assumptions about the details of the machine architecture you are using and the details you have about your own code. And since assembly is far from easy, you will be very diligent writing your code. With C however, you not only have to keep in your head what you have to for writing in assembly, but also intricate knowledge of different versions of the C standard AND the idiosyncrasies of pretty much all popular compilers, and sometimes even specific version of them. At the same time C looks and feels like a high-level language, so you will let down your guard down and write code with less diligence, introducing bugs which often only appear once you want to switch architecture.

Ignore people who say that C is easy. Claiming that C is easy is like claiming that brain surgery is easy: of course, anyone can perform brain surgery, the only difference will be with the patients' survival rate. C is harder than assembly.
...

I tried a few times to trim this comment down to the parts I had problems with, but I found it hard to do without distorting what he was trying to say. The C-to-assembly analogy is one of the worst things to bring up to new programmers. When you are compiling a C program you don't really care about the assembly at all unless you are doing math with large numbers, and at that point you should be using a proper math library like GMP anyway. Yes, if you try to do craziness like packing structs in a certain way or really testing the limits of pointers, you have to worry about the specifics of compilers and architectures, but in general code it just isn't a problem. Since a student won't jump straight to these sorts of tricks, you can definitely put this off until later.

With a bit of discipline it's fairly easy to write C code without significant problems, as long as you keep in the back of your head that things can explode when some numbers get really big.

My feelings are best summed up by Whisper:
It makes me sad that this is the modern attitude. Those of us who learned on C don't consider it so... instead, we consider high-level languages to be easy and convenient, when you can get away with using them.

I have always thought that one should learn to program using the lowest level language possible, then move up the stack when convenient.

On Namespaces

There are quite a few people, myself included, who miss namespaces in C. They are a great way to rein in complexity and make things a lot more readable and sensible. Xanny is right with "Big projects get ridiculously out of hand with crazy syntax like foo_bar_function() to duplicate having namespaces." There were a few other posts regarding this sort of thing as well.

I love having a good module system, and I'm definitely spoiled when I use C#, Java, Haskell or one of the other languages I know with namespace functionality. I just wish people didn't forget that namespaces are possible, and even easy, in C. As you can see in this GitHub repository, namespaces are not only doable but rather straightforward.

I won't go into NS.c since it is just a simple definition of functions that someone familiar with C should read and understand easily. Instead I am going to focus on NS.h.

After the include guard there are declarations of two functions, func1 and func2, which are very straightforward and link up to the definitions in NS.c. Next up is a simple struct that holds two function pointers:
struct NS_namespace
{
  void (*func1) ( void );
  int  (*func2) ( char* );
};

This, as one would expect, matches the declarations of func1 and func2. Now we need a static instance of it to reference:
static const struct NS_namespace NS = { &func1,
                                        &func2 };

This essentially creates the namespace "NS" in any file that includes NS.h. Let's see how we use it in main.c:
/* Let's try all the possible outcomes. */

NS.func1(); /* Since func1() returns void you can assume it to be right */

/* Call func2 and trigger error condition. */
if ( NS.func2( NULL ) < 0 ) {

  printf( "Error detected in calling func2!\n" );
}

/* Call func2 with a word. */
if ( NS.func2( "salt" ) < 0 ) {

  printf( "Error detected: Best Practice is to check return values.\n" );
}

If this doesn't remind you of namespaces I don't know what will. Plus, with the function pointers you can still accommodate a traditional C prefix but alias it to something simple:
/* Given a function */
void long_prefix_function_foo( void );

/* And a namespace struct */
struct Bar_namespace
{
   void (*foo) ( void );
};

/* Map the namespace */
static const struct Bar_namespace Bar = { &long_prefix_function_foo };

/* Use it! */
Bar.foo();

Easy to do, easy to write, and not much more painful than explicitly labeling exports in a module system. I haven't done much research into the costs of doing this, but for readability it's great.
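
To pull the pieces above into one place, here is a minimal single-file sketch of the same technique. The names str_util_length, str_util_is_empty, Str_namespace and Str are hypothetical and made up for illustration; they are not taken from the NS.h and NS.c in the repository.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefixed functions, invented for this sketch. */
static int str_util_length(const char *s)
{
    if (s == NULL) {
        return -1;              /* report bad input with a negative value */
    }
    return (int)strlen(s);
}

static int str_util_is_empty(const char *s)
{
    if (s == NULL) {
        return -1;
    }
    return s[0] == '\0';        /* 1 when empty, 0 otherwise */
}

/* The "namespace": a struct holding one function pointer per export. */
struct Str_namespace
{
    int (*length)(const char *);
    int (*is_empty)(const char *);
};

/* One const instance maps the short names onto the prefixed functions. */
static const struct Str_namespace Str = {
    .length   = str_util_length,
    .is_empty = str_util_is_empty
};
```

With this in place, Str.length("salt") evaluates to 4 and Str.length(NULL) to -1. Note that a static const instance in a header gives every translation unit its own copy of the struct; whether the compiler then turns the calls into direct calls depends on the compiler and optimization level.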

Let's Just Bolt a Spoiler to It

There is a set of honest-to-goodness good ideas for handling C programs better: ralin suggests embedding Lua, and greenspans suggests embedding Guile. These are both reasonable options; I have just found that getting these embedded languages sorted out across systems in a portable and reasonable manner is a lot of trouble. You really need to know how Lua and Guile work under the hood to port them quickly and easily, as you usually have to find some contorted way to wrap and lift your C data into Lua's or Guile's formats. It's not impossible, but it is pretty time consuming, and it makes it hard to 'get things done.'

Furthermore, in my opinion most of these embedded languages don't really work 'side-by-side' with C; instead you end up writing a program in another language with a lot of C extensions. To be honest I feel this is alright, but my use cases don't often fit that. I have tried using Lua, Guile, and ECL at length (and probably some other options that I have forgotten about) and I have never nailed the feel of having a common space where I can call functions, some from the embedded language and some from C. I think I need to go back to trying Haskell and C side by side a bit, but I'm not entirely sure that would be as nice as one would hope.

Actual Issues

After you get past the initial molehills and start climbing the mountain of what you can do in C there are some actual issues. A decent roundup was done by zvrba:
  • implicit conversions (also mixing signed/unsigned arithmetic and comparisons),
  • no built-in data structures nor a standard library providing them,
  • undefined behavior lurking everywhere (unsafe integer arithmetic, hooray!),
  • the built-in string datatype is a joke,
  • you must use vendor-specific extensions when you REALLY want to use all capabilities of the hardware (e.g., specialized bit instructions)
  • no module or namespace system (uses primitive text-substituting preprocessor)
  • no macro system (ditto, primitive text substitution)
  • no nested functions [no penalty if you don't use them]
  • no pattern matching / tuples / generics
...
  • safety (checked access to dynamically allocated blocks)
  • decent array / array slice type, also dynamically allocatable when needed
  • checked integer arithmetic
  • a better error handling system than manual checking of return codes (not necessarily exception-based)

From that list there are some that I completely agree with and some that I take issue with. The first few I agree with are "no module or namespace system", "implicit conversions", and "no macro system". These are huge problems with C in the modern world. While a module or namespace system can be haphazardly programmed into the language, as in the method described previously, undefined math behaviour and the lack of a macro system really can cause a lot of problems. The next thing is the lack of pattern matching, or even predicate matching, in a language-supported way. It's something you can halfheartedly program in, but there is a lot of heavy lifting to be done. Good pattern matching in switch-like constructs is so ubiquitous now that it is one of the first things you miss when working in C.
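
To make the implicit conversion and undefined math complaints concrete, here is a small sketch. The function names less_than_naive, less_than_safe and checked_add_int are hypothetical and invented for this example. The first function spells out the conversion that C applies silently when an int is compared with an unsigned int; the last one checks for overflow before adding, since signed overflow is undefined behaviour.

```c
#include <limits.h>
#include <stddef.h>

/* What `a < b` does when a is int and b is unsigned int: a is converted
 * to unsigned first, so a negative a becomes a very large value. The cast
 * below only makes that implicit conversion visible. */
int less_than_naive(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* Handle the negative case explicitly before comparing as unsigned. */
int less_than_safe(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

/* Store a + b in *out and return 0, or return -1 if out is NULL or the
 * sum would overflow. The range test happens before the addition, so the
 * addition itself never overflows. */
int checked_add_int(int a, int b, int *out)
{
    if (out == NULL) {
        return -1;
    }
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return -1;
    }
    *out = a + b;
    return 0;
}
```

For example, less_than_naive(-1, 1u) is 0 while less_than_safe(-1, 1u) is 1, and checked_add_int(INT_MAX, 1, &r) returns -1 instead of overflowing.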

The lack of checked access to dynamically allocated blocks is a huge issue that doesn't need much debate, but some of the things you need C to do include situations where you can't afford the checks. It would be good if there were a standardized alternative to malloc that did this sort of thing, but there has to be at least ONE good language that doesn't. It is almost a chicken-and-egg sort of problem, and it just so happens that C is the egg that checked dynamic memory comes from.
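
There is no standard checked allocator, but a checked block is not hard to sketch by hand. The following is a minimal illustration under my own assumptions: the checked_buf type and its functions are hypothetical names, it stores only int values, and it is not any standard or library API.

```c
#include <stddef.h>
#include <stdlib.h>

/* A heap block of ints that remembers its own length. */
struct checked_buf
{
    size_t len;
    int   *data;
};

/* Allocate len zeroed ints. Returns 0 on success, -1 on bad input or OOM. */
int checked_buf_init(struct checked_buf *b, size_t len)
{
    if (b == NULL || len == 0) {
        return -1;
    }
    b->data = calloc(len, sizeof *b->data);
    if (b->data == NULL) {
        b->len = 0;
        return -1;
    }
    b->len = len;
    return 0;
}

/* Read element i into *out. Returns -1 instead of reading out of bounds. */
int checked_buf_get(const struct checked_buf *b, size_t i, int *out)
{
    if (b == NULL || out == NULL || b->data == NULL || i >= b->len) {
        return -1;
    }
    *out = b->data[i];
    return 0;
}

/* Write v to element i. Returns -1 instead of writing out of bounds. */
int checked_buf_set(struct checked_buf *b, size_t i, int v)
{
    if (b == NULL || b->data == NULL || i >= b->len) {
        return -1;
    }
    b->data[i] = v;
    return 0;
}

/* Release the block and reset the fields so later access fails cleanly. */
void checked_buf_free(struct checked_buf *b)
{
    if (b != NULL) {
        free(b->data);
        b->data = NULL;
        b->len = 0;
    }
}
```

Every access pays for a comparison and a branch, which is exactly the cost that some C code cannot afford.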

Array slicing is absolutely a great tool in a lot of other languages, but it isn't impossible to do in C, and you could do it without macros. See my array slice library.
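
As a rough idea of what macro-free slicing can look like, here is a minimal sketch. It is not the library mentioned above; the int_slice type and the slice_of and slice_get functions are hypothetical names, and the clamping of the stop index is my own choice, loosely modeled on how Python slices behave.

```c
#include <stddef.h>

/* A non-owning view into part of an existing int array. */
struct int_slice
{
    const int *ptr;
    size_t     len;
};

/* View of base[start..stop). stop is clamped to base_len, and an empty
 * slice is returned when the range is empty or base is NULL. */
struct int_slice slice_of(const int *base, size_t base_len,
                          size_t start, size_t stop)
{
    struct int_slice s = { NULL, 0 };

    if (base == NULL) {
        return s;
    }
    if (stop > base_len) {
        stop = base_len;
    }
    if (start >= stop) {
        return s;
    }
    s.ptr = base + start;
    s.len = stop - start;
    return s;
}

/* Read element i of the slice into *out. Returns -1 when out of range. */
int slice_get(struct int_slice s, size_t i, int *out)
{
    if (out == NULL || i >= s.len) {
        return -1;
    }
    *out = s.ptr[i];
    return 0;
}
```

Given int a[] = { 10, 20, 30, 40, 50 }, the call slice_of(a, 5, 1, 4) yields a view of length 3 whose first element is 20. The slice does not own the memory, so it must not outlive the array it points into.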

Where I start to disagree is with "the built-in string datatype is a joke," as I feel it's not really that much of a problem. Yes, it's tricky and hard to use, but if you play by the rules everything works out fine. It's not impossibly hard to do strings right, and most people write a small string utility library that works for them early enough in their C experience that it ceases to be a problem.

All in all, while these are valid complaints, most of the missing pieces can be programmed into the language rather easily.

Why C is for me

Most languages can interface with C with limited problems. Why? Because they are written in C/C++ to begin with, or at least have some kernel that's in C. If you write a good function or set of functions in C you can carry them with you pretty much anywhere you go. I point to this very relevant post from the guy who wrote 0MQ. If you are trying to build a core feature or component and you want people to be able to use it, you should heavily consider writing it in C, and that's something I do more often than not.

The key thing that C provides is that it has some really straightforward best practices to try to make code survivable. They are:

  • Check every parameter
  • Check every return value
  • Make sure to free() at the same level you malloc()

That's it. That's all you have to do, and you can easily rein in most of the possible outcomes of a particular function. It's easy to remember and to actually do. Did you get an OOM problem from malloc? Not a problem, because you checked the return value and did something about it if it went south. Did you pass in a NULL pointer by accident? Doesn't matter, because you checked it. What about memory leaks? I tread carefully, know when malloc() is called, and try to free() at the same level I allocated. Does a function have no return value? Then it should work 100% of the time and tie up all of its loose ends of memory, forming a complete thought. If you want some other good ideas, check out the CERT C Secure Coding Standard, as it gives you a fairly explicit list of things not to do in your C program (and probably some good habits to get into).
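
Here is a minimal sketch of the three rules applied to one small function. The is_palindrome and reverse_into functions are hypothetical examples written for this illustration, and returning -1 on error is just one convention among several.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write src reversed into dst, which must have room
 * for len + 1 bytes. Returns 0 on success, -1 on bad input. */
static int reverse_into(char *dst, const char *src, size_t len)
{
    if (dst == NULL || src == NULL) {           /* check every parameter */
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[len - 1 - i];
    }
    dst[len] = '\0';
    return 0;
}

/* Returns 1 if word reads the same backwards, 0 if not, -1 on error. */
int is_palindrome(const char *word)
{
    size_t len;
    char  *tmp;
    int    result = -1;

    if (word == NULL) {                         /* check every parameter */
        return -1;
    }

    len = strlen(word);
    tmp = malloc(len + 1);
    if (tmp == NULL) {                          /* check every return value */
        return -1;
    }

    if (reverse_into(tmp, word, len) == 0) {    /* ...including your own */
        result = (strcmp(tmp, word) == 0);
    }

    free(tmp);                  /* free() at the same level as the malloc() */
    return result;
}
```

The allocation and the matching free() live in the same function, so there is a single place to audit for leaks regardless of which branch ran in between.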

Anyway, the moral of the story is that C isn't magic: it gets the job done, and using it reasonably well is not some sort of insurmountable challenge.

12 comments:

  1. Your idea of using a struct and function pointers to simulate C++ namespaces at the syntactic level is Worse Than Useless. Yes, it lets you write Bar.foo() instead of Bar_foo(), but so what? By not being scared of underscores, I get the exact same amount of code readability and vastly greater performance (because the code doesn't have to indirect through a pointer every time it makes a function call).

    Plus, you're not even fixing the original problem with Bar_foo(), which is that long_names_are_cumbersome(). "Bar.foo()" might *look* more C++ish, but you can't do the equivalent of "using namespace Bar" to bring foo() into scope; you must refer to it as "Bar.foo()" on every invocation. Again, there's no savings versus simply referring to it as "Bar_foo()" on every invocation.

    Replies
    1. Good idea, why have namespaces if we can just type "using namespace Bar" to have everything in the global scope?

      Now I understand why my school CPP classes forbade us for ever using "using namespace". And I get Torvalds and his ire for CPP even more...

    2. > vastly greater performance

      Not unless by "vastly" you mean "not at all". Compile with -O3, look at the assembly:

      https://gist.github.com/3033687#file_test_o3.s

      Line 11 and 12 are equal. Thus is the power of const and -O3.

      Hraban

    3. This was my assumption, and I am still trying to make sure I properly test all the possible optimizations I could be missing out on, as well as other failures of my dumb technique. Since it was a const value, I assumed that a sensible compiler would just do a replacement like your code sample suggests.

      I hope to finish my analysis and post another update soon, but I am trying to do a really fair and proper test.

      Thanks for your comment,

      xemdetia

    4. I made test of compiling the example for AVR with avr-gcc and found that generated code did not use function pointers at all, it called functions directly. I'm sure this is because static const is used: I know that avr-gcc (or maybe just gcc) is very good in optimizing (read: inlining) static functions.

  2. The critical feature of namespaces is not granting access to some functions, but prohibiting access to others. If a namespace/module is not imported, I cannot call it, which greatly aids my reasoning about what could be going wrong in my code.

    Your dumb idea does not have this feature. It looks like namespaces, while providing none of the benefits of namespaces save for looking like them.

    Replies
    1. I never understood this whole concept of "prohibiting access to functions":

      You are writing the code, you should KNOW when a function is or is not called, you should be able to limit the instances in which a function is called, all because the code is yours.

      Maybe it rings true in a paradigm like CPP (because object B is not supposed to call a function from object A etc...), but the same reasoning happens: if object B calls a prohibited functions from object A in your code, you will never compile. So why in the first place did you call this function?

      So really, enlighten me (no joke, I really want to understand this), why would you need to prohibit access to a function when YOU are the one deciding which function is called and when? It just bugs me, and strikes me as bad programming.

    2. Apparently you've never heard of building libraries for other's use, or even working with a team in general. God forbid you actually build something useful enough for someone else to use.

  3. I liked your idea. It's an option; "using namespace Food;" is no good anyway.

  4. Thank you.

    I've been using C for two years now and despite all my love for javascript, it's still my "de facto" language. Despite everything people find "flawed" in it, I still love it.

    - No support for strings? I don't need strings when I can manipulate data to the byte.
    - I need to malloc stuff myself? Good, I'll then be sure I use the right data when juggling my pointers around.
    - No namespaces? I coded a whole shell and never found myself needing namespaces.

    C is not "rocket science", people have been so used to scripting languages that they don't understand what it all boils down to, the same way people that (ab)use jQuery don't know what "document.selectElementById()" or even the concept of DOM nodes really mean... (And I've found the harsh way that there are a lot of people like this...)

    C does what you tell it to do, C will be strict in asking you to not mix types, C will make sure mostly everything is okay before being given a chance to run, C will talk to your kernel directly,...

    To C is to Love =)

    Replies
    1. Thank you for your kind post. I've gotten a lot of negative feedback that I hopefully can address sooner rather than later. Still, I think I made the point I wanted to make, and I am trying to address the misconceptions I always see come up from people who learned a dynamic language first.

    2. I kinda have the same background. My first self-taught languages were php and...actionscript 3. While I quickly understood them, I never got the mastery I got when I was taught the C in computer classes: prototypes, return values,...

      Where php and as3 let me do multiple mistakes (and were mostly complaining about scope or references to non-existing variables), the C language was HARD with me, always complaining about types (like casting the void * returned by calls to dlsym to a pointer to function), undefined references, uninitialised variables, unused return types, unchecked return values,...

      I was mallocing my own memory, I was juggling with my own pointers, I was playing with my own bytes,... And when your school forbids you to use printf() for one year to make sure you know how to use write(), you're so close to the kernel you could smell it.

      And to me, that's what C is about: pure control. Unfortunately, even CPP doesn't have this level of control and everything just looks like "C for people who want it easy". No, I don't want a "new", I want to control my memory and be able to check if something went wrong so I can go another branch, not use exceptions to rewind my stack. Why do I need to catch an exception for memory allocation failure when I can just check if (ptr == NULL)?

      I don't blame dynamic languages, nor do I hate them.
      I love php, I love javascript (though I still don't get some of its memory/scope use and miss pointers sometimes...),... I just know that I don't have the same level of control, I know I have it easy.

      And because my school also taught me about rigor and code cleanliness (and especially because I'm a geek/nerd), I want to have control everywhere, I want to KNOW what my application is doing underneath, I want to know I'm out of bounds, I want to be sure I get a specific type before sending it to another function,...

      What's not to love about C? :)
