Thursday, October 30, 2008

Freaking memory management !

As I've been working more and more with multi-threading, the pain of non-garbage-collected memory management has reached its peak.

In an ideal world, humans would be robots, able to foresee the complex intricacies of code flow.

I remember when I started programming in C, it was on the Amiga. There, the official memory management functions were AllocMem() and FreeMem() ..with FreeMem() requiring the pointer returned by the allocation function plus the size of the block originally requested from it ..pretty dangerous indeed !
I spent a long time with that model in mind, keeping the allocated memory size around. Computers were a lot slower, and memory was very scarce; it was more important to write efficient code all the way through than to code efficiently.

This is not unlike the situation right now with C/C++ programmers still doing a lot of memory management themselves.
I remember a short discussion I had with a fellow developer about a year ago. We were talking about high-level languages and he said something like "I don't understand what's the big deal with Java. You can do exactly the same things with smart pointers in C++ !"

..he was very wrong, indeed !!

There are a number of reasons why smart pointers in C++ are utterly inadequate. Smart pointers are nothing but templates wrapping pointers with some extra code. Everyone has a different smart pointer, and they are all wrong in one way or another.

Another big issue is that most of those smart pointer implementations are reference counted.. which is a very simple way of managing resources: simple, limited and dangerous to depend on.
For a more detailed argument against smart pointers, I suggest the following article: Complexity++ (Notes on Works in Progress)

Another big problem is that with multi-threading, even the smartest smart pointers are still bound to fail, unless one somehow refrains from using actual pointers altogether (and also forgets about proper inheritance) and fills the code with those overloaded "beauties" (which, BTW, are painful to debug). A proper implementation, of course, would also have to deal with locking/unlocking or more direct atomic operations to avoid potential corruption of the smart pointer's internals.

In the practical world, mixing smart pointers and non-smart pointers will inevitably lead to some non-deterministic bug; that's pretty much unavoidable when dealing with threads and user interaction. My partial solution to this was to manually implement some sort of garbage collection.
I have a concept of container, where objects from the 3D engine are stored.
Objects also have some sort of reference-counted smart pointer that will not delete them when the counter reaches 0, but rather consider them unused.

Then, at some fixed point in the main loop, the containers are checked and the objects whose reference count is still 0 are actually deleted, or finalized.. as it's called in garbage collection terms.

The point is that in an MT environment, but even simply in a cyclic reference case (where a cyclic reference means two objects that point to each other; very common: doubly linked lists, trees..), it's too easy to get an object accidentally deleted, or deleted while some other routine/thread is still using it.
By postponing actual deletion to a specific "secure" time in the program flow, a lot of worries go away.

The book that helped to get me started (I'm really just at the ABC and not planning to go much further 8) is Garbage Collection: Algorithms for Automatic Dynamic Memory Management. *
I found this book in a bookstore here in Tokyo by accident. It was expensive but it was well worth it. I bought it some time ago, but only really got busy with it recently, as I needed to make sense of my memory management.
It's important to look into other material on the web, of course... but having a book on the subject is necessary, I believe.

Another day, another reason to dislike C++ 8)


* Disclaimer: links to my Amazon Associates account !


  1. hmmm, i recently implemented a multithreaded system for generating command buffers for the GPU.

    The idea is that the engine queues up a list of very high level commands, like 'set lights' or 'set material' and then my system asynchronously uses those commands to build a GPU command buffer.

    You can choose how many hardware threads you want to dedicate to this system (this is on Xbox and PS3), so you are limited by the number of cores on the Xbox and the number of SPUs on the PS3.

    The PS3 implementation was of course the more complex of the two, but in some cases the fact that SPUs can only work with data in their local store actually helped prevent accidental references to the same object by multiple threads.

    Also, because you need to DMA data to and from the SPUs, it forces you to think more about data organization. For example, if you have some array of light objects, you really want to isolate just the data that is needed to use those lights during rendering. Preferably you also want the data to be contiguous so you can minimize the DMA overhead.

  2. It's funny how, as a web developer, I totally overlook many of the hardships of programming traditional desktop apps. The applications I create live inside of a web browser, and I never think twice about memory, or hardware for that matter. Good luck with your app development.


  3. * Paul
    Working that way does indeed make one organize things better.. more cache friendly.
    Splitting into subsystems also helps to avoid troubles, but it's a pain to have to build interfaces for what practically becomes a remote function call (at least in my recent experience on PC, communicating with a discrete "GPU" directly).

    * PuReWebDev
    Web dev itself isn't necessarily all fun and games 8)

  4. I am not exactly sure what you mean by "interfaces for remote function call"...

  5. I mean, when a host processor has some data that needs to be processed, normally one would make a function that takes values on the stack, including pointers.

    When dealing with a separate processor (or a separate process), one has to "simulate" the parameter passing, especially when it comes to pointers.
    Data that is pointed to, needs to be physically transferred by means of DMA, TCP/IP or whatever protocol one is using 8)

    ..a formalization of something that we are already doing.
    APIs like CORBA and SOAP do these things too, but at a higher level ..though those APIs are a lot more formal, complex and painful to use (I recently read an article on why CORBA failed.. "corporate IT" BS !!).

  6. Ah, I see. Yes, pointer 'dereferencing' incurs performance costs. I try to 'flatten' out the source structures as much as possible, and in cases where it's not practical I double-buffer: processing the current command while 'dereferencing' the data for the next command, to hide the latency.