Anyone up for OSX work?

Ah, I think I understand what’s going on. This is a static-init ordering problem.

This is one of those really nasty C++ problems, and one that I’m quite familiar with (we’ve been fighting it in various forms for years). It’s also one of the reasons I’m not at all a fan of doing a lot of work automatically in static init. But one of our early Panda developers thought this was a swell idea and started us down this path, and it’s too late to go back now.

Static init is a concept that was introduced with the development of C++ and its constructors. Originally, when all compiled programs were written in C or some similar non-object-oriented language, very little code ran before main() was called: just some startup code hardcoded into the system runtime libraries. C allows you to define global or “static” variables outside of any function scope, and even give them initial values, like this:

int x = 10;
int main() {
  /* ... x is already 10 here ... */
  return 0;
}

which means that at the time main() is called, x already exists and has the value 10. This was implemented by preloading a memory image that already had the right bits in the right place when it was loaded from disk; no code was necessary to run before main in order to assign 10 to x.

But now introduce C++ and its constructors. Now you can declare an object outside of main() that has a constructor. According to C++ semantics, that constructor has to be called to initialize the object, so you now have user code running before main():

#include <iostream>
using namespace std;

class Thing {
public:
  Thing() { cerr << "initializing\n"; }
};
Thing x;  // constructed before main() runs
int main() {
  cerr << "running main\n";
  return 0;
}

This caused a sea change in system library support, because suddenly the system runtime loader had to support calling user code automatically when a program starts, or even when a .so is loaded at runtime.

But anyway. Part of Panda’s low-level design takes advantage of these static initializers to call all sorts of setup functions when the libraries are loaded. init_libpgraphnodes() is one of those functions, and one of the things it calls is ShaderGenerator::set_default(new ShaderGenerator()). This gets called at static init time, by virtue of a global class object with a constructor, so it is supposed to run automatically when libpgraphnodes.so is loaded into the running program. So we’re supposed to be guaranteed that the ShaderGenerator already has a default value set by the time we start running.

But wait! We also have a static constructor in libpgraph.so. It looks like this:

PT(ShaderGeneratorBase) ShaderGeneratorBase::_default_generator;

Don’t see the static constructor? It’s hard to see, isn’t it? Welcome to the joys of C++, where code can be hidden from the programmer. In fact, the class PT(ShaderGeneratorBase) has a default constructor, and that constructor’s job is to initialize its pointer to NULL.
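To make that hidden constructor visible, here is a minimal sketch. SimplePointerTo and ShaderGeneratorStub are hypothetical names for illustration only; this is not Panda’s actual PointerTo class, which also manages reference counts. The compile-time checks show that the wrapper has a non-trivial default constructor while a raw pointer has none:

```cpp
#include <type_traits>

// Hypothetical, stripped-down stand-in for PT(T); NOT Panda's real PointerTo.
template <class T>
class SimplePointerTo {
public:
  // The "invisible" code: a user-provided default constructor that
  // writes nullptr every time it runs.
  SimplePointerTo() : _ptr(nullptr) {}
  T *get() const { return _ptr; }

private:
  T *_ptr;
};

struct ShaderGeneratorStub {};  // hypothetical placeholder type

// The wrapper needs a constructor call to be initialized...
static_assert(!std::is_trivially_default_constructible<
                  SimplePointerTo<ShaderGeneratorStub>>::value,
              "wrapper has a user-provided default constructor");

// ...while a plain pointer has no constructor at all.
static_assert(std::is_trivially_default_constructible<
                  ShaderGeneratorStub *>::value,
              "raw pointer is trivially default-constructible");
```

Because this constructor is not constexpr, a namespace-scope SimplePointerTo<X> object is in general dynamically initialized: its constructor runs as part of static init, at a moment the program does not directly control.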

So, as long as libpgraph.so’s static constructors are called before the ones in libpgraphnodes.so, then everything is good: the default constructor for _default_generator will be called, ensuring that pointer is NULL. Then the static constructors in libpgraphnodes.so will be called, which will call set_default(), reassigning the pointer to a valid value. But, if the static constructors happen to get called in the opposite order, we have a terrible situation: the set_default() will be called first, assigning the pointer to a valid value, and then the default constructor will be called later, reassigning the pointer to NULL! That’s certainly what’s happening here.
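The two possible orderings can be modeled without any shared libraries by treating each library’s static init as an ordinary function and choosing the order by hand. This is a hypothetical simulation: DefaultSlot, Generator, and the run_..._static_init functions are illustrative stand-ins, not Panda code.

```cpp
#include <cassert>

struct Generator {};  // hypothetical stand-in for a ShaderGenerator

// Stand-in for the _default_generator global; starts out null,
// as zero-filled global memory would before any constructor runs.
struct DefaultSlot {
  Generator *ptr = nullptr;
};

// Models libpgraph.so's static init: the smart pointer's default
// constructor unconditionally writes NULL.
void run_libpgraph_static_init(DefaultSlot &slot) { slot.ptr = nullptr; }

// Models libpgraphnodes.so's static init, which ends up calling
// set_default(new ShaderGenerator()).
void run_libpgraphnodes_static_init(DefaultSlot &slot, Generator *gen) {
  slot.ptr = gen;
}

// Runs both "libraries'" initializers in the chosen order and returns
// the pointer value that main() would eventually observe.
Generator *simulate_startup(bool pgraph_first, Generator *gen) {
  DefaultSlot slot;
  if (pgraph_first) {
    run_libpgraph_static_init(slot);
    run_libpgraphnodes_static_init(slot, gen);
  } else {
    run_libpgraphnodes_static_init(slot, gen);
    run_libpgraph_static_init(slot);  // clobbers the value just set
  }
  return slot.ptr;
}

// Checks both outcomes of the model.
void check_both_orders() {
  Generator g;
  assert(simulate_startup(true, &g) == &g);        // lucky order: value survives
  assert(simulate_startup(false, &g) == nullptr);  // unlucky order: value lost
}
```

Nothing in either function is wrong on its own; the bug lives entirely in which one happens to run last.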

Unfortunately, the system does not guarantee any ordering of static init constructors between different .so’s. It’s absolutely unpredictable. So on one system, it might call these in the correct order, and on another system, it might call them in the incorrect order. The ordering might even change from one day to the next.

So, basically, I introduced this bug when I split up libpgraph.so and libpgraphnodes.so, because in doing so I introduced a nondeterministic behavior between these static initializers. But because C++ tries so hard to make things automatic, the bug is extremely hard to see until it bites you, and you spend days isolating it down to discover that a pointer is getting reset to NULL after you had thought it was properly set.

I’ll fix the bug now. It’s easy to fix, by replacing the PT(ShaderGeneratorBase) with an ordinary ShaderGeneratorBase * pointer. The reason this will fix the problem is that an ordinary pointer doesn’t have a constructor, so its default value will be set to NULL by preloading the memory image, and so there’s no longer an ordering issue between static initializers. (I’ll also have to explicitly manage the reference counts in set_default() to compensate for this change, but that’s not so bad.)
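A minimal sketch of that kind of fix, under stated assumptions: RefCounted and GeneratorBase are hypothetical stand-ins for Panda’s reference-counted base class and ShaderGeneratorBase, with simplified ref()/unref() methods, so the real classes will differ. EarlyInit (also hypothetical) deliberately calls set_default() from a static initializer placed before the pointer’s definition in the same file, where the initialization order is well defined. With a constructor-bearing smart pointer defined after it, the value would be reset; with a plain pointer there is nothing left to reset it.

```cpp
#include <cassert>

// Hypothetical intrusive reference-counted base class.
class RefCounted {
public:
  virtual ~RefCounted() = default;
  void ref() { ++_count; }
  // Returns true if the object is still referenced after decrementing.
  bool unref() { return --_count > 0; }
  int get_ref_count() const { return _count; }

private:
  int _count = 0;
};

class GeneratorBase : public RefCounted {
public:
  static void set_default(GeneratorBase *gen);
  static GeneratorBase *get_default() { return _default_generator; }

private:
  // Plain pointer: only zero-initialized, never touched by a constructor.
  static GeneratorBase *_default_generator;
};

// Manual reference management that the smart pointer used to do for us.
void GeneratorBase::set_default(GeneratorBase *gen) {
  if (gen != nullptr) {
    gen->ref();  // take the new reference first, in case gen is the old value
  }
  GeneratorBase *old = _default_generator;
  _default_generator = gen;
  if (old != nullptr && !old->unref()) {
    delete old;  // we held the last reference
  }
}

// Runs during dynamic initialization, before the definition below.
struct EarlyInit {
  EarlyInit() { GeneratorBase::set_default(new GeneratorBase); }
};
static EarlyInit early_init;

// No initializer: zero-initialized before any dynamic initializer runs,
// so the value stored by early_init is not overwritten.
GeneratorBase *GeneratorBase::_default_generator;
```

Taking the new reference before releasing the old one keeps set_default() safe when it is called with the generator that is already the default.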

My apologies for the long trip down a dark corridor I sent you guys on.

David