Buffer protocol support

I’d like to talk for a moment about the new buffer protocol support in the latest development version of Panda3D. It’s not a particularly exciting feature, but it can be an important one, especially if you use Panda3D together with other libraries like NumPy or if you need to do a lot of low-level operations on texture or geometry data from Python. However, most use cases will not require this functionality.

The Python buffer protocol is a way for Python applications to get a direct hook into C/C++ memory, arrays in particular. Panda3D classes that support it provide a pointer into the underlying memory to the Python interpreter along with a description of how the data is laid out in memory. This description is necessary for Python to know how to access and copy the information.

Starting with Python 2.7, you can use the built-in memoryview type to access the memory underlying an object exposing the buffer interface. You can then manipulate the data by converting it to a list or array.array object, creating sub-views and operating on those, writing or reading parts of it to a file, or even just modifying the memory directly as if it were a regular Python list.
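To make those operations concrete, here is a minimal sketch written for Python 3. It uses an array.array as a stand-in for a Panda3D object, since array.array also exposes the buffer interface; the variable names are only illustrative.

```python
from array import array

# Stand-in for a Panda3D object that exposes the buffer interface.
data = array('f', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

view = memoryview(data)   # no copy: the view shares the array's memory
view[0] = 10.0            # modify the memory as if it were a list

sub = view[2:4]           # a sub-view, still without copying anything
sub[0] = 20.0             # writes through to data[2]

as_list = view.tolist()   # an explicit copy into a regular Python list

print(data.tolist())
print(view.format, view.itemsize, len(view))
```

The writes made through view and sub show up in data, because all three refer to the same block of memory.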

Right now, the only Panda3D classes that expose the buffer interface are GeomVertexArrayData and PointerToArray (the latter of which is used for most array storage purposes in Panda3D, including textures), but more classes can easily be added on request. Conversely, Panda allows taking an existing buffer object (such as from array.array or a NumPy array) as source data for a texture or a vertex data array.

When copying data to other libraries such as NumPy, this can help cut down on unnecessary copy operations. Presently, you would call a method like get_data() to create a C++ string (one copy), which would then be wrapped into a Python string (a second copy). Finally, you would pass this string to NumPy, which would perform at least one more copy to bring it into its own representation. But since NumPy also supports the buffer protocol, you can now copy the contents of a texture or of a vertex data array straight into a NumPy array without any of these intermediate copies.

One other interesting use case for this feature is the fast and efficient manipulation of vertex data and texture data from Python. Instead of having to create a GeomVertexRewriter or a PNMImage to modify the respective data, you can now create a memoryview to iterate over the data directly, easily copy subsets around, or page them out to disk. In my own use case, which involved a lot of geometry generation and manipulation, this flexibility allowed me to dramatically decrease the time spent generating geometry and flattening it. Direct access to the memory also allowed me to quickly page chunks of geometry data out to disk and back into memory when necessary.
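As a rough illustration of the paging idea, the sketch below writes a buffer out to disk and reads it back into the same memory. It is written for Python 3 and uses an array.array as a hypothetical stand-in for vertex data, so it shows the memoryview technique rather than my actual Panda3D code.

```python
import os
import tempfile
from array import array

# Hypothetical stand-in for a block of vertex data.
vertices = array('f', range(12))

# View the same memory as raw bytes; nothing is copied here.
raw = memoryview(vertices).cast('B')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk.bin")

    # Page out: the file object reads straight from the buffer.
    with open(path, "wb") as f:
        f.write(raw)

    # Simulate the memory being reused for something else.
    for i in range(len(vertices)):
        vertices[i] = 0.0

    # Page in: readinto() fills the existing memory in place.
    with open(path, "rb") as f:
        bytes_read = f.readinto(raw)

restored = vertices.tolist()
print(bytes_read, restored)
```

Because readinto() writes into the existing buffer, no temporary Python string is created on the way back in.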

The buffer protocol provides a lot of the flexibility of low-level memory access without exposing you to all of the intricacies of C/C++ memory management. In particular, the data is reference counted, so you don’t need to worry about deleting the data. You should in fact be able to keep memoryviews around for non-immediate consumption, but keep in mind that you may still need to tell Panda3D when you’ve modified the data later (for instance with an additional explicit call to modify_ram_image()).

This feature will be available in the 1.9.0 release of the Panda3D SDK.

Triple your frame rate?

Historically, Panda has always run single-core. And even though the Panda3D codebase has been written to provide true multithreaded, multi-processor support when it is compiled in, by default we’ve provided a version of Panda built with the so-called “simple threads” model which enforces a single-core processing mode, even on a multi-core machine. But all that is changing.

Beginning with the upcoming Panda3D version 1.8, we’ll start distributing Panda with true threads enabled in the build, which enables you to take advantage of true parallelization on any modern, multi-core machine. Of course, if you want to use threading directly, you will have to deal with the coding complexity issues, like deadlocks and race conditions, that always come along with this sort of thing. And the Python interpreter is still fundamentally single-core, so any truly parallel code must be written in C++.

But, more excitingly, we’re also enabling an optional new feature within the Panda3D engine itself, to make the rendering (which is all C++ code) run entirely on a sub-thread, allowing your Python code to run fully parallel with the rendering process, possibly doubling your frame rate. But it goes even further than that. You can potentially divide the entire frame onto three different cores, achieving unprecedented parallelization and a theoretical 3x performance improvement (although, realistically, 1.5x to 2x is more likely). And all of this happens with no special coding effort on your part, the application developer–you only have to turn it on.

How does it work?

To use this feature successfully, you will need to understand something about how it works. First, consider Panda’s normal, single-threaded render pipeline. The time spent processing each frame can be subdivided into three separate phases, called “App”, “Cull”, and “Draw”:

app, cull, draw

In Panda’s nomenclature, “App” is any time spent in the application itself, i.e. your program. This is your main loop, including any Python code (or C++ code) you write to control your particular game’s logic. It also includes any Panda-based calculations that must be performed synchronously with this application code; for instance, the collision traversal is usually considered to be part of App.

“Cull” and “Draw” are the two phases of Panda’s main rendering engine. Once your application code finishes executing for the frame, then Cull takes over. The name “Cull” implies view-frustum culling, and this is part of it; but it is also much more. This phase includes all processing of the scene graph needed to identify the objects that are going to be rendered this frame and their current state, and all processing needed to place them into an ordered list for drawing. Cull typically also includes the time to compute character animations. The output of Cull is a sorted list of objects and their associated states to be sent to the graphics card.

“Draw” is the final phase of the rendering process, which is nothing more than walking through the list of objects output by Cull, and sending them one at a time to the graphics card. Draw is designed to be as lightweight as possible on the CPU; the idea is to keep the graphics command pipe filled with as many rendering commands as it will hold. Draw is the only phase of the process during which graphics commands are actually being issued.

You can see the actual time spent within these three phases if you inspect your program’s execution via the PStats tool. Every application is different, of course, but in many moderately complex applications, the time spent in each of these three phases is similar to the others, so that the three phases roughly divide the total frame time into thirds.

Now that we have the frame time divided into three more-or-less equal pieces, the threaded pipeline code can take effect, by splitting each phase into a different thread, so that it can run (potentially) on a different CPU, like this:

app, cull, draw on separate threads

Note that App remains on the first, or main thread; we have only moved Cull and Draw onto separate threads. This is important, because it means that all of your application code can continue to be single-threaded (and therefore much easier and faster to develop). Of course, there’s also nothing preventing you from using additional threads in App if you wish (and if you have enough additional CPUs to make it worthwhile).

If separating the phases onto different threads were all that we did, we wouldn’t have accomplished anything useful, because each phase must still wait for the previous phase to complete before it can proceed. It’s impossible to run Cull to figure out what things are going to be rendered before the App phase has finished arranging the scene graph properly. Similarly, it’s impossible to run Draw until the Cull phase has finished processing the scene graph and constructing the list of objects.

However, once App has finished processing frame 1, there’s no reason for that thread to sit around waiting for the rest of the frame to be finished drawing. It can go right ahead and start working on frame 2, at the same time that the Cull thread starts processing frame 1. And then by the time Cull has finished processing frame 1, it can start working on culling frame 2 (which App has also just finished with). Putting it all in graphical form, the frame time now looks like this:

The fully staged render pipeline
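The same staging can be written out as a small table. This is only a toy model in Python, not Panda code: the schedule() function is made up for illustration, and it assumes every phase takes exactly one time slot.

```python
STAGES = ("App", "Cull", "Draw")

def schedule(num_slots):
    """Return, for each time slot, the frame number each stage works on."""
    rows = []
    for slot in range(num_slots):
        row = {}
        for depth, stage in enumerate(STAGES):
            frame = slot - depth + 1          # frames are numbered from 1
            row[stage] = frame if frame >= 1 else None   # None means idle
        rows.append(row)
    return rows

for slot, row in enumerate(schedule(4), start=1):
    print(slot, row)
```

From the third slot onward all three stages are busy, each on a different frame, with Draw always two frames behind App.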

So, we see that we can now crank out frames up to three times faster than in the original, single-threaded case. Each frame now takes the same amount of time, total, as the longest of the original three phases. (Thus, the theoretical maximum speedup of 3x can only be achieved in practice if all three phases are exactly equal in length.)

It’s worth pointing out that the only thing we have improved here is frame *throughput*–the total number of frames per second that the system can render. This approach does nothing to improve frame *latency*, or the total time that elapses between the time some change happens in the game, and the time it appears onscreen. This might be one reason to avoid this approach, if latency is more important than throughput. However, we’re still talking about a total latency that’s usually less than 100ms or so, which is faster than human response time anyway; and most applications (including games) can tolerate a small amount of latency like this in exchange for a smooth, fast frame rate.
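The throughput and latency arithmetic can be sketched in a few lines of Python. The frame_stats() function and the millisecond figures are hypothetical, and the model makes two simplifying assumptions: the stages hand off only at frame boundaries, and the threading overhead is ignored.

```python
def frame_stats(app_ms, cull_ms, draw_ms):
    """Compare a serial frame with a three-stage pipelined frame."""
    serial_ms = app_ms + cull_ms + draw_ms
    # In the pipeline, every stage waits for the slowest one.
    slot_ms = max(app_ms, cull_ms, draw_ms)
    return {
        "serial_fps": 1000.0 / serial_ms,
        "pipelined_fps": 1000.0 / slot_ms,
        "speedup": serial_ms / slot_ms,
        "serial_latency_ms": serial_ms,
        # A change made in App still passes through three slots.
        "pipelined_latency_ms": 3 * slot_ms,
    }

balanced = frame_stats(10, 10, 10)   # three equal phases
lopsided = frame_stats(24, 3, 3)     # almost all time spent in App

print(balanced["speedup"], balanced["pipelined_latency_ms"])
print(lopsided["speedup"], lopsided["pipelined_latency_ms"])
```

In this model the balanced case reaches the full 3x with unchanged latency, while the lopsided case gains little throughput and its latency gets worse.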

In order for all of this to work, Panda has to do some clever tricks behind the scenes. The most important trick is that there need to be three different copies of the scene graph in different states of modification. As your App process is moving nodes around for frame 3, for instance, Cull is still analyzing frame 2, and must be able to analyze the scene graph *before* anything in App started mucking around to make frame 3. So there needs to be a complete copy of the scene graph saved as of the end of App’s frame 2. Panda does a pretty good job of doing this efficiently, relying on the fact that most things are the same from one frame to the next; but still there is some overhead to all this, so the total performance gain is always somewhat less than the theoretical 3x speedup. In particular, if the application is already running fast (60fps or above), then the gain from parallelization is likely to be dwarfed by the additional overhead requirements. And, of course, if your application is very one-sided, such that almost all of its time is spent in App (or, conversely, almost all of its time is spent in Draw), then you will not see much benefit from this trick.

Also, note that it is no longer possible for anything in App to contact the graphics card directly; while App is running, the graphics card is being sent the drawing commands from two frames ago, and you can’t reliably interrupt this without taking a big performance hit. So this means that OpenGL callbacks and the like have to be sensitive to the threaded nature of the graphics pipeline. (This is why Panda’s interface to the graphics window requires an indirect call: base.win.requestProperties(), rather than base.win.setProperties(). It’s necessary because the property-change request must be handled by the draw thread.)

Early adopters are invited to try this new feature out today, before we formally release 1.8. It’s already available in the current buildbot release; to turn it on, see the new manual page on the subject. Let us know your feedback! There are still likely to be kinks to work out, so we’d love to know how well it works for you.
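For reference, the switch is the threading-model config variable, which can go in your Config.prc file or be set from code. The sketch below sets it from Python before the window is opened; the guarded import is only there so the snippet also runs on a machine without Panda3D, and the PRC_LINE name is made up for this example.

```python
# The config line that asks for Cull and Draw on their own threads.
PRC_LINE = "threading-model Cull/Draw"

try:
    from panda3d.core import loadPrcFileData
except ImportError:
    loadPrcFileData = None   # Panda3D is not installed here

if loadPrcFileData is not None:
    # This has to run before the graphics window (e.g. ShowBase) is created.
    loadPrcFileData("", PRC_LINE)

print(PRC_LINE)
```

Check the manual page for the other accepted values and for the caveats that apply to your version.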

Panda3D and Cython

This is about how to speed up your Python code, and has no direct impact on Panda3D’s performance. For most projects, the vast majority of the execution time is spent inside Panda3D’s C++ code or on the GPU, so no amount of Python optimization will help. For the other cases where you do need to speed up your Python code, Cython can help. This is mainly addressed to people who prefer programming in Python, but know at least a little about C. I will not discuss how to do optimizations within Python, though if this article is relevant to you, you really should look into it.

Cython is an interesting programming language. It uses an extended version of Python’s syntax to allow things like statically typed variables, and direct calls into C/C++ libraries. Cython compiles this code to C (or optionally C++). The generated code then compiles as a Python extension module that you can import and use just like a regular Python module. There are several benefits to this, but in our context the main one is speed. Properly written Cython code can be as fast as C code, which in some particular cases can be even 1000 times faster than nearly identical Python code. Generally you won’t see 1000x speed increases, but it can be quite a bit. This does cause the modules to only work on the platform they were compiled for, so you will need to compile alternate versions for different platforms.

By default, Cython compiles to C, but the new 0.13 version supports C++. This is more useful as you probably use at least one C++ library, Panda3D. I decided to try this out, and after stumbling on a few simple issues, I got it to work, and I don’t even know C++.

Before I get to the details, I’ll outline why you might want to use Cython, rather than porting performance bottlenecks to C++ by hand. The main benefit is in the process, as well as the required skill set. If you have a large base of Python code for a project, and you decide some of it needs to be much faster, you have a few options. The common approach seems to be to learn C++, port the code, and learn how to make it so you can interface to it from Python. With Cython, you can just add a few type definitions on variables where you need the performance increase, and compile it, which gives you a Python module that works just like the one you had. If you need to speed up the code that interfaces with Panda3D, you can swap the Python API calls for C++ ones. Using Cython allows you to just put effort into speeding up the parts of code you need to work on, and to do so without having to change very much. This is vastly different from ditching all the code and reimplementing it in another language. It also requires you to learn a pretty minimal amount of stuff. You also get to keep the niceness of the Python syntax, which many Python coders have come to appreciate.

There are still major reasons to actually code in C++ when working with Panda, but as someone who does not do any coding in C++, I won’t talk about it much. If you want to directly extend or contribute to Panda3D, want to avoid redundantly specifying your imports from header files (Cython will require you to re-specify the parts of API you are using rather than just using the header files shipped with Panda), or you simply prefer C++, C++ may be a better option. I mainly see Cython as a convenient option when you end up needing to speed up parts of a Python code-base; however, it is practical to undertake large projects from the beginning in Cython.

Cython does have some downsides as well. It is still in rather early development. This means you will encounter bugs in its translators as well as the produced code. It also lacks support for a few Python features, such as most uses of generators. Generally I haven’t had much trouble with these issues, but your experience may differ.

Cython does offer an interesting side benefit as well. It allows you to optionally statically type variables and thus can detect more errors at compile time than Python.

To get started, first you need an install of Cython 0.13 (or probably any newer version). If you have a Cython install you can check the version with the -V option. You can pick up the source from the Cython Site, and install it by running “python setup.py install” from the Cython source directory. You will also need to have a compiler. The Cython site should help you get everything set up if you need additional guidance.

Then you should try out a sample to make sure you have everything you need, and that it’s all working. There is a nice C++ sample for Cython on the Cython Wiki. (This worked for me on Mac, and on Windows using MinGW or MSVC as a compiler).

As for working with Panda3D, there are a few things I discovered:

  • There are significant performance gains to be had by just compiling your existing Python modules as Cython. With a little additional work adding statically typed variables, you can have larger performance gains without even moving over to Panda’s C++ API (which means you don’t need to worry about linking against Panda3D, which can be an issue).
  • Panda3D already has python bindings with nice memory management, so I recommend instancing all the objects using the python API, and only switching to the C++ one as needed.
  • You can use the ‘this’ property on Panda3D’s Python objects to get a pointer to the underlying C++ object.
  • On Mac, you need to make sure libpanda (and in some cases, possibly others as well) is loaded before importing your extension module if you use any of Panda3D’s libraries.
  • On Windows, you need to specify the libraries you need when compiling (in my case, just libpanda)
  • The C++ classes and Python classes tend to have the same name. To resolve this, you can use “from x import y as z” when importing the python ones, or you can just import panda3d.core, and use the full name of the classes (like panda3d.core.Geom). There may be a way to rename the C++ classes on import too.
  • If using the Panda3D C++ API on Windows, you will need to use the MSVC compiler. You can get Microsoft Visual Studio 2008 Express Edition for free which includes the needed compiler.

Using this technique I got a 10x performance increase on my code for updating the vertex positions in my Geom. It avoided having to create Python objects for all of the vertices and passing them through the Python API, which translates them back to C++ objects. It was just a matter of moving over one call in the inner loop to the other API. This, however, was done in already optimized Cython code that was simply loading vertex positions stored in a block of memory into the Geom. Most use cases would likely see less of a benefit. Overall though, I gained a lot of performance both from the change over to Cython, and from the change over to the C++ API. These changes only required relatively small changes to the speed-critical portions of my existing Python code.

I made a rather minimal example of using a Panda3D C++ API call from Cython. Place the setup.py and the testc.pyx files in the same directory, and from that directory, run setup.py with the Python install you use with Panda3D. If everything is properly configured, this should compile the example Cython module, testc.pyx, to a Python extension module and run it. If it works, it will print out a few lines ending with “done”. You may need to tweak the paths in setup.py. If not on Mac or Windows, you will get an error indicating where you need to enter your compiler settings (mostly just the paths to Panda3D’s libraries).

I would like to thank Lisandro Dalcin from the Cython-Users mailing list who helped me get this working on Windows.

Pandai Library: A Quick Review of panda3d.ai

The Entertainment Technology Center (ETC) at Carnegie Mellon University in 2009 launched a graduate student project to add a collection of artificial intelligence behaviors like seek, flock, and evade along with 2D pathfinding to Panda3D.  This blog post is a reminder that this work is available as part of Panda 1.7.0, and receives ongoing attention from the ETC team.  The timeline for this work has been as follows:

  • Summer 2009: Collect feedback from the Panda3D community via Pandai forum post on requirements for an AI library. This forum post remains active today to address additions to the Pandai code base.
  • August – December 2009:  Start with Craig Reynolds’ published work on flocking behavior and A* algorithm for two-dimensional pathfinding between points. Develop the C++ code, based on community feedback, and prototyping with Building Virtual Worlds graduate student work at the ETC.
  • December 2009:  At the insistence of ETC Faculty advisors Ruth Comley and myself, the Pandai student team created a number of demonstrations and a detailed Pandai ETC Project Web site documenting the project work. The demonstrations show capabilities, such as a fish demonstration that shows wander, pursue, and evade.  The project web site includes further descriptions on the project team, motivations for the work, and downloadable content, including the art assets (like the fish) and code (the fish demo) needed to run demonstrations.  See Pandai ETC Project download page.

    Pandai demo: fish pursue hook until one is caught, then evade it

  • January 2010: Thanks to rdb, the Pandai library was published as part of the Panda3D 1.7.0 release. One change of note regarding the downloadable examples from the ETC Pandai Project web site: rather than “from libpandaai import *” the Python code should use “from panda3d.ai import *”. With this minor edit, you will be able to download and run the fish demo and others using Panda 1.7.0.
  • July 2010: Ongoing collection of feedback from the Pandai forum post led to the release of a Blender meshgen tool for pathfinding. A link to this tool has been added to the Pandai ETC Project download page.

The Pandai ETC team is responsible for version 1.0 of the Pandai library, and remains active in its support. Your comments are welcome here regarding the Pandai effort and shared code and examples. For help requests, continue the thread within the Pandai forum post.

Porting to Java

This was an April Fool’s Joke. The information in it is not meant to be taken seriously. Click the post title if you want to see it.

Panda SE Project

During the past few months, several students at Carnegie Mellon University’s Entertainment Technology Center (ETC) have been working on improving the egging process as well as incrementally improving the shader system.  Just take a look at their smiling faces!

Panda SE Team Photo

From Left: Wei-Feng Huang, Federico Perazzi, Shuying Feng (Panda), Deepak Chandrasekaran, Andrew Gartner

Those of you who have been with Panda 3D a long time will know that there have been ETC Panda 3D projects in the past.  Some of them have had limited success due to an oversized project scope.  This project will instead focus on making complete feature sets rather than half-implemented pieces like those past unsuccessful projects.  It will also focus on documentation, both within the code and the manual, to make sure that you, the Panda community, will be able to take their work and build on top of it.

With that said, this project will primarily focus on two things:

  1. The shader inputs
  2. The egging/model exporting process

Shader Inputs

If you’ve taken a look at the source code of Panda 3D’s shader system and have had any experience in professional game engine development, you’ll notice that it’s a system that isn’t implemented fully.  Actually, the first shader system was an ETC student project and it has since then been improved through other ETC projects and the Panda 3D community.  The shader inputs work continues this effort in a structured manner.

Shaders have supported the input of arrays and arrays of vectors for quite some time.  However, Panda 3D has never supported passing them.  There have been some hacks in the past where arrays are passed as textures, but this is not ideal for performance and it ruins texture caching schemes.  After this project completes, users will be able to input arrays and arrays of vectors/matrices directly into the shader.

Screenshot of multiple lights demo

This may not seem that exciting at first, but it lays the groundwork for many more things.  If you’re new to computer graphics: having a complete shader inputs system allows for the following, just to name a few.

  • Hardware accelerated actors/characters
  • Shader based instancing with dynamic texture and animation support (crowds)
  • Shader based vegetation system (fast trees and grass)
  • A real deferred shading system
  • A real light manager system for shader based lights

A Real Egging Pipeline

Up until now, there have been several attempts at user interfaces to maya2egg, dae2egg, etc.  Most of them are just simple front ends to the command-line tools.  This new user interface is much more than that.  It is an artist-friendly build system.  Just check out some of the features.

  • Simple mode for when you don’t want a build system
  • Support for multiple maya versions
  • Support for egg tools such as egg-opt-char and egg-palettize
  • A batching system that automatically detects whether a file has been changed to allow for minimal rebuilds
  • Support for all tools to be built into batch system
  • Save/Load batch scripts
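To give an idea of what minimal rebuilds involve, here is one common way to decide whether an output is stale: compare file modification times. This is only a sketch of the general technique, not the tool's actual code; needs_rebuild() and the file names are hypothetical.

```python
import os
import tempfile

def needs_rebuild(source_path, output_path):
    """True if the output is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model.mb")
    out = os.path.join(tmp, "model.egg")

    open(src, "w").close()
    missing = needs_rebuild(src, out)    # no output file exists yet

    open(out, "w").close()
    os.utime(src, (1000, 1000))          # source older than output
    os.utime(out, (2000, 2000))
    fresh = needs_rebuild(src, out)

    os.utime(src, (3000, 3000))          # source edited after the build
    stale = needs_rebuild(src, out)

print(missing, fresh, stale)
```

A real build system may well use content hashes instead, which are more robust when timestamps are unreliable.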

Like shader inputs this lays the groundwork for much future work.  For any game engine to be professional quality, it needs a set of robust artist tools such as node-based shader generators and artist friendly level editors.

Screenshot of WIP Egging GUI

ABI Backward Compatibility

Hey C++ developers of Panda3D,

I’ve just checked in a fix to the codebase that should give minor releases a backward compatible ABI. This means that if you link something against the Panda3D 1.7.0 libraries, you’ll still be able to use it with libraries of any Panda3D 1.7.X release. This rule was created so that C++ users can use the web plugin functionality.

To the people working directly on the Panda3D codebase: do not merge anything onto the release branch (e.g. panda3d_1_7_branch) that is not backward ABI compatible. You can merge in new symbols, but you cannot merge altered or removed symbols. This rule does not apply to the trunk – you can do whatever you want on the trunk as there are no ABI rules there. Of course, these rules don’t affect Python code; just exposed C/C++ symbols.

But you really don’t need to worry about any of this unless you actually want to merge things onto the release branch – and this is usually done by the release maintainer anyways.

As for linking to libraries on non-Windows systems: libraries like libpanda.so / libpanda.dylib will now symlink to libpanda.so.1.7 / libpanda.1.7.dylib. This ensures that if you link to libpanda, it will link against the 1.7 version of the library and won’t conflict with libraries of any other series. This allows you to have multiple series of Panda3D installed at the same time and run different games that are linked against different series of Panda.
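The naming scheme and the compatibility rule boil down to comparing the first two parts of the version number. Here is a small Python sketch; the function names are made up for illustration and are not part of Panda3D or its build scripts.

```python
def series(version):
    """Return the release series, e.g. '1.7' for '1.7.1'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"

def abi_compatible(built_against, installed):
    """Releases within one series are meant to share an ABI."""
    return series(built_against) == series(installed)

def versioned_library_name(base, version, platform):
    """Name of the series-specific library that the plain name links to."""
    if platform == "linux":
        return f"{base}.so.{series(version)}"
    if platform == "macos":
        return f"{base}.{series(version)}.dylib"
    raise ValueError("no versioned naming sketched for " + platform)

print(versioned_library_name("libpanda", "1.7.1", "linux"))
print(versioned_library_name("libpanda", "1.7.1", "macos"))
print(abi_compatible("1.7.1", "1.7.3"), abi_compatible("1.7.1", "1.8.0"))
```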

The latest buildbot releases should already abide by these rules. I’m going to put up a buildbot script soon that regularly builds the release branch and alerts when the ABI compatibility is broken. (E-mail me if you want to be put on the buildbot e-mail notify list.) The next release, 1.7.1, will start being ABI compatible.

Have fun!

Pointer Textures

With the introduction of Panda3D 1.7.0, a very powerful but also very dangerous feature is now available to the public.  This feature has been used internally at Walt Disney Imagineering for some time.  We call these Pointer Textures.  The feature allows a user within Python to give a long int to Panda3D, and Panda will cast this to a pointer and, without question, upload the data at this pointer to the graphics card!

You can see why this is dangerous.  This bypasses any checks or copies within Panda and is a direct gate to the graphics card.  This means that if you do something wrong, your application will crash.  No asserts, no error messages, it will just flat out crash.  That should be about the extent of it, but I’d be lying if I said I didn’t cause a few BSODs here and there with this technique.   Now that we have this little disclaimer out of the way, what is it good for?

For starters, it’s really fast.  The current MovieTexture implementation actually does three copies of the same data within memory.  Once from the decoder to system memory, once to Panda safe memory, and then another to the graphics card.  This certainly works and the performance is quite reasonable.  Where this fails is when you have large amounts of video data like 1080p content. That’s a lot of data to be unnecessarily moving around.  For these applications, we use Pointer Textures.

We have an external python module that decodes movies using DirectShow and through pointer textures, we can display it in Panda as a texture.  This has side benefits as well.  Since DirectShow is multithreaded and a python module is not locked to the main python thread, DirectShow can decode the movies using different cores.  This makes it a truly multi-threaded application. On a machine with 4 cores, I’ve managed to squeeze two simultaneous 720p videos through without problems.

Since pointer textures allow you direct access to graphics card memory, it also means that you can load anything as long as you have the pointer to the data and the correct format.  Using this we’ve implemented Image Based Lighting via HDR textures.

Another added benefit to using pointer textures is that you don’t have to modify Panda to get special features.  For instance, if you wanted to load an image format that is not currently supported by the texture loader, you would have to set up a Panda build environment, get the third party libs, and then modify the texture loader.  Now you don’t have to.  Just compile a Python module via Swig or via the C Python API and make a function that returns the pointer as a long integer.
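If the idea of passing a pointer as an integer sounds abstract, the following sketch shows it with nothing but ctypes from the standard library. The producer side stands in for a decoder module, and the consumer side stands in for what Panda does with the integer; the Panda3D call itself is not shown, and the buffer dimensions are made up.

```python
import ctypes

# Producer side: a hypothetical decoder owns a small RGB pixel buffer.
width, height, channels = 4, 2, 3
size = width * height * channels
pixels = (ctypes.c_ubyte * size)(*range(size))

# The "pointer as a long integer": just the address of the first byte.
address = ctypes.addressof(pixels)

# Consumer side: trusts the integer blindly and reads `size` bytes from it.
# If the address or the size is wrong, this is where the crash would happen.
copied = ctypes.string_at(address, size)

print(type(address).__name__, len(copied))
```

Note that pixels has to stay alive for as long as anything might read from the address; nothing on the consumer side holds a reference to it.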

This goes far beyond static images or movies as well.  Since it’s just data at a pointer, this method also works with special devices.  Using this, we have connected Panda to specialized cameras, webcams, TV Tuner cards and even capture cards.  The possibilities are endless.

If you want to start using this, here is a quick example that just loads an HDR pfm texture and displays it. It only works on Windows with Panda3D 1.7.0. I’ve included the source code for the Python module as well, in case you feel like compiling that too. Simply run PfmTexture.py and you should see the demo.

Hardware Geometry Instancing

Recently, I have been working on a little Infinite Terrain demo that explains how to take full advantage of Panda3D’s terrain capabilities. In the process I have been adding various features to Panda3D that would make it easier for me, including many improvements to the Shader Generator. (On a sidenote, I didn’t end up needing any shaders for the terrain.)

Last Monday, I added trees to the terrain. As this is an infinite terrain demo, I needed to add quite a large number of trees. But I found that (even with Panda3D’s flattening capabilities) my GPU quickly let me down after a few thousand trees. So, I realized I needed to add geometry instancing support to Panda3D.

And so I did. It turned out to be quite trivial to implement and only took me an hour or two. The results are quite pleasing! I have managed to render over 100000 trees (and a huge terrain) at the same time at a reasonable framerate! Here’s a screenshot of what it looks like right now:


Of course, that WIP scene could use a lot of improvement, but you get my point. And with some proper culling and LOD, I could push the number of trees even higher.

But doesn’t Panda3D already support instancing?

Currently, Panda3D supports instancing of animated models. That is entirely unrelated to geometry instancing. The existing instancing system only exists to improve performance if you have a lot of animated models, by reducing the number of vertex displacements that are done by Panda3D’s animation system. Geometry instancing, on the other hand, exists to greatly reduce the amount of data that is passed to the video card. Whether the model is animated or not is irrelevant with the new instancing system.

How does it work?

Before yesterday, if you wanted to create multiple instances of a model, you’d either load the model multiple times or use copyTo. I’ve seen that many people use instanceTo in this case, but that will have no positive effect on performance for static models. The geometry will still be passed many times to the GPU, which is a slow process.
With the new system, you keep just a single copy of the model and call setInstanceCount(n) on it. This means that it will still be passed to the GPU only once, but it will be rendered n times.

You might be wondering, how do I give each node different parameters or a different position? Well, that can be done in the shader. You can access the instance ID in the shader and calculate the position, color, etc. based on that, or simply use a different model projection matrix from an array of transform matrices that you pass to the shader. This allows you to do basically anything. You can send a single sphere to the GPU, passing a 3D texture with a displacement map in each layer, and set the instance count to 1024. That will result in 1024 unique rocks using just one batch call and a very limited amount of uploaded geometry.
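To make the instance ID idea a bit more concrete, here is a small Python sketch of the kind of arithmetic a vertex shader might do per instance. The function name instance_offset, the grid layout and the spacing value are all made up for this illustration; a real shader would do the same thing on the GPU in Cg or GLSL.

```python
def instance_offset(instance_id, columns, spacing):
    """Map an instance ID to an (x, y) offset on a regular grid.

    Hypothetical helper: it mirrors arithmetic a vertex shader could
    apply to each instance and is not part of Panda3D.
    """
    # Instances fill the grid row by row, 'columns' instances per row.
    row, col = divmod(instance_id, columns)
    return (col * spacing, row * spacing)


# Offsets for the first six instances on a grid three columns wide.
offsets = [instance_offset(i, columns=3, spacing=10.0) for i in range(6)]
```

The point is that nothing except the instance ID changes between instances, so the geometry itself only needs to be uploaded once.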

This leaves the question of whether this will actually work without use of shaders. The answer is no, I’m afraid. There is an OpenGL extension that allows you to use geometry instancing with the fixed-function pipeline, but very few video cards support it. Because it would be quite complicated to implement that, I decided not to do it.

Note that this is only supported in OpenGL so far. Maybe someone will add support for this to Panda’s DirectX side someday.

Import system for C++ modules

I’ve just checked in support for a shiny new ‘panda3d’ module that better organizes the C++ classes of Panda3D. This feature has been requested for a while now, as many were annoyed with the long, unorganized imports through pandac.PandaModules. Basically, it now allows you to import Panda3D classes similar to this:

from panda3d.egg import EggData
from panda3d.ode import OdeJoint
from panda3d.core import *

The current system

There are a number of Panda3D dynamic libraries that contain a Python module, for example libp3direct, libpandaexpress, libpanda, libpandaegg and libpandaode. You can import them directly from Python by their library name, and thus access the wrapped classes and functions that each library exposes to CPython.

However, not all of the functions for those classes are implemented in C++. There are a bunch of functions that are implemented in Python that provide extra functionality to the C++ classes. Those methods are just convenience methods to make your life easier, or exist to make an interface more Pythonic. Some of these methods have already been deprecated in favor of implementing them on the C++ side (or in the wrapper generator tool, interrogate).

These extension functions are defined in pandac/libnameModules.py, where ‘libname’ is the name of the library in question. Once you import pandac/PandaModules.py, the Python code is imported from all of Panda3D’s libraries, and the extension functions are added to the classes.
The extension functions are not the only reason why the ‘pandac’ tree exists. I’ve just said that you can directly import a dynamic library from Python, but that is not entirely true: as of Python 2.5, it is no longer possible on Mac and Windows. (On Windows, you can still import it by renaming the .dll to .pyd, but that trick doesn’t work for Mac OS X dylibs.)

Therefore, we need to manually import the library by locating it first and then directly loading it via the load_dynamic function in Python’s imp module. The code to do this has been added to ‘pandac’, as that’s the place where the Panda3D libraries are usually imported.

The new system

In the new system, instead of importing the class in question from pandac.PandaModules, you will import the class from a submodule of the ‘panda3d’ module.
We did not group the classes by their source directory, as was proposed. Organizing the ‘panda3d’ module in such a way that the sub-packages represent source directories would make it needlessly complicated.
Instead, we chose to group classes by the C++ dynamic library. The name of the submodule is defined by the name of the library without the “libpanda” or “libp3” prefix. For instance, libpandaegg maps to panda3d.egg, and libp3direct maps to panda3d.direct.

There is one exception, for the libpanda and libpandaexpress libraries: they are merged into one module, panda3d.core. We chose to do this because libpanda and libpandaexpress contain the Python wrappers for the core of Panda3D; you will need these libraries to do anything useful with Panda3D. Furthermore, the distinction between libpanda and libpandaexpress is arbitrary and not meaningful to the developer.
When more libraries are added later, such as libpandaphysx and libpandaai, we could simply add those to the list.
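The naming rule can be sketched in a few lines of Python. Note that submodule_name and CORE_LIBRARIES are names invented for this example; this only illustrates the mapping described above and is not necessarily how panda3d.py does it internally.

```python
# Libraries that are merged into the single panda3d.core module.
CORE_LIBRARIES = ("libpanda", "libpandaexpress")


def submodule_name(library):
    """Return the 'panda3d' submodule name for a dynamic library name.

    Hypothetical helper for illustration only.
    """
    # The core libraries are the one exception to the prefix rule,
    # so they have to be checked before any prefix is stripped.
    if library in CORE_LIBRARIES:
        return "panda3d.core"
    for prefix in ("libpanda", "libp3"):
        if library.startswith(prefix):
            return "panda3d." + library[len(prefix):]
    raise ValueError("unrecognized library name: " + library)
```

For example, submodule_name("libpandaegg") gives "panda3d.egg", while both core libraries end up as "panda3d.core".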

As for the direct tree, we have left it in its place. It has been suggested that it be merged into the panda3d module, but we decided not to do it, as it would become needlessly complicated and confusing. Furthermore, I think there is a benefit in keeping Panda3D’s Python modules and C++ modules separated.
We do acknowledge the unfortunate naming of the direct tree, and we might change that for a future release, but that would be a major change.

The new import system is all bundled into one file, panda3d.py. This monstrosity of Python black magic does everything needed – it locates the Panda3D libraries, adds their location to the system’s library path, registers phony modules, and dynamically loads the required libraries when somebody starts using a class from it.
As for the Python extension functions, those are not supported in the new system. The reason for this is that we are trying to phase them out and move most of those functions to the C++ side of Panda3D.

Another cool feature is that the ‘panda3d’ module implements a lazy-loading system for its libraries. That means that a library will only get loaded when you actually use one of its classes. This has the advantage that libraries you don’t use will not get loaded, which can result in a slight speed and memory gain.
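If you are curious what lazy loading can look like in Python, here is a minimal sketch. It is not the actual code from panda3d.py: the LazyModule class is invented for this example, it assumes a modern Python 3, and it uses the standard json module as a stand-in for a heavy Panda3D library.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Placeholder module that imports its real target on first use.

    Hypothetical class for illustration; not taken from panda3d.py.
    """

    def __init__(self, name, target):
        super().__init__(name)
        self._target = target  # name of the module to load later
        self._loaded = None    # stays None until the first real access

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, i.e. when
        # somebody asks for something that lives in the real module.
        if self._loaded is None:
            self._loaded = importlib.import_module(self._target)
        return getattr(self._loaded, attr)


# Creating the placeholder does not import anything yet.
lazy_json = LazyModule("lazy_json", "json")
# The first attribute access triggers the actual import.
text = lazy_json.dumps([1, 2])
```

The placeholder is cheap to create, so you only pay the cost of loading a library once you actually touch something inside it.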


Now, why did we do it? Besides the fact that it has been requested various times, there are a few benefits to it. First of all, it is better organized, allows people to type it faster and more easily, and it leads to cleaner code. Furthermore, you don’t import the libraries you don’t need, leading to slightly faster import times and slightly less memory usage. But even more importantly, we need it for the plugin/runtime system.

Uh, what, why? I’ll explain. When you distribute a game that can be used with the Panda3D runtime (be it the browser plugin, or be it the standalone runtime), you pack it into a .p3d file. That .p3d file will indicate which version of Panda3D it is built for. When someone runs a .p3d file through the runtime, the runtime downloads a small ultra-optimized build of Panda3D to run the provided .p3d package.

This Panda3D package that is downloaded by the runtime must be as small as possible. Because the entire Panda3D build contains big components that not every game will use, we have split those into separate packages. For instance, there is a “panda3d” package containing the core, an “egg” package containing libpandaegg (a packed game will usually only contain .bam files, so this is usually not needed), an “ode” package containing libpandaode, a “models” package containing the default models, one package per audio library, etc. The .p3d file can indicate which packages it depends on, so that the download size is kept to a minimum.

That poses a problem with the old pandac.PandaModules-style import system, though. First of all, pandac.PandaModules imports every single library, while it is not guaranteed that all libraries are on the system (as the .p3d file may not need some of them). We had to put hacky ImportError exception handlers in PandaModules.py to work around that.
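To give an idea of what such a workaround amounts to, here is a rough sketch of the pattern. It is not the code from PandaModules.py; the import_available function and the module names in the example are made up, with the standard json module standing in for a library that happens to be present.

```python
import importlib


def import_available(module_names):
    """Import every module that is present and skip the ones that are not.

    Hypothetical helper for illustration only.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # This library is not on the system; carry on without it.
            continue
    return loaded


# 'json' exists everywhere; the second name is deliberately bogus.
modules = import_available(["json", "hypothetical_missing_library"])
```

It works, but every missing library is silently swallowed, which is part of why we consider this approach hacky.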

Furthermore, with the old system, we have no control anymore over the imported classes. Why would we need that? Well, the game has the ability to download and install more components of Panda3D while it is running. With the new system, when for example the “egg” component is installed, we can easily add a hook so that the ‘panda3d.egg’ module is automatically updated with the newly installed libpandaegg library. This allows game developers to keep the pre-download time to a minimum by installing packages on demand (for example, if only a part of the game uses ODE physics, the developer can choose to have the ‘ode’ package installed whenever the end-user chooses to run that part of the game).

How it affects you

If you’re still awake after reading all that, you might be wondering what will happen to all the existing code. Will this break anything? The answer is no. Right now, it’s just a small Python file that I added, which allows you to import Panda3D classes using a different convention. This is still experimental, so the ‘pandac’ tree is still around, and not even deprecated. When, perhaps after the 1.7 release series, the new system has been thoroughly tested and proven to work better, we might switch the Python code and sample programs to the new system, and recommend that people use it.
But if you don’t feel like switching, don’t worry. As we care about backward compatibility, and because most of the code is heavily dependent upon the old structure, the ‘pandac’ structure will probably be around for a loooong time.


Last but not least, some example imports showing you a bit of how it works:

from panda3d.egg import EggData
from panda3d import ode
joint = ode.OdeJoint
from panda3d.core import *
tex = Texture()
import panda3d.direct
ival = panda3d.direct.CInterval()