Today I am excited to announce the release of version 1.9.1 of the Panda3D SDK. This release fixes a lot of bugs and stability issues. With this release, the 1.9 branch is now considered to be stable. We feel that it is now ready to use for production purposes.
Since 1.9.0 was an unstable release not yet ready for deployment, it did not include deployment tools such as pdeploy and packp3d. This release reintroduces them so that Panda3D applications can be deployed against the 1.9 branch of Panda3D.
As discussed in the previous blog post, the issues with installing and running the SDK on Mac OS X 10.11 “El Capitan” have been resolved. An update has also been made to pdeploy so that the installers it generates are compatible with the latest OS X version as well.
Besides the countless bug fixes in this release, a few smaller features have been introduced, mostly to address usability shortcomings. A couple of those will be highlighted in this post.
One of the new features is that Panda3D is now a DPI-aware process on Windows. This solves a problem that was introduced by Windows 8.1, namely that Windows will try to automatically scale windows if the user has configured their monitor to a DPI scaling setting other than 100%, a feature that is called “DPI virtualization”. This was introduced in order to prevent text from looking too small on high-DPI displays like tablets.
This scaling looks bad in 3-D applications like Panda3D and causes blurring artifacts, particularly in text, as demonstrated in the following screenshots. It also causes Windows to misrepresent the true size of the graphics window and display size, which can lead to other subtle bugs. Panda3D 1.9.1 disables this feature by telling Windows it will take care of any scaling itself.
By default, since developers tend to expect their window not to change its size in pixels unexpectedly, Panda3D will not automatically resize windows when they are dragged to a monitor with a different DPI setting or when the DPI setting is changed in the Control Panel while Panda3D is running. However, this behavior can be turned on by setting the Config.prc variable “dpi-window-resize” to “true”. For applications that instead wish to keep using Windows’ DPI virtualization, the Config.prc variable “dpi-aware” may be set to “false”, in which case Panda3D will behave as it did in 1.9.0 and earlier.
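As a minimal sketch, the two variables named above can also be set from Python at startup instead of in a Config.prc file. This uses panda3d.core.loadPrcFileData; the helper name apply_dpi_config is ours, and the settings only take effect if they are loaded before ShowBase opens the main window.

```python
# Config.prc settings for DPI handling, using the variable names from the 1.9.1 notes.
DPI_PRC = "\n".join([
    "dpi-aware true",          # 1.9.1 default; "false" restores 1.9.0 DPI virtualization
    "dpi-window-resize true",  # opt in: resize the window when the DPI changes
])


def apply_dpi_config(prc_data: str = DPI_PRC) -> None:
    """Load the DPI settings; call this before ShowBase() creates the window."""
    # Imported lazily so this module can be imported without Panda3D installed.
    from panda3d.core import loadPrcFileData

    loadPrcFileData("dpi-settings", prc_data)
```

In an application, call apply_dpi_config() first and then construct ShowBase as usual.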
Another added feature is the M_confined mouse mode. This mouse mode keeps the cursor confined to the window and can be used in cases where M_relative cannot be used but where it is still desirable to prevent the mouse cursor from leaving the window. A new sample program has been added to demonstrate the different mouse modes that are available.
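A minimal sketch of switching modes at runtime, assuming a Panda3D window such as base.win from ShowBase. WindowProperties.M_confined is the new mode, and M_absolute is the ordinary unrestricted mode; the helper set_mouse_mode is hypothetical. Property requests are applied asynchronously, so it is worth reading the mode back from win.getProperties() a frame later to confirm that the platform honored it.

```python
VALID_MOUSE_MODES = ("M_absolute", "M_relative", "M_confined")


def set_mouse_mode(win, mode_name: str = "M_confined") -> None:
    """Request a mouse mode on a Panda3D window such as base.win."""
    if mode_name not in VALID_MOUSE_MODES:
        raise ValueError(f"unknown mouse mode: {mode_name}")
    # Imported lazily so this module can be imported without Panda3D installed.
    from panda3d.core import WindowProperties

    props = WindowProperties()
    props.setMouseMode(getattr(WindowProperties, mode_name))
    win.requestProperties(props)  # applied asynchronously by the window
```

For example, set_mouse_mode(base.win) confines the cursor, and set_mouse_mode(base.win, "M_absolute") releases it again.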
The p3d_TransformTable GLSL input, which makes it easier to implement hardware skinning in a custom GLSL shader, has been backported from the 1.10 branch by user request. Although the fully automatic hardware skinning feature is a 1.10 feature and is not available yet in 1.9.1, this feature should make it easier for people who use custom shaders for their characters. See this forum thread for more information on the subject.
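To illustrate what such a shader computes, here is a CPU-side sketch of linear blend skinning in plain Python. In a GLSL vertex shader, the matrices would come from the p3d_TransformTable array, indexed by per-vertex joint indices and weights (the manual's examples name these vertex inputs transform_index and transform_weight; verify the names against the manual for your version). The function names and the row-major, column-vector matrix convention are choices made for this sketch only.

```python
from typing import List, Sequence

Mat4 = List[List[float]]  # row-major 4x4; transforms column vectors (M @ p)


def transform_point(m: Mat4, p: Sequence[float]) -> List[float]:
    """Apply an affine 4x4 matrix to a 3D point (implicit w = 1)."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]


def skin_vertex(pos: Sequence[float], indices: Sequence[int], weights: Sequence[float],
                transform_table: Sequence[Mat4]) -> List[float]:
    """Linear blend skinning: weighted sum of the point under each joint's matrix."""
    out = [0.0, 0.0, 0.0]
    for index, weight in zip(indices, weights):
        if weight == 0.0:
            continue  # unused influence slot
        moved = transform_point(transform_table[index], pos)
        for i in range(3):
            out[i] += weight * moved[i]
    return out
```

Because matrix-vector multiplication is linear, blending the transformed points as done here gives the same result as first blending the matrices and transforming once, which is how a shader typically does it.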
Special thanks go to Ed Swartz, who has contributed a significant amount of fixes and testing effort for this release.
The following issues were fixed in this release:
- SDK now properly installs in Mac OS X 10.11 “El Capitan”
- Windows 8.1+ no longer applies DPI virtualization to Panda window
- Fix ffmpeg library load issue on Mac OS X
- Fix issues running maya2egg on Mac OS X
- Fix compiler errors on different platforms
- Fix various rare crashes
- Fix crashes on shutdown in threaded pipeline
- Fix low-level threading crash on ARM machines
- More reliably and robustly handle failures opening OpenAL device
- Textures were not being scaled to power-of-2 in some cases
- Correct scaling of normal vectors with flatten operation
- Correct positioning of viewing axis when showing lens frustum
- Add dpi-window-resize option to auto-resize window on DPI change
- Fix assertions when alpha-file-channel references unknown channel
- Use OpenGL-style vertex colors by default on non-Windows systems
- Default vertex column alignment is now 4 bytes
- Add PNMImage premultiply/unpremultiply methods.
- Fix incorrect parsing of numbers with exponents in Config.prc
- Fix for reading URLs mounted via the virtual file system
- Fix shader generator memory leaks and runtime performance
- Fix shader generator scaling of binormals and tangents
- Expose _NET_WM_PID to window managers in X11
- Fix a range of bugs in tinydisplay renderer.
- Don’t error when setting lens far distance to infinity
- Allow passing custom lens to saveCubeMap/saveSphereMap
- Fix errors in saveCubeMap/saveSphereMap in threaded pipeline
- Fix DynamicTextFont.makeCopy()
- Make Texture memory size estimation more accurate
- Fix various window resizing issues
- Fix PandaSystem.getCompiler() value for clang (it reported gcc)
- x2egg no longer replaces face normals with vertex normals
- Include Eigen headers in Mac and Windows SDK
- Added geomipterrain-incorrect-normals setting, default=true
- DisplayInformation resolution list was missing on Windows
- Upgrade FMOD and Bullet versions on Windows and Mac OS X
- Various performance optimizations
- Fixed various other bugs not listed here.
Fixes and improvements for the runtime:
- Fix splash screen freezing in the X11 web plug-in
- pdeploy will now handle extracted files (e.g. .ico and .cur)
- Added more options for customizing splash screen
- Fix missing xml and ast modules from morepy package
- Certificate dialog is now localized to various languages
- Fix packp3d error when Python file is not in a package
- Pass on failing exit status from packaged application
- Remove annoying “:Packager(warning): No such file” warning
- Fix issue installing pdeploy-generated .pkg on OS X 10.11
Fixes for the Python API:
- Fix mysterious and rare crash in tp_traverse
- Bullet step function accidentally defaulted to step size of 0
- Fix overflow of file offsets (e.g. when seeking in huge files)
- Fix regression with memoryviews
- Fix hasattr/getattr of vector classes for invalid attributes
- Allow passing a long to methods accepting an int
- Fix crash when passing None to Filename constructor
- MouseWatcherGroup was erroneously not exposed in 1.9.0
- ShowBase no longer unmounts VFS when shutting down
- No longer requires setting PATH to import panda3d.*
- DirectDialog default geom is once again respected
- DirectDialog no longer overrides custom frameSize
- Fix WebcamVideo/MicrophoneAudio.getOptions() methods
Changes relating to the OpenGL renderer:
- Various performance improvements
- Fix point/line thickness setting
- Improve GLSL error reporting
- Fix Intel driver issues, particularly with geometry shaders
- Add more error checking for parameter types
- Integer shader inputs were not being converted to float properly
- Fix crash passing an undersized array to a GLSL shader input
- p3d_ColorScale et al may now be declared as vec3
- Fix flickering when using trans_model_to_apiview in Cg
- Support wireframe and point rendering modes in OpenGL ES
- Fix issue with model disappearing in rare cases with GLSL
- Fix ColorWriteAttrib not working as it should
- Allow deactivating PStats collectors for GPU timers
- Memory residency of graphics buffers now tracked by PStats
- Allow changing OpenGL coordinate system with gl-coordinate-system
Fixes for libRocket integration:
- libRocket did not work on Mac OS X in 1.9.0
- Fix inconsistent behavior with non-power-of-2 textures in rocket
- Use model-path for finding libRocket assets
- Add missing keys to libRocket keymap
- libRocket elements showed up white in tinydisplay
The following new features were added:
- Add -L (lighting) and -P (graphics pipe) pview options
- Add M_confined mouse mode that keeps cursor in window
- Add sample program demonstrating mouse modes
- bam2egg supports collision sphere and plane solids
- p3d_TransformTable GLSL input backported from 1.10 branch
- Add openal-device setting for selecting OpenAL audio output
- Add limited modification timestamp tracking for Ramdisk mounts
- Support for Autodesk Maya 2016
Several weeks ago, Apple released the latest version of Mac OS X, code-named “El Capitan”. Among other things, it introduced a number of security features, including System Integrity Protection (SIP). This feature primarily places restrictions on which filesystem locations can be modified, even by root processes.
These changes made it impossible to install and run the Panda3D SDK on Mac OS X 10.11. When running the SDK installer package, you may encounter an error message like “This package is incompatible with this version of OS X and may fail to install”, which prematurely interrupts the installation. A similar issue affects the installer for the Panda3D 1.0.4 Runtime.
An additional problem is the removal of PackageMaker, which is used to produce the installer package for the SDK when building Panda3D from source. It has been replaced by a different set of utilities that fulfill the same purpose.
These issues have been resolved now and a fix will be part of the upcoming 1.9.1 release. In the meantime, to install the Panda3D SDK on Mac OS X without disabling the System Integrity Protection features, you may use a pre-release build of the Panda3D 1.9.1 SDK which has been made available at the following location:
A patched version of the Panda3D 1.0.4 Runtime is available as well:
Please let us know on the bug tracker, forums or IRC channel if there are still issues with this build, so that they can be fixed before the 1.9.1 release.
We’ve been working hard for the past months to update the OpenGL renderer and bring support for the latest and greatest features that OpenGL has to offer. We’ve not been very good at updating the blog, though, so we decided to make a post highlighting some of those features and how they are implemented into Panda3D. These features will be part of the upcoming Panda3D 1.9.0 release, which should come out within the following month, assuming that everything goes according to plan.
sRGB support (linear pipeline)
Virtually all lighting and blending calculations are written under the assumption that they happen in a linear space, meaning that multiplying a color value by x results in a color value that is x times as bright. However, what is often overlooked by game developers is the fact that the average monitor isn’t linear. CRT monitors have a gamma of around 2.2, meaning that the output luminosity is proportional to the input voltage raised to the power of 2.2. This means that a pixel value of 0.5 brightness isn’t actually half as bright as one of 1.0 brightness, but only around 0.22 times as bright! To compensate for this, content is produced in the sRGB color space, which has a built-in gamma correction of around 1/2.2. Modern monitors and digital cameras are calibrated to use that standard, so that no gamma correction is typically needed to display images in an image viewer or in the browser.
However, this presents an issue for 3D engines that perform lighting and blending calculations. Because both the input and output color space are non-linear, when you have a light that is supposed to attenuate colors to 0.5x brightness, it will actually cause it to show up with 0.22x brightness—more than twice as dark as it should be! This results in dark areas appearing too dark, and the transition between dark and bright areas will look very unnatural. To have a proper linear lighting pipeline, what we have to do is correct for the gamma on both the input and the output side: we have to convert our input textures to linear space, and we have to convert the rendered output back to sRGB space.
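The arithmetic above can be checked with a small sketch. It uses the standard piecewise sRGB transfer function rather than a pure 2.2 power curve, and it models the monitor as an exact sRGB display; the function names are ours.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear channel in [0, 1] with the sRGB curve."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055


def displayed_luminance(texel: float, light: float, linear_pipeline: bool) -> float:
    """Linear light the monitor emits for an sRGB texel lit by `light` (0..1)."""
    if linear_pipeline:
        lit = srgb_to_linear(texel) * light  # sRGB texture: decoded when sampled
        stored = linear_to_srgb(lit)         # sRGB framebuffer: encoded when written
    else:
        stored = texel * light               # lighting done directly on encoded values
    return srgb_to_linear(stored)            # the monitor decodes what it receives
```

With the simplified power curve, 0.5 ** 2.2 is about 0.22, the figure quoted above. For a texel of 0.8 lit at half strength, displayed_luminance(0.8, 0.5, True) is exactly half of srgb_to_linear(0.8), while the uncorrected path yields less than half of that.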
It’s very easy for developers to overlook or dismiss this issue as being unimportant, because it doesn’t really affect unlit textures; they look roughly the same because the two wrongs cancel each other out. Developers simply tweak the lighting values to compensate for the incorrect light ramps until it looks acceptable. However, until you properly address gamma correction, your lighting will look wrong. The transition between light and dark will look unnatural, and people may see banding artifacts around specular highlights. This also applies when using techniques like physically based rendering, where it is more important that the lights behave like they would actually behave in real life.
The screenshots below show a scene rendered in Panda3D using physically-based rendering with and without gamma correction. Note how the left image looks far too dark whereas the right image has a far more natural-looking balance of lighting.
Fortunately, we now have support for a range of hardware features that can correct for all of these issues automatically, for free. There are two parts to this: sRGB framebuffers and sRGB textures. Enabling the former tells OpenGL that the framebuffer is in the sRGB color space; this lets us do all of the lighting calculations in linear space, and OpenGL then converts the result to sRGB before it is displayed on the monitor. However, doing only that would make your textures look far too bright, since they are already gamma-corrected! Therefore, you can set your textures to an sRGB format to tell OpenGL that they are in the sRGB color space, so that they are automatically converted to linear space when they are sampled.
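A minimal sketch of enabling both parts in Panda3D, assuming the framebuffer-srgb configuration variable and the Texture.F_srgb / Texture.F_srgb_alpha formats covered on the color space manual page (details may change before release). The helper names are hypothetical. Only textures that store colors should be marked as sRGB; data textures such as normal maps are already linear.

```python
SRGB_PRC = "framebuffer-srgb true\n"  # request an sRGB framebuffer (assumed variable name)


def enable_srgb_framebuffer() -> None:
    """Load the setting; call this before ShowBase() opens the window."""
    from panda3d.core import loadPrcFileData  # lazy import: Panda3D optional here

    loadPrcFileData("srgb-settings", SRGB_PRC)


def mark_color_texture_srgb(tex) -> None:
    """Flag a color texture as sRGB-encoded so it is linearized when sampled."""
    from panda3d.core import Texture

    n = tex.getNumComponents()
    if n == 4:
        tex.setFormat(Texture.F_srgb_alpha)  # the alpha channel itself stays linear
    elif n == 3:
        tex.setFormat(Texture.F_srgb)
    # Greyscale textures are left unchanged in this sketch.
```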
The nice thing is that all of these operations are virtually free, because they are nowadays implemented in hardware. These features have existed for a long time, and you can rely on the vast majority of modern graphics hardware to correctly implement sRGB support. We’ve added support to the Direct3D 9 renderer as well, and even to our software renderer! However, keep in mind that you can’t always rely on a monitor being calibrated to 2.2 gamma, so it is always best to offer a settings screen that lets the user calibrate the application’s gamma. We’ve added a special post-processing filter that applies an additional gamma correction to help with that.
To read more about color spaces in the future Panda3D version, check out this manual page, although the details are still subject to change.
Tessellation shaders are a way to let the GPU take a base mesh and subdivide it into more densely tessellated geometry. It is possible to use a simple base mesh and subdivide it to immensely detailed proportions with a tessellation shader, unburdened by the narrow bandwidth between the GPU and the CPU. Their programmable nature allows for continuous and highly dynamic level of detail without popping artifacts.
To enable tessellation, you have to provide two new types of shader: the control shader, which specifies which control points to subdivide and how many times to subdivide each part of the patch, and the evaluation shader, which then specifies what to do with the tessellated vertices. (They are respectively called a “hull shader” and a “domain shader” in Direct3D parlance.) They are used with a new type of primitive, GeomPatches, which can contain any number of vertices. There are helpful methods for automatically converting existing geometry into patches.
One immediately obvious application is LOD-based terrain or water rendering. Only a small number of patches has to be uploaded to the GPU, after which a tessellation control shader can subdivide the patches by an amount that is calculated based on the distance to the camera. In the tessellation evaluation shader, the desired height can be calculated based on a height map texture, a procedural algorithm, or a combination of the two.
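A plain-Python sketch of the per-edge level a control shader might compute before writing it to gl_TessLevelOuter. The linear falloff, distances, and level limits are hypothetical values chosen for illustration. Computing the level from the midpoint of each edge, rather than from the patch center, means that two patches sharing an edge arrive at the same level for it, which avoids cracks between them.

```python
import math
from typing import Sequence


def tessellation_level(edge_midpoint: Sequence[float], camera_pos: Sequence[float],
                       near: float = 10.0, far: float = 500.0,
                       max_level: float = 64.0, min_level: float = 1.0) -> float:
    """Outer tessellation level for one patch edge, fading with camera distance."""
    d = math.dist(edge_midpoint, camera_pos)
    # 0.0 at or closer than `near`, 1.0 at or beyond `far`.
    t = min(max((d - near) / (far - near), 0.0), 1.0)
    return max_level + (min_level - max_level) * t
```

For an edge 255 units from the camera, the sketch returns 32.5, halfway between the two limits.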
Another application is displacement mapping, where an existing mesh is subdivided on the GPU and a displacement map is used to displace the vertices of the subdivided mesh. This allows for showing highly detailed meshes with dynamic level of detail even when the actual base mesh is very low-poly. Panda3D exposes methods for converting existing triangle geometry to patches to make it easier to apply this technique. Alternatively, patches can be created through a new primitive type supported by the egg loader.
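The following plain-Python sketch mimics what the fixed-function tessellator and an evaluation shader do together for a single triangle patch with equal tessellation levels: generate a grid of barycentric coordinates (gl_TessCoord in GLSL), interpolate position and normal, and displace along the normal. The height callable stands in for sampling a displacement map; the names are ours, and real tessellators also support unequal and fractional levels.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def tessellate_and_displace(tri: Sequence[Vec3], normals: Sequence[Vec3], level: int,
                            height: Callable[[float, float], float]) -> List[Vec3]:
    """Uniformly subdivide one triangle and push each new vertex along its normal."""
    if level < 1:
        raise ValueError("level must be at least 1")
    out = []
    for i in range(level + 1):
        for j in range(level + 1 - i):
            # Barycentric weights for corners A, B and C.
            u, v = i / level, j / level
            w = 1.0 - u - v
            p = [u * a + v * b + w * c for a, b, c in zip(*tri)]
            n = [u * a + v * b + w * c for a, b, c in zip(*normals)]
            length = math.sqrt(sum(x * x for x in n))
            n = [x / length for x in n]
            h = height(u, v)  # stands in for a displacement-map lookup
            out.append(tuple(pc + h * nc for pc, nc in zip(p, n)))
    return out
```

A level of n produces (n + 1)(n + 2) / 2 vertices, so the vertex count grows quadratically with the level while the uploaded patch stays at three vertices.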
Both methods are demonstrated in the screenshots below.
Thanks to David Rose for implementing this feature! Support for tessellation shaders is available in the development builds.
Besides the new tessellation shaders mentioned earlier, Panda3D supports vertex shaders, geometry shaders, and fragment shaders. All of these shaders are designed to perform a very particular task in the rendering pipeline, and as such work on a specific set of data, such as the vertices in a geometry mesh or pixels in a framebuffer.
However, because each shader is designed to do a very specific task, it can be difficult to write shaders to do things that the graphics card manufacturer didn’t plan for. Sometimes one might want to implement a fancy ray tracing algorithm or an erosion simulation, or simply make a small modification to a texture on the GPU. These things may require code to be invoked on the GPU at will, and be able to operate on something other than the vertices in a mesh or fragments in a framebuffer.
Enter compute shaders: a type of shader program that is general purpose and can perform a wide variety of tasks on the video card. Somewhat comparable to OpenCL programs, they can be invoked at any point during the rendering process, operating on a completely user-defined set of inputs. Their flexibility allows them to perform a lot of the tasks one might be used to implementing on the CPU or via a render-to-texture buffer. Compute shaders are particularly interesting for parallelizable tasks like physics simulations, global illumination computation, and tiled rendering; but also for simpler tasks like generating a procedural texture or otherwise modifying the contents of a texture on the GPU.
Note that many of these tasks require another GLSL feature that we’ve added: ARB_shader_image_load_store. This makes it possible not only to sample textures in a shader, but also to perform direct read and write operations on a texture image. This is particularly useful for compute shaders, but the feature can be used in any type of shader, which means that you can now write to textures from a regular fragment or vertex shader as well.
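To make the dispatch model concrete, here is a serial CPU emulation of a 2D compute shader with a 16x16 local work group size (declared in GLSL as layout (local_size_x = 16, local_size_y = 16) in;). The group counts from dispatch_size are what the application asks the GPU to launch; each invocation then handles one texel, reading and writing it in place as image load/store allows. All names are ours, and a real GPU runs the invocations in parallel with no guaranteed order.

```python
from typing import Callable, List, Tuple


def dispatch_size(width: int, height: int, local_x: int = 16, local_y: int = 16) -> Tuple[int, int, int]:
    """Work groups needed to cover a width x height image (ceiling division)."""
    return (-(-width // local_x), -(-height // local_y), 1)


def run_compute(image: List[List[float]], kernel: Callable[[List[List[float]], int, int], None],
                local: Tuple[int, int] = (16, 16)) -> None:
    """Emulate dispatching a 2D compute shader serially on the CPU."""
    height, width = len(image), len(image[0])
    groups_x, groups_y, _ = dispatch_size(width, height, *local)
    for gy in range(groups_y):
        for gx in range(groups_x):
            for ly in range(local[1]):
                for lx in range(local[0]):
                    # Equivalent of gl_GlobalInvocationID.xy
                    x, y = gx * local[0] + lx, gy * local[1] + ly
                    if x < width and y < height:  # edge groups overhang the image
                        kernel(image, x, y)


def invert(image: List[List[float]], x: int, y: int) -> None:
    """Kernel body: read a texel and write the result back, like imageLoad/imageStore."""
    image[y][x] = 1.0 - image[y][x]
```

The bounds check matters because a 1000x600 image needs 63x38 groups, which cover 1008x608 invocations.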
If you’d like to get into the details, you can read this manual page. Or, if you’re feeling adventurous, you can always check out a development build of Panda3D and try it out for yourself.
In the following screenshots, compute shaders have been used to implement voxel cone tracing: a type of real-time ray tracing algorithm that provides global illumination and soft reflections. Click here to see a video of it in action in Panda3D.
The scene on the left is based on a model by Matthew Wong and the scene on the right makes use of the famous Sponza model by Crytek.
We’ve made various enhancements to the render-to-texture functionality by making it possible to render directly to texture arrays, cube maps, and 3D textures. Previously, separate rendering passes were needed to render to each layer, but you can now bind the entire texture and render to it in one go. Using a geometry shader, one selects the layer of the texture to render to, which can be combined with geometry instancing to duplicate the geometry onto the different layers, depending on the desired effect.
One technique where this is immediately useful is cube map rendering (such as in point light shadows), where the geometry can be instanced across all six cube map faces in one rendering pass. This improves performance tremendously by eliminating the need to issue the rendering calls to the GPU six times.
In the screenshots below, however, layered rendering is used to render the various components of an atmospheric scattering model into different layers of a 3D texture.
It is now also possible to use viewport arrays to render to different parts of a texture at once, though support for this is still experimental. As with layered render-to-texture, the viewport to render into is selected by writing to a special output variable in the geometry shader. This makes it possible to render into various parts of a texture atlas or render to different areas of the screen within the same rendering pass.
In the screenshots below, it is used to render the shadow maps of different cameras into one big shadow atlas, allowing many shadow-casting lights to be rendered in one pass. This approach has two advantages: parts of the atlas can be re-rendered on demand, so the number of shadow maps updated in a single frame can be capped, and different lights can use shadow maps of different resolutions.
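As a sketch of how such an atlas could be laid out, here is a simple shelf packer that assigns each shadow map a square region and returns its viewport in normalized (left, right, bottom, top) coordinates; a geometry shader would then choose between these viewports by writing gl_ViewportIndex. The packing strategy and names are ours, not Panda3D's; shelf packing wastes some space, and production atlases often use a quadtree instead.

```python
from typing import List, Sequence, Tuple

Viewport = Tuple[float, float, float, float]  # (left, right, bottom, top), 0..1


def pack_shadow_atlas(sizes: Sequence[int], atlas_size: int) -> List[Viewport]:
    """Shelf-pack square shadow maps (pixel sizes) into a square atlas."""
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i])  # largest first
    viewports: List[Viewport] = [(0.0, 0.0, 0.0, 0.0)] * len(sizes)
    x = y = shelf_height = 0
    for i in order:
        s = sizes[i]
        if x + s > atlas_size:  # current shelf is full: start a new one above it
            x, y, shelf_height = 0, y + shelf_height, 0
        if x + s > atlas_size or y + s > atlas_size:
            raise ValueError("shadow maps do not fit in the atlas")
        viewports[i] = (x / atlas_size, (x + s) / atlas_size,
                        y / atlas_size, (y + s) / atlas_size)
        x += s
        shelf_height = max(shelf_height, s)
    return viewports
```

For instance, one 1024-pixel map and three 512-pixel maps in a 2048-pixel atlas fill the bottom shelf with the large map and two small ones, and the last small map starts a second shelf.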
We’ve long supported stereoscopic rendering and Panda3D could already take advantage of specialized stereo hardware. But now, as part of the development toward Oculus Rift support, we’ve made it possible to create a buffer on any hardware that will automatically render both left and right views when associated with a stereoscopic camera, making it possible to create postprocessing effects in your stereoscopic application without having to create two separate buffers.
With the multiview texture support introduced in Panda3D 1.8, a single texture object can contain both left and right views, and Panda3D automatically knows which view to use. This makes enabling stereo rendering in an application that uses post-processing filters very straightforward, without needing to set up two separate buffers or textures for each view, and it can be enabled or disabled at the flick of a switch.
Debugging and profiling features
We now take advantage of timer query support with the new GPU profiling feature added to PStats: Panda3D can now ask the driver to measure how much time the draw operations actually take to complete, rather than the CPU time it takes to issue the commands. This feature is instrumental in finding performance bottlenecks in the rendering pipeline by letting you know exactly which parts of the process take the longest.
The reason this feature is important is that PStats currently only displays the time it takes for the OpenGL drawing functions to finish. Most OpenGL functions, however, only cause the commands to be queued up to be sent to the GPU later, and return almost immediately. This makes the performance statistics very misleading, and makes it very difficult to track down bottlenecks in the draw thread. By inserting timer queries into the command stream, and asking for the results thereof a few frames later, we know how much time the commands actually take without significantly delaying the rendering pipeline.
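The idea of collecting results later can be sketched as a small queue model in plain Python. This is not Panda3D's implementation: the class is hypothetical, the fixed two-frame delay is an assumption (a real implementation would instead ask the driver whether each result is ready, e.g. via GL_QUERY_RESULT_AVAILABLE), and gpu_ms stands in for the duration the driver would eventually report.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class DeferredTimerQueries:
    """Model of reading GPU timer queries a few frames after they were issued."""

    def __init__(self, delay_frames: int = 2) -> None:
        self.delay_frames = delay_frames
        self.pending: Deque[Tuple[int, str, float]] = deque()
        self.results: Dict[Tuple[int, str], float] = {}

    def issue(self, frame: int, label: str, gpu_ms: float) -> None:
        # gpu_ms simulates the duration the GPU will eventually report.
        self.pending.append((frame, label, gpu_ms))

    def poll(self, frame: int) -> None:
        # Only collect queries old enough that reading them will not stall the CPU.
        while self.pending and frame - self.pending[0][0] >= self.delay_frames:
            issued, label, gpu_ms = self.pending.popleft()
            self.results[(issued, label)] = gpu_ms
```

The key property is that poll() never waits: a query issued in frame 0 is simply not read until frame 2, by which time the GPU has normally finished that work.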
It is also possible to measure the command latency, which is the time that it takes for the GPU to catch up with the CPU and process the draw commands that the CPU has issued.
This feature is available in current development builds. The documentation about this feature is forthcoming.
This blog post is by no means a comprehensive listing of all the features that will be available in the new version, but here are a few other features we thought were worth mentioning:
Comparison of cube mapping with and without seamless mode enabled
- We’ve added support for seamless cube maps, to eliminate the seams that can appear on the edges of cube maps, especially at lower mip map levels. This is enabled by default, assuming that the driver indicates support.
- We also added support for the new KHR_debug and ARB_debug_output extensions, which give much more fine-grained and detailed debugging information when something goes wrong.
- We’ve added a range of performance improvements, and we’ve got even more planned!
- We’ve made it even easier to write GLSL shaders by adding more shader inputs that Panda3D can provide. This includes not only built-in inputs containing render attributes, but also a greater amount of custom input types, such as integer data types and matrix arrays.
- Line segments are now supported in a single draw call through use of a primitive restart index (also known as a strip-cut index), improving line drawing performance.
- It is now possible to access integer vertex data using ivec and uvec data types, and create textures with an integer format (particularly useful for atomic access from compute shaders).
- In the fixed-function pipeline, the diffuse lighting component is now calculated separately from the specular lighting component. This means that the specular highlight will no longer be tinted by the diffuse color. This is usually more desirable and better matches up with Direct3D behavior. The old effect can still be obtained by multiplying the diffuse color into the specular color, or by disabling a configuration flag.
Special thanks go to Tobias Springer, who has been using some of the new features in his project, for graciously contributing the screenshots for some of the listed features. His results with Panda3D look amazing!
I’d like to talk for a moment about the new buffer protocol support in the latest development version of Panda3D. It’s not a particularly exciting feature, but it can be an important one, especially if you use Panda3D together with other libraries like NumPy or if you need to do a lot of low-level operations on texture or geometry data from Python. However, most use cases will not require this functionality.
The Python buffer protocol is a way for Python applications to get a direct hook into C/C++ memory, arrays in particular. Panda3D classes that support it provide a pointer into the underlying memory to the Python interpreter along with a description of how the data is laid out in memory. This description is necessary for Python to know how to access and copy the information.
Starting with Python 2.7, you can use the built-in memoryview type to access the memory underlying an object that exposes the buffer interface. You can then manipulate the data by converting it to a list or an array.array object, creating sub-views and operating on those, writing or reading parts of it to or from a file, or even modifying the memory directly as if it were a regular Python list.
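These operations can be tried out with a standard-library array.array standing in for a Panda3D object that exposes the buffer interface. The sketch uses Python 3 (memoryview.cast requires 3.3 or later).

```python
from array import array

data = array("f", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # six 32-bit floats
view = memoryview(data)   # no copy: the view shares data's memory

window = view[2:4]        # a sub-view, still no copy
window[0] = 20.0          # writing through the view modifies `data`

as_list = view.tolist()   # explicit copy into a Python list
raw = view.cast("B")      # reinterpret the same memory as unsigned bytes
```

After this runs, data[2] is 20.0. While any view is alive, the underlying array.array cannot be resized, which is how the buffer protocol keeps the exported pointer valid.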
Right now, the only Panda3D classes that expose the buffer interface are GeomVertexArrayData and PointerToArray (the latter of which is used for most array storage purposes in Panda3D, including textures), but more classes can easily be added on request. Conversely, Panda allows taking an existing buffer object (such as from array.array or a NumPy array) as source data for a texture or a vertex data array.
When copying data to other libraries such as NumPy, this can cut down on unnecessary copy operations. Without the buffer protocol, you would call a method like get_data() to create a C++ string (one copy), which would then be wrapped into a Python string (a second copy). Finally, you would pass this string to NumPy, which would perform at least one more copy to convert it into its own representation. Since NumPy also supports the buffer protocol, you can now copy the contents of a texture or a vertex data array straight into a NumPy array without any of these intermediate copies.
One other interesting use case for this feature is the fast and efficient manipulation of vertex data and texture data from Python. Instead of having to create a GeomVertexRewriter or a PNMImage to modify the respective data, you can now create a memoryview to iterate over the data directly, easily copy subsets around, or page them out to disk. In my own use case, which involved a lot of geometry generation and manipulation, this flexibility allowed me to dramatically decrease the time spent generating geometry and flattening it. Direct access to the memory also allowed me to quickly page chunks of geometry data out to disk and back into memory when necessary.
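A sketch of this kind of direct manipulation, with a bytearray standing in for a vertex array's memory (with Panda3D, the buffer would presumably come from a GeomVertexArrayData, e.g. vdata.modify_array(0)). The six-float interleaved layout is a hypothetical example; with real data, the layout and stride should be read from the array's format (for example GeomVertexArrayFormat.getStride()) rather than hard-coded.

```python
from array import array

FLOATS_PER_VERTEX = 6            # hypothetical layout: x y z nx ny nz (float32)
FLOAT_SIZE = array("f").itemsize


def scale_positions(buffer, factor: float) -> None:
    """Scale every vertex position in place through a memoryview."""
    floats = memoryview(buffer).cast("f")
    for base in range(0, len(floats), FLOATS_PER_VERTEX):
        for i in range(3):  # positions only; normals are left untouched
            floats[base + i] *= factor


def copy_vertices(buffer, first: int, count: int) -> bytes:
    """Copy a run of vertices out, e.g. to page them to disk."""
    stride = FLOATS_PER_VERTEX * FLOAT_SIZE  # bytes per vertex
    return bytes(memoryview(buffer)[first * stride:(first + count) * stride])


# Stand-in for a vertex array's memory: two vertices.
vertex_bytes = bytearray(array("f", [1.0, 2.0, 3.0, 0.0, 0.0, 1.0,
                                     4.0, 5.0, 6.0, 0.0, 1.0, 0.0]).tobytes())
scale_positions(vertex_bytes, 2.0)
```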
The buffer protocol provides a lot of the flexibility of low-level memory access without exposing you to all of the intricacies of C/C++ memory management. In particular, the data is reference counted, so you don’t need to worry about deleting it. You should in fact be able to keep memoryviews around for later use, but keep in mind that you may still need to tell Panda3D when you’ve modified the data (for instance, with an additional explicit call to modify_ram_image()).
This feature will be available in the 1.9.0 release of the Panda3D SDK.
Historically, Panda has always run single-core. And even though the Panda3D codebase has been written to provide true multithreaded, multi-processor support when it is compiled in, by default we’ve provided a version of Panda built with the so-called “simple threads” model which enforces a single-core processing mode, even on a multi-core machine. But all that is changing.
Beginning with the upcoming Panda3D version 1.8, we’ll start distributing Panda with true threads enabled in the build, which enables you to take advantage of true parallelization on any modern, multi-core machine. Of course, if you want to use threading directly, you will have to deal with the coding complexity issues, like deadlocks and race conditions, that always come along with this sort of thing. And the Python interpreter is still fundamentally single-core, so any truly parallel code must be written in C++.
But, more excitingly, we’re also enabling an optional new feature within the Panda3D engine itself, to make the rendering (which is all C++ code) run entirely on a sub-thread, allowing your Python code to run fully parallel with the rendering process, possibly doubling your frame rate. But it goes even further than that. You can potentially divide the work of each frame across three different cores, achieving unprecedented parallelization and a theoretical 3x performance improvement (although, realistically, 1.5x to 2x is more likely). And all of this happens with no special coding effort on your part as the application developer: you only have to turn it on.
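This post does not name the switch. In current Panda3D documentation, it is the threading-model configuration variable, where the value Cull/Draw puts Cull and Draw on threads of their own; treat the exact name and value as an assumption to check against the manual for your version. A minimal sketch of turning it on from Python (the helper name is ours):

```python
THREADED_PIPELINE_PRC = "threading-model Cull/Draw\n"  # assumed variable name and value


def enable_threaded_pipeline() -> None:
    """Run Cull and Draw on their own threads; call before ShowBase() is created."""
    from panda3d.core import loadPrcFileData  # lazy import: Panda3D optional here

    loadPrcFileData("threaded-pipeline", THREADED_PIPELINE_PRC)
```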
How does it work?
To use this feature successfully, you will need to understand something about how it works. First, consider Panda’s normal, single-threaded render pipeline. The time spent processing each frame can be subdivided into three separate phases, called “App”, “Cull”, and “Draw”:
In Panda’s nomenclature, “App” is any time spent in the application itself, i.e. your program. This is your main loop, including any Python code (or C++ code) you write to control your particular game’s logic. It also includes any Panda-based calculations that must be performed synchronously with this application code; for instance, the collision traversal is usually considered to be part of App.
“Cull” and “Draw” are the two phases of Panda’s main rendering engine. Once your application code finishes executing for the frame, then Cull takes over. The name “Cull” implies view-frustum culling, and this is part of it; but it is also much more. This phase includes all processing of the scene graph needed to identify the objects that are going to be rendered this frame and their current state, and all processing needed to place them into an ordered list for drawing. Cull typically also includes the time to compute character animations. The output of Cull is a sorted list of objects and their associated states to be sent to the graphics card.
“Draw” is the final phase of the rendering process, which is nothing more than walking through the list of objects output by Cull, and sending them one at a time to the graphics card. Draw is designed to be as lightweight as possible on the CPU; the idea is to keep the graphics command pipe filled with as many rendering commands as it will hold. Draw is the only phase of the process during which graphics commands are actually being issued.
You can see the actual time spent within these three phases if you inspect your program’s execution via the PStats tool. Every application is different, of course, but in many moderately complex applications, the time spent in each of these three phases is similar to the others, so that the three phases roughly divide the total frame time into thirds.
Now that we have the frame time divided into three more-or-less equal pieces, the threaded pipeline code can take effect, by splitting each phase into a different thread, so that it can run (potentially) on a different CPU, like this:
Note that App remains on the first, or main, thread; we have only moved Cull and Draw onto separate threads. This is important, because it means that all of your application code can continue to be single-threaded (and therefore much easier and faster to develop). Of course, there’s also nothing preventing you from using additional threads in App if you wish (and if you have enough additional CPUs to make it worthwhile).
If separating the phases onto different threads were all that we did, we wouldn’t have accomplished anything useful, because each phase must still wait for the previous phase to complete before it can proceed. It’s impossible to run Cull to figure out what things are going to be rendered before the App phase has finished arranging the scene graph properly. Similarly, it’s impossible to run Draw until the Cull phase has finished processing the scene graph and constructing the list of objects.
However, once App has finished processing frame 1, there’s no reason for that thread to sit around waiting for the rest of the frame to be finished drawing. It can go right ahead and start working on frame 2, at the same time that the Cull thread starts processing frame 1. And then by the time Cull has finished processing frame 1, it can start working on culling frame 2 (which App has also just finished with). Putting it all in graphical form, the frame time now looks like this:
So, we see that we can now crank out frames up to three times faster than in the original, single-threaded case. Each frame now takes the same amount of time, total, as the longest of the original three phases. (Thus, the theoretical maximum speedup of 3x can only be achieved in practice if all three phases are exactly equal in length.)
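The arithmetic behind the 3x figure can be sketched in a few lines of Python. This is an idealized model, not Panda3D code: the phase times are made-up numbers, and the copying and synchronization overhead discussed below is ignored.

```python
def frame_times(app_ms, cull_ms, draw_ms):
    """Return (serial_frame_ms, pipelined_frame_ms, speedup).

    Idealized model: single-threaded, each frame costs the sum of the three
    phases; with App, Cull and Draw on separate threads, a new frame is
    finished every max(phase) milliseconds. Overhead is ignored.
    """
    serial = app_ms + cull_ms + draw_ms
    pipelined = max(app_ms, cull_ms, draw_ms)
    return serial, pipelined, serial / pipelined


if __name__ == "__main__":
    # Balanced phases: the ideal 3x case.
    print(frame_times(10.0, 10.0, 10.0))  # (30.0, 10.0, 3.0)
    # Lopsided phases: App dominates, so the gain is small.
    print(frame_times(24.0, 3.0, 3.0))    # (30.0, 24.0, 1.25)
```

The second call shows why a program that spends nearly all its time in one phase gains little from the threaded pipeline.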
It’s worth pointing out that the only thing we have improved here is frame *throughput*–the total number of frames per second that the system can render. This approach does nothing to improve frame *latency*, or the total time that elapses between the time some change happens in the game, and the time it appears onscreen. This might be one reason to avoid this approach, if latency is more important than throughput. However, we’re still talking about a total latency that’s usually less than 100ms or so, which is faster than human response time anyway; and most applications (including games) can tolerate a small amount of latency like this in exchange for a smooth, fast frame rate.
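A rough model makes the throughput/latency distinction concrete. It assumes (as a simplification, not a description of Panda3D internals) that the pipeline advances once per period equal to the slowest phase, and that every frame passes through all three stages.

```python
def frame_latency_ms(app_ms, cull_ms, draw_ms, threaded):
    """Time from the start of App for a frame until Draw for it finishes.

    Idealized model: in the threaded pipeline every stage advances once per
    period of max(phase) ms, and each frame needs three such periods.
    """
    if threaded:
        return 3 * max(app_ms, cull_ms, draw_ms)
    return app_ms + cull_ms + draw_ms


if __name__ == "__main__":
    print(frame_latency_ms(10, 10, 10, threaded=False))  # 30
    print(frame_latency_ms(10, 10, 10, threaded=True))   # 30
    print(frame_latency_ms(24, 3, 3, threaded=True))     # 72
```

With balanced phases the latency is unchanged while throughput triples; with unbalanced phases the latency can even grow.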
In order for all of this to work, Panda has to do some clever tricks behind the scenes. The most important trick is that there need to be three different copies of the scene graph in different states of modification. As your App process is moving nodes around for frame 3, for instance, Cull is still analyzing frame 2, and must be able to analyze the scene graph *before* anything in App started mucking around to make frame 3. So there needs to be a complete copy of the scene graph saved as of the end of App’s frame 2. Panda does a pretty good job of doing this efficiently, relying on the fact that most things are the same from one frame to the next; but still there is some overhead to all this, so the total performance gain is always somewhat less than the theoretical 3x speedup. In particular, if the application is already running fast (60fps or above), then the gain from parallelization is likely to be dwarfed by the additional overhead requirements. And, of course, if your application is very one-sided, such that almost all of its time is spent in App (or, conversely, almost all of its time is spent in Draw), then you will not see much benefit from this trick.
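To illustrate the idea of one copy of the data per stage, here is a toy Python class. StagedValue and its methods are hypothetical names, and it naively deep-copies everything; Panda3D's real implementation is more economical, relying on the fact that most data is unchanged from one frame to the next.

```python
import copy


class StagedValue:
    """Toy sketch: one copy of some data per pipeline stage.

    Stage 0 is what App modifies, stage 1 is the snapshot Cull reads,
    stage 2 is the snapshot Draw reads. Not Panda3D's implementation.
    """

    STAGES = ("App", "Cull", "Draw")

    def __init__(self, value):
        self._stages = [copy.deepcopy(value) for _ in self.STAGES]

    def write(self, value):
        # Only App writes, and only to its own stage.
        self._stages[0] = value

    def read(self, stage_name):
        return self._stages[self.STAGES.index(stage_name)]

    def cycle(self):
        # End of frame: Draw takes Cull's old snapshot, Cull takes App's
        # latest state, and App continues on a fresh copy so that its
        # in-place edits cannot leak into Cull's snapshot.
        for i in range(len(self._stages) - 1, 0, -1):
            self._stages[i] = self._stages[i - 1]
        self._stages[0] = copy.deepcopy(self._stages[1])


if __name__ == "__main__":
    pos = StagedValue({"x": 0})
    pos.write({"x": 1})   # App, frame 1
    pos.cycle()
    pos.write({"x": 2})   # App, frame 2, while Cull still sees frame 1
    print(pos.read("Cull"), pos.read("Draw"))  # {'x': 1} {'x': 0}
```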
Also, note that it is no longer possible for anything in App to contact the graphics card directly; while App is running, the graphics card is being sent the drawing commands from two frames ago, and you can’t reliably interrupt this without taking a big performance hit. So this means that OpenGL callbacks and the like have to be sensitive to the threaded nature of the graphics pipeline. (This is why Panda’s interface to the graphics window requires an indirect call: base.win.requestProperties(), rather than base.win.setProperties(). It’s necessary because the property-change request must be handled by the draw thread.)
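The request-instead-of-set pattern can be sketched in plain Python. ToyWindow, request_properties and process_requests are hypothetical names for illustration only; in Panda3D itself you would fill in a WindowProperties object and pass it to base.win.requestProperties().

```python
import queue
import threading


class ToyWindow:
    """Hypothetical window whose properties only the draw thread may change."""

    def __init__(self):
        self.properties = {"size": (800, 600)}
        self._requests = queue.Queue()

    def request_properties(self, **changes):
        # Called from App: the change is only queued, never applied here.
        self._requests.put(changes)

    def process_requests(self):
        # Called from the draw thread, where touching the window is safe.
        while True:
            try:
                changes = self._requests.get_nowait()
            except queue.Empty:
                return
            self.properties.update(changes)


def draw_thread(window, frames):
    for _ in range(frames):
        window.process_requests()
        # ... issue the frame's drawing commands here ...


if __name__ == "__main__":
    win = ToyWindow()
    win.request_properties(size=(1024, 768))
    print(win.properties["size"])  # (800, 600): the change is only queued
    t = threading.Thread(target=draw_thread, args=(win, 1))
    t.start()
    t.join()
    print(win.properties["size"])  # (1024, 768)
```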
Early adopters are invited to try this new feature out today, before we formally release 1.8. It’s already available in the current buildbot release; to turn it on, see the new manual page on the subject. Let us know your feedback! There are still likely to be kinks to work out, so we’d love to know how well it works for you.
This post is about speeding up your Python code, and has no direct impact on Panda3D’s performance. For most projects, the vast majority of the execution time is spent inside Panda3D’s C++ code or on the GPU, so in those cases optimizing your Python will not help. For the cases where you do need to speed up your Python code, Cython can help. This post is mainly addressed to people who prefer programming in Python but know at least a little about C. I will not discuss how to do optimizations within Python, though if this article is relevant to you, you really should look into that too.
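Before porting anything, it is worth confirming that Python code really is the bottleneck. A minimal sketch using the standard library's cProfile and pstats modules; slow_python_loop is a hypothetical stand-in for your own game logic.

```python
import cProfile
import io
import pstats


def slow_python_loop(n):
    # Hypothetical stand-in for per-frame game logic written in pure Python.
    total = 0.0
    for i in range(n):
        total += (i * 0.5) ** 2
    return total


def profile_report(func, *args, top=5):
    """Run func under cProfile; return (result, text of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_report(slow_python_loop, 100_000)
    print(report)
```

If functions like this dominate the report, they are candidates for Cython; if most time is in Panda3D calls, Cython will gain little.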
Cython is an interesting programming language. It uses an extended version of Python’s syntax to allow things like statically typed variables and direct calls into C and C++ libraries. Cython compiles this code to C (or, optionally, C++), which is then compiled into a Python extension module that you can import and use just like a regular Python module. There are several benefits to this, but in our context the main one is speed. Properly written Cython code can be as fast as C code, which in some particular cases can be as much as 1000 times faster than nearly identical Python code. Generally you won’t see 1000x speedups, but the gain can be quite large. The downside is that the compiled modules only work on the platform they were compiled for, so you will need to compile separate versions for different platforms.
By default, Cython compiles to C, but the new 0.13 version also supports C++. This is useful here, because you are probably already using at least one C++ library: Panda3D. I decided to try this out, and after stumbling over a few simple issues, I got it to work, and I don’t even know C++.
Before I get to the details, I’ll outline why you might want to use Cython rather than porting performance bottlenecks to C++ by hand. The main benefits lie in the process and in the required skill set. If you have a large base of Python code for a project, and you decide some of it needs to be much faster, you have a few options. The common approach seems to be to learn C++, port the code, and learn how to interface to it from Python. With Cython, you can just add a few type declarations to the variables where you need the performance increase and compile the module, which gives you a Python module that works just like the one you had. If you need to speed up the code that interfaces with Panda3D, you can swap the Python API calls for C++ ones. Cython lets you put effort only into the parts of the code that need it, and do so without changing very much. This is vastly different from ditching all the code and reimplementing it in another language. It also requires you to learn a fairly minimal amount. You also get to keep the niceness of the Python syntax that many Python coders have come to appreciate.
There are still major reasons to actually code in C++ when working with Panda, but as someone who does not do any coding in C++, I won’t talk about it much. If you want to directly extend or contribute to Panda3D, want to avoid redundantly specifying your imports from header files (Cython will require you to re-specify the parts of API you are using rather than just using the header files shipped with Panda), or you simply prefer C++, C++ may be a better option. I mainly see Cython as a convenient option when you end up needing to speed up parts of a Python code-base; however, it is practical to undertake large projects from the beginning in Cython.
Cython does have some downsides as well. It is still in rather early development. This means you will encounter bugs in its translators as well as the produced code. It also lacks support for a few Python features, such as most uses of generators. Generally I haven’t had much trouble with these issues, but your experience may differ.
Cython does offer an interesting side benefit as well. It allows you to optionally statically type variables and thus can detect more errors at compile time than Python.
To get started, you first need an install of Cython 0.13 (or, probably, any newer version). If you already have Cython installed, you can check its version by running “cython -V”. You can pick up the source from the Cython site and install it by running “python setup.py install” from the Cython source directory. You will also need a C/C++ compiler. The Cython site should help you get everything set up if you need additional guidance.
Then you should try out a sample to make sure you have everything you need, and that it’s all working. There is a nice C++ sample for Cython on the Cython Wiki. (This worked for me on Mac, and on Windows using MinGW or MSVC as a compiler).
As for working with Panda3D, there are a few things I discovered:
- There are significant performance gains to be had by just compiling your existing Python modules as Cython. With a little additional work adding statically typed variables, you can get larger performance gains without even moving over to Panda’s C++ API (which means you don’t need to worry about linking against Panda3D, which can be an issue).
- Panda3D already has python bindings with nice memory management, so I recommend instancing all the objects using the python API, and only switching to the C++ one as needed.
- You can use the ‘this’ property on Panda3D’s Python objects to get a pointer to the underlying C++ object.
- On Mac, if your extension module uses any of Panda3D’s libraries, you need to make sure libpanda (and, in some cases, possibly other libraries as well) is loaded before importing it.
- On Windows, you need to specify the libraries you need when compiling (in my case, just libpanda)
- The C++ classes and Python classes tend to have the same name. To resolve this, you can use “from x import y as z” when importing the python ones, or you can just import panda3d.core, and use the full name of the classes (like panda3d.core.Geom). There may be a way to rename the C++ classes on import too.
- If using the Panda3D C++ API on Windows, you will need to use the MSVC compiler. You can get Microsoft Visual Studio 2008 Express Edition for free which includes the needed compiler.
Using this technique I got a 10x performance increase in my code for updating the vertex positions in my Geom. It avoided having to create Python objects for all of the vertices and pass them through the Python API, which translates them back into C++ objects. It was just a matter of moving one call in the inner loop over to the other API. This, however, was done in already optimized Cython code that was simply loading vertex positions stored in a block of memory into the Geom. Most use cases would likely see less of a benefit. Overall, though, I gained a lot of performance both from the switch to Cython and from the switch to the C++ API. These changes only required relatively small changes to the speed-critical portions of my existing Python code.
I made a rather minimal example of using a Panda3D C++ API call from Cython. Place the setup.py and testc.pyx files in the same directory and, from that directory, run setup.py with the Python installation you use with Panda3D. If everything is properly configured, this should compile the example Cython module, testc.pyx, into a Python extension module and run it. If it works, it will print out a few lines ending with “done”. You will likely need to tweak the paths in setup.py. If you are not on Mac or Windows, you will get an error indicating where you need to enter your compiler settings (mostly just the paths to Panda3D’s libraries).
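The per-platform settings such a setup.py needs could be collected in a helper like the one below. This is a sketch under stated assumptions, not the actual setup.py from this post: the function name and the install paths are hypothetical defaults that you should adjust to your own Panda3D installation.

```python
import sys


def panda_extension_kwargs(platform=sys.platform, panda_dir=None):
    """Keyword arguments for a C++ extension module that uses Panda3D.

    The default install locations are hypothetical; adjust them to match
    your machine.
    """
    if platform.startswith("win"):
        panda_dir = panda_dir or r"C:\Panda3D-1.7.0"
        return {
            "language": "c++",
            "include_dirs": [panda_dir + r"\include"],
            "library_dirs": [panda_dir + r"\lib"],
            # On Windows the libraries must be named explicitly.
            "libraries": ["libpanda"],
        }
    if platform == "darwin":
        panda_dir = panda_dir or "/Developer/Panda3D"
        return {
            "language": "c++",
            "include_dirs": [panda_dir + "/include"],
            "library_dirs": [panda_dir + "/lib"],
            # No explicit libraries: macOS Python extensions are usually
            # linked with "-undefined dynamic_lookup", so Panda3D symbols are
            # resolved at import time. Hence libpanda must already be loaded.
            "libraries": [],
        }
    raise NotImplementedError(
        "Enter the compiler settings (Panda3D include and lib paths) "
        "for platform %r here." % platform
    )
```

These keys match the arguments of the standard Extension class, so a setup.py could pass them along as Extension("testc", ["testc.pyx"], **panda_extension_kwargs()) before handing the extension to Cython's build step.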
I would like to thank Lisandro Dalcin from the Cython-Users mailing list who helped me get this working on Windows.
The Entertainment Technology Center (ETC) at Carnegie Mellon University in 2009 launched a graduate student project to add a collection of artificial intelligence behaviors like seek, flock, and evade along with 2D pathfinding to Panda3D. This blog post is a reminder that this work is available as part of Panda 1.7.0, and receives ongoing attention from the ETC team. The timeline for this work has been as follows:
- Summer 2009: Collect feedback from the Panda3D community via Pandai forum post on requirements for an AI library. This forum post remains active today to address additions to the Pandai code base.
- August – December 2009: Start with Craig Reynolds’ published work on flocking behavior and the A* algorithm for two-dimensional pathfinding between points. Develop the C++ code based on community feedback, prototyping with Building Virtual Worlds graduate student work at the ETC.
- December 2009: At the insistence of ETC Faculty advisors Ruth Comley and myself, the Pandai student team created a number of demonstrations and a detailed Pandai ETC Project Web site documenting the project work. The demonstrations show capabilities, such as a fish demonstration that shows wander, pursue, and evade. The project web site includes further descriptions on the project team, motivations for the work, and downloadable content, including the art assets (like the fish) and code (the fish demo) needed to run demonstrations. See Pandai ETC Project download page.
Pandai demo: fish pursue hook until one is caught, then evade it
- January 2010: Thanks to rdb, the Pandai library was published as part of the Panda3D 1.7.0 release. One change of note regarding the downloadable examples from the ETC Pandai Project web site: rather than “from libpandaai import *” the Python code should use “from panda3d.ai import *”. With this minor edit, you will be able to download and run the fish demo and others using Panda 1.7.0.
- July 2010: Ongoing collection of feedback from the Pandai forum post led to the release of a Blender meshgen tool for pathfinding. A link to this tool has been added to the Pandai ETC Project download page.
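For readers unfamiliar with A*, here is a generic Python sketch of grid-based A* pathfinding. It is not Pandai's C++ implementation or its API (Pandai works on meshes generated by the tools above); the function name and the grid format are assumptions for illustration.

```python
import heapq


def astar_grid(grid, start, goal):
    """Shortest 4-connected path on a 2D grid using A* (Manhattan heuristic).

    grid: list of strings, '#' marks a blocked cell.
    start, goal: (row, col) tuples.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None


if __name__ == "__main__":
    level = ["....",
             ".##.",
             "...."]
    print(astar_grid(level, (0, 0), (2, 3)))
```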
The Pandai ETC team is responsible for version 1.0 of the Pandai library and remains active in its support. Your comments are welcome here regarding the Pandai effort and shared code and examples. For help requests, continue the thread within the Pandai forum post.
This was an April Fool’s Joke. The information in it is not meant to be taken seriously.
During the past few months, several students at Carnegie Mellon University’s Entertainment Technology Center (ETC) have been working on improving the egging process as well as incrementally improving the shader system. Just take a look at their smiling faces!
From Left: Wei-Feng Huang, Federico Perazzi, Shuying Feng (Panda), Deepak Chandrasekaran, Andrew Gartner
For those of you who have been with Panda3D a long time, you’ll know that there have been ETC Panda3D projects in the past. Some of them had limited success because their scope was too large. This project will instead focus on delivering complete feature sets rather than half-implemented pieces like those of earlier unsuccessful projects. It will also focus on documentation, both within the code and in the manual, to make sure that you, the Panda community, will be able to take this work and build on top of it.
With that said, this project will primarily focus on two things:
- The shader inputs
- The egging/model exporting process
If you’ve taken a look at the source code of Panda3D’s shader system and have had any experience in professional game engine development, you’ll notice that the system isn’t fully implemented. In fact, the first shader system was an ETC student project, and it has since been improved through other ETC projects and by the Panda3D community. The shader inputs project continues this work in a structured manner.
Shaders have supported arrays and arrays of vectors as inputs for quite some time; however, Panda3D has never been able to pass them. There have been some hacks in the past where arrays are passed as textures, but this is bad for performance and interferes with texture caching schemes. After this project completes, users will be able to pass arrays and arrays of vectors/matrices directly to the shader.
Screenshot of multiple lights demo
This may not seem that exciting at first, but it lays the groundwork for many more things. If you’re new to computer graphics, here are just a few of the things a complete shader inputs system makes possible:
- Hardware accelerated actors/characters
- Shader based instancing with dynamic texture and animation support (crowds)
- Shader based vegetation system (fast trees and grass)
- A real deferred shading system
- A real light manager system for shader based lights
A Real Egging Pipeline
Up until now, there have been several attempts at user interfaces for maya2egg, dae2egg, and the other converters. Most of them are just simple front ends to their command-line equivalents. This new user interface is much more than that: it is an artist-friendly build system. Just check out some of the features:
- Simple mode for when you don’t want a build system
- Support for multiple Maya versions
- Support for egg tools such as egg-opt-char and egg-palettize
- A batching system that automatically detects whether a file has been changed to allow for minimal rebuilds
- Support for including all tools in the batching system
- Save/Load batch scripts
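The GUI's own change-detection mechanism isn't described here. As one possible approach, a batch system could record a content hash for each source file after a successful export and rebuild only the files whose hash has changed. In this sketch, the function names, the JSON manifest format and the "hero.mb" file name are all hypothetical.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_rebuild(sources, manifest_path):
    """Return the sources whose contents changed since the last recorded build."""
    try:
        with open(manifest_path) as f:
            manifest = json.load(f)
    except FileNotFoundError:
        manifest = {}  # first run: everything needs building
    return [s for s in sources if manifest.get(s) != file_digest(s)]


def record_build(sources, manifest_path):
    """Store the current digests after a successful export."""
    with open(manifest_path, "w") as f:
        json.dump({s: file_digest(s) for s in sources}, f)


def demo():
    # Throwaway files standing in for a hypothetical Maya scene.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hero.mb")
        manifest = os.path.join(tmp, "build_manifest.json")
        with open(src, "w") as f:
            f.write("v1")
        first = files_to_rebuild([src], manifest)   # no manifest yet
        record_build([src], manifest)
        second = files_to_rebuild([src], manifest)  # unchanged
        with open(src, "w") as f:
            f.write("v2")
        third = files_to_rebuild([src], manifest)   # edited
        return [len(first), len(second), len(third)]


if __name__ == "__main__":
    print(demo())  # [1, 0, 1]
```

Hashing contents rather than comparing timestamps avoids rebuilds when a file is merely touched; a real system would also rebuild when the exported egg file is missing.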
Like shader inputs this lays the groundwork for much future work. For any game engine to be professional quality, it needs a set of robust artist tools such as node-based shader generators and artist friendly level editors.
Screenshot of WIP Egging GUI
Hey C++ developers of Panda3D,
I’ve just checked in a fix to the codebase that should give minor releases a backward-compatible ABI. This means that if you link something against the libraries of a Panda3D 1.7.x release, you’ll still be able to use it with the libraries of later 1.7.x releases. This rule was created so that C++ users can use the web plugin functionality.
To the people working directly on the Panda3D codebase: do not merge anything onto the release branch (e.g. panda3d_1_7_branch) that is not backward ABI compatible. You can merge in new symbols, but you cannot merge altered or removed symbols. This rule does not apply to the trunk; you can do whatever you want there, as there are no ABI rules on the trunk. Of course, these rules don’t affect Python code, just exposed C/C++ symbols.
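The release-branch rule (additions allowed, removals and alterations forbidden) can be expressed as a small check. This is a hypothetical sketch, not the buildbot script mentioned in this post; it assumes you already have each build's exported symbols as a mapping from name to signature, for example parsed from the output of a tool such as nm.

```python
def abi_breaks(old_symbols, new_symbols):
    """Compare exported symbols of two builds of the same release series.

    Each argument maps a symbol name to its signature string. New symbols
    are allowed; removed or altered ones break the ABI.
    Returns (removed, altered) as sorted lists.
    """
    removed = sorted(set(old_symbols) - set(new_symbols))
    altered = sorted(
        name for name in set(old_symbols) & set(new_symbols)
        if old_symbols[name] != new_symbols[name]
    )
    return removed, altered


if __name__ == "__main__":
    old = {"f": "int f(int)", "g": "void g()"}
    new = {"f": "int f(long)", "h": "void h()"}
    print(abi_breaks(old, new))  # (['g'], ['f'])
```

A symbol comparison only catches part of the problem: changing a class's data layout, for instance, breaks the ABI without changing any exported symbol.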
But you really don’t need to worry about any of this unless you actually want to merge things onto the release branch – and this is usually done by the release maintainer anyways.
As for linking to libraries on non-Windows systems: libraries like libpanda.so / libpanda.dylib will now symlink to libpanda.so.1.7 / libpanda.1.7.dylib. This ensures that if you link to libpanda, it will link against the 1.7 version of the library and won’t conflict with libraries of any other series. This allows you to have multiple series of Panda3D installed at the same time and run different games that are linked against different series of Panda.
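To make the naming scheme concrete, here is a small hypothetical helper that extracts the library name and release series from the versioned file names described above (libpanda.so.1.7 on Linux, libpanda.1.7.dylib on Mac). Unversioned names such as libpanda.so are the symlinks and carry no series.

```python
import re

_SO = re.compile(r"^lib(?P<name>\w+)\.so\.(?P<series>\d+\.\d+)$")
_DYLIB = re.compile(r"^lib(?P<name>\w+)\.(?P<series>\d+\.\d+)\.dylib$")


def library_series(filename):
    """Return (library name, release series) for a versioned library name.

    Returns None for unversioned names such as 'libpanda.so', which under
    this scheme are symlinks to the versioned file.
    """
    for pattern in (_SO, _DYLIB):
        m = pattern.match(filename)
        if m:
            return m.group("name"), m.group("series")
    return None


if __name__ == "__main__":
    print(library_series("libpanda.so.1.7"))     # ('panda', '1.7')
    print(library_series("libpanda.1.7.dylib"))  # ('panda', '1.7')
    print(library_series("libpanda.so"))         # None
```

Because the series is part of the file name, libpanda.so.1.7 and a library from another series can sit side by side in the same directory.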
The latest buildbot releases should already abide by these rules. I’m going to put up a buildbot script soon that regularly builds the release branch and alerts when the ABI compatibility is broken. (E-mail me if you want to be put on the buildbot e-mail notify list.) The next release, 1.7.1, will start being ABI compatible.