Archive for the ‘Uncategorized’ Category

August / September 2018 Development Update

Wednesday, October 3rd, 2018 by rdb

To catch up on our backlog of these posts, we have decided to cover the developments of August and September in a single blog post. This allows us to better focus on the work ahead, in line with our plan to release 1.10 before the end of the year.

Stability improvements

To that end, we have been working diligently on improving the stability of the engine and getting the major feature branches stable enough to be merged. Among the changes are improved thread safety for various interfaces, fixes for regressions in the shader generator, improvements to the build system, fixes for the Android port, and a few improvements to the Python 3 support.

A long-standing bug has been fixed whereby Eigen-enabled builds would load some .egg models in the wrong orientation. This turned out to be a compiler bug which we managed to find a workaround for. If you were using a development build of Panda3D and had added an additional rotation in your code to work around this bug, you may need to double-check that your models are still loaded the right way.

Input API

The new input device API being developed on the input-overhaul branch has been significantly changed following a round of community feedback on the design. Sample programs showing off the various types of game devices have been added to the repository as well. After some adjustments to the API to take touch input into account, and some other finishing touches, the final review process can take place after which the code can be merged.

Stay tuned, we will follow up on this with a more in-depth post describing the new changes.

Multi-touch

As part of improving our support for mobile platforms, we have designed a new API as part of the input-overhaul effort for handling multi-touch more effectively. The new interface makes it easier to handle both touch and mouse input in the application, as well as other future pointing devices, such as the “laser pointer” style of pointing input seen in Virtual Reality games. It is expected to become part of the new release of Panda3D.

The design unifies both touch inputs and mouse clicks/drags through a concept called “active pointers”, which refers to a finger contacting the surface of the screen or a mouse with at least one button active. Whenever an active pointer is added, an event called `pointer-down` is fired. The event is passed an argument carrying the pointer type, a unique identifier (in the case of a finger touch), and the position and velocity of the pointer. Likewise, events called `pointer-move` and `pointer-up` are fired when this pointer is dragged or when it ends, respectively.

Of course, it will still be possible to track the mouse or a stylus that is hovering over the surface of a digitizer independently of this by exposing a single “primary pointer” that can be accessed by legacy applications or ones that do not require multi-touch support.
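As a rough illustration, an application might track active pointers with handlers along these lines. The event names come from the proposal above, but the shape of the argument object (its `id`, `type`, `pos` and `velocity` attributes) is an assumption and may differ in the final API:

```python
# Hypothetical handlers for the proposed pointer events. The attributes on
# the event argument (id, type, pos, velocity) are assumptions based on the
# description above; the final API may differ.
active_pointers = {}

def on_pointer_down(pointer):
    # Remember where this pointer (finger or mouse) touched down.
    active_pointers[pointer.id] = pointer.pos

def on_pointer_move(pointer):
    # Update the tracked position while the pointer is dragged.
    active_pointers[pointer.id] = pointer.pos

def on_pointer_up(pointer):
    # The pointer ended; stop tracking it.
    active_pointers.pop(pointer.id, None)

# In a ShowBase application, these would presumably be wired up through the
# usual event system:
#   self.accept("pointer-down", on_pointer_down)
#   self.accept("pointer-move", on_pointer_move)
#   self.accept("pointer-up", on_pointer_up)
```

Because every pointer carries its own identifier, multi-touch gestures can be recognized simply by looking at how many entries are being tracked at once.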

You are quite welcome to provide feedback on the design via the specification document, which we expect to be implemented in the coming weeks.

Other changes

The annoying “attrib lock” that prevented changing certain properties of a Material after it had been assigned to a node has been removed. It used to be necessary since the shader generator was not able to detect certain changes to existing materials after the shader for that material had already been generated. However, the improved shader generator architecture in 1.10 makes it possible for Panda3D to handle this case correctly without the attribute lock.

Support for Maya 2018 has been added, and the latest development builds now contain exporter and importer plug-ins for this version of Maya.

IRC channel

Due to ongoing spam attacks on the FreeNode network, we have needed to restrict access to the #panda3d channel on FreeNode to users who are registered with FreeNode. Click here to find out how to do so. This restriction does not apply to users of the webchat interface.

A look behind the curtains

Monday, November 20th, 2017 by fireclaw

Much has happened in Panda3D development for the upcoming 1.10 version. To bring you up-to-date with the latest developments, we will summarize some of the new changes here. Also, to further keep you informed about new and upcoming features, we’ll start a regular blog post series highlighting new developments.

Aside from a lot of optimization changes to improve various parts of Panda’s performance, as well as numerous bugfixes to improve stability and compatibility, there were some larger changes as well.

Python support

The first thing we’d like to highlight is the ability for Python users to install Panda3D via the pip package manager. No more fiddling with platform-dependent installers; it takes only a single command to install the right version of Panda3D for your platform and Python version:

pip install panda3d

As a bonus feature, this allows you to install Panda into a virtualenv environment, allowing you to try out the latest development version in isolation without fear of contaminating your existing setup.

Furthermore, Panda3D has been updated to be compatible with the latest Python 3 versions. This includes interoperability with the pathlib module and the Python 3.6 path protocol, as well as fixes for the upcoming Python 3.7.
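In practice, the path protocol support means a `pathlib.Path` can be handed to Panda3D where a filename is expected. A small sketch (the model path here is hypothetical, and the loader calls need a running ShowBase application, so they are shown as comments):

```python
from pathlib import Path

# A platform-independent path built with pathlib.
model_path = Path("models") / "environment"

# With the new path protocol support, the Path object can be passed directly
# (assumed usage, in a running ShowBase application):
#   model = loader.loadModel(model_path)
#
# Previously, an explicit conversion through Filename was needed:
#   from panda3d.core import Filename
#   model = loader.loadModel(Filename.from_os_specific(str(model_path)))

print(model_path.as_posix())
```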

The Shader Generator

If you are using the shader generator in your application, you may benefit significantly from upgrading to 1.10. It has been overhauled to address a major performance concern for applications with complex scenes containing a large number of render states, which could cause lag due to an excessive number of shaders being generated.

Some new features have been added as well, such as support for hardware skinning and multiple normal maps.

Text rendering updates

The text rendering subsystem has been improved significantly. Panda’s text assembler used to perform well mainly for smaller texts, whereas frequently updating large blocks of text could cause considerable lag. The improved text assembler code is up to 75 times as fast, making assembling large swaths of text a non-issue.

A comparison with HarfBuzz disabled and enabled. Of note is the spacing between the A and V, the “fi” ligature. The Arabic text doesn’t render correctly at all without HarfBuzz.

Furthermore, the HarfBuzz library can now be utilized to implement text shaping, which not only enables support for ligatures and correct kerning but also allows us to better support languages with more complex shaping requirements, such as Arabic. This includes support for right-to-left text rendering, with automatic language detection enabled by default. Although bidirectional text is not yet fully supported, you can explicitly switch or re-detect direction for specific text segments using embedded TextProperties.

If Panda3D has been compiled with HarfBuzz support, it can be enabled using the text-use-harfbuzz variable. Otherwise, more basic kerning support can be enabled using text-kerning true, although many fonts will only kern correctly with HarfBuzz enabled.
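In Config.prc terms, that amounts to:

```
# Enable HarfBuzz-based text shaping (requires a build with HarfBuzz support):
text-use-harfbuzz true

# Fallback when HarfBuzz support is not compiled in:
# text-kerning true
```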

Media playback

Panda3D now directly supports the Opus audio codec, a high-quality open standard designed to efficiently encode both speech and other audio. This is implemented via the opusfile library, so that it doesn’t require pulling in the heavier and more restrictively licensed FFmpeg libraries.

The FFmpeg plug-in now also supports loading video files with an embedded alpha channel, such as is possible with WebM files encoded with the VP8 codec. However, FFmpeg offers both a preferred native implementation and a decoder based on libvpx. The default is the native implementation, so if you wish to play VP8 videos with alpha channel, you should set the ffmpeg-prefer-libvpx configuration variable to true, to force FFmpeg to use the libvpx implementation.
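Expressed as a Config.prc fragment, using the variable named above:

```
# Use the libvpx decoder so the alpha channel in VP8 WebM files is decoded:
ffmpeg-prefer-libvpx true
```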

 

We’d also like to highlight ongoing work outside the main Panda3D development branch. These things have been developed for Panda3D and will be merged into the main branch when they have reached maturity. But until then, they can be checked out from their respective branches on GitHub.

Deployment

First off, significant progress has been made on a new deployment system thanks to invaluable contributions by the community. The project is tentatively named “deploy-ng” and intends to make it easier and more reliable to package and distribute your finished application, and as such it stands to replace the current deployment system entirely.

This new deployment system builds upon the existing Python setuptools, adding an extra plug-in to easily package your Panda3D applications. It is already quite usable, but still needs some love and testing before it’s production-ready.

Graphics back-ends

A significant amount of work has been done on the effort to support two new graphics back-ends. The first of these is the WebGL back-end, happening on the webgl-port branch. This allows us to run Panda3D applications in the browser without requiring the use of a browser plug-in. The bulk of the work on the renderer itself has already been done, but there remains work to be done to make it easier to package up a Panda application for the web. Check out the proof-of-concept demos or the online editor demo if you’re curious about the possibilities.

On the vulkan branch, a prototype renderer for the new Vulkan graphics API has materialized as well. Like OpenGL, Vulkan is a cross-platform open graphics standard developed by Khronos. Unlike OpenGL, however, Vulkan offers a more low-level interface to the underlying graphics hardware, enabling a reduction in driver overhead for CPU-bound applications. Before you get too excited though, it’s not yet capable of running much more than a few of the sample programs. There is a lot more work to be done before it will reach feature-parity with or performance benefits over the OpenGL renderer, and it is unlikely to be a priority for the next release.

glTF 2.0

Behind the curtains there also is work going on to support glTF 2.0. This is a new JSON-based model transmission format recently standardized by the Khronos Group, the consortium that is also responsible for OpenGL, and plug-ins are already available to export it from various content creation tools. Importantly, glTF 2.0 defines a modern standard for physically-based materials, and as such is considered a milestone in the development of a physically-based rendering pipeline in Panda3D.

Input devices

Gamepad support is something that many in the community have been asking about for a long time. The input framework is receiving a significant overhaul to allow us to support game controllers, while also laying the groundwork for exposing commercial virtual reality hardware using a straightforward API. This work is happening on the input-overhaul branch and will be merged into the master branch soon.

 

That’s all for now, but keep an eye open for upcoming blog posts with all new and interesting updates in the coming months. In the meantime we encourage you to try the latest version for yourself and let us know how it works for you.

 

Update for Mac OS X “El Capitan”

Wednesday, November 18th, 2015 by rdb

Several weeks ago, Apple released the latest version of Mac OS X, code-named “El Capitan”. Among other things, it introduced a number of security features, including System Integrity Protection (SIP). This feature primarily places restrictions on which filesystem locations can be modified, even by root processes.

These changes prevent the Panda3D SDK from being installed and run on Mac OS X 10.11. When running the SDK installer package, you may encounter an error message like “This package is incompatible with this version of OS X and may fail to install”, prematurely interrupting the installation procedure. A similar issue is present in the installer for the Panda3D 1.0.4 Runtime.

An additional problem is the removal of PackageMaker, which is used to produce the installer package for the SDK when building Panda3D from source. It has been replaced by a different set of utilities that fulfill the same purpose.

These issues have been resolved now and a fix will be part of the upcoming 1.9.1 release. In the meantime, to install the Panda3D SDK on Mac OS X without disabling the System Integrity Protection features, you may use a pre-release build of the Panda3D 1.9.1 SDK which has been made available at the following location:
Panda3D-SDK-1.9.1-6fb08f1-MacOSX10.7.dmg
A patched version of the Panda3D 1.0.4 Runtime is available as well:
Panda3D-Runtime-1.0.4-828fe2a.dmg

Please let us know on the bug tracker, forums or IRC channel if there are still issues with this build, so that they can be fixed before the 1.9.1 release.

The New OpenGL Features in Panda3D 1.9

Tuesday, October 7th, 2014 by rdb

We’ve been working hard for the past months to update the OpenGL renderer and bring support for the latest and greatest features that OpenGL has to offer. We’ve not been very good at updating the blog, though, so we decided to make a post highlighting some of those features and how they are implemented into Panda3D. These features will be part of the upcoming Panda3D 1.9.0 release, which should come out within the following month, assuming that everything goes according to plan.

sRGB support (linear pipeline)

Virtually all lighting and blending calculations are written under the assumption that they happen in a linear space, meaning that multiplying a color value with x results in a color value that is x times as bright. However, what is often overlooked by game developers is the fact that the average monitor isn’t linear. CRT monitors had a gamma of around 2.2, meaning that the output luminosity was proportional to the input voltage raised to the power of 2.2. This means that a pixel value of 0.5 brightness isn’t actually half as bright as one of 1.0 brightness, but only around 0.22 times as bright! To compensate for this, content is produced in the sRGB color space, which has a built-in gamma correction of around 1/2.2. Modern monitors and digital cameras are calibrated to use that standard, so that no gamma correction is typically needed to display images in an image viewer or in the browser.
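The arithmetic is easy to verify. A quick sketch of the 2.2-gamma transfer (a simplification; the exact sRGB curve is piecewise, with a small linear segment near black):

```python
# Simplified 2.2-gamma transfer functions. The real sRGB curve is piecewise,
# but a plain power of 2.2 is a close approximation.
def to_linear(srgb_value):
    return srgb_value ** 2.2

def to_srgb(linear_value):
    return linear_value ** (1.0 / 2.2)

# A stored pixel value of 0.5 is only about 0.22 times as bright as 1.0:
print(round(to_linear(0.5), 2))   # 0.22

# Round-tripping through both conversions recovers the original value:
print(round(to_srgb(to_linear(0.5)), 3))   # 0.5
```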

However, this presents an issue for 3D engines that perform lighting and blending calculations. Because both the input and output color space are non-linear, when you have a light that is supposed to attenuate colors to 0.5x brightness, it will actually cause it to show up with 0.22x brightness—more than twice as dark as it should be! This results in dark areas appearing too dark, and the transition between dark and bright areas will look very unnatural. To have a proper linear lighting pipeline, what we have to do is correct for the gamma on both the input and the output side: we have to convert our input textures to linear space, and we have to convert the rendered output back to sRGB space.

It’s very easy for developers to overlook or dismiss this issue as being unimportant, because it doesn’t really affect unlit textures; they look roughly the same because the two wrongs cancel each other out. Developers simply tweak the lighting values to compensate for the incorrect light ramps until it looks acceptable. However, until you properly address gamma correction, your lighting will look wrong. The transition between light and dark will look unnatural, and people may see banding artifacts around specular highlights. This also applies when using techniques like physically based rendering, where it is more important that the lights behave like they would actually behave in real life.

The screenshots below show a scene rendered in Panda3D using physically-based rendering with and without gamma correction. Note how the left image looks far too dark whereas the right image has a far more natural-looking balance of lighting. Click to enlarge.


Fortunately, we now have support for a range of hardware features that can correct for all of these issues automatically—for free. There are two parts to this: sRGB framebuffers and sRGB textures. If the former is enabled, you’re telling OpenGL that the framebuffer is in the sRGB color space; all of the lighting calculations are then done in linear space, and the result is gamma-adjusted before being displayed on the monitor. However, just doing that would cause your textures to look way too bright, since they are already gamma-corrected! Therefore, you can set your textures to the sRGB format to indicate to OpenGL that they are in the sRGB color space, and that they should therefore automatically be converted to linear space when they are sampled.

The nice thing is that all of these operations are virtually free, because they are nowadays implemented in hardware. These features have existed for a long time, and you can rely on the vast majority of modern graphics hardware to correctly implement sRGB support. We’ve added support to the Direct3D 9 renderer as well, and even to our software renderer! However, keep in mind that you can’t always rely on a monitor being calibrated to 2.2 gamma, and therefore it is always best to offer a calibration screen that allows the user to adjust the application’s gamma. We’ve added a special post-processing filter that applies an additional gamma correction to help with that.
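The output half of this can be switched on with a single config variable (a sketch; the variable name matches the development builds):

```
# Treat the framebuffer as sRGB, so linear lighting results are
# gamma-adjusted by the hardware before display:
framebuffer-srgb true
```

For the input half, textures need to be marked as being in the sRGB color space, for instance by giving them an sRGB texture format when they are loaded or created.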

To read more about color spaces in the upcoming Panda3D version, check out this manual page, although the details are still subject to change.

Tessellation shaders

Tessellation shaders are a way to let the GPU take a base mesh and subdivide it into more densely tessellated geometry. It is possible to use a simple base mesh and subdivide it to immensely detailed proportions with a tessellation shader, unburdened by the narrow bandwidth between the GPU and the CPU. Their programmable nature allows for continuous and highly dynamic level of detail without popping artifacts.

To enable tessellation, you have to provide two new types of shader: the control shader, which specifies which control points to subdivide and how many times to subdivide each part of the patch, and the evaluation shader, which then specifies what to do with the tessellated vertices. (They are respectively called a “hull shader” and a “domain shader” in Direct3D parlance.) They are used with a new type of primitive, GeomPatches, which can contain any number of vertices. There are helpful methods for automatically converting existing geometry into patches.

One immediately obvious application is LOD-based terrain or water rendering. Only a small number of patches have to be uploaded to the GPU, after which a tessellation control shader can subdivide the patches by an amount that is calculated based on the distance to the camera. In the tessellation evaluation shader, the desired height can be calculated either based on a height map texture or based on procedural algorithms, or a combination thereof.

Another application is displacement mapping, where an existing mesh is subdivided on the GPU and a displacement map is used to displace the vertices of the subdivided mesh. This allows for showing very high-detailed meshes with dynamic level of detail even when the actual base mesh is very low-poly. Panda3D exposes methods that can be used for converting existing triangle geometry to patches to make it easier to apply this technique. Alternatively, this can be done by way of a new primitive type supported by the egg loader.

Both methods are demonstrated in the screenshots below. Click to enlarge.

Displacement mapping using tessellation shaders (left) and tessellated terrain (right).

Thanks to David Rose for implementing this feature! Support for tessellation shaders is available in the development builds.

Compute shaders

Besides the new tessellation shaders mentioned earlier, Panda3D supports vertex shaders, geometry shaders, and fragment shaders. All of these shaders are designed to perform a very particular task in the rendering pipeline, and as such work on a specific set of data, such as the vertices in a geometry mesh or pixels in a framebuffer.

However, because each shader is designed to do a very specific task, it can be difficult to write shaders to do things that the graphics card manufacturer didn’t plan for. Sometimes one might want to implement a fancy ray tracing algorithm or an erosion simulation, or simply make a small modification to a texture on the GPU. These things may require code to be invoked on the GPU at will, and be able to operate on something other than the vertices in a mesh or fragments in a framebuffer.

Enter compute shaders: a type of shader program that is general purpose and can perform a wide variety of tasks on the video card. Somewhat comparable to OpenCL programs, they can be invoked at any point during the rendering process, operating on a completely user-defined set of inputs. Their flexibility allows them to perform a lot of the tasks one might be used to implementing on the CPU or via a render-to-texture buffer. Compute shaders are particularly interesting for parallelizable tasks like physics simulations, global illumination computation, and tiled rendering; but also for simpler tasks like generating a procedural texture or otherwise modifying the contents of a texture on the GPU.

It is worth mentioning that a lot of these tasks require another feature that we’ve added: support for the ARB_image_load_store extension. This means that it’s not only possible to sample textures in a shader, but also to perform direct read and write operations on a texture image. This is particularly useful for compute shaders, but the feature can be used in any type of shader. That means that you can now write to textures from a regular fragment or vertex shader as well.
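As a small sketch, here is a compute shader that writes a gradient into a texture through image load/store. The Python calls that need a running application are shown as comments; `ComputeNode` and `Shader.make_compute` are the interfaces as they appear in the development builds, and the names used for the texture and node are our own:

```python
# GLSL for a compute shader that fills a 256x256 texture with a gradient,
# using imageStore() from the image load/store feature described above.
COMPUTE_SOURCE = """
#version 430
layout (local_size_x = 16, local_size_y = 16) in;
layout (rgba8) writeonly uniform image2D destination;

void main() {
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    imageStore(destination, coord, vec4(vec2(coord) / 255.0, 0.0, 1.0));
}
"""

# In a running Panda3D application, the dispatch might be set up like this
# (a sketch, not a definitive recipe):
#   from panda3d.core import Shader, ComputeNode, Texture
#   tex = Texture("destination")
#   tex.setup_2d_texture(256, 256, Texture.T_unsigned_byte, Texture.F_rgba8)
#   node = ComputeNode("gradient")
#   node.add_dispatch(16, 16, 1)  # 256 / 16 = 16 work groups per axis
#   np = render.attach_new_node(node)
#   np.set_shader(Shader.make_compute(Shader.SL_GLSL, COMPUTE_SOURCE))
#   np.set_shader_input("destination", tex)
```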

If you’d like to get into the details, you can read this manual page. Or, if you’re feeling adventurous, you can always check out a development build of Panda3D and try it out for yourself.

In the following screenshots, compute shaders have been used to implement voxel cone tracing: a type of real-time ray tracing algorithm that provides global illumination and soft reflections. Click here to see a video of it in action in Panda3D.

The scene on the left is based on a model by Matthew Wong and the scene on the right makes use of the famous Sponza model by Crytek.

Render-to-texture features

We’ve made various enhancements to the render-to-texture functionality by making it possible to render directly to texture arrays, cube maps, and 3D textures. Previously, separate rendering passes were needed to render to each layer, but you can now bind the entire texture and render to it in one go. Using a geometry shader, one selects the layer of the texture to render to, which can be combined with geometry instancing to duplicate the geometry onto the different layers, depending on the desired effect.

One technique where this is immediately useful is cube map rendering (such as in point light shadows), where the geometry can be instanced across all six cube map faces in one rendering pass. This improves performance tremendously by eliminating the need to issue the rendering calls to the GPU six times.

In the screenshots below, however, layered rendering is used to render the various components of an atmospheric scattering model into different layers of a 3D texture. Click to enlarge.


It is now also possible to use viewport arrays to render to different parts of a texture at once, though support for this is still experimental. As with layered render-to-texture, the viewport to render into is selected by writing to a special output variable in the geometry shader. This makes it possible to render into various parts of a texture atlas or render to different areas of the screen within the same rendering pass.

In the screenshots below, it is used to render the shadow maps of different cameras into one big shadow atlas, allowing the rendering of many shadow-casting lights in one rendering pass. The advantages of this approach are that parts of the atlas can be rendered on demand, effectively limiting the number of shadow casters that update within one frame, and that different lights can use shadow maps of different resolutions.


Stereo FBOs

We’ve long supported stereoscopic rendering and Panda3D could already take advantage of specialized stereo hardware. But now, as part of the development toward Oculus Rift support, we’ve made it possible to create a buffer on any hardware that will automatically render both left and right views when associated with a stereoscopic camera, making it possible to create postprocessing effects in your stereoscopic application without having to create two separate buffers.

With the multiview texture support introduced in Panda3D 1.8, a single texture object can contain both left and right views, and Panda3D automatically knows which view to use. This makes enabling stereo rendering in an application that uses post-processing filters very straightforward, without needing to set up two separate buffers or textures for each view, and it can be enabled or disabled at the flick of a switch.

Debugging and profiling features

We now take advantage of timer query support with the new GPU profiling feature added to PStats: Panda3D can now ask the driver to measure how much time the draw operations actually take to complete, rather than the CPU time it takes to issue the commands. This feature is instrumental in finding performance bottlenecks in the rendering pipeline by letting you know exactly which parts of the process take the longest.

The reason this feature is important is that PStats currently only displays the time it takes for the OpenGL drawing functions to finish. Most OpenGL functions, however, only cause the commands to be queued up to be sent to the GPU later, and return almost immediately. This makes the performance statistics very misleading, and makes it very difficult to track down bottlenecks in the draw thread. By inserting timer queries into the command stream, and asking for the results thereof a few frames later, we know how much time the commands actually take without significantly delaying the rendering pipeline.

It is also possible to measure the command latency, which is the time that it takes for the GPU to catch up with the CPU and process the draw commands that the CPU has issued.
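The feature is switched on with a config variable while connected to a PStats server:

```
# Issue timer queries and report actual GPU draw times to PStats:
pstats-gpu-timing true
```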

GPU Timing with PStats

This feature is available in current development builds. The documentation about this feature is forthcoming.

Other features

This blog post is by no means a comprehensive listing of all the features that will be available in the new version, but here are a few other features we thought were worth mentioning:

Comparison of cube mapping with and without seamless mode enabled

  • We’ve added support for seamless cube maps, to eliminate the seams that can appear on the edges of cube maps, especially at lower mip map levels. This is enabled by default, assuming that the driver indicates support.
  • We also added support for the new KHR_debug and ARB_debug_output extensions, which give much more fine-grained and detailed debugging information when something goes wrong.
  • We’ve added a range of performance improvements, and we’ve got even more planned!
  • We’ve made it even easier to write GLSL shaders by adding more shader inputs that Panda3D can provide. This includes not only built-in inputs containing render attributes, but also a greater amount of custom input types, such as integer data types and matrix arrays.
  • Line segments are now supported in a single draw call through use of a primitive restart index (also known as a strip-cut index), improving line drawing performance.
  • It is now possible to access integer vertex data using ivec and uvec data types, and create textures with an integer format (particularly useful for atomic access from compute shaders).
  • In the fixed-function pipeline, the diffuse lighting component is now calculated separately from the specular lighting component. This means that the specular highlight will no longer be tinted by the diffuse color. This is usually more desirable and better matches up with Direct3D behavior. The old effect can still be obtained by multiplying the diffuse color into the specular color, or by disabling a configuration flag.

Special thanks go to Tobias Springer, who has been using some of the new features in his project, for graciously contributing the screenshots for some of the listed features. His results with Panda3D look amazing!

Buffer protocol support

Tuesday, February 11th, 2014 by rdb

I’d like to talk for a moment about the new buffer protocol support in the latest development version of Panda3D. It’s not a particularly exciting feature, but it can be an important one, especially if you use Panda3D together with other libraries like NumPy or if you need to do a lot of low-level operations on texture or geometry data from Python. However, most use cases will not require this functionality.

The Python buffer protocol is a way for Python applications to get a direct hook into C/C++ memory, arrays in particular. Panda3D classes that support it provide a pointer into the underlying memory to the Python interpreter along with a description of how the data is laid out in memory. This description is necessary for Python to know how to access and copy the information.

Starting with Python 2.7, you can use the built-in memoryview type to access the memory underlying an object exposing the buffer interface. You can then manipulate the data by converting it to a list or array.array object, creating sub-views and operating on those, writing or reading parts of it to a file, or even just modifying the memory directly as if it were a regular Python list.

Right now, the only Panda3D classes that expose the buffer interface are GeomVertexArrayData and PointerToArray (the latter of which is used for most array storage purposes in Panda3D, including textures), but more classes can easily be added on request. Conversely, Panda allows taking an existing buffer object (such as from array.array or a NumPy array) as source data for a texture or a vertex data array.

When copying data to other libraries such as NumPy, this can help cut down on unnecessary copy operations. Presently, you would call a method like get_data() to create a C++ string, which is one copy operation, and that string would then be wrapped into a Python string, which is another. Finally, you would pass this string to NumPy, which would perform at least one more copy operation to copy it into its own representation. But since NumPy also supports the buffer protocol, you can now copy the contents of a texture or of a vertex data array straight into a NumPy array without any unnecessary copy operations.

One other interesting use case for this feature is the fast and efficient manipulation of vertex data and texture data from Python. Instead of having to create a GeomVertexRewriter or a PNMImage to modify the respective data, you can now create a memoryview to iterate over the data directly, easily copy subsets around, or page them out to disk. In my own use case, which involved a lot of geometry generation and manipulation, this flexibility allowed me to dramatically decrease the time spent generating geometry and flattening it. Direct access to the memory also allowed me to quickly page chunks of geometry data out to disk and back into memory when necessary.
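The zero-copy idea is easy to demonstrate with the standard library alone: a memoryview of an array.array shares its memory rather than copying it, and the Panda3D classes mentioned above plug into the very same protocol:

```python
import array

# An array of unsigned bytes, standing in for e.g. texture RAM image data.
data = array.array("B", [0, 0, 0, 0, 0, 0, 0, 0])

# The memoryview shares the underlying memory; nothing is copied here.
view = memoryview(data)

# Writing through a sub-view modifies the original data in place.
view[2:4] = b"\xff\x7f"
print(data[2], data[3])   # 255 127

# A Panda3D texture exposes the same interface, so in an application one
# could obtain a writable view in the same spirit (a sketch):
#   view = memoryview(tex.modify_ram_image())
```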

The buffer protocol provides a lot of the flexibility of low-level memory access without exposing you to all of the intricacies of C/C++ memory management. In particular, the data is reference counted, so you don’t need to worry about deleting it. You should in fact be able to keep memoryviews around for non-immediate consumption, but keep in mind that you may still need to tell Panda3D when you’ve modified the data later (for instance with an additional explicit call to modify_ram_image()).

This feature will be available in the 1.9.0 release of the Panda3D SDK.

Triple your frame rate?

Thursday, September 29th, 2011 by drwr

Historically, Panda has always run single-core. And even though the Panda3D codebase has been written to provide true multithreaded, multi-processor support when it is compiled in, by default we’ve provided a version of Panda built with the so-called “simple threads” model which enforces a single-core processing mode, even on a multi-core machine. But all that is changing.

Beginning with the upcoming Panda3D version 1.8, we’ll start distributing Panda with true threads enabled in the build, which enables you to take advantage of true parallelization on any modern, multi-core machine. Of course, if you want to use threading directly, you will have to deal with the coding complexity issues, like deadlocks and race conditions, that always come along with this sort of thing. And the Python interpreter is still fundamentally single-core, so any truly parallel code must be written in C++.

But, more excitingly, we’re also enabling an optional new feature within the Panda3D engine itself, to make the rendering (which is all C++ code) run entirely on a sub-thread, allowing your Python code to run fully parallel with the rendering process, possibly doubling your frame rate. But it goes even further than that. You can potentially divide the entire frame onto three different cores, achieving unprecedented parallelization and a theoretical 3x performance improvement (although, realistically, 1.5x to 2x is more likely). And all of this happens with no special coding effort on your part as the application developer; you only have to turn it on.

How does it work?

To use this feature successfully, you will need to understand something about how it works. First, consider Panda’s normal, single-threaded render pipeline. The time spent processing each frame can be subdivided into three separate phases, called “App”, “Cull”, and “Draw”:

app, cull, draw

In Panda’s nomenclature, “App” is any time spent in the application itself, i.e. your program. This is your main loop, including any Python code (or C++ code) you write to control your particular game’s logic. It also includes any Panda-based calculations that must be performed synchronously with this application code; for instance, the collision traversal is usually considered to be part of App.

“Cull” and “Draw” are the two phases of Panda’s main rendering engine. Once your application code finishes executing for the frame, then Cull takes over. The name “Cull” implies view-frustum culling, and this is part of it; but it is also much more. This phase includes all processing of the scene graph needed to identify the objects that are going to be rendered this frame and their current state, and all processing needed to place them into an ordered list for drawing. Cull typically also includes the time to compute character animations. The output of Cull is a sorted list of objects and their associated states to be sent to the graphics card.

“Draw” is the final phase of the rendering process, which is nothing more than walking through the list of objects output by Cull, and sending them one at a time to the graphics card. Draw is designed to be as lightweight as possible on the CPU; the idea is to keep the graphics command pipe filled with as many rendering commands as it will hold. Draw is the only phase of the process during which graphics commands are actually being issued.

You can see the actual time spent within these three phases if you inspect your program’s execution via the PStats tool. Every application is different, of course, but in many moderately complex applications, the time spent in each of these three phases is similar to the others, so that the three phases roughly divide the total frame time into thirds.

Now that we have the frame time divided into three more-or-less equal pieces, the threaded pipeline code can take effect, by splitting each phase into a different thread, so that it can run (potentially) on a different CPU, like this:

app, cull, draw on separate threads

Note that App remains on the first, or main thread; we have only moved Cull and Draw onto separate threads. This is important, because it means that all of your application code can continue to be single-threaded (and therefore much easier and faster to develop). Of course, there’s also nothing preventing you from using additional threads in App if you wish (and if you have enough additional CPU’s to make it worthwhile).

If separating the phases onto different threads were all that we did, we wouldn’t have accomplished anything useful, because each phase must still wait for the previous phase to complete before it can proceed. It’s impossible to run Cull to figure out what things are going to be rendered before the App phase has finished arranging the scene graph properly. Similarly, it’s impossible to run Draw until the Cull phase has finished processing the scene graph and constructing the list of objects.

However, once App has finished processing frame 1, there’s no reason for that thread to sit around waiting for the rest of the frame to be finished drawing. It can go right ahead and start working on frame 2, at the same time that the Cull thread starts processing frame 1. And then by the time Cull has finished processing frame 1, it can start working on culling frame 2 (which App has also just finished with). Putting it all in graphical form, the frame time now looks like this:

The fully staged render pipeline

So, we see that we can now crank out frames up to three times faster than in the original, single-threaded case. Each frame now takes the same amount of time, total, as the longest of the original three phases. (Thus, the theoretical maximum speedup of 3x can only be achieved in practice if all three phases are exactly equal in length.)
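The arithmetic behind that claim is easy to check; a quick sketch (the phase times are invented for illustration):

```python
# Hypothetical per-phase times, in milliseconds, for a single frame.
app, cull, draw = 10.0, 8.0, 12.0

# Single-threaded: the phases run back to back, so the frame time is the sum.
single_threaded = app + cull + draw  # 30.0 ms per frame

# Fully pipelined: a new frame completes every max(phase) interval.
pipelined = max(app, cull, draw)     # 12.0 ms per frame

assert single_threaded / pipelined == 2.5  # realistic, unbalanced phases

# The theoretical 3x is reached only when all three phases are equal.
assert (10.0 + 10.0 + 10.0) / max(10.0, 10.0, 10.0) == 3.0
```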

It’s worth pointing out that the only thing we have improved here is frame *throughput*–the total number of frames per second that the system can render. This approach does nothing to improve frame *latency*, or the total time that elapses between the time some change happens in the game, and the time it appears onscreen. This might be one reason to avoid this approach, if latency is more important than throughput. However, we’re still talking about a total latency that’s usually less than 100ms or so, which is faster than human response time anyway; and most applications (including games) can tolerate a small amount of latency like this in exchange for a smooth, fast frame rate.

In order for all of this to work, Panda has to do some clever tricks behind the scenes. The most important trick is that there need to be three different copies of the scene graph in different states of modification. As your App process is moving nodes around for frame 3, for instance, Cull is still analyzing frame 2, and must be able to analyze the scene graph *before* anything in App started mucking around to make frame 3. So there needs to be a complete copy of the scene graph saved as of the end of App’s frame 2. Panda does a pretty good job of doing this efficiently, relying on the fact that most things are the same from one frame to the next; but still there is some overhead to all this, so the total performance gain is always somewhat less than the theoretical 3x speedup. In particular, if the application is already running fast (60fps or above), then the gain from parallelization is likely to be dwarfed by the additional overhead requirements. And, of course, if your application is very one-sided, such that almost all of its time is spent in App (or, conversely, almost all of its time is spent in Draw), then you will not see much benefit from this trick.

Also, note that it is no longer possible for anything in App to contact the graphics card directly; while App is running, the graphics card is being sent the drawing commands from two frames ago, and you can’t reliably interrupt this without taking a big performance hit. So this means that OpenGL callbacks and the like have to be sensitive to the threaded nature of the graphics pipeline. (This is why Panda’s interface to the graphics window requires an indirect call: base.win.requestProperties(), rather than base.win.setProperties(). It’s necessary because the property-change request must be handled by the draw thread.)

Early adopters are invited to try this new feature out today, before we formally release 1.8. It’s already available in the current buildbot release; to turn it on, see the new manual page on the subject. Let us know your feedback! There are still likely to be kinks to work out, so we’d love to know how well it works for you.
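For the impatient: the switch lives in your Config.prc file. A minimal sketch (see the manual page mentioned above for the full set of supported threading models):

```
# Config.prc: run Cull and Draw each on their own thread
threading-model Cull/Draw
```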

Panda3D and Cython

Sunday, September 12th, 2010 by Craig

This post is about how to speed up your Python code, and has no direct impact on Panda3D’s performance. For most projects, the vast majority of the execution time is spent inside Panda3D’s C++ or on the GPU, so no matter what you do, speeding up your Python code will make little difference. For the other cases, where you do need to speed up your Python code, Cython can help. This is mainly addressed to people who prefer programming in Python, but know at least a little about C. I will not discuss how to do optimizations within Python itself, though if this article is relevant to you, you really should look into that as well.

Cython is an interesting programming language. It uses an extended version of Python’s syntax to allow things like statically typed variables and direct calls into C++ libraries. Cython compiles this code to C++, which in turn compiles to a Python extension module that you can import and use just like a regular Python module. There are several benefits to this, but in our context the main one is speed. Properly written Cython code can be as fast as C code, which in particular cases can be as much as 1000 times faster than nearly identical Python code. Generally you won’t see 1000x speed increases, but the gain can be quite large. The trade-off is that compiled modules only work on the platform they were compiled for, so you will need to compile alternate versions for different platforms.

By default, Cython compiles to C, but the new 0.13 release supports C++. This is more useful here, since you are already using at least one C++ library: Panda3D. I decided to try this out, and after stumbling on a few simple issues, I got it to work, and I don’t even know C++.

Before I get to the details, I’ll outline why you might want to use Cython rather than porting performance bottlenecks to C++ by hand. The main benefit is in the process, as well as the required skill set. If you have a large base of Python code for a project, and you decide some of it needs to be much faster, you have a few options. The common approach seems to be to learn C++, port the code, and learn how to interface with it from Python. With Cython, you can just add a few type definitions on variables where you need the performance increase, and compile it, which gives you a Python module that works just like the one you had. If you need to speed up code that interfaces with Panda3D, you can swap the Python API calls for C++ ones. Using Cython allows you to put effort into speeding up only the parts of the code that need it, and to do so without having to change very much. This is vastly different from ditching all the code and reimplementing it in another language. It also requires you to learn a fairly minimal amount of new material, and you get to keep the niceness of the Python syntax, which many Python coders have come to appreciate.

There are still good reasons to actually code in C++ when working with Panda, but as someone who does not do any coding in C++, I won’t talk about it much. If you want to directly extend or contribute to Panda3D, want to avoid redundantly specifying your imports from header files (Cython requires you to re-specify the parts of the API you are using, rather than just using the header files shipped with Panda3D), or you simply prefer C++, then C++ may be a better option. I mainly see Cython as a convenient option when you end up needing to speed up parts of a Python code-base; however, it is also practical to undertake large projects in Cython from the beginning.

Cython does have some downsides as well. It is still in fairly early development, which means you will encounter bugs in the translator as well as in the code it produces. It also lacks support for a few Python features, such as most uses of generators. Generally I haven’t had much trouble with these issues, but your experience may differ.

Cython offers an interesting side benefit as well: because it allows you to optionally statically type variables, it can detect more errors at compile time than Python can.
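For a flavor of what that looks like, here is a small sketch (the function is hypothetical; the typed def signature and the cdef declarations are the Cython additions, everything else is plain Python):

```cython
# integrate.pyx -- midpoint-rule integration of x*x over [a, b].
# Removing the type declarations leaves valid (but much slower) Python.
def integrate(double a, double b, int n):
    cdef double dx = (b - a) / n
    cdef double total = 0.0
    cdef double x
    cdef int i
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += x * x
    return total * dx
```

Because every variable in the loop has a C type, Cython compiles the loop body down to plain C arithmetic with no Python object overhead.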

To get started, you first need an install of Cython 0.13 (or, probably, any newer version). If you have an existing Cython install, you can check its version with the -V flag. You can pick up the source from the Cython site, and install it by running “python setup.py install” from the Cython source directory. You will also need a compiler. The Cython site should help you get everything set up if you need additional guidance.

Then you should try out a sample to make sure you have everything you need, and that it’s all working. There is a nice C++ sample for Cython on the Cython Wiki. (This worked for me on Mac, and on Windows using MinGW or MSVC as a compiler).

As for working with Panda3D, there are a few things I discovered:

  • There are significant performance gains to be had by just compiling your existing Python modules with Cython. With a little additional work adding statically typed variables, you can get larger performance gains without even moving over to Panda’s C++ API (which means you don’t need to worry about linking against Panda3D, which can be an issue).
  • Panda3D already has Python bindings with nice memory management, so I recommend instantiating all objects through the Python API, and only switching to the C++ one as needed.
  • You can use the ‘this’ property on Panda3D’s Python objects to get a pointer to the underlying C++ object.
  • On Mac, you need to make sure libpanda (and in some cases possibly others as well) is loaded before importing your extension module, if you use any of Panda3D’s libraries.
  • On Windows, you need to specify the libraries you need when compiling (in my case, just libpanda)
  • The C++ classes and Python classes tend to have the same names. To resolve this, you can use “from x import y as z” when importing the Python ones, or you can just import panda3d.core and use the full name of the classes (like panda3d.core.Geom). There may be a way to rename the C++ classes on import too.
  • If using the Panda3D C++ API on Windows, you will need to use the MSVC compiler. You can get Microsoft Visual Studio 2008 Express Edition for free which includes the needed compiler.

Using this technique I got a 10x performance increase on my code for updating the vertex positions in my Geom. It avoided having to create Python objects for all of the vertices and passing them through the Python API, which translates them back to C++ objects. It was just a matter of moving one call in the inner loop over to the other API. This, however, was done in already-optimized Cython code that was simply loading vertex positions stored in a block of memory into the Geom; most use cases would likely see less of a benefit. Overall, though, I gained a lot of performance both from the change to Cython and from the change to the C++ API, and these changes only required relatively small modifications to the speed-critical portions of my existing Python code.

I made a rather minimal example of using a Panda3D C++ API call from Cython. Place the setup.py and testc.pyx files in the same directory, and from that directory, run setup.py with the Python install you use with Panda3D. If everything is properly configured, this will compile the example Cython module, testc.pyx, to a Python extension module and run it. If it works, it will print out a few lines ending with “done”. You may need to tweak the paths in setup.py. If you are not on Mac or Windows, you will get an error indicating where you need to enter your compiler settings (mostly just the paths to Panda3D’s libraries).

I would like to thank Lisandro Dalcin from the Cython-Users mailing list who helped me get this working on Windows.

Pandai Library: A Quick Review of panda3d.ai

Tuesday, July 27th, 2010 by MikeC

In 2009, the Entertainment Technology Center (ETC) at Carnegie Mellon University launched a graduate student project to add a collection of artificial intelligence behaviors like seek, flock, and evade, along with 2D pathfinding, to Panda3D.  This blog post is a reminder that this work is available as part of Panda 1.7.0, and receives ongoing attention from the ETC team.  The timeline for this work has been as follows:

  • Summer 2009: Collect feedback from the Panda3D community via Pandai forum post on requirements for an AI library. This forum post remains active today to address additions to the Pandai code base.
  • August – December 2009:  Started from Craig Reynolds’ published work on flocking behavior and the A* algorithm for two-dimensional pathfinding between points. Developed the C++ code based on community feedback, prototyping with Building Virtual Worlds graduate student work at the ETC.
  • December 2009:  At the insistence of ETC Faculty advisors Ruth Comley and myself, the Pandai student team created a number of demonstrations and a detailed Pandai ETC Project Web site documenting the project work. The demonstrations show capabilities, such as a fish demonstration that shows wander, pursue, and evade.  The project web site includes further descriptions on the project team, motivations for the work, and downloadable content, including the art assets (like the fish) and code (the fish demo) needed to run demonstrations.  See Pandai ETC Project download page.

    Pandai demo: fish pursue hook until one is caught, then evade it

  • January 2010: Thanks to rdb, the Pandai library was published as part of the Panda3D 1.7.0 release. One change of note regarding the downloadable examples from the ETC Pandai Project web site: rather than “from libpandaai import *” the Python code should use “from panda3d.ai import *”. With this minor edit, you will be able to download and run the fish demo and others using Panda 1.7.0.
  • July 2010: Ongoing collection of feedback from the Pandai forum post led to the release of a Blender meshgen tool for pathfinding. A link to this tool has been added to the Pandai ETC Project download page.

The Pandai ETC team is responsible for version 1.0 of the Pandai library, and remains active in its support. Your comments are welcome here regarding the Pandai effort and the shared code and examples. For help requests, continue the thread within the Pandai forum post.

Porting to Java

Thursday, April 1st, 2010 by rdb

Note: This was an April Fool’s Joke. Please do not take any information in this blog post seriously.

As you all know, the languages currently supported by Panda3D are Python and C++. Unfortunately, this forces the user to make a trade-off between simplicity and performance. Python is simple and fast to prototype with; however, its performance is very poor. A CPU-intensive algorithm in Python will typically run hundreds of times slower than the same algorithm implemented in C++. C++, on the other hand, provides almost native performance, but it comes with a plethora of inconveniences for the developer, the most notable being that it’s easy to induce a crash or cause your application to leak memory.

Enter Java, a language designed to be a middle ground between these two goals. Java is a modern, high-level language with strong OO capabilities and garbage collection, so it doesn’t expose the coder to the dangers of manual memory management the way C++ does. But Java is also an order of magnitude faster than Python.

In light of these properties of the Java language, the development team has unanimously decided to adopt Java as the only supported language for the Panda3D API. As 1.7.0 has just been released, now is the perfect time to switch. The upcoming 1.7.3 release of Panda3D will drop all Python and C++ code in favor of Java. Effective today, development has commenced on a Perforce repository that drops support for Python and C++.

What does this mean to you? Let’s start by comparing how Panda usage will look in the future as opposed to now. This is the current basic Panda example in Python that you may recognize from the manual:

from direct.showbase.ShowBase import ShowBase
 
class MyApp(ShowBase):
 
    def __init__(self):
        ShowBase.__init__(self)
 
        # Load the environment model.
        self.environ = self.loader.loadModel("models/environment")
        # Reparent the model to render.
        self.environ.reparentTo(self.render)
        # Apply scale and position transforms on the model.
        self.environ.setScale(0.25, 0.25, 0.25)
        self.environ.setPos(-8, 42, 0)
 
 
app = MyApp()
app.run()

And this is how the same will be achieved now in Java:

import org.panda3d.*;

public class MyApp extends ShowBase {

    private NodePath environ;

    public MyApp() {
        // Load the environment model.
        this.environ = this.getLoader().loadModel("models/environment");
        // Reparent the model to render.
        this.environ.reparentTo(this.getRender());
        // Apply scale and position transforms on the model.
        this.environ.setScale(0.25, 0.25, 0.25);
        this.environ.setPos(-8, 42, 0);
    }

    public static void main(String[] args) {
        new MyApp().run();
    }

}

Needless to say, this is a major improvement over the Python equivalent. This will definitely help Panda3D expand in the marketplace since Java alone has more demand than Python and C++ combined (see the graph below).

Java versus Python and C++

We are already seeing the first benefits of changing to Java. For example, we have replaced our build system, makepanda, with Java’s ant. This allows us to leverage the XML format to streamline the build process’ bottom-line in a monitored, decentralized way.

We hope that this multi-tiered non-volatile migration process enables us to provide synergized encompassing software emulation through realigned composite management trends, and resulting in universal global process improvement in the end.

We are also in the process of officially renaming the engine into Janda3D. This is because the P in “Panda3D” stands for Python. This involves registering a new domain name and registering the trademark with the US Patent and Trademark Office, so it may take some time.

Please stay tuned until our next blog post, in which we will explain how we plan to make the networking system in Panda3D version 1.7.9 fully compatible with RFC 2324.

Janda3D

Panda SE Project

Wednesday, March 17th, 2010 by Bei

During the past few months, several students at Carnegie Mellon University’s Entertainment Technology Center (ETC) have been working on improving the egging process as well as incrementally improving the shader system.  Just take a look at their smiling faces!

Panda SE Team Photo

From Left: Wei-Feng Huang, Federico Perazzi, Shuying Feng (Panda), Deepak Chandrasekaran, Andrew Gartner

For those of you who have been with Panda 3D a long time, you’ll know that there have been ETC Panda 3D projects in the past.  Some of them have had limited success due to an oversized project scope.  This project will instead focus on delivering complete feature sets rather than the half-implemented pieces of those past unsuccessful projects.  It will also focus on documentation, both within the code and in the manual, to make sure that you, the Panda community, will be able to take their work and build on top of it.

With that said, this project will primarily focus on two things:

  1. The shader inputs
  2. The egging/model exporting process

Shader Inputs

If you’ve taken a look at the source code of Panda 3D’s shader system and have had any experience in professional game engine development, you’ll notice that it’s a system that isn’t fully implemented.  Actually, the first shader system was an ETC student project, and it has since been improved through other ETC projects and the Panda 3D community.  The shader inputs project is continuing this work in a structured manner.

Shading languages have supported array inputs, including arrays of vectors, for quite some time.  However, Panda 3D has never supported passing them in.  There have been some hacks in the past where arrays are passed as textures, but this is not ideal for performance and it ruins texture caching schemes.  After this project completes, users will be able to input arrays and arrays of vectors/matrices directly into the shader.

Screenshot of multiple lights demo

This may not seem that exciting at first, but it lays the groundwork for many more things.  If you’re new to computer graphics: a complete shader input system allows for some of the following, just to name a few.

  • Hardware accelerated actors/characters
  • Shader based instancing with dynamic texture and animation support (crowds)
  • Shader based vegetation system (fast trees and grass)
  • A real deferred shading system
  • A real light manager system for shader based lights

A Real Egging Pipeline

Up until now, there have been several attempts at user interfaces for maya2egg, dae2egg, and the other exporters.  Most of them are just simple front-ends to the command-line tools.  This new user interface is much more than that: it is an artist-friendly build system.  Just check out some of the features.

  • Simple mode for when you don’t want a build system
  • Support for multiple maya versions
  • Support for egg tools such as egg-opt-char and egg-palettize
  • A batching system that automatically detects whether a file has been changed to allow for minimal rebuilds
  • Support for all tools to be built into batch system
  • Save/Load batch scripts

Like shader inputs, this lays the groundwork for much future work.  For any game engine to be professional quality, it needs a set of robust artist tools, such as node-based shader generators and artist-friendly level editors.

Screenshot of WIP Egging GUI