Posts Tagged ‘multithreading’

July 2018 Development Update

Sunday, August 26th, 2018 by fireclaw

Despite the vacation period, the developers have not remained idle in July. Here is an update on some of the new developments.

Collision shapes

While the internal collision system provides many useful tests between various collision solids, the CollisionTube solid (representing a capsule shape) in particular was only really useful as an “into” collision shape. Many of you have requested that more tests be added so that it can also be used as a “from” shape, since many see it as a good fit for character bodies. Earlier, we had already added tube-into-plane and tube-into-sphere tests. We have now extended this with tube-into-tube and tube-into-box tests. We have also added a line-into-box test to complete the suite of CollisionBox tests.

For those who are using Bullet physics instead of the internal collision system, we have also extended the ability to convert collision solids from the Panda3D representation to the Bullet representation to include CollisionTube and CollisionPlane as well. These solids can now be easily converted to a BulletCapsuleShape and BulletPlaneShape, respectively. This way you can add these shapes directly to your .egg models and load them into your application without needing custom code to convert them to Bullet shapes.

Depth buffer precision

As most Panda3D programmers will know, two important variables to define when configuring a camera in any game are the “near” and “far” distances. These determine the range of the depth buffer; objects at the near distance have a value in the depth buffer of 0.0, whereas objects at the far plane have a value of 1.0. As such, they also determine the drawing range: objects that fall outside this range cannot be rendered. This is fundamental to how perspective rendering works in graphics APIs.

As it happens, because of the way the projection matrix is defined, it is actually possible to set the “far” distance to infinity. Panda3D added support for this a while ago. Because of the reciprocal relationship between the distance to the camera and the generated depth value, the near distance is far more critical to the depth precision than the far distance. If it is too low, then objects in the distance will start to flicker as the differences in depth values between different objects become 0; the video card can no longer tell the difference between their respective distances and gets confused about which surface to render in front of the other. This is usually known as “Z-fighting”.
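To see why the near distance dominates, it helps to evaluate the mapping for a few distances. The following plain-Python sketch (the near and far values are invented for illustration, not Panda3D defaults) computes the conventional 0.0-to-1.0 depth value and shows how almost the entire depth range is consumed by geometry close to the camera:

```python
# Conventional perspective depth mapping: 0.0 at the near plane and 1.0
# at the far plane.  The near/far distances are made-up example values.
near, far = 0.1, 100000.0

def depth(z):
    """Depth-buffer value produced for an eye-space distance z."""
    return far * (z - near) / (z * (far - near))

# Geometry within the first unit of distance already uses up 90% of the
# range; everything from z=100 to z=100000 is crammed into the last 0.1%.
print(depth(1.0))      # ~0.9000
print(depth(100.0))    # ~0.9990
print(depth(10000.0))  # ~0.99999
```

With so few distinct values left for distant geometry, two far-away surfaces easily quantize to the same depth, which is exactly the Z-fighting described above.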

This is a real problem in games that require a very large drawing distance, while still needing to render objects close to the camera. There are a few ways to deal with this.

One way people usually try to resolve this is by increasing the precision of the depth buffer. Instead of the default 24 bits of depth precision, we can request a floating-point depth buffer, which has 32 bits of depth precision. However, since 32-bit floating-point numbers still have a 24-bit mantissa, this does not actually improve the precision by that much. Furthermore, due to the exponential nature of floating-point numbers, most precision is actually concentrated near 0.0, whereas we actually need precision in the distance.

As it turns out, there is a really easy way to solve this: just invert the depth range! By setting the near distance to infinity, and the far distance to our desired near distance, we get an inverted depth range whereby a value of 1.0 is close to the camera and 0.0 is infinitely far away. This turns out to radically improve the precision of the depth buffer, as further explained by this NVIDIA article, since the exponential precision curve of the floating-point numbers now complements the inverse precision curve of the depth buffer. We also need to swap the depth comparison function so that objects that are behind other objects won’t appear in front of them instead.
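The gain is easy to verify numerically. This plain-Python sketch (the near plane and the distances are invented for illustration) rounds both mappings to 32-bit floats, the way a floating-point depth buffer would store them, and shows the conventional mapping merging two distant surfaces while the reversed mapping keeps them apart:

```python
import struct

def f32(x):
    # Round a Python float to the nearest 32-bit float, as a
    # floating-point depth buffer would store it.
    return struct.unpack('f', struct.pack('f', x))[0]

near, far = 0.1, 100000.0

def depth_standard(z):
    # Conventional 0-to-1 mapping: 0.0 at the near plane, 1.0 at far.
    return far * (z - near) / (z * (far - near))

def depth_reversed(z):
    # Reversed mapping with the far plane at infinity: 1.0 at the near
    # plane, approaching 0.0 at infinity.
    return near / z

# Two surfaces one unit apart, far away from the camera:
a, b = 9000.0, 9001.0

print(f32(depth_standard(a)) == f32(depth_standard(b)))  # True: Z-fighting
print(f32(depth_reversed(a)) == f32(depth_reversed(b)))  # False: still distinct
```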

There is one snag, though. While the technique above works quite well in DirectX and Vulkan, where the depth is defined to range from 0.0 to 1.0, OpenGL actually uses a depth range of -1.0 to 1.0. Since floating-point numbers are most precise near 0.0, this puts all our precision uselessly in the middle of the depth range.

This is not very helpful, since we want to improve depth precision in the distance. Fortunately, the OpenGL authors have remedied this in OpenGL 4.5 (and with the GL_ARB_clip_control extension for earlier versions), where it is possible to configure OpenGL to use a depth range of 0.0 to 1.0. This is accomplished by setting the gl-depth-zero-to-one configuration variable to `true`. There are plans to make this the default Panda3D convention in order to improve the precision of projection matrix calculation inside Panda3D as well.
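If you want to experiment with this on a development build, opting in is a single configuration line (shown here as a Config.prc entry):

```
# Requires OpenGL 4.5, or the GL_ARB_clip_control extension on
# earlier versions.
gl-depth-zero-to-one true
```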

All the functionality needed to accomplish this is now available in the development builds. If you wish to play with this technique, check out this forum thread to see what you need to do.

Double precision vertices in shaders

For those who need the greatest level of numerical precision in their simulations, it has been possible to compile Panda3D with double-precision support. This makes Panda3D perform all transformation calculations with 64-bit precision instead of the default 32-bit precision, at a slight performance cost. However, by default, all the vertex information of the models is still uploaded as 32-bit single-precision numbers, since only recent video cards natively support operations on 64-bit precision numbers. By setting the vertices-float64 variable, the vertex information is uploaded to the GPU as double-precision.
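As a sketch, enabling this in a double-precision build is again a one-line Config.prc change:

```
# Upload vertex data to the GPU in 64-bit double precision.  Only
# meaningful in a double-precision build of Panda3D, and only on video
# cards that support 64-bit vertex attributes natively.
vertices-float64 true
```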

This worked well for the fixed-function pipeline, but was not supported when using shaders, or when using an OpenGL 3.2+ core-only profile. This has now been remedied; it is possible to use double-precision vertex inputs in your shaders, and Panda3D will happily support this in the default shaders when vertices-float64 is set.

Interrogate additions

The system we use to provide Python bindings for Panda3D’s C++ codebase now has limited support for exposing C++11 enum classes to Python; on Python 2, it emulates the behavior of Python 3’s enum types. This enables Panda3D developers (and any other users of Interrogate) to use C++11 enum classes in order to better wrap enumerations in the Panda3D API.


Thread safety

We have continued to improve the thread safety of the engine in order to make it easier to use the multi-threaded rendering pipeline. Mutex locks have been added to the X11 window code, which enables certain window calls to be safely made from the App thread. Furthermore, a bug was fixed that caused a crash when taking a screenshot from a thread other than the draw thread.

May 2018 Development Update

Sunday, June 24th, 2018 by rdb

With the work on the new input system and the new deployment system coming to a close, it is high time we shift gears and focus our efforts on bundling all this into a shiny new release. So with an eye toward a final release of Panda3D 1.10, most of the work in May has centered around improving the engine’s stability and cleaning up the codebase.

As such, many bugs and regressions have been fixed that are too numerous to name. I’m particularly proud to declare the multithreaded render pipeline significantly more stable than it was in 1.9. We have also begun to make better use of compiler warnings and code-checking tools. This has led us to find bugs in the code that we did not even know existed!

We announced two months ago that we were switching the minimum version of the Microsoft Visual C++ compiler from 2010 to 2015. No objections to this have come in, so this move has been fully implemented in the past month. This has cleared the way for us to make use of C++11 to the fullest extent, allowing us to write more robust code and spend less of our time writing compiler-specific code or maintaining our own threading library, which ultimately results in a better engine for you.

Thinking ahead

Behind the scenes, many design discussions have been taking place regarding our plans for the Panda3D release that will follow 1.10. In particular, I’d like to highlight a proposed new abstraction for describing multi-pass rendering that has begun to take shape.

Multi-pass rendering is a technique to render a scene in multiple ways before compositing it back together into a single rendered image. The current way to do this in Panda3D hinges on the idea of a “graphics buffer” being similar to a regular on-screen window, except of course that it does not appear on screen. At the time this feature was added, this matched the abstractions of the underlying graphics APIs quite well. However, it is overly cumbersome to set up for some of the most common use cases, such as adding a simple post-processing effect to the final rendered image. More recent additions like FilterManager and the RenderPipeline’s RenderTarget system have made this much easier, but these are high-level abstractions that simply wrap around the same underlying low-level C++ API, which still does not have an ideal level of control over the rendering pipeline.

That last point is particularly relevant to our efforts to provide optimal support for the Oculus Rift and for the Vulkan rendering API. For reasons that go beyond the scope of this post, implementing these optimally will require Panda3D to have more complete knowledge of how all the graphics buffers in the application fit together to produce the final render result, which the current API makes difficult.

To remedy this, the proposed approach is to let the application simply describe all the rendering passes up-front in a high-level manner. You would graph out how the scene is rendered by connecting the inputs and outputs of all the filters and shaders that should affect it, similar to Blender’s compositing nodes. You would no longer need to worry about setting up all the low-level buffers, attachments, cameras and display regions. This would all be handled under the hood, enabling Panda3D to optimize the setup to make better use of hardware resources. We could even create a file format to allow storing such a “render blueprint” in a file, so something like loading and enabling a deferred rendering pipeline would be possible in only a few lines of code!

This is still in the early design stages, so we will talk about these things in more detail as we continue to iron out the design. If you have ideas of your own to contribute, please feel free to share them with us!

Helping out

In the meantime, we will continue to work towards a final release of 1.10. And this is the time when you can shine! If you wish to help, you are encouraged to check out a development build of Panda3D from the download page (or install it via our custom pip index) and try it with your projects. If you encounter an issue, please go to the issue tracker on GitHub and let us know!

February 2018 Development Update

Tuesday, March 13th, 2018 by rdb

We again bring you some of the highlights of the new developments of the past month. This is, however, not an exhaustive list; the complete list of changes can be obtained from the commit logs in the GitHub repository.

Android developments

We’re happy to announce that the Android port has made great strides forward in the past weeks. Most importantly, a complete Python interpreter has now been added to the Panda3D app, so that Python applications can be run directly on the Android device. Many of the sample programs now run as well as they do on the desktop.

It is now even possible to compile Panda3D entirely on an Android device, using an app that provides a Linux-like terminal environment. One free and open-source example is termux, which provides a package manager that gives access to an impressive suite of software, including the compilers needed to compile Panda3D.

Furthermore, using a bash script that we provide, it is also possible to launch Panda3D from termux and pipe the output from the interpreter app back to the terminal, so that you can develop Panda apps on Android the same way as you would on the desktop.

Eventually, we intend to make it possible to use the new deploy-ng system to produce an Android package from any operating system, making it easy to deploy your application to both desktop and mobile platforms.

In a separate effort, a commercial port of Panda3D to Android has been released on the app store, called Cub3D. This is an impressively complete development environment for Panda3D apps, and can be used today to develop and run Panda apps on Android. (This software is developed by a third-party company that is not affiliated with the Panda3D team.)

Roaming Ralph running on an Android tablet

Ball in Maze running on an Android watch with Cub3D

CMake improvements

Behind the scenes, we have been working on adding support for the CMake build system on the cmake branch. CMake is a popular cross-platform build system that is rapidly becoming the de facto standard for open-source projects, and it is eventually intended to replace our own makepanda build script. This will make it easier to build Panda from source and to customize the build process. Last month, the CMake port saw some updates and improvements, bringing us closer to this goal.

Bullet thread safety

Users of the Bullet physics engine were previously encountering crashes when using Bullet in conjunction with the multithreaded render pipeline. These crashes were caused by the fact that the Bullet library itself is not thread-safe. To remedy this, we have added locking to the Panda3D wrapper for Bullet that protects the Bullet library from access by multiple threads at once, which makes it safe to use the Bullet integration in a multithreaded program.

December 2017 Development Update

Monday, January 22nd, 2018 by fireclaw

Welcome, everyone, to our heavily delayed December post. Even though we’re quite late with this one due to a lack of time, we wish you all a happy new year! Much has happened during the winter holidays, so read on to see what’s new.

What happened in the last month

The work on the input overhaul branch is almost complete; it needs some more polish to finalize the API before we merge it into the development trunk. In addition, we have started to add a mapping table for known devices to have them work as expected. For other devices, the mapping is provided by the device driver, with the help of some heuristics to detect the device type. The input overhaul is still in heavy development and API changes will occur. For those who are interested in testing it, however, sample applications are available, and some manual entries with more or less accurate instructions have been created, which will be finalized as soon as the API is stable.

Some long-standing bugs with the multithreaded pipeline were finally resolved. These issues caused deadlocks that occurred whenever geometry data was being manipulated on another thread, or when shadows were enabled using the automatic shader generator; as such, they were a significant barrier that prevented the multithreaded pipeline from being useful for some applications. However, more testing is needed before we can be completely confident that all the threading issues are resolved.

On macOS, it is now possible to get an offscreen rendering buffer without opening a window. This lets you render to a buffer on a headless Mac server, which can be useful for plenty of things. Aside from scientific simulations where no immediate output is necessary or even desirable, another example is to send a frame rendered by Panda3D over a socket or network to display it elsewhere. This technique is used in the BlenderPanda project to render a Panda3D frame into a Blender viewport and thereby get a live display of how a model will look when used with the engine.

Looking into the crystal ball

In the coming months, some of the newly developed features (input-overhaul, deploy-ng) will be polished and merged into the master branch of Panda3D. More work is also planned on the introduction of a new physically-based material model, as well as support for the glTF 2.0 format. Stay tuned for more updates!