Posts Tagged ‘C++’

July 2018 Development Update

Sunday, August 26th, 2018 by fireclaw

Despite the vacation period, the developers have not remained idle in July. Here is an update on some of the new developments.

Collision shapes

While the internal collision system provides many useful tests between various collision solids, the CollisionTube solid (representing a capsule shape) in particular was only really useful as an “into” collision shape. Many of you have requested that more tests be added so that it can also be used as a “from” shape, since many see it as a good fit for use with character bodies. Earlier, we had already added tube-into-plane and tube-into-sphere tests. We have now also extended this to include tube-into-tube tests and tube-into-box tests. We have also added a line-into-box test to complete the suite of CollisionBox tests.
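To give an idea of what a tube-as-“from” test involves, here is a minimal sketch of a tube-into-sphere check in plain Python. It is not Panda3D's actual implementation; the function names and the tuple-based points are made up for illustration. It treats the tube as a line segment with a radius and reports an overlap when the sphere's center lies within the sum of both radii of that segment.

```python
def closest_point_on_segment(a, b, p):
    """Return the point on the segment a-b that lies closest to point p."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0.0:
        return tuple(a)  # degenerate tube: both end points coincide
    t = sum(ab[i] * ap[i] for i in range(3)) / length_sq
    t = max(0.0, min(1.0, t))  # clamp so the result stays on the segment
    return tuple(a[i] + t * ab[i] for i in range(3))


def tube_intersects_sphere(tube_a, tube_b, tube_radius, center, sphere_radius):
    """Return True if a capsule (segment plus radius) overlaps a sphere."""
    closest = closest_point_on_segment(tube_a, tube_b, center)
    dist_sq = sum((closest[i] - center[i]) ** 2 for i in range(3))
    reach = tube_radius + sphere_radius
    # Compare squared distances to avoid a square root.
    return dist_sq <= reach * reach


# A character-sized tube standing upright, and a sphere next to it.
print(tube_intersects_sphere((0, 0, 0), (0, 0, 2), 0.5, (1, 0, 1), 0.6))
```

A real collision test also has to report the contact point and surface normal, which this sketch leaves out.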

For those who are using Bullet physics instead of the internal collision system, we have also extended the ability to convert collision solids from the Panda3D representation to the Bullet representation to include CollisionTube and CollisionPlane as well. These solids can now be easily converted to a BulletCapsuleShape and BulletPlaneShape, respectively. This way you can add these shapes directly to your .egg models and load them into your application without needing custom code to convert them to Bullet shapes.

Depth buffer precision

As most Panda3D programmers will know, two important variables to define when configuring a camera in any game are the “near” and “far” distances. These determine the range of the depth buffer; objects at the near distance have a value in the depth buffer of 0.0, whereas objects at the far plane have a value of 1.0. As such, they also determine the drawing range: objects that fall outside this range cannot be rendered. This is fundamental to how perspective rendering works in graphics APIs.

As it happens, because of the way the projection matrix is defined, it is actually possible to set the “far” distance to infinity. Panda3D added support for this a while ago already. Because of the reciprocal relationship between the distance to the camera and the generated depth value, the near distance is far more critical to the depth precision than the far distance. If it is too low, then objects in the distance will start to flicker as the differences in depth values between different objects become 0; the video card can no longer tell the difference between their respective distances and gets confused about which surface to render in front of the other. This is usually known as “Z-fighting”.

This is a real problem in games that require a very large drawing distance, while still needing to render objects close to the camera. There are a few ways to deal with this.

One way people usually try to resolve this is by increasing the precision of the depth buffer. Instead of the default 24 bits of depth precision, we can request a floating-point depth buffer, which has 32 bits of depth precision. However, since 32-bit floating-point numbers still have a 24-bit mantissa, this does not actually improve the precision by that much. Furthermore, due to the exponential nature of floating-point numbers, most precision is actually concentrated near 0.0, whereas we actually need precision in the distance.

As it turns out, there is a really easy way to solve this: just invert the depth range! By setting the near distance to infinity, and the far distance to our desired near distance, we get an inverted depth range whereby a value of 1.0 is close to the camera and 0.0 is infinitely far away. This turns out to radically improve the precision of the depth buffer, as further explained by this NVIDIA article, since the exponential precision curve of the floating-point numbers now complements the inverse precision curve of the depth buffer. We also need to swap the depth comparison function so that objects that are behind other objects won’t appear in front of them instead.
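The effect of inverting the range can be checked with a little arithmetic. The following sketch is not Panda3D code. It assumes a projection with an infinite far plane and a depth range of 0.0 to 1.0, for which the depth of a point at distance z works out to 1 - near/z in the conventional setup and near/z in the inverted one. It uses the struct module to round values to 32-bit floats, as a floating-point depth buffer would. The helper names and the sample distances are made up for illustration.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def standard_depth(z, near):
    """Conventional range: 0.0 at the near plane, 1.0 infinitely far away."""
    return 1.0 - near / z


def reversed_depth(z, near):
    """Inverted range: 1.0 at the near plane, 0.0 infinitely far away."""
    return near / z


near = 0.1
z_a, z_b = 1000.0, 1000.01  # two surfaces 1 cm apart, 1 km from the camera

standard_a = to_float32(standard_depth(z_a, near))
standard_b = to_float32(standard_depth(z_b, near))
reversed_a = to_float32(reversed_depth(z_a, near))
reversed_b = to_float32(reversed_depth(z_b, near))

print("conventional range can tell them apart:", standard_a != standard_b)
print("inverted range can tell them apart:", reversed_a != reversed_b)
```

With these sample values, the conventional mapping stores both surfaces as the same 32-bit float, while the inverted mapping keeps them apart.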

There is one snag, though. While the technique above works quite well in DirectX and Vulkan, where the depth is defined to range from 0.0 to 1.0, OpenGL actually uses a depth range of -1.0 to 1.0. Since floating-point numbers are most precise near 0.0, this actually puts all our precision uselessly in the middle of the depth range.

This is not very helpful, since we want to improve depth precision in the distance. Fortunately, the OpenGL authors have remedied this in OpenGL 4.5 (and with the GL_ARB_clip_control extension for earlier versions), where it is possible to configure OpenGL to use a depth range of 0.0 to 1.0. This is accomplished by setting the gl-depth-zero-to-one configuration variable to `true`. There are plans to make this the default Panda3D convention in order to improve the precision of projection matrix calculation inside Panda3D as well.
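The loss of precision with the -1.0 to 1.0 convention can be illustrated in the same way. This is only a rough model, not what a driver literally does: it assumes the inverted depth is first expressed as a 32-bit float in the -1.0 to 1.0 range and only afterwards remapped to 0.0 to 1.0 for storage. The function names and sample distances are made up for illustration.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def inverted_depth_zero_to_one(z, near):
    """Inverted depth when the API uses a 0.0 to 1.0 range directly."""
    return to_float32(near / z)


def inverted_depth_minus_one_to_one(z, near):
    """Inverted depth that passes through a -1.0 to 1.0 range first."""
    ndc = to_float32(2.0 * (near / z) - 1.0)  # distant objects end up near -1.0
    return to_float32(0.5 * ndc + 0.5)  # remapped to 0.0 .. 1.0 for storage


near = 0.1
z_a, z_b = 1000.0, 1000.01  # two surfaces 1 cm apart, 1 km from the camera

print("0..1 range can tell them apart:",
      inverted_depth_zero_to_one(z_a, near) != inverted_depth_zero_to_one(z_b, near))
print("-1..1 range can tell them apart:",
      inverted_depth_minus_one_to_one(z_a, near) != inverted_depth_minus_one_to_one(z_b, near))
```

In this model, the two distant surfaces collapse to the same value once they have passed through the -1.0 to 1.0 range, because 32-bit floats are coarse near -1.0. The later remapping cannot bring the lost difference back.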

All the functionality needed to accomplish this is now available in the development builds. If you wish to play with this technique, check out this forum thread to see what you need to do.

Double precision vertices in shaders

For those who need the greatest level of numerical precision in their simulations, it has been possible to compile Panda3D with double-precision support. This makes Panda3D perform all transformation calculations with 64-bit precision instead of the default 32-bit precision at a slight performance cost. However, by default, all the vertex information of the models is still uploaded as 32-bit single-precision numbers, since only recent video cards natively support operations on 64-bit precision numbers. By setting the vertices-float64 variable, the vertex information is uploaded to the GPU as double-precision.

This worked well for the fixed-function pipeline, but was not supported when using shaders, or when using an OpenGL 3.2+ core-only profile. This has now been remedied; it is possible to use double-precision vertex inputs in your shaders, and Panda3D will happily support this in the default shaders when vertices-float64 is set.
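To see why the storage format of the vertices matters, consider a vertex coordinate far away from the origin. The sketch below is plain Python and does not touch Panda3D. It uses the struct module to show what happens to such a value when it is stored as a 32-bit or as a 64-bit float. The coordinate is an arbitrary example.

```python
import struct


def as_float32(value):
    """Store a value as a 32-bit float and read it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


def as_float64(value):
    """Store a value as a 64-bit float and read it back."""
    return struct.unpack("d", struct.pack("d", value))[0]


coordinate = 10000000.123  # e.g. a position in a very large simulated world

print("single precision:", as_float32(coordinate))
print("double precision:", as_float64(coordinate))
print("error in single precision:", abs(as_float32(coordinate) - coordinate))
```

At this magnitude, neighbouring 32-bit floats are a whole unit apart, so the fractional part of the coordinate is lost entirely in single precision.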

Interrogate additions

The system we use to provide Python bindings for Panda3D’s C++ codebase now has limited support for exposing C++11 enum classes to Python 2 as well by emulating support for Python 3 enums. This enables Panda3D developers (and any other users of Interrogate) to use C++11 enum classes in order to better wrap enumerations in the Panda3D API.
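For readers unfamiliar with Python 3 enums, the sketch below shows the kind of behaviour that is being emulated, using the standard enum module. The enumeration and its members are hypothetical and are not taken from the Panda3D API. The point is that, much like a C++11 enum class, the members are scoped to their type and are not interchangeable with plain integers.

```python
import enum


class CompareMode(enum.Enum):
    """Hypothetical enumeration, standing in for a wrapped C++11 enum class."""
    NEVER = 0
    LESS = 1
    GREATER = 2


mode = CompareMode.LESS

print(mode.name, mode.value)        # members carry both a name and a value
print(mode is CompareMode(1))       # looking up by value yields the same member
print(mode == 1)                    # a member does not compare equal to an int
```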

Multi-threading

We have continued to improve the thread safety of the engine in order to make it easier to use the multi-threaded rendering pipeline. Mutex locks have been added to the X11 window code, which enables certain window calls to be safely made from the App thread. Furthermore, a bug was fixed that caused a crash when taking a screenshot from a thread other than the draw thread.

June 2018 Development Update

Tuesday, July 31st, 2018 by fireclaw

Due to the vacation period, this post is somewhat delayed, but the wait is finally at an end. Here is the new update with a selection of the developments in June.

OpenGL on macOS

At the Apple WWDC in June of this year, Apple formally announced the deprecation of the OpenGL graphics API, in favor of an Apple-only graphics API called Metal. This move puzzled many, as a lot of software relies on OpenGL for macOS support, including Panda3D, and investing significant resources into developing a whole new rendering back-end to support just a relatively small segment of the market is hard to justify.

While it seems likely that—despite the deprecation notice—Apple will continue to include OpenGL support as a part of macOS, we will need to start looking at another approach for maintaining first-class support for high-end graphics on macOS. It would be nice if there were one next-gen graphics API that would be well-supported on all platforms going forward.

Enter MoltenVK. This is an implementation of the cross-platform Vulkan API for macOS and iOS, implemented as a wrapper layer on top of Apple’s Metal API. It has recently been open sourced by the Khronos Group, in an effort to make Vulkan available on every operating system. Despite being a wrapper layer, it was still found by Valve to offer performance benefits over plain OpenGL. This will let us focus our efforts on implementing Vulkan and thereby support all platforms.

Accordingly, we have increased the priority towards developing a Vulkan renderer, and it has made several strides forward in the past months. It is still not quite able to render more than the simplest of sample programs, however. We will keep you updated as developments unfold.

Mouselook smoothness improved

In order to implement camera movement using the mouse, many applications use move_pointer to reset the cursor to the center of the window every frame. However, when the frame rate gets low, the mouse movement can become choppy. This is usually only an issue on Windows, as other platforms support the M_relative mouse mode, which obviates the need to reset the mouse cursor. For all those who can’t or don’t use this mode, a significant improvement in smoothing this movement has now been worked out.

In previous versions, small movements that occurred between the last iteration of the event loop and the call to move_pointer could have been ignored, causing the mouselook to become choppy. This has been fixed by changing the get_pointer() method to always obtain the latest mouse cursor position from the operating system, resulting in more accurate mouse tracking. In the future, we will further improve on this by emulating relative mouse mode on Windows and perhaps even adding a new interface for handling mouselook.
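For reference, the re-centering approach described above can be sketched roughly as follows. This is a simplified illustration rather than code from Panda3D or its samples. The function name, the sensitivity parameter and the pitch limits are made up, and the window object is only assumed to offer the get_x_size, get_y_size, get_pointer and move_pointer methods of a Panda3D graphics window.

```python
def update_mouselook(win, heading, pitch, sensitivity=0.1):
    """Read the pointer, re-center it, and return the new (heading, pitch).

    `win` is assumed to behave like a Panda3D graphics window. Any object
    with the same four methods will do, which keeps this sketch easy to test.
    """
    center_x = win.get_x_size() // 2
    center_y = win.get_y_size() // 2

    # How far the pointer has moved away from the window center.
    pointer = win.get_pointer(0)
    dx = pointer.get_x() - center_x
    dy = pointer.get_y() - center_y

    # Put the cursor back so that the next frame measures a fresh offset.
    if dx != 0 or dy != 0:
        win.move_pointer(0, center_x, center_y)

    # Moving the mouse right turns the camera right (heading decreases);
    # moving it down tilts the camera down. Pitch is clamped to avoid flips.
    heading -= dx * sensitivity
    pitch = max(-90.0, min(90.0, pitch - dy * sensitivity))
    return heading, pitch
```

In a ShowBase application, a function like this would typically be called once per frame from a task, passing base.win and applying the result to the camera.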

Grayscale and grayscale-alpha support with FFmpeg

Panda3D’s implementation of FFmpeg has now gained the ability to load videos that only contain grayscale colours more efficiently, greatly reducing the memory usage compared to the previous approach of converting all videos to full RGB. This also extends to videos that have a grayscale channel as well as an alpha channel for transparency. Do note, however, that not all video formats support grayscale pixel formats.

Removing ‘using namespace std’ from Panda3D headers

C++ users, take note: besides the changes to add better C++11 support, we have also eliminated the bad practice of having using namespace std; in the public Panda3D headers. This prevents the entire std namespace from being pulled in and causing namespace clashes. If your codebase relied on being able to use standard types without an explicit std:: prefix, you may need to add using namespace std to your own headers. It would be even better to fully qualify your references to standard types and functions with an std:: prefix or—where appropriate—pull in specific types with individual using std::x; statements.

Bullet Vehicle chassis-to-wheel synchronisation

Using Bullet’s BulletVehicle class to simulate car physics had a problem with syncing the wheels to the chassis if the wheels were parented to the vehicle’s RigidBodyNode. The wheel models would always be one frame ahead of the chassis model, resulting in visible artifacts as the wheels would appear to move away from the chassis. Especially if the vehicle accelerates to a significant speed, the wheels may appear to drift quite far from the vehicle. This bug was fixed thanks to a contributor.

April 2018 Development Update

Tuesday, May 22nd, 2018 by fireclaw

Welcome back to our monthly developer news. This month we don’t have much of interest to report on current developments. Most of the time has been spent fixing bugs and polishing existing functionality. So we decided to give you an outlook on which future developments you can expect to see next.

New app framework

Behind the scenes, much discussion has taken place regarding future developments of Panda3D.  In this post, we will discuss one of the plans on the table, namely to redesign the high-level application layer of the Panda3D library.

The current design of the ShowBase class as a monolithic singleton makes it easy to get started with prototyping a game in Panda, but it does have some downsides. It is cluttered with methods and variables that may not be needed by every application. It also makes running two instances of the application within the same process harder, and because it is implemented in Python, it is not available to C++ developers.

Our intent is to introduce a replacement for ShowBase that not only resolves these issues, but also encourages best practices and makes it easier for developers to write games for non-desktop platforms, such as mobile, web and VR.

For example, the ShowBase design gives the application developer complete agency over the lifetime of the application. This is a great fit for traditional desktop applications, but not so great for mobile apps, where the operating system needs to be able to suspend and resume applications at any time in order to free up resources when needed. Doing this today in Panda3D requires developers to specifically detect these states, whereas we would ideally make this an intuitive part of the game structure.

Another example is virtual reality, where there is not really a meaningful “window” in the traditional sense, but the content is rather projected into a virtual 3D environment, with any GUI being superimposed into the scene in layers.  We would like to represent these concepts in a way that makes it easy for developers to develop applications for both VR and desktop, while giving them full control over the additional possibilities that virtual reality offers.

While the intent is not to remove control that game developers might need, we want to make a framework that encourages best practices that work well for all platforms.  We will follow up on this in the future with more concrete plans, but rest assured that ShowBase will remain available for the foreseeable future for any applications that rely on it.

CMake

CMake has in recent years rapidly become the de facto standard build system for open-source projects. As we have reported on before, we are working on replacing Panda’s own build system “makepanda” with it. This work is still underway on the cmake branch in GitHub.

For Panda3D developers, the integration of CMake in IDEs and other tools is far better than that of makepanda. This also allows us to better integrate the testing framework and code checking tools. Using CMake also simplifies building and directly running a built Panda3D instance (without installing) from the build directory using the rpath feature on Linux.

Another benefit is the simplified cross-compilation and exotic compiler support for mobile or ARM-based systems. In the future there might be support for CMake’s “exported targets” feature, meaning C++ users of Panda3D would not have to configure their libraries/includes/search path if they’re using CMake as well. Another exciting benefit is that CMake can lower the build times by roughly 30%, depending on the system and setup.

A little side effect of using CMake instead of makepanda is that we have less code to maintain and we get the benefits of the CMake community’s development efforts as a whole. That also means we get to focus our resources exclusively on making the engine itself awesome as we’re not distracted with the build tools.

The CMake branch is currently in a good enough state where it works great on Linux/BSD, and macOS support isn’t far behind. Windows support is slightly trickier and needs some more work before it gets up to that level of support. Mobile platforms are not yet supported.

If you run Linux or macOS, feel free to grab the branch and try it for yourself. Whenever you find any bugs, just file a report on the GitHub issue tracker or ask in the forum or on the IRC channel.

Ubuntu 18.04 “Bionic Beaver”

The most recent LTS version of Ubuntu, 18.04, was released a few weeks ago with recent software packages. Panda3D has been successfully built on this version and updated install packages are available for the 1.10 development builds.

Panda3D and Cython

Sunday, September 12th, 2010 by Craig

This is about how to speed up your Python code, and has no direct impact on Panda3D’s performance. For most projects, the vast majority of the execution time is inside Panda3D’s C++ or in the GPU, so no matter what you do, fixing your Python will never help. For the other cases where you do need to speed up your Python code, Cython can help. This is mainly addressed to people who prefer programming in Python, but know at least a little about C. I will not discuss how to do optimizations within Python, though if this article is relevant to you, you really should look into it.

Cython is an interesting programming language. It uses an extended version of Python’s syntax to allow things like statically typed variables, and direct calls into C++ libraries. Cython compiles this code to C or C++, which is then compiled into a Python extension module that you can import and use just like a regular Python module. There are several benefits to this, but in our context the main one is speed. Properly written Cython code can be as fast as C code, which in some particular cases can be even 1000 times faster than nearly identical Python code. Generally you won’t see 1000x speed increases, but it can be quite a bit. This does cause the modules to only work on the platform they were compiled for, so you will need to compile alternate versions for different platforms.

By default, Cython compiles to C, but the new 0.13 version supports C++. This is more useful as you probably use at least one C++ library, Panda3D. I decided to try this out, and after stumbling on a few simple issues, I got it to work, and I don’t even know C++.

Before I get to the details, I’ll outline why you might want to use Cython, rather than porting performance bottlenecks to C++ by hand. The main benefit is in the process, as well as the required skill set. If you have a large base of Python code for a project, and you decide some of it needs to be much faster, you have a few options. The common approach seems to be to learn C++, port the code, and learn how to make it so you can interface to it from Python. With Cython, you can just add a few type definitions on variables where you need the performance increase, and compile it, which gives you a Python module that works just like the one you had. If you need to speed up the code that interfaces with Panda3D, you can swap the Python API calls for C++ ones. Using Cython allows you to just put effort into speeding up the parts of code you need to work on, and to do so without having to change very much. This is vastly different from ditching all the code and reimplementing it in another language. It also requires you to learn a pretty minimal amount of stuff. You also get to keep the niceness of the Python syntax which many Python coders have come to appreciate.

There are still major reasons to actually code in C++ when working with Panda, but as someone who does not do any coding in C++, I won’t talk about it much. If you want to directly extend or contribute to Panda3D, want to avoid redundantly specifying your imports from header files (Cython will require you to re-specify the parts of API you are using rather than just using the header files shipped with Panda), or you simply prefer C++, C++ may be a better option. I mainly see Cython as a convenient option when you end up needing to speed up parts of a Python code-base; however, it is practical to undertake large projects from the beginning in Cython.

Cython does have some downsides as well. It is still in rather early development. This means you will encounter bugs in its translators as well as the produced code. It also lacks support for a few Python features, such as most uses of generators. Generally I haven’t had much trouble with these issues, but your experience may differ.

Cython does offer an interesting side benefit as well. It allows you to optionally statically type variables and thus can detect more errors at compile time than Python.

To get started, first you need an install of Cython 0.13 (or probably any newer version). If you have a Cython install you can check the version with the -V option. You can pick up the source from the Cython Site, and install it by running “python setup.py install” from the Cython source directory. You will also need to have a compiler. The Cython site should help you get everything set up if you need additional guidance.

Then you should try out a sample to make sure you have everything you need, and that it’s all working. There is a nice C++ sample for Cython on the Cython Wiki. (This worked for me on Mac, and on Windows using MinGW or MSVC as a compiler).

As for working with Panda3D, there are a few things I discovered:

  • There are significant performance gains to be had by just compiling your existing Python modules as Cython. With a little additional work adding static typed variables, you can have larger performance gains without even moving over to Panda’s C++ API (Which means you don’t need to worry about linking against Panda3D which can be an issue).
  • Panda3D already has python bindings with nice memory management, so I recommend instancing all the objects using the python API, and only switching to the C++ one as needed.
  • You can use the ‘this’ property on Panda3D’s Python objects to get a pointer to the underlying C++ object.
  • On Mac, you need to make sure libpanda (and in some cases, possibly others as well) is loaded before importing your extension module if you use any of Panda3D’s libraries.
  • On Windows, you need to specify the libraries you need when compiling (in my case, just libpanda)
  • The C++ classes and Python classes tend to have the same name. To resolve this, you can use “from x import y as z” when importing the python ones, or you can just import panda3d.core, and use the full name of the classes (like panda3d.core.Geom). There may be a way to rename the C++ classes on import too.
  • If using the Panda3D C++ API on Windows, you will need to use the MSVC compiler. You can get Microsoft Visual Studio 2008 Express Edition for free which includes the needed compiler.

Using this technique I got a 10x performance increase on my code for updating the vertex positions in my Geom. It avoided having to create python objects for all of the vertexes and passing them through the Python API which translates them back to C++ objects. It was just a matter of moving over one call in the inner loop to the other API. This, however, was done in already optimized Cython code that was simply loading vertex positions stored in a block of memory into the Geom. Most use cases would likely see less of a benefit. Overall though, I gained a lot of performance both from the change over to Cython, and from the change over to the C++ API. These changes only required relatively small changes to the speed critical portions of my existing python code.

I made a rather minimal example of using a Panda3D C++ API call from Cython. Place the setup.py and the testc.pyx files in the same directory, and from the said directory, run setup.py with your Python install you use with Panda3D. If everything is properly configured, this should compile the example Cython module, testc.pyx, to a python extension module and run it. If it works, it will print out a few lines ending with “done”. It is likely you may need to tweak the paths in setup.py. If not on Mac or Windows, you will get an error indicating where you need to enter your compiler settings (mostly just the paths to Panda3D’s libraries).

I would like to thank Lisandro Dalcin from the Cython-Users mailing list who helped me get this working on Windows.