I’m trying to write a small application that lets you build worlds out of simple bricks (no more than 100 triangles each). When I arrange them in a quadtree-like structure, Panda can cull away the invisible bricks fast enough. What I wonder about is the raw performance. When I write a simple app where all bricks are visible …
for x in range(-9, 9 + 1):
    for y in range(-7, 7 + 1):
        for z in range(0, 7 + 1):
            brick = loader.loadModelCopy("brick.bam")
            brick.reparentTo(render)
            brick.setPos(x, y, z)
render.flattenStrong()
… I got about 50 fps (no textures, no shader, nothing). Is this normal?
At first I thought this was awfully slow (~13e6 triangles per second, far less than my card is able to draw). Then I switched to Irrlicht, started to write the same thing in C++ and got 17 fps.
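For reference, the throughput figure can be estimated from the loop bounds in the snippet above. This is only a rough sketch: it assumes every brick uses the stated upper bound of 100 triangles, which is an assumption and not a measured value.

```python
# Brick grid from the loops above: x in -9..9, y in -7..7, z in 0..7.
bricks = len(range(-9, 9 + 1)) * len(range(-7, 7 + 1)) * len(range(0, 7 + 1))

# Assumption: every brick uses the stated upper bound of 100 triangles.
TRIANGLES_PER_BRICK = 100
FPS = 50  # frame rate reported in the post

triangles_per_frame = bricks * TRIANGLES_PER_BRICK
triangles_per_second = triangles_per_frame * FPS

print(bricks)                # 2280
print(triangles_per_second)  # 11400000
```

Under that assumption the result is about 11e6 triangles per second, the same order of magnitude as the figure quoted above.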
Any ideas on how to improve a piece of code that draws a lot of simple objects?
Thanks in advance, Azraiyl
BTW Well done Panda3D guys. Never thought that Panda3D can compete with others on raw numbers.
Turn on directtools, which is basically a scene graph browser. It will let you see whether or not flattenStrong() really flattened the scene graph. Let us know.
That’s a lot of time spent in cull. That’s the overhead to traverse all of the individual nodes; so your cull time is large because you have lots of nodes in your graph.
The reason you have this overhead is that flattenStrong() is not able to combine the bricks into a single node, because each brick was loaded as a separate model, and by default, flattenStrong() will not cross model file boundaries.
To improve the performance of flattenStrong(), try this variant:
brick = loader.loadModel("brick.bam").getChild(0)
Note that:
(a) loadModelCopy() is deprecated; you should use loadModel() instead (it does the same thing)
(b) calling flattenStrong() will combine all the bricks into, ideally, a single node, which is optimal when all bricks are visible, but defeats the culling properties of your quadtree-like structure. Ideally, you will probably want to call flattenStrong() at the cullable unit level, where the ideal number of bricks in each cullable unit depends heavily on your particular graphics card.
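One way to picture "cullable units" is to bucket the brick positions into cube-shaped groups before building the scene. The sketch below is plain Python and does not touch Panda3D; the helper names cluster_key and group_bricks and the cluster size of 4 are hypothetical choices for illustration, not anything from the engine.

```python
from collections import defaultdict

def cluster_key(x, y, z, size):
    """Map a brick position to the index of the cube-shaped cluster containing it."""
    # Floor division keeps negative coordinates in consistent buckets.
    return (x // size, y // size, z // size)

def group_bricks(positions, size):
    """Group brick positions by the cluster they fall into."""
    clusters = defaultdict(list)
    for pos in positions:
        clusters[cluster_key(*pos, size)].append(pos)
    return dict(clusters)

# Same grid as in the original post.
positions = [(x, y, z)
             for x in range(-9, 9 + 1)
             for y in range(-7, 7 + 1)
             for z in range(0, 7 + 1)]

clusters = group_bricks(positions, size=4)  # size=4 is an arbitrary example value
largest = max(len(b) for b in clusters.values())
print(len(clusters), "clusters, at most", largest, "bricks each")
```

In the real application each group would get its own parent node, and flattenStrong() would be called on that node rather than on render. The best cluster size has to be found by experiment.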
Azraiyl is actually atypical. Most users would just create thousands of individual bricks, and not realize that you can’t render thousands of objects in any engine. They would never think to look for a tool like “flattenStrong,” because they don’t know they’re doing anything wrong. So then they see their program running slowly, they don’t see anything wrong in their own code (that they know). So they start wondering if the engine is to blame.
If you are actively looking for reasons to distrust an engine, you can always find them. In panda’s case, there are several:
First, panda uses python, and python is a slow language. Not that that actually matters — all the important code is in C++. But this isn’t about rational evaluation, it’s about people’s expectations.
Second, ogre is more popular than panda. People tend to assume that an engine which is popular must be good, and that an engine which is less popular is less good. Reasonable assumption, but not always right.
Third, panda is mostly known for toontown, which is a kids’ game. Rational or no, people assume that an engine used to create toys must be a toy. That’s illogical, but people aren’t logical.
Combine all those pseudo-rational reasons to distrust panda, with a newbie who’s actively looking for an excuse to blame the engine, and you’ve got a bad situation.
First of all, I would like to thank you for all the answers. A call to render.analyze() reveals that the bricks are now merged together. Now I have to find an algorithm that can construct a tree with a good speed/flexibility tradeoff. (I will post a pstats screenshot later; the graphics card I have here is not the same one.)
root = render.attachNewNode('root')
for x in range(-9, 9 + 1):
    for y in range(-7, 7 + 1):
        for z in range(0, 7 + 1):
            brick = loader.loadModel("brick.bam").getChild(0)
            brick.reparentTo(root)
            brick.setPos(x, y, z)
root.flattenStrong()
I have only one more question. In my example the root node is completely visible (the volume drawn by showBounds() is inside the frustum). When the root node is invisible, Panda3D does not care about its children, but why does Panda3D try to cull the children if the parent is completely inside the frustum?
Why did I think that Panda3D was slow? On the “Panda3D Features” page there is, unlike most other engines, no word about speed. Therefore I thought the developers do not care about it.
But with an application as simple as mine I think that the speed of Python should not matter because I only define a scene. The render loop doesn’t have to call a single line of Python code.
OT: I am glad that I have one more reason to stay with Panda3D. My personal opinion: these days it is impossible to create a game like Unreal XY, Half Life, Neverwinter Nights etc… with 3 or 4 people. But all these games have one thing in common. The ideas and concepts they use are old. They all only care about graphics (e.g. the AI is often as dumb as ever). With Panda3D you can care about fancy ideas and the like. That is not the case with other OpenSource engines IMHO (they are limited to BSP levels, care only about graphics, write/compile/run cycle needs too much time, useless documentation, …).
In panda, the “cull” pass does quite a few things other than just culling. The cull pass is where it computes node transforms, accumulates render attributes, queues the model for rendering, does bin-sorting, and so forth. So the cull pass has to visit all objects that are going to be rendered.
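To see why the per-node work grows with the number of nodes even when everything is visible, here is a toy model of a traversal that accumulates net positions. It is not Panda3D's cull traverser; the Node class and traverse function are hypothetical stand-ins that only illustrate "every rendered node has to be visited".

```python
class Node:
    """Toy scene-graph node: a local offset plus child nodes (not a Panda3D class)."""
    def __init__(self, offset=(0, 0, 0), children=None):
        self.offset = offset
        self.children = children or []

def traverse(node, parent_pos=(0, 0, 0)):
    """Return (nodes_visited, net_positions) for the subtree under node."""
    # The net position depends on the parent, so each node must be visited.
    net = tuple(p + o for p, o in zip(parent_pos, node.offset))
    visited, positions = 1, [net]
    for child in node.children:
        v, p = traverse(child, net)
        visited += v
        positions += p
    return visited, positions

# One node per brick under a common root (the unflattened case).
brick_nodes = [Node((x, y, z))
               for x in range(-9, 10)
               for y in range(-7, 8)
               for z in range(0, 8)]
unflattened = Node(children=brick_nodes)

# A single node standing in for a fully flattened scene.
flattened = Node()

print(traverse(unflattened)[0])  # 2281
print(traverse(flattened)[0])    # 1
```

The geometry drawn is the same in both cases; only the number of nodes the traversal has to touch differs.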
I’m a little surprised, though, that the cull pass was taking as long as the render pass. I wouldn’t have expected that.
That “flip” time is video sync. Your graphics driver is holding up each frame waiting for the video refresh to begin.
This is a good thing to do when you’re actually running a real application–there’s no point in rendering frames faster than the monitor can display them. But strictly for the purposes of getting your fps as high as possible, you can turn it off.
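For benchmarking only, something like the following in Config.prc should disable waiting on the video refresh. This is a sketch that assumes the standard sync-video variable; the driver's own control panel can still override it.

```
# Benchmarking only: do not wait for the monitor's vertical refresh.
sync-video 0
```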
I have already done this, and I also disabled sync directly in the graphics card driver. Maybe I should analyze it with NVPerfKit, but it looks like I have to recompile Panda3D to select the correct driver (pandadx9 isn’t enough).
Not sure what you mean there. There’s only one graphics driver per video card, isn’t there?
But, yeah, I misspoke. I shouldn’t have asserted that the flip time is definitively wait-on-video-sync time. It might also be time waiting for the graphics card to finish processing the previous frame’s drawing commands, which is likely to be the case here (especially if you’re confident you’ve turned off video sync).
In that case, you’re now rendering as fast as your card can go: Panda’s waiting on the graphics card, not vice-versa. Though there is one more trick you can try. You can put:
sync-flip 0
in your Config.prc, which will delay the flip operation for as long as possible, and should enable just a hair more parallelization between the CPU and the graphics card.
It’s also illogical if they think that games made for an older audience are not also toys.
To be fair, though, even if the user is a newbie (like me) who doesn’t really know how to optimize the engine, different engines do have different performance levels and the only way to know how two engines compare is to test them.
Maybe someone is interested in how I do it now. One requirement was that one should be able to hide/show individual layers (divided by greenish planes). When the application starts it creates a NodePath for each layer. Then it creates a NodePath for each cluster inside this layer. A cluster (blueish cubes) consists of bricks. When a “level” is loaded these clusters are populated with bricks.
Without any further optimisation the scene can be large (50000+ bricks) but I only get 25 FPS. When I call flattenStrong on each cluster I get ~200 FPS. If the user wants to modify a particular cluster (remove or modify a brick) I recreate the cluster, modify it and call flattenStrong once more. When I want to hide a layer I simply call hide() on the appropriate NodePath.
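The recreate-and-flatten step could be organised roughly like this. It is only a sketch of the bookkeeping, in plain Python: ClusterStore and its rebuild callback are hypothetical names, and in the real application the callback would recreate the cluster's NodePath from the brick list and call flattenStrong() on it.

```python
class ClusterStore:
    """Keeps the unflattened brick list per cluster so a cluster can be rebuilt."""

    def __init__(self, size, rebuild):
        self.size = size
        self.rebuild = rebuild  # callback: rebuild(key, brick_positions)
        self.bricks = {}        # cluster key -> set of brick positions

    def key_for(self, pos):
        # Floor division buckets positions into cube-shaped clusters.
        return tuple(c // self.size for c in pos)

    def add(self, pos):
        key = self.key_for(pos)
        self.bricks.setdefault(key, set()).add(pos)
        self.rebuild(key, self.bricks[key])  # only this cluster is rebuilt

    def remove(self, pos):
        key = self.key_for(pos)
        self.bricks.setdefault(key, set()).discard(pos)
        self.rebuild(key, self.bricks[key])

# Stand-in callback that just records which cluster would be rebuilt.
rebuilt = []
store = ClusterStore(size=4, rebuild=lambda key, bricks: rebuilt.append((key, len(bricks))))
store.add((0, 0, 0))
store.add((1, 0, 0))
store.add((5, 0, 0))
store.remove((1, 0, 0))
print(rebuilt)
```

Because flattening merges the bricks, the original positions have to be kept somewhere outside the scene graph; here that is the bricks dictionary.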
Still reading? I’d be happy if anyone has any ideas to improve it. On request I’ll put together an example.