render 1000+ objects

Azraiyl · September 12, 2007, 6:06pm

Hi,

I’m trying to write a small application which lets you build worlds out of simple bricks (no more than 100 triangles each). When I arrange them in a quadtree-like structure, Panda can cull away the invisible bricks fast enough. What I wonder about is the raw performance. When I write a simple app where all bricks are visible …

for x in range(-9, 9 + 1):
   for y in range(-7, 7 + 1):
      for z in range(0, 7 + 1):
         brick = loader.loadModelCopy("brick.bam")
         brick.reparentTo(render)
         brick.setPos(x, y, z)
render.flattenStrong()

… I got about 50 fps (no textures, no shader, nothing). Is this normal?

At first I thought this was awfully slow (~13e6 triangles per second, far less than my card is able to draw). Then I switched to Irrlicht, started to write the same thing in C++ and got 17 fps.
Any ideas on how to improve a piece of code that draws a lot of simple objects?
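
The throughput figure quoted above can be roughly checked from the loop bounds in the snippet. This is only a back-of-the-envelope sketch: the exact triangle count of brick.bam is not given in the thread, so the stated upper bound of 100 triangles per brick is used as an assumption.

```python
# Brick grid from the loops in the post: x in -9..9, y in -7..7, z in 0..7.
bricks = len(range(-9, 9 + 1)) * len(range(-7, 7 + 1)) * len(range(0, 7 + 1))

TRIANGLES_PER_BRICK = 100  # assumption: the stated upper bound per brick
FPS = 50                   # frame rate reported in the post

triangles_per_frame = bricks * TRIANGLES_PER_BRICK
triangles_per_second = triangles_per_frame * FPS

print(bricks)                # 2280
print(triangles_per_second)  # 11400000
```

Under these assumptions the scene holds 2280 separate brick nodes and pushes about 11.4e6 triangles per second, the same order of magnitude as the ~13e6 quoted.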

Thanks in advance, Azraiyl

BTW Well done Panda3D guys. Never thought that Panda3D can compete with others on raw numbers.

Edit:

forgot to add a pstat screeny.

Josh_Yelon · September 12, 2007, 7:09pm

Turn on directtools, which is basically a scene graph browser. It will let you see whether or not flattenStrong() really flattened the scene graph. Let us know.

drwr · September 12, 2007, 7:13pm

That’s a lot of time spent in cull. That’s the overhead to traverse all of the individual nodes; so your cull time is large because you have lots of nodes in your graph.

The reason you have this overhead is that flattenStrong() is not able to combine the bricks into a single node, because each brick was loaded as a separate model, and by default, flattenStrong() will not cross model file boundaries.

To improve the performance of flattenStrong(), try this variant:

brick = loader.loadModel("brick.bam").getChild(0)

Note that:

(a) loadModelCopy() is deprecated; you should use loadModel() instead (it does the same thing)

(b) calling flattenStrong() will combine all the bricks into, ideally, a single node, which is optimal when all bricks are visible, but defeats the culling properties of your quadtree-like structure. Ideally, you will probably want to call flattenStrong() at the cullable unit level, where the ideal number of bricks in each cullable unit depends heavily on your particular graphics card.

(c) Even if you do want to flatten all the bricks into a single node, in general, it’s probably better to assemble all of the bricks under a new node, not render, so you don’t have to call flattenStrong() on render itself. Calling flattenStrong() on render is often asking for trouble, since your camera and mouse controls are under render too.

(d) Why wouldn’t you have thought that Panda3D could compete with other graphics engines on raw numbers?

David

mindstormss · September 12, 2007, 10:41pm

Not to hijack the thread, but I found that most other programming forums view Panda as slow and old, a view for which I cannot find any basis.

Josh_Yelon · September 13, 2007, 3:03am

Here’s what I think.

Azraiyl is actually atypical. Most users would just create thousands of individual bricks, and not realize that you can’t render thousands of objects in any engine. They would never think to look for a tool like “flattenStrong,” because they don’t know they’re doing anything wrong. So then they see their program running slowly, they don’t see anything wrong in their own code (that they know). So they start wondering if the engine is to blame.

If you are actively looking for reasons to distrust an engine, you can always find them. In panda’s case, there are several:

First, panda uses python, and python is a slow language. Not that that actually matters — all the important code is in C++. But this isn’t about rational evaluation, it’s about people’s expectations.

Second, ogre is more popular than panda. People tend to assume that an engine which is popular must be good, and that an engine which is less popular is less good. Reasonable assumption, but not always right.

Third, panda is mostly known for toontown, which is a kids’ game. Rational or no, people assume that an engine used to create toys must be a toy. That’s illogical, but people aren’t logical.

Combine all those pseudo-rational reasons to distrust panda, with a newbie who’s actively looking for an excuse to blame the engine, and you’ve got a bad situation.

Manakel · September 13, 2007, 7:28am

My own two cents…

I personally do the same for any engine I try.
Maybe Azraiyl does too.

I create an absurdly big and complex scene (much like the real world) with a huge number of objects, a very flat hierarchy, and no optimisation.

At this point I don’t mind if every engine has low fps.
Then I try to optimize my scene with the engine’s tools.
That’s where you find out if the engine is good.

In any case, I’m not sure we can compare Ogre and P3D. P3D is a full engine, Ogre is dedicated to rendering only…

Azraiyl · September 13, 2007, 10:59am

First of all I would like to thank you for all the answers. A call to render.analyze() reveals that the bricks are merged together now. I now have to find an algorithm which is able to construct a tree with a good speed/flexibility tradeoff. (I will post a pstat screeny later; the graphics card I have here is not the same.)

root = render.attachNewNode('root')
for x in range(-9, 9 + 1):
   for y in range(-7, 7 + 1):
      for z in range(0, 7 + 1):
         brick = loader.loadModel("brick.bam").getChild(0)
         brick.reparentTo(root)
         brick.setPos(x, y, z)
root.flattenStrong()

I have only one more question. In my example the root node is completely visible (showBounds is inside the frustum). When the root node is invisible, Panda3D does not care about its children, but why does Panda3D try to cull the children if the parent is completely inside the frustum?

Why did I think that Panda3D is slow? On the “Panda3D Features” page there is, unlike for most other engines, no word about speed. Therefore I thought the developers do not care about it.
But with an application as simple as mine I think that the speed of Python should not matter, because I only define a scene. The render loop doesn’t have to call a single line of Python code.

OT: I am glad that I have one more reason to stay with Panda3D. My personal opinion: these days it is impossible to create a game like Unreal XY, Half Life, Neverwinter Nights etc… with 3 or 4 people. But all these games have one thing in common. The ideas and concepts they use are old. They all only care about graphics (e.g. the AI is often as dumb as ever). With Panda3D you can care about fancy ideas and the like. That is not the case with other open source engines IMHO (they are limited to BSP levels, care only about graphics, the write/compile/run cycle takes too much time, useless documentation, …).

Josh_Yelon · September 13, 2007, 3:28pm

In panda, the “cull” pass does quite a few things other than just culling. The cull pass is where it computes node transforms, accumulates render attributes, queues the model for rendering, does bin-sorting, and so forth. So the cull pass has to visit all objects that are going to be rendered.

I’m a little surprised, though, that the cull pass was taking as long as the render pass. I wouldn’t have expected that.
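
The point about the cull pass having to visit every object can be illustrated with a toy model. This is a hedged sketch in plain Python, not Panda3D's actual traversal: the `Node` class and `cull_traverse` function are hypothetical names invented for illustration. It only shows that the number of visits grows with the number of nodes, even when nothing is culled away.

```python
class Node:
    """Toy scene-graph node (hypothetical; not Panda3D's PandaNode)."""
    def __init__(self, name, children=None, geoms=0):
        self.name = name
        self.children = children or []
        self.geoms = geoms  # number of drawable pieces stored on this node


def cull_traverse(node, draw_queue):
    """Visit every node below `node`, queue its geometry, return nodes visited."""
    visited = 1
    if node.geoms:
        draw_queue.append(node.name)
    for child in node.children:
        visited += cull_traverse(child, draw_queue)
    return visited


# 2280 separate brick nodes under one root, like the unflattened scene.
unflattened = Node("root", [Node("brick%d" % i, geoms=1) for i in range(2280)])
# The same bricks merged into one node, as an ideal flatten would leave them.
flattened = Node("root", [Node("bricks", geoms=1)])

queue_a, queue_b = [], []
visits_unflattened = cull_traverse(unflattened, queue_a)
visits_flattened = cull_traverse(flattened, queue_b)
print(visits_unflattened, visits_flattened)  # 2281 2
```

In this model the per-node work is paid 2281 times per frame for the unflattened scene and twice for the flattened one, regardless of whether the parent is fully inside the frustum.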

Azraiyl · September 13, 2007, 4:27pm

As promised, here is the screenshot. But the framerate was so unstable that I had to create more bricks (x 10). Nevertheless I still had ~50fps.

drwr · September 13, 2007, 6:12pm

That “flip” time is video sync. Your graphics driver is holding up each frame waiting for the video refresh to begin.

This is a good thing to do when you’re actually running a real application–there’s no point in rendering frames faster than the monitor can display them. But strictly for the purposes of getting your fps as high as possible, you can turn it off.

To do this, put:

sync-video 0

in your Config.prc file.

Azraiyl · September 13, 2007, 7:40pm

I have already done this and also disabled sync directly in the graphics card driver. Maybe I should analyze it with NVPerfKit, but it looks like I have to recompile Panda3D to select the correct driver (pandadx9 isn’t enough).

drwr · September 13, 2007, 9:36pm

Not sure what you mean there. There’s only one graphics driver per video card, isn’t there?

But, yeah, I misspoke. I shouldn’t have asserted that the flip time is definitively wait-on-video-sync time. It might also be time waiting for the graphics card to finish processing the previous frame’s drawing commands, which is likely to be the case here (especially if you’re confident you’ve turned off video sync).

In that case, you’re now rendering as fast as your card can go: Panda’s waiting on the graphics card, not vice-versa. Though there is one more trick you can try. You can put:

sync-flip 0

in your Config.prc, which will delay the flip operation for as long as possible, and should enable just a hair more parallelization between the CPU and the graphics card.

David

T_Rex · September 14, 2007, 12:55pm

It’s also illogical if they think that games made for an older audience are not also toys.

To be fair, though, even if the user is a newbie (like me) who doesn’t really know how to optimize the engine, different engines do have different performance levels and the only way to know how two engines compare is to test them.

Azraiyl · September 16, 2007, 4:41pm

Maybe someone is interested in how I do it now. One requirement was that one should be able to hide/show individual layers (divided by greenish planes). When the application starts it creates a NodePath for each layer. Then it creates a NodePath for each cluster inside this layer. A cluster (blueish cubes) consists of bricks. When a “level” is loaded these clusters are populated with bricks.
Without any further optimisation the scene can be large (50000+ bricks) but I only get 25 FPS. When I call flattenStrong on each cluster I get ~200 FPS. If the user wants to modify a particular cluster (remove or modify a brick) I recreate the cluster, modify it and call flattenStrong once more. When I want to hide a layer I simply call hide() on the appropriate NodePath.
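
The layer/cluster bookkeeping described above could be sketched roughly as follows. This is plain Python with no Panda3D calls, and it is an assumption-laden illustration rather than the code from the example: `CLUSTER_SIZE`, `cluster_key` and `build_clusters` are hypothetical names, and one layer per z value is assumed.

```python
from collections import defaultdict

CLUSTER_SIZE = 4  # hypothetical: bricks per cluster edge in x and y


def cluster_key(x, y, z):
    """Map a brick position to (layer, cluster_x, cluster_y)."""
    # Assumes one layer per z value; floor division keeps negative
    # coordinates in consistently sized clusters.
    return (z, x // CLUSTER_SIZE, y // CLUSTER_SIZE)


def build_clusters(positions):
    """Group brick positions by the cluster they belong to."""
    clusters = defaultdict(list)
    for pos in positions:
        clusters[cluster_key(*pos)].append(pos)
    return clusters


positions = [(x, y, z)
             for x in range(-9, 10)
             for y in range(-7, 8)
             for z in range(0, 8)]
clusters = build_clusters(positions)

# Editing one brick only touches the cluster that contains it.
dirty = cluster_key(3, -2, 5)
print(len(clusters), dirty, len(clusters[dirty]))
```

Each key would correspond to one cluster NodePath parented under its layer's NodePath; only the cluster identified by `dirty` would need to be rebuilt and flattened again after an edit.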

Still reading? I’d be happy if anyone has ideas to improve it. On request I’ll put together an example.

weihuan · September 18, 2007, 1:12pm

Actually, we need an example to understand it, thanks.

Azraiyl · September 19, 2007, 7:14pm

Give me 3-4 days and I’ll try my best.

Azraiyl

Azraiyl · September 23, 2007, 10:52am

Here it is. Maybe someone should move this thread to showcases. To run the script you have to create an appropriate link to ppython on your own.

l -> load level
f -> flatten level (should run faster afterwards)
0-9 -> show/hide layers
o -> oobe mode (move around camera)
a -> render.analyze

http://rapidshare.com/files/57645508/brickworld.zip.html

There are variables (like OPTIMIZE) and pieces of commented code which change the behaviour of the example.

rdb · September 24, 2007, 3:46pm

Could you perhaps host it somewhere else, on a free filehost? I’m over my free rapidshare limit.

Azraiyl · September 24, 2007, 7:30pm

files-upload.com/files/519838/brickworld.zip

weihuan · September 27, 2007, 2:28am

The link files-upload.com/files/519838/brickworld.zip gives a “404 Not Found” error. Can you move it to another host, Azraiyl? Thank you :)