Actor instancing sample

birukoff · April 3, 2009, 1:34pm

If you have played Left 4 Dead, you might wonder how it manages to show dozens of detailed, animated zombies at the same time. The answer is instancing.
This is a very powerful technique that lets you display dozens of characters onscreen at much better framerates than if you create a unique actor for each of them. Your CPU needs to animate only some of the actors, while the other “clones” just share the same animation (the disadvantage is that all instances of an actor have the same model and show the same animation at the same frame).
I have made this small sample to demonstrate the technique. For example, on my ancient laptop I can have 100 visible actors onscreen (10 unique actors with 10 instances each) at 60 fps, whereas 100 unique actors onscreen is more than the laptop can handle.
Please enjoy, and share your feedback.

from pandac.PandaModules import *
from direct.actor.Actor import Actor
from direct.gui.DirectGui import OnscreenText
import random, sys

# General settings
use_multiple_colors = True
use_multiple_textures = False

screenWidth, screenHeight = 500, 500

# Apply settings and run Panda
loadPrcFileData("", "win-size %d %d" % (screenWidth, screenHeight))
loadPrcFileData("", "sync-video #f")
#loadPrcFileData("", "want-pstats #t")
#loadPrcFileData("", "hardware-animated-vertices #t")
import direct.directbase.DirectStart

base.setBackgroundColor(0, 0, 0.2)
base.setFrameRateMeter(True)
base.disableMouse()

# Helper function
def addInstructions(pos, msg):
    return OnscreenText(text=msg, style=1, fg=(1,1,1,1), mayChange=1,
                        pos=(-1, pos), align=TextNode.ALeft, scale = .05,
                        shadow=(0,0,0,1), shadowOffset=(0.1,0.1))

# Readout and instructions
inst1 = addInstructions(0.95, 'Left/Right Arrow : decrease/increase the number of actors (currently: unknown)')
inst2 = addInstructions(0.90, 'Down/Up Arrow : decrease/increase the number of clones (currently: unknown)')
inst3 = addInstructions(0.85, 'Visible actors onscreen: unknown')
inst4 = addInstructions(0.80, 'Space : pause/resume animation')
inst5 = addInstructions(0.75, 'Enter : print render.ls() info')
inst6 = addInstructions(0.70, 'Escape : quit')

# Setup camera
base.camera.setPos(0, 0, 10)
base.camera.lookAt(0, 0, 0)
lens = OrthographicLens()
lens.setFilmSize(screenWidth/10, screenHeight/10)
base.cam.node().setLens(lens)

# The actual script
actors = []
instances = {}
instances_for_actor = 1
animPlaying = True

class Instance():
    def __init__(self, actor, color=None, tex=None):
        self.placeholder = render.attachNewNode("placeholder")
        x = (random.random()-0.5) * screenWidth/10
        y = (random.random()-0.5) * screenHeight/10
        self.placeholder.setPos(x, y, 0)
        self.instanced_model = actor.instanceTo(self.placeholder)

        # To set different color on the instanced actor, apply setColor()
        # method on the parent of the instance:
        if color:
            self.placeholder.setColor(*color)

        # To add/replace the actor's texture, apply setTexture() method
        # on the parent of the instance:
        if tex:
            self.placeholder.setTexture(tex, 1)

    def destroy(self):
        self.instanced_model.detachNode()
        self.placeholder.detachNode()

def _newActor():
    actor = Actor("samples/Roaming-Ralph/models/ralph",
                {"run":"samples/Roaming-Ralph/models/ralph-run"})
    actor.flattenStrong()
    actor.postFlatten()
    if animPlaying:
        actor.loop("run")
    return actor
def _addInstance(actor):
    # In this sample I use random color and texture for each instance.
    color = None
    if use_multiple_colors:
        color = (random.random(), random.random(), random.random())
    tex = None
    if use_multiple_textures:
        textures = ('tex1.png', 'tex2.png') # list of available textures
        tex = loader.loadTexture(random.choice(textures))
    instance = Instance(actor, color=color, tex=tex)
    instances[actor].append(instance)
def _removeInstance(instance):
    instance.destroy()

def addActor():
    global actors, instances, instances_for_actor
    actor = _newActor()
    actors += [actor]
    instances[actor] = []
    for i in range(instances_for_actor):
        _addInstance(actor)
    updateReadout()
def removeActor():
    global actors, instances, instances_for_actor
    if len(actors) == 1:
        return
    actor = actors.pop()
    for instance in instances[actor]:
        _removeInstance(instance)
    del instances[actor]
    actor.cleanup()
    actor.removeNode()
    updateReadout()

def addInstances():
    global actors, instances, instances_for_actor
    instances_for_actor += 1
    for actor in actors:
        _addInstance(actor)
    updateReadout()
def removeInstances():
    global actors, instances, instances_for_actor
    if instances_for_actor == 1:
        return
    instances_for_actor -= 1
    for actor in actors:
        _removeInstance(instances[actor].pop())
    updateReadout()

def updateReadout():
    global actors, instances_for_actor, inst1, inst2, inst3
    inst1.setText('Left/Right Arrow : decrease/increase the number of unique actors (currently %d)' % len(actors))
    inst2.setText('Down/Up Arrow : decrease/increase the number of clones (currently %d for each actor)' % instances_for_actor)
    inst3.setText('Visible actors onscreen: %d' % (len(actors) * instances_for_actor))

def toggleAnimation():
    global animPlaying, actors
    if animPlaying:
        for actor in actors:
            actor.stop()
        animPlaying = False
    else:
        for actor in actors:
            actor.loop("run", restart=0)
        animPlaying = True

# Add the first actor
addActor()

# Controls
base.accept("enter", render.ls)
base.accept("escape", sys.exit)

base.accept("space", toggleAnimation)

base.accept("arrow_right", addActor)
base.accept("arrow_left", removeActor)
base.accept("arrow_up", addInstances)
base.accept("arrow_down", removeInstances)

run()

EDIT: To make it work, just put any two textures (tex1.png and tex2.png) in the same folder as this sample. Make sure they use different colors so the difference is clearly visible; for example, one plain white and the other black.
You will probably want to test texture replacement and color replacement separately, to see the effect of each. To do that, change the variables “use_multiple_colors” and “use_multiple_textures” at the beginning of the script.

rdb · April 3, 2009, 1:59pm

That’s a nice example, thanks for sharing. It shows how to do a crowd fairly easily.

Btw: OpenGL 3.1 has support for GPU instancing. We should maybe consider implementing that in Panda when the NVIDIA+ATI drivers are released.

birukoff · April 3, 2009, 2:52pm

Yes, GPU instancing would be great. But that would require hardware skinning and animation too, I guess, so it would have to go into the shader generator.
GPU instancing is actually already supported by SM4 cards, under both DX and OpenGL, as far as I know (I might be wrong though).

birukoff · April 5, 2009, 12:18pm

I updated the code because I found one very interesting but non-obvious thing: instances can be colored differently, and can even have different textures! Colors and textures should be applied to the parent of the instance (in this sample, self.placeholder in the Instance class), not to the instance itself.
This is very interesting, in fact. For example, you may have two different types of enemies in your FPS (or whatever you make), like “enemy_private” and “enemy_corporal”; as long as they use the same model/geometry and animations (which is perfectly possible in some cases), they can have different textures but share the same Actor! This way, you can have many different enemies without any additional load on the CPU.
If you make a strategy game, it is even more useful: imagine an army of a few hundred characters onscreen (sharing the same model and animation, but with different textures). Your CPU animates only one or two Actors, and the rest are just clones with different textures! The fact that the geometry is the same is hardly noticeable from the large viewing distance typical of strategy games.

astelix · April 5, 2009, 3:22pm

Very good sample, birukoff. I could see right away how differently the two approaches degrade: with unique actors the slowdown is visible after just 10 characters, on my Radeon card as well.

Just one thing is not clear to me: assuming that I use the same animation .egg, am I forced to animate all the clones at once and in sync, or am I free to animate each of them independently?

birukoff · April 5, 2009, 3:50pm

All instances of the same actor will share the same model, the same animation and the same frame of the animation (you can change textures though). As far as I know, DX10 allows different animations for instances but Panda doesn’t support this at the moment.
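To picture why the clones cannot drift apart, here is a toy model in plain Python. It is only a sketch: SharedActor and Instance are hypothetical stand-ins, not Panda3D classes, and the 20-frame animation length is made up. The point is that an instance owns no animation state of its own, so the only way to get clones out of sync is to pay for another animated actor.

```python
class SharedActor:
    """Toy stand-in for one animated actor (hypothetical, not Panda3D's Actor)."""

    def __init__(self, name, num_frames, start_frame=0):
        self.name = name
        self.num_frames = num_frames
        self.frame = start_frame % num_frames

    def advance(self, frames=1):
        # Animated once per tick, no matter how many instances point at it.
        self.frame = (self.frame + frames) % self.num_frames


class Instance:
    """A placement that only references its actor; it owns no animation state."""

    def __init__(self, actor, pos):
        self.actor = actor
        self.pos = pos

    @property
    def frame(self):
        return self.actor.frame


# Two unique actors (assumed 20-frame loop), three visible characters.
ralph_a = SharedActor("ralph_a", num_frames=20)
ralph_b = SharedActor("ralph_b", num_frames=20, start_frame=7)
crowd = [
    Instance(ralph_a, (0, 0)),
    Instance(ralph_a, (3, 1)),
    Instance(ralph_b, (5, 2)),
]

# Only the two actors are advanced; the instances just follow.
for actor in (ralph_a, ralph_b):
    actor.advance(5)

frames = [inst.frame for inst in crowd]
```

The two clones of ralph_a always report the same frame, while the clone of ralph_b is offset because it belongs to a different actor that started on another frame.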

birukoff · April 5, 2009, 4:16pm

Out of interest, I tested how many unique actors and how many instances my old laptop can handle at 30 frames per second. Here are the results:
27 different actors @ 30 fps

450 instances of the same actor @ 30 fps

As you can see, it really makes sense to use instancing when you want many characters onscreen.
Test machine: Intel Core Duo @ 1.8 GHz (not Core 2 Duo!), 2 GB RAM, ATI Radeon X1600.
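For a rough feel of what these two numbers imply, here is a back-of-the-envelope sketch. It assumes a simple linear model that I have not verified: every unique actor costs a fixed time to animate, every visible instance costs a fixed time to draw, and nothing else eats into the frame. The function names are hypothetical.

```python
def fit_costs(unique_at_target, instances_at_target, target_fps=30.0):
    """Solve a two-equation linear model for per-actor and per-instance cost.

    Assumed model: frame_ms = actors * animate_ms + visible * draw_ms
    """
    budget_ms = 1000.0 / target_fps
    # Case 1: N unique actors, one instance each:
    #   N * (animate_ms + draw_ms) = budget_ms
    per_unique_ms = budget_ms / unique_at_target
    # Case 2: one actor with M instances:
    #   animate_ms + M * draw_ms = budget_ms
    # Subtracting case 1 (per actor) from case 2 leaves (M - 1) * draw_ms.
    draw_ms = (budget_ms - per_unique_ms) / (instances_at_target - 1)
    animate_ms = per_unique_ms - draw_ms
    return animate_ms, draw_ms


def predicted_fps(actors, instances_each, animate_ms, draw_ms):
    """Framerate the model predicts for a given mix of actors and clones."""
    frame_ms = actors * animate_ms + actors * instances_each * draw_ms
    return 1000.0 / frame_ms


# The two measurements above: 27 unique actors, or 450 instances of one actor.
animate_ms, draw_ms = fit_costs(27, 450)
print("animate one actor: %.3f ms, draw one instance: %.3f ms"
      % (animate_ms, draw_ms))
print("10 actors x 10 instances: %.1f fps (model only)"
      % predicted_fps(10, 10, animate_ms, draw_ms))
```

Under these assumptions, animating an actor costs on the order of 1 ms while drawing one more instance costs well under 0.1 ms, which is the gap that instancing exploits. Real frames have fixed overhead too, so treat the output as an illustration rather than a measurement.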

astelix · April 5, 2009, 4:34pm

It would be very nice though.

By the way, this is a screenshot showing how far I was able to push my worst rig (see my sig below):

~200 @ 30fps

birukoff · April 5, 2009, 4:46pm

I am not sure which is the more limiting factor… On one hand, animation is calculated on the CPU (currently Panda uses software animation, as far as I know), and instancing can help a lot with that. On the other hand, each Ralph has 1700 polygons, so 200 Ralphs = 340k polygons. So maybe the GPU became the bottleneck in your particular case. It would be interesting to take a look at PStats.

ynjh_jo · April 5, 2009, 4:54pm

birukoff:

instances can be colored differently, and can even have different textures! Colors and textures should be applied to the parent of the instance (in this sample, self.placeholder in the Instance class), not to the instance itself.
This is very interesting, in fact. For example, you may have two different types of enemies in your FPS (or whatever you make), like “enemy_private” and “enemy_corporal”; as long as they use the same model/geometry and animations (which is perfectly possible in some cases), they can have different textures but share the same Actor! This way, you can have many different enemies without any additional load on the CPU.
If you make a strategy game, it is even more useful: imagine an army of a few hundred characters onscreen (sharing the same model and animation, but with different textures). Your CPU animates only one or two Actors, and the rest are just clones with different textures! The fact that the geometry is the same is hardly noticeable from the large viewing distance typical of strategy games.

That’s the whole point of instancing, i.e. to have a free slot to hold different states. You can apply a unique TransformState, RenderState (not just ColorAttrib, TextureAttrib, LightAttrib, ShaderAttrib, but also all other *Attribs and *Effects, because at a lower level they are simply attributes), etc.: anything a regular Panda node can hold.
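A toy illustration of that “free slot”, in plain Python rather than Panda3D: the Node class and collect_draws function below are hypothetical, and the model ignores Panda's override priorities (the second argument of setTexture in the sample). One shared child hangs under several parents, and the state that reaches it depends on which path you walk down.

```python
class Node:
    """Toy scene-graph node; a child may be attached under several parents."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})
        self.children = []

    def attach(self, child):
        self.children.append(child)
        return child


def collect_draws(node, inherited=None, path=()):
    """Walk the graph and yield (path, net_state) each time a leaf is reached."""
    net = dict(inherited or {})
    net.update(node.state)  # toy rule: a node's own state is layered on top
    path = path + (node.name,)
    if not node.children:
        yield path, net
    for child in node.children:
        for draw in collect_draws(child, net, path):
            yield draw


root = Node("render")
ralph = Node("ralph", {"model": "ralph"})  # the one shared, animated actor

# Three placeholders, each holding its own color, all pointing at the same child.
for i, color in enumerate(["red", "green", "blue"]):
    placeholder = root.attach(Node("placeholder%d" % i, {"color": color}))
    placeholder.attach(ralph)

draws = list(collect_draws(root))
for path, net in draws:
    print("/".join(path), net)
```

The shared node is stored once but is reached three times, each time with a different net state, which is why the color and texture belong on the placeholder and not on the actor.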

In AlienShooter from Russia (you must know it), there are hundreds of visible aliens onscreen, especially at the last level, the prison. All of them surround and attack the player until I can’t easily see where the player is, because the screen is so full of particles and debris. LOL.

birukoff · April 5, 2009, 6:12pm

PStats shows that the number of geoms grows when spawning instances. Each Ralph has 2 geoms, so the video card has to draw quite a few geoms, which drains performance.
I wonder, is there any way to send all instances to the card as a single geom? Something like RigidBodyCombiner? That would be a great speedup. Maybe David knows…
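To put numbers on it, here is a small sketch of the arithmetic, using the 1700 polygons per Ralph quoted earlier in this thread and the 2 geoms per Ralph from PStats. The function name is just for illustration.

```python
POLYGONS_PER_RALPH = 1700  # figure quoted earlier in this thread
GEOMS_PER_RALPH = 2        # figure reported by PStats


def scene_load(visible_actors,
               polygons_each=POLYGONS_PER_RALPH,
               geoms_each=GEOMS_PER_RALPH):
    """Return (total polygons, total geoms) for a crowd of identical actors."""
    return visible_actors * polygons_each, visible_actors * geoms_each


for count in (100, 200, 450):
    polygons, geoms = scene_load(count)
    print("%4d actors -> %7d polygons in %4d geoms" % (count, polygons, geoms))
```

Instancing leaves both totals untouched: it saves animation work on the CPU, but every clone is still sent to the card as its own geoms.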

rdb · April 5, 2009, 6:16pm

An animated model is not ‘rigid’, so it can’t be combined using the RigidBodyCombiner.

birukoff · April 5, 2009, 6:24pm

Yes, you are right.
On the other hand, all instances are actually copies of the same actor, so maybe it would be possible to send them to the card as one batch of polygons rather than as hundreds of geoms… That is why I thought of something like RigidBodyCombiner.
Because right now, if I make a strategy game or a clone of Alien Shooter, the number of geoms will be huge, and this will significantly affect performance. Wouldn’t it be great to combine all instances into one batch?

rdb · April 5, 2009, 6:26pm

It sure would be awesome to have that.

We could just wait until NVIDIA and ATI release the new OpenGL 3.1 drivers and add support for GPU instancing; that would drastically reduce the number of batches in this case.

birukoff · April 5, 2009, 6:45pm

NVIDIA has released driver 182.47 with OpenGL 3.1 support for cards from the 8000 series on (developer.nvidia.com/object/opengl_3_driver.html). ATI has announced OpenGL 3.1 support in the next release of its driver, but it will not support cards older than the HD 2000 series at all. So hardware instancing is very good, but older cards will not be able to use it. This is why some kind of “software” fallback is also interesting for this functionality.

drwr · April 5, 2009, 6:49pm

You can also use the new flatten-actor technique to flatten all of your many models into a few models. Of course, then Panda has to animate them all on the CPU again, so you lose some of the instancing benefit. There’s always a trade-off.

Then again, if your animation is simple enough, you could set hardware-animated-vertices in your Config.prc file, then you may be able to offload some of the animation onto the GPU.

David

rdb · April 5, 2009, 7:03pm

That’s great, and we could maybe implement that in a later release of Panda.
(However I’ve already found the 180 drivers to be überbuggy - when I ran your sample with 150+ actors, I already got a garbled screen and a complete lockup, I had to revert to 177.)

birukoff · April 6, 2009, 10:29am

Another update: it is now possible to pause/resume animations.
This can help to determine where the bottleneck is. If you pause the animation and the framerate jumps up a lot, the bottleneck is the CPU, which handles the animation.
If the framerate doesn’t change significantly, the bottleneck is either the large number of geoms (most likely) or the large number of polygons handled by the GPU.
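The same rule of thumb as a tiny helper, in case you want to log it. This is only a sketch: the function name is made up, and jump_ratio is an arbitrary threshold for “jumps up a lot” that you would have to tune for your own scene.

```python
def classify_bottleneck(fps_playing, fps_paused, jump_ratio=1.5):
    """Guess the bottleneck from framerates with animation playing vs paused.

    jump_ratio is a hypothetical threshold, not a value taken from Panda3D.
    """
    if fps_playing <= 0 or fps_paused <= 0:
        raise ValueError("framerates must be positive")
    if float(fps_paused) / fps_playing >= jump_ratio:
        # Pausing freed a lot of time, so CPU-side animation was the cost.
        return "cpu-animation"
    # Pausing changed little, so drawing (geoms or polygons) dominates.
    return "geoms-or-polygons"


print(classify_bottleneck(20, 60))  # framerate tripled when paused
print(classify_bottleneck(30, 31))  # framerate barely moved
```

Read the two framerates off the frame rate meter, once with the animation running and once after pressing Space.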

astelix · April 6, 2009, 11:02am

Well birukoff, if you see my rig you’ll understand at once.
By the way, mine was far from being a complaint: with clone instancing I can place 200 puppets and still get a decent frame rate, while if I go the other way I have to stop after 12!!! As a matter of fact, if I stop the animation I get an identical fps.

aurilliance · April 6, 2009, 1:34pm

birukoff, really nice demo! It’s funny, I was reading about instancing in GPU Gems this morning, then saw this post this afternoon lol (mind you, that was GPU-based instancing, I think).

I’ve uploaded my bench test results as part of a semi-formal analysis here: http://aurilliance.p3dp.com/#[[CPU%20Based%20Instancing]]

Nice work again,