setOneShot and RTMCopyRam?

drwr · September 30, 2011, 6:51pm

This message is directed at Craig, who asked about this question on my recent blog post:

I’ve been so far unable to reproduce the reported problem. setOneShot() appears to be working for me in conjunction with RTMCopyRam. Here is some sample code that demonstrates it working–load it up and press “b” to create a buffer and start things going. It should render a scene offscreen and, when the result gets back to RAM, it should save it to tex.png and also display it onscreen. Does this code work for you, or does it fail?

from direct.directbase.DirectStart import *
from panda3d.core import *

def saveTexture(tex, buffer, task):
    if not tex.hasRamImage():
        return task.cont

    print("Got texture")
    tex.write('tex.png')
    cm = CardMaker('card')
    cm.setFrame(-0.5, 0.5, -0.5, 0.5)
    card = aspect2d.attachNewNode(cm.generate())
    card.setTexture(tex)

    base.graphicsEngine.removeWindow(buffer)

    return task.done

def makeBuffer():
    tex = Texture('tex')
    buffer = base.win.makeTextureBuffer('buffer', 256, 256, tex, True)
    buffer.setClearColor((0, 0, 0, 1))
    print("Made %s" % buffer.getType())

    bufscene = NodePath('bufscene')
    bufcam = bufscene.attachNewNode(Camera('bufcam'))
    dr = buffer.makeDisplayRegion()
    dr.setCamera(bufcam)

    m = loader.loadModel('smiley.egg')
    m.reparentTo(bufscene)
    m.setPos(0, 20, 0)

    buffer.setOneShot(True)

    taskMgr.add(saveTexture, 'saveTexture', extraArgs = [tex, buffer], appendTask = True)
    
    
base.accept('b', makeBuffer)
run()

David

zhao · October 1, 2011, 12:57am

David, I have a tangential problem. Opening an offscreen buffer using base.graphicsEngine.makeOutput is crashing under both gl and dx9 when I enable the new threading model using ‘threading-model Cull/Draw’. This is on a GTX 260/WinXP 32-bit machine.

from panda3d.core import loadPrcFileData
#loadPrcFileData('', 'load-display pandadx9' )
loadPrcFileData('', 'load-display pandagl' )
loadPrcFileData('', 'framebuffer-multisample 0')
loadPrcFileData('', 'multisamples 0')
import direct.directbase.DirectStart
from panda3d.core import NodePath, FrameBufferProperties, GraphicsPipe, GraphicsOutput, Texture, WindowProperties

winprops = WindowProperties()
winprops.setSize( 512, 512 )
fbprops = FrameBufferProperties()
fbprops.setColorBits(1)
fbprops.setAlphaBits(1)
objBuffer = base.graphicsEngine.makeOutput(base.pipe, 'hello', -10, fbprops, winprops, GraphicsPipe.BFRefuseWindow, base.win.getGsg(), base.win)

run()

teedee · October 1, 2011, 2:30am

I believe I have the same problem with the buffers.
Is this the sort of error you get?

Traceback (most recent call last):
  File "C:\work\sp1\Panda3D\direct\showbase\ShowBase.py", line 1719, in __igLoop
    self.graphicsEngine.renderFrame()
AssertionError: pipeline_stage == 0 at line 667 of c:\work\panda3d\panda\src\display\displayRegion.cxx
:task(error): Exception occurred in PythonTask igLoop
Traceback (most recent call last):
  File "client.py", line 564, in <module>
    game.run()
  File "C:\work\sp1\Panda3D\direct\showbase\ShowBase.py", line 2644, in run
    self.taskMgr.run()
  File "C:\work\sp1\Panda3D\direct\task\Task.py", line 502, in run
    self.step()
  File "C:\work\sp1\Panda3D\direct\task\Task.py", line 460, in step
    self.mgr.poll()
  File "C:\work\sp1\Panda3D\direct\showbase\ShowBase.py", line 1719, in __igLoop
    self.graphicsEngine.renderFrame()
AssertionError: pipeline_stage == 0 at line 667 of c:\work\panda3d\panda\src\display\displayRegion.cxx

Another curious side-effect is that by setting a window icon, Panda becomes unable to open a window. For example:

loadPrcFileData('', 'icon-filename icon.ico')

Specifically, Python will sit idle with no CPU usage when it hits line 611 of ShowBase.py, the graphicsEngine.makeOutput call.

zhao · October 1, 2011, 3:34am

Yep, same error message. In OpenGL mode, it gives a slightly different one:

:display:gsg:glgsg(error): at 87 of c:\buildslave\dev_sdk_win32\build\panda3d\panda\src\glstuff\glGraphicsBuffer_src.cxx : invalid operation

drwr · October 1, 2011, 5:48pm

OK, it’s true that it doesn’t work in multithreaded mode, and I’m working on that right now.

But I was hoping to answer whether it works in the original, single-threaded mode. Craig’s report seems to suggest that it doesn’t work even in that case. Can anyone else confirm this?

David

teedee · October 1, 2011, 6:46pm

In my case it works just fine in single-threaded mode, albeit slower than a build with threads disabled, which is to be expected.
Edit: By “it” I meant my game, but I also just tried the example you posted, David, and it works as well.

zhao · October 1, 2011, 8:46pm

It works fine for me as well in the single thread mode.

It does throw a warning about pnmimage and transparencies though.

Craig · October 3, 2011, 4:25am

Running your code from the first post on my older Mac, it runs until I hit b, then it freezes for a while and crashes with a bus error:

DirectStart: Starting the game.
Known pipe types:
osxGraphicsPipe
(all display modules loaded.)
Made GLGraphicsBuffer
/var/folders/M4/M4oCK5IpHAucjm-kozc7uk+++TI/Cleanup At Startup/untitled text-339307867.848.command: line 3: 252 Bus error /usr/local/bin/python /private/var/folders/M4/M4oCK5IpHAucjm-kozc7uk+++TI/Cleanup\ At\ Startup/untitled\ text-339307867.730
logout

[Process completed]

Anyway, that’s not the computer I usually use (horrible graphics support), and your code works fine on the machine on which I was producing my issue (a newer Mac).

This modified version, which reuses the same buffer, only works the first time; all repeated attempts never finish. I think this is the issue I was having.

from direct.directbase.DirectStart import * 
from panda3d.core import * 

def saveTexture(tex, buffer, task): 
    if not tex.hasRamImage(): 
        return task.cont 

    print("Got texture")
    tex.write('tex.png') 
    cm = CardMaker('card') 
    cm.setFrame(-0.5, 0.5, -0.5, 0.5) 
    card = aspect2d.attachNewNode(cm.generate()) 
    card.setTexture(tex) 
    
    buffer.clearRenderTextures()

    return task.done 

buffer = base.win.makeTextureBuffer('buffer', 256, 256, Texture(), True) 

def makeBuffer(): 
    tex = Texture('tex') 
    
    
    mode=GraphicsOutput.RTMCopyRam
    buffer.addRenderTexture(tex,mode)
    
    buffer.setClearColor((0, 0, 0, 1)) 
    print("Made %s" % buffer.getType())

    bufscene = NodePath('bufscene') 
    bufcam = bufscene.attachNewNode(Camera('bufcam')) 
    dr = buffer.makeDisplayRegion() 
    dr.setCamera(bufcam) 

    m = loader.loadModel('smiley.egg') 
    m.reparentTo(bufscene) 
    m.setPos(0, 20, 0) 

    buffer.setOneShot(True) 

    taskMgr.add(saveTexture, 'saveTexture', extraArgs = [tex, buffer], appendTask = True) 
    
    
base.accept('b', makeBuffer) 
run()

That should work, right?

drwr · October 3, 2011, 4:32pm

Ah, hmm. This is actually beyond the original intention of setOneShot(), which was to permanently disable the buffer after the first time.

Actually, the original intention of setOneShot() was to delete the buffer after one frame, but that turned out to be problematic, so we eventually softened it into disabling the buffer after one frame, and required the user to explicitly delete it later. But since we have made that change, it makes sense to allow the buffer to be re-used again later if desired. I’ll see about making the necessary changes to support that.
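
To make the intended behaviour concrete, here is a toy model in plain Python. It is not Panda3D code: `ToyOneShotBuffer` and its methods are hypothetical names, and it assumes that re-arming is done by calling the one-shot setter again, as the modified script above does.

```python
class ToyOneShotBuffer:
    """Toy model of a one-shot buffer (hypothetical, not Panda3D's class)."""

    def __init__(self):
        self.active = True
        self.one_shot = False
        self.frames_rendered = 0

    def set_one_shot(self, flag):
        # Assumption: arming one-shot also re-enables a disabled buffer,
        # which is the "reusable" behaviour discussed above.
        self.one_shot = flag
        if flag:
            self.active = True

    def render_frame(self):
        """Render once if active; return True if a frame was rendered."""
        if not self.active:
            return False
        self.frames_rendered += 1
        if self.one_shot:
            # Disable (but do not delete) the buffer after this frame.
            self.active = False
        return True


buf = ToyOneShotBuffer()
buf.set_one_shot(True)
first = buf.render_frame()    # renders, then the buffer disables itself
second = buf.render_frame()   # skipped: the buffer is inactive
buf.set_one_shot(True)        # re-arm for one more frame
third = buf.render_frame()
```

The buffer object survives throughout; in the real engine the user is still expected to remove it explicitly when it is no longer needed.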

David

drwr · October 5, 2011, 2:19pm

FYI, I’ve committed my recent fixes for these issues, and the buildbot server has picked them up. In particular: (1) the above code, and offscreen buffers in general, should now work correctly in threaded mode, and (2) setOneShot() is now reusable.

Please let me know as you find additional issues.

David

teedee · October 6, 2011, 10:13am

I tried it out on the “Teapot on TV” sample and got about a +50% FPS boost, pretty good for a sample that has almost nothing going on!
My game still freezes before it gets to render a frame; I’ll try to make a small example that fails.

teedee · October 26, 2011, 6:37am

I finally got a chance to debug this. Here is some code which causes the threading to fail:

from panda3d.core import *
loadPrcFileData('', 'threading-model Cull/Draw')
from direct.directbase.DirectStart import *

base.wireframeOn()
res_x, res_y = 16, 9
# writers
vdata = GeomVertexData('vdata', GeomVertexFormat.getV3t2(), Geom.UHDynamic)
vtxwriter = GeomVertexWriter(vdata, 'vertex')
texwriter = GeomVertexWriter(vdata, 'texcoord')
# verts
for y in range(res_y):
    for x in range(res_x):
        vtxwriter.addData3f(x, -y, 0)
        vtxwriter.addData3f(x+1, -y, 0)
        vtxwriter.addData3f(x+1, -y-1, 0)
        vtxwriter.addData3f(x, -y-1, 0)
        for i in range(4):
            texwriter.addData2f(0, 0)
# polys
prim = GeomTriangles(Geom.UHStatic)
for i in range(res_x * res_y):
    prim.addVertices(i*4+2, i*4+1, i*4)
    prim.addVertices(i*4+3, i*4+2, i*4)
prim.closePrimitive()
# model
geom = Geom(vdata)
geom.addPrimitive(prim)
node = GeomNode('geom')
node.addGeom(geom)
model = render.attachNewNode(node)

run()

drwr · October 26, 2011, 2:59pm

Ah, I can run the code and reproduce the failure. Deadlock! Curse you, subtle threading issues! Thanks for the sample code.

David

drwr · October 26, 2011, 3:07pm

Ah, I see what’s going on. Not precisely a bug in Panda–the problem is that your GeomVertexWriter objects are still in scope at the time you call run(), so you are still holding the write locks on the vertices, and when the renderer tries to grab the write locks–deadlock.

The simple solution is to add:

del vtxwriter
del texwriter

before your call to run().
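
To see why dropping the writers helps, here is a small stand-alone analogy in plain Python. It is not Panda3D code: `ToyWriter` and `renderer_can_lock` are hypothetical names, and the sketch assumes CPython, where `del` on the last reference runs `__del__` right away.

```python
import threading


class ToyWriter:
    """Hypothetical stand-in for a writer that holds a lock while it exists."""

    def __init__(self, lock):
        self._lock = lock
        self._lock.acquire()      # lock is taken when the writer is created

    def __del__(self):
        self._lock.release()      # and released only when it is destroyed


def renderer_can_lock(lock):
    """Try to take the lock without blocking, as a stand-in for the renderer."""
    if lock.acquire(blocking=False):
        lock.release()
        return True
    return False


vertex_lock = threading.Lock()
writer = ToyWriter(vertex_lock)

held_while_alive = not renderer_can_lock(vertex_lock)  # writer still in scope
del writer                                             # drop the last reference
free_after_del = renderer_can_lock(vertex_lock)

print("blocked while writer alive:", held_while_alive)
print("free after del:", free_after_del)
```

Building the geometry inside a function has the same effect as the explicit `del`, since local variables go out of scope when the function returns.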

David

teedee · October 26, 2011, 5:53pm

Ah yes, that has got it going!
I like how the animation goes into the Cull thread, that really helps even things out.

I am still getting some sort of deadlock in one of the game’s levels that does not happen in the other level I tested, so I guess I can work out the cause from the differences between the levels. Is there an easier way to find out what is causing the lock?

I thought you might be interested in the performance numbers:
Non-threaded build: 40-45 fps
Threaded build WITHOUT threading-model set: 30ish fps
Threaded build WITH threading-model set: 55-65 fps

Supposing I wanted to get the best speed on a single core or multiple cores, would I need two builds of Panda (threaded and non-threaded)?

drwr · October 26, 2011, 6:15pm

Well, you can break into the running Panda with a debugger and see where in the code each thread is stopped; that’s what I did in this case. But that may not tell you much unless you already have a good sense of what the code is supposed to be doing. (I saw that the main thread was waiting for the cull thread to complete, which is normal, and that the cull thread was waiting to access a GeomVertexData, which isn’t normal. That gave me the clue to suspect that the main thread was holding the lock on the GeomVertexData.)

However, if you build a version of Panda with DEBUG_THREADS defined, then it will compile a special version of the threading library that monitors each lock and unlock and attempts to check for deadlock. Not all deadlock conditions can be reported, but many can (this example would have been), and if it detects the deadlock it will tell you what locks were being held by which threads, which may also give you insight. Of course the DEBUG_THREADS version of Panda runs more slowly.
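
As a rough illustration of what lock monitoring can detect, here is a minimal wait-for check in plain Python. It is only a sketch of the general idea, not how DEBUG_THREADS is implemented; `find_deadlock` and the resource name `cull-frame-done` are hypothetical, and the scenario mirrors the one described above (main waits for cull, cull waits for vertex data held by main).

```python
def find_deadlock(holders, waiting):
    """Return a list of threads that form a wait cycle, or None.

    holders: resource name -> thread currently holding it
    waiting: thread -> resource name it is blocked on
    """
    for start in waiting:
        chain = [start]
        current = start
        while current in waiting:
            owner = holders.get(waiting[current])
            if owner is None:
                break                 # nobody holds it: no deadlock here
            if owner == start:
                return chain          # we walked back to where we began
            if owner in chain:
                break                 # cycle not through start; found later
            chain.append(owner)
            current = owner
    return None


# Who holds what, and who is blocked on what.
holders = {"GeomVertexData": "Main", "cull-frame-done": "Cull"}
waiting = {"Main": "cull-frame-done", "Cull": "GeomVertexData"}
cycle = find_deadlock(holders, waiting)

# Without the main thread waiting, there is no cycle.
no_cycle = find_deadlock({"GeomVertexData": "Main"}, {"Cull": "GeomVertexData"})

print("deadlock between:", cycle)
print("second case:", no_cycle)
```

A real monitor would update these tables on every lock and unlock, which is one reason such a build runs more slowly.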

Yes. The fastest possible speed on a single core is always with a single-threaded build. Having threading support available necessarily adds additional overhead to all low-level operations across the board, even if you’re not using threads at the moment.
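
The overhead is easy to get a feel for with a small experiment in plain Python. This is only an analogy under stated assumptions: it times an uncontended `threading.Lock` around a trivial operation, which is not the same as Panda's internal locking, and the function names are made up for the example. Actual timings will vary by machine, so none are quoted here.

```python
import threading
import timeit

counter_lock = threading.Lock()


def add_plain(n):
    """Sum 0..n-1 with no locking at all."""
    total = 0
    for i in range(n):
        total += i
    return total


def add_locked(n):
    """Same sum, but take an uncontended lock around every update."""
    total = 0
    for i in range(n):
        with counter_lock:
            total += i
    return total


N = 100000
plain_seconds = timeit.timeit(lambda: add_plain(N), number=5)
locked_seconds = timeit.timeit(lambda: add_locked(N), number=5)

print("plain:  %.4f s" % plain_seconds)
print("locked: %.4f s" % locked_seconds)
```

Both functions compute the same result on a single thread; the only difference is the cost of taking a lock that nobody else ever wants.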

David

teedee · October 26, 2011, 9:18pm

Thanks, that was helpful. I managed to find a couple more problems related to the vertex data issue. Now it runs for a fair bit longer, but I still get a deadlock.
I tried the thread debug build, and this is what I got:

:thread(error): 

****************************************************************
*****                 Deadlock detected!                   *****
****************************************************************

:thread(error): Thread Cull attempted to lock Mutex  03D9BDE0 which is held by MainThread Main
:thread(error): MainThread Main is blocked waiting on CyclerMutex PandaNode::CData which is held by Thread Cull
:thread(error): Deadlock!
Assertion failed: Deadlock at line 205 of c:\work\panda3d\panda\src\pipeline\mutexDebug.cxx
Assertion failed: node->_prev != NULL && node->_prev->_next == node && node->_next->_prev == node at line 92 of c:\work\panda3d\built\include\linkedListNode.I
Assertion failed: Thread Cull attempted to release Mutex  03D9BDE0 which it does not own at line 369 of c:\work\panda3d\panda\src\pipeline\mutexDebug.cxx
Traceback (most recent call last):
  File "C:\work\panda3d\built\direct\showbase\ShowBase.py", line 1656, in __resetPrevTransform
    PandaNode.resetAllPrevTransform()
AssertionError: Deadlock at line 205 of c:\work\panda3d\panda\src\pipeline\mutexDebug.cxx
:task(error): Exception occurred in PythonTask resetPrevTransform
Traceback (most recent call last):
  File "client.py", line 564, in <module>
    game.run()
  File "C:\work\panda3d\built\direct\showbase\ShowBase.py", line 2648, in run
    self.taskMgr.run()
  File "C:\work\panda3d\built\direct\task\Task.py", line 502, in run
    self.step()
  File "C:\work\panda3d\built\direct\task\Task.py", line 460, in step
    self.mgr.poll()
  File "C:\work\panda3d\built\direct\showbase\ShowBase.py", line 1656, in __resetPrevTransform
    PandaNode.resetAllPrevTransform()
AssertionError: Deadlock at line 205 of c:\work\panda3d\panda\src\pipeline\mutexDebug.cxx

CData, is it collision?

drwr · October 26, 2011, 10:15pm

CData is short for CycleData, which is the internal name for the data that is copied between pipeline stages across all Panda objects. In this case, it’s PandaNode::CData, which means the deadlock is on something in PandaNode: something in the main thread is waiting for a PandaNode lock (held by Cull) at the same time as it’s also holding this unnamed Mutex 03D9BDE0, and in the meantime Cull tried to grab this unnamed Mutex.

That’s deadlock all right, but it’s not very informative since we’re not sure what the unnamed mutex was. The stack trace gives us a bit of a clue, that it happened during resetAllPrevTransform(), but I’m still not sure what the contention is. Bleah.
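
For readers unfamiliar with the idea of data being copied between pipeline stages, here is a toy model in plain Python. It is not Panda's implementation: `ToyCycler` is a hypothetical class, and the three stages (application, cull, draw) are an assumption made for the example.

```python
import copy


class ToyCycler:
    """Toy model: one private copy of an object's data per pipeline stage."""

    def __init__(self, initial, num_stages=3):
        self.stages = [copy.deepcopy(initial) for _ in range(num_stages)]

    def write(self, key, value):
        # Application code only ever modifies the stage 0 copy.
        self.stages[0][key] = value

    def read(self, stage):
        return self.stages[stage]

    def cycle(self):
        # Once per frame, push each copy one stage downstream,
        # working from the last stage back so nothing is overwritten early.
        for stage in range(len(self.stages) - 1, 0, -1):
            self.stages[stage] = copy.deepcopy(self.stages[stage - 1])


node = ToyCycler({"x": 0})
node.write("x", 5)
before = [node.read(s)["x"] for s in range(3)]   # only stage 0 sees the change
node.cycle()
after_one = [node.read(s)["x"] for s in range(3)]
node.cycle()
after_two = [node.read(s)["x"] for s in range(3)]
```

Each stage works on its own copy, so a later stage keeps seeing the older value until the change has been cycled down to it.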

David

teedee · November 6, 2011, 11:08pm

After some digging, it seems this deadlock is related to something in my positional audio update code, and possibly FMOD. The more sounds I have, the more quickly it deadlocks.
My guess would be as a result of many calls to sound.status(), sound.set3dAttributes(), or sound.setVolume(), but who knows.
I’ll try to isolate the sound code and see if it still has a problem on its own. That should make a fairly good demo of the problem.

drwr · November 7, 2011, 12:11am

Ah, OK. I’d be willing to bet that Panda’s FMod audio layer (and its OpenAL audio layer, for that matter) doesn’t properly protect itself against multithreaded access.

Both of these were originally implemented by people who likely weren’t thinking in terms of multithreaded code.

I’ll put it on the list to investigate. In the meantime, one easy way to assure yourself that it is, in fact, strictly related to the audio subsystem is to run with “audio-library-name null” in your Config.prc file.
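
If editing Config.prc is inconvenient, the same setting can probably be applied from the script itself with `loadPrcFileData`, as the earlier samples in this thread do for other variables. A minimal sketch, assuming it runs before DirectStart is imported so the setting is read at startup; the `PRC_OVERRIDE` name and the import guard are only there so the snippet also runs on a machine without Panda3D.

```python
# Config line that selects the null audio library.
PRC_OVERRIDE = 'audio-library-name null'

try:
    from panda3d.core import loadPrcFileData
except ImportError:
    loadPrcFileData = None  # Panda3D not installed; nothing to configure

if loadPrcFileData is not None:
    # Must happen before "from direct.directbase.DirectStart import *".
    loadPrcFileData('', PRC_OVERRIDE)
```

If the deadlock disappears with this in place, that points at the audio subsystem rather than the scene graph.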

David