Layered FBOs, and render-to-texture-array

Just posting my rant about what I’ve been up to for the past few days, to keep this information on the record. This is fairly low-level stuff that probably won’t be all that interesting to most of you.

I’ve worked with TobiasSpringer to add support for render-to-2d-texture-array. You can now attach a texture array to a buffer using add_render_texture. Similar to how rendering to a cube map works, you can create one display region per layer and use set_cube_map_index to indicate which layer you want to render to.
(Yes, set_cube_map_index sounds a bit silly when applied to texture arrays. We should probably rename it to set_layer_index, or something of the sort.)
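
For illustration, here’s a minimal sketch of the per-layer setup; the buffer and camera wiring are my own assumptions, not code taken from the actual change:

Python

from panda3d.core import Texture

# Hypothetical sketch: render into a 4-layer 2D texture array, one display
# region (and hence one FBO) per layer.  'layer_cameras' is assumed to be a
# list of camera NodePaths created elsewhere.
tex = Texture("layers")
tex.setup2dTextureArray(4)

# make_texture_buffer attaches the texture via add_render_texture internally.
buf = base.win.makeTextureBuffer("layered", 256, 256, tex)

for i in range(4):
    dr = buf.makeDisplayRegion()
    dr.setCubeMapIndex(i)           # despite the name, this selects the layer
    dr.setCamera(layer_cameras[i])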

However, this isn’t very economical for rendering to a lot of layers. Panda will have to create one FBO per layer, and issue all of the draw calls for each layer.
To remedy that, I’ve also added a new render-to-texture mode called RTM_bind_layered. Instead of creating one FBO per layer, this creates a single FBO with all of the layers attached. The idea is that you then use a geometry shader and indicate which layer you wish to write to by assigning gl_Layer (or, in Cg, by binding to the LAYER semantic), and the OpenGL implementation will know which layer of the texture to write the fragment to.
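
Attaching a cube map in this mode looks something like this (a sketch; ‘buf’ is assumed to be an existing offscreen buffer):

Python

from panda3d.core import Texture, GraphicsOutput

# Sketch: attach all six faces of a cube map to a single layered FBO.
tex = Texture("cube")
tex.setupCubeMap()
buf.addRenderTexture(tex, GraphicsOutput.RTMBindLayered, GraphicsOutput.RTPColor)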

If you wish to render the same geometry to all layers, you can combine this with geometry instancing using set_instance_count(n_layers), and assigning the value of gl_InstanceID to gl_Layer in the shader. If instancing isn’t supported on the card, you can simply use a “for” loop in the geometry shader to issue the geometry N times, albeit a bit less efficiently.
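
To make that concrete, here is a minimal, untested sketch of the instanced variant. The p3d_* names are Panda’s standard GLSL inputs; ‘node’ stands for whatever NodePath you’re rendering into the buffer:

Python

from panda3d.core import Shader

# Illustrative sketch only: draw the same geometry to every layer in one call.
vert = """#version 330
in vec4 p3d_Vertex;
uniform mat4 p3d_ModelViewProjectionMatrix;
flat out int v_layer;

void main() {
    gl_Position = p3d_ModelViewProjectionMatrix * p3d_Vertex;
    v_layer = gl_InstanceID;  // one instance per layer
}
"""

geom = """#version 330
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
flat in int v_layer[];

void main() {
    for (int i = 0; i < 3; ++i) {
        // Write gl_Layer per vertex; outputs are undefined after EmitVertex().
        gl_Layer = v_layer[0];
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
"""

frag = """#version 330
layout(location = 0) out vec4 fragColor;

void main() {
    fragColor = vec4(1.0);
}
"""

node.setShader(Shader.make(Shader.SLGLSL, vert, frag, geom))
node.setInstanceCount(6)  # n_layers instances; six for a cube map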

This technique is particularly useful for cube map rendering, where you want to draw the same geometry to all faces without incurring the cost of sending the geometry to the GPU six times (once per FBO). I intend to use this when I add point light shadow support to Panda.

Needless to say, you need both render-to-texture support and geometry shader support to use RTM_bind_layered. You can use it without a geometry shader, but then you’ll only be able to draw to the first layer.
Unfortunately, this means that if you want to support a fallback approach, you’ll need two completely different implementations: one shader-based, in which you’re forced to handle the camera transformations in the shader, and one where you create a whole bunch of display regions and set up the appropriate lenses. I don’t think there is an easy way around this that will serve everyone’s needs. (Since the shader code has to select the layer it renders to, a transparent fallback would require the shader generator, but there are many cases where people will want to use their own shaders instead.)

Since you can’t apply texture arrays to a model using the fixed-function pipeline, you previously had to write a shader to sample them. You can now also render texture arrays on a model without writing a shader, by enabling the shader generator. The layer that is sampled is determined by the W texture coordinate. The buffer viewer takes advantage of this to display the layers attached to a buffer on separate cards.
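
For example (a sketch; it assumes the model’s vertex data carries 3-component texture coordinates, so the third component can select the layer):

Python

from panda3d.core import Texture

# Sketch: apply a 2D texture array to a model without writing a shader.
tex = Texture("array")
tex.setup2dTextureArray(4)
# ...fill the layers, or render into them as described above...

model.setTexture(tex, 1)
model.setShaderAuto()  # the generated shader samples the layer from texcoord.w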

While I was at it, I’ve overhauled the FBO code a bit here and there to simplify the RTP_depth_stencil handling, to stop creating buffers that the user didn’t ask for, and to better respect the user’s FrameBufferProperties settings (i.e. set_depth_bits(32) will now actually ask for a 32-bit depth attachment).
(You can now also request a floating-point depth buffer by setting the Texture component type to T_float, but I’m really iffy on the ambiguity of whether the framebuffer storage is defined by the format of the Texture object or by the FrameBufferProperties settings. The fact that Texture does not distinguish between the internal format and the external format isn’t helping to make this less ambiguous. Maybe we can simply add some flags to FrameBufferProperties to signify a floating-point buffer? drwr, any thoughts?)
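
Concretely, the two knobs in question look something like this (the depth format constants are my assumption):

Python

from panda3d.core import FrameBufferProperties, Texture

fbp = FrameBufferProperties()
fbp.setDepthBits(32)  # now genuinely requests a 32-bit depth attachment

# Requesting a floating-point depth buffer through the texture's component
# type, per the paragraph above.
depth_tex = Texture("depth")
depth_tex.setFormat(Texture.FDepthComponent)
depth_tex.setComponentType(Texture.TFloat)
# ...then attach it with add_render_texture to RTP_depth as usual.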

The next step is to take advantage of this functionality to add efficient point light shadow mapping.

This is great stuff! The framebuffer creation has always been a little spotty in Panda, and I’m glad to see it getting an overhaul.

As to the Texture vs. FrameBufferProperties issue, the original design was that FrameBufferProperties would determine the intended properties of the requested framebuffer, and any bound Texture objects would have their properties modified to match the framebuffer. Josh Yelon added some code a while back that reversed this design, so that the framebuffer read some of its intended properties from the bound texture(s), like the request for a floating-point buffer. I’m not opposed to that design in principle, but it is a departure from the original design, and I think we should choose one or the other. I think the original design is ultimately more powerful; so yes, we should add some flags to FrameBufferProperties to signify a floating-point buffer.

David

Not sure if I understand (and that’s perfectly normal and expected), but are you saying that using this sort of magic, one could get away with rendering a dynamic cubemap without the typical divide-your-framerate-by-eight penalty?

@drwr: cool, thanks for clearing that up. One particular problem with our current FBO implementation is that it doesn’t know what the attachments are going to be at the point when open_buffer is called, so we cannot make many useful guesses or predictions about the framebuffer properties at that time. Fixing this would probably require separating out Buffer from BufferContext (as we discussed in the past): one could first create the backend-agnostic Buffer object, attach textures to it, and the graphics renderer would later create the appropriate context based on this information.

@wezu: yes, well, you mean “divide-by-six”, since a cube map has six faces. But yes, that’s the idea: you send the geometry to the GPU once, and you instance the geometry across each face on the hardware side. There’s still a little bit of overhead associated with that, but that should be nowhere near the overhead of doing all those draw calls six times.
(Instead of instancing the geometry six times, you could actually make it a bit smarter and let the geometry shader figure out which face a particular triangle should go to, but that’s a bit more complex, as you need to deal with triangles that overlap face boundaries. It may or may not be worth it; it requires experimentation.)

I noticed that a lot of code out there relies on being able to get a depth buffer or colour buffer without explicitly requesting it in the FrameBufferProperties. As of now, if you want a depth buffer, you need to call set_depth_bits(1), and if you want colour, you need set_color_bits(1), and if you want alpha, you need set_alpha_bits(1).

Of course, if you have a texture attached to RTP_color or RTP_depth, then Panda will automatically set the corresponding bits for you if you haven’t. However, you will still need set_alpha_bits if you want your colour attachment to contain an alpha channel.
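
In other words, code that previously relied on the defaults now needs something like:

Python

from panda3d.core import FrameBufferProperties

fbp = FrameBufferProperties()
fbp.setColorBits(1)  # "at least one bit", i.e. give me a colour attachment
fbp.setAlphaBits(1)  # required if the colour attachment should have alpha
fbp.setDepthBits(1)  # required to get a depth attachment at all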

I’ve changed make_texture_buffer to request all of this, though, so that nothing will change for people who use that method.

Formerly, Panda always assumed the user wanted all of this; I’m trying to make it possible to get a depth-only framebuffer, for instance, in order to speed up shadow mapping.
Actually, there appears to be a slight bug on my Intel HD Graphics card with getting a depth-only FBO, so I’ve changed it for now to always get a colour buffer until I figure this one out.

Sounds like a good improvement. For compatibility with old code (and for driver issues such as on your Intel), it might not be a bad idea to implement a config variable that, when set, causes Panda to follow the old behavior of creating many default attachments.

David

I’d also like to add support for stereo FBOs and rendering to a multiview texture in order to more easily facilitate postprocessing effects on stereo displays (such as the Oculus Rift, which requires a post shader that compensates for chromatic aberration). I think this could simply be implemented like cube map FBOs, by creating separate FBOs for the left and right eyes.

However, I’m not sure how it should be determined at bind time which view of the texture to render into. From what I understand, views 0 and 1 don’t necessarily correspond to left and right. I cannot rely on dr->get_tex_view_offset() at the time I’m binding the texture to the FBO, because this binding is supposed to be done once for the entire buffer, whereas each DR could in theory specify completely different tex view offsets. I need to bind both texture views from the start, and to do that, I need to know which stereo channel to bind to which tex view.

Any thoughts?

Well, we could just assert that if you want to use stereo FBOs in this way, you have to use 0 and 1 as your tex_view_offset values, which wouldn’t be so bad.

But we can keep the flexibility of allowing these to be redefined, by adding GraphicsOutput::set_tex_view_offset(), which will have the default value of 0. Then DisplayRegion::set_stereo_channel() can set the default tex_view_offset to _output->get_tex_view_offset() for the left eye, and _output->get_tex_view_offset() + 1 for the right eye. (I don’t think there’s any reason to support having values that aren’t n and n + 1 for the two eyes; the only reason to support custom values at all is to allow the developer to use the multiview feature for some other purpose as well as for stereo on the same texture.)

It still means the developer has to know that he can’t change the view offset for each DisplayRegion on the buffer independently, but I think that’s a reasonable limitation given the underlying implementation.

David

Thanks, that is sensible. However, after doing some more thinking, I think an even better solution is to re-bind the colour texture on the existing FBO mid-render. I first dismissed that idea thinking it would be inefficient, but I have since learned that switching texture attachments on an FBO is actually faster than binding a new FBO, especially since we only need to re-bind the colour texture and the different texture views have the exact same format.

So unless you think there should be a disconnect between the concept of “tex view that is being rendered to” and “tex view that is being rendered on the geometry”, I can actually implement it without the need for the proposed GraphicsOutput::set_tex_view_offset().

I’ve tried out the method I just proposed, and it works quite well (with exaggerated interocular distance):
rdb.name/glowdemo_stereo.png
This is with minimal changes in FilterManager.

The only caveat with this method is that the left buffer can’t be cleared at the same time as the right buffer, so I had to make minor adjustments to StereoDisplayRegion so that the left colour buffer is cleared by the left DR and the right colour buffer by the right DR. I don’t think this will be a significant performance issue as the right buffer already needs to be cleared individually for the depth/stencil/aux bits.

Excellent!

Reviving this topic a bit.

We have a need to generate a cubemap in one pass for some performance optimizations. RDB, do you have the Panda code that does this? I can’t seem to get it to work with a geometry shader. This is a fairly old feature that I’m pretty sure not a lot of people use, so there might be a bit of code rot as well.

If you’re curious about what I’m currently doing, here’s the entire code block. It’s runnable, but there’s a line in there that causes an assert: dr.setCubeMapIndex(f).

Also, this code doesn’t actually change the model-view-projection matrices within each layer, but the output still shouldn’t be black.

Python

__author__ = 'Bei'
from direct.directbase.DirectStart import *
from panda3d.core import *


class SinglePassCode(object):
    def __init__(self):
        self.sceneRoot = render.attachNewNode("sceneRoot")
        self.eyeCenter = render.attachNewNode("eyeCenter")
        self.screen = render.attachNewNode("screen")

        #composite scene
        smiley = loader.loadModel("misc/smiley")
        smiley.reparentTo(self.screen)
        smiley.setTexGen(TextureStage.getDefault(), TexGenAttrib.MWorldNormal)
        base.cam.node().setScene(self.screen)

        #rendered scene
        s2 = loader.loadModel("misc/smiley")
        s2.reparentTo(self.sceneRoot)
        s2.setScale(50)
        s2.setTwoSided(1)
        s2.setRenderModeWireframe(1)

        #make the cubemap
        self.__setupBufferCubeMap()

        #sets up the shaders
        self.shader = Shader.load(Shader.SLGLSL, "vertex.glsl", "fragment.glsl", "geometry.glsl")
        self.sceneRoot.setShader(self.shader)


    cubeFaces = [
        ("positive_x", LPoint3(1, 0, 0), LVector3(0, -1, 0)),
        ("negative_x", LPoint3(-1, 0, 0), LVector3(0, -1, 0)),
        ("positive_y", LPoint3(0, 1, 0), LVector3(0, 0, 1)),
        ("negative_y", LPoint3(0, -1, 0), LVector3(0, 0, -1)),
        ("positive_z", LPoint3(0, 0, 1), LVector3(0, -1, 0)),
        ("negative_z", LPoint3(0, 0, -1), LVector3(0, -1, 0)),
    ]

    def __setupBufferCubeMap(self):
        """ Creates the cube map texture that will be used to render
        all screens (for each stereo channel). """
        # First, set up the cube map camera rig.
        cubeRig = self.eyeCenter.attachNewNode('cubeRig')
        self.cubeMapCameras = []
        self.cubeMapCameraNPs = []
        self.cubeMapViewMats = []
        self.cubeMapMatrices = []

        # This lens is replicated for each camera.

        for f in range(len(self.cubeFaces)):
            lens = PerspectiveLens()
            lens.setFov(90.0)
            lens.setNearFar(1.0, 1000.0)
            lens.setInterocularDistance(0.0)
            name, lookAt, up = self.cubeFaces[f]
            camera = Camera(name, lens=lens)
            camera.setScene(self.sceneRoot)
            cameraNP = cubeRig.attachNewNode(camera)
            self.cubeMapCameras.append(camera)
            self.cubeMapCameraNPs.append(cameraNP)
            cameraNP.lookAt(lookAt, up)
            self.cubeMapViewMats.append(Mat4(cameraNP.getMat()))
            self.cubeMapMatrices.append(Mat4(lens.getProjectionMat()))

            #disable the camera
            camera.setActive(False)

        #just enable the first one and this is the one that renders the rest in the geometry shader
        self.cubeMapCameras[0].setActive(True)


        # Now the cube map buffers.
        self.cubeMapBufs = []
        self.cubeMapDisplayRegions = []
        self.cubeMapLenses = []

        size = 256
        name = 'cubeMap'
        buf = self.__makeOffscreenBuffer(name, size, size, forCubeMap=True)
        self.cubeMapBufs.append(buf)
        buf.setClearColorActive(False)
        buf.setClearDepthActive(False)
        buf.setClearStencilActive(False)

        #our cube map texture
        tex = Texture(name)
        tex.setupCubeMap()


        for f in range(len(self.cubeFaces)):
            dr = buf.makeMonoDisplayRegion()
            #dr.setClearColorActive(True)
            dr.setClearColor(VBase4(0.0, 0.0, 1.0, 1.0))
            #dr.setClearDepthActive(True)
            dr.setCubeMapIndex(0)
            #hmmm setting this to f causes an assert
            #dr.setCubeMapIndex(f)
            dr.setCamera(self.cubeMapCameraNPs[f])
            self.cubeMapDisplayRegions.append(dr)

        buf.addRenderTexture(tex, GraphicsOutput.RTMBindLayered, GraphicsOutput.RTPColor)

        #set the texture on the object
        self.screen.setTexture(tex, 1)


    def __makeOffscreenBuffer(self, name, sizeX, sizeY, forCubeMap=False):
        flags = GraphicsPipe.BFRefuseWindow | GraphicsPipe.BFSizePower2
        flags |= GraphicsPipe.BFCanBindEvery | GraphicsPipe.BFRttCumulative
        if forCubeMap:
            flags |= GraphicsPipe.BFSizeSquare

        winprops = WindowProperties()
        winprops.setSize(sizeX, sizeY)
        props = FrameBufferProperties()
        props.setRgbColor(1)
        props.setAlphaBits(1)
        props.setDepthBits(1)
        props.setMultisamples(0)
        props.setStereo(0)
        sort = -10

        return base.graphicsEngine.makeOutput(base.pipe, name, sort, props, winprops,
                                              flags, base.win.getGsg(), base.win)


if __name__ == "__main__":
    w = SinglePassCode()
    base.setBackgroundColor(0.2, 0.2, 0.2, 1.0)
    run()

vertex.glsl

#version 400

in vec4 p3d_Vertex;
in vec4 p3d_Normal;
in vec2 p3d_MultiTexCoord0;

uniform mat4  gl_ModelViewProjectionMatrix;

out vec3 vNormal;
out vec2 vTexCoord;

//simple pass-through
void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * vec4(p3d_Vertex.xyz, 1.0);
    vNormal = p3d_Normal.xyz;
    vTexCoord = p3d_MultiTexCoord0.xy;
}

geometry.glsl

#version 400

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;


in vec3 vNormal[3];
in vec2 vTexCoord[3];

out vec3 gNormal;
out vec2 gTexCoord;


void main() {
    for(int j = 0; j < 6; j++){

        gl_Layer = j;
        for(int i = 0; i < gl_in.length(); i++){
            gl_Position = gl_in[i].gl_Position;
            //gNormal = vNormal[i];
            //gTexCoord = vTexCoord[i];
            EmitVertex();
        }
        EndPrimitive();
    }

}

fragment.glsl

#version 400

in vec3 gNormal;
in vec2 gTexCoord;


uniform sampler2D p3d_Texture0;

out vec4 FragColor;

void main()
{
    vec4 albedo = texture2D( p3d_Texture0, gTexCoord );

    FragColor = albedo;
    FragColor.b = 0.0;

}

I have not actually tested thoroughly with cube map textures. I can see results when I add base.bufferViewer.toggleEnable() and replace setupCubeMap() with setup2dTextureArray(6), though, which should be functionally equivalent; perhaps I have simply missed a conditional somewhere that is needed to make this work for cube maps.
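
That is, in your snippet:

Python

tex.setup2dTextureArray(6)        # instead of tex.setupCubeMap()
base.bufferViewer.toggleEnable()  # shows the layers on separate cards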

You shouldn’t use setCubeMapIndex when using layered textures. You should have only one display region, which renders all the layers at once, and your geometry shader should shift the point of view by using an array of six projection matrices or something like that. It doesn’t make sense to use setCubeMapIndex, since the geometry shader decides which layer to write to.
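
Roughly like this, for example (an untested sketch: faceViewProj and the world-space plumbing are names I made up, and p3d_ModelMatrix needs a reasonably recent Panda):

Python

from panda3d.core import Shader, PTA_LMatrix4

vert = """#version 330
in vec4 p3d_Vertex;
uniform mat4 p3d_ModelMatrix;
out vec4 v_worldPos;

void main() {
    // Only transform to world space; the geometry shader projects per face.
    v_worldPos = p3d_ModelMatrix * p3d_Vertex;
}
"""

geom = """#version 330
layout(triangles) in;
layout(triangle_strip, max_vertices = 18) out;

uniform mat4 faceViewProj[6];  // assumed input: world->clip for each face
in vec4 v_worldPos[];

void main() {
    for (int f = 0; f < 6; ++f) {
        for (int i = 0; i < 3; ++i) {
            gl_Layer = f;  // written per vertex, same value for all three
            gl_Position = faceViewProj[f] * v_worldPos[i];
            EmitVertex();
        }
        EndPrimitive();
    }
}
"""

frag = """#version 330
layout(location = 0) out vec4 fragColor;

void main() {
    fragColor = vec4(1.0);
}
"""

# Compose one world->clip matrix per face, e.g. from the view and projection
# matrices you already store (mind Panda's row-vector matrix convention).
mats = PTA_LMatrix4.emptyArray(6)
# for f in range(6): mats[f] = view_mat[f] * proj_mat[f]
self.sceneRoot.setShader(Shader.make(Shader.SLGLSL, vert, frag, geom))
self.sceneRoot.setShaderInput("faceViewProj", mats)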

Also, your vertex shader didn’t compile for me since you redeclared the built-in gl_ModelViewProjectionMatrix.
I also don’t think that Panda binds the fragment shader outputs itself (your FragColor variable), so either use the built-in gl_FragColor or use a layout() qualifier to bind it to the first output yourself.

I’ll have some time to look more deeply into this tomorrow.

Strange. I just tried having a texture array of size 6 with one display region and all I get is the first texture in the texture viewer.

Sorry for the delay in responding, I’ve been too busy to stay updated on the forums.

Are you using the latest version of the texture viewer and the shader generator? The FFP can’t render texture arrays, so I made some changes back then so that they could be viewed through the shader generator, and changed the texture viewer to take advantage of that.