GLSL dithering filter help

Can anyone help me get this dithering filter GLSL shader to work with Panda3D?

devlog-martinsh.blogspot.com/201 … ering.html

from panda3d.core import *
import direct.directbase.DirectStart
from direct.filter.FilterManager import FilterManager

manager = FilterManager(base.win, base.cam)
tex = Texture()
quad = manager.renderSceneInto(colortex=tex)
quad.setShader(Shader.load("shaders/filter.glsl", Shader.SL_GLSL))
quad.setShaderInput("tex", tex)


model = loader.loadModel("panda")
model.reparentTo(render)

base.run()

I’ve set up a simple scene to see the filter in action, but I get this error:

Using deprecated DirectStart interface.
Known pipe types:
  wglGraphicsPipe
(all display modules loaded.)
:shader(error): GLSL shaders must have separate shader bodies!
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    quad.setShader(Shader.load("shaders/filter.glsl", Shader.SL_GLSL))
TypeError: NodePath.set_shader() argument 1 must be Shader, not NoneType

Here’s the slightly modified GLSL shader:

#version 130

uniform sampler2D tex;
float Scale = 1.0;

float find_closest(int x, int y, float c0)
{

int dither[8][8] = {
{ 0, 32, 8, 40, 2, 34, 10, 42}, /* 8x8 Bayer ordered dithering */
{48, 16, 56, 24, 50, 18, 58, 26}, /* pattern. Each input pixel */
{12, 44, 4, 36, 14, 46, 6, 38}, /* is scaled to the 0..63 range */
{60, 28, 52, 20, 62, 30, 54, 22}, /* before looking in this table */
{ 3, 35, 11, 43, 1, 33, 9, 41}, /* to determine the action. */
{51, 19, 59, 27, 49, 17, 57, 25},
{15, 47, 7, 39, 13, 45, 5, 37},
{63, 31, 55, 23, 61, 29, 53, 21} };

float limit = 0.0;
if(x < 8)
{
limit = (dither[x][y]+1)/64.0;
}


if(c0 < limit)
return 0.0;
return 1.0;
}

void main()
{
vec4 lum = vec4(0.299, 0.587, 0.114, 0);
float grayscale = dot(texture2D(tex, gl_TexCoord[0].xy), lum);
vec3 rgb = texture2D(tex, gl_TexCoord[0].xy).rgb;

vec2 xy = gl_FragCoord.xy * Scale;
int x = int(mod(xy.x, 8));
int y = int(mod(xy.y, 8));

vec3 finalRGB;
finalRGB.r = find_closest(x, y, rgb.r);
finalRGB.g = find_closest(x, y, rgb.g);
finalRGB.b = find_closest(x, y, rgb.b);

float final = find_closest(x, y, grayscale);
gl_FragColor = vec4(finalRGB, 1.0);
}
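For what it's worth, the thresholding logic of find_closest() can be sketched in plain Python to see what it does (names here are mine): each channel value is compared against a per-pixel threshold taken from the 8x8 Bayer matrix and snapped to 0 or 1.

```python
# Pure-Python sketch of the shader's ordered-dithering threshold test.
# The 8x8 Bayer matrix and the (value + 1) / 64 threshold mirror the
# find_closest() function above; the names are illustrative.

BAYER_8X8 = [
    [ 0, 32,  8, 40,  2, 34, 10, 42],
    [48, 16, 56, 24, 50, 18, 58, 26],
    [12, 44,  4, 36, 14, 46,  6, 38],
    [60, 28, 52, 20, 62, 30, 54, 22],
    [ 3, 35, 11, 43,  1, 33,  9, 41],
    [51, 19, 59, 27, 49, 17, 57, 25],
    [15, 47,  7, 39, 13, 45,  5, 37],
    [63, 31, 55, 23, 61, 29, 53, 21],
]

def find_closest(x: int, y: int, c0: float) -> float:
    """Quantize channel value c0 (0..1) to 0 or 1 using the Bayer threshold."""
    limit = (BAYER_8X8[x][y] + 1) / 64.0
    return 0.0 if c0 < limit else 1.0
```

Since each value 0..63 appears exactly once in the matrix, a mid-gray input comes out with exactly half the pixels on across each 8x8 tile, which is what creates the dithering illusion.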

According to the manual, you appear to have the parameters to “Shader.load” backwards: you should pass in “Shader.SL_GLSL” first, and the shader file-name second.

Not really.
panda3d.org/reference/1.9.0/ … 88d375d733

But here’s the error message when I do it backwards:

Using deprecated DirectStart interface.
Known pipe types:
  wglGraphicsPipe
(all display modules loaded.)
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    quad.setShader(Shader.load(Shader.SL_GLSL, "shaders/filter.glsl"))
TypeError: an integer is required

I’m assuming Shader.load() expects more than a single shader file (not just a fragment shader) when using that argument order.

I’ve also posted the exact shader code above; I don’t know why I didn’t do so sooner.

GLSL shaders should consist of at least a vertex and a fragment shader, and are loaded like this:

Shader.load(Shader.SL_GLSL, "shaders/filter.vert", "shaders/filter.frag")

It looks like the shader you have is just a fragment shader. You will need a vertex shader that supplies the appropriate inputs, like (untested):

in vec4 p3d_Vertex;
in vec4 p3d_MultiTexCoord0;
uniform mat4 p3d_ModelViewProjectionMatrix;

void main() {
  gl_Position = p3d_ModelViewProjectionMatrix * p3d_Vertex;
  gl_TexCoord[0] = p3d_MultiTexCoord0;
}

hm

Using deprecated DirectStart interface.
Known pipe types:
  wglGraphicsPipe
(all display modules loaded.)
:display:gsg:glgsg(error): An error occurred while compiling GLSL shader shaders/filter.frag:
ERROR: shaders/filter.frag:9: '' : arrays of arrays supported with GLSL 4.3 or GLSL ES31 or GL_ARB_arrays_of_arrays enabled
ERROR: shaders/filter.frag:42: 'finalRGB' : undeclared identifier
ERROR: shaders/filter.frag:42: 'r' :  field selection requires structure, vector, or matrix on left hand side
ERROR: shaders/filter.frag:43: 'g' :  field selection requires structure, vector, or matrix on left hand side
ERROR: shaders/filter.frag:44: 'b' :  field selection requires structure, vector, or matrix on left hand side
ERROR: shaders/filter.frag:47: 'constructor' : not enough data provided for construction

:display:gsg:glgsg(error): Unrecognized vertex attrib 'p3d_ModelViewProjectionMatrix'!

The error indicates the shader requires OpenGL 4.3 features, but the shader has a “#version 130” tag. You can try changing it to “#version 430”, which is the GLSL version corresponding to OpenGL 4.3.

Also, I made an error in my shader: it should be “uniform mat4 p3d_ModelViewProjectionMatrix;” instead of “in mat4 …”.

Thanks.
I seem to be having this issue now:

panda3d.org/manual/index.ph … ge_Filters
There’s a Cg code example to overcome it. How would it be done in GLSL?

You should just set “textures-power-2 none” in Config.prc. Any hardware that supports OpenGL 4.3 will undoubtedly handle non-power-of-two textures well (hardware that doesn’t is rare, nowadays).

Looking again at the manual page to which I linked, the order that you were using appears to be for Cg shaders, not GLSL. I’d missed that there was a difference when using “Shader.load”!

Anyway, you seem to have moved past this issue, of which I’m glad. :slight_smile:

That worked. Perhaps the manual page should be updated with this info?

I am now getting this error message, with or without that Config variable:

:display:gsg:glgsg(error): An error occurred while compiling GLSL shader shaders/filter.vert:
ERROR: 0:? : 'gl_TexCoord' : variable is not available in current GLSL version

:display:gsg:glgsg(error): An error occurred while compiling GLSL shader shaders/filter.frag:
ERROR: 0:? : 'gl_TexCoord' : variable is not available in current GLSL version
ERROR: 0:? : 'gl_TexCoord' : variable is not available in current GLSL version

:display:gsg:glgsg(error): An error occurred while linking GLSL shader program!
Link called without any attached shader objects.

Perhaps the shader files are stored in some kind of cache? The error wasn’t there before and doesn’t go away.

The shader is just coded very sloppily. It probably only worked because the compiler that the author used was particularly lenient.

Here are the fixed shaders, that do work:
filter.vert:

#version 430

in vec4 p3d_Vertex;
in vec2 p3d_MultiTexCoord0;
uniform mat4 p3d_ModelViewProjectionMatrix;

out vec2 texcoord;

void main() {
  gl_Position = p3d_ModelViewProjectionMatrix * p3d_Vertex;
  texcoord = p3d_MultiTexCoord0;
}

filter.frag:

#version 430

uniform sampler2D tex;
in vec2 texcoord;
layout(location=0) out vec4 color;

float Scale = 1.0;

float find_closest(int x, int y, float c0)
{
  int dither[8][8] = {
    { 0, 32,  8, 40,  2, 34, 10, 42}, /* 8x8 Bayer ordered dithering */
    {48, 16, 56, 24, 50, 18, 58, 26}, /* pattern. Each input pixel */
    {12, 44,  4, 36, 14, 46,  6, 38}, /* is scaled to the 0..63 range */
    {60, 28, 52, 20, 62, 30, 54, 22}, /* before looking in this table */
    { 3, 35, 11, 43,  1, 33,  9, 41}, /* to determine the action. */
    {51, 19, 59, 27, 49, 17, 57, 25},
    {15, 47,  7, 39, 13, 45,  5, 37},
    {63, 31, 55, 23, 61, 29, 53, 21} };

  float limit = 0.0;
  if (x < 8) {
    limit = (dither[x][y] + 1) / 64.0;
  }

  if (c0 < limit)
    return 0.0;
  return 1.0;
}

void main()
{
  vec4 lum = vec4(0.299, 0.587, 0.114, 0);
  float grayscale = dot(texture(tex, texcoord), lum);
  vec3 rgb = texture(tex, texcoord).rgb;

  vec2 xy = gl_FragCoord.xy * Scale;
  int x = int(mod(xy.x, 8));
  int y = int(mod(xy.y, 8));

  vec3 finalRGB;
  finalRGB.r = find_closest(x, y, rgb.r);
  finalRGB.g = find_closest(x, y, rgb.g);
  finalRGB.b = find_closest(x, y, rgb.b);

  float final = find_closest(x, y, grayscale);
  color = vec4(finalRGB, 1.0);
}

Thank you.
When using the grayscale value instead of r, g, b I get the exact effect I needed: black-and-white dithered frames.

It’s pretty slow, though; the console also mentions this:

WARNING: Too many temp register is used in Fragment shader, it may cause slow execution.

This is for a volumetric display project similar to this one.
gl.ict.usc.edu/Research/3ddisplay/
These kinds of displays accept 1-bit frames only, about 2000-5000 per second.

The way it’s done is: for each real 24-bit frame, the scene is rendered into FBOs 24 times with a slightly different camera angle, then the 24 renders are combined into a single fake frame to send via HDMI to a programmable DLP projector, where each 24-bit frame is broken back down into the individual 24 frames. So if the GPU refresh rate is set to 120, each second 2880 frames will be sent to the device to create the illusion of a 3d volume.

The display I’m working on looks more like this one: cs.bris.ac.uk/publications/p … 001341.pdf
It is enclosed in an acrylic dome. The aim is to decrease the size of the device to a crystal ball like this old discontinued device and, unlike the previous projects, add some more interactivity (AI) and be realtime, unlike Perspecta.

I’m currently stuck on the Cg code.

That sounds like a cool project!

As for the slow shader: I had no issues on my card, but drivers can vary in how they compile GLSL. It may be because of the dither pattern table; it is possible that your card is trying to load it all into registers. You could switch to a uniform array or a lookup texture to store it, or perhaps try an algorithm that doesn’t require so many constants.

Thanks. I will try on other computers.

How would you render the scene thousands of times each second in Panda, and what is the fastest way to send the data to the shader?

Each 3d volume frame is composed of several 2d “slices”, which are not going to be displayed on the screen but rendered and sent to the high-speed projector.
It works like this: you decide the refresh rate/frame rate of the 3d volume, and the number of 2d slices.
Then take the projection speed, 2880 Hz. Say you’ve decided to go with a refresh rate of 20 Hz. This gives 2880/20 = 144 2d slices (or rather 144*2 slices), which is acceptable.
So the scene has to be rendered 144 times for each “volume frame”, which will itself be updated 20 times each second. For each “volume frame” the scene is the same; you just have the camera rotated a little.
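A quick sketch of this arithmetic in Python (the constants are the ones discussed in this thread):

```python
# Timing arithmetic for the volumetric display, using the numbers above.
PROJECTOR_HZ = 2880     # 1-bit frames the projector can display per second
VOLUME_HZ = 20          # chosen refresh rate of the 3d volume
SWEEP_DEGREES = 180.0   # one volume is built from a half revolution

slices_per_volume = PROJECTOR_HZ // VOLUME_HZ       # slices per volume frame
angle_step = SWEEP_DEGREES / slices_per_volume      # camera rotation per slice
packed_24bit_frames = PROJECTOR_HZ // 24            # 24-bit HDMI frames per second
```

With these values, slices_per_volume is 144, angle_step is 1.25 degrees, and the GPU needs to output 120 packed 24-bit frames per second.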

Whichever method is used for the rendering, each 24 consecutive frames will have to be merged into one 24-bit frame in a shader, and those will be the frames displayed/sent to the projector.
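As an illustration of that merging step, here is a pure-Python sketch of packing 24 one-bit slice values into a single 24-bit pixel. The bit order and names are my own assumptions; real code would do this per pixel on the GPU.

```python
# Sketch: merge 24 one-bit slices into one 24-bit pixel value, one bit per
# slice. The projector later unpacks the 24 slices from each colour frame.

def pack_slices(bits):
    """bits: sequence of 24 ints (0 or 1), one per slice, for one pixel."""
    assert len(bits) == 24
    value = 0
    for i, b in enumerate(bits):
        value |= (b & 1) << i   # slice i lands in bit i of the 24-bit pixel
    return value

def unpack_slice(value, i):
    """Recover slice i's bit from a packed 24-bit pixel value."""
    return (value >> i) & 1
```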

So what would you use for the rendering, and how would you access that data in a shader?

Hmm, I can’t say I am an expert in this field, so my understanding (and thus my ability to help) will be limited. But I can give it a shot.

Rendering 2880 frames in a second on a non-trivial scene sounds very challenging. I would guess that doing all this in separate render passes in Panda could be infeasible. (It seems rather challenging in general; I have no idea how close you’ll be able to come!)

However, there are GPU features that may help you out here. Panda3D 1.9.0 introduces a feature called “layered rendering”, which will allow you to render to different layers of a 3D texture or 2D texture array in a single render pass. It basically issues all the draw calls for the scene once, and then a shader runs N times to render it to different layers. This will most likely be your best bet.

This feature was used to render multiple shadow maps from different light sources during the same pass, as demonstrated here, so this might cover your use case quite well.

You mention that each layer will render from a different angle? This means that with this technique, you would not be able to use effective frustum culling, since Panda only gets to make the draw calls once. You probably can’t afford the performance cost of making 2880 cull passes per second anyway, so this might limit the number of models you can have in a scene. To disable culling, you can set this in Config.prc:

view-frustum-cull false

The way you use this is with your usual render-to-texture set-up, except that the render-texture part will look more like this:

tex = Texture("volume")
tex.setup2dTextureArray(144)
buffer.addRenderTexture(tex, GraphicsOutput.RTM_bind_layered, GraphicsOutput.RTP_color)

Now, you’ll also need to tell Panda to instance the scene 144 times on the GPU, once for each layer. This instancing happens on the GPU, so we only have to send the scene to the GPU once per frame. (Sending the scene 2880 times in a second would almost certainly cripple your performance!)

render.setInstanceCount(144)

Now, the final step is to apply shaders to your scene. The shader is responsible for two things:
(a) Making sure the camera is rotated a bit when rendering each layer.
(b) Making sure each instance of the scene is sent to a different layer.

I’m not sure exactly on your requirements for (a), so I’m not sure what the best way to do this is. You could either calculate a rotation matrix in your shader based on gl_InstanceID, or you could set up a dummy NodePath at each camera angle in the scene, get the matrix from that, and pass that in an array of matrices to the shader.
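For the second approach, the per-instance view rotations could be precomputed on the CPU along these lines (a pure-math sketch with the Panda3D plumbing omitted; the names are mine):

```python
import math

# Precompute one view rotation per instance (per layer) on the CPU, to be
# uploaded to the shader as a uniform matrix array indexed by gl_InstanceID.

NUM_SLICES = 144
ANGLE_STEP = 180.0 / NUM_SLICES   # 1.25 degrees between consecutive slices

def rotation_about_z(deg):
    """3x3 rotation matrix about the vertical axis, row-major."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

# One matrix per instance; instance i views the scene rotated by i * 1.25 deg.
slice_matrices = [rotation_about_z(i * ANGLE_STEP) for i in range(NUM_SLICES)]
```

Whether you multiply this into the view matrix in the shader or bake it into per-instance dummy NodePaths is up to you; either way the data only has to be uploaded once per volume frame.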

(b) is fairly straightforward: you just have to have your shader assign the gl_InstanceID input (the index of the currently rendering instance) to the gl_Layer output (the index of the layer to render it to). There’s a caveat here, though.
If you’re on hardware that supports the GL_AMD_vertex_shader_layer extension, you’re in luck. You can put the assignment in the vertex shader. However, on NVIDIA hardware, gl_Layer is only available in the geometry shader, so you’ll have to have a simple geometry shader just to do this. This is a bit unfortunate as just having a geometry shader will make rendering a bit slower.
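A minimal pass-through geometry shader for this could look something like the following (untested sketch; it assumes the vertex shader forwards gl_InstanceID in an output named instance_id):

```glsl
#version 430

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// Forwarded from the vertex shader: instance_id = gl_InstanceID;
in int instance_id[];

void main() {
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        // Route this instance's geometry to its own texture layer.
        gl_Layer = instance_id[0];
        EmitVertex();
    }
    EndPrimitive();
}
```

The shader does nothing but copy the triangle through and assign the layer, which keeps the geometry-shader overhead as small as possible.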

In your postprocessing pass, note that you need a sampler2DArray to sample your array texture, which takes an integer layer index. Beware that running the postprocessing pass 2880 times per second sounds like it could quickly become a bottleneck. You may be able to let each postprocessing pass process multiple frames at a time, though this will require a lot of experimentation.
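For reference, sampling one layer of the array texture in a GLSL postprocessing fragment shader might look like this (untested sketch; names are illustrative):

```glsl
#version 430

// Read one slice from the layered render target in the postprocessing pass.
uniform sampler2DArray slices;
uniform int slice_index;        // which layer (slice) to read

in vec2 texcoord;
out vec4 color;

void main() {
    // The third texture coordinate selects the array layer.
    color = texture(slices, vec3(texcoord, float(slice_index)));
}
```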

I’m a bit unclear on how each scene will be sent to the display, so I’m not sure exactly how it would work from that point on. Do you have separate displays? Do you need to alternate which frame to actually display, and time this carefully?

Let me know if you have any questions or need specific help, such as with building the shaders, setting up the layered rendering, or optimizing performance. Good luck!

Wow, thank you very much for the help.

I’ll provide some more info.

Scenes are going to be simple. Right now I’m thinking of a single character with AI reacting to movements via 3-4 low-res cameras and motion detection, and maybe simple puzzle games.

I’m thinking of using a GTX 970 Mini GPU in a mini ATX motherboard which can fit in the case of the device.

I have an old prototype volumetric display device I got from some office. Sadly, its hardware does not allow frames to be rendered on a PC/GPU and streamed to the device; it only receives polygon data and does the rendering itself, and its hardware is extremely slow for realtime animation, only good for still 3d volumes that need a few minutes of processing.
i.imgur.com/JPog21E.jpg
But the electromechanical and optical design is the same.

There have been a few university projects since. I was able to contact the developer of this display and he shared some info, even though he didn’t share code. cs.bris.ac.uk/publications/p … 001341.pdf
He said they used FBOs.

A complete volume will be generated from a 180-degree turn, not 360. So if there are 144 slices per volume, the angle difference between slices will be 1.25 degrees, if we go with these fps numbers. (Decreasing the fps will increase flicker but improve the volume quality by increasing the number of slices, and vice versa.)
So we can either rotate a single dummy node or just generate 288 dummy nodes with the correct position/rotation for each. I think the performance difference here is negligible, but I’d go with the 2nd option.

If there’s still some confusion about how this works, here’s an illustration showing the camera’s path.

It is rotated 144 times for each generated 3d volume, in 1.25-degree increments. Since the 3d volume is updated 20 times each second, the camera will need to make twenty 180-degree revolutions per second, so the scene is rotated and rendered 144*20 = 2880 times per second.
Persistence of vision will cause our eyes to see a solid 3d volume, albeit a flickery one.

I can build the electromechanical and optical parts of the device and program the higher-level code (game logic, AI, etc.), but I’m pretty bad at Cg/GLSL.

Fair enough. I hope the Mini version is not too much weaker than the original version?

Hmm, 24 separate buffers? Does he use one for every scene angle? You could certainly try using 144 separate buffers, but I’m not at all convinced that it would be faster than using the method I described that uses a single layered FBO. Switching between FBOs in OpenGL is slow, and you would have to resubmit the scene for every FBO.

The NVIDIA Quadro FX 4800 certainly supported geometry shaders (and thereby gl_Layer) so perhaps he had simply not considered or attempted the layered rendering method.

Fair enough. I doubt the bottleneck will lie here. You can extract the matrix for each with .getMat() and pass this in an array to the shader, which you can use as view matrix.

Makes sense, I think I get the concept now.

GLSL isn’t so hard once you wrap your head around the basic concepts. :slight_smile: We’d be happy to answer your questions if you get stuck.

From the reviews and specs they are almost identical, but because of the smaller number of fans and the smaller heatsink there’s not much room for overclocking.
I haven’t looked at all the options yet; I’m going to ask in the appropriate communities if there are other options.

As for CPU, I think (hope) any modern CPU is not going to be a bottleneck for this.

I just want to keep the case as small as possible. I will probably get a PCI extender and mount it horizontally somewhere above the motherboard just to decrease the height of the device case.

He meant that using 24 offscreen buffers for every group of 24 slices that must be combined into one 24-bit frame is way faster than rendering as much to a window. So yes, he used one for every scene angle, 144 in total, but I don’t know whether they were created and destroyed after sending each frame to the projector or not.

That’s very possible. These kinds of projects require knowledge of electronics, mechanics and programming, and take a long time to complete. It’s possible that the programming knowledge was limited, or they simply didn’t bother to write faster code, as it was perfectly fine for showing what the prototype could do.
I’d be glad if that was the case, means we could have more complex 3d scenes with the method you’re suggesting.

Thank you!

I got a response from one of the devs of another such project from a few years ago.

Does Panda allow access to this OpenGL stuff? I’m not sure what he means, but maybe someone who’s programmed at the OpenGL level will get it. All I know is that a bitwise operator should be pretty fast.

Huh, that’s not a bad idea. We don’t support glLogicOp right now, but we really should! It shouldn’t be too hard to add. I’ll get back to you about this.