Cartoon shader improvements

If you want to take a look, the primary shader generator in Panda is written in C++ and resides in /panda/src/pgraphnodes/shaderGenerator.cxx. The interesting stuff happens in the method synthesize_shader(). For cartoon shading, the relevant part is the light ramping.
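At the user level, the ramping is controlled through LightRampAttrib, which is what synthesize_shader() reads to decide the ramping mode. A minimal sketch (the threshold and level values are illustrative, not tuned):

```python
# Minimal sketch: two-tone light ramping via the shader generator.
from direct.showbase.ShowBase import ShowBase
from panda3d.core import LightRampAttrib

base = ShowBase()
base.render.setShaderAuto()  # per-pixel lighting, so the ramp applies

# Quantize diffuse lighting: intensity below 0.5 -> 0, above -> 0.8.
base.render.setAttrib(LightRampAttrib.makeSingleThreshold(0.5, 0.8))
```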

The postprocessing shader generator is Python-based, located at /direct/src/filter/CommonFilters.py. This generator takes care of the inking step (search for CARTOON_BODY). (On Linux, CommonFilters.py is installed in /usr/share/panda3d/direct/filter/. This is useful for prototyping, as you can simply replace the file and re-run your program.)
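The matching user-level call for the inking looks like this (a minimal sketch; the separation value is illustrative):

```python
# Minimal sketch: enabling the inking step generated by CARTOON_BODY.
from direct.showbase.ShowBase import ShowBase
from direct.filter.CommonFilters import CommonFilters

base = ShowBase()
filters = CommonFilters(base.win, base.cam)
filters.setCartoonInk(separation=1)  # sample separation; larger -> thicker lines
```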

> When I started this, I found the shader generators rather understandable after reading and puzzling over the source for a while.

Actually, me too. It just seemed pretty similar to scientific computing :slight_smile:

Many of the elements are the same: the mathematics (especially numerics and vectorized calculations), the attention to algorithmic speed (number of operations, memory fetches, degree of parallelism), and code that consists mainly of (sometimes very long and logically unsplittable) functions designed to do one task well. Unlike in application programming, the logic is usually simple enough that you can work out whether a given implementation is correct just by reading the source (resorting to pen and paper for the math).

Still, that leaves a lot to learn about the things you gradually pick up through exposure to a particular field - for example, I'd never even imagined a procedural lens flare until ninth posted his shader and a link explaining the technique.

(The same blog (http://john-chapman-graphics.blogspot.co.uk/) has more useful material, including an explanation of SSAO, how to do realtime motion blur, and how to render good-looking spotlight cones in dusty air. There's no index, but it's quick to read through, as there are only 7 entries in total.)

The approximate technique used by ninth is a good find, as it's simple to implement and understand, and it's computationally light. Another approach to generating procedural lens flares is to raytrace the lens system; see e.g. http://resources.mpi-inf.mpg.de/lensflareRendering/pdf/flare.pdf. The results look very impressive, and the authors mention that the raytracing can be done in a vertex shader, but the article was light on details, so this won't be at the forefront of my list of things to try :stuck_out_tongue:

Yes, it will involve additional texture lookups, as indeed does the supersampling. But it might allow for using fewer supersamples.

In the case of supersampling, it would be possible to eliminate some lookups by placing the supersamples so that e.g. the “right” detector point of one supersample coincides with the “left” detector point of the next one, but that drastically reduces the flexibility of supersample placement and complicates the code. And I'm always wary of regular sampling grids, due to their potential for moiré artifacts.
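As a toy illustration of the shared-lookup idea in one dimension (the numbers are made up; only the counting matters):

```python
# Toy sketch: fetch counts for a 1-D row of supersamples, each needing
# a "left" and a "right" detector tap a distance d apart. Placing the
# samples exactly d apart makes adjacent taps coincide, so interior
# fetches are shared. Numbers are illustrative.
d = 1.0                                   # detector separation in texels
n = 8                                     # number of supersamples

def unique_fetches(positions):
    taps = {round(p - d / 2, 6) for p in positions}
    taps |= {round(p + d / 2, 6) for p in positions}
    return len(taps)

free_grid = [1.7 * i for i in range(n)]   # arbitrary placement
shared_grid = [d * i for i in range(n)]   # taps coincide with neighbors

print(unique_fetches(free_grid))    # 16 = 2*n, no sharing
print(unique_fetches(shared_grid))  # 9 = n + 1, interior taps shared
```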

There are also special tricks that are sometimes applicable. For example, it is good to keep in mind that the GPU does bilinear filtering at no extra cost. Hence, one should keep an eye out for places in the algorithm where the result can be computed from a linear combination of two neighboring texels, instead of fetching the two values separately. For this trick in the context of Gaussian blur, see http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/ (link picked from inside the lens flare link posted by ninth).
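The weight/offset merging behind that article is easy to precompute offline. A sketch, using binomial weights as a stand-in for a proper Gaussian (function names are mine):

```python
# Sketch: collapse pairs of discrete Gaussian taps into single bilinear
# fetches, as in the rastergrid article. Offsets are in texels.
from math import comb

def binomial_weights(radius):
    # Normalized row of Pascal's triangle: a cheap discrete Gaussian.
    # Returns [center, w1, w2, ...] for texel offsets 0, 1, 2, ...
    n = 2 * radius
    total = 2 ** n
    return [comb(n, radius + i) / total for i in range(radius + 1)]

def merge_for_linear_sampling(weights):
    # Fetch between texels i and i+1 at an offset chosen so that the
    # hardware's bilinear blend reproduces both weights exactly.
    # (Assumes the off-center taps pair up evenly, as for an even radius.)
    merged = [(weights[0], 0.0)]  # (weight, offset); center tap as-is
    for i in range(1, len(weights) - 1, 2):
        w = weights[i] + weights[i + 1]
        o = (i * weights[i] + (i + 1) * weights[i + 1]) / w
        merged.append((w, o))
    return merged

# A 9-tap blur (radius 4) becomes 5 fetches per direction:
print(merge_for_linear_sampling(binomial_weights(4)))
```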

If it runs on the GPU, probably not. It's true that a binary search reduces the run time of one instance from O(n) to O(log n), but the branching is likely to destroy parallelism.

A general guideline is to avoid if statements in a shader, as they can be slow in Cg. It wasn't said explicitly, but I think that implies that the GPU performs SIMD-type tasks well, while branching requires extra effort; at least from a vectorization viewpoint, that makes sense.
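The same effect shows up in ordinary vectorized code. A NumPy analogy, using a hypothetical two-band ramp purely for illustration:

```python
# Sketch: branchy vs. branchless per-element logic. The branchless form
# is what maps well onto SIMD-style hardware, like step()/lerp() in a
# shader. The two-band ramp here is hypothetical.
import numpy as np

intensity = np.random.default_rng(0).random((512, 512), dtype=np.float32)

# Branchy: one Python-level if per element (serial, slow).
ramped = np.empty_like(intensity)
for idx, value in np.ndenumerate(intensity):
    ramped[idx] = 0.8 if value > 0.5 else 0.2

# Branchless: one vectorized expression over the whole array,
# equivalent to mix(0.2, 0.8, step(0.5, x)) in a shader.
ramped_fast = np.where(intensity > 0.5, 0.8, 0.2).astype(np.float32)

assert np.allclose(ramped, ramped_fast)
```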