eigen


Postby jean-claude » Thu Dec 22, 2011 6:52 pm

Just for my understanding:
"If we go back to an alternative malloc library that can provide 16-byte alignment natively, this waste goes away. I'll run a few more tests to confirm this, then commit this change."

Assuming the use of an alternative malloc library:
(1) What about the /Zp16 option in the compiler?
(2) Can EIGEN_ALIGN16 then still be used as recommended in eigen3?
For instance: typedef Eigen3::Map<Eigen3::Vector4f, Eigen3::Aligned> Vector4fMap;
jean-claude
 
Posts: 384
Joined: Sun Jan 23, 2011 1:41 pm
Location: Paris - France

Postby drwr » Thu Dec 22, 2011 7:23 pm

"About the EGG loader, I just found it strange that the memory it uses sticks around even after the game is loaded up. Shouldn't that memory it uses to convert to BAM be able to be released once it has finished loading?"

Yes, it does, mostly. It doesn't actually get returned to the system--most allocated memory doesn't--but that memory can still be subsequently reused by Panda operations, to a point limited by fragmentation and related problems. But I think the fundamental problem with loading egg files in the runtime client is that once the graphics context has been created, there is much memory already allocated for that purpose, and then the egg loader adds memory on top of that--and it becomes easy to exceed the paltry 2GB limit that Win32 provides.

"What about the /Zp16 option in the compiler?"

This affects only the compile-time packing of structures, which we are already handling correctly with the EIGEN_ALIGN16 and other related definitions. But these kinds of alignment rules all assume that the structures *start* on a 16-byte aligned block of memory, which might not be true for the memory returned by malloc(). (Actually, I've seen differing reports on whether this is true or not for Win32 malloc(), and I just thought it best to assume there exist cases for which it's not true without actually testing it. For instance, maybe it's true for Win7, but not on WinXP; and who wants to go around and test all of the existing Windows versions?)

"Can EIGEN_ALIGN16 then still be used as recommended in eigen3?"

Yes, but again, this refers only to the compiler packing, and it fundamentally assumes that the runtime memory is already aligned (which is what Panda will be responsible for guaranteeing one way or another, especially if you use PANDA_MALLOC() / PANDA_FREE() to manage your memory allocations).
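
To illustrate the difference between compile-time alignment rules and the address that malloc() actually returns, here is a minimal Python sketch of the arithmetic an over-allocating aligned malloc wrapper typically performs. It is not Panda's implementation; the names align_up and aligned_block and the address 0x1008 are hypothetical and exist only for this example.

```python
ALIGNMENT = 16  # bytes, as required by aligned SSE loads/stores


def is_aligned(address, alignment=ALIGNMENT):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def aligned_block(raw_address, size, alignment=ALIGNMENT):
    """Model an over-allocating aligned malloc (hypothetical helper).

    The wrapper requests size + alignment - 1 bytes from the underlying
    allocator and hands out the first aligned address inside that block.
    Returns (usable_address, bytes_requested, bytes_skipped_in_front).
    """
    usable = align_up(raw_address, alignment)
    return usable, size + alignment - 1, usable - raw_address


# Hypothetical address, as if returned by a malloc() that only
# guarantees 8-byte alignment.
raw = 0x1008
usable, requested, padding = aligned_block(raw, 64)
print(hex(usable), requested, padding)
```

With a raw address that is only 8-byte aligned, the wrapper skips 8 bytes and has to request 15 more bytes than the caller asked for; a malloc that already returns 16-byte-aligned blocks would make this padding unnecessary. A real wrapper would also have to remember the raw address so it can free it later; that bookkeeping is omitted here.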

David
drwr
 
Posts: 11425
Joined: Fri Feb 13, 2004 12:42 pm
Location: Glendale, CA

Postby jean-claude » Thu Dec 22, 2011 8:00 pm

OK, so basically Panda directly handles its own aligned_malloc and aligned_free, plus heap management (garbage collection).

I'm asking because, for SSEx optimization with the Intel compiler, I have in some cases redefined new/delete.
But maybe I should merely rely upon Panda_malloc.

BTW:
(1) I refrain from using STL containers (list, set, map, vector, ...) since I've seen wasted memory (sometimes around +40%, I'd say) and a drop in performance.
(2) In some cases, using alloca (i.e. allocating on the stack instead of on the heap) has proven quite efficient.

Postby drwr » Thu Dec 22, 2011 8:39 pm

"But maybe I should merely rely upon Panda_malloc."

You can also inherit from MemoryBase, which gives you a redefined operator new/delete that calls down to PANDA_MALLOC(). Note that this is guaranteed to be 16-byte aligned only when compiling with Eigen.

"(1) I refrain from using STL containers (list, set, map, vector, ...) since I've seen wasted memory (sometimes around +40%, I'd say) and a drop in performance."

There are volumes of opinions across the internet about the pros and cons of STL and its relative inefficiency. It's true it's not the most efficient toolkit, memory-wise or CPU-wise, but it's a decent tradeoff between performance and developer effort. And at this point we're well-committed to relying on STL heavily throughout Panda. ;) In the very inner loops, we are more likely to rely on hand-rolled structures for optimum performance.

"(2) In some cases, using alloca (i.e. allocating on the stack instead of on the heap) has proven quite efficient."

Agreed! I love alloca; it's practically free. It's also unfortunately only occasionally useful due to its nature.

David

Postby drwr » Thu Dec 22, 2011 8:45 pm

Shoot, I was wrong about the source of the 30% memory bloat. It's not related to the malloc scheme at all. I'll have to investigate further. It takes a while to iterate because my build times are so slow now.

David

Postby jean-claude » Fri Dec 23, 2011 8:46 am

BTW, something has puzzled me for a couple of days: I see structures like this being generated at compile time in several places:
Eigen::Matrix<float, 3, 3, 3, 3, 3>

What kind of matrix is this? A five-dimensional matrix? What for?

example:
instantiation of "const Eigen::Transpose<const Derived> Eigen::DenseBase<Derived>::transpose() const [with Derived=Eigen::ReturnByValue<Eigen::internal::inverse_impl<Eigen::Matrix<double, 3, 3, 3, 3, 3>>>]" at line 160 of "panda/src/display/graphicsPipe.cxx"
1>[T4] Building C++ object built/tmp/p3display_composite2.obj

Postby drwr » Fri Dec 23, 2011 1:07 pm

Some of those 3's are actually bitmask options for Eigen's template class. I'm not sure offhand what they all stand for; but this is part of the nature of template libraries: the compiler symbols are really hard to decipher. ;)
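
As a rough decoding aid, the sketch below pairs the six template arguments with the parameter names of Eigen 3's Matrix class template (Scalar, Rows, Cols, Options, MaxRows, MaxCols) and expands the Options bitmask. It assumes the flag values given in the Eigen 3 documentation (RowMajor = 0x1, DontAlign = 0x2, with ColMajor and AutoAlign both 0); the helper names decode_options and describe_matrix are made up for this example.

```python
# Storage option flags, assumed to match Eigen 3's documented enum values.
ROW_MAJOR = 0x1   # bit unset means column-major storage
DONT_ALIGN = 0x2  # bit unset means AutoAlign

PARAMETER_NAMES = ("Scalar", "Rows", "Cols", "Options", "MaxRows", "MaxCols")


def decode_options(options):
    """Translate the Options bitmask into readable flag names."""
    storage = "RowMajor" if options & ROW_MAJOR else "ColMajor"
    alignment = "DontAlign" if options & DONT_ALIGN else "AutoAlign"
    return [storage, alignment]


def describe_matrix(*template_args):
    """Pair each template argument with its parameter name."""
    described = dict(zip(PARAMETER_NAMES, template_args))
    described["Options"] = decode_options(described["Options"])
    return described


print(describe_matrix("float", 3, 3, 3, 3, 3))
```

Under those assumptions, Eigen::Matrix<float, 3, 3, 3, 3, 3> is an ordinary 3x3 float matrix: the fourth 3 is the bitmask RowMajor | DontAlign, and the last two 3's are the maximum row and column counts.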

David

Postby teedee » Thu Jan 05, 2012 3:58 pm

Just wanted to give a little update.
Either by my own accidental doing or by recent changes in Panda, my server app is now using less than 100 MB of memory instead of 900 MB. That is huge.
Using the 3GB patch on Python lets me keep my 32-bit build for development, since the memory eaten by the EGG loader will not push the client over the memory limit. Thanks for that suggestion, jean-claude!
With eigen enabled I am getting a frame rate increase somewhere in the range of 5-10%, basically for free. :)
Did you ever figure out where the memory bloat was coming from? I seem to get about the same memory usage regardless of enabling eigen or not in the build.
teedee
 
Posts: 849
Joined: Tue May 12, 2009 11:33 pm
Location: Kepler-22b

Postby drwr » Fri Jan 06, 2012 2:35 pm

When I looked closer, I didn't find a memory bloat at all. Maybe there's something unique to your particular scene that now causes more memory usage than in previous versions?

Or maybe the memory bloat is in the egg loader only, for instance because we changed that default recursion limit? You can try setting "egg-recursion-limit 10000" to see if it makes a difference.

David

Postby teedee » Fri Jan 06, 2012 9:13 pm

I tried setting it to 1000, 10000, and 1000000. It didn't seem to make any difference in memory usage.
The extra memory from the EGG loader isn't really an issue for me now with the 3GB patch, and it won't affect end users who will have BAM files instead.
I'm not sure if my issue with GLSL shaders is related to these changes or not, but otherwise everything seems to be working fine.

Postby teedee » Thu Jan 19, 2012 3:03 am

There appears to be a problem with OrthographicLens when eigen is used in the build, in that it behaves differently and spits out warnings in some circumstances.
Example code:
from panda3d.core import OrthographicLens
from direct.showbase.ShowBase import ShowBase


class Game(ShowBase):
    def __init__(self):
        """Get the game ready to play."""
        ShowBase.__init__(self)
        self.model = self.loader.loadModel('smiley')
        self.model.reparentTo(self.render)
        # Replace the default lens with an orthographic one.
        self.lens = OrthographicLens()
        self.cam.node().setLens(self.lens)
        # Film size and aspect ratio are set only after the lens is
        # already attached to the camera.
        self.lens.setFilmSize(50)
        self.lens.setAspectRatio(1)


game = Game()
game.run()

The warning (spammed repeatedly):
linmath(warning): Tried to invert singular LMatrix4
