Converting panda CVS repository to Git

Hi everybody,

I’ve experimented a bit with converting the Panda3D CVS repository to git. These are the results I got:

  1. git cvsimport (git version 1.7.11.7) method failed with an error.
    stackoverflow.com/questions/5845 … 225#586225
  2. cvs-fast-export 0.3 failed with a Segmentation fault.
    gitorious.org/cvs-fast-export
  3. cvs2git worked fine after specifying encoding with support for umlauts.
    cvs2svn.tigris.org/cvs2git.html

These are the commands I used to run the conversion:

mkdir panda3d-cvs-repo
rsync -av panda3d.cvs.sourceforge.net::cvsroot/panda3d/* panda3d-cvs-repo
cvs2git --blobfile=cvs2git-blob-panda3d-cvs-repo.dat --dumpfile=cvs2git-dump-panda3d-cvs-repo.dat --username=cvs2git --encoding=cp500 panda3d-cvs-repo
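
For completeness, here is a minimal sketch of loading the two resulting files into a fresh bare repository with git fast-import, which is the approach the cvs2git documentation describes. The helper name load_cvs2git_output and the target name panda3d.git are only examples I made up for illustration:

```shell
# Sketch: load cvs2git output into a new bare git repository.
# The function name and the target path are hypothetical examples.
load_cvs2git_output() {
    blob_file=$1   # file written by --blobfile
    dump_file=$2   # file written by --dumpfile
    target=$3      # bare repository to create, e.g. panda3d.git
    git init --bare --quiet "$target" &&
    # The blob file must come first: the dump file refers to its marks.
    cat "$blob_file" "$dump_file" | git --git-dir="$target" fast-import --quiet
}
```

It would be called as: load_cvs2git_output cvs2git-blob-panda3d-cvs-repo.dat cvs2git-dump-panda3d-cvs-repo.dat panda3d.git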

Hope this helps as a starting point for eventual git migration efforts.

If you need help, just ask.

Bye!

Thanks for the efforts! We’ve been talking behind the scenes about migrating to a distributed SCM, and I’ve personally been able to convert the main branch to git, hg, and bzr for comparisons.

For the record, I personally import using git-cvsimport, but it requires manually fixing the structure of the CVS repository before doing so. What I don’t like about cvs2git is that it produces fastimport files of over 9000 MB! Plus, cvs2git does not seem to support incremental changes, which makes migration more difficult - it would need to happen all at once.

It’s not easy, though. We have a lot of version history, for one, including branches and tags that cannot be imported 1:1 because these concepts don’t map in a straightforward manner. Have you tried a “diff -r” between a CVS checkout and your git checkout to verify the integrity? Have you been able to import tags and branches correctly?

I personally have a strong interest in migration to a distributed SCM, so it’s going to happen sooner or later - we just have to figure out all the details.

What are your opinions about hg over git, if any?

Unfortunately, git-cvsimport is not recommended as a reliable tool for moving from CVS to git. Its own documentation states that:
«git cvsimport uses cvsps version 2, which is considered deprecated; it does not work with cvsps version 3 and later. If you are performing a one-shot import of a CVS repository consider using cvs2git or parsecvs.»
and it lists several known issues here:
kernel.org/pub/software/scm/ … mport.html

My import with the previously mentioned command line produced two files totalling 2.5GB. Once imported into a bare git repository, the result is only 118MB, which I think is fine.

I don’t think that is a problem. On the contrary, I think a good approach to moving from one SCM to another is to:

  • announce the move and specify the date, the procedure and the expected duration of the migration;
  • make CVS read-only on the announced date;
  • convert the CVS repo to a git repo;
  • put the new git repo online;
  • once git has been working fine for some time, shut down CVS completely.
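
The "put online" step could look roughly like the sketch below. The helper name publish_converted_repo is hypothetical, and the remote URL is whatever the new hosting ends up being:

```shell
# Sketch: publish a converted repository to its new public home.
# Function name and arguments are hypothetical examples.
publish_converted_repo() {
    converted=$1    # git directory produced by the conversion
    remote_url=$2   # URL or path of the new, still empty, public repository
    # --mirror pushes every branch and tag, not only the current branch.
    git --git-dir="$converted" push --quiet --mirror "$remote_url"
}
```

Pushing with --mirror only makes sense for this one-time publication, because it also deletes refs on the remote that do not exist locally.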

I think this is the approach KDE took to migrate from SVN to git, you can read about their efforts here:
techbase.kde.org/Projects/MovetoGit
Having two SCMs at the same time can be a real nightmare.

Yes, everything looks fine (except for missing CVSROOT and .git presence, of course).
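
A recursive comparison of a CVS checkout against a git checkout can be sketched like this. The helper name compare_checkouts is made up, and the excludes simply skip the metadata that each tool adds to its working copy:

```shell
# Sketch: compare a CVS checkout with a git checkout of the same branch.
# The function name is a hypothetical example.
compare_checkouts() {
    cvs_dir=$1
    git_dir=$2
    # -r recurses into subdirectories; -q only reports which files differ;
    # -x skips CVS bookkeeping directories and the .git directory.
    diff -r -q -x CVS -x CVSROOT -x .git "$cvs_dir" "$git_dir"
}
```

The exit status is 0 when the trees match. Directories that are empty apart from CVS metadata will still be reported, since git does not track empty directories.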

Yes, I think some adjustments are required (eg. all the “unlabeled” branches), but most of the things look ok. Take a look at the imported branches:
pastebin.com/vCtBhvWu
and at the tags:
pastebin.com/7jr3Se2N

I would surely go for git. I admit I’m a git fan and have little experience with hg, but git’s syntax seems much clearer to me, and at the moment it is surely the most widely supported and widespread choice of DVCS. It has several good administration tools (gitolite, gitlab), several web code browsers, a number of free hosting services for open source projects (github, gitorious, google code, sourceforge), and a wonderful code review system built around it, called Gerrit, which is used by Google Android, the Qt Project and several other prominent open source projects.

Many of the ISSUES on that page don’t just apply to cvsimport, but to importing from cvs to git in general. For one, CVS does not store the branch base, so you have to find out at which point it diverges the least from a certain commit on the master branch; and since CVS has per-file revisions, tags and branches, to import a branch correctly you have to have ugly duplicate commits at some point and to import tags correctly you have to make a branch.

I’m curious to see how cvs2git handles these issues, so maybe I’ll try and import again after freeing up some disk space on the machine I run the imports at. :slight_smile:

Ideally, I agree, but Panda is used by several companies and I’m not sure if they can stop what they’re doing and switch from one day to the next. We might still like to pull in a few lost changes from CVS once in a while.

This looks very good. We may not need to import many of these branches, but it’s good to see that cvs2git imports them correctly.

Those are some good points. However, there are also good points to be raised for hg. That said, to me it’s more important that we switch to any DVCS at all (whether it be bzr, hg or git) instead of sticking with CVS.

migration should attract more tinkerers to panda. as for myself - i dread to touch cvs. its horrible and old and fk that st :slight_smile:)
i hope you will side with git. it being by far most popular should expose panda to widest potential coder audience.
cant wait to clone and compile 2.0 branch :slight_smile:

I mentioned in irc that one of the things to consider in addition to the technical merits of each dvcs is the social aspect – which dvcs and hosting site will help panda get more outside developers. Python is on bitbucket and hg so many python developers are focused and set up on those systems. On the other hand github and git is a much larger and wider community.

Git may be more popular in general, but hg seems quite popular among Python users. Python itself and many projects that use Python use hg; hg is written in Python as well.

Hg has a cleaner, somewhat less confusing interface than git; it also seems to have better Windows support.

Neither compares to bzr in terms of interface, though, since bzr’s interface and GUI tools just rock. I don’t like bzr very much because of its poor performance on large repositories, however.

Not that I’m advocating for one or the other; I just think we should consider all of the possibilities.

well - the one reason i would never ever pick hg over git is that git has a superior branching feature. at least for myself - i tend to separate my large work-in-progress code from the main codebase using branches, and only when feature X is working does it get merged to the master branch. it gives me a bleeding edge master branch that is never broken, along with the ability to have many other branches that may or may not be broken, experimental and so on.
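
a rough sketch of that workflow in git, wrapped in two made-up helper functions (start_feature and finish_feature are not git commands, just names for illustration):

```shell
# Sketch: keep work in progress on its own branch, merge only when it works.
# Both function names are hypothetical examples.
start_feature() {
    # Create the branch from the current commit and switch to it.
    git checkout -q -b "$1"
}

finish_feature() {
    # Merge the finished branch into master, then drop the branch name.
    git checkout -q master &&
    git merge -q --no-ff -m "Merge $1" "$1" &&
    git branch -q -d "$1"
}
```

between the two calls you just commit on the feature branch as often as you like; master stays untouched until finish_feature runs.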

From what i gathered hg is perceived as easier to use and better supported tool on windows than git. I personally find it totally strange. Being git user on windows i cant say why msysgit would “feel as second class citizen”. Everything works fine for me. And GUI support for git is not what it was few years ago. Now we have tortoise git (opensource) (my first choice) and smart git (free for non commercial use) which is cross-platform even.

The only strong point (if it is strong point at all) of mercurial over git i can see is it being choice of python’s people. But i somewhat doubt sacrificing some git’s technological advantages over hg’s python community yield bigger gain for project. Dont forget python is #4 language on github too taking 8% of all projects. Besides - hg being python would it not be slow?

Some propaganda links:
felipec.wordpress.com/2011/01/16 … -branches/
blogs.atlassian.com/2012/03/git- … l-why-git/
blogs.atlassian.com/2012/02/merc … mercurial/

Projects using git:

I may not know everything about either git or hg, but all those smart people behind those big projects cant be wrong! And so far git pretty much wins on score.

Aaaaand last thing… If you are geek like me you will probably enjoy 70 minutes of Torvalds talking about git: youtube.com/watch?v=4XpnKHJAok8

I’d just like to add gitextensions to the list of great windows git GUIs: code.google.com/p/gitextensions/

As a git newbie, I could not really get comfortable using TortoiseGit, then I tried gitextensions and never looked back.

Could you demonstrate how git has superior branching to hg? That sounds like it could be an important point, if you can back up this claim.

i must point out i am not an advanced git user; i use it in the most basic way because so far that is pretty much all i needed (except a few cases where i had to resort to git magic, and the outcome was satisfying).
nor have i ever used hg, so all i know about hg branches is what i read, and the internet keeps touting “git branches are killer feature” and “git > hg”. I can provide some material i have read though:
rockstarprogrammer.org/post/ … l-and-git/
felipec.wordpress.com/2011/01/16 … -branches/
Or more in depth: felipec.wordpress.com/2012/05/26 … th-graphs/

From what i read hg branches are ridiculous because one has to clone the full repository for a new branch. Making a full new copy of data that is already in the repository sounds unnecessary; after all, a branch is just changes on top of the commit that was branched from - and git gets that.
I also read deleting hg branches is not really possible which is weird. I do not keep defunct (already merged to master) branches any more - no point since they will not be used any more anyway.

I hope you put time into reading on git vs hg material. If you dont have that much time then listen to me and go for git hoho :smiley:

This is not quite true; in hg, there are two types of branches, named branches and repository clones. Named branches are like in git, a tag assigned to a particular changeset, allowing you to quickly switch branches within the same working copy. Repository clones, on the other hand, are more like bzr.

Now, there are some superficial differences in the way these branches work, such as whether or not each branch operation is recorded in history. There are advantages and disadvantages to be named for all of these little differences, but I don’t think they are very important.

What’s more important is whether a system has a usable and inviting interface, and whether it has good platform support, and whether the developers are comfortable working with it; not whether X has slightly more efficient storage algorithms for certain operations than Y.

Also, I’m sure that msysgit works great for you; it is still a sign of poor platform support that the Windows port that comes closest to being considered ‘official’ is a separate project, and also that it is MSYS-based to begin with because of git’s reliance on shell and perl scripts.

Again, I’m not advocating either way, and I’m not really going into debate about little specific features, nor do you need to try and sell me on a particular system especially if your knowledge of the other systems comes from a few blog posts.
I simply want to give all of the systems a fair consideration.

Another little note I can add to the “personal experience” discussion is that recent releases of the Eclipse EGit plugin are more feature complete and stable than its Mercurial counterpart.

speaking of hg branches - no matter what kind of branch, you cant ever get rid of it once merged. my understanding is that this discourages making throw-away branches for testing because of history littering.

platform support: again, it works great, and if it works great where does poor platform support come from? i will just shrug off the poor platform support bit because it makes no sense >:)

you know how it is… as for me - panda3d is little toy. seeing little toy use my other big toy would be real benefit for me so yes, im advocating for git due to my self-interest. so happens this self-interest is in best interest of project too so i just want to let you know which tool is best for a job. its not a big deal i dont know hg - it does not take genius to figure out that ship with some holes will start sinking sooner or later :stuck_out_tongue:

and last thing - please give them equal consideration, because then winner is clear >:) its like comparing bike and car. sure bike is much better in some places, but you can go only as far with it… actually in real world if we are generous - CVS is a bike… :slight_smile:)

rndbit: long story short, you prefer git for personal reasons?

as far as i see, both git and hg meet the technical requirements.
i worked with git in the past, it was pretty good but i needed a bit of guidance to set everything up at first.
looking at hg it seems a bit easier. as hg has that “just works” thingy. i also like the documentation of hg.

i’d intuitively go with hg. mainly because it seems to be the easiest to use system. so new developers (even those who are in for a single bugfix) won’t be scared away by setup procedures.

btw. that CVS bike is more like a children-tricycle :wink:

well… panda3d is pretty big and complex project so people who cant wrap their heads around git should not touch code in the first place. well… they can touch code but their contributions would be of no value if there would be any contributions at all. besides game development is not an easy ride as it is, it takes pretty damn smart person to make a game. even in panda3d. im pretty confident everyone here can handle 10 minutes reading manpage once.

panda using cvs until this day proves project developers can manage using all kinds of weird stuff so anyway w/e is chosen im pretty sure they will manage… :smiley: and for other people repository still can be mirrored. actually ill make a mirror of official repository on github right now :smiling_imp:

@rndbit:
I’m sorry for being blunt, but you seem to be displaying a large amount of confirmation bias. Many of your claims about git are glorified and fished from blog posts made in git’s favour, and many of your claims about hg are simply false.
Given your lack of experience with hg, your lack of due diligence and your self-admitted conflict of interest I would therefore advise you to refrain from commenting further on this discussion until you have gained more insight into the matter.

For the record, it is not true that branches cannot be deleted in hg. Typically, in hg you would close a branch, which is the same thing except that the version history is maintained. It is still possible to delete the branch entirely, but it is made slightly more difficult; which can also be argued as a good thing. Git makes it very easy to accidentally delete a branch without providing a way to get it back, whereas hg makes it deliberately a bit more difficult to do operations that can cause large amounts of version history to be lost.
(A local non-named branch that has not been pushed can always easily be deleted by simply removing the directory.)

Now, once again, I am not arguing for either git or hg. Yes, git has advantages, just as hg has advantages. As ThomasEgi pointed out, they both fit the bill as far as I’m aware.
The reason why I brought up hg in the first place was to gauge the community’s acceptance of it, not to start a religious war over which is better. I don’t want to know what people think is “God’s DVCS”, I want to know if people would feel comfortable working with hg as opposed to git.

rndbit, it’s good to be skilled, but there is absolutely no need for such an attitude towards new developers.

it makes no sense to use an artificially complex system as entrance barrier. the whole point of switching away from cvs is to get rid of such a barrier and making it more attractive for new people to join in.

if you don’t agree with someone’s code quality in a distributed system, just don’t pull from them. it is vital that everyone can use the system, so they can code, test out things, practice, improve and one day become a valued developer.

last but not least there may also be a number of developers who didn’t use git before who simply don’t want to wrap their heads around git just to fix a small bug somewhere.

so i kindly ask you to put your nonsense argumentation away. either come up with good reasons for or against a certain vcs, or be quiet instead.

I guess im just tired of under-skilled people i have to deal with, hence my attitude :slight_smile:

Shovel - non complex, excavator - complex. You can see where this is going. Complexity is justified by usefulness.

To this i can argue to the death: if someone cant use simple tool (i have not seen complex VCS yet tbh) then we can safely guess quality of possibly produced code.

true - those developers just make patches and post on forum or wherever.

I guess we just have different priorities. Ill be silent then :slight_smile:

This isn’t going to be an argument in favour of any particular DVCS. I just want to point out that I, personally, would be open to using either git or hg.

Learning new tools is one of the most crucial skills an information technologist must have. The way I see it, if a software developer doesn’t have experience in one DVCS, and the developer wants to contribute to a codebase using that DVCS, then the developer should be eager to learn the basics of the DVCS. Insisting on sticking to a single set of tools (especially when it’s something simple like learning the basics of a DVCS–really, there are free interactive courses online teaching basic Git usage, and they take at most 15 minutes to complete; I’m sure the same can be said for hg) is simply unsustainable in the IT world. Tools, methods, and trends are rapidly evolving.

So if you love git, great! Use git for your personal projects. If you love hg, also great! Use hg for your personal projects. Whichever way Panda3D goes, though, I implore each information technologist to be ready to learn.