Lots of spam in the Manual

I did a quick search for “card” in the manual, looking for a way to texture an image onto a card / plane. It returned one entry on graphics cards, and a ton of articles about blackjack card counting, credit card services, internet marketing strategies with business cards, etc. I remember this was a problem before with other products. Seems like someone’s using the Panda Manual for some serious SEO.

Seems Google has noticed as well; there’s a message that appears when you do a Google search that says ‘this site may be compromised’:

support.google.com/websearch/answer/190597

How does the spam get on the manual?

People register an account on the manual and create a new page. Fortunately, they rarely edit existing pages or add links to their new pages, but that also means the spam pages often go unnoticed.

I found out that there were over 18,000 spam pages, which was about 90% of all pages. I’ve just used some broad patterns to nuke over 10,000 of them. Maybe that’ll be enough for Google to stop considering the site compromised; I’ll get around to deleting the rest eventually.
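For the curious, something along these lines works against the MediaWiki Action API. This is a rough sketch, not the exact method or patterns used here; the endpoint is a placeholder, and it assumes the session is already logged in with delete rights:

```python
#!/usr/bin/env python3
"""Bulk-delete wiki pages whose titles match spam patterns, via the
MediaWiki Action API. A sketch only: the endpoint and patterns are
placeholders, and the session must already be logged in as an admin."""
import re
import requests

API = "https://www.panda3d.org/manual/api.php"  # hypothetical endpoint
SPAM = re.compile(r"casino|blackjack|credit.?card|viagra", re.IGNORECASE)

session = requests.Session()  # log in here with admin credentials first

def all_titles():
    # Page through every title in the main namespace.
    params = {"action": "query", "list": "allpages",
              "aplimit": "500", "format": "json"}
    while True:
        data = session.get(API, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def csrf_token():
    # Deletions require a CSRF token.
    data = session.get(API, params={"action": "query", "meta": "tokens",
                                    "type": "csrf", "format": "json"}).json()
    return data["query"]["tokens"]["csrftoken"]

# Collect matches first, then delete, so pagination isn't disturbed.
spam_titles = [t for t in all_titles() if SPAM.search(t)]
token = csrf_token()
for title in spam_titles:
    session.post(API, data={"action": "delete", "title": title,
                            "token": token, "format": "json"})
    print("deleted", title)
```

With shell access, MediaWiki’s maintenance/deleteBatch.php script can do the same thing from a plain list of titles.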

It might be worth restricting the manual to a whitelist of authorized editors (if that’s possible). I’m not sure you want just anybody going in and editing the primary instructions or adding pages. Or require new pages to be approved?

I do agree that we need stricter access control in the manual.

I’ve just changed the permissions; new users can still edit, but they have to wait 4 days before they can create a new page.
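Assuming the manual runs on MediaWiki (the index.php URLs suggest it does), the relevant LocalSettings.php knobs look something like this; treat it as a sketch rather than the exact change that was made:

```php
# LocalSettings.php (MediaWiki) - a sketch of the restriction described
# above; the exact settings used on the Panda3D wiki aren't stated here.

# New accounts become "autoconfirmed" only after 4 days.
$wgAutoConfirmAge = 4 * 24 * 3600;  # in seconds

# Regular users can still edit existing pages...
$wgGroupPermissions['user']['edit'] = true;

# ...but only autoconfirmed users can create new ones.
$wgGroupPermissions['user']['createpage'] = false;
$wgGroupPermissions['autoconfirmed']['createpage'] = true;
```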

The manual main page has the spammies now, too

http://www.panda3d.org/manual/index.php/Main_Page

edit: Just opened a wiki-account to try to fix it, but rdb was faster. Thanks!

Google search warns this website “may be hacked”.

I have been experimenting with measures to keep spammers out, and they seem to have worked so far. I just cleaned up the spam in the manual with a few sophisticated queries, deleting about 35,000 pages.

RDB - years back when I had a PHP bulletin board on my site, it got eaten by spammers, but they also exploited something that allowed them to edit some of my actual pages. So I came up with a Perl script that knew the sizes and creation dates of my files, and I kept the master list elsewhere. On a regular interval, the script would run and automatically “heal” the site.

This sounds like something you could do as well. I just don’t know how easily it would fit into your workflow, since I was the only person editing and working on my site (well, other than those spammers), but it might be worth looking into.
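A minimal sketch of that idea in Python, using content hashes instead of sizes and dates (a bit more robust against tampering). All paths here are hypothetical, and it assumes a known-good copy of the site kept outside the web root:

```python
#!/usr/bin/env python3
"""Periodically verify site files against a trusted manifest and
restore any that changed: a sketch of the "healing" script described
above, using content hashes rather than sizes and creation dates.
All paths are hypothetical."""
import hashlib
import json
import shutil
from pathlib import Path

SITE = Path("/var/www/site")           # live pages
BACKUP = Path("/srv/site-master")      # known-good copies, kept elsewhere
MANIFEST = Path("/srv/manifest.json")  # {relative_path: sha256}

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def build_manifest():
    # Run once, whenever you deliberately change the site.
    manifest = {str(p.relative_to(BACKUP)): sha256(p)
                for p in BACKUP.rglob("*") if p.is_file()}
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def heal():
    # Run from cron; restores any file that was altered or deleted.
    manifest = json.loads(MANIFEST.read_text())
    for rel, digest in manifest.items():
        live = SITE / rel
        if not live.exists() or sha256(live) != digest:
            live.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(BACKUP / rel, live)
            print(f"restored {rel}")

if __name__ == "__main__":
    heal()
```

Run heal() from cron every few minutes and defacements get undone automatically. Note that it restores altered or deleted files but won’t remove newly added ones; deleting anything not listed in the manifest would cover that. A wiki keeps its pages in a database rather than in files, so for the manual the same idea would have to operate on page revisions instead.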

Cheers!

Charles