Sitemaps for wikis, penalties for spammers
started by: michal frackowiak
on: 12 Jun 2008, 09:35 UTC
number of posts: 20
Sitemaps for wikis, penalties for spammers
michal frackowiak 12 Jun 2008, 09:35 UTC

We have just introduced a nice new feature for all wikis: automatic sitemaps. According to sitemaps.org:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Every wiki now features a special URL that lists the pages within the wiki, to help search engines crawl and find your content more easily. For example, the sitemap for www.wikidot.com can be found at http://www.wikidot.com/sitemap.xml (note: this is an XML file, so your browser might not display it correctly). Of course there are no sitemaps for private wikis, since they are not public by definition.
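
To make the sitemaps.org description more concrete, here is a minimal Python sketch of how a sitemap of this kind could be generated. It only illustrates the public Sitemap protocol and is not Wikidot's actual code; the `build_sitemap` helper and the example page URLs are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of page dicts.

    Each dict needs a 'loc' key and may carry the optional
    'lastmod', 'changefreq' and 'priority' metadata.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only to show the shape of the output.
example_pages = [
    {"loc": "http://www.wikidot.com/example-page",
     "lastmod": "2008-06-12", "changefreq": "daily", "priority": "0.8"},
    {"loc": "http://www.wikidot.com/another-example-page"},
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

Only `loc` is mandatory in the protocol; the other three elements are the optional metadata mentioned in the quote above.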

But honestly, we spent most of last week fighting spam on Wikidot. About 10% of all content created on Wikidot within the last 2 years is spam (this is not that much, but it could become a serious problem if left unattended). Our job is to find spam, hide it from you and discourage people from spamming. Therefore we have built a very nice automatic filter for detecting spam content, and now we need to plug it into Wikidot to provide 2 kinds of functionality:

  • it should help us mark some wikis as spam (those are mostly about bank credit, health insurance etc.)
  • it should help wiki admins and blog authors protect themselves from spam (spam comments, forum posts, spam pingbacks etc.); we will certainly need this once we introduce a real blogging platform

As an important step in discouraging spammers from starting spam wikis, every wiki marked as spam:

  • blocks all web crawlers (including Google, Yahoo, MSN and all others) from indexing the content (by using the robots.txt file),
  • changes all external links into redirects with a special rel="nofollow" attribute, so that web crawlers do not follow such links.

These measures certainly make creating "link farms" on Wikidot pretty useless.
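
The two penalties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Wikidot runs: the `/redirect?url=` path, the `spam-wiki.wikidot.com` host and both helper functions are hypothetical, and the regular expression handles only plain `<a href="...">` tags.

```python
import re
from urllib.parse import quote, urlparse
from urllib.robotparser import RobotFileParser

# robots.txt body that asks every crawler to stay away from the whole site.
ROBOTS_TXT_FOR_SPAM_WIKI = "User-agent: *\nDisallow: /\n"

# Deliberately simple: matches only <a href="..."> written with double quotes.
A_HREF = re.compile(r'<a\s+href="([^"]+)"')


def penalize_external_links(html, wiki_host, redirect_prefix="/redirect?url="):
    """Turn external links into nofollow redirects; leave internal links alone."""
    def rewrite(match):
        href = match.group(1)
        host = urlparse(href).netloc
        if not host or host == wiki_host:
            return match.group(0)  # relative or same-host link: unchanged
        return '<a href="{}{}" rel="nofollow"'.format(
            redirect_prefix, quote(href, safe=""))
    return A_HREF.sub(rewrite, html)


def crawler_allowed(robots_txt, url, user_agent):
    """Check whether a well-behaved crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page content and host name, for illustration only.
page = '<a href="http://example.com/loans">loans</a> <a href="/start">home</a>'
print(penalize_external_links(page, "spam-wiki.wikidot.com"))
print(crawler_allowed(ROBOTS_TXT_FOR_SPAM_WIKI,
                      "http://spam-wiki.wikidot.com/start", "Googlebot"))
```

Note that both mechanisms rely on crawlers cooperating: robots.txt and rel="nofollow" are requests to the crawler, not access controls.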

Although sitemaps do not bring any new visible improvements, they will certainly improve your wikis' visibility on the Internet. Good luck to all Wikidotians!

last edited on 13 Jun 2008, 04:50 UTC by michal frackowiak

Wikidotians
scottplan 13 Jun 2008, 00:08 UTC

Far be it from me to question the King on this matter of highest importance…

but shouldn't that be Wikidotians?

Lilliput : Lilliputians = Wikidot : Wikidotians

Humbly submitted.

S.

Re: Sitemaps for wikis, penalties for spammers
michal frackowiak 13 Jun 2008, 04:49 UTC

Well, I guess you are right! Although there is no "Wikidotians" entry in any dictionary, this is a good point.

Thanks!

(updated in the news above)

Re: Sitemaps for wikis, penalties for spammers
RMIGHTY1 13 Jun 2008, 20:24 UTC

Aww, durn, I thought I was a wikidite or a wikidonian!

Re: Sitemaps for wikis, penalties for spammers
David Marseilles 15 Jun 2008, 10:03 UTC

Keep up the good work. I'm psyched to see the blogging platform evolve.

Re: Sitemaps for wikis, penalties for spammers
wikario1 16 Jun 2008, 01:22 UTC

Good news!

But rel="nofollow" looks like something commercial and closed (in the sense of a truly open-source philosophy). I think it would be better (though certainly more difficult) to fight spam and spammers through a rating. As soon as the rating drops below a certain level, the ability to create pages is taken away… Next, the ability to edit existing pages disappears, and then the account is blocked and its spam pages are removed.

Thanks!

last edited on 16 Jun 2008, 01:26 UTC by wikario1

Re: Sitemaps for wikis, penalties for spammers
mmclean89 16 Jun 2008, 01:54 UTC

Great progress. I am continually impressed by Wikidot.

Re: Sitemaps for wikis, penalties for spammers
michal frackowiak 16 Jun 2008, 07:32 UTC

Thanks for all the comments:

@wikario1:

We would like to avoid complicated "spam rating systems" simply because we do not want to start playing games with spammers. As soon as you set rules, ratings etc., people start finding workarounds, tricks and hacks. Also, it could sometimes hurt normal users. Internally we do, in a way, introduce various ratings through the statistical filter, but those are not used for anything other than spam detection, and they are not necessarily "per user". In fact we are only feeding the filter with data and it figures out the patterns itself. The only kinds of visible user levels are the karma levels we have introduced.

What we actually want is to know which wikis are violating our Terms and which are created to promote spam. We have a set of heuristic measures to find these and we are going to plug in our Bayesian filter too.
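
For readers unfamiliar with the idea of a Bayesian filter that is "fed with data" and finds the patterns itself, here is a textbook naive Bayes sketch in Python. It is not the filter described in this post, whose internals are not published here; the class name, the whitespace tokenizer and the tiny training set are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Textbook naive Bayes text classifier with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Naive tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text); needs at least one example per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Made-up training data, only to show how the filter is fed.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap bank credit offer", "spam")
spam_filter.train("health insurance cheap credit", "spam")
spam_filter.train("meeting notes for our wiki project", "ham")
spam_filter.train("page about the school project", "ham")

print(spam_filter.spam_probability("cheap bank credit"))
print(spam_filter.spam_probability("wiki page about our project"))
```

There are no hand-written rules in this sketch: the scores come entirely from the word counts gathered during training, which is what makes this kind of filter harder to game than a published rating scheme.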

I think the more important thing is, once we start the mentioned blogging platform, to stop spam from appearing in comments, pingbacks etc. According to Akismet stats there is much more spam in the blogosphere than legitimate content. And nobody wants to look for and review spam manually. Also, we have seen several forums affected by spammers where the community acts as a filter and reports spam manually. We want to help in this matter.

It appears that, as a side effect of working on spam, we have developed a service (for now in internal use only) very similar to Akismet or TypePad AntiSpam, where users can send various content for spam identification. Not sure yet whether it will develop into a public service, but it looks nice so far and has at least a few nice advantages over the competition (e.g. it is not bound to blogs only, has a more flexible API…).

Anyway, I am getting back to work ;-) Thanks for your comments and all the support for the project!

Michal

last edited on 16 Jun 2008, 18:50 UTC by Gabrys

Re: Sitemaps for wikis, penalties for spammers
wikario1 17 Jun 2008, 16:14 UTC

@Michal:

Ohh… Now I completely understand your point of view. Thanks for the answers. You are doing everything right. Otherwise Wikidot.com could turn into another dump full of spam. But personally, as a supporter of open source and the open internet, I don't like or welcome rel=nofollow methods. I agree that spammers will try to bypass any filter. That's the shameful evolution of spam. And that's very bad! Cleaning up and protecting against spam is a difficult and long process which should be automated as much as possible without harming ordinary users (for example, some of my wiki sites ended up under the nofollow filter after someone placed spam content there; I have since deleted it and closed the pages for editing).

So what's the path to open and safe spam fighting? You know this much better than I do, guys… I'm just discussing this hard problem from my angle :)

From what I observe on the Internet, people are moving away from nofollow and reviving live "follow" links.

--
Regards,
Mike

Re: Sitemaps for wikis, penalties for spammers
michal frackowiak 17 Jun 2008, 18:45 UTC

I can agree that rel="nofollow" is not a "standard", but rather looks like a quick hack proposed by Google to tell search engines: "do not follow this link". Also, the rel attribute should define the relation between items, not behavior.

But both Yahoo! and MSN accepted this hack and are using it. And it effectively DOES stop spammers from building spam link farms. Many major open-source packages (WordPress, MediaWiki) also support the rel="nofollow" attribute. So it has become a de facto web standard.

Also, we are not (yet?) putting such anti-spam solutions into our open-source release.

Let's wait and see how it works.

Re: Sitemaps for wikis, penalties for spammers
wikario1 17 Jun 2008, 22:54 UTC

I really understand everything you said, and I agree with most of it. I know the history of nofollow, Google and the other search engines (SE)… I know why and for what purpose the SEs introduced it. But I see a trend where most bloggers remove rel="nofollow" with plugins (in WP) etc., and proudly mark their blogs with labels like this:

ifolloworange1.gif

Try searching Google for the phrase "do follow" ;) Honestly, I was impressed!

Besides, MediaWiki and the main project built on it (Wikipedia) are not representative of "open internet" concepts. Let's take, for example, the magnificent Drupal CMF, which doesn't use this evil (not my word, "evil") nofollow attribute. Personally I have some websites based on Drupal, and two of them, despite being rather popular, are not cluttered with spam, thanks to other spam protection technologies.

And one more thing I wish to say… Most spammers are not scared by the presence of the nofollow attribute on websites, and they stupidly and persistently post their spam comments and forum posts every day. Some of them even claim that, with a large enough number of spam links, "link weight" is passed to their sites even through nofollow! Is that true? I don't know. But not all spammers are stupid (that really scares me).

Anyway, you guys are doing great work. Thank you!


Best regards,
Mike

last edited on 17 Jun 2008, 22:58 UTC by wikario1

Re: Sitemaps for wikis, penalties for spammers
Gabrys 19 Jun 2008, 19:45 UTC

Well, Wikidot sets the nofollow attribute only on links in spam wikis, so in general it lets robots follow links :).

Re: Sitemaps for wikis, penalties for spammers
tekmiester 20 Jun 2008, 15:54 UTC

Don't forget about the pill site spam! Good example: …

last edited on 22 Jun 2008, 12:58 UTC by Gabrys

Re: Sitemaps for wikis, penalties for spammers
wikario1 21 Jun 2008, 13:42 UTC

Gabrys 19 Jun 2008, 22:45 +0300
Well, Wikidot sets the nofollow attribute only on links in spam wikis, so in general it lets robots follow links :).

Hi Gabrys,

Yep… I know that. But we shouldn't forget about the human factor. Wikidot is a platform for collective page editing. That means anyone can edit my wiki (unless I have restricted the editing rights beforehand), and my wiki will automatically turn into a spam wiki if someone replaces my content with spam. Is the next step that the Wikidot system, having classified my wiki as a spam wiki, automatically imposes the nofollow filter?!? Such a scheme creates some complications for users. Am I right?

Another big doubt is that filtered spam wikis still function and are indexed very well (thanks to the Wikidot developers for a well SE-optimized core) by some search engines (quote: "…blocks all web crawlers including Google, Yahoo, MSN and all others from indexing the content by using the robots.txt file"!!!). Hm… I have personally tested the indexing of filtered wikis, and they are indexed pretty well. Maybe cached? So such spam wikis still play their dirty role: they build up some SE traffic and direct it from the spam wiki pages to the spammers' sites.

I think that deleting spam wikis after giving their admins a few warnings is a good idea. The less spam, the cleaner Wikidot is. But Wikidot needs some regulating mechanism for this step, not just user ratings, "flag as objectionable" links or the nofollow attribute… it needs something deeper. Spam is the web AIDS of the 21st century.

tekmiester 20 Jun 2008, 18:54 +0300
Don't forget about the pill site spam! Good example: …

Believe me, that's not such a bad spam example. … - that's really terrible spam. The guy didn't even learn Wikidot syntax before spamming. And there are a lot of such examples on Wikidot and outside it.

last edited on 22 Jun 2008, 12:58 UTC by Gabrys

Re: Sitemaps for wikis, penalties for spammers
Craig Macomber 21 Jun 2008, 18:24 UTC

Please stop linking to spam wikis (it helps them).

Such a scheme creates some complications for users. Am I right?

It won't affect your use of your wiki at all. It might even help your search rank, because nofollow can avoid search penalties from spam links. I see no issue with this, though it would be good to inform the site admin if their wiki gets flagged as spam, and tell them what to do about it.

Re: Sitemaps for wikis, penalties for spammers
Gabrys 22 Jun 2008, 12:59 UTC

Craig, right, I deleted all the links in the comments (if anyone is interested, they can check the posts' history).

Re: Sitemaps for wikis, penalties for spammers
najas 29 Jun 2008, 00:30 UTC

hi,

i just ran into this spam site, […] .wikidot.com, and felt sorry i could do no more than just report it. If you ever need more hands on deck to press the delete button on spammers like this, i volunteer.

greets,
naja

Gabrys: thanks

last edited on 29 Jun 2008, 08:20 UTC by Gabrys

Re: Sitemaps for wikis, penalties for spammers
Soop 9 Jul 2008, 10:34 UTC

Do I have to do anything to get Google to read the sitemap?

Thanks by the way, this is one of the big features that I was really looking for.

Re: Sitemaps for wikis, penalties for spammers
Karr Yri 13 Jul 2008, 06:57 UTC

I'm sure someone brought this up, but I'm honestly too lazy and tired to read it all, so here goes…
I've noticed that quite a few wikis tend to have absolutely no content, just a title and a never-altered page. I find these sites discouraging to my friends, who all wanted to see a page that wasn't mine. Maybe an action not as drastic as destroying wikis like this, but just flagging them until they are at least altered from their original out-of-the-box settings…

humbly admitted at approximately midnight

one time insomniac Karr Yri

Re: Sitemaps for wikis, penalties for spammers
Phil Chett 13 Jul 2008, 11:44 UTC

I've noticed that quite a few wikis tend to have absolutely no content, just a title and a never-altered page. I find these sites discouraging to my friends, who all wanted to see a page that wasn't mine.

I agree there are loads of "empty" sites. But if you want to direct people to sites that have content and use lots of different Wikidot features, it might be worthwhile directing them here:

http://community.wikidot.com/start-featured
