Minor outage earlier today
Started by: michal frackowiak
On: 4 Mar 2009, 17:14 UTC
Number of posts: 6
Minor outage earlier today
michal frackowiak, 4 Mar 2009, 17:14 UTC

Unfortunately we had a minor outage earlier today, between 14:30 and 16:00 UTC. The problem was probably caused by an unexpected DoS attack that consumed the resources of our main server, but we are still investigating it. We started working on it at 14:31. Although we dealt with the issue itself quite quickly, our freshly rebooted storage array forced a check of all disks (fsck) that took more than an hour. Because the reboot was not scheduled, this check kept the whole service from coming back online and greatly prolonged the outage.

We were posting status updates in real time.

We are sorry for any inconvenience. We know people rely on Wikidot for many of their activities (we got a lot of phone calls too), which is why we always try to prevent such incidents and fix them as soon as possible. Thanks for your understanding!

Best regards,
The Wikidot Team


Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me

Re: Minor outage earlier today
Helmuti_pdorf, 4 Mar 2009, 18:32 UTC

Thanks for the info!

Question: would it help to post a short notice on one of the reachable Google Groups (our "temporary Wikidot dev list"?)

This group can be read by visitors who are not signed in, too: http://groups.google.com/group/wikidot?hl=en

There I asked why (and whether!) Wikidot was unreachable. It could have been my provider losing some DNS entries, a damaged router, or problems with my browser.

(I first tried www.openDNS.com myself to check whether my provider had problems.)

I got the information from Piotr by email: the team was working on a strange failure. That really helped!

So I stopped my attempts to reach Wikidot and took a break.


Service is my success. My web tips: www.blender.org, www.zusi.de (demo video)

Would you like to help Wikidot with the German » Handbuch (manual)?

Re: Minor outage earlier today
michal frackowiak, 4 Mar 2009, 19:27 UTC

Thanks, Helmuti.

First of all, we do have "dynamic IP addresses" that we can assign to different servers when needed. If one front-end server dies, we can easily assign its IP to another server and either serve read-only content from Wikidot or have a backup server display a message.
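The post does not describe how the failover target is chosen. Purely as an illustration, here is a minimal Python sketch of that decision, assuming (hypothetically) that a healthy front-end would take over the IP and serve read-only content, and that a backup server showing a message is used only when no front-end is healthy. The server names and the health map are made up, and the actual reassignment through the datacenter router is not modeled.

```python
def choose_ip_target(current: str, health: dict[str, bool], backup: str) -> str:
    """Return the server that should hold the floating IP (hypothetical model).

    - Keep the IP on the current front-end while it is healthy.
    - Otherwise move it to the first healthy front-end; alphabetical order is
      used only to make this sketch deterministic. That server could then
      serve read-only content.
    - If no front-end is healthy, point the IP at the backup server, which
      only displays a status message.
    """
    if health.get(current, False):
        return current
    for name in sorted(health):
        if health[name]:
            return name
    return backup
```

For example, `choose_ip_target("web1", {"web1": False, "web2": True}, "status-page")` returns `"web2"`. Keeping the IP where it is while the current server is healthy avoids needless moves, which matters when, as happened here, the router does not always pick up a reassignment cleanly.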

This time, however, we had a problem with it. I am not sure how this works (Piotr knows this much better), but the router in the datacenter did not want to route traffic properly. Perhaps we were doing something wrong, but it had worked several times before. This is why the message appeared a bit late, and for a while no servers could be reached at all.

There was also a suggestion to set up an external site that would report the status of Wikidot.com, which is also a good idea.

Re: Minor outage earlier today
michal frackowiak, 6 Mar 2009, 09:06 UTC

There was one more minor outage yesterday at 21:40 UTC, but this one gave us more details about what is going on. It looks like we are hitting a bug in the PHP FastCGI interface that leaves behind dysfunctional PHP processes. These processes return "500 Internal Server Error" and hold many resources open: connections to the cache and database, files, etc. The problem escalated very quickly and effectively made the front-end server stop responding, which amounted to an internally caused denial of service.

We have modified the web server configuration to:
1. prevent the situations that lead to dysfunctional processes being created;
2. if that fails, automatically detect each such emergency and fix it before it can escalate, so the server now heals itself.
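The post does not say how the automatic detection works. As a rough illustration only, here is a minimal Python sketch of a watchdog that counts consecutive failed health checks (a 5xx status or no response at all) and calls a restart hook once a threshold is reached. The health-check URL, the threshold of three, and the restart hook are hypothetical assumptions, not Wikidot's actual configuration.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # URLError, timeouts and refused connections derive from OSError
        return 0


@dataclass
class Watchdog:
    """Trigger a restart after too many consecutive failed health checks."""
    check: Callable[[], int]      # returns an HTTP status (0 = unreachable)
    restart: Callable[[], None]   # hypothetical hook, e.g. restart FastCGI workers
    max_failures: int = 3
    failures: int = 0
    restarts: int = 0

    def tick(self) -> bool:
        """Run one health check; return True if a restart was triggered."""
        status = self.check()
        if status == 0 or status >= 500:
            self.failures += 1
        else:
            self.failures = 0  # any healthy response resets the streak
        if self.failures >= self.max_failures:
            self.restart()
            self.restarts += 1
            self.failures = 0
            return True
        return False
```

In use, one would call `tick()` every few seconds with something like `check=lambda: http_status(url)`, where `url` is a hypothetical health endpoint on the front-end. Counting consecutive failures rather than single errors keeps one stray 500 from triggering a restart, while a run of them (as with the stuck FastCGI processes described above) is caught before it escalates.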

With Wikidot growing, we often meet new frontiers and new problems. We fix or adapt to most of them without any service interruption, but some (like this one, which results from a bug in software we use) manifest in a very nasty way. Fortunately, we have a really bright team that can quickly diagnose and react to problems and, most importantly, prevent them from occurring in the first place.

So I hope this one is closed. Thanks for all the friendly support emails, Twitter posts and comments we have received! We are really happy to provide Wikidot services, we are devoted to them, and we are glad that our efforts are recognized!

EDIT: We are looking at some future-proof improvements to our infrastructure; I have put more notes on my blog.

last edited on 6 Mar 2009, 10:37 UTC by michal frackowiak
Re: Minor outage earlier today
bruteginkiller, 16 Jun 2009, 16:59 UTC

Thanks for the info! I don't think I was on then, so luckily I wasn't affected. I get angry very fast; I try not to, but... it happens.

Re: Minor outage earlier today
cold_blood3d, 16 Jun 2009, 18:10 UTC

Thanks. I was happy it didn't last too long.

2007-2009 Copyright Wikidot Inc.