The outage did indeed start after midnight (Friday/Saturday), but I was unable to contact anyone at the data center. It looked like a routing problem because all traceroutes stopped 2 servers away from our machine. There was also a notice from the hosting company that they were experiencing some problems, but that these should only be short outages.
I was monitoring the situation, really hoping things would get back to normal within minutes. Since it was 3 AM and the outages were expected between 12 and 3 AM, I thought it was nothing to worry about. I did not want to switch to the failover server at that point.
However, early in the morning it became clear that the hosting company had some problems (at least this is what I was told) and that they would fix them within 5 hours at most. So I put up a quick notice at Wikidot (in fact, I changed the DNS settings for Wikidot to point to another server) and kept contacting the tech team. Things were apparently fixed 6 hours after that.
Until now we had not experienced any serious problems. One recent 1-hour break was also caused by routing, and once a Slashdot story linking to the NOOOXML.org campaign slowed Wikidot down until we tweaked the server configuration. The only really long outage was on 2nd Jan 2007 (routing problems again).
I am really sorry about the outage. This should be something that gets fixed within 1 hour at most, but apparently it does not work like that.
Another thing is that we decided a few days ago to change our hosting company and move to another datacenter. There are a few reasons:
- we need faster servers and more storage, since Wikidot has become very popular recently
- more reliable servers and network, as redundant as possible
- we need a higher level of support, better issue handling, etc.
- more flexible plans, customizable hardware
We are seriously considering moving the servers to the US, since most of the traffic comes from the Americas. We would keep a server in Europe for hosting static files to reduce response times for visitors from Europe.
I think we will move in a month or two. I believe this will improve things a lot and will make Wikidot more reliable, responsive and… better. This is all the more important because we host websites and data that people rely on, and we are fully aware of this. The excuse that "Wikidot sites are free and therefore do not have to be reliable" is not an option for us, believe me.
Best regards, and thank you for your patience!
BTW: Yep, we also need to improve the way we communicate with our users and the community. Starting ASAP.