The task is to create a PHP application that fetches data from remote websites, parses the HTML, and displays the result in a simple way using HTML elements. We want to test your ability to understand basic web programming techniques and to write clean, high-quality code.
From the page http://www.wikidot.com/, get the sites listed in the "Featured Sites" section. For each of these sites, go to <SITE_URL>/system:members and count the number of members. Show the result in a simple table:
| Site | Link | Number of members |
|---|---|---|
| Cocktails | http://cocktails.wikidot.com | 1 |
| … | … | … |
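A minimal sketch of the two steps described above, counting elements in fetched HTML and rendering the table. The function names, the array keys (`site`, `link`, `members`) and any XPath expression passed in are assumptions made for illustration; the real markup of the members page is not given here, so the query has to be worked out by inspecting it.

```php
<?php
// Hypothetical helper: counts the nodes in an HTML string that match an XPath query.
// Which query identifies a "member" on a real page is an assumption the caller supplies.
function countMatches(string $html, string $xpathQuery): int
{
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html); // note: PHP 8 throws a ValueError for an empty string
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($document))->query($xpathQuery);
    return $nodes === false ? 0 : $nodes->length;
}

// Hypothetical helper: renders rows shaped like
// ['site' => ..., 'link' => ..., 'members' => ...] as a plain HTML table.
function renderSitesTable(array $rows): string
{
    $html = "<table>\n<tr><th>Site</th><th>Link</th><th>Number of members</th></tr>\n";
    foreach ($rows as $row) {
        // Escape everything that originated from a remote page.
        $site = htmlspecialchars($row['site'], ENT_QUOTES, 'UTF-8');
        $link = htmlspecialchars($row['link'], ENT_QUOTES, 'UTF-8');
        $members = (int) $row['members'];
        $html .= "<tr><td>{$site}</td><td><a href=\"{$link}\">{$link}</a></td><td>{$members}</td></tr>\n";
    }
    return $html . "</table>\n";
}
```

Both functions are pure (string in, value out), which keeps them easy to exercise without any network access.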
To prevent fetching data with each request to your script, store results in a local cache (e.g. Memcached).
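One way to read the caching requirement is the usual "check the cache first, fetch only on a miss" pattern. The sketch below is only an illustration: `getCached`, the key and the TTL are hypothetical, and `$cache` is assumed to be any object with Memcached-style `get($key)` and `set($key, $value, $expiration)` methods.

```php
<?php
// Hypothetical helper illustrating cache-first lookup.
// $fetch is a callable that does the slow work (remote requests and parsing).
function getCached($cache, string $key, int $ttl, callable $fetch)
{
    // Memcached::get() returns false on a miss, so a literal false
    // cannot be told apart from a miss in this simple version.
    $value = $cache->get($key);
    if ($value !== false) {
        return $value; // cache hit: no remote requests at all
    }

    $value = $fetch();
    $cache->set($key, $value, $ttl); // keep the result for $ttl seconds
    return $value;
}
```

With the Memcached extension, `$cache` could be created with `new Memcached()` followed by `addServer('127.0.0.1', 11211)`. Passing the cache in as a parameter also lets a test substitute a simple in-memory object.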
We care about the code you write. Do not add fancy features. You get points for efficient design, accurate understanding of the task, functional minimalism, and clarity and expressiveness. Comments are not a substitute for readability.
We will stress-test your code, running 10, 100, and 1000 concurrent connections, to check for performance bottlenecks and scalability, by running ab -n 10000 -c 10/100/1000 http://localhost/your_script.php.
If you have difficulties or questions with any aspect of this exercise, you should ask us for help (leave a comment or send an e-mail): that will not be counted against you.
Environment
- You can assume the Memcached server is running on 127.0.0.1:11211
- All standard PHP extensions are available, including all XML parsing libs (DOM, SimpleXML, etc.).
Deliverables
You should produce:
- A small design document (README) that captures:
  - your understanding of the problem, and
  - your proposed architecture.
- A few test cases written in shell scripting
Pack those in a tar.gz archive and send it to piotr@wikidot.com
Deadline
We accept submissions until noon (CEST) on Friday, 9th Oct.






Here is a small hint for developers:
The displayed result does not need to be shockingly up-to-date. The information listed on the main page (http://www.wikidot.com) is also cached for some period of time.
It is more important to fetch and display the data efficiently, and to handle dozens of req/s, than to display up-to-date content. This is why we encourage using Memcached.
Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me
Just out of curiosity: where exactly is this "Featured Sites" section?
@plisken: The Featured Sites section is in the very center of http://www.wikidot.com/
Piotr Gabryjeluk
visit my blog
Hello,
testing environment:
CPU: AMD Athlon 2000+
RAM: 768MB DDR1
HDD: 40GB Seagate Barracuda
My test :D
Can't run 10000/1000… it's killing my PC :>
Now writing the README, then packing everything and sending it to wikidot.com
gl ALL
A small hint: if
1. the data is already in the cache, and
2. you optimize the part of your script that displays the data,
you could get a few hundred req/s quite easily. But performance aside, we look much more at overall design (even with such a small task) and elegance of code, which reflects your coding habits.
Michał Frąckowiak @ Wikidot Inc.
I'm not sure whether that hint is for me…
if so, higher performance on my PC is kind of an abstraction :>
my machine handles ~400 req/s on a
<?php echo 'Hello world.'; ?>
script ;)
so 80 req/s is a magnificent result.
In my opinion, of course.
Yes, I could throw out the OO API (from the main page cache) and use some inline scripting, but is it worth it?
regards
PS. The script was already sent to piotr@wikidot.com
Well, not necessarily to anyone in particular, but we have solutions that go down below 1 req/s. Honestly, I have not seen your solution yet (Piotr did not share it, still waiting), but 80 req/s is reasonable, especially if your machine can do 400 max.
And we have also written our own solutions to this task, and we are getting ~800 req/s when the data is in the cache, on a dual-core 2.6GHz machine. But there are many external factors, both hardware and software.
We will go through it, probably together with the authors of the solutions. No worries.
BTW: are you using APC or eAccelerator?
Michał Frąckowiak @ Wikidot Inc.
@michal frackowiak
no, I don't… (as far as I know)
could you paste me a benchmark from your testing environment, 10000/100?
I'm so curious
This is on a 64-bit Mac. FastCGI and eAccelerator might be adding some extra boost here, but this setup is new, so I did not bother with much tuning. The real application has other bottlenecks.
Michał Frąckowiak @ Wikidot Inc.
@michal frackowiak
thx,
but I meant my script on your testing environment ;) [starting with a clean cache]
(the test was mailed from: kontakt@terq.pl)
@terq
I did some tuning to my PHP config, and see the whopping performance of your solution:
I cannot complete the test with -c 100 when the cache is not populated, because I am getting apr_poll: The timeout specified has expired (70007). That is probably due to too many concurrent outgoing connections, so the thing is not reliable. It could also be Mac-specific. Strange, since I am using only 10 PHP processes to handle the traffic. It should work.
Anyway, nice performance.
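If the timeouts with an empty cache really come from many requests fetching the remote pages at once, one commonly used mitigation (not something any submission here is known to do) is to let a single request refresh the cache while the others wait. A rough sketch, assuming `$cache` offers Memcached-style `get`, `set`, `add` and `delete`; the function name, the `:lock` key suffix and the timing values are made up for illustration:

```php
<?php
// Hypothetical helper: only the request that wins the lock performs the slow fetch.
// Memcached::add() stores a key only if it does not exist yet, so it can act as a lock.
function getWithRefreshLock($cache, string $key, int $ttl, callable $fetch, int $lockTtl = 30, int $maxWaits = 50)
{
    for ($attempt = 0; $attempt <= $maxWaits; $attempt++) {
        $value = $cache->get($key);
        if ($value !== false) {
            return $value; // populated, possibly by another request in the meantime
        }

        if ($cache->add($key . ':lock', 1, $lockTtl)) {
            try {
                $value = $fetch(); // slow part: remote requests and parsing
                $cache->set($key, $value, $ttl);
                return $value;
            } finally {
                $cache->delete($key . ':lock'); // release even if $fetch throws
            }
        }

        usleep(100000); // someone else is refreshing: wait 100 ms, then re-check
    }

    return null; // gave up waiting; the caller decides how to respond
}
```

The lock expiry (`$lockTtl`) is there so that a crashed refresher does not block everyone forever.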
Michał Frąckowiak @ Wikidot Inc.
Still 82 minutes left for submissions…
Michał Frąckowiak @ Wikidot Inc.
time out :>
Yep, we are digging through the solutions. We will try to meet a few authors next week.
Thanks for all the submissions!
Michał Frąckowiak @ Wikidot Inc.
After the T/O but anyway :)
4000/5000 req/s on a C2D, whether or not the cache is populated ;)