Archive for July 9th, 2014
As many of you know, I actively participate on Stack Overflow, the leading Q/A website for software developers.
Kind folks on Stack Overflow have made their data open for examination, and anyone can query their database using this web interface at data.stackexchange.com.
Many of the questions and answers there are illustrated with links to XKCD, the web comics created by Randall Munroe.
So I decided to see which of those comics best illustrate quirks and oddities we keyboard warriors have to deal with in our daily routine.
The query itself is quite simple:
SELECT link, cnt, id [Post Link], score FROM ( SELECT link, id, score, COUNT(*) OVER (PARTITION BY link) cnt, ROW_NUMBER() OVER (PARTITION BY link ORDER BY score DESC) rn FROM ( SELECT id, score, 'http' + SUBSTRING(start, 1, PATINDEX('%/%', SUBSTRING(start, 13, 30)) + 12) FROM ( SELECT id, score, SUBSTRING(body, PATINDEX('%://xkcd.com/[0-9]%', body), 20) start FROM ( SELECT TOP 400000 id, body, score FROM posts ORDER BY id DESC UNION SELECT TOP 4000000 postId, text, score FROM comments ORDER BY id DESC ) q WHERE body LIKE '%://xkcd.com/[0-9]%' ) q ) q (id, score, link) ) q WHERE rn = 1 ORDER BY cnt DESC
TOP clauses are there to limit the query size, as their hosting provider does now allow fulltext indexing, and without them the query would just time out. This query only searches recent comments and posts, so some famous and top-voted posts might not be present here. Still, it's a good sample.
You can see all results here, and we'll just discuss the top 5 entries.
#5. Random Number (10 links)
Of course the questions which link to it deal with random number generators of any kind and fallacies in their implementation.