EXPLAIN EXTENDED

How to create fast database queries

Top 5 XKCD comics which can illustrate programming questions

with 8 comments

As many of you know, I actively participate on Stack Overflow, the leading Q/A website for software developers.

Kind folks on Stack Overflow have made their data open for examination, and anyone can query their database using this web interface at data.stackexchange.com.

Many of the questions and answers there are illustrated with links to XKCD, the web comics created by Randall Munroe.

So I decided to see which of those comics best illustrate quirks and oddities we keyboard warriors have to deal with in our daily routine.

The query itself is quite simple:

SELECT  link, cnt, id [Post Link], score
FROM    (
        SELECT  link, id, score,
                COUNT(*) OVER (PARTITION BY link) cnt,
                ROW_NUMBER() OVER (PARTITION BY link ORDER BY score DESC) rn
        FROM    (
                SELECT  id, score,
                        'http' + SUBSTRING(start, 1, PATINDEX('%/%', SUBSTRING(start, 13, 30)) + 12)
                FROM    (
                        SELECT  id, score,
                                SUBSTRING(body, PATINDEX('%://xkcd.com/[0-9]%', body), 20) start
                        FROM    (
                                SELECT  TOP 400000
                                        id, body, score
                                FROM    posts
                                ORDER BY
                                        id DESC
                                UNION
                                SELECT  TOP 4000000
                                        postId, text, score
                                FROM    comments
                                ORDER BY
                                        id DESC
                                ) q
                        WHERE   body LIKE '%://xkcd.com/[0-9]%'
                        ) q
                ) q (id, score, link)
        ) q
WHERE   rn = 1
ORDER BY
        cnt DESC

The TOP clauses are there to limit the query size, as their hosting provider does now allow fulltext indexing, and without them the query would just time out. This query only searches recent comments and posts, so some famous and top-voted posts might not be present here. Still, it's a good sample.

You can see all results here, and we'll just discuss the top 5 entries.

#5. Random Number (10 links)

Random Number

Of course the questions which link to it deal with random number generators of any kind and fallacies in their implementation.

The top voted post:

Is 161803398 A 'Special' Number? Inside of Math.Random()

For some (probably good) reason, the Golden Ratio seems a good choice to seed a random number generator.

It's interesting that one of the referenced papers, Pseudo Random Number Generators in Programming Languages, displays the XKCD comic right on its first page.

#4. goto (10 links)

goto

goto is a construct in some programming languages which allows to make an unconditional jump to an arbitrary point within a program or function, skipping all currently executing loops or statement blocks. Many developers (and apparently velociraptors) consider this bad programming practice.

The top voted comment goes on this post:

Getting Parse Error Unexpected "

, where the author asks not to pick on his gotos, to no avail. The top comment linked to the comic with 5 upvotes.

#3. Password Strength (11 links)

Password Strength

Instead of using random character sequences for passwords (which are hard to remember), Randall suggests using passphrases in natural language, which are both more secure and easier to memorize.

The top voted comment is on a post which asks for a regexp to validate passwords with enough complexity:

Complex password validation using regex

This regexp would not allow using such passphrases. The comment author subtly points to this fact with an XKCD link.

#2. Wisdom of the Ancients (16 links)

Wisdom of the Ancients

It's so frustrating when you are searching for a rare and specific problem on the internet, and find a single post which addresses the exactly same problem but does not mention how has it been solved, if at all.

The top voted comment goes on this post:

Class bytes found but defineClass()failed for error when deploying EAR

The guy who had posted it apparently spent much time searching for a solution, and when he had finally solved the problem, he was so kind as to post both the problem and the solution on Stack Overflow.

The comment is just a thank you with a link to the comic. And judging by the comment and the number of upvotes, it helped at least three people.

Good job, seriously! All do as Jarrod Roberson does.

And the winner with astonishing 96 links is, of course

#1. Exploits of a Mom (96 links)

Exploits of a Mom

A cunning mom named her son so that entering his name into a database not protected against SQL injection would ruin it.

This is actually not unheard of, and there is a real story you may find fun to read: Searching for a customer whose last name is Null kills the application.

Also, people or Irish descent are apparently pretty much irritated with the wrong SQL injection handling techniques as they try to submit their surnames on websites.

The top voted comment goes on this question:

SQL Injectable Webpage

whose author in fact tries to write code vulnerable to SQL injection on purpose, but fails.

Written by Quassnoi

July 9th, 2014 at 8:00 pm

Posted in Miscellaneous

8 Responses to 'Top 5 XKCD comics which can illustrate programming questions'

Subscribe to comments with RSS

  1. Usually, there’s no algorithmic reason for using the golden ratio for crypto algorithms, besides the fact that it was provably not fabricated. When a crypto algorithm contains hardcoded numbers whose origin is not provable, there is a reasonable suspect that the numbers might have been picked to create a backdoor. If instead some provable-arbitrary constants are used (like pi, phi, etc.), the backdoor suspect goes away.

    Giovanni Bajo

    10 Jul 14 at 22:53

  2. @Giovanni: taking your word for it, this is totally outside my area of competence!

    Quassnoi

    10 Jul 14 at 23:06

  3. You missed the joke on the goto one:

    He’s not implying that velociraptors hate goto statements. He’s making a Jurassic Park joke. As in, if you make a coding mistake, the velociraptor pen will go berserk and let the raptors out.

    Vince

    10 Jul 14 at 23:40

  4. @Vince: well, this is outside my area of expertise too! Never watched the movie :)

    Quassnoi

    10 Jul 14 at 23:42

  5. http://xkcd.com/1319/

    This is in the readme of the automated tests I’m writing for my QA department. Hasn’t led me astray yet.

    steven

    11 Jul 14 at 00:00

  6. Quassnoi

    11 Jul 14 at 00:04

  7. the where the links to each question/comment are supposed to be appears to have zero width, judging by the page source my browser received. In other words, the links are invisible and therefore unusable (without checking the source to see the data-id, at least).
    interesting post otherwise though! Just wish I knew SQL a bit better…

    Twisted_Code

    8 Jul 21 at 22:26

  8. @Twisted_Code: the WordPress plugin that used to show the links is Tango Uniform. Here are your links in plain text. Thanks for the heads up!

    Quassnoi

    8 Jul 21 at 23:02

Leave a Reply