## Archive for March 25th, 2010

## Bayesian classification

Comments enabled. I *really* need your comment

From **Stack Overflow**:

Suppose you've visited sites `S0 … S50`. All except `S0` are 48% female; `S0` is 100% male. I'm guessing your gender, and I want to have a value close to 100%, not just the 49% that a straight average would give.

Also, consider that most demographics (i.e. everything other than gender) does not have the average at 50%. For example, the average probability of having kids 0-17 is ~37%. The more a given site's demographics are different from this average (e.g. maybe it's a site for parents, or for child-free people), the more it should count in my guess of your status.

What's the best way to calculate this?

This is a classical application of Bayes' Theorem.

The formula to calculate the posterior probability is:

`P(A|B) = P(B|A) × P(A) / P(B) = P(B|A) × P(A) / (P(B|A) × P(A) + P(B|A*) × P(A*))`

where:

- `P(A|B)` is the posterior probability of the visitor being a male (given that he visited the site)
- `P(A)` is the prior probability of the visitor being a male (initially, **50%**)
- `P(B)` is the probability of (any Internet user) visiting the site
- `P(B|A)` is the probability of a user visiting the site, given that he is a male
- `P(A*)` is the prior probability of the visitor not being a male (initially, **50%**)
- `P(B|A*)` is the probability of a user visiting the site, given that she is not a male.
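
As a rough illustration, the formula can be sketched in Python and applied once per visited site. The function names `posterior` and `guess_male` are made up for this example. The sketch also makes two assumptions that the formula itself does not: visits are treated as independent given gender, and a site's share of male visitors `m` is converted into likelihoods that are only proportional to `P(B|A)` and `P(B|A*)`. The proportionality is harmless because the common factor `P(B)` cancels between numerator and denominator.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem: P(A|B) from P(A), P(B|A) and P(B|A*).

    The two likelihoods may share any common scale factor,
    since it cancels between numerator and denominator.
    """
    evidence = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("observation is impossible under both hypotheses")
    return p_b_given_a * prior / evidence


def guess_male(site_male_shares, base_rate=0.5):
    """Update P(male) once for every visited site.

    Assumptions (illustrative, not part of the formula above):
    - visits are independent given gender;
    - a site whose visitors are a fraction m male has
      P(B|A) proportional to m / base_rate and
      P(B|A*) proportional to (1 - m) / (1 - base_rate).
    """
    p = base_rate  # start from the prior P(A)
    for m in site_male_shares:
        p = posterior(p, m / base_rate, (1.0 - m) / (1.0 - base_rate))
    return p


# S0 is 100% male, S1..S50 are 48% female (52% male).
sites = [1.0] + [0.52] * 50
print(guess_male(sites))
```

Under these assumptions the single 100%-male site drives the estimate to `1.0`, because a non-male visitor could never have been observed there, while the fifty near-average sites each shift the estimate only slightly.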