By Evan Miller

**PROBLEM**: You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of "score" to sort by.

**WRONG SOLUTION #1**: Score = (Positive ratings) - (Negative ratings)

*Why it is wrong*: Suppose one item has 600 positive ratings and 400 negative ratings: 60% positive. Suppose item two has 5,500 positive ratings and 4,500 negative ratings: 55% positive. This algorithm puts item two (score = 1000, but only 55% positive) above item one (score = 200, and 60% positive). WRONG.
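As a quick check of the arithmetic, here is a minimal Python sketch that ranks the two items by net votes. The function name `net_score` and the dictionary `net_items` are made up for illustration; only the vote counts come from the example above.

```python
# Wrong Solution #1: score = positive ratings - negative ratings.
# `net_score` and `net_items` are hypothetical names used only for this sketch.

def net_score(positive, negative):
    """Return the net number of positive votes."""
    return positive - negative

# (positive, negative) vote counts from the example in the text.
net_items = {
    "item one": (600, 400),
    "item two": (5500, 4500),
}

# Sort highest score first, as a "top rated" list would.
net_ranked = sorted(net_items, key=lambda name: net_score(*net_items[name]), reverse=True)

for name in net_ranked:
    pos, neg = net_items[name]
    fraction_positive = pos / (pos + neg)
    print(f"{name}: score={net_score(pos, neg)}, positive={fraction_positive:.0%}")
```

Item two comes out on top even though a smaller share of its ratings are positive.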

*Sites that make this mistake*: Urban Dictionary

**WRONG SOLUTION #2**: Score = Average rating = (Positive ratings) / (Total ratings)

*Why it is wrong*: Average rating works fine if you always have a ton of ratings, but suppose item one has 2 positive ratings and 0 negative ratings (100% positive). Suppose item two has 100 positive ratings and 1 negative rating (about 99% positive). This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.
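The same kind of check works here. This is a minimal Python sketch with made-up names (`average_rating`, `avg_items`). Treating an item with no ratings as 0.0 is an assumption of the sketch, not something the text specifies.

```python
# Wrong Solution #2: score = positive ratings / total ratings.
# `average_rating` and `avg_items` are hypothetical names used only for this sketch.

def average_rating(positive, negative):
    """Return the fraction of ratings that are positive."""
    total = positive + negative
    # Assumption: an unrated item scores 0.0 so we never divide by zero.
    return positive / total if total else 0.0

# (positive, negative) vote counts from the example in the text.
avg_items = {
    "item one": (2, 0),
    "item two": (100, 1),
}

# Sort highest average first.
avg_ranked = sorted(avg_items, key=lambda name: average_rating(*avg_items[name]), reverse=True)

for name in avg_ranked:
    print(f"{name}: average={average_rating(*avg_items[name]):.3f}")
```

Two votes are enough to push item one ahead of an item with a hundred positive ratings, because the average ignores how many observations it is based on.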

*Sites that make this mistake*: Amazon.com

**CORRECT SOLUTION**: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

*Say what*: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: *Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what?* Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by: