TSuereth's c-blog



A Modest Review Proposal

Re: Why do we use review scores?

I completely agree that review scores are easy, painless, and extremely convenient, even if they are an imperfect metric of a game's worth. I myself put non-trivial stock in Metacritic averages, and will sometimes skip a review's text in light of an exceptionally high or low score (hey, I gots things to do!).

But where would the human race be if we settled for convenient, imperfect solutions? (Your mother's house, that's where!) So in the interest of bettering not only video game journalism, but humanity itself, I hereby put forth my proposed replacement for game review scores.

The remainder of this post is essentially a poorly written whitepaper, so if you just want to post a derogatory comment about my face, feel free to skip ahead.


The fundamental problem that review scores attempt to solve is reducing a game's worth to as succinct and understandable a measure as possible. Anyone possessing remedial math skills can look at a 7/10 rating and see that it's "better" than a 5/10 rating. But trying to derive any further meaning from a numeric score can be tricky:

1) If Game A has a 7/10 from IGN, does that make it better than Game B, with 6/10 from Destructoid?

This is a problem that Metacritic (and other review aggregators) go a long way toward solving, as they smooth out the effect of each outlet's numeric range. But it is still difficult to separate the effect of a critic's opinion from the effect of an outlet's skew on review scores in general.

2) Is Game C, released in 1999 with an 81% average, better than Game D, released in 2009 with an 80% average?

Our expectations change as genres, technology, and development practices move forward. Last year's award winner often becomes this year's par-for-the-course, and next year's trash. But not always: some games still seem almost timeless. There's no standard rubric to adjust a game's review score based on its age, never mind one that accounts for newly published reviews of older games.

3) If Game E, a racing game, gets 90/100, is it better than Game F, an action-adventure, which got only 85/100?

This is really an unfair question, since it assumes a 1:1 mapping between "good" and "bad" across genres. I would make the argument that all racing and sports games suck, unless they have banana peels; whereas a racing game fan might make the argument that I am a dick. While there are some aspects of disparate games that can be directly compared (graphics, voice acting, etc.), in general, their numeric differences are not meaningful. The score for an "average" football game is completely different from the score for an "average" platformer.

4) Is Game G exactly 'average' if it receives a 5.0/10? That is, can it be assumed that half of all games are better, and half are worse?

Of course, we all know this is bullshit. Individual review outlets can assign textual descriptions to their numbers, but these become meaningless in view of other outlets' different numbers and descriptions. Furthermore, the idea that a game can be explained as purely a number, with no context, is utter nonsense; the number is useless except when compared with other numbers.

It is this last question that inspired my proposed system, which I'll call Relative-Comparative Ranking (RCR for short). I assert that it is pointless to attempt a holistic measure of game worth, independent of comparison; it is impossible to say a game is "good" or "bad" without some contextual basis, e.g. "better than this game" or "not as good as that game." At any rate, a game consumer doesn't know what to expect from a game rated 80 until he's played a 70, or a 90, et cetera.

In so many words, my assumption is that a succinct game-worth descriptor need not be meaningful purely on its own. So with that in mind, why bother with the artifice of numbers? My RCR proposal is to describe games as better than (or not as good as) other games.

For instance, if I were to review the this-gen slash-em-up Conan, I might describe it as better than Vexx, but not as good as New Super Mario Bros. A user could view Conan's ranking page, and plainly see that it's more fun than a terrible 3D platformer from 2003. Or, he could view the Vexx ranking page, and note that it's worse than a mediocre action game from 2007.

If the user has played any of these three games, he can roughly gauge the quality of the other two, based on my and other reviewers' ranking votes. Given a large enough pool of users, all with their own distinct genre tastes, the effects of those tastes will filter out - leaving (presumably) the real, measurable differences in quality between games of unlike genres. This can be construed as an answer to (3), above: as much as unlike games can be compared, the ranking system makes the comparison direct, rather than using unreliable numbers.

Rankings become infinitely more meaningful, though, when games are compared to like games, which is just the kind of comparison you'd expect a critic to make anyway. Regarding the Conan example, rather than comparing the game to Vexx and Mario, I'm in fact much more likely to say that Conan is better than Golden Axe: Beast Rider, but not as good as God of War. Now, chances are pretty good that any fan of beat-em-up games can use this ranking information to make a reliable purchasing decision. Since these rankings are non-temporal - a 6.0 may not always be better than a 5.0, but Conan will always be better than Beast Rider - this handily solves issue (2).

It's also desirable to rank games using votes from some degrees of separation away - e.g., if there are no votes comparing Modern Warfare to The Conduit, but there are known MW vs. MW: Reflex rankings, and known MW: Reflex vs. Conduit rankings, some second-degree inference may be made. Hence, an answer to (1).
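To make the second-degree idea a little more concrete, here is a rough Python sketch. It treats each settled comparison as a directed "better than" link and looks for the shortest chain between two games with a breadth-first search. The function name `better_chain` is made up for illustration, and the winners in the sample data are purely hypothetical, since no actual vote outcomes exist for these titles. How much weight a second-degree chain should carry is left open.

```python
from collections import deque


def better_chain(edges, start, goal):
    """Return the shortest chain of 'better than' links from start to goal.

    edges is an iterable of (better, worse) pairs. Returns a list of game
    names beginning with start and ending with goal, or None if no chain
    exists.
    """
    graph = {}
    for better, worse in edges:
        graph.setdefault(better, set()).add(worse)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() only keeps the search order deterministic
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical outcomes, invented for this example only.
edges = [
    ("Modern Warfare", "Modern Warfare: Reflex"),
    ("Modern Warfare: Reflex", "The Conduit"),
]

chain = better_chain(edges, "Modern Warfare", "The Conduit")
degree = len(chain) - 1  # number of links in the chain
```

Here `degree` counts the hops in the chain, so a direct vote is degree 1 and the inferred comparison above is degree 2. A longer chain presumably deserves less trust, but by how much is exactly the open question.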

Embellish these rankings with short, tag-like descriptors of a game, covering its genre, platform, key features, etc., and it becomes trivial to computationally construct meaningful rankings based on category. It would be simple to determine the "best" racing game, or the "best" Zelda game, or the "best" game starring Nolan North. This answers issue (4), in that it is easy to see how a game has placed in the grand scheme of things. It could even serve as a mechanism for a novice gamer to pick a good starting point; if a game consumer's never played a Zelda before, he could check the series ranking and instantly go for one near the top of the list.
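As a rough sketch of the tag idea, the Python below filters games by tag and orders them by the share of in-category votes each one won. Win share is a crude stand-in chosen for brevity, not a worked-out RCR algorithm, and the function name `rank_by_tag` and the tag assignments are assumptions made up for this example. The votes mirror the Conan comparisons from earlier in the post.

```python
def rank_by_tag(tags, votes, tag):
    """Rank games carrying `tag` by the share of in-category votes they won.

    tags maps game name -> set of tag strings.
    votes is a list of (better, worse) pairs, one per ranking vote.
    Votes involving a game outside the category are ignored.
    """
    members = {game for game, game_tags in tags.items() if tag in game_tags}
    wins = {game: 0 for game in members}
    played = {game: 0 for game in members}

    for better, worse in votes:
        if better in members and worse in members:
            wins[better] += 1
            played[better] += 1
            played[worse] += 1

    def share(game):
        return wins[game] / played[game] if played[game] else 0.0

    # Highest win share first; ties fall back to alphabetical order.
    return sorted(members, key=lambda game: (-share(game), game))


# Tag assignments are illustrative assumptions.
tags = {
    "God of War": {"beat-em-up"},
    "Conan": {"beat-em-up"},
    "Golden Axe: Beast Rider": {"beat-em-up"},
    "Vexx": {"platformer"},
    "New Super Mario Bros.": {"platformer"},
}
votes = [
    ("God of War", "Conan"),
    ("Conan", "Golden Axe: Beast Rider"),
    ("Conan", "Vexx"),
    ("New Super Mario Bros.", "Conan"),
]

ranking = rank_by_tag(tags, votes, "beat-em-up")
```

With this data the beat-em-up list comes out as God of War, Conan, Golden Axe: Beast Rider. The cross-genre votes against Vexx and Mario are simply ignored for the category ranking.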

The trick of RCR is, naturally, the implementation, which I haven't completely figured out yet. A straightforward approach to first-degree comparative rankings could be pretty easy: a massive table with every recorded game on each axis, and ranking votes where two titles intersect. But how should second-degree comparisons, and those further out, be decided? What algorithm, and what weight, should be applied to these rankings? And what determines statistical significance - if I go to a game's ranking page, how many times will the entire table have to be searched to determine which comparisons are most meaningful?
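For the first-degree part, a minimal Python sketch of that table might look like the following. It stores only the intersections that actually have votes, rather than a full game-by-game grid, which is an assumption on my part about how one would keep the "massive table" manageable. The class and method names (`VoteTable`, `vote`, `tally`, `preference`) are invented for illustration, and the vote counts are made up.

```python
from collections import defaultdict


class VoteTable:
    """Sparse stand-in for the full table: only voted-on pairs are stored."""

    def __init__(self):
        # wins[(a, b)] = number of votes saying a is better than b
        self.wins = defaultdict(int)

    def vote(self, better, worse):
        if better == worse:
            raise ValueError("a game cannot be ranked against itself")
        self.wins[(better, worse)] += 1

    def tally(self, a, b):
        """Return (votes for a over b, votes for b over a)."""
        return self.wins.get((a, b), 0), self.wins.get((b, a), 0)

    def preference(self, a, b):
        """Fraction of votes favoring a over b, or None if nobody has voted."""
        for_a, for_b = self.tally(a, b)
        total = for_a + for_b
        return for_a / total if total else None


# Made-up votes for illustration.
table = VoteTable()
table.vote("God of War", "Conan")
table.vote("God of War", "Conan")
table.vote("Conan", "God of War")
table.vote("Conan", "Golden Axe: Beast Rider")
```

Looking up a single pair is a dictionary access, so the expensive part is not the first-degree table itself but everything layered on top of it: chains, weights, and significance.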

Ultimately, getting the implementation right requires much further thought, as well as practical experimentation, and a significant set of data to play around with.


When I first thought of this system, I'd intended to further design and test it for my personal game site (citation needed); but I gave up before reaching any sort of functional prototype, having no significant userbase to use for algorithm testing. Also, laziness. I'd still love to try this someday, though.

TLDR - review scores are good because the alternatives are kind of fuckin' complicated.

About TSuereth: one of us since 8:04 PM on 07.31.2009

Xbox LIVE:TSuereth

