When we started designing TownSquare's reputation system, we spent a long time studying what already existed — and why it failed. Every major platform has some form of reputation signal, and every one of them has a fundamental flaw that undermines its usefulness.
Understanding those failures was the first step to building something better.
The Failures We Wanted to Avoid
Follower counts measure popularity, not quality
The number of followers you have reflects how many people have opted in to see your content — but it's a lagging indicator heavily influenced by factors that have nothing to do with the quality of your thinking. Early adopters accumulate followers simply by being early. People who post frequently accumulate followers through sheer volume. Accounts that go viral can gain hundreds of thousands of followers from a single lucky post, then coast on that number for years regardless of what they post next.
Follower counts also don't decay. A pundit who was consistently wrong throughout the 2010s still has the follower count they accumulated then. The market corrects eventually, but slowly and noisily.
Upvotes and likes are a popularity contest
Simple upvote systems — Reddit, YouTube likes, Twitter/X hearts — measure how much a post resonated with whoever saw it. But resonance isn't the same as quality. A post that confirms what most readers already believe will outperform a post that challenges them, even if the challenging post is more accurate and more useful. Agreeable mediocrity beats uncomfortable insight almost every time.
There's also a timing and visibility problem: posts that appear at the top of a feed get more eyes and therefore more upvotes, regardless of their relative quality. The rich get richer. Early upvotes predict future upvotes more reliably than quality does.
Karma systems accumulate without reflecting current quality
Systems like Reddit karma give you a running total that grows over time. But there's no mechanism for decay, no way for past performance to stop subsidizing present behavior. A user with 50,000 karma who has been posting poorly for the last year still presents as a high-reputation account. The number has lost its signal.
What We Needed Instead
As we worked through these failures, the requirements for a better system became clearer:
- It should reflect current quality, not just accumulated history. Past performance should matter, but recent performance should matter more.
- It should account for who is judging you. A positive assessment from a high-reputation user should carry more weight than one from a brand-new account.
- It should be self-correcting. If your quality drops, your score should drop. If it improves, your score should reflect that.
- It should be resistant to gaming. Reputation accumulated by posting into a sympathetic bubble should be worth correspondingly less.
The more we refined these requirements, the more one system kept coming to mind: ELO.
What ELO Actually Does
The ELO rating system was developed in the 1960s by Arpad Elo, a Hungarian-American physics professor and chess enthusiast, to replace an earlier rating system that failed to predict match outcomes accurately. The core insight is elegant: your rating should reflect not just whether you win, but whether you win against the people you're expected to beat.
When a highly-rated chess player beats a much lower-rated opponent, their rating barely moves — the outcome was expected. But if a lower-rated player beats a highly-rated one, both ratings shift significantly. Upsets matter more than expected outcomes. The system continuously recalibrates based on actual performance against real competition.
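The update rule behind this behavior is compact. Here is a minimal sketch in Python, using the standard logistic expected-score formula and a fixed K-factor of 32 — both conventional choices in chess rating, not specific to any one platform:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32) -> tuple[float, float]:
    """Return both players' new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    ea = expected_score(rating_a, rating_b)
    # Each rating moves by K times (actual result minus expected result),
    # so expected outcomes barely move ratings while upsets move them a lot.
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b

# A 2000-rated player beating a 1400-rated player gains almost nothing...
strong, weak = update(2000, 1400, score_a=1.0)
# ...but the reverse upset shifts both ratings by roughly 31 points.
upset_winner, upset_loser = update(1400, 2000, score_a=1.0)
```

Note that the two rating changes are equal and opposite, so rating points are conserved across a game — a property that makes inflation of the overall pool harder.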
"Your ELO reflects how you perform against expectations — which means it reflects genuine skill, not just accumulated wins."
The properties of ELO map almost perfectly to what we needed for social media reputation:
- Votes from high-reputation users move your score more — just as beating a highly-rated chess player moves your score more than beating a beginner.
- Scores are self-correcting — a run of poor content lowers your rating the same way a losing streak lowers a chess player's.
- Posting into a bubble is self-limiting — if you only get votes from low-reputation accounts, those votes don't do much for your score.
- The score is always live — it reflects your actual recent performance, not just your historical peak.
How We Adapted It
Pure ELO needed some modifications for a social context. In chess, you have clear wins and losses. On a social platform, the signal is more nuanced — which is why we built the multi-axis voting system alongside the ELO ratings.
When someone votes on a TownSquare post, they're not just giving a thumbs up or down. They're choosing from six categories: Insightful, Funny, Well-Sourced, Respectful, Misleading, or Off-Topic. Positive votes increase your ELO. Negative votes (Misleading, Off-Topic) decrease it. And the K-factor — the amount by which your score can change from any single vote — scales with the reputation gap between you and the voter.
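One way to picture the adapted update is the hypothetical sketch below. The six vote categories, their signs, and the idea of scaling the K-factor with the voter/author reputation gap come from the description above; the specific base K, the logistic scaling curve, and all function names are illustrative assumptions, not TownSquare's actual implementation:

```python
# Hypothetical sketch only: base_k and the scaling curve are assumptions.
POSITIVE = {"Insightful", "Funny", "Well-Sourced", "Respectful"}
NEGATIVE = {"Misleading", "Off-Topic"}

def vote_weight(voter_elo: float, author_elo: float,
                base_k: float = 16.0) -> float:
    """Scale K with the reputation gap between voter and author.

    Uses the same logistic shape as Elo's expected score: a voter far
    above the author approaches 2x base K, a voter far below approaches
    zero, and an equal-reputation voter lands exactly at base K.
    """
    gap = voter_elo - author_elo
    return base_k * 2.0 / (1.0 + 10 ** (-gap / 400))

def apply_vote(author_elo: float, voter_elo: float, category: str) -> float:
    """Return the author's new ELO after one categorized vote."""
    if category in POSITIVE:
        direction = 1.0
    elif category in NEGATIVE:
        direction = -1.0
    else:
        raise ValueError(f"unknown vote category: {category}")
    return author_elo + direction * vote_weight(voter_elo, author_elo)
```

Under this shape the "bubble" property falls out automatically: a vote from an account far below the author's reputation carries a weight near zero, while the same vote from a far higher-reputation account moves the author's score by nearly twice the base K.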
The result is a reputation system that actually means something. A high ELO score on TownSquare isn't a measure of how long you've been around or how popular you are. It's a measure of how consistently your content is judged to be valuable by the people best positioned to judge it.
That's a harder thing to earn. But it's a much more honest thing to display.