How I Built Elo-Based Matchmaking for a Real-Time Chess App

When I built PlayChess - a real-time multiplayer chess platform - one of the most interesting engineering problems was matchmaking. Not just "pair two waiting players together," but pairing them fairly: close enough in skill that the game is competitive, but fast enough that nobody waits forever.

The solution has two parts: the Elo rating system to quantify skill, and a dynamic rating-range expansion algorithm to balance fairness against wait time. This post covers both, with the actual code from the codebase.

Part 1: The Elo Rating System

What Elo actually measures

Elo doesn't measure how good a player is in absolute terms. It measures how good a player is relative to the pool they've played in. A 1600-rated player in one pool might be a 1400-rated player in a stronger pool. The number is only meaningful within a consistent system.

The key insight behind Elo: every game is a prediction. Before the game, the system predicts a probability that each player will win based on the rating gap. After the game, it updates both ratings based on whether the result matched the prediction.

Surprise the system → gain more points. Confirm what was expected → gain fewer points.

The expected score formula

The core of Elo is a single formula:

E = 1 / (1 + 10^((opponentRating - playerRating) / 400))

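This is a logistic curve that maps any rating difference to an expected score between 0 and 1. If draws are ignored it reads as a win probability; strictly, it is the probability of a win plus half the probability of a draw.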

function calculateExpectedScore(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

Let's see what it produces for different rating gaps:

Rating gap	Expected score	Win probability (ignoring draws)
+400 (you're higher)	0.91	~91%
+200	0.76	~76%
+100	0.64	~64%
0 (equal)	0.50	50%
-100	0.36	~36%
-200	0.24	~24%
-400	0.09	~9%

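The 400-point divisor is a calibration constant. It sets the scale: a player who is 400 points higher is expected to score about 10 times as many points as their opponent (roughly 0.91 to 0.09). This is a convention, not a law. FIDE uses 400 with base 10; Glicko-2 works internally on a scale divided by 173.7, which is just 400 / ln(10), so it describes the same logistic curve written with base e rather than a different distribution.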
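To sanity-check the table and the 10-to-1 claim, here is a small sketch that recomputes both. `expectedScore` repeats the formula above so the snippet runs on its own; `oddsFor` and `gapTable` are names made up for this example.

```typescript
// Same formula as calculateExpectedScore, repeated so this snippet is standalone.
function expectedScore(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

// Hypothetical helper: ratio of the player's expected score to the opponent's,
// E / (1 - E). Algebraically this equals 10^(gap / 400).
function oddsFor(ratingGap: number): number {
  const e = expectedScore(ratingGap, 0);
  return e / (1 - e);
}

// Recompute the table rows, rounded to two decimals.
const gapTable = [400, 200, 100, 0, -100, -200, -400].map((gap) => ({
  gap,
  expected: Number(expectedScore(gap, 0).toFixed(2)),
}));

// Close to 10, up to floating-point error.
const oddsAt400 = oddsFor(400);
```

Only the rating difference enters the formula, which is why the helper can pass the gap as one rating and 0 as the other.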

Rating updates

After the game, each player's new rating is:

NewRating = OldRating + K × (ActualScore − ExpectedScore)

Where ActualScore is 1 for a win, 0.5 for a draw, 0 for a loss.

function calculateNewRating(
  currentRating: number,
  opponentRating: number,
  actualScore: number,
): { newRating: number; change: number } {
  const kFactor = getKFactor(currentRating);
  const expectedScore = calculateExpectedScore(currentRating, opponentRating);

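  const rawChange = Math.round(kFactor * (actualScore - expectedScore));
  // Floor at 100 so a losing streak can never push a rating to zero or below
  const newRating = Math.max(100, currentRating + rawChange);

  return {
    newRating,
    // Report the change actually applied, so it always equals newRating - currentRating
    change: newRating - currentRating,
  };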
}

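The minimum rating floor of 100 is a small but important detail. Without it, a long losing streak could push a new player's rating toward zero or below. The expected score formula would still work, since it depends only on the rating difference, but negative ratings are confusing to display and easy to mishandle elsewhere in the code.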
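The floor has one side effect worth checking: when clamping kicks in, the raw Elo delta and the change actually applied to the rating are no longer the same number. The sketch below is a standalone variant that returns both; the name `applyFloor` and the 105-rated example are made up for illustration.

```typescript
const RATING_FLOOR = 100; // same floor value as in the article

// Hypothetical helper: apply a raw Elo delta, clamp at the floor, and report
// both the raw delta and the delta that was really applied.
function applyFloor(currentRating: number, rawChange: number) {
  const newRating = Math.max(RATING_FLOOR, currentRating + rawChange);
  return { newRating, rawChange, appliedChange: newRating - currentRating };
}

// A 105-rated player loses 20 points on paper but can only drop to 100.
const floored = applyFloor(105, -20);
// floored = { newRating: 100, rawChange: -20, appliedChange: -5 }
```

Whichever of the two numbers you show in the UI, it helps to be deliberate about it so the displayed change matches what the player sees on their profile.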

The K-factor: how volatile should ratings be?

The K-factor controls how much a single game can shift your rating. A high K means ratings move fast - good for new players whose true skill is unknown. A low K means ratings are stable - good for established players where you want small adjustments, not wild swings.

const K_FACTORS = {
  BEGINNER:     40,  // Rating < 1500
  INTERMEDIATE: 32,  // Rating 1500–1999
  ADVANCED:     24,  // Rating 2000–2399
  MASTER:       16,  // Rating ≥ 2400
} as const;

function getKFactor(rating: number): number {
  if (rating < 1500) return K_FACTORS.BEGINNER;
  if (rating < 2000) return K_FACTORS.INTERMEDIATE;
  if (rating < 2400) return K_FACTORS.ADVANCED;
  return K_FACTORS.MASTER;
}

Why does this make sense?

A brand new player with a default 1200 rating might actually be a 1700-strength player. K=40 lets their rating climb quickly - fewer games wasted before they reach their true level. An established 2500-rated player has usually played hundreds of games; their rating is already accurate, and K=16 prevents a lucky run from inflating it artificially. Note that these brackets key off the rating itself, not the number of games played, so rating is standing in as a proxy for how settled a player is.

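FIDE uses a similar tiered approach: K=40 for players new to the rating list, K=20 for most established players, and K=10 for players who have reached 2400. FIDE's tiers depend on games played and peak rating rather than on current rating alone.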
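To get a feel for how much K matters, here is a rough, deterministic sketch of the underrated-newcomer case. It is an idealised model, not PlayChess code: the player always faces an opponent at their own current rating, always scores exactly what their true strength predicts, and K stays fixed instead of dropping at 1500. `gamesToConverge` and the 50-point tolerance are assumptions for illustration.

```typescript
function expectedScoreVs(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

// Count games until the rating is within `tolerance` points of the true strength.
function gamesToConverge(
  startRating: number,
  trueStrength: number,
  kFactor: number,
  tolerance = 50,
): number {
  let rating = startRating;
  let games = 0;
  // The 10000 cap is only a safety net against an endless loop.
  while (Math.abs(trueStrength - rating) > tolerance && games < 10000) {
    const expected = 0.5; // opponent is rated the same as the player
    const actual = expectedScoreVs(trueStrength, rating); // average real result
    rating += kFactor * (actual - expected);
    games += 1;
  }
  return games;
}

const gamesWithK40 = gamesToConverge(1200, 1700, 40);
const gamesWithK16 = gamesToConverge(1200, 1700, 16);
```

In this model the K=40 player needs noticeably fewer games than the K=16 player to get close to 1700. Real results are noisy, so treat the counts as a relative comparison rather than a prediction.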

Calculating both players at once

Both players update simultaneously, based on the pre-game ratings. A critical detail: don't update player A's rating first and then use the new rating to calculate player B's change. Both calculations use the original ratings.

export function calculateEloChanges(
  whiteRating: number,
  blackRating: number,
  winner: Winner,
): {
  whiteChange: number;
  blackChange: number;
  whiteNewRating: number;
  blackNewRating: number;
} {
  const whiteActualScore = calculateActualScore(winner, 'white');
  const blackActualScore = calculateActualScore(winner, 'black');

  const whiteResult = calculateNewRating(whiteRating, blackRating, whiteActualScore);
  const blackResult = calculateNewRating(blackRating, whiteRating, blackActualScore);

  return {
    whiteChange: whiteResult.change,
    blackChange: blackResult.change,
    whiteNewRating: whiteResult.newRating,
    blackNewRating: blackResult.newRating,
  };
}
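The function above relies on a `Winner` type and a `calculateActualScore` helper that aren't shown in this post. Here is a minimal standalone sketch of what they could look like; the type shape and the names `PieceColor`, `actualScoreFor`, `kFor` and `changesFor` are my assumptions for this example, and the floor is left out to keep it short.

```typescript
// Assumed shape of the Winner type; its real definition is not shown in the post.
type Winner = 'white' | 'black' | 'draw';
type PieceColor = 'white' | 'black';

// Hypothetical stand-in for calculateActualScore: 1 for a win, 0.5 for a draw, 0 for a loss.
function actualScoreFor(winner: Winner, color: PieceColor): number {
  if (winner === 'draw') return 0.5;
  return winner === color ? 1 : 0;
}

function expected(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

// Same brackets as getKFactor in the article.
function kFor(rating: number): number {
  if (rating < 1500) return 40;
  if (rating < 2000) return 32;
  if (rating < 2400) return 24;
  return 16;
}

// Both deltas are computed from the same pre-game ratings.
function changesFor(whiteRating: number, blackRating: number, winner: Winner) {
  const whiteChange = Math.round(
    kFor(whiteRating) * (actualScoreFor(winner, 'white') - expected(whiteRating, blackRating)),
  );
  const blackChange = Math.round(
    kFor(blackRating) * (actualScoreFor(winner, 'black') - expected(blackRating, whiteRating)),
  );
  return { whiteChange, blackChange };
}

// 1490 (K=40) beats 1510 (K=32): white gains 21, black loses 17.
const crossBracket = changesFor(1490, 1510, 'white');
```

Notice that the two changes don't sum to zero here. With tiered K-factors, a game between players in different brackets adds or removes a few points from the pool.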

Showing predicted changes before the game

One UX feature that improves player experience significantly: showing players how much they stand to win or lose before the game starts. This is pure math - no game state needed.

export function estimateRatingChange(
  playerRating: number,
  opponentRating: number,
): {
  onWin: number;
  onDraw: number;
  onLoss: number;
} {
  const kFactor = getKFactor(playerRating);
  const expectedScore = calculateExpectedScore(playerRating, opponentRating);

  return {
    onWin:  Math.round(kFactor * (1   - expectedScore)),
    onDraw: Math.round(kFactor * (0.5 - expectedScore)),
    onLoss: Math.round(kFactor * (0   - expectedScore)),
  };
}

For a 1400-rated player facing a 1600-rated opponent (K=40, expected score ≈ 0.24):

Result	Change
Win	+30
Draw	+10
Loss	−10

High potential gain, low potential loss - because beating a stronger player is a surprise to the system. This asymmetry is exactly what you want to show players.
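The same pairing looks very different from the other chair. The sketch below computes the preview for both sides and formats it with an explicit sign for display; `formatChange` and `previewLabels` are hypothetical helpers, and the K-factor is passed in directly to keep the snippet standalone.

```typescript
function expectedFor(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

// Hypothetical display helper: prefix gains with "+", leave losses and zero as-is.
function formatChange(change: number): string {
  return change > 0 ? `+${change}` : String(change);
}

function previewLabels(playerRating: number, opponentRating: number, kFactor: number) {
  const e = expectedFor(playerRating, opponentRating);
  return {
    onWin: formatChange(Math.round(kFactor * (1 - e))),
    onDraw: formatChange(Math.round(kFactor * (0.5 - e))),
    onLoss: formatChange(Math.round(kFactor * (0 - e))),
  };
}

// The 1400-rated underdog (K=40) and the 1600-rated favourite (K=32).
const underdogView = previewLabels(1400, 1600, 40);
// underdogView = { onWin: '+30', onDraw: '+10', onLoss: '-10' }
const favouriteView = previewLabels(1600, 1400, 32);
// favouriteView = { onWin: '+8', onDraw: '-8', onLoss: '-24' }
```

For the favourite even a draw costs points, which is worth making visible before the game rather than after it.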

Part 2: The Matchmaking Problem

Rating math is the easy part. The harder problem is the queue.

The naive approach (and why it fails)

The simplest matchmaking strategy: when a player joins the queue, scan for anyone within ±100 rating points and pair them immediately.

This works fine when the queue has 100+ players. It fails badly when the queue is small - which is most of the time for an indie project. Players sit waiting because nobody within their exact range is online.

The fundamental tension in matchmaking:

Too strict a range → long wait times, players leave
Too loose a range → unfair matches, players also leave

The solution is to start strict and relax over time.

Dynamic rating-range expansion

The idea: when a player first joins the queue, require a close rating match. Every N seconds they're still waiting, expand the acceptable range. Eventually, after enough time, match them with whoever is available.

const MATCHMAKING_CONFIG = {
  BASE_RANGE:         100,  // Initial rating range (±100)
  EXPANSION_RATE:      50,  // Expand by ±50 every interval
  EXPANSION_INTERVAL: 10,   // Expand every 10 seconds
  MAX_RANGE:          500,  // Never exceed ±500
  MAX_WAIT_TIME:      120,  // Force match after 2 minutes
} as const;

function getCurrentRange(waitTimeSeconds: number): number {
  const expansions = Math.floor(waitTimeSeconds / MATCHMAKING_CONFIG.EXPANSION_INTERVAL);
  const range = MATCHMAKING_CONFIG.BASE_RANGE +
    (expansions * MATCHMAKING_CONFIG.EXPANSION_RATE);

  return Math.min(range, MATCHMAKING_CONFIG.MAX_RANGE);
}

The range progression for a waiting player:

Wait time	Rating range
0s	±100
10s	±150
20s	±200
30s	±250
60s	±400
90s	±500 (capped, first reached at 80s)
120s	Force match
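It can be handy to turn the table around and ask: how long does a player wait before an opponent a given number of points away becomes eligible? A small sketch, using the same numbers as the config above; `RANGE_CONFIG` and `secondsUntilEligible` are names made up for this example.

```typescript
const RANGE_CONFIG = {
  baseRange: 100,
  expansionRate: 50,
  expansionIntervalSec: 10,
  maxRange: 500,
} as const;

// Hypothetical helper: seconds of waiting before an opponent `ratingGap` points
// away falls inside the searching player's range.
function secondsUntilEligible(ratingGap: number): number {
  const gap = Math.abs(ratingGap);
  if (gap <= RANGE_CONFIG.baseRange) return 0;
  // Beyond the cap, only the forced match can pair these two players.
  if (gap > RANGE_CONFIG.maxRange) return Infinity;
  const expansionsNeeded = Math.ceil(
    (gap - RANGE_CONFIG.baseRange) / RANGE_CONFIG.expansionRate,
  );
  return expansionsNeeded * RANGE_CONFIG.expansionIntervalSec;
}

const waitFor300 = secondsUntilEligible(300); // 40
const waitFor500 = secondsUntilEligible(500); // 80
```

With these settings the cap is reached after 80 seconds, so the last 40 seconds before the forced match don't widen the search any further.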

The queue scan

Every few seconds, the matchmaking service scans the queue and tries to pair players:

interface QueuedPlayer {
  userId: string;
  rating: number;
  joinedAt: Date;
  timeControl: TimeControl;
}

function findMatch(
  player: QueuedPlayer,
  queue: QueuedPlayer[],
): QueuedPlayer | null {
  const waitSeconds = (Date.now() - player.joinedAt.getTime()) / 1000;
  const range = getCurrentRange(waitSeconds);

  // Keep only other players on the same time control and within the current range
  const eligible = queue.filter(
    (p) =>
      p.userId !== player.userId &&
      p.timeControl === player.timeControl &&
      Math.abs(p.rating - player.rating) <= range,
  );

  if (eligible.length === 0) return null;

  // Among eligible players, prefer the closest rating
  return eligible.reduce((best, current) =>
    Math.abs(current.rating - player.rating) <
    Math.abs(best.rating - player.rating)
      ? current
      : best,
  );
}

Two important details here:

Time control filtering comes first. A 1500-rated bullet player and a 1500-rated classical player should not be matched together. Time controls are different games.
Among eligible players, pick the closest rating. Just because someone is within the expanded range doesn't mean you shouldn't prefer the best available match.
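The scan loop itself isn't shown above, so here is a rough sketch of one pass over the queue. It is a simplified stand-in, not the PlayChess implementation: `WaitingPlayer`, `rangeAfter`, `closestInRange` and `scanQueue` are names made up for this example, time controls are assumed to be plain strings, and the current time is passed in so the result is deterministic.

```typescript
interface WaitingPlayer {
  userId: string;
  rating: number;
  joinedAtMs: number; // epoch milliseconds instead of a Date
  timeControl: string; // assumption: compared as a plain string such as 'blitz'
}

// Same numbers as MATCHMAKING_CONFIG: ±100, +50 every 10s, capped at ±500.
function rangeAfter(waitSeconds: number): number {
  return Math.min(100 + Math.floor(waitSeconds / 10) * 50, 500);
}

// Closest-rated opponent inside the searching player's current range, or null.
function closestInRange(
  player: WaitingPlayer,
  pool: WaitingPlayer[],
  nowMs: number,
): WaitingPlayer | null {
  const range = rangeAfter((nowMs - player.joinedAtMs) / 1000);
  let best: WaitingPlayer | null = null;
  for (const candidate of pool) {
    if (candidate.userId === player.userId) continue;
    if (candidate.timeControl !== player.timeControl) continue;
    const gap = Math.abs(candidate.rating - player.rating);
    if (gap > range) continue;
    if (best === null || gap < Math.abs(best.rating - player.rating)) best = candidate;
  }
  return best;
}

// One scan: longest-waiting players pick first, and matched players leave the pool.
function scanQueue(
  queue: WaitingPlayer[],
  nowMs: number,
): Array<[WaitingPlayer, WaitingPlayer]> {
  const remaining = queue.slice().sort((a, b) => a.joinedAtMs - b.joinedAtMs);
  const pairs: Array<[WaitingPlayer, WaitingPlayer]> = [];
  while (remaining.length > 1) {
    const player = remaining.shift() as WaitingPlayer;
    const opponent = closestInRange(player, remaining, nowMs);
    if (opponent !== null) {
      remaining.splice(remaining.indexOf(opponent), 1);
      pairs.push([player, opponent]);
    }
  }
  return pairs;
}

const sampleQueue: WaitingPlayer[] = [
  { userId: 'a', rating: 1500, joinedAtMs: 0, timeControl: 'blitz' },
  { userId: 'b', rating: 1580, joinedAtMs: 5000, timeControl: 'blitz' },
  { userId: 'c', rating: 1520, joinedAtMs: 8000, timeControl: 'bullet' },
  { userId: 'd', rating: 1900, joinedAtMs: 9000, timeControl: 'blitz' },
];
// At t = 10s only a and b are paired; c has no bullet opponent and d is too far away.
const scannedPairs = scanQueue(sampleQueue, 10000);
```

As in `findMatch`, only the searching player's range is consulted. A player who joined a second ago can still be picked by someone who has waited long enough to reach them, which is a trade-off you may or may not want.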

Force match after max wait

After 2 minutes, stop being picky. Match with whoever is closest, regardless of range:

function shouldForceMatch(joinedAt: Date): boolean {
  const waitSeconds = (Date.now() - joinedAt.getTime()) / 1000;
  return waitSeconds >= MATCHMAKING_CONFIG.MAX_WAIT_TIME;
}

function findMatchForPlayer(
  player: QueuedPlayer,
  queue: QueuedPlayer[],
): QueuedPlayer | null {
  if (shouldForceMatch(player.joinedAt)) {
    // Force match: pick closest rating regardless of range
    const eligible = queue.filter(
      (p) =>
        p.userId !== player.userId &&
        p.timeControl === player.timeControl,
    );

    if (eligible.length === 0) return null;

    return eligible.reduce((best, current) =>
      Math.abs(current.rating - player.rating) <
      Math.abs(best.rating - player.rating)
        ? current
        : best,
    );
  }

  return findMatch(player, queue);
}
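The forced branch repeats the closest-rating logic from `findMatch`. One alternative, sketched here under my own naming (`WAIT_CONFIG`, `effectiveRange` and `closestRating` are not in the codebase), is to treat the forced match as an infinite range so a single code path handles both cases. Ratings are plain numbers here to keep the sketch short.

```typescript
const WAIT_CONFIG = {
  baseRange: 100,
  expansionRate: 50,
  expansionIntervalSec: 10,
  maxRange: 500,
  maxWaitSec: 120,
} as const;

// The normal expanding range, or Infinity once the wait limit is reached.
function effectiveRange(waitSeconds: number): number {
  if (waitSeconds >= WAIT_CONFIG.maxWaitSec) return Infinity;
  const expansions = Math.floor(waitSeconds / WAIT_CONFIG.expansionIntervalSec);
  return Math.min(
    WAIT_CONFIG.baseRange + expansions * WAIT_CONFIG.expansionRate,
    WAIT_CONFIG.maxRange,
  );
}

// Pick the rating closest to `playerRating` among candidates within `range`.
function closestRating(
  playerRating: number,
  candidates: number[],
  range: number,
): number | null {
  const eligible = candidates.filter((r) => Math.abs(r - playerRating) <= range);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, current) =>
    Math.abs(current - playerRating) < Math.abs(best - playerRating) ? current : best,
  );
}

const queuedRatings = [2100, 2300];
// At 119s the range is still ±500, so a 1400 player finds nobody (gap is 700).
const beforeLimit = closestRating(1400, queuedRatings, effectiveRange(119)); // null
// At 120s the range becomes unlimited and the closest player is chosen.
const afterLimit = closestRating(1400, queuedRatings, effectiveRange(120)); // 2100
```

Whether a 700-point forced match is better than a longer wait is exactly the kind of product decision these parameters encode.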

Rating Tiers

One more function worth covering - rating tiers. These are the human-readable labels shown in the UI next to a player's rating.

export function getRatingTier(rating: number): string {
  if (rating < 1000) return 'Beginner';
  if (rating < 1200) return 'Novice';
  if (rating < 1400) return 'Intermediate';
  if (rating < 1600) return 'Advanced';
  if (rating < 1800) return 'Expert';
  if (rating < 2000) return 'Candidate Master';
  if (rating < 2200) return 'Master';
  if (rating < 2400) return 'International Master';
  if (rating < 2600) return 'Grandmaster';
  return 'Super Grandmaster';
}

These labels borrow FIDE title names, but with thresholds set well below the real ones (FIDE's Candidate Master title, for example, requires a 2200 rating), since most players on an online platform sit far below titled-player strength. Labels matter more than you'd think for engagement - players respond to achieving "Expert" far more than they respond to a number going from 1599 to 1601.

Edge Cases That Will Bite You

Underrated new players. When a new player joins at the default 1200 rating, they might actually be a 1700-strength player who's played chess for years. They start in the K=40 bracket and will gain points rapidly, but until their rating catches up they'll be crushing 1200-rated players who are genuinely beginners. Consider a provisional period: don't rely on their rating alone for matchmaking until they've played 10+ games.

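Rating deflation over time. Elo only conserves points when both players share the same K-factor; with tiered K-factors, a game between players in different brackets adds or removes points from the pool. Players who join at the default rating, improve, and then go inactive also take points out of circulation. Over months, the average rating among active players can drift lower even though their skill hasn't changed. Rating floors are one common countermeasure used by chess federations. For a small platform, the simplest fix is periodic recalibration.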

Draw handling at extreme rating gaps. If a 2200-rated player draws with a 1000-rated player, the formula gives the 2200 player an expected score of about 0.999. A draw is a massive underperformance - with K=24 they'd lose about 12 points, as much as for losing to an equally rated opponent. This is technically correct by Elo's logic, but it discourages strong players from accepting games with much weaker opponents. A cap on the per-game change (never gain or lose more than X points) is one mitigation.
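To put numbers on it, here is a small sketch of that 2200 vs 1000 draw with and without a per-game cap. `cappedChange` is a hypothetical helper, and the cap of 8 points is an arbitrary value chosen for the example, not a recommendation.

```typescript
function expectedAt(playerRating: number, opponentRating: number): number {
  return 1 / (1 + Math.pow(10, (opponentRating - playerRating) / 400));
}

// Hypothetical cap: limit the size of any single-game change to `maxSwing` points.
// Passing Infinity disables the cap.
function cappedChange(
  kFactor: number,
  actualScore: number,
  expectedScore: number,
  maxSwing: number,
): number {
  const raw = Math.round(kFactor * (actualScore - expectedScore));
  return Math.max(-maxSwing, Math.min(maxSwing, raw));
}

// 2200 (K=24) draws with 1000 (K=40).
const strongUncapped = cappedChange(24, 0.5, expectedAt(2200, 1000), Infinity); // -12
const strongCapped = cappedChange(24, 0.5, expectedAt(2200, 1000), 8); // -8
const weakUncapped = cappedChange(40, 0.5, expectedAt(1000, 2200), Infinity); // 20
const weakCapped = cappedChange(40, 0.5, expectedAt(1000, 2200), 8); // 8
```

A cap softens the penalty for the stronger player, but it also slows down how quickly a genuinely underrated opponent is rewarded, so it trades one problem for a milder version of another.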

Concurrent queue updates. Two matchmaking scans running simultaneously might both try to pair the same player with different opponents. This is a race condition. The fix I used: Redis distributed locks, so that only one scan processes a given player at a time. I covered this in more detail on the PlayChess project page.
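The idea behind the lock can be sketched without Redis. The class below is an in-process stand-in that only illustrates the claim-then-pair pattern; it is not the PlayChess implementation and would not protect against scans running in separate processes. In Redis, the comparable primitive is `SET key value NX PX <ttl>`, which succeeds only if the key does not already exist. `PlayerClaims` and `tryClaimPair` are names made up for this example.

```typescript
// In-memory stand-in for a per-player lock. A real lock also needs an expiry (TTL)
// so a crashed scan cannot hold a player forever; that part is omitted here.
class PlayerClaims {
  private readonly claimedBy = new Map<string, string>();

  // Returns true only if no other scan currently holds this player.
  tryClaim(userId: string, scanId: string): boolean {
    if (this.claimedBy.has(userId)) return false;
    this.claimedBy.set(userId, scanId);
    return true;
  }

  // Only the scan that took the claim may release it.
  release(userId: string, scanId: string): void {
    if (this.claimedBy.get(userId) === scanId) this.claimedBy.delete(userId);
  }
}

// Claim both players or neither, so a half-finished pairing never blocks the queue.
function tryClaimPair(
  claims: PlayerClaims,
  scanId: string,
  firstUserId: string,
  secondUserId: string,
): boolean {
  if (!claims.tryClaim(firstUserId, scanId)) return false;
  if (!claims.tryClaim(secondUserId, scanId)) {
    claims.release(firstUserId, scanId);
    return false;
  }
  return true;
}

const claims = new PlayerClaims();
const firstScanWon = tryClaimPair(claims, 'scan-1', 'alice', 'bob'); // true
const secondScanWon = tryClaimPair(claims, 'scan-2', 'carol', 'bob'); // false, bob is taken
```

The second scan backs off and releases carol, so she stays available for the next pass instead of being stuck in a pairing that never happened.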

What I Learned

Elo is deceptively simple to implement and surprisingly deep to tune. The formula itself is five lines. Getting the K-factors, rating floors, and matchmaking expansion parameters right took far more iteration.

The biggest lesson: treat matchmaking parameters as product decisions, not engineering decisions. The right K-factor for beginners isn't a math problem - it's a question of what experience you want new players to have. Do you want them to reach their true level in 5 games or 20? Do you want their early games to feel competitive or do you want to protect them from strong players? Those are design choices that the K-factor encodes.

The Elo system is more than 60 years old and still underpins the rating systems of many competitive online games. It's a great example of an algorithm that does one thing well, is simple enough to reason about, and generalises cleanly to almost any 1v1 competitive context.

PlayChess is open source on GitHub. If you found this useful, I write about algorithms, systems design, and building real-world projects as a CS student - follow me on Twitter/X.