FIDE vs. lichess rating comparison

This is an analysis I did some time ago comparing FIDE ratings to lichess ratings. The biggest part of the work was finding a reliable way to get both ratings for the same set of players. Many lichess players post their FIDE ratings in their profiles, but unfortunately some post joke ratings or simply lie.

To avoid “liar’s bias” in the self-reported FIDE ratings, I used two different methods to independently look up FIDE ratings directly from FIDE. In the first method, I took advantage of the fact that many lichess users post their FIDE ID numbers in their lichess profiles, which allows a direct lookup of their ratings. In the second method, I used lichess profile data to create a “key” consisting of name, title, and country. When the key is unique and is present in both the lichess and FIDE data sets, I can look up the FIDE rating for that user. Finally, to make sure I have the correct player, I only keep the data point if the player’s self-reported FIDE rating matches the one I looked up.
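The key-based method can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the field names (`username`, `name`, `title`, `country`, `claimed_fide`, `rating`) and both function names are assumptions, and real data would need more careful name normalization (for example, differing name order between the two sources).

```python
from collections import Counter


def make_key(name, title, country):
    """Normalize (name, title, country) into a comparable key (hypothetical scheme)."""
    clean_name = " ".join(name.lower().split())  # lowercase, collapse whitespace
    return (clean_name, (title or "").upper(), (country or "").upper())


def match_by_key(lichess_profiles, fide_players):
    """Return {lichess username: FIDE rating} for unambiguous, confirmed matches."""
    def key_of(p):
        return make_key(p["name"], p.get("title"), p.get("country"))

    # Count keys on each side so non-unique keys can be discarded.
    lichess_counts = Counter(key_of(p) for p in lichess_profiles)
    fide_counts = Counter(key_of(p) for p in fide_players)
    fide_by_key = {key_of(p): p for p in fide_players if fide_counts[key_of(p)] == 1}

    matches = {}
    for profile in lichess_profiles:
        key = key_of(profile)
        if lichess_counts[key] != 1 or key not in fide_by_key:
            continue  # ambiguous on either side, or absent from the FIDE data
        looked_up = fide_by_key[key]["rating"]
        # Keep the data point only if the self-reported rating agrees.
        if profile.get("claimed_fide") == looked_up:
            matches[profile["username"]] = looked_up
    return matches
```

Dropping every key that appears more than once on either side throws away some real players, but it avoids pairing a lichess account with the wrong FIDE entry.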

Out of 628,200 lichess profiles, I was able to harvest 5,957 FIDE ratings (just under 1%). This is plenty of data for looking at the overall correlation between the two ratings.

It turns out there is a large spread in the ratings data. In other words, if you know someone’s rating in one system, your estimate of their rating in the other system will be very imprecise. To give people a better feel for how (un)reliable the estimates are, each graph here is a scatter plot of FIDE standard rating versus the rating for one lichess time control. The red line in each plot is a robust linear regression of the data, while the blue lines show quantile regressions for the 97.5th, 50th and 2.5th percentiles. In other words, 95% of players fall between the two outer (dotted) blue lines. For example, about 95% of players with a lichess blitz rating of 2500 will have a FIDE standard rating between about 2000 and 2500, so if you have a blitz rating of 2500, you could roughly estimate your FIDE rating as 2250 ± 250. Note that even though this is about the tightest correlation shown, the precision isn’t great. Apparently people just vary a lot. Maybe it has something to do with skill at playing online not being quite the same as skill at playing in person.
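For anyone who wants a rough version of these bands without fitting a quantile regression, here is a minimal sketch that swaps in a simpler technique: take all players whose lichess rating lies within a window around the rating of interest, and read off the 2.5th, 50th and 97.5th percentiles of their FIDE ratings. This is not the method used for the plots. The function names and the default window size are assumptions made for illustration.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no values to take a percentile of")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def fide_band(pairs, lichess_rating, window=50):
    """Empirical (2.5th, 50th, 97.5th) FIDE percentiles near a lichess rating.

    pairs: iterable of (lichess_rating, fide_rating) tuples.
    window: half-width of the lichess rating bin (an arbitrary choice).
    """
    nearby = sorted(
        fide for lichess, fide in pairs if abs(lichess - lichess_rating) <= window
    )
    return tuple(percentile(nearby, p) for p in (2.5, 50, 97.5))
```

Calling `fide_band(pairs, 2500)` on a list of (lichess blitz, FIDE) pairs gives the lower bound, median and upper bound for players rated 2450 to 2550 on lichess. A binned estimate like this gets noisy where few players fall in the window, which is one reason to prefer a quantile regression over the whole data set.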

One rather unexpected thing that stands out in the data is that the correlation is worst at the fastest and slowest time controls. Most lichess time controls average less than 25 minutes per game, compared to FIDE’s typical 90 minutes for the first 40 moves, so I expected the slowest lichess time controls to match FIDE best. I was surprised to see that blitz has the tightest correlation.

Here are the linear regression coefficients, since I’m sure someone will ask for them (x indicates the slope):

[1] "ultraBullet"
Coefficients:
(Intercept)            x  
   1265.331        0.473

[1] "bullet"
Coefficients:
(Intercept)            x  
   722.0768       0.6261

[1] "blitz"
Coefficients:
(Intercept)            x  
    41.4256       0.9074

[1] "rapid"
Coefficients:
(Intercept)            x  
   309.9946       0.7962

[1] "classical"
Coefficients:
(Intercept)            x  
  1161.4994       0.4038
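To turn these coefficients into a point estimate, the sketch below plugs a lichess rating into the fitted line. It assumes that x is the lichess rating and the fitted value is the FIDE standard rating, which matches how the plots are described but is not stated explicitly with the output. The dictionary and function names are made up for illustration.

```python
# (intercept, slope) pairs copied from the regression output above.
COEFFS = {
    "ultraBullet": (1265.331, 0.473),
    "bullet": (722.0768, 0.6261),
    "blitz": (41.4256, 0.9074),
    "rapid": (309.9946, 0.7962),
    "classical": (1161.4994, 0.4038),
}


def estimate_fide(lichess_rating, time_control):
    """Point estimate of FIDE standard rating from a lichess rating.

    Assumes the regression was fitted as FIDE = intercept + slope * lichess.
    """
    intercept, slope = COEFFS[time_control]
    return intercept + slope * lichess_rating
```

For example, `estimate_fide(2500, "blitz")` comes out at about 2310, which sits inside the 2000 to 2500 range quoted earlier for a 2500 blitz rating. Keep in mind this is only the center of a wide band, not a precise prediction.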

kubernetes@kbin.social

1 year ago

Very interesting, thank you for sharing.
I think the major reason for the difference in ratings is that playing online and playing over the board are very different. Players who are used to playing online might not be as strong over the board, and vice versa. But this is just my assumption, without any data behind it :)