league-of-legends-analysis


Project maintained by vanessly. Hosted on GitHub Pages. Theme by mattgraham.

Know Your Role: Predicting League of Legends Roles from Player Stats

Vinh Tran | LinkedIn | vinht@umich.edu

CC Ly | LinkedIn | vanessly@umich.edu

Introduction

Introduction and Question Identification

Dataset Overview

What is League of Legends?

Throughout a game, players earn gold, experience, and items by defeating enemy champions, minions, and monsters. Their performance is tracked through statistics such as kills, deaths, assists, damage dealt, and gold earned, many of which we use as features to predict a player’s role.

Central Question

How accurately can we predict a player’s in‑game role (Top, Jungle, Mid, Bottom, or Support) using only their post‑game performance statistics?

Key Columns

Below are the columns relevant to our question:

| Column | Description |
|---|---|
| gameid | Unique ID for each match (ties together all player and team rows) |
| position | The role a player filled in that game (Top, Jungle, Mid, Bottom, Support) |
| kills | Number of enemy champions the player eliminated |
| assists | Number of enemy champion kills the player helped secure |
| deaths | Number of times the player was eliminated by enemy champions |
| dpm | Damage per minute: average damage dealt to champions per minute |
| earned gpm | Gold per minute earned by the player throughout the match |
| cspm | Creep score per minute: average minions and monsters killed per minute |
| monsterkills | Total number of neutral monsters killed by the player |
| kda | Kill/death/assist ratio: (kills + assists) divided by deaths, used to evaluate combat performance |
| participation | Also known as "kill participation": the proportion of team kills a player was involved in (kills or assists) |
| xptogoldat10 | Ratio of experience points to gold earned at 10 minutes, used to estimate lane efficiency |
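The last three columns in this table are ratios built from raw counts. Below is a minimal pandas sketch of how they could be computed, using the Blue-side values from the sample rows shown under Final Cleaned Dataframe. Two choices are our assumptions, not the authors': participation is taken as (kills + assists) / teamkills, and deaths are clipped to at least 1 so that a deathless game does not divide by zero.

```python
import pandas as pd

# Blue-side player rows copied from the sample of the cleaned dataframe
players = pd.DataFrame({
    "position": ["top", "jng", "mid", "bot", "sup"],
    "kills": [2, 2, 2, 2, 1],
    "deaths": [3, 5, 2, 4, 5],
    "assists": [2, 6, 3, 2, 6],
    "teamkills": [9, 9, 9, 9, 9],
    "goldat10": [3228.0, 3429.0, 3283.0, 3600.0, 2678.0],
    "xpat10": [4909.0, 3484.0, 4556.0, 3103.0, 2161.0],
})

# KDA = (kills + assists) / deaths; clipping deaths at 1 is an assumption
# that avoids division by zero for deathless games.
players["kda"] = (players["kills"] + players["assists"]) / players["deaths"].clip(lower=1)

# Kill participation: share of the team's kills the player took part in
# (assumes teamkills > 0).
players["participation"] = (players["kills"] + players["assists"]) / players["teamkills"]

# Experience-to-gold ratio at the 10-minute mark
players["xptogoldat10"] = players["xpat10"] / players["goldat10"]

print(players[["position", "kda", "participation", "xptogoldat10"]].round(2))
```

Even in this single game the support (Loopy) stands out with the highest participation (7/9) and the lowest gold-based ratios, which is the kind of signal these features are meant to capture.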

Why It Matters

Data Cleaning and Exploratory Data Analysis

Data Cleaning

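1. Kept only rows with complete data

df = df[df['datacompleteness'] == 'complete']  # rows flagged as fully recorded

2. Dropped the two team-summary rows at the end of each game

df = df[df.groupby('gameid').cumcount(ascending=False) >= 2]  # keep all but the last two rows of each game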

3. Dropped irrelevant columns

cols_to_drop = ['url', 'split', 'pick1', ..., 'firstdragon']
df.drop(columns=cols_to_drop, inplace=True)

4. Dropped columns containing null values

columns_with_null = df.columns[df.isnull().any()].to_list()  # columns with at least one null
df.drop(columns=columns_with_null, inplace=True)

This dropped the following columns:

['playerid', 'teamname', 'teamid', 'ban1', 'ban2', 'ban3', 'ban4', 'ban5', 'barons', 'opp_barons', 'inhibitors', 'opp_inhibitors', 'goldat20', 'xpat20', 'csat20', 'opp_goldat20', 'opp_xpat20', 'opp_csat20', 'golddiffat20', 'xpdiffat20', 'csdiffat20', 'killsat20', 'assistsat20', 'deathsat20', 'opp_killsat20', 'opp_assistsat20', 'opp_deathsat20', 'goldat25', 'xpat25', 'csat25', 'opp_goldat25', 'opp_xpat25', 'opp_csat25', 'golddiffat25', 'xpdiffat25', 'csdiffat25', 'killsat25', 'assistsat25', 'deathsat25', 'opp_killsat25', 'opp_assistsat25', 'opp_deathsat25']
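The cleaning steps above can be exercised end to end on a tiny frame. This is a sketch under assumptions: the toy values are invented, each toy game has two player rows instead of ten, and each game's last two rows are team-level summaries (marked here with position 'team'). Step 2 is written with a `cumcount` mask, which keeps every row except the last two in each game and does not depend on how `groupby(...).apply` treats the grouping column in different pandas versions.

```python
import pandas as pd

# Invented rows: two games, each with two player rows followed by two team rows
df = pd.DataFrame({
    "gameid": ["g1"] * 4 + ["g2"] * 4,
    "datacompleteness": ["complete"] * 4 + ["partial"] * 4,
    "position": ["top", "jng", "team", "team"] * 2,
    "kills": [2, 2, 4, 5, 1, 3, 4, 2],
    "url": ["u"] * 8,                          # stands in for an irrelevant column
    "ban1": [None, None, "Ahri", "Zed"] * 2,   # only filled on team rows
})

# Step 1: keep rows with complete data
df = df[df["datacompleteness"] == "complete"]

# Step 2: drop the last two rows of every game (the team rows)
df = df[df.groupby("gameid").cumcount(ascending=False) >= 2]

# Step 3: drop columns not needed for the analysis
df = df.drop(columns=["url"])

# Step 4: drop every column that still contains a null value
null_cols = df.columns[df.isnull().any()].to_list()
df = df.drop(columns=null_cols)

print(null_cols)
print(df)
```

Because `ban1` is only filled on team rows, it becomes all-null once those rows are gone and step 4 removes it, which mirrors how columns such as `ban1` through `ban5` end up in the dropped list.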

Final Cleaned Dataframe

gameid datacompleteness league year playoffs date game patch participantid side position playername champion gamelength result kills deaths assists teamkills teamdeaths doublekills triplekills quadrakills pentakills firstblood firstbloodkill firstbloodassist firstbloodvictim team kpm ckpm damagetochampions dpm damageshare damagetakenperminute damagemitigatedperminute wardsplaced wpm wardskilled wcpm controlwardsbought visionscore vspm totalgold earnedgold earned gpm earnedgoldshare goldspent total cs minionkills monsterkills cspm goldat10 xpat10 csat10 opp_goldat10 opp_xpat10 opp_csat10 golddiffat10 xpdiffat10 csdiffat10 killsat10 assistsat10 deathsat10 opp_killsat10 opp_assistsat10 opp_deathsat10 goldat15 xpat15 csat15 opp_goldat15 opp_xpat15 opp_csat15 golddiffat15 xpdiffat15 csdiffat15 killsat15 assistsat15 deathsat15 opp_killsat15 opp_assistsat15 opp_deathsat15
0 ESPORTSTMNT01_2690210 complete LCKC 2022 0 2022-01-10 07:44:08 1 12.01 1 Blue top Soboro Renekton 1713 0 2 3 2 9 19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.32 0.98 15768.0 552.29 0.28 1072.40 777.79 8.0 0.28 6.0 0.21 5.0 26.0 0.91 10934 7164.0 250.93 0.25 10275.0 231.0 220.0 11.0 8.09 3228.0 4909.0 89.0 3176.0 4953.0 81.0 52.0 -44.0 8.0 0.0 0.0 0.0 0.0 0.0 0.0 5025.0 7560.0 135.0 4634.0 7215.0 121.0 391.0 345.0 14.0 0.0 1.0 0.0 0.0 1.0 0.0
1 ESPORTSTMNT01_2690210 complete LCKC 2022 0 2022-01-10 07:44:08 1 12.01 2 Blue jng Raptor Xin Zhao 1713 0 2 5 6 9 19 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.32 0.98 11765.0 412.08 0.21 944.27 650.16 6.0 0.21 18.0 0.63 6.0 48.0 1.68 9138 5368.0 188.02 0.19 8750.0 148.0 33.0 115.0 5.18 3429.0 3484.0 58.0 2944.0 3052.0 63.0 485.0 432.0 -5.0 1.0 2.0 0.0 0.0 0.0 1.0 5366.0 5320.0 89.0 4825.0 5595.0 100.0 541.0 -275.0 -11.0 2.0 3.0 2.0 0.0 5.0 1.0
2 ESPORTSTMNT01_2690210 complete LCKC 2022 0 2022-01-10 07:44:08 1 12.01 3 Blue mid Feisty LeBlanc 1713 0 2 2 3 9 19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.32 0.98 14258.0 499.40 0.25 581.65 227.78 19.0 0.67 7.0 0.25 7.0 29.0 1.02 9715 5945.0 208.23 0.21 8725.0 193.0 177.0 16.0 6.76 3283.0 4556.0 81.0 3121.0 4485.0 81.0 162.0 71.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 5118.0 6942.0 120.0 5593.0 6789.0 119.0 -475.0 153.0 1.0 0.0 3.0 0.0 3.0 3.0 2.0
3 ESPORTSTMNT01_2690210 complete LCKC 2022 0 2022-01-10 07:44:08 1 12.01 4 Blue bot Gamin Samira 1713 0 2 4 2 9 19 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.32 0.98 11106.0 389.00 0.20 463.85 218.88 12.0 0.42 6.0 0.21 4.0 25.0 0.88 10605 6835.0 239.40 0.24 10425.0 226.0 208.0 18.0 7.92 3600.0 3103.0 78.0 3304.0 2838.0 90.0 296.0 265.0 -12.0 1.0 1.0 0.0 0.0 0.0 0.0 5461.0 4591.0 115.0 6254.0 5934.0 149.0 -793.0 -1343.0 -34.0 2.0 1.0 2.0 3.0 3.0 0.0
4 ESPORTSTMNT01_2690210 complete LCKC 2022 0 2022-01-10 07:44:08 1 12.01 5 Blue sup Loopy Leona 1713 0 1 5 6 9 19 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.32 0.98 3663.0 128.30 0.06 475.03 490.12 29.0 1.02 14.0 0.49 11.0 69.0 2.42 6678 2908.0 101.86 0.10 6395.0 42.0 42.0 0.0 1.47 2678.0 2161.0 16.0 2150.0 2748.0 15.0 528.0 -587.0 1.0 1.0 1.0 0.0 0.0 0.0 1.0 3836.0 3588.0 28.0 3393.0 4085.0 21.0 443.0 -497.0 7.0 1.0 2.0 2.0 0.0 6.0 2.0

Univariate Analysis

Bivariate Analysis

Interesting Aggregates

| gameid | bot | jng | mid | sup | top |
|---|---|---|---|---|---|
| ESPORTSTMNT01_2690210 | 5.0 | 3.0 | 4.0 | 0.5 | 1.5 |
| ESPORTSTMNT01_2690219 | 1.5 | 3.5 | 3.5 | 0.0 | 1.0 |
| ESPORTSTMNT01_2690227 | 1.5 | 1.0 | 4.0 | 1.0 | 2.0 |
| ESPORTSTMNT01_2690255 | 4.5 | 2.5 | 3.0 | 2.0 | 2.5 |
| ESPORTSTMNT01_2690264 | 3.0 | 1.0 | 2.0 | 2.0 | 2.5 |
| ESPORTSTMNT01_2690302 | 5.0 | 3.5 | 7.0 | 0.5 | 5.5 |
| ESPORTSTMNT01_2690328 | 6.0 | 3.5 | 7.5 | 0.5 | 3.5 |
| ESPORTSTMNT01_2690351 | 1.5 | 0.5 | 4.0 | 0.5 | 2.5 |
| ESPORTSTMNT01_2690370 | 4.5 | 2.0 | 0.5 | 0.0 | 0.5 |
| ESPORTSTMNT01_2690390 | 2.5 | 4.5 | 4.5 | 2.0 | 0.5 |
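A game-by-position table like this one can be built with `DataFrame.pivot_table`. The aggregated statistic is not named here; the values are consistent with mean kills per position in each game (two players per position, hence the half-values): for ESPORTSTMNT01_2690210 the row sums to 14, and twice that, 28, equals Blue's 9 team kills plus Blue's 19 deaths in the sample rows. That reading is our inference, so the sketch below uses `kills` with `aggfunc="mean"` on invented rows.

```python
import pandas as pd

# Invented player rows: two games, one player per side in two positions
players = pd.DataFrame({
    "gameid": ["g1", "g1", "g1", "g1", "g2", "g2", "g2", "g2"],
    "position": ["bot", "sup", "bot", "sup", "bot", "sup", "bot", "sup"],
    "kills": [2, 1, 8, 0, 3, 0, 0, 1],
})

# One row per game, one column per position, averaging the two players
# (one per team) who played that position.
kills_by_role = players.pivot_table(
    index="gameid", columns="position", values="kills", aggfunc="mean"
)
print(kills_by_role)
```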

Imputation

We did not impute missing values. Instead, we dropped every column that contained nulls, as described in step 4 of Data Cleaning.

Framing a Prediction Problem

Problem Identification

Response Variable

Evaluation Metric

Information Available at Time of Prediction

Baseline Model

Model Description and Evaluation

Why We Chose Logistic Regression

We chose logistic regression for our baseline model because it is particularly useful when we want to:

Features Used

We included the following seven features, all of which are quantitative:

We did not use any ordinal or nominal features in our model, so no feature encoding (e.g., one-hot or label encoding) was necessary.

The target variable (position) is nominal (categorical with no inherent order), consisting of five distinct classes: top, mid, bot, jng, and sup.
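A minimal scikit-learn sketch of a baseline of this kind follows. Several details are our assumptions rather than the authors' setup: the seven features are taken to be the seven raw stats in the Key Columns table, features are standardized, and the data is a synthetic stand-in in which only jng and sup differ from the other roles, roughly mimicking the pattern the baseline found. It also shows why the nominal target needs no manual encoding: `LogisticRegression` accepts string class labels directly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed feature list: the seven raw stats from the Key Columns table
FEATURES = ["kills", "deaths", "assists", "dpm", "earned gpm", "cspm", "monsterkills"]

# Synthetic stand-in for the cleaned dataframe (random values, not real games)
rng = np.random.default_rng(0)
roles = np.array(["top", "jng", "mid", "bot", "sup"])
n = 500
df = pd.DataFrame(rng.normal(size=(n, len(FEATURES))), columns=FEATURES)
df["position"] = roles[np.arange(n) % len(roles)]

# Only junglers (more monster kills) and supports (less CS) get a distinct profile
df.loc[df["position"] == "jng", "monsterkills"] += 5
df.loc[df["position"] == "sup", "cspm"] -= 5

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["position"], test_size=0.2, random_state=0, stratify=df["position"]
)

# String role labels are passed straight to the classifier; no encoding step
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

On data like this the model separates jng and sup almost perfectly and guesses among the remaining three roles, the same qualitative split the baseline's classification report shows.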

Model Performance

Test Accuracy: 0.6760885885885886

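| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| bot | 0.48 | 0.45 | 0.47 | 5307 |
| jng | 1.00 | 1.00 | 1.00 | 5407 |
| mid | 0.47 | 0.50 | 0.48 | 5312 |
| sup | 0.96 | 0.97 | 0.97 | 5262 |
| top | 0.47 | 0.46 | 0.47 | 5352 |
| accuracy | | | 0.68 | 26640 |
| macro avg | 0.67 | 0.68 | 0.68 | 26640 |
| weighted avg | 0.68 | 0.68 | 0.68 | 26640 |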

Confusion Matrix

These results show that the model is nearly perfect at predicting Jungle and Support (F1-scores of 1.00 and 0.97) but struggles to classify Bottom, Top, and Mid correctly (F1-scores of 0.47 to 0.48). This makes intuitive sense: these three roles have overlapping post-game stat profiles (e.g., similar kills, assists, and CS patterns), which makes them hard to distinguish using only basic numerical features.
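Per-class precision and recall come straight from a confusion matrix: recall divides each diagonal entry by its row (true-class) total, precision divides it by its column (predicted-class) total. The numpy sketch below uses invented counts, chosen so that jng and sup separate cleanly while bot, mid, and top bleed into one another, mirroring the pattern described above.

```python
import numpy as np

labels = ["bot", "jng", "mid", "sup", "top"]

# Invented confusion matrix: rows = true role, columns = predicted role
cm = np.array([
    [45,   0, 25,  5, 25],   # bot
    [ 0, 100,  0,  0,  0],   # jng
    [25,   0, 50,  0, 25],   # mid
    [ 2,   0,  0, 97,  1],   # sup
    [27,   0, 27,  0, 46],   # top
])

recall = np.diag(cm) / cm.sum(axis=1)      # correct / all true members of the class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predictions of the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.3f}")
```

The off-diagonal mass sits entirely in the bot/mid/top block, which is what drags those three F1-scores down while jng stays at 1.00.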

Is the Model “Good”?

We believe this baseline model is a good starting point but not sufficient for high-accuracy role classification. The model captures general trends, such as supports having high assists and low kills, but struggles to differentiate between top, mid, and bot, which holds overall test accuracy to about 68%.

To improve on this baseline, we plan to:

Nonetheless, this baseline confirms that post-game performance statistics can offer meaningful insights into role prediction.

Final Model

Feature Engineering

We created three new features based on the strengths and weaknesses of our baseline model and on our knowledge of how different roles contribute to team success in League of Legends.

Model Selection and Hyperparameters

| Model | Test Accuracy |
|---|---|
| Decision Tree | 0.6995495495495495 |
| Random Forest | 0.7177552552552553 |
| Naive Bayes | 0.678978978978979 |
| Logistic Regression | 0.7015765765765766 |
| Neural Network | 0.7228228228228228 |
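A comparison like this can be run as one loop over scikit-learn estimators. This is a sketch under assumptions: the hyperparameters are scikit-learn defaults (plus a larger iteration cap), "Neural Network" is read as `MLPClassifier`, and the five-class data comes from `make_classification` rather than the engineered League of Legends features, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic five-class data standing in for the engineered feature matrix
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, n_classes=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale-sensitive models get a StandardScaler in front; tree models do not need one
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)                      # train on the same split
    scores[name] = model.score(X_test, y_test)       # test-set accuracy
    print(f"Model: {name}\nTest Accuracy: {scores[name]:.4f}")

best = max(scores, key=scores.get)
print("Best:", best)
```

Fitting every candidate on the same split keeps the comparison fair; tuning the chosen model's hyperparameters would be a separate step after this selection.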

Performance Comparison

Baseline Model

Test Accuracy: 0.6760885885885886

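| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| bot | 0.48 | 0.45 | 0.47 | 5307 |
| jng | 1.00 | 1.00 | 1.00 | 5407 |
| mid | 0.47 | 0.50 | 0.48 | 5312 |
| sup | 0.96 | 0.97 | 0.97 | 5262 |
| top | 0.47 | 0.46 | 0.47 | 5352 |
| accuracy | | | 0.68 | 26640 |
| macro avg | 0.67 | 0.68 | 0.68 | 26640 |
| weighted avg | 0.68 | 0.68 | 0.68 | 26640 |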

Final Model

Test Accuracy: 0.9207957957957958

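| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| bot | 0.96 | 0.97 | 0.97 | 4253 |
| jng | 1.00 | 1.00 | 1.00 | 4298 |
| mid | 0.83 | 0.82 | 0.82 | 4290 |
| sup | 0.98 | 0.99 | 0.99 | 4213 |
| top | 0.83 | 0.83 | 0.83 | 4258 |
| accuracy | | | 0.92 | 21312 |
| macro avg | 0.92 | 0.92 | 0.92 | 21312 |
| weighted avg | 0.92 | 0.92 | 0.92 | 21312 |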
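A quick calculation over the reported per-class F1-scores shows where the final model's improvement comes from. The numbers are copied from the baseline and final classification reports; note that the two models were scored on test sets of different sizes (26,640 and 21,312 rows), so this compares reported figures rather than a paired evaluation.

```python
# Per-class F1-scores copied from the baseline and final classification reports
baseline_f1 = {"bot": 0.47, "jng": 1.00, "mid": 0.48, "sup": 0.97, "top": 0.47}
final_f1 = {"bot": 0.97, "jng": 1.00, "mid": 0.82, "sup": 0.99, "top": 0.83}

# Change in F1 per role, rounded to two decimals to match the reports
gains = {role: round(final_f1[role] - baseline_f1[role], 2) for role in baseline_f1}
for role, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{role}: {baseline_f1[role]:.2f} -> {final_f1[role]:.2f} ({gain:+.2f})")

# Overall test-accuracy gain, from the two reported accuracies
accuracy_gain = round(0.9207957957957958 - 0.6760885885885886, 4)
print(f"accuracy: +{accuracy_gain}")
```

The largest gain is for bot (+0.50), with top and mid each gaining roughly a third of a point of F1, while jng and sup were already near their ceiling.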

Confusion Matrix

Conclusion