Throughout the game, players earn gold, experience, and items by defeating enemy champions, minions, and monsters. Their performance is tracked through stats like kills, deaths, assists, damage dealt, and gold earned, many of which are used in our analysis to predict a player’s role.
How accurately can we predict a player’s in‑game role (Top, Jungle, Mid, Bottom, or Support) using only their post‑game performance statistics?
Below are the columns relevant to our question:
Column | Description |
---|---|
gameid | Unique ID for each match (ties together all player and team rows) |
position | The role a player filled in that game (Top, Jungle, Mid, Bottom, Support) |
kills | Number of enemy champions the player eliminated |
assists | Number of enemy champion kills the player helped secure |
deaths | Number of times the player was eliminated by enemy champions |
dpm | Damage per minute: average damage dealt to champions per minute |
earned gpm | Gold per minute earned by the player throughout the match |
cspm | Creep score per minute: average minions and monsters killed per minute |
monsterkills | Total number of neutral monsters killed by the player |
kda | Kills/Deaths/Assists ratio: (Kills + Assists) divided by Deaths, used to evaluate combat performance |
participation | Also known as "kill participation". Proportion of team kills a player was involved in (kills or assists) |
xptogoldat10 | Ratio of experience points to gold earned at 10 minutes, used to estimate lane efficiency |
```python
df = df[df['datacompleteness'] == 'complete']
```

Some rows have `datacompleteness` marked as `incomplete`, which may result from matches where data logging failed or games were not played to completion. We filtered the DataFrame to keep only rows where `datacompleteness` was marked as `complete`, ensuring all included rows contain full, reliable statistics.

```python
df = df.groupby('gameid', group_keys=False).apply(lambda x: x.iloc[:-2])
```

For each `gameid`, the dataset contains 12 rows: 10 for individual players and 2 for team-level summary statistics. Since our prediction task focuses on individual player performance, we removed the last two rows of each match group, which correspond to team summaries. We verified that this operation worked correctly by checking that only 10 players remained in a sample game:

```python
print(df.loc[df['gameid'] == 'ESPORTSTMNT01_2690210', 'playername'])
```
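One way to check the team-row removal across every game, rather than in a single sample, is sketched below. It runs on a toy frame: the `G1`/`G2` IDs and the `100`/`200` values used for team rows are made up for illustration. It also swaps in a `cumcount`-based mask as an alternative to `groupby(...).apply(...)`, and it assumes, as described above, that the two team rows come last within each game.

```python
import pandas as pd

# Toy stand-in for the real data: 2 games x (10 player rows + 2 team rows).
rows = []
for gameid in ["G1", "G2"]:
    for pid in range(1, 11):
        rows.append({"gameid": gameid, "participantid": pid})
    for pid in (100, 200):  # hypothetical IDs marking the team summary rows
        rows.append({"gameid": gameid, "participantid": pid})
toy = pd.DataFrame(rows)

# Position of each row counted from the end of its game: the last row is 0,
# the second-to-last is 1, and so on.
from_end = toy.groupby("gameid").cumcount(ascending=False)

# Keep everything except the last two rows of each game.
players = toy[from_end >= 2]

# Every game should now contain exactly 10 player rows.
rows_per_game = players.groupby("gameid").size()
print(rows_per_game)
```

If any game reports a count other than 10, the assumption that team rows come last does not hold for that game and the filter should be revisited.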
```python
# Drop a hand-picked list of columns
cols_to_drop = ['url', 'split', 'pick1', ..., 'firstdragon']
df.drop(columns=cols_to_drop, inplace=True)

# Drop every remaining column that contains at least one null value
columns_with_null = df.isnull().sum()[df.isnull().sum() > 0].index.to_list()
df.drop(columns=columns_with_null, inplace=True)
```

```
['playerid', 'teamname', 'teamid', 'ban1', 'ban2', 'ban3', 'ban4', 'ban5', 'barons', 'opp_barons', 'inhibitors', 'opp_inhibitors', 'goldat20', 'xpat20', 'csat20', 'opp_goldat20', 'opp_xpat20', 'opp_csat20', 'golddiffat20', 'xpdiffat20', 'csdiffat20', 'killsat20', 'assistsat20', 'deathsat20', 'opp_killsat20', 'opp_assistsat20', 'opp_deathsat20', 'goldat25', 'xpat25', 'csat25', 'opp_goldat25', 'opp_xpat25', 'opp_csat25', 'golddiffat25', 'xpdiffat25', 'csdiffat25', 'killsat25', 'assistsat25', 'deathsat25', 'opp_killsat25', 'opp_assistsat25', 'opp_deathsat25']
```
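The null-column drop can be tried on a small example. The sketch below uses a made-up three-row frame (the values are illustrative, not taken from the dataset) and a slightly shorter idiom, `isnull().any()`, which should select the same columns as the `isnull().sum() > 0` form.

```python
import numpy as np
import pandas as pd

# Illustrative frame: one fully populated column and two with missing values.
toy = pd.DataFrame({
    "kills": [2, 5, 1],
    "goldat20": [8000.0, np.nan, 7500.0],  # e.g. missing when a game ends early
    "ban1": ["Zed", None, "Jinx"],
})

# Columns that contain at least one null value, in their original order.
columns_with_null = toy.columns[toy.isnull().any()].to_list()

# Return a new frame without those columns instead of mutating in place.
cleaned = toy.drop(columns=columns_with_null)

print(columns_with_null)  # ['goldat20', 'ban1']
print(cleaned.columns.to_list())  # ['kills']
```

Dropping whole columns is simple but discards any column with even a single missing value, so it is worth reviewing the resulting list before relying on it.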
| | gameid | datacompleteness | league | year | playoffs | date | game | patch | participantid | side | position | playername | champion | gamelength | result | kills | deaths | assists | teamkills | teamdeaths | doublekills | triplekills | quadrakills | pentakills | firstblood | firstbloodkill | firstbloodassist | firstbloodvictim | team kpm | ckpm | damagetochampions | dpm | damageshare | damagetakenperminute | damagemitigatedperminute | wardsplaced | wpm | wardskilled | wcpm | controlwardsbought | visionscore | vspm | totalgold | earnedgold | earned gpm | earnedgoldshare | goldspent | total cs | minionkills | monsterkills | cspm | goldat10 | xpat10 | csat10 | opp_goldat10 | opp_xpat10 | opp_csat10 | golddiffat10 | xpdiffat10 | csdiffat10 | killsat10 | assistsat10 | deathsat10 | opp_killsat10 | opp_assistsat10 | opp_deathsat10 | goldat15 | xpat15 | csat15 | opp_goldat15 | opp_xpat15 | opp_csat15 | golddiffat15 | xpdiffat15 | csdiffat15 | killsat15 | assistsat15 | deathsat15 | opp_killsat15 | opp_assistsat15 | opp_deathsat15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ESPORTSTMNT01_2690210 | complete | LCKC | 2022 | 0 | 2022-01-10 07:44:08 | 1 | 12.01 | 1 | Blue | top | Soboro | Renekton | 1713 | 0 | 2 | 3 | 2 | 9 | 19 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.32 | 0.98 | 15768.0 | 552.29 | 0.28 | 1072.40 | 777.79 | 8.0 | 0.28 | 6.0 | 0.21 | 5.0 | 26.0 | 0.91 | 10934 | 7164.0 | 250.93 | 0.25 | 10275.0 | 231.0 | 220.0 | 11.0 | 8.09 | 3228.0 | 4909.0 | 89.0 | 3176.0 | 4953.0 | 81.0 | 52.0 | -44.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5025.0 | 7560.0 | 135.0 | 4634.0 | 7215.0 | 121.0 | 391.0 | 345.0 | 14.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | ESPORTSTMNT01_2690210 | complete | LCKC | 2022 | 0 | 2022-01-10 07:44:08 | 1 | 12.01 | 2 | Blue | jng | Raptor | Xin Zhao | 1713 | 0 | 2 | 5 | 6 | 9 | 19 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.32 | 0.98 | 11765.0 | 412.08 | 0.21 | 944.27 | 650.16 | 6.0 | 0.21 | 18.0 | 0.63 | 6.0 | 48.0 | 1.68 | 9138 | 5368.0 | 188.02 | 0.19 | 8750.0 | 148.0 | 33.0 | 115.0 | 5.18 | 3429.0 | 3484.0 | 58.0 | 2944.0 | 3052.0 | 63.0 | 485.0 | 432.0 | -5.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 5366.0 | 5320.0 | 89.0 | 4825.0 | 5595.0 | 100.0 | 541.0 | -275.0 | -11.0 | 2.0 | 3.0 | 2.0 | 0.0 | 5.0 | 1.0 |
2 | ESPORTSTMNT01_2690210 | complete | LCKC | 2022 | 0 | 2022-01-10 07:44:08 | 1 | 12.01 | 3 | Blue | mid | Feisty | LeBlanc | 1713 | 0 | 2 | 2 | 3 | 9 | 19 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.32 | 0.98 | 14258.0 | 499.40 | 0.25 | 581.65 | 227.78 | 19.0 | 0.67 | 7.0 | 0.25 | 7.0 | 29.0 | 1.02 | 9715 | 5945.0 | 208.23 | 0.21 | 8725.0 | 193.0 | 177.0 | 16.0 | 6.76 | 3283.0 | 4556.0 | 81.0 | 3121.0 | 4485.0 | 81.0 | 162.0 | 71.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 5118.0 | 6942.0 | 120.0 | 5593.0 | 6789.0 | 119.0 | -475.0 | 153.0 | 1.0 | 0.0 | 3.0 | 0.0 | 3.0 | 3.0 | 2.0 |
3 | ESPORTSTMNT01_2690210 | complete | LCKC | 2022 | 0 | 2022-01-10 07:44:08 | 1 | 12.01 | 4 | Blue | bot | Gamin | Samira | 1713 | 0 | 2 | 4 | 2 | 9 | 19 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.32 | 0.98 | 11106.0 | 389.00 | 0.20 | 463.85 | 218.88 | 12.0 | 0.42 | 6.0 | 0.21 | 4.0 | 25.0 | 0.88 | 10605 | 6835.0 | 239.40 | 0.24 | 10425.0 | 226.0 | 208.0 | 18.0 | 7.92 | 3600.0 | 3103.0 | 78.0 | 3304.0 | 2838.0 | 90.0 | 296.0 | 265.0 | -12.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5461.0 | 4591.0 | 115.0 | 6254.0 | 5934.0 | 149.0 | -793.0 | -1343.0 | -34.0 | 2.0 | 1.0 | 2.0 | 3.0 | 3.0 | 0.0 |
4 | ESPORTSTMNT01_2690210 | complete | LCKC | 2022 | 0 | 2022-01-10 07:44:08 | 1 | 12.01 | 5 | Blue | sup | Loopy | Leona | 1713 | 0 | 1 | 5 | 6 | 9 | 19 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.32 | 0.98 | 3663.0 | 128.30 | 0.06 | 475.03 | 490.12 | 29.0 | 1.02 | 14.0 | 0.49 | 11.0 | 69.0 | 2.42 | 6678 | 2908.0 | 101.86 | 0.10 | 6395.0 | 42.0 | 42.0 | 0.0 | 1.47 | 2678.0 | 2161.0 | 16.0 | 2150.0 | 2748.0 | 15.0 | 528.0 | -587.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3836.0 | 3588.0 | 28.0 | 3393.0 | 4085.0 | 21.0 | 443.0 | -497.0 | 7.0 | 1.0 | 2.0 | 2.0 | 0.0 | 6.0 | 2.0 |
gameid | bot | jng | mid | sup | top |
---|---|---|---|---|---|
ESPORTSTMNT01_2690210 | 5.0 | 3.0 | 4.0 | 0.5 | 1.5 |
ESPORTSTMNT01_2690219 | 1.5 | 3.5 | 3.5 | 0.0 | 1.0 |
ESPORTSTMNT01_2690227 | 1.5 | 1.0 | 4.0 | 1.0 | 2.0 |
ESPORTSTMNT01_2690255 | 4.5 | 2.5 | 3.0 | 2.0 | 2.5 |
ESPORTSTMNT01_2690264 | 3.0 | 1.0 | 2.0 | 2.0 | 2.5 |
ESPORTSTMNT01_2690302 | 5.0 | 3.5 | 7.0 | 0.5 | 5.5 |
ESPORTSTMNT01_2690328 | 6.0 | 3.5 | 7.5 | 0.5 | 3.5 |
ESPORTSTMNT01_2690351 | 1.5 | 0.5 | 4.0 | 0.5 | 2.5 |
ESPORTSTMNT01_2690370 | 4.5 | 2.0 | 0.5 | 0.0 | 0.5 |
ESPORTSTMNT01_2690390 | 2.5 | 4.5 | 4.5 | 2.0 | 0.5 |
Each row of this pivot table corresponds to one `gameid`, and each column represents the average number of kills per player in one of the five standard League of Legends roles: bot, jng (jungle), mid, sup (support), and top. Because each game has two players per role (one per team), values such as 1.5 are the mean of those two players' kills.
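The fractional values in the pivot table suggest it was built with a mean aggregation, which is the default for `pandas.DataFrame.pivot_table`. The sketch below reproduces that shape on made-up data; the game IDs and kill counts are illustrative, and the exact call used for the table above is an assumption.

```python
import pandas as pd

# Illustrative data: two games, two roles, two players per role (one per team).
toy = pd.DataFrame({
    "gameid": ["G1"] * 4 + ["G2"] * 4,
    "position": ["top", "top", "sup", "sup"] * 2,
    "kills": [2, 1, 0, 1, 4, 0, 0, 0],
})

# One row per game, one column per role, mean kills of the players in that role.
kills_by_role = toy.pivot_table(
    index="gameid", columns="position", values="kills", aggfunc="mean"
)
print(kills_by_role)
```

Passing `aggfunc="sum"` instead would give the total kills per role in each game.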
We predict the `position` column, which identifies the role each player fulfilled during a match: `top`, `jng`, `mid`, `bot`, or `sup`. We chose this variable because our goal is to infer a player’s role solely from their post-game performance statistics, such as kills, assists, gold earned per minute, and damage dealt per minute, rather than using manually labeled or externally sourced data.

We chose logistic regression for our baseline model because it is particularly useful when we want to:
We included the following seven features, all of which are quantitative:
We did not use any ordinal or nominal features in our model, so no encoding (e.g. one-hot encoding or label encoding) was necessary for the features.
The target variable (`position`) is nominal (categorical with no inherent order), consisting of five distinct classes: top, mid, bot, jng, and sup.
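A baseline of this kind can be sketched as follows. Several things here are assumptions rather than details from the write-up: the feature list (taken from the quantitative columns in the column table), the `StandardScaler` step, the 80/20 stratified split, and the synthetic data, which only stands in for the cleaned DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed feature list: seven quantitative columns from the column table.
FEATURES = ["kills", "deaths", "assists", "dpm", "earned gpm", "cspm", "monsterkills"]
ROLES = ["top", "jng", "mid", "bot", "sup"]

# Synthetic stand-in for the cleaned data (random values, not real statistics).
rng = np.random.default_rng(0)
n = 1000
toy = pd.DataFrame(rng.normal(size=(n, len(FEATURES))), columns=FEATURES)
toy["position"] = rng.choice(ROLES, size=n)
# Give junglers a distinctive monster-kill signal so there is something to learn.
toy.loc[toy["position"] == "jng", "monsterkills"] += 5

X_train, X_test, y_train, y_test = train_test_split(
    toy[FEATURES], toy["position"],
    test_size=0.2, random_state=42, stratify=toy["position"],
)

# No encoding is needed for the features; scaling is an added assumption.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.3f}")
```

scikit-learn handles the string labels in `position` directly, so the nominal target needs no manual encoding either.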
Test Accuracy: 0.6760885885885886
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
bot | 0.48 | 0.45 | 0.47 | 5307 |
jng | 1.00 | 1.00 | 1.00 | 5407 |
mid | 0.47 | 0.50 | 0.48 | 5312 |
sup | 0.96 | 0.97 | 0.97 | 5262 |
top | 0.47 | 0.46 | 0.47 | 5352 |
accuracy | | | 0.68 | 26640 |
macro avg | 0.67 | 0.68 | 0.68 | 26640 |
weighted avg | 0.68 | 0.68 | 0.68 | 26640 |
These performance statistics show that the model predicts Jungle and Support almost perfectly (F1-scores of 1.00 and 0.97), but it struggles to classify Bottom, Mid, and Top correctly (F1-scores of 0.47 to 0.48). This makes intuitive sense: these three roles have overlapping post-game stat profiles (e.g., similar kills, assists, and CS patterns), which makes them harder to distinguish using only basic numerical features.
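To make the per-class numbers concrete, the sketch below computes precision and recall for a tiny hand-made set of labels in which Mid and Top are confused with each other. The labels are invented for illustration and are not drawn from the model's predictions.

```python
from sklearn.metrics import precision_recall_fscore_support

labels = ["bot", "jng", "mid", "sup", "top"]

# Hand-made example: one mid predicted as top, and one top predicted as mid.
y_true = ["bot", "bot", "jng", "jng", "mid", "mid", "mid", "sup", "sup", "top", "top", "top"]
y_pred = ["bot", "bot", "jng", "jng", "mid", "top", "mid", "sup", "sup", "top", "mid", "top"]

# Precision: share of predictions for a role that were right.
# Recall: share of true members of a role that were found.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

for role, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{role}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

In this toy case Mid and Top each score 2/3 on both metrics while the other roles score 1.0, which mirrors, on a much smaller scale, the pattern in the report above.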
We believe this baseline model is a good starting point but not sufficient for high-accuracy role classification. The model captures general trends, such as supports having high assists and low kills, but struggles to differentiate between top, mid, and bot, which leads to lower accuracy than we wanted.
To improve on this baseline, we plan to:
Nonetheless, this baseline confirms that post-game performance statistics can offer meaningful insights into role prediction.
We created three new features based on the successes and pitfalls of our baseline model, and our prior knowledge of League of Legends and how different roles contribute to team success.
- `kda`: Our baseline model used `kills`, `deaths`, and `assists` as separate features. We decided to incorporate `kda` instead of these individual components, since treating them as independent may lead to redundancy and potential overfitting. We wanted to keep our model as streamlined as possible and reduce noise.
- `xptogoldat10`: Computed as `xpat10 / goldat10`, this feature measures lane efficiency by comparing experience to gold earned at the 10-minute mark. We came up with it because Mid laners typically earn more XP per unit of gold due to faster leveling in solo lanes.
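The engineered features can be sketched from the column descriptions given earlier. The formulas for `kda` and `participation` follow those descriptions, while the handling of zero denominators (treating zero deaths as one, and zero team kills as zero participation) is an assumption made here for illustration; the rows are made-up values.

```python
import numpy as np
import pandas as pd

# Illustrative rows; column names match the dataset, values are made up.
toy = pd.DataFrame({
    "kills": [2, 5, 0],
    "deaths": [3, 0, 4],
    "assists": [2, 6, 0],
    "teamkills": [9, 12, 0],
    "xpat10": [4909.0, 3484.0, 2161.0],
    "goldat10": [3228.0, 3429.0, 2678.0],
})

takedowns = toy["kills"] + toy["assists"]

# KDA = (kills + assists) / deaths; zero deaths treated as one (assumed convention).
toy["kda"] = takedowns / toy["deaths"].replace(0, 1)

# Kill participation = (kills + assists) / team kills; 0.0 when the team has no kills.
toy["participation"] = (takedowns / toy["teamkills"].replace(0, np.nan)).fillna(0.0)

# Experience earned per unit of gold at the 10-minute mark.
toy["xptogoldat10"] = toy["xpat10"] / toy["goldat10"]

print(toy[["kda", "participation", "xptogoldat10"]])
```

Guarding the denominators keeps infinities and NaNs out of the feature matrix, which most scikit-learn estimators would otherwise reject.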
Model | Test Accuracy |
---|---|
Decision Tree | 0.6995495495495495 |
Random Forest | 0.7177552552552553 |
Naive Bayes | 0.678978978978979 |
Logistic Regression | 0.7015765765765766 |
Neural Network | 0.7228228228228228 |
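A comparison like this can be run with one loop over candidate estimators. The sketch below is not the original experiment: it uses synthetic data from `make_classification`, and the specific classes (`GaussianNB` for Naive Bayes, `MLPClassifier` for the neural network) and their settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic five-class problem standing in for the role-prediction data.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Neural Network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=42)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"Model: {name}\nTest Accuracy: {results[name]:.4f}")
```

Using the same train/test split for every model keeps the accuracies comparable; differences of less than a percentage point may still be within the noise of a single split.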
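We ultimately chose a Random Forest classifier for our final model. Its test accuracy (about 0.718) was within roughly half a percentage point of the best-scoring model, the neural network (about 0.723), and higher than that of the other three models. Our research also indicated that random forests perform strongly on classification tasks.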
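We tuned the following hyperparameters, with the candidate values shown in brackets:

- `max_depth` (`[10, 15, 20, None]`)
- `n_estimators` (`[100, 200]`)
- `min_samples_split` (`[2, 5]`)
- `min_samples_leaf` (`[1, 2]`)
- `max_features` (`['sqrt', 'log2']`)

The best-performing combination was:

- `n_estimators=200`
- `max_depth=None`
- `min_samples_split=5`
- `min_samples_leaf=1`
- `max_features='sqrt'`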
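A search over these values can be sketched with scikit-learn. The write-up does not name the search procedure, so this is an assumption: to keep the example quick, it samples three combinations with `RandomizedSearchCV` on synthetic data, using 2-fold cross-validation. Swapping in `GridSearchCV` with `param_grid=param_grid` would try all 64 combinations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Candidate values as listed above.
param_grid = {
    "max_depth": [10, 15, 20, None],
    "n_estimators": [100, 200],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
}

# Small synthetic five-class problem standing in for the real training data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, random_state=0,
)

# Sample a few combinations and score each with cross-validated accuracy.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_grid,
    n_iter=3,
    cv=2,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a forest refit on all of `X` with the winning parameters, ready to be scored on a held-out test set.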
Baseline Model Test Accuracy: 0.6760885885885886
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
bot | 0.48 | 0.45 | 0.47 | 5307 |
jng | 1.00 | 1.00 | 1.00 | 5407 |
mid | 0.47 | 0.50 | 0.48 | 5312 |
sup | 0.96 | 0.97 | 0.97 | 5262 |
top | 0.47 | 0.46 | 0.47 | 5352 |
accuracy | | | 0.68 | 26640 |
macro avg | 0.67 | 0.68 | 0.68 | 26640 |
weighted avg | 0.68 | 0.68 | 0.68 | 26640 |
Final Model Test Accuracy: 0.9207957957957958
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
bot | 0.96 | 0.97 | 0.97 | 4253 |
jng | 1.00 | 1.00 | 1.00 | 4298 |
mid | 0.83 | 0.82 | 0.82 | 4290 |
sup | 0.98 | 0.99 | 0.99 | 4213 |
top | 0.83 | 0.83 | 0.83 | 4258 |
accuracy | | | 0.92 | 21312 |
macro avg | 0.92 | 0.92 | 0.92 | 21312 |
weighted avg | 0.92 | 0.92 | 0.92 | 21312 |
Overall test accuracy improved by roughly `0.25`, jumping from ~0.67 in our baseline model to ~0.92 in our final model. Additionally, there are significant improvements in precision, recall, and F1-score for Bottom, Mid, and Top, which were previously extremely hard to differentiate, while Jungle and Support remain near-perfect.

The confusion matrix from the Final Model demonstrates far more accurate predictions across all roles, especially in the previously confused categories of Bot, Mid, and Top. The stronger diagonal pattern indicates that misclassifications are now rare and mostly occur between conceptually similar roles, specifically Top and Mid.
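For readers who want to reproduce this kind of confusion matrix, a minimal sketch follows. The labels are hand-made to show a Top/Mid mix-up and are not the model's actual predictions.

```python
from sklearn.metrics import confusion_matrix

labels = ["bot", "jng", "mid", "sup", "top"]

# Hand-made example: one mid predicted as top, and one top predicted as mid.
y_true = ["bot", "bot", "jng", "jng", "mid", "mid", "mid", "sup", "sup", "top", "top", "top"]
y_pred = ["bot", "bot", "jng", "jng", "mid", "top", "mid", "sup", "sup", "top", "mid", "top"]

# Rows are true roles, columns are predicted roles, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = cm.trace() / cm.sum()
print(f"Accuracy: {accuracy:.3f}")
```

Off-diagonal cells show which pairs of roles are being confused; here the only non-zero ones are mid predicted as top and top predicted as mid.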
Overall, by choosing our features deliberately and tuning hyperparameters thoroughly, we improved our model’s performance substantially over the baseline. The Random Forest model generalizes well and captures the nuances of player behavior across different roles, validating our original hypothesis that post-game stats can predict a player’s in-game role.
Our final model correctly predicts a player’s role about `92%` of the time.