LCS Data Visualization
Introduction
As a data scientist, I need to be able to tell a story and report my findings to stakeholders effectively. One of the best ways to do this is through data visualization, and Tableau is one of the most popular tools for it. None of my courses taught Tableau, so I started this personal project, which uses tools I am already familiar with while giving me the opportunity to learn Tableau.
The goal of the visualization is to present statistics for the LCS and to check whether older champions become less popular as new champions are released.
The Data
The data I found had a .csv file for every event in every split (e.g. Playoffs Spring 2020, Summer Split 2020), covering 2013 to 2021. All in all, there were 18 files I had to reconcile.
This is what Spring 2020 looks like:
Cleaning the Data
In order to reconcile the data, I combined the files that fall under one season and created a data frame for each one.
E.g. Spring Lock-in 2020, Spring Split 2020, Spring Playoffs 2020 -> Spring 2020
To merge the data, I decided to outer join on the columns ‘Champion’ and ‘Pos’, which worked perfectly for my needs. I then filtered out all the unnecessary columns, keeping ‘GP’ (Games Played) because that is the data I need to measure the popularity of a champion.
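The merge step could look roughly like the sketch below. It assumes pandas, and it assumes that the games played from each event are added together to give the season total, which the write-up does not state explicitly. The three small frames and the helper name combine_events are hypothetical stand-ins for the real event files.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature versions of three event files for one season.
lock_in = pd.DataFrame(
    {"Champion": ["Aatrox", "Lillia"], "Pos": ["Top", "Jungle"], "GP": [4, 2]}
)
regular_split = pd.DataFrame(
    {"Champion": ["Aatrox", "Nidalee"], "Pos": ["Top", "Jungle"], "GP": [30, 12]}
)
playoffs = pd.DataFrame({"Champion": ["Aatrox"], "Pos": ["Top"], "GP": [12]})


def combine_events(frames):
    """Outer-join event frames on Champion/Pos and total the games played."""
    # Give each event its own GP column so the join does not collide on names.
    renamed = [
        frame[["Champion", "Pos", "GP"]].rename(columns={"GP": f"GP_{i}"})
        for i, frame in enumerate(frames)
    ]
    merged = reduce(
        lambda left, right: left.merge(right, on=["Champion", "Pos"], how="outer"),
        renamed,
    )
    # A champion missing from an event shows up as NaN after the outer join;
    # treat that as zero games before summing (an assumption of this sketch).
    gp_columns = [column for column in merged.columns if column.startswith("GP_")]
    merged["GP"] = merged[gp_columns].fillna(0).sum(axis=1).astype(int)
    return merged[["Champion", "Pos", "GP"]]


spring_2020 = combine_events([lock_in, regular_split, playoffs])
```

The outer join keeps champions that appear in only one event, which an inner join would silently drop.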
The image is what Spring 2020 looks like after cleaning.
Find Popularity of Champs over time
To see whether old Champions lose popularity as new Champions are released, we need to track the popularity of champions over time.
I created a function that loops through each split and builds a dictionary whose keys are the split names and whose values are lists of pairs, with the Champion at index 0 and the games played at index 1.
(E.g. {‘spring_2020’: [[‘Aatrox’, 46], etc.]})
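A minimal sketch of such a function is shown below, assuming pandas and assuming that a champion's games are summed across positions within a split. The sample frames and the name games_played_by_split are hypothetical, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical cleaned frames, one per split, with made-up numbers.
cleaned_splits = {
    "spring_2020": pd.DataFrame(
        {
            "Champion": ["Aatrox", "Aatrox", "Sett"],
            "Pos": ["Top", "Middle", "Top"],
            "GP": [40, 6, 12],
        }
    ),
    "summer_2020": pd.DataFrame(
        {"Champion": ["Lillia"], "Pos": ["Jungle"], "GP": [9]}
    ),
}


def games_played_by_split(splits):
    """Map each split name to a list of [champion, games played] pairs."""
    result = {}
    for split_name, frame in splits.items():
        # Sum over positions so each champion appears once per split.
        totals = frame.groupby("Champion")["GP"].sum()
        result[split_name] = [
            [champion, int(games)] for champion, games in totals.items()
        ]
    return result


popularity = games_played_by_split(cleaned_splits)
```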
Table 1. The cleaned data frame showing games played per Champion in each split
Creating Visualization Part 1: Most Popular Champions in every Split
With this data, we move over to Tableau and start on the visualization that displays which champions have been popular in each split.
Seeing the top 5 most popular champions in each split gives a good picture of where the meta is, and helps us check whether new champions really dominate the meta during the year they are released.
Webscrape to get years Champions were made
I needed the release date of every champion. Unfortunately, there was no dataset for that, so I scraped: https://leagueoflegends.fandom.com/wiki/List_of_champions
Table 2. Data frame of Champions created in each year
Get Most Popular Champions by Role
By looking at the most popular champions by role, we can see which champions stood the test of time and what makes a strong skillset.
I created a function that takes in a role and returns a dictionary with Champions as the keys and the games played in that specific role as the values.
(E.g. input ‘jungle’ as role -> {‘Lillia’: 59.0, ‘Nidalee’: 197.0, etc.})
Now we just have to sort the dictionaries in descending order and turn them into data frames for easier reading.
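The role lookup and the sort can be sketched as follows, assuming pandas and a ‘Pos’ column that holds the role name. The function names, the sample frames, and their numbers are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned frames, one per split, with made-up numbers.
cleaned_splits = {
    "summer_2020": pd.DataFrame(
        {
            "Champion": ["Nidalee", "Lillia", "Aatrox"],
            "Pos": ["Jungle", "Jungle", "Top"],
            "GP": [10, 3, 8],
        }
    ),
    "spring_2021": pd.DataFrame(
        {
            "Champion": ["Nidalee", "Lillia", "Renekton"],
            "Pos": ["Jungle", "Jungle", "Top"],
            "GP": [5, 9, 7],
        }
    ),
}


def games_played_in_role(splits, role):
    """Total games played per champion in one role, across all splits."""
    totals = {}
    for frame in splits.values():
        in_role = frame[frame["Pos"].str.lower() == role.lower()]
        for champion, games in zip(in_role["Champion"], in_role["GP"]):
            totals[champion] = totals.get(champion, 0.0) + float(games)
    return totals


def top_champions(totals, n=5):
    """Sort a champion -> games dictionary descending and keep the top n."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return pd.DataFrame(ranked[:n], columns=["Champion", "GP"])


jungle = games_played_in_role(cleaned_splits, "jungle")
top_jungle = top_champions(jungle)
```

Sorting the (champion, games) pairs before building the data frame keeps the ranking logic independent of pandas.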
Resulting Data Frames of the top 5 most popular champions in each role:
Get Win/Loss Ratio
Popularity is one indicator of how good a champion is, but I want something that actually shows the performance of the champion. If a champion is overloaded, it should show in the Champion’s win rate. For now, I just want to look at the win rate of the most popular champions to see how good they actually are.
I created a function that takes in the Champion name and the role, reads the cleaned data frame, and outputs the fraction of games the Champion has won in that role.
(E.g. get_WL(‘Renekton’, role) returns 0.51)
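A win-rate function along these lines might look like the sketch below. The ‘W’ and ‘L’ column names are assumptions, since the write-up does not name the win and loss columns, and the data frame is passed in as a third argument here rather than read from inside the function. The sample numbers are made up.

```python
import pandas as pd

# Hypothetical cleaned data; "W" and "L" are assumed column names.
cleaned = pd.DataFrame(
    {
        "Champion": ["Renekton", "Renekton", "Aatrox"],
        "Pos": ["Top", "Top", "Top"],
        "W": [30, 21, 10],
        "L": [28, 21, 15],
    }
)


def get_WL(champion, role, frame):
    """Return wins / (wins + losses) for a champion in a role, or None."""
    rows = frame[
        (frame["Champion"] == champion)
        & (frame["Pos"].str.lower() == role.lower())
    ]
    wins = rows["W"].sum()
    games = wins + rows["L"].sum()
    if games == 0:
        # No recorded games in this role, so a win rate is undefined.
        return None
    return round(float(wins) / float(games), 2)


renekton_top = get_WL("Renekton", "top", cleaned)
```

Returning None for a champion with no games in the role avoids a division by zero and makes the missing case explicit.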
Creating Data Visualization Part 2: Popular Champions and Stats
Now we move over to Tableau again and create a visualization with the interesting stats we just collected.
Using the data frames of the most played champions per role, I am able to create a table that displays popularity and performance statistics.
Find correlation between old champions’ popularity and new champion releases
We finally have the cleaned data we need to look for a correlation between the popularity of old champions and the release of new champions.
First, I have to map each champion to a year and create a new dictionary for that information. We start with the Champions in Table 1 and map each one to the year it was made by matching it to the data in Table 2. The resulting dictionary is nested: the first key is the name of the split, the next key is the year the champions were created, and the value is a list of the games played by each champion from that year.
The output looks like this: {‘summer_2021’: {‘2013’: [1, 6, 2, 12, 9], etc.}}
Now I just have to sum up the values in each list to get the number of games played by champions from each year in every split.
Output: {‘summer_2021’:{‘2013’:30, etc.}}
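The two steps above can be sketched in plain Python as follows. The function names are hypothetical, the games played are made-up numbers, and champions missing from the release-year lookup are simply skipped, which is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical inputs: games played per split and a release-year lookup.
games_by_split = {
    "summer_2021": [["Aatrox", 1], ["Lissandra", 6], ["Seraphine", 4]],
}
release_year = {"Aatrox": "2013", "Lissandra": "2013", "Seraphine": "2020"}


def group_games_by_release_year(games_by_split, release_year):
    """Nest games played as split -> release year -> list of games."""
    grouped = {}
    for split_name, rows in games_by_split.items():
        by_year = defaultdict(list)
        for champion, games in rows:
            year = release_year.get(champion)
            if year is None:
                continue  # champion not found in the release-year table
            by_year[year].append(games)
        grouped[split_name] = dict(by_year)
    return grouped


def total_games_by_release_year(grouped):
    """Collapse each list of games into a single total per release year."""
    return {
        split_name: {year: sum(games) for year, games in by_year.items()}
        for split_name, by_year in grouped.items()
    }


grouped = group_games_by_release_year(games_by_split, release_year)
totals = total_games_by_release_year(grouped)
```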
One issue with using the sum is that every year has a different number of champions made. During Riot’s most prolific year, 2009, they made 41 champions, while in 2018 they only made 3 new champions. So to make this fair, we divide the games played by the number of champions made in that year to get an average.
Table 3. Average games played per champ created in each year, over the splits
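As a rough illustration of the normalization, the sketch below divides each total by the number of champions made that year. The champion counts for 2009 and 2018 come from the text above; the game totals and the function name are hypothetical.

```python
# Hypothetical game totals; the champion counts (41 and 3) are from the text.
totals = {"summer_2021": {"2009": 82, "2018": 9}}
champions_made = {"2009": 41, "2018": 3}


def average_games_per_champion(totals, champions_made):
    """Divide each release year's total games by champions made that year."""
    averages = {}
    for split_name, by_year in totals.items():
        averages[split_name] = {
            year: games / champions_made[year]
            for year, games in by_year.items()
            if champions_made.get(year)  # skip unknown or zero counts
        }
    return averages


averages = average_games_per_champion(totals, champions_made)
```

With these made-up totals, 2009 champions have far more games in total (82 against 9) but fewer games per champion (2.0 against 3.0), which is exactly the distortion the average is meant to remove.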
Creating Data Visualization Part 3: Correlation
Now that we finally have the appropriate data, graphs give us a better picture. There is a correlation if the average games played increases for champions created in more recent years.
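The project reads the trend off the graphs. As an optional numeric cross-check, which is not part of the original analysis, a Pearson correlation between release year and average games played could be computed for a single split. The sketch below uses plain Python, and the averages in it are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical averages for one split: release year -> average games played.
split_averages = {"2009": 2.0, "2013": 2.5, "2018": 3.0}
years = [int(year) for year in split_averages]
values = list(split_averages.values())
r = pearson(years, values)
```

A value of r near 1 would mean newer champions are played more on average, a value near -1 would mean older champions are played more, and a value near 0 would suggest no clear linear trend.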
Conclusion:
There doesn’t seem to be a clear correlation showing old champions getting pushed out of the meta. In fact, the most popular champions are still older champions from around the 2011 era. We do see small spikes from newer champions from the 2017-2020 period, but nothing nearly as consistent as the older champions.