Exploratory Data Analysis of Pokemon Types

Introduction

Here, we investigate some of the data as it pertains to Pokemon type.

Load Libraries

Start off by loading the libraries we need.

# Load libraries
library(dplyr)
library(tidyr)
library(ggplot2)

Read and Format Data

Read in the data we scraped in the previous steps. We also need to make sure to set some column types in order to group and summarize the data. The Generation and Evolution.Place columns are categorical, and the Has.Evolution column is boolean, so we need to specify those column types.

# Read the data
pokedex <- read.csv('~/Projects/pokedex/data/pokedex_ext.csv',sep=',')

# Change categorical columns
pokedex$Generation <- as.factor(pokedex$Generation)
pokedex$Evolution.Place <- as.factor(pokedex$Evolution.Place)
pokedex$Has.Evolution <- as.logical(pokedex$Has.Evolution)

# Show Pokedex data
head(pokedex)
Number Name URLs Type.1 Type.2 Species Height Weight HP Attack Defense Sp..Atk Sp..Def Speed Total Type.3 Has.Evolution Evolution.Place Maximum.Evolution.Count Evolution.Index Generation Legendary.Status
1 Bulbasaur https://pokemondb.net/pokedex/bulbasaur Grass Poison Seed Pokémon 0.7 6.9 45 49 49 65 65 45 318   TRUE 1 3 0.33 1  
2 Ivysaur https://pokemondb.net/pokedex/ivysaur Grass Poison Seed Pokémon 1 13 60 62 63 80 80 60 405   TRUE 2 3 0.67 1  
3 Venusaur https://pokemondb.net/pokedex/venusaur Grass Poison Seed Pokémon 2 100 80 82 83 100 100 80 525   TRUE 3 3 1 1  
4 Charmander https://pokemondb.net/pokedex/charmander Fire   Lizard Pokémon 0.6 8.5 39 52 43 60 50 65 309   TRUE 1 3 0.33 1  
5 Charmeleon https://pokemondb.net/pokedex/charmeleon Fire   Flame Pokémon 1.1 19 58 64 58 80 65 80 405   TRUE 2 3 0.67 1  
6 Charizard https://pokemondb.net/pokedex/charizard Fire Flying Flame Pokémon 1.7 90.5 78 84 78 109 85 100 534   TRUE 3 3 1 1  

Pokemon Type Frequency

I’m curious how many Pokemon of each type there are. I suspect most Pokemon will be water or grass type Pokemon.

Of note here, some Pokemon have multiple, up to three, different types. Those are stored in Type.1, Type.2, and Type.3. To count the Pokemon of each type, we’ll need to either only count up the Pokemon of Type.1, or we can pivot to a longer table and Pokemon with multiple types will have multiple entries. I’ll pivot to a longer table, and then count the Pokemon of each type.

# Get a count of of each type
type_count <- pokedex %>%
  pivot_longer(c("Type.1","Type.2","Type.3"),names_to="Type.Number",values_to="Type") %>%
  drop_na("Type") %>%
  group_by(Type) %>%
  count() %>%
  arrange(desc(n))

# Show a bar plot of type counts
ggplot(type_count, aes(x = reorder(Type, -n),y=n, ymin=0)) +
  geom_bar(stat="identity",position="dodge",width=0.7,fill="#81a2be") +
  ggtitle(label="Count of Pokemon for Each Type") +
  xlab("Type") +
  ylab("Count") +
  scale_alpha(guide = 'none') +
  theme_minimal() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))

# Save image
ggsave("images/type-counts.png",bg="white")

type-counts.png

As expected, water and grass are among the most frequent classes. I figured “Normal” Pokemon would make an appearance up top, but I didn’t quite expect it to be the second most common type. I definitely didn’t expect psychic Pokemon to be so common either.

Mean Base Stats by Type

Now, I’d like to view the average stats of each Pokemon to see which types of Pokemon are typically the strongest. I’d expect that dragon type and steel type are among the top, from my experience with the game. Again, I’ll pivot the table so that Pokemon with multiple types have those multiple types represented.

# Calculate mean stats by Pokemon type
type_stats <- pokedex %>%
  pivot_longer(c("Type.1","Type.2","Type.3"),names_to="Type.Number",values_to="Type") %>%
  drop_na("Type") %>%
  group_by(Type) %>%
  summarize_at(vars(HP:Speed),mean) %>%
  pivot_longer(-1,names_to="Stat",values_to="Value")

# Specify Stat as a factor and set order
type_stats$Stat <- factor(type_stats$Stat,
                          levels=c("HP",
                                   "Attack",
                                   "Defense",
                                   "Sp..Atk",
                                   "Sp..Def",
                                   "Speed"))

# Plot Pokemon Stats as a stacked bar
ggplot(data = type_stats, aes(x=Type, y=Value, fill=Stat, ymin=0)) +
  geom_bar(stat="identity",position="stack",width=0.7) +
  labs(title = "Average Base Stats by Type",
       y = "Stat Value",
       x = "Type") +
  scale_fill_brewer(palette="Dark2") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# Save image
ggsave("images/type-stats.png",bg="white")

type-stats.png

Dragon and steel Pokemon did indeed come out among the top, but dark, fighting, and ice Pokemon are also quite strong in terms of their average total base stats. Bug, normal, and poison Pokemon are among the weakest, which is not terribly surprising.

Rather than plotting this as a stacked bar chart, I’d like to see the different stats all separated out. I could make a clustered bar chart, but with as many types as we have here, that would be a very wide, messy chart. Let’s instead make subplots using facet_wrap.

# Plot each Pokemon Stat on an individual bar chart
ggplot(data = type_stats, aes(x=Type, y=Value, ymin=0)) +
  geom_bar(stat="identity",position="dodge",width=0.7,fill="#81a2be") +
  labs(title = "Average Base Stats by Type",
       y = "Stat Value", x = "Type") +
  facet_wrap(~Stat,ncol=2) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_brewer(palette="Dark2")

# Save image
ggsave("images/type-stats-2.png",bg="white")

type-stats-2.png

Now we can better see the distributions of stats for different types.

  • Dragon Pokemon have the highest HP.
  • Fighting type Pokemon have the highest attack.
  • Rock and steel Pokemon have the highest defense.
  • Psychic, dragon, fire, and electric Pokemon have the highest special attack.
  • Psychic, fairy, steel, and ghost Pokemon have the highest special defense.
  • Flying, electric, dragon, and dark Pokemon have the highest speed.

Base Stats Distribution

Rather than looking at only the means, let’s also look at how the stats are distributed for each Pokemon, stating with the total base stats.

pokedex %>%
  pivot_longer(c("Type.1","Type.2","Type.3"),names_to="Type.Number",values_to="Type") %>%
  drop_na("Type") %>%
  ggplot(aes(x=Type,y=Total)) +
    geom_violin(width=0.8,fill="#81a2be") +
    geom_boxplot(width=0.1, color="black", alpha=0.5) +
    scale_fill_brewer(palette="Dark2") +
    ggtitle("Distribution of Total Stats by Type") +
    xlab("Type") +
    ylab("Total Stats") +
    theme_minimal() +
    scale_x_discrete(guide = guide_axis(n.dodge = 2))

# Save image
ggsave("images/type-total-stats-violin.png",bg="white")

type-total-stats-violin.png

Almost every type has an hourglass distrubution, likely because many Pokemon are relatively weak or strong depending on where they are in their evolution chain. The box and violin plot indicates again that dark, ragon, fighting, ice, psychic, rock, and steel Pokemon are not only among the strongest Pokemon on average but that there are many Pokemon with total base stats towards the stronger end of the distribution.

Finally, let’s look at each different stat separately, rather than just the total stats.

# Calculate mean stats by Pokemon type
pokedex %>%
  pivot_longer(c("Type.1","Type.2","Type.3"),names_to="Type.Number",values_to="Type") %>%
  drop_na("Type") %>%
  pivot_longer(c(HP:Speed),names_to="Stat",values_to="Value") %>%
  ggplot(aes(x=Type, y=Value, ymin=0)) +
    geom_violin(width=0.8,fill="#81a2be") +
    geom_boxplot(width=0.1, color="black", alpha=0.5) +
    labs(title = "Distribution of Base Stats by Type",
         y = "Stat Value", x = "Type") +
    facet_wrap(~Stat,ncol=2) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    scale_fill_brewer(palette="Dark2")

# Save image
ggsave("images/type-stats-violin.png",bg="white")

type-stats-violin.png

Nothing here is incredibly new from what we showed with the mean stats before. There’s a lot going on here, but it really just accentuates the information from the mean stats. I do prefer this view of the stats though, as it shows just how much more attack fighting Pokemon have, defense rock and steel Pokemon have, and how much Special Attack psychic Pokemon have, and how often that is the case.