We have a pretty fleshed out Pokedex dataset now, but I’d like to add some extra information, such as Pokemon generation and legendary status. This could probably have been done at the scraping stage, but I have already scraped the data and don’t want to hit the server again just to add some known data.
¶Load Packages
First, let’s load what we need to run the code. We’ll scrape some additional information to get the list of legendary Pokemon, so we’ll again rely on rvest
.
library(rvest) library(knitr) library(dplyr)
¶Read Scraped Pokedex Data
Next, read the Pokedex data from the previous step.
# Read the pokemon data pokedex_data <- read.csv('./data/pokedex.csv',sep=',') glimpse(pokedex_data)
## Rows: 1,010 ## Columns: 20 ## $ Number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,… ## $ Name <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmande… ## $ URLs <chr> "https://pokemondb.net/pokedex/bulbasaur", "ht… ## $ Type.1 <chr> "Grass", "Grass", "Grass", "Fire", "Fire", "Fi… ## $ Type.2 <chr> "Poison", "Poison", "Poison", NA, NA, "Flying"… ## $ Species <chr> "Seed Pokémon", "Seed Pokémon", "Seed Pokémon"… ## $ Height <dbl> 0.7, 1.0, 2.0, 0.6, 1.1, 1.7, 0.5, 1.0, 1.6, 0… ## $ Weight <dbl> 6.9, 13.0, 100.0, 8.5, 19.0, 90.5, 9.0, 22.5, … ## $ HP <int> 45, 60, 80, 39, 58, 78, 44, 59, 79, 45, 50, 60… ## $ Attack <int> 49, 62, 82, 52, 64, 84, 48, 63, 83, 30, 20, 45… ## $ Defense <int> 49, 63, 83, 43, 58, 78, 65, 80, 100, 35, 55, 5… ## $ Sp..Atk <int> 65, 80, 100, 60, 80, 109, 50, 65, 85, 20, 25, … ## $ Sp..Def <int> 65, 80, 100, 50, 65, 85, 64, 80, 105, 20, 25, … ## $ Speed <int> 45, 60, 80, 65, 80, 100, 43, 58, 78, 45, 30, 7… ## $ Total <int> 318, 405, 525, 309, 405, 534, 314, 405, 530, 1… ## $ Type.3 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… ## $ Has.Evolution <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… ## $ Evolution.Place <int> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1… ## $ Maximum.Evolution.Count <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3… ## $ Evolution.Index <dbl> 0.33, 0.67, 1.00, 0.33, 0.67, 1.00, 0.33, 0.67…
¶Add Generation Data as a Column
Here, we provide the number of Pokemon in each generation, according to PokemonDB, and then use that information to generate a column assigning each Pokemon to their proper generation.
# Specify how many pokemon were in each generation generation_count <- list(151,100,135,107,156,72,88,96,105) # Build the list of generation IDs for each pokemon generation_id = list() for (i in 1:length(generation_count)) { generation_id <- c(generation_id,rep(i,generation_count[[i]])) } # Add the list as a column in the data pokedex_data$Generation <- as.numeric(generation_id) # Show a table of the data kable(pokedex_data[148:153,c(1,2,21)])
Number | Name | Generation | |
---|---|---|---|
148 | 148 | Dragonair | 1 |
149 | 149 | Dragonite | 1 |
150 | 150 | Mewtwo | 1 |
151 | 151 | Mew | 1 |
152 | 152 | Chikorita | 2 |
153 | 153 | Bayleef | 2 |
¶Add Legendary Status as a Column
To add legendary status, we’ll use the information on Serebii’s Legendary Pokemon List. The table their has a thumbnail for each Pokemon and their legendary status. Conveniently, the thumbnail file name is the number of the Pokemon, so we can use that as a key to identify which Pokemon have which legendary status rather than relying on the names in our scraped Pokedex table to match up identically with what’s shown in this Serebii table.
Note: Serebii provides three unique legendary status categories.
- Sub-legendary - Pokemon like Articuno, Zapdos, and Moltres
- Legendary - Pokemon like Mewtwo
- Mythical - Pokemon like Mew
url <- "https://www.serebii.net/pokemon/legendary.shtml" # Read the body from the page body <- url %>% read_html() %>% html_nodes("body") # Get the info cards for each Pokemon tables <- html_nodes(body, "table.trainer") # Fetch the Pokemon numbers from the source in the table for sub-legendary sub_legendary_id <- tables[1] %>% html_nodes("img") %>% html_attr("src") sub_legendary_id <- gsub("\\D", "", sub_legendary_id[grepl("*[0-9].png", sub_legendary_id)]) sub_legendary_id <- as.numeric(sub_legendary_id) # Fetch the Pokemon numbers from the source in the table for legendary legendary_id <- tables[2] %>% html_nodes("img") %>% html_attr("src") legendary_id <- gsub("\\D", "", legendary_id[grepl("*[0-9].png",legendary_id)]) legendary_id <- as.numeric(legendary_id) # Fetch the Pokemon numbers from the source in the table for mythical mythical_id <- tables[3] %>% html_nodes("img") %>% html_attr("src") mythical_id <- gsub("\\D", "", mythical_id[grepl("*[0-9].png", mythical_id)]) mythical_id <- as.numeric(mythical_id) # Trim off any numbers that exceed our current pokedex. This is happening # because Serebii has additional pokemon from a recent expansion pack that # Pokemon DB did not yet have. sub_legendary_id <- sub_legendary_id[sub_legendary_id < length(pokedex_data$Number)] legendary_id <- legendary_id[legendary_id < length(pokedex_data$Number)] mythical_id <- mythical_id[mythical_id < length(pokedex_data$Number)] # Build the column for legendary status legendary_status <- rep(NA,length(pokedex_data$Number)) legendary_status[sub_legendary_id] <- "Sub-Legendary" legendary_status[legendary_id] <- "Legendary" legendary_status[mythical_id] <- "Mythical" # Add legendary status as a column for the pokedex data pokedex_data$Legendary.Status <- legendary_status kable(pokedex_data[145:151,c(1,2,22)])
Number | Name | Legendary.Status | |
---|---|---|---|
145 | 145 | Zapdos | Sub-Legendary |
146 | 146 | Moltres | Sub-Legendary |
147 | 147 | Dratini | NA |
148 | 148 | Dragonair | NA |
149 | 149 | Dragonite | NA |
150 | 150 | Mewtwo | Legendary |
151 | 151 | Mew | Mythical |
¶Write Data to Output File
And finally, now that we are done, let’s write the resulting table to an output file.
# Write data to CSV write.table(pokedex_data, "~/Projects/pokedex/data/pokedex_ext.csv", sep = ",", row.names = FALSE)