Scrape Initial Pokedex

Before we scrape all of the Pokemon data, let’s start by building a table of just the Pokemon and their numbers, again using the PokemonDB site.

Load Libraries

Again, we’ll primarily use the rvest package, along with dplyr and tidyr for some additional common functions.

# Load packages
library(rvest)
library(dplyr)
library(tidyr)

Load National Pokedex Page

Let’s load the National Pokedex page, which lists every Pokemon along with its number and a link to its own page.

# Set the URL to fetch data from
url <- "https://pokemondb.net/pokedex/national"

# Read the body from the page
body <- url %>% read_html() %>% html_elements("body")

# Get the info cards for each Pokemon
infocards <- html_elements(body, "span.infocard-lg-data.text-muted")
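The selector span.infocard-lg-data.text-muted chains two class selectors, so it matches only span elements that carry both classes. The sketch below runs the same selector against a small hand-written HTML string so it works offline; the markup is a simplified assumption about how PokemonDB lays out its infocards, not a copy of the live page.

```r
library(rvest)

# Hypothetical, simplified markup: two infocards plus a decoy span
# that has only one of the two classes.
html <- '
<html><body>
  <div class="infocard">
    <span class="infocard-lg-data text-muted">
      <small>#0001</small><br>
      <a class="ent-name" href="/pokedex/bulbasaur">Bulbasaur</a>
    </span>
  </div>
  <div class="infocard">
    <span class="infocard-lg-data text-muted">
      <small>#0002</small><br>
      <a class="ent-name" href="/pokedex/ivysaur">Ivysaur</a>
    </span>
  </div>
  <span class="text-muted">Not an infocard</span>
</body></html>'

# read_html() treats a string containing markup as literal HTML
page <- read_html(html)

# Chained class selectors require BOTH classes on the same element
infocards <- html_elements(page, "span.infocard-lg-data.text-muted")
length(infocards)  # 2: the decoy span is skipped

# A single class also matches the decoy
length(html_elements(page, "span.text-muted"))  # 3
```

Being specific here matters because text-muted is a generic styling class that a real page may use in many places.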

Get the Initial Pokedex Data

Each “infocard” holds the information we want for one Pokemon, so we can pull the numbers, names, and URLs out of these elements.

# Fetch the Pokemon numbers
numbers <- infocards %>%
  html_element("small") %>%
  html_text()

# Fetch the Pokemon names
names <- infocards %>%
  html_element("a") %>%
  html_text()

# Fetch the Pokemon URLs
urls <- infocards %>%
  html_element("a") %>%
  html_attr("href")
# The hrefs are relative paths, so prepend the site's domain
urls <- paste0("https://pokemondb.net", urls)
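html_element() (singular) returns exactly one result per infocard: the first matching element, or a missing node that html_text() and html_attr() turn into NA. That is what keeps numbers, names, and urls the same length and in the same order. The sketch below checks this on hand-written markup; the edge cases (a second small tag in one card, no link in another) are hypothetical and chosen only to exercise the behaviour.

```r
library(rvest)

# Hypothetical, simplified infocards: the first has a second <small>
# (e.g. a type line) and the third has no link at all.
html <- '
<html><body>
  <span class="infocard-lg-data text-muted">
    <small>#0001</small><a href="/pokedex/bulbasaur">Bulbasaur</a>
    <small>Grass</small>
  </span>
  <span class="infocard-lg-data text-muted">
    <small>#0002</small><a href="/pokedex/ivysaur">Ivysaur</a>
  </span>
  <span class="infocard-lg-data text-muted">
    <small>#0003</small>
  </span>
</body></html>'

infocards <- read_html(html) %>%
  html_elements("span.infocard-lg-data.text-muted")

# One result per card: the first match, or NA when nothing matches
numbers <- infocards %>% html_element("small") %>% html_text()
names <- infocards %>% html_element("a") %>% html_text()
hrefs <- infocards %>% html_element("a") %>% html_attr("href")

numbers  # "#0001" "#0002" "#0003"
names    # "Bulbasaur" "Ivysaur" NA

# html_elements() (plural) flattens every match, so lengths stop lining up
length(infocards %>% html_elements("small"))  # 4

# paste0() would turn NA into "https://pokemondb.netNA", so keep NA as NA
urls <- ifelse(is.na(hrefs), NA, paste0("https://pokemondb.net", hrefs))
urls
```

On the real page every infocard is expected to have both a number and a link, so the NA guard is only a safeguard in case the markup changes.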

Assemble Initial Pokedex Dataframe

Finally, let’s put all of this data into a dataframe.

# Create tibble for Pokedex
data_tbl <- tibble(numbers, names, urls) %>%
  rename(Number = numbers, Name = names, URLs = urls)
# Strip the leading "#" and convert the number to numeric
data_tbl$Number <- substring(data_tbl$Number, 2) %>% as.numeric()

glimpse(data_tbl)
Rows: 1,010
Columns: 3
$ Number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
$ Name   <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon",…
$ URLs   <chr> "https://pokemondb.net/pokedex/bulbasaur", "https://pokemondb.n…
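The same table can be built a little more compactly by naming the columns directly in tibble() and doing the conversion inside mutate(). The sketch below does this on a few hand-typed sample values (assumed, not scraped) and adds some cheap sanity checks; the consecutive-numbers check assumes the page lists each species exactly once, which matches the output above but is not guaranteed if the site changes.

```r
library(dplyr)

# Hypothetical sample values shaped like the scraped numbers, names, and URLs
numbers <- c("#0001", "#0002", "#0003")
names <- c("Bulbasaur", "Ivysaur", "Venusaur")
urls <- paste0("https://pokemondb.net/pokedex/", tolower(names))

data_tbl <- tibble(Number = numbers, Name = names, URLs = urls) %>%
  # Drop the leading "#" and convert "0001" to 1
  mutate(Number = as.numeric(substring(Number, 2)))

# Checks that are also worth running on the full scraped table
stopifnot(
  !anyNA(data_tbl$Number),          # every number parsed
  !anyDuplicated(data_tbl$Number),  # no repeated entries
  all(diff(data_tbl$Number) == 1)   # numbers run consecutively
)

data_tbl$Number  # 1 2 3
```

A failing check is a quick signal that the selector picked up extra elements or that the number format on the page has changed.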

This table will serve as the starting point for scraping the rest of the Pokemon data.