IBTrACS Valid Training Dates

¶Introduction

So far, we’ve seen how to pull GOES imagery and how to overlay IBTrACS data on the imagery. We’ll use these same ideas now to obtain dates to fetch training data. Some things we’ll need to consider are:

Some bands aren’t visible at night, so we don’t want to fetch images that aren’t useful. To address this, we can start by using only infrared bands on the imager that capture clouds at night. If more data is needed, we can come back to this point and identify daytime visible bands.
The geostationary view of the imager has a limited spatial extent, so some IBTrACS data points may not be visible to the imager. We can use the projection data of the GOES image filter out the latitudes and longitudes of the IBTrACS data.
GOES16 was launched in November 2016, so we’ll only be able to use a subset of the IBTrACS data. We just need to make sure we exclude any IBTrACS data outside of the GOES16 imaging window.
IBTrACS data is every 3 hours, while GOES16 is every 5 minutes. We can assume that if a storm had a track observation at time $t$ and again at time $t$+3 hours, an image within that time has a storm in it.

Each of these points will play a role in how we automatically assess if an image is a positive or negative training sample.

Additionally, it seems that the first GOES image, at least that I can access on the AWS storage, wasn’t available until 10 April, so we need to filter those dates from the final results.

Downloading the data will take some time, so first, I just want to build a list of dates that, according to the IBTrACS data, correspond to a time when a tropical cyclone was present ($y=1$) or was not present ($y=0$). At first, I think the goal should be to get an even number of positive and negative training samples, so I think it might be best to randomly pick a time for a positive example, then ensure that the next selection is a negative example, and so on.

¶Import modules

First, let’s import the modules we’ll need. I’ve created some functions to do things like read in the IBTrACS and GOES data, which I implemented in previous notebooks.

# Import modules
import datetime
import numpy as np
from pyproj import Proj
import matplotlib.pyplot as plt
import os

# Import functions I've written
os.chdir("..")
import goes
import ibtracs

¶Read IBTrACS Data

Now we read in the IBTrACS data from 2017 until now. Let’s get rid of anything else before 10 April.

ibtracsPath = ibtracs.download_data(basin="ALL",overwrite=False)
dfTracks = ibtracs.read_data(ibtracsPath,True,2017,2020)
dfTracks = dfTracks[dfTracks['ISO_TIME']>=datetime.datetime(2017,4,10)]

¶Read GOES16 Image

Next, we’ll read in the GOES16 image corresponding to 30 days ago. The image itself doesn’t matter yet. Since the imager is geostationary, we can just take any image and use it to make sure our IBTrACS data all falls within the GOES16 full disc array.

# Set the parameters to download data
date = datetime.datetime.now()-datetime.timedelta(days=30)
bucketName = 'noaa-goes16'
product = 'ABI-L1b-RadF'
credPath = "secrets.csv"
band = 13

# Get the GOES data
ds = goes.download_data(date,credPath,bucketName,product,band)

¶Handle Projections

In order to get the IBTrACS data relative to the GOES imagery, we need to get the projection of the GOES data.

# Get dataset projection data
satHeight = ds.goes_imager_projection.perspective_point_height
satLon = ds.goes_imager_projection.longitude_of_projection_origin
satSweep = ds.goes_imager_projection.sweep_angle_axis
majorMinorAxes = (ds.goes_imager_projection.semi_major_axis,ds.goes_imager_projection.semi_minor_axis)

# The projection x and y coordinates equals the scanning angle (in radians) multiplied by the satellite height
x = ds.variables['x'][:] * satHeight
y = ds.variables['y'][:] * satHeight

# Create X and Y meshgrids
X, Y = np.meshgrid(x, y)

# Create a pyproj geostationary map object
p = Proj(proj='geos', h=satHeight, lon_0=satLon, sweep=satSweep)

# Get latitudes and longitudes
lons, lats = p(X, Y, inverse=True)

¶Filter Out with Bounding Box

A bounding box corresponding to the minimum and maximum latitude and longitudes covers more space than the full-disc, but the only way to really check is to loop through all of the IBTrACS data, project it onto the GOES projection, and then see if it’s in the image. That will take a lot of time, where this will not, so let’s use a bounding box as a first pass here to avoid unnecessary loopling.

# Get a simple bounding box based on min/max lat/lons
lons = np.where(lons==1e+30,np.nan,lons)
lats = np.where(lats==1e+30,np.nan,lats)
minLat = np.nanmin(lats[lats != -np.inf])
maxLat = np.nanmax(lats[lats != np.inf])
minLon = np.nanmin(lons[lons != -np.inf])
maxLon = np.nanmax(lons[lons != np.inf])

# Query IBTraCS data based on bounding box
dfTracks = dfTracks[
    (dfTracks['LAT'] >= minLat) &
    (dfTracks['LAT'] <= maxLat) &
    (dfTracks['LON'] >= minLon) &
    (dfTracks['LON'] <= maxLon)
]

¶Drop Additional Off-Disc Samples

Now that we’ve limited the extent a bit, let’s drill down and make sure none of the points are off of the full-disc. First, we find the point on the image that corresponds to the latitude/longitude of the storm. Since the lats and lons arrays have values of NaN where the points are off the disc, we can check if that point in the lat or lon array is missing. We only need to check one array, since it’s a meshgrid, we’ll check lons. And rather than just checking if that one point is NaN, let’s check if any point within a window of size checkSize is NaN. This should avoid any points that are just barely sitting on the edge of the disc.

# Create empty list
dropInds = []

# Reset indices of dataframe
dfTracks = dfTracks.reset_index()

for dfInd, row in dfTracks[['LAT','LON']].iterrows():

    # Cast latitude and longitude to float
    trackLat = float(row["LAT"])
    trackLon = float(row["LON"])

    # Convert lon/lat to x/y
    trackX,trackY = p(trackLon,trackLat)

    # Get the closest point to the IBTrACS data
    xInd = np.nanargmin(abs(x-trackX))
    yInd = np.nanargmin(abs(y-trackY))

    # Check that none of 50 points on any side of the storm are off of the disc
    checkSize = 50
    offDisc = np.isnan(lons[yInd-checkSize:yInd+checkSize,xInd-checkSize:xInd+checkSize]).any()

    # If the points are off the disc, append the dataframe index to drop after looping
    if offDisc:
        dropInds.append(dfInd)

# Drop any indices that fell off the disc
dfTracks = dfTracks.drop(dfTracks.index[dropInds])

¶Save the Output

So now, we’ve handled the issues of the imager extent and the IBTrACS extent, both spatially and temporally. Let’s save the output now so we can use it to make training data.

dfTracks.to_csv('./data/ibtracs_GOES16.csv')