import os
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import rioxarray as rioxr
import matplotlib.ticker as ticker
from matplotlib.patches import Patch
import warnings
Impacts of the Fire
The Thomas Fire, which ignited on December 4, 2017, and burned until January 12, 2018, stands as one of the most devastating wildfires in California’s history. Spanning over 281,893 acres (approximately 1,141 square kilometers), it destroyed 1,063 structures and caused damages exceeding $2.2 billion.
Located in Southern California, the fire affected Ventura and Santa Barbara counties. These areas are known for their dry, Mediterranean climate, which can exacerbate fire conditions. The fire’s vast spread impacted a variety of landscapes, including urban areas, agricultural lands, and natural reserves.
The Thomas Fire occurred during a particularly severe fire season in California, which saw multiple large-scale wildfires. Historically, this region has been prone to wildfires, but recent years have seen an increase in both frequency and intensity, attributed to climate change and changing land use patterns.
The fire’s impact on air quality was devastating, with particulate matter levels soaring and the Air Quality Index (AQI) reaching hazardous levels in affected areas. Communities, especially those with vulnerable populations such as the elderly and children, faced significant health risks. The destruction of over 1,000 structures also led to displacement and long-term economic challenges for residents.
In this blog post, we delve into the use of false color imagery and AQI analysis to uncover the environmental and health impacts of the Thomas Fire. By leveraging satellite data and geospatial analysis, we aim to provide a comprehensive understanding of how this catastrophic event altered the landscape and air quality. This analysis offers insights into the broader implications of wildfires on public health and environmental sustainability.
About the Data
Fire Perimeter: Shapefile of the Thomas Fire perimeter, obtained from the California Department of Forestry and Fire Protection (CAL FIRE).
Satellite Imagery: Landsat 8 imagery accessed via the server at UCSB
AQI Data[^3]: This dataset is a dataframe with AQI values of cities all across the US. Retrieved from The U.S. Environmental Protection Agency [^3]
References
U.S. Environmental Protection Agency. (2017). Daily AQI by county, 2017 [Data set]. Retrieved from https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2017.zip
U.S. Environmental Protection Agency. (2017). Daily AQI by county, 2017 [Data set]. Retrieved from https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2018.zip
California Department of Forestry and Fire Protection (CalFire). (2023). Fire perimeters. Retrieved November 16, 2024, from https://www.fire.ca.gov/what-we-do/fire-resource-assessment-program/fire-perimeters
Landsat Data: EDS 220. (2024). Retrieved from /courses/EDS220/data/hwk4_landsat_data landsat8-2018-01-26-sb-simplified.nc
False Color: A Quick Overview
False color imagery is a powerful tool used in remote sensing to enhance specific features of the landscape that might not be visible in natural color images. By assigning visible colors (like red, green, and blue) to spectral bands that the human eye cannot see (such as near-infrared), false color images highlight various aspects of the environment. For instance, healthy vegetation reflects more infrared light and appears bright red in false color images, while burned areas, like those from the Thomas Fire, show up in darker hues, making them easier to distinguish.
About
Purpose:
In this blog post, we delve into the use of false color imagery and AQI analysis to uncover the environmental and health impacts of the Thomas Fire. By leveraging satellite data and geospatial analysis, we aim to provide a comprehensive understanding of how this catastrophic event altered the landscape and air quality. This analysis offers insights into the broader implications of wildfires on public health and environmental sustainability.
Now let’s dive in by importing some libraries!
Import Libraries
Import File Geodatabase
# Read in gdb file with geopandas. Save as variable fire_bounds
= gpd.read_file("data/fire23-1gdb/fire23_1.gdb") fire_bound
Quick Data Exploration
Checking a few things like head, crs and shape is important to familarizing yourself with the data you are working with.
# Check the dataset
print(fire_bound.head())
# Check the CRS
print(fire_bound.crs)
# Check the number of rows and columns
print(fire_bound.shape)
YEAR_ STATE AGENCY UNIT_ID FIRE_NAME INC_NUM \
0 2023.0 CA CDF SKU WHITWORTH 00004808
1 2023.0 CA LRA BTU KAISER 00010225
2 2023.0 CA CDF AEU JACKSON 00017640
3 2023.0 CA CDF AEU CARBON 00018821
4 2023.0 CA CDF AEU LIBERTY 00018876
IRWINID ALARM_DATE \
0 {7985848C-0AC2-4BA4-8F0E-29F778652E61} 2023-06-17T00:00:00+00:00
1 {43EBCC88-B3AC-48EB-8EF5-417FE0939CCF} 2023-06-02T00:00:00+00:00
2 {B64E1355-BF1D-441A-95D0-BC1FBB93483B} 2023-07-01T00:00:00+00:00
3 {CB41DB0A-E4B1-489D-A4EA-738F2CD6DB3B} 2023-07-11T00:00:00+00:00
4 {F83F70A4-07A7-40B8-BD51-10CCC1C30D63} 2023-07-11T00:00:00+00:00
CONT_DATE C_METHOD CAUSE OBJECTIVE COMPLEX_NAME \
0 2023-06-17T00:00:00+00:00 1.0 5.0 1.0 None
1 2023-06-02T00:00:00+00:00 1.0 5.0 1.0 None
2 2023-07-02T00:00:00+00:00 1.0 2.0 1.0 None
3 2023-07-11T00:00:00+00:00 1.0 9.0 1.0 None
4 2023-07-12T00:00:00+00:00 1.0 14.0 1.0 None
COMPLEX_ID COMMENTS FIRE_NUM Shape_Length Shape_Area \
0 None None None 933.907127 23184.946763
1 None None None 986.343353 55046.898792
2 None None None 2115.572740 112561.147688
3 None None None 1994.866526 237794.210367
4 None None None 4008.305800 287241.810389
geometry
0 MULTIPOLYGON (((-243242.555 394267.371, -24324...
1 MULTIPOLYGON (((-167826.147 203275.612, -16782...
2 MULTIPOLYGON (((-79176.443 31393.490, -78959.0...
3 MULTIPOLYGON (((-85936.934 47921.886, -85856.4...
4 MULTIPOLYGON (((-86427.836 26461.613, -86436.1...
EPSG:3310
(22261, 19)
# Check if the CRS is projected or geographic
if fire_bound.crs.is_projected:
print("The CRS is projected.")
elif fire_bound.crs.is_geographic:
print("The CRS is geographic.")
The CRS is projected.
Fire Data Exploration
With some quick data explloration, you can see that the dataset from California Department of Forestry and Fire Protection has over 22,000 fires between the years 1878 and 2023. You can also see what types column names and values we are dealing with. For example, are there capital letters? spaces? unusual characters in columns? This will help further filtering the data to the Thomas Fire in 2017
💡 Tip: Look at the year column. Do you notice anything that might make filtering the data in that column tricky?
Now that we know we can filter for just the Thomas Fire in 2017
# Save as vairable thomas_fire
= fire_bound[(fire_bound['FIRE_NAME'] == 'THOMAS') & (fire_bound['YEAR_'] == 2017.0)]
thomas_fire
# Check the dataset
thomas_fire.head()
YEAR_ | STATE | AGENCY | UNIT_ID | FIRE_NAME | INC_NUM | IRWINID | ALARM_DATE | CONT_DATE | C_METHOD | CAUSE | OBJECTIVE | COMPLEX_NAME | COMPLEX_ID | COMMENTS | FIRE_NUM | Shape_Length | Shape_Area | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2654 | 2017.0 | CA | USF | VNC | THOMAS | 00003583 | 2017-12-04T00:00:00+00:00 | 2018-01-12T00:00:00+00:00 | 7.0 | 9.0 | 1.0 | None | None | CONT_DATE based on Inciweb | None | 445282.444798 | 1.140367e+09 | MULTIPOLYGON (((34867.386 -396856.457, 34819.3... |
# Check the fire perimeter by plotting
thomas_fire.plot()
After I filtered for just the 2017 Thomas Fire, it is handy to plot it to see exactly what you are working with. In this case, this looks to be the Thomas Fire perimeter we are looking for! Next, I’ll export it as a shapefile to use later.
Export the filtered dataframe as a shapefile to use later in our analysis
# Use package warnings to ignore unwanted outputs`qw
'ignore')
warnings.filterwarnings(
# Specify the output path where you want to save the new shapefile
= "data/Thomas_Fire.shp"
output_path
# Save the filtered GeoDataFrame to the specified path
thomas_fire.to_file(output_path)
📝 A Quick Note: I chose to use a shapefile for several reasons, the first being because I am more familiar with that file type. Second, the format is straightforward and easy to use when working with vector data.
Bring in Landsat Image
Next, upload the satellite image of the Santa Barbara area. This image is form the Landsat program and specifically Landsat 8
# Create file path to landsat data, save as variable root
= os.path.join('/','courses', 'EDS220', 'data', 'hwk4_landsat_data')
root
# Use os.path.join to select variable root and file landsat8-2018-01-26-sb-simplified.nc file, save it as variable fp
= os.path.join(root,'landsat8-2018-01-26-sb-simplified.nc')
fp
# Use rioxr to open file and save it as color bands
= rioxr.open_rasterio(fp)
color_bands
# Check the variable
color_bands
<xarray.Dataset> Size: 25MB Dimensions: (band: 1, x: 870, y: 731) Coordinates: * band (band) int64 8B 1 * x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05 * y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06 spatial_ref int64 8B 0 Data variables: red (band, y, x) float64 5MB ... green (band, y, x) float64 5MB ... blue (band, y, x) float64 5MB ... nir08 (band, y, x) float64 5MB ... swir22 (band, y, x) float64 5MB ...
Take a look at the output for the variable color_bands. You can see the data contains variables named “red”, “green”, “blue”, “nir08”, “swir22” and has the dimensions (band: 1x: 870y: 731). These variable names are what we use to present a false color image.
Drop the Band Dimensions and False Color Mapping
# Original dimensions and coordinates
print(color_bands.dims, color_bands.coords,'\n')
# Remove length 1 dimension (band)
= color_bands.squeeze()
color_bands print(color_bands.dims, color_bands.coords,'\n')
# Remove coordinates associated to band
= color_bands.drop_vars('band')
color_bands print(color_bands.dims, color_bands.coords)
FrozenMappingWarningOnValuesAccess({'band': 1, 'x': 870, 'y': 731}) Coordinates:
* band (band) int64 8B 1
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
FrozenMappingWarningOnValuesAccess({'x': 870, 'y': 731}) Coordinates:
band int64 8B 1
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
FrozenMappingWarningOnValuesAccess({'x': 870, 'y': 731}) Coordinates:
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
💡 Tip: Dropping the band will allow us to work with one less dimesion, making it easier to plot
In remote sensing and imaging, each color band (red, green, blue) represents a different part of the visible light spectrum captured by sensors. By combining these bands, we can create a more natural and visually understandable image, similar to what we see with our eyes.
# Convert the selected variables to a NumPy array and plot the RGB image
'red', 'green', 'blue']].to_array().plot.imshow() color_bands[[
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
A quick look and we can see that this does not look like a satellite image and we recieved a warning about “valid range”. This is because our values are being skewed because of some extreme numbers. we will want to change the parameter to robust = True. It scales the image based on these percentiles, effectively ignoring the extreme low (below the 2nd percentile) and high (above the 98th percentile) values.
🤔 Thinking: What are some object in the earths atmosphere that could cause extreme values? HINT: Think of what blocks the sun for us!
# Convert the selected variables to a NumPy array and plot the RGB image, adjust scale with robust = True
'red', 'green', 'blue']].to_array().plot.imshow(robust = True) color_bands[[
Switching to the SWIR (Shortwave Infrared), NIR (Near-Infrared), and blue spectral bands allows us to visualize environmental details that aren’t visible with traditional RGB bands. SWIR is useful for detecting moisture content in soil and vegetation, while NIR helps assess vegetation health. It should help us seeing the impact of the Thomas Fire more clearly.
# Create false color image by plotting the short-wave infrared (swir22), near-infrared, and red variables
'swir22', 'nir08', 'blue']].to_array().plot.imshow(robust = True) color_bands[[
Now, looking at the bottom right of the image, the Thomas Fire burn area is indeed more clearly defined.
Map with Thomas Fire SHapefile We Made
Now we can use the shapefile we made early to highlight the Thomas Fire’s perimeter.
# Import Thomas Fire Perimeter
= gpd.read_file(os.path.join('data','Thomas_Fire.shp'))
fire_bound
# Check CRS's
print(fire_bound.crs == color_bands.rio.crs)
False
Since the CRS’s do not match, we need to transform. Since this is important for mapping, we will write an assert test to make sure
# Ensure CRS matches (transform CRS if necessary)
if fire_bound.crs != color_bands.rio.crs:
= fire_bound.to_crs(color_bands.rio.crs)
fire_bound
# Check if the CRSs match
assert fire_bound.crs == color_bands.rio.crs, "CRS of fire_bound does not match CRS of color_bands"
print("CRS match successfully!")
CRS match successfully!
💡 Tip: Assert tests are a quick way to verify your output is what you want
Now that they match, we can plot them in the same figure.
# Plot the RGB image with robust scaling
= plt.subplots(figsize=(10, 8))
fig, ax 'swir22', 'nir08', 'red']].to_array().plot.imshow(ax=ax, robust=True)
color_bands[[
# Set the y-axis to display integers only
'%d'))
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter(
# Overlay the fire perimeter
=ax, edgecolor='red', facecolor='none', linewidth=1.5, label='Thomas Fire Perimeter')
fire_bound.plot(ax
# Customize the plot
'2017 Thomas Fire Boundary Overlaid on Landsat\nFalse Color Image (SWIR/NIR/Red)', fontsize=15)
ax.set_title('Easting (meters)')
ax.set_xlabel('Northing (meters)')
ax.set_ylabel(
# Add a custom legend
= [Patch(facecolor='none', edgecolor='red', linewidth=1.5, label='Thomas Fire Perimeter')]
legend_handles =legend_handles, loc='upper right')
ax.legend(handles
# PLot figure text
0, 0,
plt.figtext("Data: Satellite Imagery: Landsat data accessed via the server at UCSB. Thomas Fire Perimeter: California Department of Forestry and Fire Protection\nAccessed: November 21")
plt.show()
This map illustrates the Thomas Fire perimeter overlaid on a false color image created using shortwave infrared (SWIR), near-infrared (NIR), and red bands from the Landsat satellite imagery. SWIR (Shortwave Infrared) Band: Mapped to the red channel, SWIR is sensitive to moisture content and can distinguish between dry and wet areas, as well as highlight burn scars from wildfires. NIR (Near-Infrared) Band: Mapped to the green channel, NIR is useful for vegetation analysis because healthy vegetation reflects more near-infrared light. Red Band: Mapped to the blue channel, the red band provides detail and clarity to the overall image.
The Impact on AQI
How did the thomas fire effect AQI? Let’s read in the AQI data we talked about at earlier to see!
# Read in data, use compression = 'zip' for the ZIP file
= pd.read_csv('https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2017.zip', compression = 'zip')
aqi_17 = pd.read_csv('https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2018.zip', compression = 'zip') aqi_18
Data Exploration and Cleaning
Like we did before lets explore the data to see what we are working with.
# Do some data exploration
print(aqi_17.columns)
print(aqi_18.columns)
print(aqi_17.shape)
print(aqi_18.shape)
print(aqi_17.isnull().sum())
print(aqi_18.isnull().sum())
Index(['State Name', 'county Name', 'State Code', 'County Code', 'Date', 'AQI',
'Category', 'Defining Parameter', 'Defining Site',
'Number of Sites Reporting'],
dtype='object')
Index(['State Name', 'county Name', 'State Code', 'County Code', 'Date', 'AQI',
'Category', 'Defining Parameter', 'Defining Site',
'Number of Sites Reporting'],
dtype='object')
(326801, 10)
(327543, 10)
State Name 0
county Name 0
State Code 0
County Code 0
Date 0
AQI 0
Category 0
Defining Parameter 0
Defining Site 0
Number of Sites Reporting 0
dtype: int64
State Name 0
county Name 0
State Code 0
County Code 0
Date 0
AQI 0
Category 0
Defining Parameter 0
Defining Site 0
Number of Sites Reporting 0
dtype: int64
For the purpose of the analysis I only printed the exploration I found to be important to moving along in the analysis. Columnn names and null values are important for filtering and for computations like mean, which we will use for our AQI. Knowing how many rows are in dataframes will come in handy for stacking, which we will do next.
Now lets stack the 2017 and 2018 into one dataframe.Since we want to keep all observations for AQI, we will use concat to put them all together.
# Use .concat to glue the data frames togehter, ave it as variable aqi
= pd.concat([aqi_17, aqi_18])
aqi
# Check the first
aqi.shape
(654344, 10)
Since I know that each data set had around 325,000 rows, 654,344 is on par with what I expect from the concat function.
Now that we have the data all in one place, let’s clean the names to remove the space in some of the coumn names, filter for Santa Barbara, and change the date column to datetime
# Simplify column names
= (aqi.columns
aqi.columns str.lower()
.str.replace(' ','_')
.
)
# Check the output
aqi.columns
# Filter by county is equal to Santa Barbara, save as variable aqi_sb, use .drop to remove the columns listed above from the aqi_sb data frame
= aqi[aqi['county_name'] == 'Santa Barbara'].drop(['state_name',
aqi_sb 'county_name',
'state_code',
'county_code'], axis = 1)
# Use pd.to_datetime to change the 'date' column in the aqi_sb data frame
= pd.to_datetime(aqi_sb['date'])
aqi_sb.date
# Use set_index to make the 'date' column the index
= aqi_sb.set_index('date') aqi_sb
Calculate Rolling Average and Plot
When analyzing air quality data, it’s essential to smooth out short-term fluctuations to see clearer trends over time. One effective method is to calculate a rolling average, which provides a smoothed value by averaging over a specific period. Here’s one way to do that with .rolling()
# Calculate AQI rolling average over 5 days
= aqi_sb['aqi'].rolling('5D').mean()
rolling_average
# Call previous variable rolling_average and save it to a new column named five_day_average in the aqi_sb data frame
'five_day_average'] = rolling_average
aqi_sb[
# Check for the new column in the head
aqi_sb.head()
# Create the plot
=(14, 7))
plt.figure(figsize'aqi'], label='Daily AQI', color='blue', alpha=0.5)
plt.plot(aqi_sb.index, aqi_sb[='5-Day Rolling Average', color='red', linewidth=2)
plt.plot(aqi_sb.index, rolling_average, label
# Highlight the period of the Thomas Fire in December 2017
'2017-12-04'), pd.Timestamp('2017-12-23'), color='yellow', alpha=0.5, label='Thomas Fire Period')
plt.axvspan(pd.Timestamp(
# Add titles and labels
'Daily AQI and 5-Day Rolling Average for Santa Barbara County')
plt.title('Date')
plt.xlabel('AQI')
plt.ylabel(
plt.legend()True)
plt.grid(
plt.show()
Wrapping Up
In this plot, the Thomas Fire dates are highlighted by the yellow rectangle. It is clear that the AQI went through the roof during this period, a spike that was anticipated given the nature of forest fires. Such an event underscores the immediate effects of wildfires on air quality, posing significant health risks to nearby communities and emphasizing the need for robust environmental monitoring and emergency response strategies. By analyzing these trends through false color imagery and AQI data, we gain invaluable insights into the environmental and public health challenges posed by such disasters, reinforcing the importance of sustainable land management and fire prevention measures.