Using the 'open virtual dataset' capability to work with TEMPO Level 3 data¶
Summary¶
In this tutorial, we will use the earthaccess.open_virtual_mfdataset()
function to open a week's worth of granules from the Nitrogen Dioxide (NO2) Level-3 data collection of the TEMPO air quality mission (link). We will then calculate means for a subset of the data and visualize the results.
Note that this same approach can be used for a date range of any length, within the mission's duration. Running this notebook for a year's worth of TEMPO Level-3 data took approximately 15 minutes.
Prerequisites¶
AWS US-West-2 Environment: This tutorial has been designed to run in an AWS cloud compute instance in AWS region us-west-2. However, if you want to run it from your laptop or workstation, everything should work just fine but without the speed benefits of in-cloud access.
Earthdata Account: A (free!) Earthdata Login account is required to access data from the NASA Earthdata system. Before requesting TEMPO data, we first need to set up our Earthdata Login authentication, as described in the Earthdata Cookbook's earthaccess tutorial (link).
Packages:
cartopy
dask
earthaccess (version 0.14.0 or greater)
matplotlib
numpy
xarray
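If any of these are missing from your environment, they can be installed from within the notebook. The line below is a minimal pip sketch, assuming a Jupyter kernel where %pip is available; adjust it for conda or your preferred environment manager.
%pip install "earthaccess>=0.14.0" cartopy dask matplotlib numpy xarray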
Setup¶
import cartopy.crs as ccrs
import earthaccess
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from matplotlib import rcParams
%config InlineBackend.figure_format = 'jpeg'
rcParams["figure.dpi"] = (
80 # Reduce figure resolution to keep the saved size of this notebook low.
)
Log in using Earthdata Login¶
auth = earthaccess.login()
if not auth.authenticated:
    # Ask for credentials and persist them in a .netrc file.
    auth.login(strategy="interactive", persist=True)
print(earthaccess.__version__)
0.14.0
Search for data granules¶
We search for TEMPO Nitrogen Dioxide (NO2) data for a week-long period between January 11th and 18th, 2024 (note: times are in UTC).
results = earthaccess.search_data(
    short_name="TEMPO_NO2_L3",
    version="V03",
    temporal=("2024-01-11 12:00", "2024-01-18 12:00"),
)
print(f"Number of results: {len(results)}")
Number of results: 81
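Before opening anything, it can help to sanity-check what the search returned. The optional snippet below uses standard earthaccess result methods to print the approximate size of the first granule and its direct-access (S3) links:
# Optional sanity check on the first granule returned by the search.
print(results[0].size())  # approximate granule size in MB
print(results[0].data_links(access="direct"))  # s3:// links used for in-region access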
Opening Virtual Multifile Datasets¶
First we set the argument options to be used by earthaccess.open_virtual_mfdataset.

load argument considerations:

load=True works. Within earthaccess.open_virtual_mfdataset, a temporary virtual reference file (a "virtual dataset") is created and then immediately loaded with kerchunk. This is because the function assumes the user is making this request for the first time and the combined manifest file needs to be generated first. In the future, however, earthaccess.open_virtual_mfdataset may provide a way to save the combined manifest file, at which point you could skip those steps and proceed directly to loading with kerchunk/virtualizarr.

load=False results in KeyError: "no index found for coordinate 'longitude'" because it creates ManifestArrays without indexes (see the earthaccess documentation here (link)).
open_options = {
    "access": "direct",  # direct S3 access (fastest from within AWS us-west-2)
    "load": True,  # build the virtual references and load them right away (see note above)
    "concat_dim": "time",  # combine the granules along the time dimension
    # The next three options are passed through to xarray's combine machinery to speed up concatenation.
    "coords": "minimal",
    "compat": "override",
    "combine_attrs": "override",
    "parallel": True,  # open granules in parallel using dask
}
Because TEMPO data are processed and archived in a netCDF4 format using a group hierarchy, we open each group (i.e., 'root', 'product', and 'geolocation') separately and afterwards merge them together.
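As an optional aside (not needed for the workflow below), you can peek at the group layout of a single granule with xarray's datatree support. This sketch assumes a recent xarray that provides open_datatree and an installed h5netcdf backend, neither of which is required elsewhere in this tutorial:
# Optional: inspect the group hierarchy of one granule (streams a single file only).
one_file = earthaccess.open(results[:1])[0]
tree = xr.open_datatree(one_file, engine="h5netcdf")
print(tree.groups)  # e.g., ('/', '/product', '/geolocation', ...)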
%%time
result_root = earthaccess.open_virtual_mfdataset(granules=results, **open_options)
result_product = earthaccess.open_virtual_mfdataset(
    granules=results, group="product", **open_options
)
result_geolocation = earthaccess.open_virtual_mfdataset(
    granules=results, group="geolocation", **open_options
)
CPU times: user 2.96 s, sys: 250 ms, total: 3.21 s Wall time: 25.8 s
Merge the root group with the subgroups.
result_merged = xr.merge([result_root, result_product, result_geolocation])
result_merged
<xarray.Dataset> Size: 81GB
Dimensions:  (latitude: 2950, longitude: 7750, time: 81)
Coordinates:
  * latitude   (latitude) float32 12kB 14.01 ...
  * longitude  (longitude) float32 31kB -168.0 ...
  * time       (time) datetime64[ns] 648B 2024-...
Data variables:
    weight                                   (time, latitude, longitude) float32 7GB dask.array<chunksize=(1, 590, 1550), meta=np.ndarray>
    main_data_quality_flag                   (time, latitude, longitude) float32 7GB dask.array<chunksize=(1, 984, 2584), meta=np.ndarray>
    vertical_column_stratosphere             (time, latitude, longitude) float64 15GB dask.array<chunksize=(1, 738, 1938), meta=np.ndarray>
    vertical_column_troposphere              (time, latitude, longitude) float64 15GB dask.array<chunksize=(1, 738, 1938), meta=np.ndarray>
    vertical_column_troposphere_uncertainty  (time, latitude, longitude) float64 15GB dask.array<chunksize=(1, 738, 1938), meta=np.ndarray>
    relative_azimuth_angle                   (time, latitude, longitude) float32 7GB dask.array<chunksize=(1, 984, 2584), meta=np.ndarray>
    solar_zenith_angle                       (time, latitude, longitude) float32 7GB dask.array<chunksize=(1, 984, 2584), meta=np.ndarray>
    viewing_zenith_angle                     (time, latitude, longitude) float32 7GB dask.array<chunksize=(1, 984, 2584), meta=np.ndarray>
Attributes: (12/40)
    history:                           2024-08-10T19:20:11Z: L2_regrid -v /tem...
    scan_num:                          2
    time_coverage_start:               2024-01-11T12:56:25Z
    time_coverage_end:                 2024-01-11T13:36:11Z
    time_coverage_start_since_epoch:   1389013003.147965
    time_coverage_end_since_epoch:     1389015389.7366676
    ...                                ...
    title:                             TEMPO Level 3 nitrogen dioxide product
    collection_shortname:              TEMPO_NO2_L3
    collection_version:                1
    keywords:                          EARTH SCIENCE>ATMOSPHERE>AIR QUALITY>NI...
    summary:                           Nitrogen dioxide Level 3 files provide ...
    coremetadata:                      \nGROUP = INVENTORYMET...
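The 81 GB reported above is the size of the data being referenced, not data that has been transferred: each variable is still a lazy dask array backed by the virtual references, and nothing is read until a computation (or plot) asks for it. As an optional check, xarray's nbytes attribute reports the same referenced size:
# Nothing has been downloaded yet; this is just the size of the lazily referenced data.
print(f"{result_merged.nbytes / 1e9:.0f} GB referenced")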
Temporal mean - a map of the average over the analyzed period¶
temporal_mean_ds = (
    result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)})
    .where(result_merged["main_data_quality_flag"] == 0)
    .mean(dim="time")
)
temporal_mean_ds
<xarray.Dataset> Size: 2MB
Dimensions:  (latitude: 200, longitude: 200)
Coordinates:
  * latitude   (latitude) float32 800B 35.01 ...
  * longitude  (longitude) float32 800B -77.99 ...
Data variables:
    weight                                   (latitude, longitude) float32 160kB dask.array<chunksize=(130, 150), meta=np.ndarray>
    main_data_quality_flag                   (latitude, longitude) float32 160kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    vertical_column_stratosphere             (latitude, longitude) float64 320kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    vertical_column_troposphere              (latitude, longitude) float64 320kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    vertical_column_troposphere_uncertainty  (latitude, longitude) float64 320kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    relative_azimuth_angle                   (latitude, longitude) float32 160kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    solar_zenith_angle                       (latitude, longitude) float32 160kB dask.array<chunksize=(200, 200), meta=np.ndarray>
    viewing_zenith_angle                     (latitude, longitude) float32 160kB dask.array<chunksize=(200, 200), meta=np.ndarray>
%%time
mean_vertical_column_trop = temporal_mean_ds["vertical_column_troposphere"].compute()
CPU times: user 4.6 s, sys: 1.76 s, total: 6.36 s Wall time: 22.5 s
fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
mean_vertical_column_trop.squeeze().plot.contourf(ax=ax)
ax.coastlines()
ax.gridlines(
    draw_labels=True,
    dms=True,
    x_inline=False,
    y_inline=False,
)
<cartopy.mpl.gridliner.Gridliner at 0x7e5cd16bdf50>
Spatial mean - a time series of area averages¶
spatial_mean_ds = (
    result_merged.sel({"longitude": slice(-78, -74), "latitude": slice(35, 39)})
    .where(result_merged["main_data_quality_flag"] == 0)
    .mean(dim=("longitude", "latitude"))
)
spatial_mean_ds
<xarray.Dataset> Size: 4kB
Dimensions:  (time: 81)
Coordinates:
  * time     (time) datetime64[ns] 648B 2024-...
Data variables:
    weight                                   (time) float32 324B dask.array<chunksize=(1,), meta=np.ndarray>
    main_data_quality_flag                   (time) float32 324B dask.array<chunksize=(1,), meta=np.ndarray>
    vertical_column_stratosphere             (time) float64 648B dask.array<chunksize=(1,), meta=np.ndarray>
    vertical_column_troposphere              (time) float64 648B dask.array<chunksize=(1,), meta=np.ndarray>
    vertical_column_troposphere_uncertainty  (time) float64 648B dask.array<chunksize=(1,), meta=np.ndarray>
    relative_azimuth_angle                   (time) float32 324B dask.array<chunksize=(1,), meta=np.ndarray>
    solar_zenith_angle                       (time) float32 324B dask.array<chunksize=(1,), meta=np.ndarray>
    viewing_zenith_angle                     (time) float32 324B dask.array<chunksize=(1,), meta=np.ndarray>
%%time
spatial_mean_vertical_column_trop = spatial_mean_ds[
    "vertical_column_troposphere"
].compute()
CPU times: user 4.53 s, sys: 1.88 s, total: 6.42 s Wall time: 16.1 s
spatial_mean_vertical_column_trop.plot()
plt.show()
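The time series above has one point per TEMPO scan. As an optional variation, you could aggregate the per-scan area means into daily means with xarray's resample before plotting:
# Optional: daily means of the area-averaged tropospheric NO2 column.
daily_mean_trop = spatial_mean_vertical_column_trop.resample(time="1D").mean()
daily_mean_trop.plot(marker="o")
plt.show()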
Single scan subset¶
subset_ds = result_merged.sel(
    {
        "longitude": slice(-78, -74),
        "latitude": slice(35, 39),
        "time": slice(
            np.datetime64("2024-01-11T13:00:00"), np.datetime64("2024-01-11T14:00:00")
        ),
    }
).where(result_merged["main_data_quality_flag"] == 0)
subset_ds
<xarray.Dataset> Size: 2MB
Dimensions:  (time: 1, latitude: 200, longitude: 200)
Coordinates:
  * latitude   (latitude) float32 800B 35.01 ...
  * longitude  (longitude) float32 800B -77.99 ...
  * time       (time) datetime64[ns] 8B 2024-01...
Data variables:
    weight                                   (time, latitude, longitude) float32 160kB dask.array<chunksize=(1, 130, 150), meta=np.ndarray>
    main_data_quality_flag                   (time, latitude, longitude) float32 160kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    vertical_column_stratosphere             (time, latitude, longitude) float64 320kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    vertical_column_troposphere              (time, latitude, longitude) float64 320kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    vertical_column_troposphere_uncertainty  (time, latitude, longitude) float64 320kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    relative_azimuth_angle                   (time, latitude, longitude) float32 160kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    solar_zenith_angle                       (time, latitude, longitude) float32 160kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
    viewing_zenith_angle                     (time, latitude, longitude) float32 160kB dask.array<chunksize=(1, 200, 200), meta=np.ndarray>
Attributes: (12/40)
    history:                           2024-08-10T19:20:11Z: L2_regrid -v /tem...
    scan_num:                          2
    time_coverage_start:               2024-01-11T12:56:25Z
    time_coverage_end:                 2024-01-11T13:36:11Z
    time_coverage_start_since_epoch:   1389013003.147965
    time_coverage_end_since_epoch:     1389015389.7366676
    ...                                ...
    title:                             TEMPO Level 3 nitrogen dioxide product
    collection_shortname:              TEMPO_NO2_L3
    collection_version:                1
    keywords:                          EARTH SCIENCE>ATMOSPHERE>AIR QUALITY>NI...
    summary:                           Nitrogen dioxide Level 3 files provide ...
    coremetadata:                      \nGROUP = INVENTORYMET...
%%time
subset_vertical_column_trop = subset_ds["vertical_column_troposphere"].compute()
CPU times: user 69.6 ms, sys: 19.5 ms, total: 89.1 ms Wall time: 211 ms
fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
subset_vertical_column_trop.squeeze().plot.contourf(ax=ax)
ax.coastlines()
ax.gridlines(
    draw_labels=True,
    dms=True,
    x_inline=False,
    y_inline=False,
)
plt.show()
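If you want to keep the computed single-scan subset for offline analysis, you can write it to a local netCDF file. The filename below is illustrative, and a netCDF writing backend such as netCDF4 or h5netcdf must be installed:
# Optional: save the computed subset locally; the path is just an example.
subset_vertical_column_trop.to_netcdf("tempo_no2_trop_subset_20240111.nc")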