CMIP6 Processing

Our CMIP6 data is stored in NetCDF format, which is not directly compatible with our model input requirements. We therefore need to preprocess it to extract the relevant variables and reformat them appropriately.

The full dataset is also large: more than 4 TB covering all of New Zealand at a daily timestep and a 5 km x 5 km resolution. This is too large to process directly, so we need to subset the data spatially and temporally.

Metadata

We have gathered metadata for the CMIP6 datasets available on the shared drive, including information on the variables, scenarios, models, and bias correction methods used.

Further details can be found here.

File access

The CMIP6 data is stored on a shared drive (Z: drive) accessible within our institutional network. The files are named by variable, scenario, global climate model (GCM), downscaling model, timestep, region and spatial resolution, and bias correction method, e.g. "hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc". This file holds relative humidity (hurs) for the historical scenario from the ACCESS-CM2 GCM, downscaled using the CCAM model over New Zealand at a daily timestep and 5 km spatial resolution, without bias correction.
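As a minimal sketch of how this naming convention could be used in code, the helper below splits a file name into its fields. The function name parse_cmip6_filename and the field labels are illustrative assumptions, and the sketch assumes every file follows the same seven-field, underscore-separated pattern as the example above.

```r
# Hypothetical helper (not part of the project code): split a CMIP6 file
# name into named fields, assuming seven underscore-separated parts.
parse_cmip6_filename <- function(path) {
  stem  <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 7)
  names(parts) <- c("variable", "scenario", "gcm", "downscaling_model",
                    "timestep", "grid", "bias_correction")
  as.list(parts)
}

meta <- parse_cmip6_filename("hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc")
str(meta)
```

Fields such as meta$variable or meta$scenario can then be used to select the files needed for a given run.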

Loading and reading data from these NetCDF files can be done using the ncdf4 package in R. However, reading large datasets can be time-consuming, so we have implemented a function which uses a grid reference system to efficiently extract only the required data points.
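To illustrate why partial reads help, the sketch below uses the start and count arguments of ncdf4::ncvar_get() to read only a small block of cells and days instead of the whole array. It builds a tiny stand-in file in a temporary location so that it runs without the shared drive. The dimension names (lon, lat, time), their order, and the units are assumptions and may differ in the real files. This is not the project's extraction function, only the underlying idea.

```r
library(ncdf4)

# Build a tiny stand-in NetCDF file (5 x 5 cells, 10 days).
# Dimension names and units are assumptions, not taken from the real data.
lon  <- ncdim_def("lon", "degrees_east", 176 + 0:4 * 0.1)
lat  <- ncdim_def("lat", "degrees_north", -38.4 + 0:4 * 0.1)
time <- ncdim_def("time", "days since 2000-01-01", 0:9)
hurs <- ncvar_def("hurs", "%", list(lon, lat, time), missval = -9999)

tmp <- tempfile(fileext = ".nc")
nc  <- nc_create(tmp, hurs)
ncvar_put(nc, hurs, array(seq_len(5 * 5 * 10), dim = c(5, 5, 10)))
nc_close(nc)

# Read only a 2 x 2 block of cells for the first 3 days.
# start gives the first index along each dimension, count the number of steps.
nc    <- nc_open(tmp)
block <- ncvar_get(nc, "hurs", start = c(2, 2, 1), count = c(2, 2, 3))
nc_close(nc)

dim(block)
```

Only the requested block is read from disk, which is what keeps extraction manageable for files far larger than memory.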

The grid is provided as the VCSN agent data: a set of points at 5 km resolution covering New Zealand, each with a unique coordinate index. We use this grid to identify which points fall within our area of interest, and then extract only those points from the NetCDF files.
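The point-in-polygon step can be sketched with the sf package as follows. The small grid, the column name agent, and the rectangular polygon are made-up stand-ins for the real VCSN grid and area of interest. sf::st_filter() keeps the points that intersect the polygon.

```r
library(sf)

# Stand-in grid of points (the real VCSN grid covers all of New Zealand).
grid_df <- expand.grid(lon = 176 + 0:4 * 0.05, lat = -38.2 + 0:4 * 0.05)
grid_df$agent <- seq_len(nrow(grid_df))  # hypothetical index column
vcsn_grid_points <- st_as_sf(grid_df, coords = c("lon", "lat"), crs = 4326)

# Stand-in area of interest: a small rectangle in WGS 84.
aoi <- st_sfc(st_polygon(list(rbind(
  c(176.02, -38.18), c(176.13, -38.18), c(176.13, -38.07),
  c(176.02, -38.07), c(176.02, -38.18)
))), crs = 4326)

# Keep only the grid points that fall inside the area of interest.
pts_in_aoi <- st_filter(vcsn_grid_points, aoi)
pts_in_aoi$agent
```

The indices of the retained points are what would then be looked up in the NetCDF files.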

Create an area of interest

If you do not have a shapefile of your area of interest (AOI), you can easily generate one with the mapedit package.

x <- mapedit::drawFeatures()

Save the drawn features for future use:

saveRDS(x, here::here("data", "processed", "rotorua_area.rds"))
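In a later session the saved polygon can be read back with readRDS(). The sketch below shows the round trip on a stand-in polygon written to a temporary file, followed by two basic checks (class and geometry validity). It assumes the drawn features are an sf object in WGS 84 (EPSG:4326), which is worth confirming for your own object.

```r
library(sf)

# Stand-in for a drawn AOI; the coordinates are illustrative only.
aoi <- st_sf(
  name = "demo",
  geometry = st_sfc(st_polygon(list(rbind(
    c(176.1, -38.2), c(176.4, -38.2), c(176.4, -38.0),
    c(176.1, -38.0), c(176.1, -38.2)
  ))), crs = 4326)
)

# Save and reload, as you would with the file under data/processed.
path <- tempfile(fileext = ".rds")
saveRDS(aoi, path)
x <- readRDS(path)

# Basic checks before passing the polygon on for processing.
stopifnot(inherits(x, "sf"), all(st_is_valid(x)))
st_crs(x)$epsg
```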

Processing function

The process_cmip6 function takes in a spatial polygon (defining the area of interest), the VCSN grid points, the path to a NetCDF file, and an output file path. It extracts the relevant data points from the NetCDF file based on the area of interest, and saves the processed data to a new NetCDF file.

Function Parameters for process_cmip6()

Parameter         Description
x                 sf polygon defining the area of interest
vcsn_grid_points  sf points of the VCSN grid
file              Path to the input NetCDF file
outfile           Path to the output NetCDF file

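# Load the function definition, then subset one NetCDF file to the AOI.
# vcsn_grid_points, file, and outfile must already be defined in the session.
source(here::here("R", "process_cmip6.R"))
process_cmip6(x = x, vcsn_grid_points = vcsn_grid_points,
              file = file, outfile = outfile)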
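When several files need the same treatment, the call can be wrapped in a loop. The sketch below is one possible pattern, not part of the project code: make_outfile is a hypothetical helper that derives an output name from the input name, and the output directory and "rotorua" suffix are illustrative. The loop only runs if process_cmip6, x, and vcsn_grid_points already exist in the session, and it skips inputs that are missing and outputs that have already been written.

```r
# Hypothetical helper: derive an output path from an input file name.
make_outfile <- function(file, out_dir, suffix = "subset") {
  stem <- tools::file_path_sans_ext(basename(file))
  file.path(out_dir, paste0(stem, "_", suffix, ".nc"))
}

# Input files to process (replace with full paths on the shared drive).
files <- c("hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc")

# Illustrative output location; adjust to your project layout.
out_dir <- file.path(tempdir(), "cmip6_subset")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
outfiles <- make_outfile(files, out_dir, suffix = "rotorua")

# Run only when the function and its inputs are available in the session.
if (exists("process_cmip6") && exists("x") && exists("vcsn_grid_points")) {
  for (i in seq_along(files)) {
    if (!file.exists(files[i]) || file.exists(outfiles[i])) next
    process_cmip6(x = x, vcsn_grid_points = vcsn_grid_points,
                  file = files[i], outfile = outfiles[i])
  }
}
```

Skipping existing outputs makes the loop safe to rerun after an interruption, which matters when each file takes a long time to read.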