CMIP6 Processing
Currently the CMIP6 data we have is in NetCDF format, which is not directly compatible with our model input requirements. Therefore, we need to preprocess this data to extract the relevant variables and reformat them appropriately.
The full dataset is also large: more than 4 TB for all of New Zealand at a daily timestep and a 5 km × 5 km resolution. This is too large to process directly, so we need to subset the data spatially and temporally.
Metadata
We have gathered metadata for the CMIP6 datasets available on the shared drive, including information on the variables, scenarios, models, and bias correction methods used.
Further details can be found here.
File access
The CMIP6 data is stored on a shared drive (Z: drive) accessible within our institutional network. The files are organized by variable, scenario, model, region, timestep, spatial resolution, and bias correction method, e.g. “hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc”. This file name corresponds to relative humidity (hurs) for the historical scenario from the ACCESS-CM2 GCM, downscaled using the CCAM model over New Zealand at a daily timestep and 5 km spatial resolution, without bias correction.
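To illustrate the naming convention, the sketch below splits a file name into its components using base R. The helper name `parse_cmip6_filename` and the field labels are assumptions made for this example. It also assumes that every file follows the seven-part, underscore-separated pattern of the example file name, which may not hold for all files on the drive.

```r
# Hypothetical helper: split a CMIP6 file name into labelled components.
# Assumes seven underscore-separated fields, as in the example file name.
parse_cmip6_filename <- function(path) {
  stem <- sub("\\.nc$", "", basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  fields <- c("variable", "scenario", "gcm", "downscaling_model",
              "timestep", "region_resolution", "bias_correction")
  if (length(parts) != length(fields)) {
    stop("Unexpected file name structure: ", basename(path))
  }
  stats::setNames(as.list(parts), fields)
}

info <- parse_cmip6_filename("hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc")
info$variable         # "hurs"
info$bias_correction  # "raw"
```

A lookup like this can be handy for filtering the file list by variable or scenario before any data is read.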
Loading and reading data from these NetCDF files can be done using the ncdf4 package in R. However, reading large datasets can be time-consuming, so we have implemented a function which uses a grid reference system to efficiently extract only the required data points.
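As a rough illustration of why partial reads help, the sketch below uses the `start` and `count` arguments of `ncdf4::ncvar_get()` to read only a small block of a variable rather than the whole array. To stay self-contained it first writes a tiny synthetic file to a temporary location. The dimension names (`lon`, `lat`, `time`), units, and sizes are assumptions for illustration and may differ from the real CMIP6 files. This is not the implementation of our extraction function.

```r
library(ncdf4)

# Build a tiny synthetic NetCDF file so the example is self-contained.
# Dimension names, units, and sizes are illustrative assumptions.
lon  <- ncdim_def("lon", "degrees_east", seq(176, 176.2, length.out = 5))
lat  <- ncdim_def("lat", "degrees_north", seq(-38.2, -38, length.out = 4))
time <- ncdim_def("time", "days since 2000-01-01", 0:9)
hurs <- ncvar_def("hurs", "%", list(lon, lat, time), missval = -9999)

tmp <- tempfile(fileext = ".nc")
nc <- nc_create(tmp, hurs)
ncvar_put(nc, hurs, array(seq_len(5 * 4 * 10), dim = c(5, 4, 10)))
nc_close(nc)

# Read a 2 x 2 block of cells for the first 3 timesteps only.
# start gives the first index along each dimension, count the number of values.
nc <- nc_open(tmp)
block <- ncvar_get(nc, "hurs", start = c(2, 2, 1), count = c(2, 2, 3))
nc_close(nc)

dim(block)  # 2 2 3
```

Only the requested block is read from disk, so the cost scales with the size of the subset rather than the size of the file.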
This grid is provided as the VCSN agent data. It contains a grid of points at 5 km resolution covering New Zealand, with each point having a unique coordinate index. We use this grid to identify which points fall within our area of interest, and then extract only those points from the NetCDF files.
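The point-selection step can be sketched with the sf package. The example below builds a synthetic 5 km grid and a rectangular area of interest, then keeps the points that intersect it with `sf::st_filter()`. The coordinates, the use of EPSG:2193 (NZTM2000), and the column name `grid_id` are assumptions for illustration. The real VCSN agent data may use different names and a different coordinate reference system.

```r
library(sf)

# Synthetic stand-in for the VCSN grid: points on a regular 5 km lattice.
# EPSG:2193 and the column name grid_id are assumptions for this sketch.
coords <- expand.grid(x = seq(1880000, 1900000, by = 5000),
                      y = seq(5760000, 5780000, by = 5000))
coords$grid_id <- seq_len(nrow(coords))
grid_points <- st_as_sf(coords, coords = c("x", "y"), crs = 2193)

# Synthetic rectangular area of interest in the same CRS.
aoi <- st_as_sfc(st_bbox(c(xmin = 1882000, ymin = 5762000,
                           xmax = 1892000, ymax = 5772000),
                         crs = st_crs(2193)))

# Keep only the grid points that fall inside the AOI.
points_in_aoi <- st_filter(grid_points, aoi)
nrow(points_in_aoi)  # 4
```

The indices of the retained points could then be used to decide which cells to read from each NetCDF file.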
Create an area of interest
If you do not have a shapefile of your area of interest (AOI), you can easily generate one with the mapedit package.
x <- mapedit::drawFeatures()
Save the drawn features for future use:
saveRDS(x, here::here("data", "processed", "rotorua_area.rds"))
Processing function
The process_cmip6 function takes in a spatial polygon (defining the area of interest), the VCSN grid points, the path to a NetCDF file, and an output file path. It extracts the relevant data points from the NetCDF file based on the area of interest, and saves the processed data to a new NetCDF file.
| Parameter | Description |
|---|---|
| x | sf polygon |
| vcsn_grid_points | sf points of VCSN grid |
| file | path to input NetCDF file |
| outfile | path to output NetCDF file |
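# Load the processing function
source(here::here("R", "process_cmip6.R"))
# Extract the points inside the area of interest and write them to outfile
process_cmip6(x = x, vcsn_grid_points = vcsn_grid_points,
              file = file, outfile = outfile)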
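The call above handles a single file. For several files, one option is to derive an output path for each input and loop over the pairs. The sketch below only builds the paths. The helper `make_outfiles`, the `_subset` suffix, and the `out` directory are hypothetical choices for illustration. The loop itself is left commented out because it depends on `x`, `vcsn_grid_points`, and `process_cmip6()` being available in the session.

```r
# Hypothetical helper: derive an output path for each input NetCDF file.
# The "_subset" suffix and out_dir are illustrative choices.
make_outfiles <- function(files, out_dir) {
  file.path(out_dir, sub("\\.nc$", "_subset.nc", basename(files)))
}

files <- c("hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw.nc")
outfiles <- make_outfiles(files, out_dir = "out")
outfiles  # "out/hurs_historical_ACCESS-CM2_CCAM_daily_NZ5km_raw_subset.nc"

# With x and vcsn_grid_points loaded and process_cmip6() sourced,
# each file could then be processed in turn:
# for (i in seq_along(files)) {
#   process_cmip6(x = x, vcsn_grid_points = vcsn_grid_points,
#                 file = files[i], outfile = outfiles[i])
# }
```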