There are four dependencies used in this demo. The most important is the RNetCDF package. This is a low-level R interface to the file format NetCDF designed by Unidata. Other packages to interact with NetCDF in R are ncdf4 and tidync.
The rest of libraries are used as helpers.
library(RNetCDF)
library(readr)
library(dplyr)
library(glue)
This is a fragment of a real data set, adjusted to serve as a small example. The names of the columns are Darwin Core terms
dataset <- read_csv("./data/raw/dataset.csv")
#> Rows: 10 Columns: 7
#> ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): scientificName, scientificNameID
#> dbl (4): decimalLongitude, decimalLatitude, AphiaID, occurrenceStatus
#> date (1): eventDate
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset
#> # A tibble: 10 × 7
#> decimalLongitude decimalLatitude eventDate scientificName scientificNameID AphiaID occurrenceStatus
#> <dbl> <dbl> <date> <chr> <chr> <dbl> <dbl>
#> 1 -5.70 56.2 2019-06-17 Axinella infundibuliformis urn:lsid:marinespecies.org:taxname:132480 132480 1
#> 2 -5.27 51.7 2019-06-17 Axinella infundibuliformis urn:lsid:marinespecies.org:taxname:132480 132480 0
#> 3 -5.27 51.7 2019-06-17 Axinella infundibuliformis urn:lsid:marinespecies.org:taxname:132480 132480 0
#> 4 -44.2 -24.3 2019-06-17 Pachastrella monilifera urn:lsid:marinespecies.org:taxname:134078 134078 1
#> 5 -23.5 14.9 1986-08-22 Pheronema urn:lsid:marinespecies.org:taxname:132098 132098 0
#> 6 -24.6 15.0 1986-08-24 Pheronema urn:lsid:marinespecies.org:taxname:132098 132098 1
#> 7 9.62 67.5 2019-06-17 Thenea levis urn:lsid:marinespecies.org:taxname:134105 134105 0
#> 8 10.3 67.9 2019-06-17 Thenea levis urn:lsid:marinespecies.org:taxname:134105 134105 0
#> 9 9.52 67.3 2019-06-17 Thenea levis urn:lsid:marinespecies.org:taxname:134105 134105 1
#> 10 -5.74 57.1 2019-06-17 Axinella infundibuliformis urn:lsid:marinespecies.org:taxname:132480 132480 0
The variable of interest is occurrenceStatus
. It
has either presences (1) or absences (0)
It can be considered that this variable is defined by four
dimensions: Spatially, these are
decimalLongitude
and decimalLatitude
.
Timewise is the column eventDate
.
The fourth dimension that defines the variable of interest here is
the taxon, which is shown in the columns
scientificName
, scientificNameID
and
AphiaID
. The first is the scientific name of the taxa,
while scientificNameID
has the Life Science Identifier
(LSID) as an uri. The column AphiaID
is the unique
identifier for a taxon name used in the World Record of Marine
Species.
eventDate
The eventDate
must be transformed to temporal
amounts. These are seconds, days, years etc since a certain
date. There is a helper in the RNetCDF package:
uitinvcal.nc()
In this demo, the units will be
days since 1970-01-01 00:00:00
.
dataset$time <- utinvcal.nc(
unitstring = "days since 1970-01-01 00:00:00" ,
value = as.POSIXct(dataset$eventDate, tz = "UTC")
)
unique(dataset$eventDate)
#> [1] "2019-06-17" "1986-08-22" "1986-08-24"
unique(dataset$time)
#> [1] 18064 6077 6079
The key to transform a data frame to netcdf, is that the data
must be transformed into a 4D array. To do that, all
the possible combinations of the dimensions must be attached.
This will coerce NA
or empty values in the variable of
interest as there will be no value for these combinations.
The base function expand.grid()
allows to pass vectors
and create a data frame with all the possible combinations of those
vectors.
# First add an unique identifier by the combination of:
# decimaLongitude, decimalLatitude, eventDate and AphiaID
dataset <- dataset %>%
mutate(
id = glue("{decimalLongitude}-{decimalLatitude}-{time}-{AphiaID}")
)
# Extract the unique and sorted values of the 4 dimensions
lon = sort(unique(dataset$decimalLongitude))
lat = sort(unique(dataset$decimalLatitude))
time = sort(unique(dataset$time))
# Taxon will be put in a new data frame
taxon <- tibble(
aphiaid = dataset$AphiaID,
taxon_name = dataset$scientificName,
taxon_lsid = dataset$scientificNameID) %>%
distinct() %>%
arrange(aphiaid)
# Use expand.grid() to create a data frame with all the possible
# combinations of the 4 dimensions
longer <- expand.grid(lon = lon, lat = lat, time = time,
aphiaid = taxon$aphiaid,
stringsAsFactors = FALSE)
# Define unique identifier again and merge the variable occurrenceStatus
# with presences and absences
dataset <- dataset %>%
select(id, occurrenceStatus)
longer <- longer %>%
mutate(
id = glue("{lon}-{lat}-{time}-{aphiaid}")
) %>%
left_join(dataset) %>%
select(-id)
#> Joining with `by = join_by(id)`
# Save for later
write_csv(longer, "./data/derived/longer.csv")
# Inspect
longer
#> # A tibble: 1,200 × 5
#> lon lat time aphiaid occurrenceStatus
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -44.2 -24.3 6077 132098 NA
#> 2 -24.6 -24.3 6077 132098 NA
#> 3 -23.5 -24.3 6077 132098 NA
#> 4 -5.74 -24.3 6077 132098 NA
#> 5 -5.70 -24.3 6077 132098 NA
#> 6 -5.27 -24.3 6077 132098 NA
#> 7 -5.27 -24.3 6077 132098 NA
#> 8 9.52 -24.3 6077 132098 NA
#> 9 9.62 -24.3 6077 132098 NA
#> 10 10.3 -24.3 6077 132098 NA
#> # ℹ 1,190 more rows
NetCDF is designed to host array-oriented scientific
data (ref).
Transforming into an object of type array
in R will allow
the easiest way to add data into a NetCDF file.
In the demo dataset, we know that the total length of each dimension are:
These lengths will be passed to the R function array()
as a vector: c(10, 10, 3, 4)
. We will also pass the
variable of interest longer$occurrenceStatus
. However, this
variable must have as length the product of all lengths of all
dimensions: This is why we got all the combinations of the
dimensions via expand.grid()
.
# The product of the lengths of all dimensions
10 * 10 * 3 * 4
#> [1] 1200
# Is the same as the length of the variable of interest, including all
# possible combinations of the dimensions even if this coerce NA's
length(longer$occurrenceStatus)
#> [1] 1200
# Create array
array <- array(
data = longer$occurrenceStatus,
dim = c(10, 10, 3, 4)
)
Note that this won’t always be possible as adding all the possible combinations makes the variable of interest grow exponentially. In R, vectors, data frames and arrays are read into memory. It is common that your machine won’t have enough memory to handle such amount of data.
But worry not: There are workarounds. These are discussed later below in this demo.
The first step is to create a netcdf file that will
work as a placeholder to add the data. We use the R package
RNetCDF
# Create nc file
nc <- create.nc("./data/derived/foo.nc")
Each dimensions has an homonymous variable assigned along with number of attributes. This is a technical requirement. The dimensions must have the length of the unique values of each dimension.
Some extra attributes must be added to meet the CF-Convention.
In addition, the homonymous variable defining the dimensions
must have their own data. The data are passed as a 1D
vector to var.put.nc()
, specifying in which variable has to
be written.
# Define lon dimension
dim.def.nc(nc, dimname = "lon", dimlength = length(lon))
# Define lon variable
var.def.nc(nc, varname = "lon", vartype = "NC_DOUBLE", dimensions = "lon")
# Add attributes
att.put.nc(nc, variable = "lon", name = "units", type = "NC_CHAR", value = "degrees_east")
att.put.nc(nc, variable = "lon", name = "standard_name", type = "NC_CHAR", value = "longitude")
att.put.nc(nc, variable = "lon", name = "long_name", type = "NC_CHAR", value = "Longitude")
# Put data
var.put.nc(nc, variable = "lon", data = lon)
# Check
var.get.nc(nc, variable = "lon")
#> [1] -44.16500 -24.63330 -23.53330 -5.73876 -5.70009 -5.26890 -5.26569 9.52050 9.61616 10.34300
# Define lat dimension
dim.def.nc(nc, dimname = "lat", dimlength = length(lat))
# Define lat variable
var.def.nc(nc, varname = "lat", vartype = "NC_DOUBLE", dimensions = "lat")
# Add attributes
att.put.nc(nc, variable = "lat", name = "units", type = "NC_CHAR", value = "degrees_north")
att.put.nc(nc, variable = "lat", name = "standard_name", type = "NC_CHAR", value = "latitude")
att.put.nc(nc, variable = "lat", name = "long_name", type = "NC_CHAR", value = "Latitude")
# Put data
var.put.nc(nc, variable = "lat", data = lat)
# Check
var.get.nc(nc, variable = "lat")
#> [1] -24.34700 14.88330 14.95000 51.69739 51.70180 56.24024 57.11223 67.31833 67.53900 67.85716
# Define time dimension
dim.def.nc(nc, dimname = "time", dimlength = length(time))
# Define time variable
var.def.nc(nc, varname = "time", vartype = "NC_DOUBLE", dimensions = "time")
# Add attributes
att.put.nc(nc, variable = "time", name = "standard_name", type = "NC_CHAR", value = "time")
att.put.nc(nc, variable = "time", name = "long_name", type = "NC_CHAR", value = "Time")
att.put.nc(nc, variable = "time", name = "units", type = "NC_CHAR", value = "days since 1970-01-01 00:00:00")
att.put.nc(nc, variable = "time", name = "calendar", type = "NC_CHAR", value = "gregorian")
# Put data
var.put.nc(nc, variable = "time", data = time)
# Check
var.get.nc(nc, variable = "time")
#> [1] 6077 6079 18064
To define the taxon dimension, there will be one dimension called
aphiaid
, with three variables assigned:
aphiaid
: the actual aphiaid as an integertaxon_name
: the scientific name of the taxataxon_lsid
: the LSID (unique identifier) of the
taxaThese are technical requirements both for ERDDAP and to meet the CF-Convention.
The dimension aphiaid
will be used later to define the
variable of interest. It is a better practice to use the AphiaID instead
of, for instance, the scientific name in taxon_name
,
because NetCDF works better with numeric data types than with
characters.
Note that character variables in NetCDF require to be defined also by a dimension typically called “string”. Its length is the total number of characters that can be hosted in the variable of type character. This is a requirement of NetCDF4.
# Define the aphia and string80 dimensions
dim.def.nc(nc, dimname = "aphiaid", dimlength = nrow(taxon))
dim.def.nc(nc, dimname = "string80", dimlength = 80)
# Add aphiaid variable and attributes
var.def.nc(nc, varname = "aphiaid", vartype = "NC_INT", dimensions = "aphiaid")
att.put.nc(nc, variable = "aphiaid", name = "long_name", type = "NC_CHAR", value = "Life Science Identifier - World Register of Marine Species")
att.put.nc(nc, variable = "aphiaid", name = "units", type = "NC_CHAR", value = "level")
# Put aphiaid data
var.put.nc(nc, variable = "aphiaid", data = taxon$aphiaid)
# Check
var.get.nc(nc, variable = "aphiaid")
#> [1] 132098 132480 134078 134105
# Add taxon_name variable and attributes
var.def.nc(nc, varname = "taxon_name", vartype = "NC_CHAR", dimension = c("string80", "aphiaid"))
att.put.nc(nc, variable = "taxon_name", name = "standard_name", type = "NC_CHAR", value = "biological_taxon_name")
att.put.nc(nc, variable = "taxon_name", name = "long_name", type = "NC_CHAR", value = "Scientific name of the taxa")
# Put taxon_name data
var.put.nc(nc, variable = "taxon_name", data = taxon$taxon_name)
# Check
var.get.nc(nc, variable = "taxon_name")
#> [1] "Pheronema" "Axinella infundibuliformis" "Pachastrella monilifera" "Thenea levis"
# Add taxon_lsid variable and attributes
var.def.nc(nc, varname = "taxon_lsid", vartype = "NC_CHAR", dimension = c("string80", "aphiaid"))
att.put.nc(nc, variable = "taxon_lsid", name = "standard_name", type = "NC_CHAR", value = "biological_taxon_lsid")
att.put.nc(nc, variable = "taxon_lsid", name = "long_name", type = "NC_CHAR", value = "Life Science Identifier - World Register of Marine Species")
# Put taxon_name data
var.put.nc(nc, variable = "taxon_lsid", data = taxon$taxon_name)
# Check
var.get.nc(nc, variable = "taxon_lsid")
#> [1] "Pheronema" "Axinella infundibuliformis" "Pachastrella monilifera" "Thenea levis"
A non-dimensional variable will be defined to host all the info about the Coordinate Reference System (CRS). This is useful for some GIS software.
It is assumed that the CRS is WGS84. If it was
different: transform decimalLatitude
and
decimalLongitude
to WGS84 before hand. See R package
sf
.
# Define non-dimensional crs variable
var.def.nc(nc, varname = "crs", vartype = "NC_CHAR", dimensions = NA)
# Add attributes
att.put.nc(nc, variable = "crs", name = "long_name", type = "NC_CHAR", value = "Coordinate Reference System")
att.put.nc(nc, variable = "crs", name = "geographic_crs_name", type = "NC_CHAR", value = "WGS 84")
att.put.nc(nc, variable = "crs", name = "grid_mapping_name", type = "NC_CHAR", value = "latitude_longitude")
att.put.nc(nc, variable = "crs", name = "reference_ellipsoid_name", type = "NC_CHAR", value = "WGS 84")
att.put.nc(nc, variable = "crs", name = "horizontal_datum_name", type = "NC_CHAR", value = "WGS 84")
att.put.nc(nc, variable = "crs", name = "prime_meridian_name", type = "NC_CHAR", value = "Greenwich")
att.put.nc(nc, variable = "crs", name = "longitude_of_prime_meridian", type = "NC_DOUBLE", value = 0.)
att.put.nc(nc, variable = "crs", name = "semi_major_axis", type = "NC_DOUBLE", value = 6378137.)
att.put.nc(nc, variable = "crs", name = "semi_minor_axis", type = "NC_DOUBLE", value = 6356752.314245179)
att.put.nc(nc, variable = "crs", name = "inverse_flattening", type = "NC_DOUBLE", value = 298.257223563)
att.put.nc(nc, variable = "crs", name = "spatial_ref", type = "NC_CHAR", value = 'GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]')
att.put.nc(nc, variable = "crs", name = "GeoTransform", type = "NC_CHAR", value = '-180 0.08333333333333333 0 90 0 -0.08333333333333333 ')
The variable to add the presence/absence data must be defined
by the four dimensions considered in this demo. This is passed
to var.def.nc()
in the argument dimensions
as
a vector containing the names of the dimensions.
The values stating presences and absences are 1 and 0, hence the variable must be of type integer.
Some other attributes are added. E.g. _FillValue
is a
requirement for the CF-Convention stating what value will be used in
case of NULL or NA. This is typically -99999
. The attribute
long_name
is free text and it describes the variable.
# Create the presence_absence variable defined by the four dimensions
var.def.nc(nc, varname = "presence_absence", vartype = "NC_INT", dimensions = c("lon", "lat", "time", "aphiaid"))
# Add attributes
att.put.nc(nc, variable = "presence_absence", name = "_FillValue", type = "NC_INT", value = -99999)
att.put.nc(nc, variable = "presence_absence", name = "long_name", type = "NC_CHAR", value = "Probability of occurrence of biological entity")
If there were a standard name for such a variable in the CF-Convention, this would be add to
an attribute named standard_name
.
As we previously created a 4D array, we can pass this directly to
var.put.nc()
var.put.nc(nc, variable = "presence_absence", data = array)
Adding the data from a data set or a vector is also possible by using
the parameters start
and count
in
var.put.nc()
** In start
, each number
indicates the index from which you start adding data
for each variable. ** In count
, each number indicates
the length of the data added for each variable.
In this demo, as we have a 4D variable, the vectors will be of length
4. We can add all the data at once by setting each start index
as 1, and count as the length of each dimension. Remember the
length of lon is 10, lat is 10, time is 3 andtaxon is 4.
start
is set as c(1, 1, 1, 1)
as we add all
the date from the beginning of each dimension.
var.put.nc(nc, variable = "presence_absence",
data = longer$occurrenceStatus,
start = c(1, 1, 1, 1),
count = c(10, 10, 3, 4))
However, out in the wild the data product will become very large when adding all possible combinations of the dimensions. See how the demo data set of only 10 rows exploded up to 1200 rows when adding all possible combinations of the four dimensions
nrow(dataset)
#> [1] 10
nrow(longer)
#> [1] 1200
Imagine what would happen with a real data set with thousands of rows!
Typically you won’t have enough memory in your
computer to create such a large array. To work around
this issue, you can add data in chunks by setting
correctly start
and count
For instance, you can loop over the data to add one taxon per iteration.
# Iterate from 1 to 4 as there are 4 possible taxons
for(i in 1:4){
taxa_i <- taxon$aphiaid[i]
data_at_taxa_i <- subset(longer, longer$aphiaid == taxa_i)
var.put.nc(nc, variable = "presence_absence", data = data_at_taxa_i$occurrenceStatus, start = c(1, 1, 1, i), count = c(10, 10, 3, 1))
}
Note that the last element of start is set as i
, which
is the length of the dimension aphiaid
: 1 to 4
The last element of count
corresponds to the count of
aphiaid.
It is set as 1 because in each iteration only the
data of one taxon is put in the variable.
Furthermore, you can nest loops as much as you want to add the slice of data that you can handle at once. Below there is an extreme example of a four times nested loop, in which in each iteration only one data is added to the NetCDF file.
Remember to use the length of each dimension in the
for()
loop. Note that count
is set as
c(1, 1, 1, 1)
as only one data point is added in each
iteration. The argument start
contains the index of each
loop.
for(loni in 1:10){
for(lati in 1:10){
for(timei in 1:3){
for(taxai in 1:4){
# Read only one row per iteration
data_at_i <- longer %>%
filter(
aphiaid == taxon$aphiaid[taxai],
time == time[timei],
lat == lat[lati],
lon == lon[loni]
)
# Put the data into
var.put.nc(nc, variable = "presence_absence",
data = data_at_i$occurrenceStatus,
start = c(loni, lati, timei, taxai),
count = c(1, 1, 1, 1))
}
}
}
}
Note this is an extreme example and its performance is not optimal.
These loops are specially useful to turn large amounts of data into NetCDF as you can have your data saved into disk and iterate to read a certain part of it and put into netcdf at a time.
In next and last example, it is assumed that the data is too large to be read into memory. In this case, we can write a loop that reads a chunk of data that we know that R can handle, inmediately put it in the NetCDF file, and once it is saved there as NetCDF simply remove from memory.
You can read chunks of data from a .csv
file either with
readr::read_csv_chunked
or
sqldf::read.csv.sql
. We will use the first in this example.
Remember we saved the longer
data frame into
./data/derived/longer.csv
.
for(i in 1:4){
# Identify the taxa of interest based on the i index
taxa_i <- taxon$aphiaid[i]
# Read a chunk of data, subsetting to only the taxa of interest in each iteration
data_at_taxa_i <- read_csv_chunked(
file = './data/derived/longer.csv',
callback = DataFrameCallback$new(
function(x, pos) subset(x, aphiaid == taxa_i)
)
)
# Put the data into the NetCDF file
var.put.nc(nc, variable = "presence_absence",
data = data_at_taxa_i$occurrenceStatus,
start = c(1, 1, 1, i),
count = c(10, 10, 3, 1))
# Inmediately remove the dataset read in the iteration to free up memory
rm(data_at_taxa_i)
}
NetCDF files can host metadata in the form of global attributes. These contain information such as the author, data of creation, affiliation, citation or license.
See below an example of how defining a number of global attributes and adding them programatically.
Click on code
to unfold
attributes <- list(
title = "Example title",
summary = "This is the result of a demo to show how to turn a variable defined by four dimensions into a netcdf array.",
Conventions = "CF-1.8",
# id = "",
naming_authority = "emodnet-biology.eu",
history = "https://github.com/EMODnet/EMODnet-Biology-products-erddap-demo",
source = "https://github.com/EMODnet/EMODnet-Biology-products-erddap-demo",
# processing_level = "",
# comment = "",
# acknowledgment = "",
license = "CC-BY",
standard_name_vocabulary = "CF Standard Name Table v1.8",
date_created = as.character(Sys.Date()),
creator_name = "Salvador Fernandez",
creator_email = "salvador.fernandez@vliz.be",
creator_url = "www.vliz.be",
institution = "Flanders Marine Institute (VLIZ)",
project = "EMODnet-Biology",
publisher_name = "EMODnet-Biology",
publisher_email = "bio@emodnet.eu",
publisher_url = "www.emodnet-biology.eu",
# geospatial_bounds = "",
# geospatial_bounds_crs = "",
# geospatial_bounds_vertical_crs = "",
geospatial_lat_min = min(lat),
geospatial_lat_max = max(lat),
geospatial_lon_min = min(lon),
geospatial_lon_max = max(lon),
# geospatial_vertical_min = "",
# geospatial_vertical_max = "",
# geospatial_vertical_positive = "",
# time_coverage_start = "1911",
# time_coverage_end = "2016",
# time_coverage_duration = "",
# time_coverage_resolution = "",
# uuid = "",
# sea_name = "",
# creator_type = "",
creator_institution = "Flanders Marine Institute (VLIZ)",
# publisher_type = "",
publisher_institution = "Flanders Marine Institute (VLIZ)",
# program = "",
# contributor_name = "",
# contributor_role = "",
geospatial_lat_units = "degrees_north",
geospatial_lon_units = "degrees_east",
# geospatial_vertical_units = "",
# date_modified = "",
# date_issued = "",
# date_metadata_modified = "",
# product_version = "",
# keywords_vocabulary = "",
# platform = "",
# platform_vocabulary = "",
# instrument = "",
# instrument_vocabulary = "",
# featureType = "Point",
# metadata_link = "",
# references = "",
comment = "Uses attributes recommended by http://cfconventions.org",
license = "CC-BY",
publisher_name = "EMODnet Biology Data Management Team",
citation = "Fernández-Bejarano, Salvador. 2022. Create a EMODnet-Biology data product as NetCDF.",
acknowledgement = "European Marine Observation Data Network (EMODnet) Biology project (EMFF/2019/1.3.1.9/Lot 6/SI2.837974), funded by the European Union under Regulation (EU) No 508/2014 of the European Parliament and of the Council of 15 May 2014 on the European Maritime and Fisheries Fund"
)
# Define function that detects if the data type should be character of
# integer and add to global attributes
add_global_attributes <- function(nc, attributes){
stopifnot(is.list(attributes))
for(i in 1:length(attributes)){
if(is.character(attributes[[i]])){
type <- "NC_CHAR"
}else if(is.numeric(attributes[[i]])){
type <- "NC_DOUBLE"
}
att.put.nc(nc, variable = "NC_GLOBAL", name = names(attributes[i]), type = type, value = attributes[[i]])
}
sync.nc(nc)
}
# Add attributes
add_global_attributes(nc, attributes)
Congratulations! The NetCDF file containing the product has been created. Now you can inspect the file.
sync.nc(nc)
print.nc(nc)
#> netcdf classic {
#> dimensions:
#> lon = 10 ;
#> lat = 10 ;
#> time = 3 ;
#> aphiaid = 4 ;
#> string80 = 80 ;
#> variables:
#> NC_DOUBLE lon(lon) ;
#> NC_CHAR lon:units = "degrees_east" ;
#> NC_CHAR lon:standard_name = "longitude" ;
#> NC_CHAR lon:long_name = "Longitude" ;
#> NC_DOUBLE lat(lat) ;
#> NC_CHAR lat:units = "degrees_north" ;
#> NC_CHAR lat:standard_name = "latitude" ;
#> NC_CHAR lat:long_name = "Latitude" ;
#> NC_DOUBLE time(time) ;
#> NC_CHAR time:standard_name = "time" ;
#> NC_CHAR time:long_name = "Time" ;
#> NC_CHAR time:units = "days since 1970-01-01 00:00:00" ;
#> NC_CHAR time:calendar = "gregorian" ;
#> NC_INT aphiaid(aphiaid) ;
#> NC_CHAR aphiaid:long_name = "Life Science Identifier - World Register of Marine Species" ;
#> NC_CHAR aphiaid:units = "level" ;
#> NC_CHAR taxon_name(string80, aphiaid) ;
#> NC_CHAR taxon_name:standard_name = "biological_taxon_name" ;
#> NC_CHAR taxon_name:long_name = "Scientific name of the taxa" ;
#> NC_CHAR taxon_lsid(string80, aphiaid) ;
#> NC_CHAR taxon_lsid:standard_name = "biological_taxon_lsid" ;
#> NC_CHAR taxon_lsid:long_name = "Life Science Identifier - World Register of Marine Species" ;
#> NC_CHAR crs ;
#> NC_CHAR crs:long_name = "Coordinate Reference System" ;
#> NC_CHAR crs:geographic_crs_name = "WGS 84" ;
#> NC_CHAR crs:grid_mapping_name = "latitude_longitude" ;
#> NC_CHAR crs:reference_ellipsoid_name = "WGS 84" ;
#> NC_CHAR crs:horizontal_datum_name = "WGS 84" ;
#> NC_CHAR crs:prime_meridian_name = "Greenwich" ;
#> NC_DOUBLE crs:longitude_of_prime_meridian = 0 ;
#> NC_DOUBLE crs:semi_major_axis = 6378137 ;
#> NC_DOUBLE crs:semi_minor_axis = 6356752.31424518 ;
#> NC_DOUBLE crs:inverse_flattening = 298.257223563 ;
#> NC_CHAR crs:spatial_ref = "GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]" ;
#> NC_CHAR crs:GeoTransform = "-180 0.08333333333333333 0 90 0 -0.08333333333333333 " ;
#> NC_INT presence_absence(lon, lat, time, aphiaid) ;
#> NC_INT presence_absence:_FillValue = -99999 ;
#> NC_CHAR presence_absence:long_name = "Probability of occurrence of biological entity" ;
#>
#> // global attributes:
#> NC_CHAR :title = "Example title" ;
#> NC_CHAR :summary = "This is the result of a demo to show how to turn a variable defined by four dimensions into a netcdf array." ;
#> NC_CHAR :Conventions = "CF-1.8" ;
#> NC_CHAR :naming_authority = "emodnet-biology.eu" ;
#> NC_CHAR :history = "https://github.com/EMODnet/EMODnet-Biology-products-erddap-demo" ;
#> NC_CHAR :source = "https://github.com/EMODnet/EMODnet-Biology-products-erddap-demo" ;
#> NC_CHAR :license = "CC-BY" ;
#> NC_CHAR :standard_name_vocabulary = "CF Standard Name Table v1.8" ;
#> NC_CHAR :date_created = "2024-08-27" ;
#> NC_CHAR :creator_name = "Salvador Fernandez" ;
#> NC_CHAR :creator_email = "salvador.fernandez@vliz.be" ;
#> NC_CHAR :creator_url = "www.vliz.be" ;
#> NC_CHAR :institution = "Flanders Marine Institute (VLIZ)" ;
#> NC_CHAR :project = "EMODnet-Biology" ;
#> NC_CHAR :publisher_name = "EMODnet Biology Data Management Team" ;
#> NC_CHAR :publisher_email = "bio@emodnet.eu" ;
#> NC_CHAR :publisher_url = "www.emodnet-biology.eu" ;
#> NC_DOUBLE :geospatial_lat_min = -24.347 ;
#> NC_DOUBLE :geospatial_lat_max = 67.85716 ;
#> NC_DOUBLE :geospatial_lon_min = -44.165 ;
#> NC_DOUBLE :geospatial_lon_max = 10.343 ;
#> NC_CHAR :creator_institution = "Flanders Marine Institute (VLIZ)" ;
#> NC_CHAR :publisher_institution = "Flanders Marine Institute (VLIZ)" ;
#> NC_CHAR :geospatial_lat_units = "degrees_north" ;
#> NC_CHAR :geospatial_lon_units = "degrees_east" ;
#> NC_CHAR :comment = "Uses attributes recommended by http://cfconventions.org" ;
#> NC_CHAR :citation = "Fernández-Bejarano, Salvador. 2022. Create a EMODnet-Biology data product as NetCDF." ;
#> NC_CHAR :acknowledgement = "European Marine Observation Data Network (EMODnet) Biology project (EMFF/2019/1.3.1.9/Lot 6/SI2.837974), funded by the European Union under Regulation (EU) No 508/2014 of the European Parliament and of the Council of 15 May 2014 on the European Maritime and Fisheries Fund" ;
#> }
You can read the variables you have created with
var.get.nc
# Get variable, turn array into vector and get unique values
unique(c(var.get.nc(nc, variable = "presence_absence")))
#> [1] NA 0 1
If everything is correct, close the file. Remember it was saved into
./data/derived/foo.nc
close.nc(nc)