Functions for working with station x taxon data.

getDensity takes a subset of density data, possibly at higher taxonomic level.

getProportion estimates proportional values from (a subset of) density data.

addAbsences adds taxon absences (zero values) to density data in long format.

Usage

getDensity(descriptor, taxon, value, averageOver = NULL, 
        taxonomy = NULL, subset, wide.output = FALSE, 
        full.output = FALSE, verbose=FALSE)

getProportion(descriptor, taxon, value, averageOver = NULL, 
        taxonomy = NULL, verbose=FALSE)

addAbsences(descriptor, taxon, value, averageOver = NULL)

Arguments

descriptor: name(s) of the descriptor, i.e. *where* the data were taken, e.g. station names. Either a vector, a list, a data.frame, or a matrix (possibly with multiple columns). It can be numeric, character, or a factor. When a data.frame or a list, the "names" will be used in the output; when a vector, the argument name will be used in the output. In theory, this could also be a single number or NA; however, care needs to be taken when this is combined with subset and averageOver.
taxon: vector describing *what* the data are; it gives the taxonomic name (e.g. species). Should be of the same length as (the number of rows of) descriptor. Can be a list (or data.frame with one column), or a vector. When a data.frame or a list the "name" will be used in the output; when a vector, the argument name will be used.
value: vector or list that contains the *values* of the data, usually density. Should be of the same length as (the number of rows of) descriptor and taxon. For function getDensity, value can also be a multi-column data.frame or matrix.
averageOver: vector with *replicates* over which averages need to be taken. Should be of the same length as (the number of rows of) descriptor.
subset: logical expression indicating elements to keep; missing values are taken as FALSE. If NULL or absent, all elements are used. Note that the subset is taken *after* the number of samples to average per descriptor is calculated, so it can also be used to select taxa that are not present in all replicates over which the average is taken.
taxonomy: taxonomic information; first column will be matched with taxon, regardless of its name.
full.output: when TRUE, will also return descriptors for which the value is 0. This can be relevant in case a selection is made for taxonomic composition. Note that the taxon in this case will be undefined.
wide.output: when TRUE, will recast the output in wide format (the default is long format). This only makes sense when value has multiple columns. In the wide format, each row holds the taxa for a descriptor, and each column holds the values for one taxon and all descriptors.
verbose: when TRUE, may write warnings to the screen.
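
The note on subset above says that replicates are counted before the subset is applied. The following base R sketch illustrates why that order matters. It does not use the package and is not its implementation; the object names (dat, n_rep, sel) are made up for the illustration, and the data mirror Bdata.rep from the Examples.

```r
# Illustrative data, same layout as Bdata.rep in the Examples
dat <- data.frame(
  station   = c("st.a", "st.a", "st.a", "st.b", "st.b", "st.b"),
  replicate = c(1, 1, 2, 1, 1, 1),
  species   = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density   = c(1, 2, 3, 3, 1, 3)
)

# Step 1: count the replicates per station on the FULL data
n_rep <- tapply(dat$replicate, dat$station,
                function(x) length(unique(x)))

# Step 2: only then select the taxon of interest
sel <- dat[dat$species == "sp.2", ]

# Step 3: sum per station, divide by the replicate count from step 1
summed     <- tapply(sel$density, sel$station, sum)
mean_after <- summed / n_rep[names(summed)]

# For contrast: counting replicates AFTER subsetting misses replicate 2,
# in which sp.2 was absent, and so overestimates the mean
n_rep_wrong <- tapply(sel$replicate, sel$station,
                      function(x) length(unique(x)))
mean_wrong  <- summed / n_rep_wrong[names(summed)]

as.numeric(mean_after["st.a"])  # 1
as.numeric(mean_wrong["st.a"])  # 2
```

The first result agrees with the value shown for sp.2 at st.a in the Examples.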

Value

getDensity returns a data.frame combining descriptor, taxon, and value, in LONG format.

Absences (i.e. where value = 0) are only included when argument full.output is TRUE, or when they were already present in the input.

The value consists of summed values for (descriptor, averageOver, taxon) combinations, followed by averaging over averageOver.
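
To make the notion of absences concrete, here is a minimal base R sketch of how a long-format table can be completed with zeros. It is only an illustration under the assumption that every descriptor x taxon combination without a record counts as an absence; it is not the code used by addAbsences or by full.output, and the names long, grid and full are hypothetical.

```r
# Long-format input: only the observed (station, taxon) pairs
long <- data.frame(
  station = c("st.a", "st.a", "st.b"),
  taxon   = c("sp.1", "sp.2", "sp.3"),
  value   = c(2, 1, 3)
)

# All station x taxon combinations
grid <- expand.grid(station = unique(long$station),
                    taxon   = unique(long$taxon),
                    stringsAsFactors = FALSE)

# Left join: keep every combination, attach the observed values
full <- merge(grid, long, by = c("station", "taxon"), all.x = TRUE)

# Combinations without an observation are absences
full$value[is.na(full$value)] <- 0
full
```

With 2 stations and 3 taxa, the completed table has 6 rows, 3 of which are absences.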

Author

Karline Soetaert <karline.soetaert@nioz.nl>, Olivier Beauchard

Details

getDensity. In this function, the values are first summed for each taxon x descriptor combination, and this sum is then divided by the number of replicates (as given in averageOver) per descriptor.

This is suitable for density and biomass data, but NOT for mean individual weight for instance.
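
The sum-then-divide rule can be sketched in base R as follows. This is an illustration of the description above, not the package's own code; the names dat, summed and n_rep are made up, and the data are the st.a rows of Bdata.rep from the Examples.

```r
# Station st.a: sp.1 occurs in both replicates, sp.2 only in replicate 1
dat <- data.frame(
  station   = c("st.a", "st.a", "st.a"),
  replicate = c(1, 1, 2),
  species   = c("sp.1", "sp.2", "sp.1"),
  density   = c(1, 2, 3)
)

# Step 1: sum the values per (station, species)
summed <- aggregate(density ~ station + species, data = dat, FUN = sum)

# Step 2: number of distinct replicates per station
n_rep <- tapply(dat$replicate, dat$station,
                function(x) length(unique(x)))

# Step 3: divide each sum by the replicate count of its station
summed$mean_density <- summed$density /
  as.vector(n_rep[as.character(summed$station)])
summed
```

Here sp.2 gets (2 + 0) / 2 = 1: the replicate in which it was not found contributes a zero. That is presumably why the approach suits density and biomass but not a quantity such as mean individual weight, for which a missing record does not mean a value of 0.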

Examples


## ====================================================
## A small dataset with replicates
## ====================================================

# 2 stations, 2 replicates for st.a, one replicate for st.b
Bdata.rep <- data.frame(
  station   = c("st.a","st.a","st.a","st.b","st.b","st.b"),
  replicate = c(     1,     1,    2,     1,     1,     1),
  species   = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density   = c(     1,     2,    3,     3,     1,     3)
)
Bdata.rep
#>   station replicate species density
#> 1    st.a         1    sp.1       1
#> 2    st.a         1    sp.2       2
#> 3    st.a         2    sp.1       3
#> 4    st.b         1    sp.3       3
#> 5    st.b         1    sp.4       1
#> 6    st.b         1    sp.5       3

# Proportion of each species
with (Bdata.rep, 
  getProportion (value       = density, 
                 descriptor  = data.frame(station, replicate), 
                 taxon       = species))
#>   station replicate taxon         p
#> 1    st.a         1  sp.1 0.3333333
#> 2    st.a         1  sp.2 0.6666667
#> 3    st.a         2  sp.1 1.0000000
#> 4    st.b         1  sp.3 0.4285714
#> 5    st.b         1  sp.4 0.1428571
#> 6    st.b         1  sp.5 0.4285714


##-----------------------------------------------------
## average of replicates
##-----------------------------------------------------

with (Bdata.rep, 
  getDensity (value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate))
#>   descriptor taxon value
#> 1       st.a  sp.1     2
#> 2       st.a  sp.2     1
#> 3       st.b  sp.3     3
#> 4       st.b  sp.4     1
#> 5       st.b  sp.5     3

# using named lists to have good column headings
with (Bdata.rep, 
  getDensity (value       = list(density=density), 
              descriptor  = list(station=station), 
              taxon       = species, 
              averageOver = replicate))
#>   station taxon density
#> 1    st.a  sp.1       2
#> 2    st.a  sp.2       1
#> 3    st.b  sp.3       3
#> 4    st.b  sp.4       1
#> 5    st.b  sp.5       3
                 
# estimating proportions
with (Bdata.rep, 
  getProportion (value    = list(density=density), 
              descriptor  = list(station=station), 
              taxon       = species, 
              averageOver = replicate))
#>   station taxon         p
#> 1    st.a  sp.1 0.6666667
#> 2    st.a  sp.2 0.3333333
#> 3    st.b  sp.3 0.4285714
#> 4    st.b  sp.4 0.1428571
#> 5    st.b  sp.5 0.4285714
                 
# averaging multiple value columns at once 
# extending the data with biomass - assume no biomass data for st.b

Bdata.rep$biomass = c(0.1, 0.2, 0.3, NA, NA, NA)
Bdata.rep
#>   station replicate species density biomass
#> 1    st.a         1    sp.1       1     0.1
#> 2    st.a         1    sp.2       2     0.2
#> 3    st.a         2    sp.1       3     0.3
#> 4    st.b         1    sp.3       3      NA
#> 5    st.b         1    sp.4       1      NA
#> 6    st.b         1    sp.5       3      NA

with (Bdata.rep, 
  getDensity (value       = list(density=density, biomass=biomass), 
              descriptor  = list(station=station), 
              taxon       = species, 
              averageOver = replicate))
#>   station taxon density biomass
#> 1    st.a  sp.1       2     0.2
#> 2    st.a  sp.2       1     0.1
#> 3    st.b  sp.3       3      NA
#> 4    st.b  sp.4       1      NA
#> 5    st.b  sp.5       3      NA

##-----------------------------------------------------
## Select information for one species
##-----------------------------------------------------

with (Bdata.rep, 
  getDensity (subset      = species=="sp.2",
              value       = density, 
              descriptor  = list(station=station), 
              taxon       = species, 
              averageOver = replicate))
#>   station taxon value
#> 1    st.a  sp.2     1

# also return the 0 value for stations where the species is absent
with (Bdata.rep, 
  getDensity (subset      = species=="sp.2",
              value       = density, 
              descriptor  = list(station=station), 
              taxon       = species, 
              averageOver = replicate, 
              full.output = TRUE))
#>   station taxon value
#> 1    st.a  sp.2     1
#> 2    st.b  sp.2     0

##-----------------------------------------------------
## Extend the long format with absences
##-----------------------------------------------------

# keep replicates (replicate is part of the descriptor)
with(Bdata.rep, 
  addAbsences (value      = density, 
               descriptor = cbind(station, replicate), 
               taxon      = species))
#>    station replicate taxon value
#> 1     st.a         1  sp.1     1
#> 2     st.b         1  sp.1     0
#> 3     st.a         2  sp.1     3
#> 4     st.a         1  sp.2     2
#> 5     st.b         1  sp.2     0
#> 6     st.a         2  sp.2     0
#> 7     st.a         1  sp.3     0
#> 8     st.b         1  sp.3     3
#> 9     st.a         2  sp.3     0
#> 10    st.a         1  sp.4     0
#> 11    st.b         1  sp.4     1
#> 12    st.a         2  sp.4     0
#> 13    st.a         1  sp.5     0
#> 14    st.b         1  sp.5     3
#> 15    st.a         2  sp.5     0

# take averages over replicates
with(Bdata.rep, 
  addAbsences (value       = density, 
               descriptor  = station, 
               taxon       = species, 
               averageOver = replicate))
#>    descriptor taxon value
#> 1        st.a  sp.1     2
#> 2        st.b  sp.1     0
#> 3        st.a  sp.2     1
#> 4        st.b  sp.2     0
#> 5        st.a  sp.3     0
#> 6        st.b  sp.3     3
#> 7        st.a  sp.4     0
#> 8        st.b  sp.4     1
#> 9        st.a  sp.5     0
#> 10       st.b  sp.5     3

## ====================================================
## A small dataset without replicates
## ====================================================

Bdata <- data.frame(
  station = c("st.a","st.a","st.b","st.b","st.b","st.c"),
  species = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

## ====================================================
## Small dataset: taxonomy
## ====================================================

Btaxonomy <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4"),
  family  = c( "f.1", "f.1", "f.1", "f.1", "f.2", "f.3"),
  order   = c( "o.1", "o.1", "o.1", "o.1", "o.2", "o.2"),
  class   = c( "c.1", "c.1", "c.1", "c.1", "c.1", "c.1")
  )

##-----------------------------------------------------
## density on higher taxonomic level
##-----------------------------------------------------

# species density for a particular genus 
sp.g2 <- with (Bdata, 
  getDensity(descriptor = station, 
             taxon      = species,
             value      = density,
             taxonomy   = Btaxonomy,
             subset     = genus == "g.2")
              )

# select data for station st.a
Bselect <- with (Bdata, 
   getDensity (value      = density, 
               descriptor = station, 
               taxon      = species, 
               subset     = station=="st.a")
                )
Bselect
#>   descriptor taxon value
#> 1       st.a  sp.1     1
#> 2       st.a  sp.2     2

# pass taxonomy to select only species that belong to g.1
with (Bdata, 
   getDensity (value      = density, 
               descriptor = station, 
               taxon      = species, 
               taxonomy   = Btaxonomy, 
               subset     = genus=="g.1"))
#>   descriptor taxon value
#> 1       st.a  sp.1     1
#> 2       st.b  sp.1     3

## ====================================================
## North Sea dataset
## ====================================================

##-----------------------------------------------------
## Occurrence of Abra alba, averaged per station
##-----------------------------------------------------

Abra_alba <- with(MWTL$density, 
   getDensity(subset      = taxon=="Abra alba",
              descriptor  = station,
              averageOver = year,
              taxon       = taxon,
              value       = density))
head(Abra_alba)
#>   descriptor     taxon      value
#> 1  BREEVTN02 Abra alba 55.6795892
#> 2  BREEVTN04 Abra alba  0.6747638
#> 3  BREEVTN07 Abra alba  0.7684211
#> 4  BREEVTN08 Abra alba  3.5663961
#> 5  BREEVTN09 Abra alba  0.6747632
#> 6  BREEVTN10 Abra alba  0.6747632

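# Mean density of each Abra species over all stations.
# This is done in two steps: first get the density per station,
# including absences (full.output = TRUE), then average per taxon.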

Abra <- with(MWTL$density, 
   getDensity(subset      = genus=="Abra",
              descriptor  = station,
              averageOver = year,
              taxon       = taxon,
              taxonomy    = Taxonomy,
              value       = density, 
              full.output = TRUE))
head(Abra)
#>   descriptor           taxon     value
#> 1  BREEVTN02       Abra alba 55.679589
#> 2  BREEVTN02     Abra nitida  1.349526
#> 3  BREEVTN02 Abra prismatica  0.000000
#> 4  BREEVTN02     Abra tenuis  0.000000
#> 5  BREEVTN03       Abra alba  0.000000
#> 6  BREEVTN03     Abra nitida  0.000000

tapply(Abra$value, INDEX=list(Abra$taxon), FUN=mean)
#>       Abra alba     Abra nitida Abra prismatica     Abra tenuis 
#>      22.0298979       1.1496596       0.9196846       0.0196533