get_density takes a subset of density data, possibly at a higher taxonomic level.

get_proportion estimates proportional values from (a subset of) density data.

add_absences adds taxon absences to density data in long format.

get_density(data, descriptor, taxon, value, averageOver, 
            taxonomy = NULL, subset, wide.output = FALSE, 
            full.output = FALSE, verbose=FALSE)

get_proportion(data, descriptor, taxon, value, averageOver, 
            taxonomy = NULL, verbose=FALSE)

add_absences(data, descriptor, taxon, value, averageOver)

Arguments

data

data.frame to use for extracting the arguments descriptor, taxon, value, averageOver. Can be missing.

descriptor

variable(s) describing *where* the data were taken, e.g. sampling stations. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. In theory, descriptor can also be one number, NA or missing; however, care needs to be taken when this is combined with subset and averageOver.

taxon

variable describing *what* the data are; it gives the taxonomic name (e.g. species). If data is not missing: one column from data. If data is missing: a list (or data.frame with one column), or a vector. When a data.frame or a list is passed, its "name" will be used in the output; when a vector is passed, the argument name will be used.

value

variable that contains the *values* of the data, usually density. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). It should have the same length (or the same number of rows) as descriptor and taxon. Should contain numerical values. Should always be present.

averageOver

*replicates* over which averages need to be taken. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. Else a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. Can be absent.

subset

logical expression indicating elements to keep: missing values are taken as FALSE. If NULL, or absent, then all elements are used. Note that the subset is taken *after* the number of samples to average per descriptor is calculated, so this also works for selecting taxa that are not present in all replicates over which the average is taken.

taxonomy

taxonomic information; first column will be matched with taxon.

full.output

when TRUE, will also return descriptors for which the value is 0. This can be relevant in case a selection is made for taxonomic composition. Note that the taxon in this case will be undefined.

wide.output

when TRUE, will recast the output in wide format (the default is long format). This only makes sense when value has multiple columns. In the wide format, each row holds the taxa for a descriptor, and each column holds the values for one taxon and all descriptors.

verbose

when TRUE, may write warnings to the screen.
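
The note under subset, that the number of replicates is counted before the subset is applied, can be illustrated with base R alone. The sketch below is an assumption about the arithmetic, not the package's implementation; the data frame d and the names n_rep, sel, total and avg are made up for illustration.

```r
# Illustrative data: st.a has two replicates, sp.2 occurs only in replicate 1
d <- data.frame(
  station   = c("st.a", "st.a", "st.a"),
  replicate = c(1, 1, 2),
  species   = c("sp.1", "sp.2", "sp.1"),
  density   = c(1, 2, 3)
)

# Step 1: count the replicates per station on the FULL data
n_rep <- tapply(d$replicate, d$station, function(x) length(unique(x)))

# Step 2: only then select the taxon of interest
sel <- d[d$species == "sp.2", ]

# Step 3: sum per station and divide by the replicate count from step 1
total <- tapply(sel$density, sel$station, sum)
avg   <- total / n_rep[names(total)]
avg   # st.a: 2 / 2 = 1
```

Had the replicates been counted after subsetting, only replicate 1 would remain for sp.2 and the result would be 2 / 1 = 2, which overestimates the mean density.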

Value

  • get_density returns a data.frame combining descriptor, taxon, and value, in LONG format. The value consists of summed values for (descriptor, averageOver, taxon) combinations, followed by averaging over averageOver.

    Absences (i.e. where value = 0) are not included, unless argument full.output is TRUE or they were already present in the input.

  • get_proportion returns a data.frame combining descriptor, taxon, and the proportional value (column p), in LONG format.

Depending on whether argument data is passed or not, the output columns may be labelled differently:

  • if data is passed: the original names in data will be kept

  • if data is not passed: the names will only be kept if explicitly passed.
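
The proportional value can be read as each taxon's share of the total value within one descriptor combination. The base R sketch below rests on that assumption and does not call get_proportion; the data frame d is a copy of the small replicate dataset used in the Examples, and ave is used as a stand-in for whatever the package does internally.

```r
# Copy of the small dataset with replicates from the Examples
d <- data.frame(
  station   = c("st.a", "st.a", "st.a", "st.b", "st.b", "st.b"),
  replicate = c(1, 1, 2, 1, 1, 1),
  species   = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density   = c(1, 2, 3, 3, 1, 3)
)

# ave() applies FUN within each (station, replicate) group and returns
# a vector aligned with the rows of d
d$p <- ave(d$density, d$station, d$replicate,
           FUN = function(x) x / sum(x))

d[, c("station", "replicate", "species", "p")]
```

The resulting p values (1/3, 2/3, 1, 3/7, 1/7, 3/7) agree with the get_proportion output shown in the Examples.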

Author

Karline Soetaert <karline.soetaert@nioz.nl>, Olivier Beauchard

See also

MWTL for the data sets.

map_key for simple plotting functions.

long2wide for converting data from long to wide format and vice versa.

get_summary for estimating summaries from density data.

get_trait_density for functions combining density and traits.

extend_trait for functions working with traits.

get_Db_index for extracting bioturbation and bioirrigation indices.

Details

  • get_density. In this function a summed value over taxa x descriptor is first calculated, and this value is then divided by the number of replicates (as in averageOver) per descriptor.

    This is suitable for density and biomass data, but NOT for mean individual weight for instance (which cannot be summed).
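
The two steps described above (sum per taxon and descriptor, then divide by the number of replicates per descriptor) can be sketched in base R. This is a hedged illustration of the arithmetic, not the package's code; d, n_rep and sums are names chosen here for the example.

```r
# Copy of the small dataset with replicates from the Examples
d <- data.frame(
  station   = c("st.a", "st.a", "st.a", "st.b", "st.b", "st.b"),
  replicate = c(1, 1, 2, 1, 1, 1),
  species   = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density   = c(1, 2, 3, 3, 1, 3)
)

# Number of distinct replicates per station (st.a: 2, st.b: 1)
n_rep <- tapply(d$replicate, d$station, function(x) length(unique(x)))

# Step 1: sum the value per (station, species) combination
sums <- aggregate(density ~ station + species, data = d, FUN = sum)

# Step 2: divide each sum by the replicate count of its station
sums$density <- sums$density / as.vector(n_rep[as.character(sums$station)])

sums[order(sums$station, sums$species), ]
```

For st.a this gives (1 + 3) / 2 = 2 for sp.1 and 2 / 2 = 1 for sp.2. A taxon missing from a replicate therefore counts as 0 in the average, which is why the approach suits density and biomass but not quantities such as mean individual weight.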

Examples


## ====================================================
## A small dataset with replicates
## ====================================================

# 2 stations, 2 replicates for st.a, one replicate for st.b
Bdata.rep <- data.frame(
  station   = c("st.a","st.a","st.a","st.b","st.b","st.b"),
  replicate = c(     1,     1,    2,     1,     1,     1),
  species   = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density   = c(     1,     2,    3,     3,     1,     3)
)
Bdata.rep
#>   station replicate species density
#> 1    st.a         1    sp.1       1
#> 2    st.a         1    sp.2       2
#> 3    st.a         2    sp.1       3
#> 4    st.b         1    sp.3       3
#> 5    st.b         1    sp.4       1
#> 6    st.b         1    sp.5       3

## ====================================================
## Proportion of each species
## ====================================================

# three ways to extract proportions

# -----------------------------------------------------
# 1. using input argument data 
# -----------------------------------------------------
# note: taxon is called species as in original data set
  
  get_proportion (data        = Bdata.rep,
                  value       = density, 
                  descriptor  = data.frame(station, replicate), 
                  taxon       = species)
#>   station replicate species         p
#> 1    st.a         1    sp.1 0.3333333
#> 2    st.a         1    sp.2 0.6666667
#> 3    st.a         2    sp.1 1.0000000
#> 4    st.b         1    sp.3 0.4285714
#> 5    st.b         1    sp.4 0.1428571
#> 6    st.b         1    sp.5 0.4285714

  # to force the taxon to be called taxon
  get_proportion (data        = Bdata.rep,
                  value       = density, 
                  descriptor  = data.frame(station, replicate), 
                  taxon       = data.frame(taxon = species))  # named dataframe
#>   station replicate taxon         p
#> 1    st.a         1  sp.1 0.3333333
#> 2    st.a         1  sp.2 0.6666667
#> 3    st.a         2  sp.1 1.0000000
#> 4    st.b         1  sp.3 0.4285714
#> 5    st.b         1  sp.4 0.1428571
#> 6    st.b         1  sp.5 0.4285714

# -----------------------------------------------------
# 2. using with() to create an environment 
# -----------------------------------------------------
# note: taxon is called taxon

with (Bdata.rep, 
  get_proportion (value       = density, 
                  descriptor  = data.frame(station, replicate), 
                  taxon       = species)
      )
#>   station replicate taxon         p
#> 1    st.a         1  sp.1 0.3333333
#> 2    st.a         1  sp.2 0.6666667
#> 3    st.a         2  sp.1 1.0000000
#> 4    st.b         1  sp.3 0.4285714
#> 5    st.b         1  sp.4 0.1428571
#> 6    st.b         1  sp.5 0.4285714

# force the taxon to be called "species"
with (Bdata.rep, 
  get_proportion (value       = density, 
                  descriptor  = data.frame(station, replicate), 
                  taxon       = data.frame(species = species))
      )          
#>   station replicate species         p
#> 1    st.a         1    sp.1 0.3333333
#> 2    st.a         1    sp.2 0.6666667
#> 3    st.a         2    sp.1 1.0000000
#> 4    st.b         1    sp.3 0.4285714
#> 5    st.b         1    sp.4 0.1428571
#> 6    st.b         1    sp.5 0.4285714

# -----------------------------------------------------
# 3. using pipes
# -----------------------------------------------------
  Bdata.rep |> get_proportion (value       = density, 
                               descriptor  = data.frame(station, replicate), 
                               taxon       = species)
#>   station replicate species         p
#> 1    st.a         1    sp.1 0.3333333
#> 2    st.a         1    sp.2 0.6666667
#> 3    st.a         2    sp.1 1.0000000
#> 4    st.b         1    sp.3 0.4285714
#> 5    st.b         1    sp.4 0.1428571
#> 6    st.b         1    sp.5 0.4285714


##-----------------------------------------------------
## average of replicates
##-----------------------------------------------------

  PP <-  get_density(
              data        = Bdata.rep,
              value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate)


# input arguments are kept in the attributes              
attributes(PP)[-(1:3)]
#> $dataset
#> Bdata.rep
#> 
#> $names_descriptor
#> [1] "station"
#> 
#> $names_taxon
#> [1] "species"
#> 
#> $names_value
#> [1] "density"
#> 
#> $names_averageOver
#> [1] "replicate"
#> 
#> $subset
#> [1] NA
#> 
                 
# averaging multiple value columns at once 
# extending the data with biomass - assume no biomass for st.b

Bdata.rep$biomass = c(0.1, 0.2, 0.3, NA, NA, NA)
Bdata.rep
#>   station replicate species density biomass
#> 1    st.a         1    sp.1       1     0.1
#> 2    st.a         1    sp.2       2     0.2
#> 3    st.a         2    sp.1       3     0.3
#> 4    st.b         1    sp.3       3      NA
#> 5    st.b         1    sp.4       1      NA
#> 6    st.b         1    sp.5       3      NA

  DD <-  get_density(
              data        = Bdata.rep,
              value       = data.frame(density, biomass), 
              descriptor  = data.frame(station), 
              taxon       = species, 
              averageOver = replicate)

# input arguments              
  attributes(DD)[-(1:3)]
#> $dataset
#> Bdata.rep
#> 
#> $names_descriptor
#> [1] "station"
#> 
#> $names_taxon
#> [1] "species"
#> 
#> $names_value
#> [1] "density" "biomass"
#> 
#> $names_averageOver
#> [1] "replicate"
#> 
#> $subset
#> [1] NA
#> 

##-----------------------------------------------------
## Select information for one species
##-----------------------------------------------------

  get_density(data        = Bdata.rep, 
              subset      = species=="sp.2",
              value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate)
#>   station species density
#> 1    st.a    sp.2       1

# with full.output = TRUE, the 0 value for st.b is also returned
  get_density(data        = Bdata.rep,
              subset      = species=="sp.2",
              value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate, 
              full.output = TRUE)
#>   station species density
#> 1    st.a    sp.2       1
#> 2    st.b    sp.2       0

##-----------------------------------------------------
## Extend the long format with absences
##-----------------------------------------------------

# keep replicates: replicate is part of the descriptor
Bdata.rep |>
  add_absences (value      = density, 
                descriptor = cbind(station, replicate), 
                taxon      = species)
#>    station replicate species density
#> 1     st.a         1    sp.1       1
#> 2     st.b         1    sp.1       0
#> 3     st.a         2    sp.1       3
#> 4     st.a         1    sp.2       2
#> 5     st.b         1    sp.2       0
#> 6     st.a         2    sp.2       0
#> 7     st.a         1    sp.3       0
#> 8     st.b         1    sp.3       3
#> 9     st.a         2    sp.3       0
#> 10    st.a         1    sp.4       0
#> 11    st.b         1    sp.4       1
#> 12    st.a         2    sp.4       0
#> 13    st.a         1    sp.5       0
#> 14    st.b         1    sp.5       3
#> 15    st.a         2    sp.5       0

# take averages over replicates (averageOver = replicate)
  add_absences (data        = Bdata.rep, 
                value       = density, 
                descriptor  = station, 
                taxon       = species, 
                averageOver = replicate)
#>    station species density
#> 1     st.a    sp.1       2
#> 2     st.b    sp.1       0
#> 3     st.a    sp.2       1
#> 4     st.b    sp.2       0
#> 5     st.a    sp.3       0
#> 6     st.b    sp.3       3
#> 7     st.a    sp.4       0
#> 8     st.b    sp.4       1
#> 9     st.a    sp.5       0
#> 10    st.b    sp.5       3

## ====================================================
## ====================================================
## A small dataset without replicates
## ====================================================
## ====================================================

Bdata <- data.frame(
  station = c("st.a","st.a","st.b","st.b","st.b","st.c"),
  species = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

## ====================================================
## Small dataset: taxonomy
## ====================================================

Btaxonomy <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4"),
  family  = c( "f.1", "f.1", "f.1", "f.1", "f.2", "f.3"),
  order   = c( "o.1", "o.1", "o.1", "o.1", "o.2", "o.2"),
  class   = c( "c.1", "c.1", "c.1", "c.1", "c.1", "c.1")
  )

# all input:                                 
# ** simple density

get_density(Bdata, 
            descriptor = station, 
            taxon      = species, 
            value      = density)
#>   station species density
#> 1    st.a    sp.1       1
#> 2    st.a    sp.2       2
#> 3    st.b    sp.1       3
#> 4    st.b    sp.3       3
#> 5    st.b    sp.4       1
#> 6    st.c    sp.5       3

# without taxon: 
# ** sum per station

get_density(Bdata, 
            descriptor = station, 
            value      = density)
#>   station taxon density
#> 1    st.a    NA       3
#> 2    st.b    NA       7
#> 3    st.c    NA       3

# without taxon, and averaging over species: 
# ** average per station

get_density(Bdata, 
            descriptor  = station, 
            averageOver = species,
            value       = density)
#>   station taxon  density
#> 1    st.a    NA 1.500000
#> 2    st.b    NA 2.333333
#> 3    st.c    NA 3.000000

# without descriptor: 
# ** sum per species

get_density(Bdata, 
            taxon      = species, 
            value      = density)
#>   descriptor species density
#> 1         NA    sp.1       4
#> 2         NA    sp.2       2
#> 3         NA    sp.3       3
#> 4         NA    sp.4       1
#> 5         NA    sp.5       3

# without descriptor, averaging over stations: 
# ** average per species

get_density(Bdata, 
            taxon       = species, 
            averageOver = station,
            value       = density)
#>   descriptor species   density
#> 1         NA    sp.1 1.3333333
#> 2         NA    sp.2 0.6666667
#> 3         NA    sp.3 1.0000000
#> 4         NA    sp.4 0.3333333
#> 5         NA    sp.5 1.0000000

# without descriptor and taxon: 
# ** sum of all

get_density(Bdata, 
            value       = density)
#>   descriptor taxon density
#> 1         NA    NA      13

# without descriptor and taxon, averaging over species and stations: 
# ** average of all

get_density(Bdata, 
            averageOver = cbind(station, species),
            value       = density)
#>   descriptor taxon  density
#> 1         NA    NA 2.166667

##-----------------------------------------------------
## density on higher taxonomic level
##-----------------------------------------------------

# species density for a particular genus 
sp.g2 <- 
  get_density(data       = Bdata,
              descriptor = station, 
              taxon      = species,
              value      = density,
              taxonomy   = Btaxonomy,
              subset     = genus == "g.2")
sp.g2
#>   station species density
#> 1    st.a    sp.2       2
#> 2    st.b    sp.3       3
#> 3    st.b    sp.4       1

# select data for station st.a
Bselect <- 
   get_density(data       = Bdata,
               value      = density, 
               descriptor = station, 
               taxon      = species, 
               subset     = station=="st.a")

Bselect
#>   station species density
#> 1    st.a    sp.1       1
#> 2    st.a    sp.2       2

# pass taxonomy to select only species that belong to g.1
   get_density(data       = Bdata,
               value      = density, 
               descriptor = station, 
               taxon      = species, 
               taxonomy   = Btaxonomy, 
               subset     = genus=="g.1")
#>   station species density
#> 1    st.a    sp.1       1
#> 2    st.b    sp.1       3
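
# -----------------------------------------------------
# base R sketch: summed density per genus
# -----------------------------------------------------
# Not a call to get_density: a hedged illustration of what density
# at genus level amounts to, assuming species densities are simply
# summed within each genus. dens and taxo are illustrative copies
# of Bdata and (part of) Btaxonomy.

dens <- data.frame(
  station = c("st.a","st.a","st.b","st.b","st.b","st.c"),
  species = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)
taxo <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4")
)

# attach the genus to each record, then sum per station x genus
dens.tax   <- merge(dens, taxo, by = "species")
genus.dens <- aggregate(density ~ station + genus, 
                        data = dens.tax, FUN = sum)
genus.dens   # e.g. st.b, g.2: 3 + 1 = 4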

## ====================================================
## North Sea dataset
## ====================================================

##-----------------------------------------------------
## Occurrence of Abra alba, averaged per station
##-----------------------------------------------------

Abra_alba <- 
   get_density(data        = MWTL$density,
               subset      = taxon=="Abra alba",
               descriptor  = station,
               averageOver = year,
               taxon       = taxon,
               value       = density)
head(Abra_alba)
#>     station     taxon    density
#> 1 BREEVTN02 Abra alba 55.6795892
#> 2 BREEVTN04 Abra alba  0.6747638
#> 3 BREEVTN07 Abra alba  0.7684211
#> 4 BREEVTN08 Abra alba  3.5663961
#> 5 BREEVTN09 Abra alba  0.6747632
#> 6 BREEVTN10 Abra alba  0.6747632

# Mean of all Abra species over all stations
# This should be done in two steps.

Abra <- 
   get_density(data        = MWTL$density,
               subset      = genus=="Abra",
               descriptor  = station,
               averageOver = year,
               taxon       = taxon,
               taxonomy    = Taxonomy,
               value       = density, 
               full.output = TRUE)
head(Abra)
#>     station           taxon   density
#> 1 BREEVTN02       Abra alba 55.679589
#> 2 BREEVTN02     Abra nitida  1.349526
#> 3 BREEVTN02 Abra prismatica  0.000000
#> 4 BREEVTN02     Abra tenuis  0.000000
#> 5 BREEVTN03       Abra alba  0.000000
#> 6 BREEVTN03     Abra nitida  0.000000

tapply(Abra$density, INDEX=list(Abra$taxon), FUN=mean)
#>       Abra alba     Abra nitida Abra prismatica     Abra tenuis 
#>      22.0298979       1.1496596       0.9196846       0.0196533