get_trait_density combines (descriptor x taxon) density data with (taxon x trait) data to a (descriptor x trait) data set.

get_trait_density(data, descriptor, taxon, value, averageOver, 
        wide = NULL, descriptor_column = 1, trait, taxon_column = 1, 
        trait_class = NULL, trait_score = NULL, taxonomy = NULL, 
        scalewithvalue = TRUE, verbose = FALSE)

Arguments

data

data.frame to use for extracting the arguments descriptor, taxon, value, averageOver. Can be missing.

descriptor

variable(s) *where* the data were taken, e.g. sampling stations. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. In theory, descriptor can also be one number, NA or missing; however, care needs to be taken in case this combined with subset and averageOver.

taxon

variables describing *what* the data are; it gives the taxonomic name (e.g. species). If data is not missing: one column from data. If data is missing: a list (or data.frame with one column), or a vector. When a data.frame or a list the "name" will be used in the output; when a vector, the argument name will be used.

value

variable that contains the *values* of the data, usually density. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). it should be of the same length (or have the same number of rows) as (the number of rows of) descriptor and taxon. Should contain numerical values. Should always be present.

averageOver

*replicates* over which averages need to be taken. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. Else a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. Can be absent.

wide

density data, in *WIDE* format. This is a data.frame or matrix with (descriptor x taxon) information. If NULL, this data.frame will be calculated from the descriptor, taxon, (replicate) and value data. The number of descriptor columns are specified with descriptor_column. If a data.frame then the first column usually contains the descriptor name; in this case the dimensions of wide are (number of descriptors x number of species+1), and descriptor_column=1. It is also allowed to have the descriptors as row.names of the data.frame -this requires setting descriptor_column=0.

trait

(taxon x trait) data or (descriptor x trait) data, in *WIDE* format, and containing numerical values only. Traits can be fuzzy coded. The number of columns with taxonomic information is specified with taxon_column. The default is to have the first column contain the name of the taxon, and taxon_column=1. It is also allowed to have the taxa as row.names of the data.frame; in this case, taxon_column=0. See last example for how to deal with categorical traits.

trait_class

indices to trait levels, a vector. The length of this vector should equal the number of columns of trait or wide minus the value of taxon_column. The order should be conform the order in the trait columns (not checked). If present, this -together with trait_score- will be used to convert the trait matrix from fuzzy to crisp format.

trait_score

trait values or scores, a vector. Should be of same length as trait_class

scalewithvalue

when TRUE, will standardize with respect to total density, so that the average trait value is obtained (not the summed value). Note that total density will be estimated only for those taxa whose traits are known.

verbose

when TRUE, will write warnings to the screen.

taxonomy

taxonomic information (the relationships between the taxa), a data.frame; first column will be matched with taxon (in the data.frames data and trait). The subsequent columns should have increasing taxonomic level. (e.g. the column order should be *species, genus, family, ...*. The taxonomic relations from data and trait are used to estimate traits of taxa that are not accounted for, and that will be estimated based on taxa at the nearest taxonomic level. See details.

descriptor_column, taxon_column

position(s) or name(s) of the column(s) that holds the descriptor of the density data set (data.frame wide), and the taxa in the trait data set (data.frame trait). The default is to have the first column holding the descriptors or taxa. If NULL, or 0, then there is no separate column with names, so the row.names of the dataset are used as descriptor or taxon names.

Details

The taxonomy is used to fill in the gaps of the trait information, assuming that closely related taxa will share similar traits. This is done in two steps:

The trait database is first extended with information on higher taxonomic levels (using *extend_trait*). The traits for a taxonomic level are estimated as the average of the traits at the lower level. For instance, traits on genus level will be averages of known traits of all species in the database belonging to this genus.

Then, for each taxon that is not present in the trait database, the traits on the closest taxonomic level are used. For instance, for an unrecorded species, it is first checked if the trait is known on genus level, if not, family level and so on.

Value

get_trait_density returns the descriptor x trait density matrix.

Depending on whether argument data is passed or not, the output columns may be labelled differently:

  • if data is passed: the original names in data will be kept

  • if data is not passed: the names will only be kept if explicitly passed.

see example labeled "use data argument or explicit input".

Author

Karline Soetaert <karline.soetaert@nioz.nl> Olivier Beauchard

See also

MWTL for the data sets.

map_key for simple plotting functions.

get_density for functions working with density data.

get_Db_index for extracting bioturbation and bioirrigation indices.

extend_trait for functions working with traits.

get_trait

Note

When traits of certain taxa are not found, they are put equal to 0 and calculation will proceed; this will be notified only when verbose is set to TRUE. When that happens, the taxa that were ignored can be found in attributes(..)$notrait

Examples


## ====================================================
## A small dataset with replicates
## ====================================================

# 2 stations, 2 replicates for st.a, one replicate for st.b
Bdata.rep <- data.frame(
  station   = c("st.a","st.a","st.a","st.b","st.b","st.b"),
  replicate = c(     1,     1,    2,     1,     1,     1),
  species   = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density   = c(     1,     2,    3,     3,     1,     3)
)
Bdata.rep
#>   station replicate species density
#> 1    st.a         1    sp.1       1
#> 2    st.a         1    sp.2       2
#> 3    st.a         2    sp.1       3
#> 4    st.b         1    sp.3       3
#> 5    st.b         1    sp.4       1
#> 6    st.b         1    sp.5       3

##-----------------------------------------------------
## averaging over replicates - long format
##-----------------------------------------------------

Blong <- get_density(
              data        = Bdata.rep,
              descriptor  = station,
              taxon       = species,
              averageOver = replicate,
              value       = density)
Blong
#>   station species density
#> 1    st.a    sp.1       2
#> 2    st.a    sp.2       1
#> 3    st.b    sp.3       3
#> 4    st.b    sp.4       1
#> 5    st.b    sp.5       3

##-----------------------------------------------------
## From long to wide format, averaging over replicates
##-----------------------------------------------------

Bwide <- l2w_density (
               data        = Bdata.rep,
               value       = density, 
               descriptor  = station, 
               averageOver = replicate,
               taxon       = species)
Bwide
#>   station sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a    2    1    0    0    0
#> 2    st.b    0    0    3    1    3

##-----------------------------------------------------
## Small dataset: fuzzy-coded traits
##-----------------------------------------------------

# Note: no data for "sp.4"

Btraits <- data.frame(
  taxon   = c("sp.1","sp.2","sp.3","sp.5","sp.6"),
  T1_M1   = c(0     , 0    ,   0  , 0.2  ,     1),
  T1_M2   = c(1     , 0    , 0.5  , 0.3  ,     0),
  T1_M3   = c(0     , 1    , 0.5  , 0.5  ,     0),
  T2_M1   = c(0     , 0    ,   1  , 0.5  ,     1),
  T2_M2   = c(1     , 1    ,   0  , 0.5  ,     0)
)

# The metadata for this trait
Btraits.lab <- data.frame(
  colname  = c("T1_M1", "T1_M2", "T1_M3", "T2_M1", "T2_M2"),
  trait    = c("T1"   , "T1"   , "T1"   , "T2"   , "T2"),
  modality = c("M1"   , "M2"   , "M3"   , "M1"   , "M2"), 
  score    = c(0      ,  0.5   ,  1     ,  0.2   ,  2)
)

##-----------------------------------------------------
## Small dataset: taxonomy
##-----------------------------------------------------

Btaxonomy <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4"),
  family  = c( "f.1", "f.1", "f.1", "f.1", "f.2", "f.3"),
  order   = c( "o.1", "o.1", "o.1", "o.1", "o.2", "o.2"),
  class   = c( "c.1", "c.1", "c.1", "c.1", "c.1", "c.1")
  )

## ====================================================
## Use of get_trait_density: long and wide input
## ====================================================

# input in wide format (Bwide)
cwm.1 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Btraits)
cwm.1
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

# input in long format (Blong)
cwm.2 <- get_trait_density (
                        data           = Blong,
                        taxon          = species,
                        descriptor     = station,
                        value          = density,
                        trait          = Btraits)
cwm.2
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

# input in original long format (Bdata.rep) - average over replicates
cwm.3 <- get_trait_density (
                        data           = Bdata.rep,
                        taxon          = species,
                        descriptor     = station,
                        averageOver    = replicate,
                        value          = density,
                        trait          = Btraits)
cwm.3
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

# input in original long format (Bdata.rep) - keep replicates (as descriptor)
cwm.3b <- get_trait_density (
                        data           = Bdata.rep,
                        taxon          = species,
                        descriptor     = data.frame(station, replicate),
                        value          = density,
                        trait          = Btraits)
cwm.3b
#>   station replicate T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a         1   0.0 0.3333333 0.6666667  0.00  1.00
#> 2    st.b         1   0.1 0.4000000 0.5000000  0.75  0.25
#> 3    st.a         2   0.0 1.0000000 0.0000000  0.00  1.00

## ====================================================
## use data argument or explicit input
## ====================================================

# -----------------------------------------------------
# use data argument
# -----------------------------------------------------
cwm.2 <- get_trait_density (
                        data           = Blong,
                        taxon          = species,
                        descriptor     = station,
                        value          = density,
                        trait          = Btraits)
                        
cwm.2  # keeps the names of the original data
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

# -----------------------------------------------------
# use with() to create a data environment
# -----------------------------------------------------
cwm.2b <- with(Blong, get_trait_density (
                        taxon          = species,
                        descriptor     = station,
                        value          = density,
                        trait          = Btraits))

cwm.2b     # Note: first column is called "descriptor"
#>   descriptor T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1       st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2       st.b   0.1 0.4000000 0.5000000  0.75  0.25

# explicitly name the descriptor argument
cwm.2c <- with(Blong, get_trait_density (
                        taxon          = species,
                        descriptor     = list(station = station),
                        value          = density,
                        trait          = Btraits))

cwm.2c     # called "station"
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

# -----------------------------------------------------
# explicit arguments
# -----------------------------------------------------
cwm.2d <- get_trait_density (
                        taxon          = Blong$species,
                        descriptor     = Blong$station,
                        value          = Blong$density,
                        trait          = Btraits)
cwm.2d 
#>   descriptor T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1       st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2       st.b   0.1 0.4000000 0.5000000  0.75  0.25

cwm.2d <- get_trait_density (
                        taxon          = Blong$species,
                        descriptor     = list(station = Blong$station),
                        value          = Blong$density,
                        trait          = Btraits)
cwm.2d 
#>   station T1_M1     T1_M2     T1_M3 T2_M1 T2_M2
#> 1    st.a   0.0 0.6666667 0.3333333  0.00  1.00
#> 2    st.b   0.1 0.4000000 0.5000000  0.75  0.25

## ====================================================
## use of scalewithvalue
## ====================================================

# if verbose=TRUE, a warning will be given, as there is no trait information
# for sp4 - this affects the trait density for station "b"

cwm.trait.1 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Btraits, 
                        trait_class    = Btraits.lab$trait, 
                        trait_score    = Btraits.lab$score, 
                        scalewithvalue = TRUE, 
                        verbose        = FALSE)

cwm.trait.1
#>   station        T1   T2
#> 1    st.a 0.6666667 2.00
#> 2    st.b 0.7000000 0.65

attributes(cwm.trait.1)$notrait   # species that was ignored
#> [1] "sp.4"

# same, but total trait values

cwm.trait.2 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Btraits, 
                        trait_class    = Btraits.lab$trait, 
                        trait_score    = Btraits.lab$score, 
                        scalewithvalue = FALSE, 
                        verbose        = FALSE)

cwm.trait.2
#>   station  T1  T2
#> 1    st.a 2.0 6.0
#> 2    st.b 4.2 3.9

## ====================================================
## pass taxonomy to estimate traits for unrecorded species
## ====================================================

cwm.trait.2 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Btraits, 
                        trait_class    = Btraits.lab$trait, 
                        trait_score    = Btraits.lab$score, 
                        taxonomy       = Btaxonomy,
                        scalewithvalue = TRUE)

cwm.trait.2
#>   station        T1        T2
#> 1    st.a 0.6666667 2.0000000
#> 2    st.b 0.7250000 0.7142857
attributes(cwm.trait.2)$notrait    # none
#> [1] NA

##-----------------------------------------------------
## Same but keeping fuzzy scores
##-----------------------------------------------------

cwm.trait.3 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Btraits, 
                        taxonomy       = Btaxonomy,
                        scalewithvalue = TRUE)

cwm.trait.3
#>   station      T1_M1     T1_M2     T1_M3     T2_M1     T2_M2
#> 1    st.a 0.00000000 0.6666667 0.3333333 0.0000000 1.0000000
#> 2    st.b 0.08571429 0.3785714 0.5357143 0.7142857 0.2857143

## ====================================================
## categorical traits
## ====================================================

# Note: no data for "sp.4"

Bcategory <- data.frame(
  taxon   = c("sp.1","sp.2","sp.3","sp.5","sp.6"),
  C1      = c(   "A",   "B",   "A",   "C",   "C")
)

if (FALSE) { # \dontrun{
# this will not work, as trait should be numerical
cwm.trait.4 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = Bcategory, 
                        taxonomy       = Btaxonomy,
                        scalewithvalue = TRUE)
} # }

# convert categorical traits to numerical values
crisp2fuzzy(Bcategory)
#>   taxon C1_A C1_B C1_C
#> 1  sp.1    1    0    0
#> 2  sp.2    0    1    0
#> 3  sp.3    1    0    0
#> 4  sp.5    0    0    1
#> 5  sp.6    0    0    1

# this will  work, as categorical traits -> numerical
cwm.trait.4 <- get_trait_density (
                        wide           = Bwide,  
                        trait          = crisp2fuzzy(Bcategory), 
                        taxonomy       = Btaxonomy,
                        scalewithvalue = TRUE)
cwm.trait.4
#>   station      C1_A       C1_B      C1_C
#> 1    st.a 0.6666667 0.33333333 0.0000000
#> 2    st.b 0.5000000 0.07142857 0.4285714