long2wide casts data from long to wide format.

w2l_density casts density data from wide to long format.

w2l_trait casts trait data from wide to long format.

wide2long casts data from wide to long format.

l2w_density casts density data from long to wide format.

l2w_trait casts trait data from long to wide format.

wide2long(wide, descriptor_column = 1, wide_names = NULL, 
          absences = FALSE)

w2l_density(wide, descriptor_column = 1, taxon_names = NULL, 
           absences = FALSE) 

w2l_trait(wide, taxon_column = 1, trait_names = NULL, 
           absences = FALSE) 

long2wide(data, row, column, value, averageOver, 
        taxonomy = NULL, subset)
        
l2w_density(data, descriptor, taxon, value, averageOver, 
        taxonomy = NULL, subset)
        
l2w_trait(trait, descriptor, taxon, value, averageOver, 
         taxonomy = NULL, subset)

Arguments

wide

data, in *WIDE* format. For density data, this is a data.frame or matrix with (descriptor x taxon) information, and the first column usually contains the descriptor name. For trait data, this is a data.frame with (taxon x trait) information, and the first column generally contains the names of the taxa. The descriptors may also be stored as row.names of the data.frame; this requires setting descriptor_column=0.

row

vector or data.frame that contains the data that will be used to label the rows in wide format. This can consist of multiple columns.

column

vector with the data that will be used to label the columns in wide format.

data

data.frame to use for extracting the arguments descriptor, taxon, value, averageOver. Can be missing.

descriptor

variable(s) *where* the data were taken, e.g. sampling stations. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. In theory, descriptor can also be one number, NA or missing; however, care needs to be taken when this is combined with subset and averageOver.

taxon

variable describing *what* the data are; it gives the taxonomic name (e.g. species). If data is not missing: one column from data. If data is missing: a list (or data.frame with one column), or a vector. When it is a data.frame or a list, its "name" will be used in the output; when it is a vector, the argument name will be used.

value

variable that contains the *values* of the data, usually density. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. If data is missing: a vector, a list, a data.frame or a matrix (with one or multiple columns). It should be of the same length (or have the same number of rows) as descriptor and taxon. Should contain numerical values. Should always be present.

averageOver

*replicates* over which averages need to be taken. If data is not missing: one or more column(s) from data; use cbind or data.frame to select more columns. Else a vector, a list, a data.frame or a matrix (with one or multiple columns). It can be of type numerical, character, or a factor. Can be absent.

subset

logical expression indicating elements to keep: missing values are taken as FALSE. If NULL, or absent, then all elements are used. Note that the subset is taken *after* the number of samples to average per descriptor is calculated, so this will also work for selecting certain taxa that may not be present in all replicates over which the average is taken.
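The interplay between averageOver and subset can be illustrated with a small base R sketch. It does not call the package functions; it assumes, in line with the example output on this page, that an average is the summed value divided by the number of distinct replicates at a descriptor, so a taxon absent from a replicate counts as zero. All object names (long, n_rep, keep, averaged) are illustrative only.

```r
# Long-format data: station st.a has two replicates, st.b has one
long <- data.frame(
  station   = c("st.a", "st.a", "st.a", "st.b"),
  replicate = c(1, 1, 2, 1),
  species   = c("sp.1", "sp.2", "sp.1", "sp.3"),
  density   = c(1, 2, 3, 3)
)

# Number of distinct replicates per station, counted on the FULL data
# (assumption: this mirrors "the number of samples to average per descriptor")
n_rep <- sapply(split(long$replicate, long$station),
                function(x) length(unique(x)))

# Subset afterwards: keep only sp.2, which occurs in one of two st.a replicates
keep <- long[long$species == "sp.2", ]

# Sum per station, then divide by the replicate count from the full data
summed   <- sapply(split(keep$density, keep$station), sum)
averaged <- summed / n_rep[names(summed)]
averaged
```

Because the replicate count comes from the full data, sp.2 is divided by 2 even though it occurs in only one replicate of st.a.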

taxonomy

taxonomic information; first column will be matched with taxon, regardless of its name.

descriptor_column

position(s) or name(s) of the column(s) that holds the descriptors of the (density) data set, and that should be removed for any calculations. The default is to have the first column holding the descriptors. If NULL, or 0, then there is no separate column with names, so the row.names of the dataset are used as descriptor names.

taxon_column

position(s) or name(s) of the column(s) that holds the taxon names of the (trait) data set, and that should be removed for any calculations. The default is to have the first column holding the taxa. If NULL, or 0, then there is no separate column with names, so the row.names of the dataset are used as taxon names.

trait

(taxon x trait) data or (descriptor x trait) data, in WIDE format. Traits can be fuzzy coded. In the default setting, the first column contains the name of the taxon, and taxon_column=1. It is also allowed to have the taxa as row.names of the data.frame - set taxon_column=0.

wide_names, taxon_names, trait_names

names of the items constituting the columns in the wide dataset. If not given, the column names (minus descriptor_column) will be used. Input this as a data.frame if you want to set the names of the columns in the long format.

absences

if TRUE, the long format will contain 0s for absences; if FALSE (the default), rows with a value of 0 are left out.
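The effect of absences can be sketched in base R without the package. The code below is only an illustration of the idea (stack the taxon columns, then optionally drop the zeros); it is not the package's implementation, and the object names long_all and long_present are made up for this sketch.

```r
# Wide data: descriptor in the first column, one column per taxon
wide <- data.frame(station = c("Sta", "Stb", "Stc"),
                   sp1     = c(1, 3, 0),
                   sp2     = c(2, 0, 0),
                   sp3     = c(0, 0, 3))

taxa <- names(wide)[-1]   # all columns except the descriptor column

# Stack the taxon columns: one row per (station, taxon) combination;
# this corresponds to the idea behind absences = TRUE
long_all <- data.frame(
  station = rep(wide$station, times = length(taxa)),
  name    = rep(taxa, each = nrow(wide)),
  value   = unlist(wide[taxa], use.names = FALSE)
)

# Dropping the zero rows corresponds to the idea behind absences = FALSE
long_present <- long_all[long_all$value != 0, ]
long_present
```

With this input, long_all has one row for each of the 9 station and taxon combinations, while long_present keeps only the 4 non-zero records.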

Author

Karline Soetaert <karline.soetaert@nioz.nl>

See also

MWTL for the data sets

map_key for simple plotting functions

get_density for functions working on density data

get_summary for estimating summaries from density data

get_trait_density for functions combining density and traits

get_Db_index for extracting bioturbation and bioirrigation indices

extend_trait for functions working with traits

get_trait

Details

About long2wide and wide2long:

There are two ways in which density data can be supplied:

  • descriptor, taxon, value, replicates, ... are vectors with density data in *LONG* format: (where, which, replicates (averageOver), value); all these vectors should be of equal length (or NULL).

  • wide has the density data in *WIDE* format, i.e. as a matrix with the descriptor (and perhaps replicates) in the first column, the taxon as the column names (excluding the first column), and the content of the data is the density.
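The relation between the two formats can be made concrete with a base R sketch that builds the wide table from long vectors by cross-tabulation. It uses xtabs rather than the package functions, so it only shows the layout (descriptor in the first column, taxa as the remaining column names, summed values in the cells), not how the package computes it.

```r
# LONG format: one row per (where, which, value) record
long <- data.frame(
  station = c("st.a", "st.a", "st.b", "st.b", "st.b", "st.c"),
  species = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

# Cross-tabulate: rows = descriptor, columns = taxon, cells = summed value;
# combinations that do not occur get 0
tab <- xtabs(density ~ station + species, data = long)

# WIDE format: descriptor in the first column, one column per taxon
wide <- data.frame(station = rownames(tab),
                   as.data.frame.matrix(tab),
                   row.names = NULL)
wide
```

The result has one row per station and one column per species, with zeros where a species was not recorded at a station.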

Examples


## ====================================================
## Datasets
## ====================================================
##-----------------------------------------------------
## A small dataset with replicates
##-----------------------------------------------------

# 2 stations, 2 replicates for st.a, one replicate for st.b
Bdata.rep <- data.frame(
  station   = c("st.a","st.a","st.a","st.b","st.b","st.b"),
  replicate = c(     1,     1,    2,     1,     1,     1),
  species   = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density   = c(     1,     2,    3,     3,     1,     3)
)
Bdata.rep
#>   station replicate species density
#> 1    st.a         1    sp.1       1
#> 2    st.a         1    sp.2       2
#> 3    st.a         2    sp.1       3
#> 4    st.b         1    sp.3       3
#> 5    st.b         1    sp.4       1
#> 6    st.b         1    sp.5       3

##-----------------------------------------------------
## A small dataset without replicates
##-----------------------------------------------------

Bdata <- data.frame(
  station = c("st.a","st.a","st.b","st.b","st.b","st.c"),
  species = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

##-----------------------------------------------------
## Small dataset: taxonomy
##-----------------------------------------------------

Btaxonomy <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4"),
  family  = c( "f.1", "f.1", "f.1", "f.1", "f.2", "f.3"),
  order   = c( "o.1", "o.1", "o.1", "o.1", "o.2", "o.2"),
  class   = c( "c.1", "c.1", "c.1", "c.1", "c.1", "c.1")
  )

## ====================================================
## Long to wide format
## ====================================================

##-----------------------------------------------------
## Go to wide format, average of replicates
##-----------------------------------------------------

# use with() to create an environment -> first column called "descriptor"
  with (Bdata.rep, l2w_density(
              value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate))
#>   descriptor sp.1 sp.2 sp.3 sp.4 sp.5
#> 1       st.a    2    1    0    0    0
#> 2       st.b    0    0    3    1    3

# use data argument -> first column called "station"
  l2w_density(data        = Bdata.rep, 
              value       = density, 
              descriptor  = station, 
              taxon       = species, 
              averageOver = replicate)
#>   station sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a    2    1    0    0    0
#> 2    st.b    0    0    3    1    3

##-----------------------------------------------------
## Go to wide format, keep replicates
##-----------------------------------------------------

  l2w_density(data      = Bdata.rep,
              value      = density, 
              descriptor = cbind(station, replicate), 
              taxon      = species)
#>   station replicate sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a         1    1    2    0    0    0
#> 2    st.b         1    0    0    3    1    3
#> 3    st.a         2    3    0    0    0    0

##-----------------------------------------------------
## Go to wide format, ADD replicates
##-----------------------------------------------------

  l2w_density(data       = Bdata.rep,
              value      = density, 
              descriptor = station,  
              taxon      = species)
#>   station sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a    4    2    0    0    0
#> 2    st.b    0    0    3    1    3


##-----------------------------------------------------
## Go to wide format, AVERAGE over replicates
##-----------------------------------------------------

  l2w_density(data        = Bdata.rep,
              value       = density, 
              descriptor  = station,  
              averageOver = replicate,
              taxon       = species)
#>   station sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a    2    1    0    0    0
#> 2    st.b    0    0    3    1    3


##-----------------------------------------------------
## density on higher taxonomic level
##-----------------------------------------------------

# add genus, family... to the density data

Bdata.ext <- merge(Bdata, Btaxonomy,
                   by = "species")
head(Bdata.ext)   
#>   species station density genus family order class
#> 1    sp.1    st.a       1   g.1    f.1   o.1   c.1
#> 2    sp.1    st.b       3   g.1    f.1   o.1   c.1
#> 3    sp.2    st.a       2   g.2    f.1   o.1   c.1
#> 4    sp.3    st.b       3   g.2    f.1   o.1   c.1
#> 5    sp.4    st.b       1   g.2    f.1   o.1   c.1
#> 6    sp.5    st.c       3   g.3    f.2   o.2   c.1

# estimate (summed) density on genus level 
Bwide.genus <- l2w_density(
              data       = Bdata.ext, 
              descriptor = station, 
              taxon      = genus,
              value      = density)

Bwide.genus
#>   station g.1 g.2 g.3
#> 1    st.a   1   2   0
#> 2    st.b   3   4   0
#> 3    st.c   0   0   3

##-----------------------------------------------------
## select part of the data
##-----------------------------------------------------

# return species density for g.2 only
  l2w_density(data       = Bdata.ext,  
              value      = density, 
              descriptor = station, 
              taxon      = species, 
              subset     = Bdata.ext$genus=="g.2")
#>   station sp.2 sp.3 sp.4
#> 1    st.a    2    0    0
#> 2    st.b    0    3    1
    

# create summed values for g.2 only
  l2w_density(data       = Bdata.ext,  
              value      = density, 
              descriptor = station, 
              taxon      = genus, 
              subset     = Bdata.ext$genus=="g.2")
#>   station g.2
#> 1    st.a   2
#> 2    st.b   4

## ====================================================
## From wide to long format
## ====================================================

  Bwide <- data.frame(station = c("Sta", "Stb", "Stc"),
                      sp1     = c(    1,     3,     0),
                      sp2     = c(    2,     0,     0),
                      sp3     = c(    0,     0,     3))

# this long format includes the 0 densities
  wide2long (wide     = Bwide, 
             absences = TRUE)
#>   station name value
#> 1     Sta  sp1     1
#> 2     Stb  sp1     3
#> 3     Stc  sp1     0
#> 4     Sta  sp2     2
#> 5     Stb  sp2     0
#> 6     Stc  sp2     0
#> 7     Sta  sp3     0
#> 8     Stb  sp3     0
#> 9     Stc  sp3     3

# this does not include the absences, and renames the species
  wide2long (wide = Bwide, 
             wide_names = paste("Species", 1:3, sep="_"))
#>   station wide_names value
#> 1     Sta  Species_1     1
#> 2     Stb  Species_1     3
#> 3     Sta  Species_2     2
#> 4     Stc  Species_3     3

## ====================================================
## From wide trait data to long format
## ====================================================

head(Traits_nioz, n = c(3, 5))
#>                   taxon ET1.M1 ET1.M2 ET1.M3 ET1.M4
#> 1          Abludomelita    0.5    0.5    0.0      0
#> 2 Abludomelita obtusata    0.5    0.5    0.0      0
#> 3             Abra alba    0.0    0.5    0.5      0

T_long   <- w2l_trait(Traits_nioz)
T_long_a <- w2l_trait(Traits_nioz, absences = TRUE)

head(T_long)
#>                    taxon   name     value
#> 1           Abludomelita ET1.M1 0.5000000
#> 2  Abludomelita obtusata ET1.M1 0.5000000
#> 3      Acteon tornatilis ET1.M1 0.3333333
#> 4  Ampelisca brevicornis ET1.M1 0.2500000
#> 5      Ampelisca diadema ET1.M1 0.2500000
#> 6 Ampelisca macrocephala ET1.M1 0.2500000
head(T_long_a)
#>                   taxon   name value
#> 1          Abludomelita ET1.M1   0.5
#> 2 Abludomelita obtusata ET1.M1   0.5
#> 3             Abra alba ET1.M1   0.0
#> 4           Abra nitida ET1.M1   0.0
#> 5       Abra prismatica ET1.M1   0.0
#> 6           Abra tenuis ET1.M1   0.0

## ====================================================
## From long trait data to wide format
## ====================================================

head(T_long)
#>                    taxon   name     value
#> 1           Abludomelita ET1.M1 0.5000000
#> 2  Abludomelita obtusata ET1.M1 0.5000000
#> 3      Acteon tornatilis ET1.M1 0.3333333
#> 4  Ampelisca brevicornis ET1.M1 0.2500000
#> 5      Ampelisca diadema ET1.M1 0.2500000
#> 6 Ampelisca macrocephala ET1.M1 0.2500000

# go back from long to wide format
T_wide   <- l2w_trait(trait      = T_long,
                      taxon      = taxon, 
                      descriptor = name, 
                      value      = value)
head(T_wide, n=c(5,5))
#>                   taxon ET1.M1 ET1.M2 ET1.M3 ET1.M4
#> 1          Abludomelita    0.5    0.5    0.0      0
#> 2 Abludomelita obtusata    0.5    0.5    0.0      0
#> 3             Abra alba    0.0    0.5    0.5      0
#> 4           Abra nitida    0.0    1.0    0.0      0
#> 5       Abra prismatica    0.0    1.0    0.0      0

# other way around
T_wide_2  <- long2wide(data      = T_long,
                      row        = name, 
                      column     = taxon, 
                      value      = value)

head(T_wide_2, n=c(5,5))
#>     name Abludomelita Abludomelita obtusata Abra alba Abra nitida
#> 1 ET1.M1          0.5                   0.5       0.0           0
#> 2 ET1.M2          0.5                   0.5       0.5           1
#> 3 ET1.M3          0.0                   0.0       0.5           0
#> 4 ET1.M4          0.0                   0.0       0.0           0
#> 5 ET1.M5          0.0                   0.0       0.0           0