Functions for conversion from long to wide format and vice versa.

long2wide casts data from long to wide format.

w2lDensity casts density data from wide to long format.

w2lTrait casts trait data from wide to long format.

wide2long casts data from wide to long format.

l2wDensity casts density data from long to wide format.

l2wTrait casts trait data from long to wide format.

Usage

wide2long(wide, d.column = 1, w.names = NULL, 
          absences = FALSE)

w2lDensity(wide, d.column = 1, taxon.names = NULL, 
           absences = FALSE) 

w2lTrait(wide, t.column = 1, trait.names = NULL, 
         absences = FALSE) 

long2wide(row, column, value, averageOver = NULL, 
          taxonomy = NULL, subset)

l2wDensity(descriptor, taxon, value, averageOver = NULL, 
           taxonomy = NULL, subset)

l2wTrait(trait, taxon, value, averageOver = NULL, 
         taxonomy = NULL, subset)

Arguments

wide: data, in *WIDE* format. For density data, this is a data.frame or matrix with (descriptor x taxon) information, and the first column usually contains the descriptor name. For trait data, this is a data.frame with (taxon x trait) information, and the first column generally contains the names of the taxa. The descriptors may also be given as the row.names of the data.frame; this requires setting d.column=0.
row: vector or data.frame that contains the data that will be used to label the rows in wide format. This can consist of multiple columns.
column: vector with the data that will be used to label the columns in wide format.
descriptor: name(s) of the descriptor, i.e. *where* the data were taken, e.g. station names. Either a vector, a list, or a data.frame or matrix (possibly with multiple columns). It can be numeric, character, or a factor. When it is a data.frame or a list, its "names" will be used in the output; when it is a vector, the argument name will be used in the output.
taxon: gives the taxonomic name(s) (e.g. species). Should be of the same length as (the number of rows of) descriptor. Can be a list (or data.frame with one column), or a vector. When it is a data.frame or a list, the "name" of the column will be used in the output; when it is a vector, the argument name will be used.
value: vector or list that contains the *values* of the data, usually density. Should be of the same length as (the number of rows of) descriptor and taxon. For function getDensity, value can also be a multi-column data.frame or matrix.
averageOver: vector with *replicates* over which averages need to be taken. Should be of the same length as (the number of rows of) descriptor.
subset: logical expression indicating elements to keep; missing values are taken as FALSE. If NULL, or absent, then all elements are used. Note that the subset is taken *after* the number of samples to average per descriptor is calculated, so it also works when selecting taxa that are not present in all of the replicates being averaged.
taxonomy: taxonomic information; first column will be matched with taxon, regardless of its name.
d.column: position(s) or name(s) of the column(s) that hold the descriptors of the (density) data set, and that should be excluded from any calculations. The default is to have the first column holding the descriptors. If NULL, or 0, then there is no separate column with names, so the row.names of the dataset are used as descriptor names.
t.column: position(s) or name(s) of the column(s) that hold the taxon names of the (trait) data set, and that should be excluded from any calculations. The default is to have the first column holding the taxa. If NULL, or 0, then there is no separate column with names, so the row.names of the dataset are used as taxon names.
trait: (taxon x trait) data or (descriptor x trait) data, in WIDE format. Traits can be fuzzy coded. In the default setting, the first column contains the name of the taxon, and t.column=1. It is also allowed to have the taxa as row.names of the data.frame - set t.column=0.
w.names, taxon.names, trait.names: names of the items constituting the columns in the wide dataset. If not given, the column names (minus d.column) will be used. Input this as a data.frame if you want to set the names of the columns in the long format.
absences: if TRUE, the long format will contain 0's for absences.
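
The interplay of averageOver and subset can be illustrated with a small base-R sketch. It is not the package's implementation; it only assumes that averaging divides the summed value by the number of replicates per descriptor, which is consistent with the replicate example in the Examples section. The object names long, n.rep, summed and averaged are hypothetical.

```r
# Hypothetical base-R sketch; not the Btrait implementation.
long <- data.frame(
  station   = c("st.a", "st.a", "st.a", "st.b", "st.b", "st.b"),
  replicate = c(1, 1, 2, 1, 1, 1),
  species   = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density   = c(1, 2, 3, 3, 1, 3)
)

# Number of distinct replicates per station, counted on the full data
n.rep <- tapply(long$replicate, long$station,
                function(x) length(unique(x)))

# Summed density per (station, species); absent combinations become 0
summed <- tapply(long$density, list(long$station, long$species), sum)
summed[is.na(summed)] <- 0

# Divide each station's row by its replicate count
averaged <- summed / as.vector(n.rep[rownames(summed)])

# Selecting a taxon afterwards keeps the replicate count of the full data:
# sp.2 occurs in only one of the two st.a replicates, yet is divided by 2
averaged[, "sp.2", drop = FALSE]
```

Counting the replicates before selecting taxa is the design choice that matters here: if the count were taken after subsetting to sp.2, st.a would appear to have a single replicate and the average would be too high.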

Author

Karline Soetaert <karline.soetaert@nioz.nl>

Details

About long2wide and wide2long:

There are two ways in which density data can be supplied:

descriptor, taxon, value, replicates, ... are vectors with density data in *LONG* format: (where, which, replicates (averageOver), value); all these vectors should be of equal length (or NULL).
wide has the density data in *WIDE* format, i.e. as a matrix with the descriptor (and perhaps replicates) in the first column(s), the taxa as the column names (excluding the descriptor column), and the densities as the cell contents.
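
To make the two layouts concrete, the following base-R sketch builds a wide (descriptor x taxon) table from long vectors. It uses only tapply and data.frame from base R, not the package functions, and the object names long, wide.mat and wide are hypothetical.

```r
# Hypothetical base-R sketch of the long and wide layouts.
# LONG format: one row per (station, species) observation
long <- data.frame(
  station = c("st.a", "st.a", "st.b", "st.b", "st.b", "st.c"),
  species = c("sp.1", "sp.2", "sp.1", "sp.3", "sp.4", "sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

# Cross-tabulate: rows are stations, columns are species, cells are
# summed densities; combinations that do not occur are set to 0
wide.mat <- tapply(long$density, list(long$station, long$species), sum)
wide.mat[is.na(wide.mat)] <- 0

# WIDE format: descriptor in the first column, one column per taxon
wide <- data.frame(station = rownames(wide.mat), wide.mat,
                   row.names = NULL)
wide
```

The resulting table has one row per station and one column per species, with the descriptor in the first column, which is the layout that the d.column = 1 default assumes.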

Examples


## ====================================================
## Long to wide format
## ====================================================

##-----------------------------------------------------
## A small dataset with replicates
##-----------------------------------------------------

# 2 stations, 2 replicates for st.a, one replicate for st.b
Bdata.rep <- data.frame(
  station   = c("st.a","st.a","st.a","st.b","st.b","st.b"),
  replicate = c(     1,     1,    2,     1,     1,     1),
  species   = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density   = c(     1,     2,    3,     3,     1,     3)
)
Bdata.rep
#>   station replicate species density
#> 1    st.a         1    sp.1       1
#> 2    st.a         1    sp.2       2
#> 3    st.a         2    sp.1       3
#> 4    st.b         1    sp.3       3
#> 5    st.b         1    sp.4       1
#> 6    st.b         1    sp.5       3

##-----------------------------------------------------
## Go to wide format, average of replicates
##-----------------------------------------------------

with (Bdata.rep, 
  l2wDensity(value      = density, 
            descriptor  = station, 
            taxon       = species, 
            averageOver = replicate))
#>   descriptor sp.1 sp.2 sp.3 sp.4 sp.5
#> 1       st.a    2    1    0    0    0
#> 2       st.b    0    0    3    1    3

##-----------------------------------------------------
## Go to wide format, keep replicates
##-----------------------------------------------------

with (Bdata.rep, 
  l2wDensity(value      = density, 
            descriptor = cbind(station, replicate), 
            taxon      = species))
#>   station replicate sp.1 sp.2 sp.3 sp.4 sp.5
#> 1    st.a         1    1    2    0    0    0
#> 2    st.b         1    0    0    3    1    3
#> 3    st.a         2    3    0    0    0    0

##-----------------------------------------------------
## Go to wide format, SUM over replicates (no averageOver)
##-----------------------------------------------------

with (Bdata.rep, 
  l2wDensity(value      = density, 
            descriptor = station,  
            taxon      = species))
#>   descriptor sp.1 sp.2 sp.3 sp.4 sp.5
#> 1       st.a    4    2    0    0    0
#> 2       st.b    0    0    3    1    3


## ====================================================
## A small dataset without replicates
## ====================================================

Bdata <- data.frame(
  station = c("st.a","st.a","st.b","st.b","st.b","st.c"),
  species = c("sp.1","sp.2","sp.1","sp.3","sp.4","sp.5"),
  density = c(1, 2, 3, 3, 1, 3)
)

##-----------------------------------------------------
## From long to wide format
##-----------------------------------------------------

Bwide <- with (Bdata, 
  l2wDensity (value      = density, 
             descriptor = station, 
             taxon      = species))
Bwide
#>   descriptor sp.1 sp.2 sp.3 sp.4 sp.5
#> 1       st.a    1    2    0    0    0
#> 2       st.b    3    0    3    1    0
#> 3       st.c    0    0    0    0    3


## ====================================================
## Small dataset: taxonomy
## ====================================================

Btaxonomy <- data.frame(
  species = c("sp.1","sp.2","sp.3","sp.4","sp.5","sp.6"),
  genus   = c( "g.1", "g.2", "g.2", "g.2", "g.3", "g.4"),
  family  = c( "f.1", "f.1", "f.1", "f.1", "f.2", "f.3"),
  order   = c( "o.1", "o.1", "o.1", "o.1", "o.2", "o.2"),
  class   = c( "c.1", "c.1", "c.1", "c.1", "c.1", "c.1")
  )

##-----------------------------------------------------
## density on higher taxonomic level
##-----------------------------------------------------

# add genus, family... to the density data

Bdata.ext <- merge(Bdata, Btaxonomy)
head(Bdata.ext)   
#>   species station density genus family order class
#> 1    sp.1    st.a       1   g.1    f.1   o.1   c.1
#> 2    sp.1    st.b       3   g.1    f.1   o.1   c.1
#> 3    sp.2    st.a       2   g.2    f.1   o.1   c.1
#> 4    sp.3    st.b       3   g.2    f.1   o.1   c.1
#> 5    sp.4    st.b       1   g.2    f.1   o.1   c.1
#> 6    sp.5    st.c       3   g.3    f.2   o.2   c.1

# estimate (summed) density on genus level 
Bwide.genus <- with (Bdata.ext, 
  l2wDensity(descriptor = station, 
             taxon      = genus,
             value      = density)
                    )

Bwide.genus
#>   descriptor g.1 g.2 g.3
#> 1       st.a   1   2   0
#> 2       st.b   3   4   0
#> 3       st.c   0   0   3

##-----------------------------------------------------
## select part of the data
##-----------------------------------------------------

# return species density for g.2 only
with (Bdata.ext, 
  l2wDensity(value     = density, 
            descriptor = station, 
            taxon      = species, 
            subset     = genus=="g.2")
      )
#>   descriptor sp.2 sp.3 sp.4
#> 1       st.a    2    0    0
#> 2       st.b    0    3    1

# create summed values for g.2 only
with (Bdata.ext, 
  l2wDensity(value     = density, 
            descriptor = station, 
            taxon      = genus, 
            subset     = genus=="g.2")
      )
#>   descriptor g.2
#> 1       st.a   2
#> 2       st.b   4

## ====================================================
## From wide to long format
## ====================================================

Bwide <- data.frame(station=c("Sta", "Stb", "Stc"),
                    sp1    =c(    1,     3,     0),
                    sp2    =c(    2,     0,     0),
                    sp3    =c(    0,     0,     3))

# this long format includes the 0 densities
wide2long (wide = Bwide, absences=TRUE)
#>   station name value
#> 1     Sta  sp1     1
#> 2     Stb  sp1     3
#> 3     Stc  sp1     0
#> 4     Sta  sp2     2
#> 5     Stb  sp2     0
#> 6     Stc  sp2     0
#> 7     Sta  sp3     0
#> 8     Stb  sp3     0
#> 9     Stc  sp3     3

# this labels the species column appropriately
wide2long (wide = Bwide, w.names=data.frame(species=colnames(Bwide)[-1]))
#>   station species value
#> 1     Sta     sp1     1
#> 2     Stb     sp1     3
#> 3     Sta     sp2     2
#> 4     Stc     sp3     3
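
The effect of the absences argument can also be mimicked in base R. The sketch below is only an illustration under the assumption that absences = FALSE amounts to dropping the rows with value 0; it does not call wide2long, and the object names long.all and long.present are hypothetical.

```r
# Hypothetical base-R sketch; not the Btrait implementation.
Bwide <- data.frame(station = c("Sta", "Stb", "Stc"),
                    sp1     = c(1, 3, 0),
                    sp2     = c(2, 0, 0),
                    sp3     = c(0, 0, 3))

# Stack the taxon columns: one row per (station, taxon) cell,
# walking through the wide table column by column
long.all <- data.frame(
  station = rep(Bwide$station, times = ncol(Bwide) - 1),
  name    = rep(names(Bwide)[-1], each = nrow(Bwide)),
  value   = unlist(Bwide[-1], use.names = FALSE)
)

# Keeping every cell corresponds to absences = TRUE;
# dropping the zero cells corresponds to absences = FALSE
long.present <- long.all[long.all$value != 0, ]
long.present
```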