This lesson is still being designed and assembled (Pre-Alpha version)

Statistics with R: Getting Started

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • What R packages do I need to explore my results?

  • How do I import my results into R?

Objectives
  • Learn the basics of the R package Phyloseq

  • Create R objects from metabarcoding results

  • Output graphics to files

Importing outputs into R

For examining the results of our analysis in R, we primarily be using the Phyloseq package, with some additional packages.

There are many possible file and data types that can be imported into Phyloseq:

alt_text

To start off, we will load the R libraries we will need

# load the packages
library('phyloseq')
library('tibble')
library('ggplot2')
library('ape')
library('vegan')

We will do the work in the ‘plots’ directory, so in R, set that as the working directory

# set the working directory
setwd('../plots')

Now, we will put together all the elements to create a Phyloseq object

Import the frequency table

First we will import the frequency table.

import_table <- read.table('../otus/otu_frequency_table.tsv',header=TRUE,sep='\t',row.names=1, comment.char = "")
head(import_table)
A data.frame: 6 × 11
AM1AM2AM3AM4AM5AM6AS2AS3AS4AS5AS6
<int><int><int><int><int><int><int><int><int><int><int>
OTU.1 723363402907171195627302856419237973392
OTU.10 0 00 0 0 02223 0 01024 0
OTU.111892 00 0 82 113 0 0 0 0 0
OTU.12 015870 0 0 0 0 0 0 0 0
OTU.13 0 00 0 0 0 0 0 01472 0
OTU.14 0 00 0 0 0 0 0 0 01087

Now convert to a matrix for Phyloseq

otumat <- as.matrix(import_table)
head(otumat)
A matrix: 6 × 11 of type int
AM1AM2AM3AM4AM5AM6AS2AS3AS4AS5AS6
OTU.1 723363402907171195627302856419237973392
OTU.10 0 00 0 0 02223 0 01024 0
OTU.111892 00 0 82 113 0 0 0 0 0
OTU.12 015870 0 0 0 0 0 0 0 0
OTU.13 0 00 0 0 0 0 0 01472 0
OTU.14 0 00 0 0 0 0 0 0 01087

Now create an OTU object using the function otu_table

OTU <- otu_table(otumat, taxa_are_rows = TRUE)
head(OTU)
A otu_table: 6 × 11 of type int
AM1AM2AM3AM4AM5AM6AS2AS3AS4AS5AS6
OTU.1 723363402907171195627302856419237973392
OTU.10 0 00 0 0 02223 0 01024 0
OTU.111892 00 0 82 113 0 0 0 0 0
OTU.12 015870 0 0 0 0 0 0 0 0
OTU.13 0 00 0 0 0 0 0 01472 0
OTU.14 0 00 0 0 0 0 0 0 01087

Import the taxonomy table

Now we will import the taxonomy table. This is the output from the Sintax analysis.

import_taxa <- read.table('../taxonomy/sintax_taxonomy.tsv',header=TRUE,sep='\t',row.names=1)
head(import_taxa)
A matrix: 6 × 8 of type chr
kingdomphylumclassorderfamilygenusspeciesConfidence
OTU.1EukaryotaChordataActinopteriScombriformesGempylidae Thyrsites Thyrsites_atun 1.0000000
OTU.2EukaryotaChordataActinopteriMugiliformes Mugilidae Aldrichetta Aldrichetta_forsteri 0.9999999
OTU.3EukaryotaChordataActinopteriPerciformes Bovichtidae Bovichtus Bovichtus_variegatus 0.9999979
OTU.4EukaryotaChordataActinopteriBlenniiformesTripterygiidaeForsterygionForsterygion_lapillum1.0000000
OTU.5EukaryotaChordataActinopteriLabriformes Labridae Notolabrus Notolabrus_fucicola 0.9996494
OTU.6EukaryotaChordataActinopteriBlenniiformesTripterygiidaeForsterygionForsterygion_lapillum1.0000000

Convert to a matrix:

taxonomy <- as.matrix(import_taxa)

Create a taxonomy class object

TAX <- tax_table(taxonomy)

Import the sample metadata

metadata <- read.table('../docs/sample_metadata.tsv',header = T,sep='\t',row.names = 1)
metadata
A data.frame: 12 × 8
fwd_barcoderev_barcodeforward_primerreverse_primerlocationtemperaturesalinitysample
<chr><chr><chr><chr><chr><int><int><chr>
AM1GAAGAGTAGCGTCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1232AM1
AM2GAAGAGTCTACTCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1432AM2
AM3GAAGAGATGACTCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1232AM3
AM4GAAGAGATCTATCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1032AM4
AM5GAAGAGACAGATCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1234AM5
AM6GAAGAGATACTGCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1034AM6
AS2GAAGAGAGATACTCGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1232AS2
AS3GAAGAGATGCGATGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1232AS3
AS4GAAGAGTGCTACTCGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1034AS4
AS5GAAGAGACGTCATGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1434AS5
AS6GAAGAGTCATGTCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1034AS6
ASNGAAGAGAGACGCTCGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTnegativeNANAASN

As we are not using the negative control, we will remove it

metadata <- metadata[1:11,1:8]
tail(metadata)
A data.frame: 6 × 8
fwd_barcoderev_barcodeforward_primerreverse_primerlocationtemperaturesalinitysample
<chr><chr><chr><chr><chr><int><int><chr>
AM6GAAGAGATACTGCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTmudflats1034AM6
AS2GAAGAGAGATACTCGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1232AS2
AS3GAAGAGATGCGATGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1232AS3
AS4GAAGAGTGCTACTCGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1034AS4
AS5GAAGAGACGTCATGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1434AS5
AS6GAAGAGTCATGTCGGACCCTATGGAGCTTTAGACCGCTGTTATCCCTADRGTAACTshore 1034AS6

Create a Phyloseq sample_data-class

META <- sample_data(metadata)

Create a Phyloseq object

Now that we have all the components, it is time to create a Phyloseq object

physeq <- phyloseq(OTU,TAX,META)
physeq
    phyloseq-class experiment-level object
    otu_table()   OTU Table:         [ 33 taxa and 11 samples ]
    sample_data() Sample Data:       [ 11 samples by 8 sample variables ]
    tax_table()   Taxonomy Table:    [ 33 taxa by 8 taxonomic ranks ]

One last thing we will do is to convert two of the metadata variables from continuous to ordinal. This will help us graph these fields later

sample_data(physeq)$temperature <- as(sample_data(physeq)$temperature, "character")

sample_data(physeq)$salinity <- as(sample_data(physeq)$salinity, "character")


Initial data inspection

Now that we have our Phyloseq object, we will take a look at it. One of the first steps is to check alpha rarefaction of species richness. This is done to show that there has been sufficient sequencing to detect most species (OTUs).

rarecurve(t(otu_table(physeq)), step=50, cex=1)

png

create a bar plot of abundance

plot_bar(physeq)

png

print(min(sample_sums(physeq)))
print(max(sample_sums(physeq)))
    [1] 8117
    [1] 20184

Rarefy the data

From the initial look at the data, it is obvious that the sample AS3 has about twice as many reads as any of the other samples. We can use rarefaction to simulate an even number of reads per sample. Rarefying the data is preferred for some analyses, though there is some debate. We will create a rarefied version of the Phyloseq object.

we will rarefy the data around 90% of the lowest sample

physeq.rarefied <- rarefy_even_depth(physeq, rngseed=1, sample.size=0.9*min(sample_sums(physeq)), replace=F)
    `set.seed(1)` was used to initialize repeatable random subsampling.
    
    Please record this for your records so others can reproduce.
    
    Try `set.seed(1); .Random.seed` for the full vector
    
    ...

now plot the rarefied version

plot_bar(physeq.rarefied)

png

Saving your work to files

You can save the Phyloseq object you just created, and then import it into another R session later. This way you do not have to re-import all the components separately.

Also, below are a couple of examples of saving graphs. There are many options for this that you can explore to create publication-quality graphics of your results

Save the phyloseq object

saveRDS(physeq, 'fish_phyloseq.rds')

Also save the rarefied version

saveRDS(physeq.rarefied, 'fish_phyloseq_rarefied.rds')

To later read the Phyloseq object into your R session:

first the original

physeq <- readRDS('fish_phyloseq.rds')
print(physeq)

Saving a graph to file

# open a pdf file
pdf('species_richness_plot.pdf')
# run the plot, or add the saved one
rarecurve(t(otu_table(physeq)), step=50, cex=1.5, col='blue',lty=2)
# close the pdf
dev.off()

pdf: 2

there are other graphic formats that you can use

jpeg("species_richness_plot.jpg", width = 800, height = 800)
rarecurve(t(otu_table(physeq)), step=50, cex=1.5, col='blue',lty=2)
dev.off()

pdf: 2

Key Points

  • Phyloseq has many functions to explore metabarcoding data

  • Results from early lessons must be converted to a matrix

  • Phyloseq is integrated with ggplot to expand graphic output