This lesson is still being designed and assembled (Pre-Alpha version)

Genotype by Sequencing: Glossary

Key Points

Data101: From raw data to individual sample files
  • Raw data should always be checked with FastQC.

  • Assigning reads to specific samples is called demultiplexing.

  • process_radtags is the built-in demultiplexing tool of Stacks, and it includes some basic quality control.
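To make the idea of demultiplexing concrete, here is a minimal Python sketch that assigns reads to samples by an inline barcode at the start of each sequence. It is not how process_radtags is implemented: the barcode table, sample names, and the `demultiplex` function are hypothetical, and it performs no quality control or barcode rescue.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real barcodes come from your library prep.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes):
    """Group (name, sequence) reads by the barcode at the start of each sequence."""
    by_sample = defaultdict(list)
    for name, seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Strip the barcode so only the biological sequence remains.
                by_sample[sample].append((name, seq[len(barcode):]))
                break
        else:
            # No barcode matched exactly: keep the read aside, untouched.
            by_sample["unassigned"].append((name, seq))
    return dict(by_sample)


reads = [("r1", "ACGTTTGACC"), ("r2", "TGCAGGATCC"), ("r3", "NNNNGGATCC")]
result = demultiplex(reads, BARCODES)
for sample, sample_reads in result.items():
    print(sample, sample_reads)
```

Reads whose first bases match no known barcode end up in a separate "unassigned" group rather than being silently dropped, which makes it easy to see how much data is lost at this step.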

De-novo assembly without a reference genome
  • -M is the main parameter to optimise when identifying variants de novo using Stacks.

  • Optimisation can often be performed with a subset of the data.

  • SLURM scripts let you harness the cluster’s resources by submitting work as batch jobs.
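In Stacks, -M sets the maximum number of mismatches allowed between stacks within an individual. The Python sketch below only illustrates what such a threshold means for two equal-length sequences; the function names and example sequences are hypothetical, and the real Stacks algorithm is considerably more involved.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def same_locus(a, b, max_mismatches):
    """Treat two stacks as alleles of one locus if they differ at most max_mismatches times."""
    return mismatches(a, b) <= max_mismatches


stack_1 = "ACGTACGTAC"
stack_2 = "ACGTTCGTAA"  # differs from stack_1 at two positions

# Try several thresholds to see where the two stacks start being merged.
for m in (1, 2, 3):
    print(m, same_locus(stack_1, stack_2, m))
```

With a threshold of 1 the two stacks are kept as separate loci, while thresholds of 2 or more merge them. This is the trade-off that optimisation explores: too low a value splits true alleles apart, too high a value merges unrelated loci.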

Assembly with a reference genome
  • Reference genomes, even of poor quality or from a related species, are great for SNP identification.

  • Reference-based SNP calling removes the guesswork about distances between and within loci by mapping reads to specific locations in the genome.

Population genetics analyses
  • SNP filtering is about balancing signal vs noise.

  • The populations program in Stacks handles SNP filtering.

  • Principal component analysis (PCA) and Structure are easy-to-use tools for visualising your samples.
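The signal-versus-noise balance in SNP filtering can be sketched with two common criteria: how many samples were genotyped at a SNP, and how frequent its minor allele is. The Python example below is an illustration only, not the populations implementation; the `filter_snps` function, the genotype encoding, and both default thresholds are assumptions chosen for the example rather than Stacks defaults.

```python
def filter_snps(genotypes, min_call_rate=0.8, min_maf=0.05):
    """Keep SNPs genotyped in enough samples and whose minor allele is not too rare.

    genotypes maps a SNP id to a list of per-sample alternate-allele counts
    (0, 1 or 2), with None for a missing genotype.
    """
    kept = []
    for snp, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        if not called or len(called) / len(calls) < min_call_rate:
            continue  # too much missing data
        alt_freq = sum(called) / (2 * len(called))
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            kept.append(snp)
    return kept


# Hypothetical genotypes for five samples at three SNPs.
genotypes = {
    "snp1": [0, 1, 2, 1, 0],        # fully genotyped, common variant
    "snp2": [0, None, None, 1, 0],  # genotyped in only 3 of 5 samples
    "snp3": [0, 0, 0, 0, 0],        # monomorphic, carries no signal
}
kept = filter_snps(genotypes)
print(kept)
```

Stricter thresholds remove more noise but also discard real variants, so it is worth checking how many SNPs survive at a few different settings before running downstream analyses.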