Data101: From raw data to individual sample files
|
Raw data should always be checked with FastQC.
Assigning reads to specific samples is called demultiplexing
process_radtags is the built-in demultiplexing tool of Stacks, and it also performs some basic quality control
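
To make the idea of demultiplexing concrete, here is a minimal sketch of assigning reads to samples by an inline barcode at the start of each read. It is not how process_radtags is implemented: it only does exact matching, with no barcode rescue, restriction-site check, or quality filtering. The barcodes, sample names, and reads are made up for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample using its leading inline barcode.

    reads: iterable of (read_id, sequence) tuples.
    barcodes: dict mapping barcode -> sample name (hypothetical values).
    Returns (per_sample, unassigned); per_sample maps each sample name
    to a list of (read_id, sequence_without_barcode).
    """
    per_sample = defaultdict(list)
    unassigned = []
    for read_id, seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Strip the barcode so only the genomic sequence remains.
                per_sample[sample].append((read_id, seq[len(barcode):]))
                break
        else:
            # No barcode matched exactly: keep the read aside.
            unassigned.append((read_id, seq))
    return dict(per_sample), unassigned


# Toy data, invented for this example.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = [
    ("read1", "ACGTTTGACC"),
    ("read2", "TGCAGGATCC"),
    ("read3", "GGGGAAAATT"),
]
per_sample, unassigned = demultiplex(reads, barcodes)
```

Here `read1` and `read2` end up under `sample_A` and `sample_B` with their barcodes trimmed, while `read3` matches no barcode and is left unassigned.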
|
De novo assembly without a reference genome
|
-M is the main parameter to optimise when identifying variants de novo with Stacks
Optimisation can often be performed with a subset of the data
SLURM scripts let you harness the cluster’s computing power by submitting jobs to run on it
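
The points above (optimise -M, on a subset of the data, as cluster jobs) can be sketched as a small helper that picks a reproducible subset of samples and builds one `denovo_map.pl` command per value of -M. This is only an illustration under stated assumptions: the sample names, paths, range of -M, and thread count are hypothetical, and setting -n equal to -M is a common convention rather than something this lesson prescribes. The script only builds the command strings; it does not run Stacks.

```python
import random
import shlex


def subsample(samples, k, seed=42):
    """Pick k samples reproducibly for a quick optimisation run."""
    rng = random.Random(seed)
    return sorted(rng.sample(samples, k))


def denovo_commands(m_values, samples_dir, popmap, out_root, threads=4):
    """Build one denovo_map.pl command line per candidate value of -M.

    All paths are placeholders; -n is set equal to -M here by assumption.
    """
    commands = []
    for m in m_values:
        args = [
            "denovo_map.pl",
            "--samples", samples_dir,
            "--popmap", popmap,
            "--out-path", f"{out_root}/M{m}",
            "-M", str(m),
            "-n", str(m),
            "-T", str(threads),
        ]
        commands.append(shlex.join(args))
    return commands


# Hypothetical sample names and paths, for illustration only.
all_samples = [f"sample_{i:02d}" for i in range(1, 21)]
subset = subsample(all_samples, 5)
commands = denovo_commands(range(1, 4), "samples", "popmap_subset.txt", "opt")
for cmd in commands:
    print(cmd)
```

Each printed line could then be placed in its own SLURM job script so that the different values of -M run in parallel on the cluster.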
|
Assembly with a reference genome
|
Reference genomes, even of poor quality or from a related species, are great for SNP identification
Reference-based SNP calling removes the guesswork about distances between and within loci by mapping reads to individual locations in the genome
|
Population genetics analyses
|
SNP filtering is about balancing signal vs noise
populations is the Stacks program that handles SNP filtering
Principal component analysis (PCA) and Structure are easy visualisation tools for your samples
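
The signal-versus-noise trade-off in SNP filtering can be illustrated with a toy filter that drops SNPs genotyped in too few samples or with a very rare minor allele. This is a simplified sketch, not the filtering logic of the Stacks populations program; the thresholds, the genotype encoding (0, 1, or 2 copies of the alternative allele, `None` for missing), and the data are assumptions made for the example.

```python
def filter_snps(genotypes, min_call_rate=0.8, min_maf=0.05):
    """Keep SNPs that pass a call-rate and a minor-allele-frequency filter.

    genotypes: dict mapping SNP id -> list of per-sample alt-allele counts
               (0, 1, 2) with None for a missing genotype.
    Thresholds are illustrative defaults, not recommendations.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # no genotype at all: nothing to estimate
        call_rate = len(observed) / len(calls)
        if call_rate < min_call_rate:
            continue  # too much missing data (noise)
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if maf < min_maf:
            continue  # nearly monomorphic: little signal
        kept.append(snp_id)
    return kept


# Toy genotype table, invented for this example.
genotypes = {
    "snp1": [0, 1, 2, 1, 0],        # fully genotyped, alt frequency 0.4
    "snp2": [0, None, None, 1, 0],  # only 3 of 5 samples genotyped
    "snp3": [0, 0, 0, 0, 0],        # monomorphic
}
kept = filter_snps(genotypes)
```

Stricter thresholds remove more noise but also discard more loci, so it is worth checking how many SNPs survive each setting before running downstream analyses.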
|