
Trimming and Filtering reads

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What steps are required for basic data cleaning in RNA-seq studies?

Objectives
  • Understand the various data cleaning steps for RNA-seq data.

  • Learn how to perform adapter removal and quality trimming.

Cleaning Reads

In the previous section, we took a high-level look at the quality of each of our samples using FastQC. We visualized per-base quality graphs showing the distribution of read quality at each base across all reads in a sample and extracted information about which samples fail which quality checks. Some of our samples failed quite a few quality metrics used by FastQC. This doesn’t mean, though, that our samples should be thrown out! It’s very common to have some quality metrics fail, and this may or may not be a problem for your downstream application.

Adapter removal

We will use a program called cutadapt to remove adapter sequences and trim poor-quality bases from the reads in our samples.

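After QC, we can act on the FASTQ files in two ways:

  • Quality trimming: remove low-quality bases from the ends of reads, using a Phred score threshold.

  • Adapter trimming: remove adapter sequences from reads that contain them.

The cutadapt application is often used to remove adapter sequences from FASTQ files, and it can apply quality trimming in the same run.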

$ pwd
/home/[Your_Username]/obss_2021/RNA_seq

$ mkdir Trimmed

$ module load cutadapt/2.10-gimkl-2020a-Python-3.8.2

$ cutadapt -q 20 -a AACCGGTT -o Trimmed/SRR014335-chr1_cutadapt.fastq Raw/SRR014335-chr1.fastq > Trimmed/SRR014335-chr1.log
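In the command above, -q 20 sets the Phred quality cutoff used for trimming, -a gives the adapter sequence to remove (replace AACCGGTT with the adapter used in your own library preparation), and -o names the output file. To see what a cutoff of 20 means, recall that a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below prints that probability for a few scores. The helper name phred_to_error is our own and is not part of cutadapt.

```shell
# Convert a Phred quality score Q into the probability that a base call is wrong:
#   P = 10^(-Q/10)
# phred_to_error is a hypothetical helper for illustration only.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Print the error probability for some commonly used cutoffs.
for q in 10 20 30 40; do
    printf 'Q%s\t%s\n' "$q" "$(phred_to_error "$q")"
done
```

A score of 20 therefore corresponds to roughly one wrong base call in 100, and a score of 30 to one in 1000.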

We can have a look at the log file to see what cutadapt has done.

$ less Trimmed/SRR014335-chr1.log
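Alongside the log, a quick sanity check is to count the reads before and after trimming. The sketch below assumes standard FASTQ files with exactly four lines per read. The helper name count_reads is our own. Because the command above sets no minimum read length, we would expect cutadapt to shorten reads rather than discard them, so the two counts should match.

```shell
# Count the reads in a FASTQ file, assuming four lines per record
# (header, sequence, separator, qualities).
# count_reads is a hypothetical helper for illustration only.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Example usage with the files from this lesson (run from the RNA_seq directory):
#   count_reads Raw/SRR014335-chr1.fastq
#   count_reads Trimmed/SRR014335-chr1_cutadapt.fastq
```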

Now we should trim all samples. We can do this with a for loop over the FASTQ files in the Raw directory.

$ cd Raw

$ ls
SRR014335-chr1.fastq  SRR014336-chr1.fastq  SRR014337-chr1.fastq  SRR014339-chr1.fastq  SRR014340-chr1.fastq  SRR014341-chr1.fastq

$ for filename in *.fastq
> do
> base=$(basename "${filename}" .fastq)
> cutadapt -q 20 -a AACCGGTT -o "../Trimmed/${base}.trimmed.fastq" "${filename}" > "../Trimmed/${base}.log"
> done
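Before running a loop like this on many samples, it can help to print the commands first and check that the input and output names look right. The following is a minimal dry-run sketch, not part of the original workflow. The helper name print_trim_commands is our own, and it only echoes the cutadapt commands without running them.

```shell
# Dry run: print the cutadapt command for each FASTQ file in a directory
# instead of executing it.
# print_trim_commands is a hypothetical helper for illustration only.
#   $1 = directory containing the raw .fastq files
#   $2 = directory where trimmed files and logs would be written
print_trim_commands() {
    for filename in "$1"/*.fastq; do
        # If no .fastq file exists, the glob stays unexpanded; skip it.
        [ -e "$filename" ] || continue
        base=$(basename "$filename" .fastq)
        echo "cutadapt -q 20 -a AACCGGTT -o $2/$base.trimmed.fastq $filename > $2/$base.log"
    done
}

# From the RNA_seq directory this lists one command per sample.
print_trim_commands Raw Trimmed
```

Once the printed commands look correct, the loop above can be run for real.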

MultiQC: cutadapt log files

$ cd ../MultiQC
 
$ cp ../Trimmed/*log .

$ multiqc .


Key Points

  • Adapter removal and trimming (optional) are important steps in processing RNA-seq data.