Data Wrangling and Processing for Genomics: Glossary

Key Points

Background and Metadata
  • It’s important to record and understand your experiment’s metadata.

Assessing Read Quality
  • Quality encodings vary across sequencing platforms.

  • for loops let you perform the same set of operations on multiple files with a single command.
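A minimal sketch of the for-loop idea above. The sample names and the scratch directory are hypothetical stand-ins, not files from the lesson data; the loop body only counts reads, but any per-file command could take its place.

```shell
# Create a scratch directory holding three tiny, made-up FASTQ files.
workdir=$(mktemp -d)
for sample in sampleA sampleB sampleC; do
    printf '@read1\nACGT\n+\nIIII\n' > "$workdir/$sample.fastq"
done

# Perform the same operations on every FASTQ file with one loop.
summary=""
for filename in "$workdir"/*.fastq; do
    name=$(basename "$filename" .fastq)   # strip the directory and extension
    lines=$(wc -l < "$filename")          # a FASTQ record spans four lines
    reads=$((lines / 4))
    echo "$name: $reads read(s)"
    summary="$summary$name=$reads "
done
```

Quoting "$filename" keeps the loop safe for file names that contain spaces.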

Trimming and Filtering
  • The options you set for command-line tools matter: they change what the tool does to your data.

  • Data cleaning is an essential step in a genomics workflow.

Variant Calling Workflow
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most of the powerful bioinformatic tools, you’ll need the command line.

  • There are many different file formats for storing genomics data. It’s important to understand what type of information is contained in each file, and how it was derived.

Automated Version Control
  • Version control is like an unlimited ‘undo’.

  • Version control also allows many people to work in parallel.

Setting up Version Control with Git
  • Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
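As an illustration, a one-time setup might look like the sketch below. The name, email address, and editor are placeholders. The first lines are an assumption added for safety: they point HOME at a scratch directory so the sketch does not overwrite real settings. Omit them to configure your own machine.

```shell
# Assumption: use a scratch HOME so this sketch leaves real settings untouched.
HOME=$(mktemp -d)
export HOME
export XDG_CONFIG_HOME="$HOME/.config"

# Placeholder identity and editor; replace with your own values.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global core.editor "nano -w"

# Review what was stored.
git config --global --list
```

Because of --global, these values apply to every repository for this user on this machine.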

Creating a Repository
  • git init initializes a repository.

  • Git stores all of its repository data in the .git directory.
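The two points above can be checked with a short sketch. It assumes a throwaway directory created with mktemp rather than a real project folder.

```shell
# Work in a scratch directory (hypothetical project location).
repo=$(mktemp -d)
cd "$repo"

# Turn the directory into a Git repository.
git init

# The hidden .git directory now holds all of the repository data.
ls -a
```

Deleting the .git directory would discard the project's entire history, so it is best left alone.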

Tracking Changes
  • git status shows the status of a repository.

  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).

  • git add puts files in the staging area.

  • git commit saves the staged content as a new commit in the local repository.

  • Write a commit message that accurately describes your changes.
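The cycle described above (working directory, staging area, local repository) can be sketched as follows. The file name, its contents, and the repository-local identity are placeholders chosen to keep the example self-contained.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Repository-local placeholder identity so the commit can be recorded.
git config user.name "Your Name"
git config user.email "you@example.com"

# A new file starts out in the working directory only.
echo "Notes on the variant calling workflow" > notes.txt
git status --short     # ?? notes.txt (untracked)

# git add places it in the staging area.
git add notes.txt
git status --short     # A  notes.txt (staged)

# git commit records the staged content in the local repository.
git commit --quiet -m "Add notes on the variant calling workflow"
git log --oneline
```

The commit message says what changed, so the one-line log stays readable later.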

Ignoring Things
  • The .gitignore file tells Git what files to ignore.
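A small sketch of how ignore patterns behave. The file names and the two patterns are hypothetical examples of large or derived files one might not want to track.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Made-up project contents: a script, an alignment file, and a results folder.
mkdir results
touch analysis.sh sample.bam results/calls.vcf

# Ignore every .bam file and everything under results/.
printf '*.bam\nresults/\n' > .gitignore

# Only .gitignore and analysis.sh are reported as untracked.
git status --short

# git check-ignore prints the paths that match an ignore pattern.
git check-ignore sample.bam results/calls.vcf
```

Committing the .gitignore file itself shares the same rules with collaborators.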

Automating a Variant Calling Workflow
  • We can combine multiple commands into a shell script to automate a workflow.

  • Use echo statements within your scripts to get an automated progress update.
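The sketch below shows the shape such a script might take. It is not the lesson's actual workflow: the sample names are made up, and a simple read count stands in for the real alignment and variant calling commands, which would need the corresponding tools installed.

```shell
#!/usr/bin/env bash
# Hypothetical input: two tiny FASTQ files in a scratch directory.
workdir=$(mktemp -d)
for sample in sampleA sampleB; do
    printf '@read1\nACGT\n+\nIIII\n' > "$workdir/$sample.fastq"
done
mkdir -p "$workdir/results"

for fq in "$workdir"/*.fastq; do
    base=$(basename "$fq" .fastq)
    echo "Working with sample $base"

    # Stand-in step: a real script would run its analysis commands here.
    echo "  counting reads in $base.fastq"
    reads=$(( $(wc -l < "$fq") / 4 ))
    echo "$reads" > "$workdir/results/$base.count.txt"

    echo "  finished $base"
done
echo "All samples processed"
```

The echo lines cost nothing at run time and show which sample a long job has reached.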

Glossary
