Getting Started Guides for Coding and Computational Biology

This fall I realized that a lot of the knowledge I started taking for granted at some point during my PhD was not, in fact, universal. One of my favorite things about MIT was that almost no information lived exclusively in someone’s brain; nearly all of it was accessible somewhere online. In that spirit, I started putting together written guides to whatever I happen to know, initially intended for anyone in our lab and now intended for you as well. I’m not reinventing the wheel or making anything new, just pointing newcomers to resources they can use to get started.

Here are the getting started guides you can read so far:

  • Getting started running viral-ngs workflows in Terra for no-code, scalable sequence analysis
    Our lab’s sequence analysis toolkit and codebase, called viral-ngs, is housed in Terra, a smart, no-code interface for running analyses on any number of samples. viral-ngs is our lab’s collection of frequently used, modular, generalizable computational workflows. My three favorite workflows are demux_plus, assemble_refbased, and align_and_plot….
  • Getting started using R for data analysis and visualization
    Our lab is split pretty evenly between people who use Python for data analysis and visualization and people who use R. I strongly prefer R, because it was made specifically for data analysis, but there is no wrong answer. R is especially powerful because of the added functionality of its libraries, such as dplyr and ggplot2….
  • Getting started using regular expressions for pattern matching and smart find and replace
    Regular expressions (nicknamed regex) are an extremely powerful (and in my opinion vastly underused) tool for pattern matching and find and replace. Here are some things I have recently used regex for: changing the format of dates from 10/5/2022 to 2022-10-05; scraping a web page to retrieve the first image appearing in each web page it links to….
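
To give a taste of the date example above, here is a minimal sketch in Python using the standard re module. It assumes the dates are written month/day/year (so 10/5/2022 means October 5), and the function name to_iso_date is just something I made up for illustration; the guide itself may do this differently.

```python
import re

# Matches M/D/YYYY or MM/DD/YYYY, capturing month, day, and year.
# Assumption: dates are in month/day/year order.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def to_iso_date(text):
    """Rewrite every M/D/YYYY date in text as YYYY-MM-DD."""
    def reformat(match):
        month, day, year = match.groups()
        # Zero-pad month and day to two digits
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return DATE_PATTERN.sub(reformat, text)

print(to_iso_date("Sample collected 10/5/2022"))
# prints: Sample collected 2022-10-05
```

The same pattern works in most editors that support regex find and replace, though the zero-padding step needs a little code like the helper above.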

More soon….