      Single Cell RNA-Seq Analysis


Create a new RStudio project

Open RStudio and create a new project (for more information, see Using-Projects):

File > New Project > New Directory > New Project

Name the new workshop directory (e.g. scRNA_analysis), and check “use renv with this project” if present.

Learn more about renv.

Install packages

One of R’s many benefits is the large, active user community, which produces and maintains many packages that extend the functionality of base R and provide functions that enable bioinformatic analyses without completely custom code.

The following package installation commands should be run individually in the R console. Many of them will ask which, if any, dependencies should be updated; for the quickest result, try answering ‘n’ (none) first.

R-universe for arm64 installations

r-universe is a new umbrella project by rOpenSci. It uses cross-compiling for arm64 binaries.

If you are using a Mac with an M1/M2/M3 chip and package installation fails with an error similar to “ld: warning: ignoring file ‘/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libR.dylib’: found architecture ‘arm64’, required architecture ‘x86_64’”, go to https://r-universe.dev/search/, search for the affected package, and follow the installation instructions provided there.
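Before turning to r-universe, it can help to see which architecture your R build targets and which one the machine reports. The following is a minimal base R sketch; the helper name `is_arm64_r` is hypothetical, and a mismatch between the two values is only one possible cause of the error quoted above.

```r
# Architecture that this R build was compiled for (e.g. "aarch64" or "x86_64").
r_arch <- R.version$arch

# Architecture reported by the operating system (e.g. "arm64" or "x86_64").
machine <- Sys.info()[["machine"]]

# Hypothetical helper: TRUE when the given architecture string denotes arm64.
is_arm64_r <- function(arch = R.version$arch) {
  arch %in% c("aarch64", "arm64")
}

message("R build architecture: ", r_arch)
message("Machine architecture: ", machine)
message("arm64 R build: ", is_arm64_r())
```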

BiocManager

BiocManager is an interface to Bioconductor, the bioinformatics-specific R package repository. We will be using BiocManager to install other packages when possible, rather than the base R function install.packages.

if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
}
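The install-if-missing pattern used throughout this page can also be wrapped in a small helper. This is a sketch only: `install_if_missing` and its `installer` argument are hypothetical names, not part of BiocManager or the workshop materials, and the default installer assumes BiocManager is already installed.

```r
# Hypothetical helper: install `pkg` only when its namespace cannot be found.
# `installer` is any function that takes a package name; it defaults to
# BiocManager::install, which is only evaluated if an install is needed.
install_if_missing <- function(pkg,
                               installer = function(p) BiocManager::install(p)) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # Invisibly report whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call performs no installation.
install_if_missing("stats")
```

Passing a different `installer`, for example `utils::install.packages`, would cover packages that are installed from CRAN rather than through BiocManager.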

rmarkdown

The rmarkdown package, when used with others like tinytex and knitr, allows you to knit your Rmd document to nicely-formatted reports.

if (!any(rownames(installed.packages()) == "rmarkdown")){
  BiocManager::install("rmarkdown")
}
library(rmarkdown)

tinytex

TinyTeX is a small LaTeX distribution for use with R.

if (!any(rownames(installed.packages()) == "tinytex")){
  BiocManager::install("tinytex")
}
library(tinytex)

knitr

if (!any(rownames(installed.packages()) == "knitr")){
  BiocManager::install("knitr")
}
library(knitr)

kableExtra

The kableExtra package gives the user fine-grained control over table formats. This is useful for knit reports.

if (!any(rownames(installed.packages()) == "kableExtra")){
  BiocManager::install("kableExtra")
}
library(kableExtra)

ggplot2

An extremely popular package by the authors of RStudio, ggplot2 produces highly customizable plots.

if (!any(rownames(installed.packages()) == "ggplot2")){
  BiocManager::install("ggplot2")
}
library(ggplot2)

dplyr

Like ggplot2 and tidyr, dplyr is part of the “tidyverse” by the RStudio authors: a group of packages designed for data analysis and visualization.

if (!any(rownames(installed.packages()) == "dplyr")){
  BiocManager::install("dplyr")
}
library(dplyr)

tidyr

if (!any(rownames(installed.packages()) == "tidyr")){
  BiocManager::install("tidyr")
}
library(tidyr)

viridis

viridis produces accessible color palettes.

if (!any(rownames(installed.packages()) == "viridis")){
  BiocManager::install("viridis")
}
library(viridis)

hdf5r

HDF5 (hierarchical data format version five) files can be used to store single cell expression data (including output from Cell Ranger). The hdf5r package provides utilities for interacting with the format.

if (!any(rownames(installed.packages()) == "hdf5r")){
  BiocManager::install("hdf5r")
}
library(hdf5r)

Seurat

Seurat is an extensive package for the analysis of single cell experiments, from normalization to visualization.

if (!any(rownames(installed.packages()) == "Seurat")){
  BiocManager::install("Seurat")
}
library(Seurat)

ComplexHeatmap

ComplexHeatmap produces beautiful, highly-customizable heat maps.

if (!any(rownames(installed.packages()) == "ComplexHeatmap")){
  BiocManager::install("ComplexHeatmap")
}
library(ComplexHeatmap)

biomaRt

This package provides an interface to Ensembl databases.

if (!any(rownames(installed.packages()) == "biomaRt")){
  BiocManager::install("biomaRt")
}
library(biomaRt)

org.Hs.eg.db

org.Hs.eg.db contains genome-wide annotation for the human genome, based on Entrez Gene identifiers.

if (!any(rownames(installed.packages()) == "org.Hs.eg.db")){
  BiocManager::install("org.Hs.eg.db")
}
library(org.Hs.eg.db)

limma

Originally developed for microarray data, limma provides functions for linear modeling and differential expression.

if (!any(rownames(installed.packages()) == "limma")){
  BiocManager::install("limma")
}
library(limma)

topGO

Test gene ontology (GO) term enrichment while accounting for the topology of the GO graph.

if (!any(rownames(installed.packages()) == "topGO")){
  BiocManager::install("topGO")
}
library(topGO)

remotes

Some packages (or versions of packages) cannot be installed through Bioconductor. The remotes package contains tools for installing packages from a number of repositories, including GitHub.

if (!any(rownames(installed.packages()) == "remotes")){
  utils::install.packages("remotes")
}
library(remotes)

ape

Analyses of Phylogenetics and Evolution (ape) is used to generate and manipulate phylogenetic trees. In this workshop, we will be using ape to investigate the relationships between clusters.

if (!any(rownames(installed.packages()) == "ape")){
  utils::install.packages("ape")
}
library(ape)

DoubletFinder

DoubletFinder detects multiplets within single cell or nucleus data.

if (!any(rownames(installed.packages()) == "DoubletFinder")){
  remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
}
library(DoubletFinder)

openxlsx

The openxlsx package is a suite of tools for reading and writing .xlsx files.

if (!any(rownames(installed.packages()) == "openxlsx")){
  BiocManager::install("openxlsx")
}
library(openxlsx)

HGNChelper

Both R and Excel can introduce changes to gene symbols. HGNChelper can correct gene symbols that have been altered, and convert gene symbols to valid R names.

if (!any(rownames(installed.packages()) == "HGNChelper")){
  BiocManager::install("HGNChelper")
}
library(HGNChelper)

Expression matrix

Move the downloaded dataset to the RStudio Project folder.

If the zip file has not been extracted yet, you can extract it from the RStudio terminal.

In RStudio, navigate to the Terminal tab (next to the Console tab). This gives you access to a bash terminal. Run the following code:

unzip 01-CellRanger.zip

When the download and extraction are complete, you should see the 01-CellRanger folder in your project directory.
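The result of the extraction can also be checked from the R console. This is a minimal sketch that assumes the R working directory is the project folder; `check_dataset_dir` is a hypothetical helper name.

```r
# Hypothetical helper: report whether a directory exists and list its contents.
check_dataset_dir <- function(path) {
  found <- dir.exists(path)
  if (found) {
    message("Found ", path, " containing:")
    print(list.files(path))
  } else {
    message("Could not find ", path, " in ", getwd())
  }
  invisible(found)
}

# Folder name taken from the unzip step above.
check_dataset_dir("01-CellRanger")
```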

Download materials and prepare for the next section

In the R console, run the following command to download the template for part 1 of the data analysis.

Markdown template document

download.file("https://raw.githubusercontent.com/ucsf-cat-bioinformatics/2024-08-SCRNA-Seq-Analysis/main/data_analysis/01-create_object.Rmd", "01-create_object2.Rmd")
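If you rerun this step, the download can be skipped when the file is already present. The wrapper below is an illustrative sketch; `fetch_if_absent` is a hypothetical name and is not part of the workshop scripts.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is absent.
fetch_if_absent <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile)
  }
  # Invisibly report whether the file is present afterwards.
  invisible(file.exists(destfile))
}

# Usage with the template above (left commented out to avoid a network call):
# fetch_if_absent(
#   "https://raw.githubusercontent.com/ucsf-cat-bioinformatics/2024-08-SCRNA-Seq-Analysis/main/data_analysis/01-create_object.Rmd",
#   "01-create_object2.Rmd"
# )
```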

Verify installation

Finally, we can get the session info to ensure that all of the packages were installed and loaded correctly.

sessionInfo()
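Reading through the sessionInfo() output by eye is easy to get wrong, so a programmatic check may be useful as well. The sketch below lists any workshop packages whose namespaces cannot be loaded; `find_missing` is a hypothetical helper, and the package vector simply repeats the names installed on this page.

```r
# Packages installed in the sections above.
workshop_pkgs <- c(
  "BiocManager", "rmarkdown", "tinytex", "knitr", "kableExtra", "ggplot2",
  "dplyr", "tidyr", "viridis", "hdf5r", "Seurat", "ComplexHeatmap",
  "biomaRt", "org.Hs.eg.db", "limma", "topGO", "remotes", "ape",
  "DoubletFinder", "openxlsx", "HGNChelper"
)

# Hypothetical helper: return the packages whose namespace cannot be loaded.
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

missing_pkgs <- find_missing(workshop_pkgs)
if (length(missing_pkgs) == 0) {
  message("All workshop packages are available.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be reinstalled with the corresponding command from the sections above.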