Introduction to the topic
Salvador et al. (2023) published a research article on age-dependent
immune and lymphatic responses following spinal cord injury to better
understand the lifelong disabling effects that can result from such
severe injuries (see the publication).
The study investigates how the immune and lymphatic systems respond to
spinal cord injuries in young and old mice. The researchers used
single-cell RNA sequencing (scRNA-seq), a technique that studies the
activity of individual cells by measuring which genes are turned on in
each cell.
Data
The data is publicly available on the Gene Expression Omnibus (GEO)
under the accession number GSE205038. The pre-analyzed data set
mmc4.xlsx is already provided on GitHub (data file).
To give you an idea of what the data looks like:
gene | logFC | p.adjust | ID | Description | GeneRatio | BgRatio | pvalue | p.adjust
Lars2 | 1.705496 | 0 | GO:0045766 | positive regulation of angiogenesis | 11/109 | 188/23328 | 0 | 3.2e-06
- logFC: log2 fold change in gene expression between experimental and
control conditions (positive values indicate upregulation and
negative values indicate downregulation)
- ID: identifier of the associated gene ontology (GO) term or pathway
- Description: a description of the biological process associated with
the GO term.
- GeneRatio: the number of genes from the analysis that belong to the
GO term, divided by the total number of genes in the analysis (e.g.
11/109).
- BgRatio: not relevant for the data manipulation & visualization
tasks
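In the example row, GeneRatio and BgRatio appear as fraction strings (11/109, 188/23328), so they may need converting to numbers before you sort or plot them. A minimal sketch, assuming the columns arrive in R as text; the helper name ratio_to_numeric is hypothetical:

```r
# Hypothetical helper: convert fraction strings such as "11/109" to numbers.
ratio_to_numeric <- function(x) {
  parts <- strsplit(as.character(x), "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# The two ratios from the example row
ratio_to_numeric(c("11/109", "188/23328"))
```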
Data manipulation
- The Excel data file contains multiple sheets, so the first task is
to get familiar with the data file. Find the sheet “Aged vs Young
Macrophages”.
- Extract the sheet “Aged vs Young Macrophages” so that you can load
it into R. It contains the differentially expressed genes (DEGs) and
the GO terms for different time points (naïve, 3dpi, 7dpi, 14dpi).
We are only interested in the naïve aged macrophages vs. naïve young
macrophages comparison.
- Separate the data into two data frames: one for the DEGs and one for
the GO terms. The DEGs data frame should contain the columns “gene”,
“logFC”, “p.adjust”. The GO terms data frame should contain the
columns “ID”, “Description”, “GeneRatio”, “BgRatio”, “pvalue”,
“p.adjust”. (Be careful with the column names: two columns have the
same name! Rename the columns if necessary.)
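The splitting step could look roughly like the sketch below. It uses a small toy data frame instead of the real sheet: the first row is the example row from the Data section, the second row is made up. It assumes that DEG and GO columns sit side by side and that unused cells are NA. The real layout of the sheet has to be checked by hand.

```r
# Assumption: with the readxl package installed and mmc4.xlsx in the working
# directory, the sheet could be inspected and loaded like this:
#   readxl::excel_sheets("mmc4.xlsx")
#   raw <- readxl::read_excel("mmc4.xlsx", sheet = "Aged vs Young Macrophages")
# The toy data below only mimics the nine columns of one comparison.
raw <- data.frame(
  c("Lars2", "GeneB"),                                # GeneB is made up
  c(1.705496, -0.8),
  c(0, 0.2),
  c("GO:0045766", NA),
  c("positive regulation of angiogenesis", NA),
  c("11/109", NA),
  c("188/23328", NA),
  c(0, NA),
  c(3.2e-06, NA),
  stringsAsFactors = FALSE
)
names(raw) <- c("gene", "logFC", "p.adjust",
                "ID", "Description", "GeneRatio", "BgRatio", "pvalue", "p.adjust")

# Two columns are called p.adjust; rename the second one (position 9),
# which belongs to the GO terms.
names(raw)[9] <- "p_adjust_2"

# Split into DEGs and GO terms, dropping rows that are empty in each part
degs <- raw[!is.na(raw$gene), c("gene", "logFC", "p.adjust")]
go_terms <- raw[!is.na(raw$ID),
                c("ID", "Description", "GeneRatio", "BgRatio", "pvalue", "p_adjust_2")]

str(degs)
str(go_terms)
```

Renaming by position keeps the example independent of how a reader function repairs duplicate names on import.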
Now let’s take a look at the DEGs:
- Create a new column called logFC_high that is TRUE if logFC > 2
for a gene in the “naïve aged macrophages vs. naïve young macrophages”
comparison, and FALSE otherwise.
- Filter the DEGs to include only those with p.adjust < 0.05, and
return the top 10 genes with the highest logFC. Please display the
top 10 genes in a nice table.
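The two DEG steps above can be sketched in base R as follows. The data frame is a toy example with made-up values, and the names sig and top10 are only illustrative.

```r
# Toy DEG table (made-up values); replace with your own degs data frame.
logFC <- c(3.1, 2.5, 2.2, 1.9, 1.7, 1.5, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2, -0.5, -2.4)
degs <- data.frame(
  gene = paste0("Gene", seq_along(logFC)),
  logFC = logFC,
  p.adjust = c(0.001, 0.2, 0.01, 0.03, 0.04, 0.001, 0.002,
               0.5, 0.01, 0.02, 0.03, 0.04, 0.01, 0.001)
)

# Flag genes with a log2 fold change above 2
degs$logFC_high <- degs$logFC > 2

# Keep significant genes, then take the 10 with the highest logFC
sig <- degs[degs$p.adjust < 0.05, ]
top10 <- head(sig[order(sig$logFC, decreasing = TRUE), ], 10)

print(top10, row.names = FALSE)
```

For a nicer table in a rendered report, knitr::kable(top10) is one option if the knitr package is available.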
Now we will focus on the GO terms to get a rough idea of which biological processes, rather than individual genes, are upregulated.
- Filter the upregulated GO terms to include only those whose
description contains keywords such as “angiogenesis”, “immune
response”, “immunity”, “cytokine”, “vasculature”, “wound”,
“inflammatory response”, “chemokine”, “lymphatic”, “lymphocyte”,
“macrophage”, “monocyte”.
- Remove the columns “BgRatio”, “pvalue” and “ID”.
- After these two steps you end up with the three columns “Description”,
“GeneRatio”, and “p_adjust_2” (the renamed GO-term “p.adjust” column)
that are of interest for the second visualization task.
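A minimal sketch of the keyword filter and the column removal, using a toy GO table. Only the first row comes from the example data; the other rows and their IDs are made up. It assumes go_terms already contains only the upregulated terms, since how the sheet marks them is not covered here.

```r
# Toy GO table; rows 2 to 4 are made up for illustration.
go_terms <- data.frame(
  ID = c("GO:0045766", "TOY:2", "TOY:3", "TOY:4"),
  Description = c("positive regulation of angiogenesis",
                  "Cytokine production",
                  "ribosome biogenesis",
                  "regulation of immune response"),
  GeneRatio = c("11/109", "8/109", "5/109", "9/109"),
  BgRatio = c("188/23328", "150/23328", "300/23328", "220/23328"),
  pvalue = c(0, 0.001, 0.01, 0.0001),
  p_adjust_2 = c(3.2e-06, 0.01, 0.04, 0.0004),
  stringsAsFactors = FALSE
)

keywords <- c("angiogenesis", "immune response", "immunity", "cytokine",
              "vasculature", "wound", "inflammatory response", "chemokine",
              "lymphatic", "lymphocyte", "macrophage", "monocyte")

# One regular expression that matches any of the keywords
pattern <- paste(keywords, collapse = "|")
keep <- grepl(pattern, go_terms$Description, ignore.case = TRUE)

# Keep matching rows and drop the columns that are no longer needed
drop_cols <- c("BgRatio", "pvalue", "ID")
go_filtered <- go_terms[keep, !(names(go_terms) %in% drop_cols)]

go_filtered
```

Matching case-insensitively is a design choice here, so that descriptions starting with a capital letter are not missed.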
Data visualization
- Create a volcano plot for all DEGs. On the x-axis display the log2
fold change (logFC) and on the y-axis display -log10(adjusted p-value),
i.e. the negative log10 of the adjusted p-value, for all genes. Add a
horizontal line at -log10(0.05) and vertical lines at logFC = 2 and
logFC = -2.
- Color the points based on the logFC_high status (TRUE or FALSE) and
label the points with the gene names.
- Label the axes accordingly and add a title. I added a picture to
make it easier to understand what a volcano plot is:

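The volcano plot steps can be sketched with base R graphics, chosen here only to keep the example free of extra packages. The DEG table is a toy example: the Lars2 values come from the example row, the other genes are made up. Because the example row shows an adjusted p-value of 0, the sketch assumes zeros should be replaced by the smallest non-zero value before taking the logarithm. That is one possible choice, not a fixed rule.

```r
# Toy DEG table; replace with your own degs data frame.
degs <- data.frame(
  gene = c("Lars2", "GeneB", "GeneC", "GeneD", "GeneE"),
  logFC = c(1.705496, 2.8, -2.5, 0.3, -0.9),
  p.adjust = c(0, 0.001, 0.004, 0.6, 0.03)
)
degs$logFC_high <- degs$logFC > 2

# p.adjust values of exactly 0 would give Inf on the log scale; as an
# assumption, replace them with the smallest non-zero value in the data.
p <- degs$p.adjust
p[p == 0] <- min(p[p > 0])
degs$neg_log10_p <- -log10(p)

# Colour by logFC_high status
point_col <- ifelse(degs$logFC_high, "firebrick", "grey40")

plot(degs$logFC, degs$neg_log10_p,
     col = point_col, pch = 19,
     xlab = "log2 fold change (aged vs. young)",
     ylab = "-log10(adjusted p-value)",
     main = "Naive aged vs. naive young macrophages")
abline(h = -log10(0.05), lty = 2)   # significance threshold
abline(v = c(-2, 2), lty = 2)       # fold-change thresholds
text(degs$logFC, degs$neg_log10_p, labels = degs$gene,
     pos = 3, cex = 0.7, xpd = NA)
legend("topleft", legend = c("logFC > 2", "logFC <= 2"),
       col = c("firebrick", "grey40"), pch = 19, title = "logFC_high")
```

With the real data, labelling every gene will overcrowd the plot, so labelling only a subset (for example the top 10 genes) may be more readable.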
- Create a dot plot with the GO terms on the y-axis, sorted by
descending GeneRatio, and the GeneRatio on the x-axis.
- Use a single-color gradient to indicate significance based on the
adjusted p-value (less significant terms in a lighter shade, more
significant terms in a darker shade).
- Include a clear legend explaining the color intensity
(significance).
- If you can find the time: present the GeneRatio as the size of the
dots in the dot plot.
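A rough base R sketch of the dot plot, again on a toy table. The first row is from the example data and the two "toy term" rows are made up. The colour names, the scaling of the dot sizes and the margin width are arbitrary choices for illustration.

```r
# Toy filtered GO table; replace with your own filtered GO terms.
go <- data.frame(
  Description = c("positive regulation of angiogenesis",
                  "toy term: cytokine production",
                  "toy term: regulation of immune response"),
  GeneRatio = c("11/109", "8/109", "15/109"),
  p_adjust_2 = c(3.2e-06, 0.01, 0.0004),
  stringsAsFactors = FALSE
)

# Convert "k/n" strings to numbers
go$ratio <- vapply(strsplit(go$GeneRatio, "/", fixed = TRUE),
                   function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))

# Ascending order, so the largest GeneRatio ends up at the top of the y-axis
go <- go[order(go$ratio), ]
y <- seq_len(nrow(go))

# Single-colour gradient: light = less significant, dark = more significant
signif_score <- -log10(go$p_adjust_2)
pal <- colorRampPalette(c("lightblue", "darkblue"))(100)
rng <- range(signif_score)
idx <- if (diff(rng) > 0) {
  1 + round(99 * (signif_score - rng[1]) / diff(rng))
} else {
  rep(100, length(signif_score))
}

old_par <- par(mar = c(5, 20, 4, 2))   # wide left margin for long labels
plot(go$ratio, y, yaxt = "n", ylab = "", xlab = "GeneRatio",
     pch = 19, col = pal[idx],
     cex = 1 + 3 * go$ratio / max(go$ratio),   # dot size follows GeneRatio
     ylim = c(0.5, nrow(go) + 0.5),
     main = "GO terms, aged vs. young")
axis(2, at = y, labels = go$Description, las = 1, cex.axis = 0.8)
legend("bottomright", legend = c("less significant", "more significant"),
       col = pal[c(1, 100)], pch = 19, title = "-log10(adj. p)", cex = 0.8)
par(old_par)
```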
The data wrangling and the volcano plot will certainly take up a lot of
time, so treat the dot plot as an optional task for when you still have
time to spare.
I’m looking forward to your solutions! Good luck!