Skip to content

Diagnostic functions to assess the quality of cell type annotations in single-cell RNA-seq data

Notifications You must be signed in to change notification settings

ccb-hms/scDiagnostics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

scDiagnostics: diagnostic functions to assess the quality of cell type annotations in single-cell RNA-seq data

Overview

The accurate annotation of cell types is a critical step in single-cell RNA-sequencing (scRNA-seq) analysis. While annotation transfer from a reference dataset offers a convenient and automated approach, it can also introduce biases and errors if not performed carefully.

scDiagnostics is an R package designed to address this challenge by providing a comprehensive set of diagnostic tools for evaluating the quality of cell type annotations in scRNA-seq data. With scDiagnostics, researchers can systematically assess the compatibility and accuracy of annotations, ensuring reliable and reproducible results in their scRNA-seq analysis workflow.

Installation

Release Version

To install the release version of the package from Bioconductor use the following command:

BiocManager::install("scDiagnostics")

NOTE: you will need the BiocManager package to install from Bioconductor.

To build the package vignettes upon installation use:

BiocManager::install("scDiagnostics",
                     build_vignettes = TRUE,
                     dependencies = TRUE)

Development Version

To install the development version of the package from GitHub use the following command:

BiocManager::install("ccb-hms/scDiagnostics")

To build the package vignettes upon installation use:

BiocManager::install("ccb-hms/scDiagnostics",
                     build_vignettes = TRUE,
                     dependencies = TRUE)

Usage

See the scDiagnostics website for an overview of the functionality of the package. The individual documentation of each available function incl. usage details and executable examples is available in the reference tab.

Key Features

See the Getting Started with scDiagnostics vignette for an overview of the main functionality of the package. As listed below, additional vignettes are available for specific diagnostic tasks.

  • Visualization of Cell Type Annotations: Illustrates the distributions of cell type annotations of the query and reference dataset, allowing the user to identify potential differences in the cell type composition between datasets.

  • Evaluation of QC and Annotation Scores: Provides functionality for assessing the impact of frequently used QC criteria on the cell type annotation confidence, allowing the user to identify systematic relationships between QC metrics and the predicted cell type categories.

  • Evaluation of Dataset and Marker Gene Alignment: Provides functionality for assessing dataset alignment through quantitative comparison of query-to-reference projections in reduced dimension space. Additional functionality for assessing marker gene expression across datasets allows the user to identify potential misalignments between reference and query on the level of individual genes.

  • Statistical Measures to Assess Dataset Alignment: Utilizes statistical tests and metrics to quantitatively assess the alignment between the query and reference datasets. Key measures include p-values from Hotelling's T-squared test and Cramer's V statistic, which evaluate the degree of similarity or dissimilarity in cell type distributions and principal component projections. These statistical assessments help in determining the robustness of dataset alignment and highlight any significant differences that may impact the reliability of cell type annotations.

  • Detection of Annotation Anomalies: Focuses on identifying inconsistencies or anomalies in cell type annotations between the query and reference datasets through comparison of expert annotations with annotations derived from automated methods. By highlighting discrepancies that could be indicative of potential errors, this feature aids in refining and improving the accuracy and reliability of cell type classifications.

  • Analysis of Distances Between Specific Cells and Cell Populations: Calculates distances or similarities between individual cells and predefined cell types in both the query and reference datasets. This analysis helps determine how closely each cell in the query dataset matches the cell types defined in the reference dataset. By providing insights into cell type classification accuracy and identifying potential mismatches, this functionality supports more precise annotation and aids in detecting areas that may require further investigation or refinement.

About

Diagnostic functions to assess the quality of cell type annotations in single-cell RNA-seq data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages