Get top N highly variable genes (HVGs) by a specified metric.

The available HVG metric types are set in the 02_norm_clustering.yaml and 01_integration.yaml configs; see the corresponding sections there.

Usage

get_top_hvgs(
  sce_norm,
  hvg_metric_fit,
  hvg_selection_value,
  hvg_metric = c("gene_var", "gene_cv2", "sctransform"),
  hvg_selection = c("top", "significance", "threshold")
)

Arguments

sce_norm

A SingleCellExperiment object with a computed HVG metric.

hvg_metric_fit

A DataFrame with the HVG metric fit, as produced by e.g. scran::modelGeneVar().

hvg_selection_value

A numeric scalar: threshold value used to select HVGs. Its meaning depends on hvg_metric and hvg_selection.

hvg_metric

A character scalar: type of HVG metric. If "sctransform" is used, HVGs are selected by the underlying method, and their number is controlled by the SCT_N_HVG parameter in 02_norm_clustering.yaml. For the other metric types, see the hvg_selection and hvg_selection_value parameters.

hvg_selection

A character scalar: method to use for selection of HVGs. This is only relevant when hvg_metric is "gene_var" or "gene_cv2". See https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html#hvg-selection and https://bioconductor.org/books/3.15/OSCA.advanced/more-hvgs.html#more-hvg-selection-strategies for more details.

"top": Take the top X genes according to the metric. The "bio" and "ratio" columns are used for hvg_metric "gene_var" and "gene_cv2", respectively.
"significance": Use hvg_selection_value as an FDR threshold.
"threshold": Use hvg_selection_value as a threshold on the minimum value of the metric. The "bio" and "ratio" columns are used for hvg_metric "gene_var" and "gene_cv2", respectively.
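
The three selection strategies above can be illustrated with a minimal base R sketch. It is not the implementation of get_top_hvgs(): the helper select_hvgs_sketch(), the toy fit table and its gene IDs are hypothetical, and treating the cutoffs as inclusive (<= for FDR, >= for the metric) is an assumption made only for this example.

```r
# Toy stand-in for an HVG metric fit (values and gene IDs are made up).
# A real fit would come from e.g. scran::modelGeneVar() and has more columns.
fit <- data.frame(
  bio = c(1.2, 0.8, 0.1, -0.3, 0.5),
  FDR = c(0.001, 0.03, 0.40, 0.90, 0.04),
  row.names = c("ENSG01", "ENSG02", "ENSG03", "ENSG04", "ENSG05")
)

# Hypothetical helper, for illustration only.
# `column` is the metric column: "bio" for gene_var, "ratio" for gene_cv2.
select_hvgs_sketch <- function(fit, value,
                               selection = c("top", "significance", "threshold"),
                               column = "bio") {
  selection <- match.arg(selection)
  # Rank genes by the metric, highest first.
  ranked <- fit[order(fit[[column]], decreasing = TRUE), , drop = FALSE]
  keep <- switch(selection,
    top = seq_len(nrow(ranked)) <= value,     # first `value` genes
    significance = ranked$FDR <= value,       # assumed inclusive FDR cutoff
    threshold = ranked[[column]] >= value     # assumed inclusive metric cutoff
  )
  rownames(ranked)[keep]
}

top_ids <- select_hvgs_sketch(fit, 2, "top")
sig_ids <- select_hvgs_sketch(fit, 0.05, "significance")
thr_ids <- select_hvgs_sketch(fit, 0.5, "threshold")
```

In this sketch the same hvg_selection_value-like argument is read as a gene count, an FDR cutoff, or a minimum metric value, depending on the selection method.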

Value

A character vector of HVG IDs (ENSEMBL).