Post-integration clustering stage (`02_int_clustering`)
Document generated: 2024-09-27 15:33:10 UTC+0000
Source:vignettes/stage_int_clustering.Rmd
stage_int_clustering.Rmd
This stage uses the result from a selected integration method and
performs clustering, cell type annotation and visualization (similar to
the 02_norm_clustering
stage of the single-sample
pipeline). HVGs, reduced dimensions, and selected markers are already
computed in the previous stage (01_integration
).
⚙️ Config file:
config/integration/02_int_clustering.yaml
📋 HTML report target (in config/pipeline.yaml
):
DRAKE_TARGETS: ["report_int_clustering"]
📜 Example report (used config)
Config for this stage is stored in the
config/integration/02_int_clustering.yaml
file. Directory
with this file is read from SCDRAKE_INTEGRATION_CONFIG_DIR
environment variable upon scdrake load or attach, and
saved as scdrake_integration_config_dir
option. This option
is used as the default argument value in several scdrake
functions.
The following parameters are the same as those in the
02_norm_clustering
stage of the single-sample pipeline (see
vignette("stage_norm_clustering")
):
- Clustering and cell annotation parameters.
- The only exception is
CLUSTER_SC3_ENABLED
that is automatically set toFALSE
whenINTEGRATION_FINAL_METHOD
is"harmony"
, because SC3 clustering cannot be performed on reduced dimensions (Harmony does not compute an integrated expression matrix).
- The only exception is
CELL_GROUPINGS
-
INT_CLUSTERING_REPORT_DIMRED_NAMES
,INT_CLUSTERING_REPORT_CLUSTERING_NAMES
,INT_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER
-
INT_CLUSTERING_KNITR_MESSAGE
,INT_CLUSTERING_KNITR_WARNING
,INT_CLUSTERING_KNITR_ECHO
Selection of final integration method
INTEGRATION_FINAL_METHOD: "mnn"
Type: character scalar ("mnn"
|
"rescaling"
| "regression"
|
"harmony"
)
A name of the final integration method that will be used for clustering and downstream steps.
INTEGRATION_FINAL_METHOD_RM_CC: False
Type: logical scalar
Whether to take the result with removed cell cycle-related genes.
This will be also applied to the “uncorrected” method (which is used for
cluster markers and contrasts). True
is only possible when
any of the single-samples in the INTEGRATION_SOURCES
parameter (01_integration.yaml
) has
hvg_rm_cc_genes
set to True
.
Cell grouping assignment
ADDITIONAL_CELL_DATA_FILE: null
Type: character scalar or null
Same as in the 02_norm_clustering
stage of the
single-sample pipeline (see
vignette("stage_norm_clustering")
), except the additional
data must contain two columns: Barcode
and
batch
. The latter must match the dataset names in
INTEGRATION_SOURCES
in 01_integration.yaml
config file. Example:
DataFrame with 4 rows and 3 columns
Barcode batch cluster_custom
<character> <factor> <factor>
AAACCCAAGTTGGGAC-1-pbmc1k AAACCCAAGTTGGGAC-1 pbmc1k 2
AAACCCACATTCTGTT-1-pbmc1k AAACCCACATTCTGTT-1 pbmc1k 1
AAACCCAGTCAGACGA-1-pbmc3k AAACCCAGTCAGACGA-1 pbmc3k 1
AAACCCAGTTTGTTGG-1-pbmc3k AAACCCAGTTTGTTGG-1 pbmc3k 2
...
Note that rownames are not mandatory.
Input files
INT_CLUSTERING_REPORT_RMD_FILE: "Rmd/integration/02_int_clustering.Rmd"
Type: character scalar
A path to RMarkdown files used for HTML report of this pipeline stage.
Output files
INT_CLUSTERING_BASE_OUT_DIR: "02_int_clustering"
Type: character scalar
A path to base output directory for this stage. It will be created
under BASE_OUT_DIR
specified in 00_main.yaml
config.
INT_CLUSTERING_DIMRED_PLOTS_OUT_DIR: "dimred_plots"
INT_CLUSTERING_CELL_ANNOTATION_OUT_DIR: "cell_annotation"
INT_CLUSTERING_OTHER_PLOTS_OUT_DIR: "other_plots"
INT_CLUSTERING_REPORT_HTML_FILE: "02_int_clustering.html"
Type: character scalar
Names of files and directories created under
INT_CLUSTERING_BASE_OUT_DIR
. Subdirectories are not
allowed.
Here you can find description of the most important targets for this
stage. However, for a full overview, you have to inspect the source
code of the get_int_clustering_subplan()
function.
HTML report target name: report_int_clustering
SingleCellExperiment
objects
sce_int_uncorrected
: the selected uncorrected SCE object
according to the INTEGRATION_FINAL_METHOD_RM_CC
parameter.
sce_int_final
: the selected integrated SCE object
according to the INTEGRATION_FINAL_METHOD
and
INTEGRATION_FINAL_METHOD_RM_CC
parameters.
sce_int_clustering_final
: similar to
sce_final_norm_clustering
in the
02_norm_clustering
stage of the single-sample pipeline.
Targes similar to 02_norm_clustering
These targets are basically the same as those in the
02_norm_clustering
pipeline in the single-sample pipeline
(see vignette("stage_norm_clustering")
).
However, for a full overview, you have to inspect the source
code of the get_int_clustering_subplan()
function.
selected_markers_int_plots_final
: selected markers plots
for the selected integration method