Post-integration clustering stage (`02_int

This stage uses the result from a selected integration method and performs clustering, cell type annotation and visualization (similar to the 02_norm_clustering stage of the single-sample pipeline). HVGs, reduced dimensions, and selected markers are already computed in the previous stage (01_integration).

⚙️ Config file: config/integration/02_int_clustering.yaml

📋 HTML report target (in config/pipeline.yaml): DRAKE_TARGETS: ["report_int_clustering"]

📜 Example report (used config)

Config for this stage is stored in the config/integration/02_int_clustering.yaml file. Directory with this file is read from SCDRAKE_INTEGRATION_CONFIG_DIR environment variable upon scdrake load or attach, and saved as scdrake_integration_config_dir option. This option is used as the default argument value in several scdrake functions.

The following parameters are the same as those in the 02_norm_clustering stage of the single-sample pipeline (see vignette("stage_norm_clustering")):

Clustering and cell annotation parameters.
- The only exception is CLUSTER_SC3_ENABLED that is automatically set to FALSE when INTEGRATION_FINAL_METHOD is "harmony", because SC3 clustering cannot be performed on reduced dimensions (Harmony does not compute an integrated expression matrix).
CELL_GROUPINGS
INT_CLUSTERING_REPORT_DIMRED_NAMES, INT_CLUSTERING_REPORT_CLUSTERING_NAMES, INT_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER
INT_CLUSTERING_KNITR_MESSAGE, INT_CLUSTERING_KNITR_WARNING, INT_CLUSTERING_KNITR_ECHO

Selection of final integration method

INTEGRATION_FINAL_METHOD: "mnn"

Type: character scalar ("mnn" | "rescaling" | "regression" | "harmony")

A name of the final integration method that will be used for clustering and downstream steps.

INTEGRATION_FINAL_METHOD_RM_CC: False

Type: logical scalar

Whether to take the result with removed cell cycle-related genes. This will be also applied to the “uncorrected” method (which is used for cluster markers and contrasts). True is only possible when any of the single-samples in the INTEGRATION_SOURCES parameter (01_integration.yaml) has hvg_rm_cc_genes set to True.

Cell grouping assignment

ADDITIONAL_CELL_DATA_FILE: null

Type: character scalar or null

Same as in the 02_norm_clustering stage of the single-sample pipeline (see vignette("stage_norm_clustering")), except the additional data must contain two columns: Barcode and batch. The latter must match the dataset names in INTEGRATION_SOURCES in 01_integration.yaml config file. Example:

DataFrame with 4 rows and 3 columns
                          Barcode              batch     cluster_custom
                          <character>          <factor>  <factor>
AAACCCAAGTTGGGAC-1-pbmc1k AAACCCAAGTTGGGAC-1   pbmc1k    2
AAACCCACATTCTGTT-1-pbmc1k AAACCCACATTCTGTT-1   pbmc1k    1
AAACCCAGTCAGACGA-1-pbmc3k AAACCCAGTCAGACGA-1   pbmc3k    1
AAACCCAGTTTGTTGG-1-pbmc3k AAACCCAGTTTGTTGG-1   pbmc3k    2
...

Note that rownames are not mandatory.

Input files

INT_CLUSTERING_REPORT_RMD_FILE: "Rmd/integration/02_int_clustering.Rmd"

Type: character scalar

A path to RMarkdown files used for HTML report of this pipeline stage.

Output files

INT_CLUSTERING_BASE_OUT_DIR: "02_int_clustering"

Type: character scalar

A path to base output directory for this stage. It will be created under BASE_OUT_DIR specified in 00_main.yaml config.

INT_CLUSTERING_DIMRED_PLOTS_OUT_DIR: "dimred_plots"
INT_CLUSTERING_CELL_ANNOTATION_OUT_DIR: "cell_annotation"
INT_CLUSTERING_OTHER_PLOTS_OUT_DIR: "other_plots"
INT_CLUSTERING_REPORT_HTML_FILE: "02_int_clustering.html"

Type: character scalar

Names of files and directories created under INT_CLUSTERING_BASE_OUT_DIR. Subdirectories are not allowed.

Here you can find description of the most important targets for this stage. However, for a full overview, you have to inspect the source code of the get_int_clustering_subplan() function.

HTML report target name: report_int_clustering

`SingleCellExperiment` objects

sce_int_uncorrected: the selected uncorrected SCE object according to the INTEGRATION_FINAL_METHOD_RM_CC parameter.

sce_int_final: the selected integrated SCE object according to the INTEGRATION_FINAL_METHOD and INTEGRATION_FINAL_METHOD_RM_CC parameters.

sce_int_clustering_final: similar to sce_final_norm_clustering in the 02_norm_clustering stage of the single-sample pipeline.

Targes similar to `02_norm_clustering`

These targets are basically the same as those in the 02_norm_clustering pipeline in the single-sample pipeline (see vignette("stage_norm_clustering")).

However, for a full overview, you have to inspect the source code of the get_int_clustering_subplan() function.

selected_markers_int_plots_final: selected markers plots for the selected integration method

Post-integration clustering stage (`02_int_clustering`)

^{Document generated:
2025-08-27 16:00:03 UTC+0000}

Selection of final integration method

Cell grouping assignment

Input files

Output files

`SingleCellExperiment` objects

Targes similar to `02_norm_clustering`

Post-integration clustering stage (`02_int_clustering`)

Document generated: 2025-08-27 16:00:03 UTC+0000

Selection of final integration method

Cell grouping assignment

Input files

Output files

SingleCellExperiment objects

Targes similar to 02_norm_clustering

^{Document generated:
2025-08-27 16:00:03 UTC+0000}

`SingleCellExperiment` objects

Targes similar to `02_norm_clustering`