Extending the pipeline

In this vignette we will reuse the project initialized within the Quick start guide (vignette("scdrake")) that should live in ~/scdrake_projects/pbmc1k.

drake’s pipeline definitions (plans) are R objects which can be arbitrarily extended by new targets. There are several ways how to extend plans provided by scdrake, but the easiest way is to utilize the built-in plan extension called inside the drake init scripts _drake_single_sample.R and _drake_integration.R.

Before the plan (or more precisely, drake::drake_config()) is returned from those scripts, an additional R script (plan_custom.R in project root by default) is sourced. In this script, the last returned value must be a valid drake plan (drake::drake_plan()) which is consequently merged with the original plan.

The path to R script with custom plan is taken from the SCDRAKE_PLAN_CUSTOM_FILE environment variable.

Defining a custom plan

Let’s try a dummy additional plan which is already present in plan_custom.R. Just uncomment the lines under Example:, or if you want, you can try to define your own target/s. A quick intro to drake plans can be found here and here.

We also need to tell drake to make this new target. Open config/pipeline.yaml and set DRAKE_TARGETS to ["my_target"] (or more targets if you have defined them).

Now just run the pipeline as usual:

run_single_sample_r()

In the terminal you should see an informative text ℹ Extending the plan with a custom one defined in 'plan_custom.R'.

When the pipeline finishes, you can load the new target to your session with

drake::loadd(my_target)

Parametrized custom plan

In plan_custom.R you can use any variables defined in _drake_single_sample.R or _drake_integration.R. Probably the most important are cfg and cfg_pipeline lists holding pipeline parameters. Note that in plan_custom.R all variables from the parent script are locked and cannot be modified.

Let’s see how you can utilize the cfg list. Open config/single_sample/01_input_qc.yaml and add a new line:

MY_GENE: "NOC2L"

Then replace the code in plan_custom.R with

drake::drake_plan(
  my_target = scater::plotExpression(sce_final_input_qc, cfg$input_qc$MY_GENE, exprs_values = "counts", swap_rownames = "SYMBOL")
)

You can see we used the MY_GENE parameter defined in the config file. Later, drake will replace cfg$input_qc$MY_GENE with its value "NOC2L".

In case you don’t want to use the standard config files, you can make your own one, e.g. config/my_params.yaml:

MY_GENE: "NOC2L"

, and use it in plan_custom.R:

my_cfg <- load_config("config/my_params.yaml")
drake::drake_plan(
  my_target = scater::plotExpression(sce_final_input_qc, my_cfg$MY_GENE, exprs_values = "counts", swap_rownames = "SYMBOL")
)

Extending the RMarkdown documents

Stage-specific RMarkdown documents

All RMarkdown files used for stage reports are located in the Rmd/ directory in project’s root. Feel free to modify them to your needs. Just keep in mind that when you call update_project(), those files will be overwritten by the default ones. To overcome this situation, you can save your modified file using a different name and then modify the parameter specifying a path to stage’s Rmd file.

For example, you modify Rmd/single_sample/01_input_qc.Rmd, save it as Rmd/single_sample/01_input_qc_modified.Rmd, and change accordingly the INPUT_QC_REPORT_RMD_FILE parameter in config/single_sample/01_input_qc.yaml.

Custom RMarkdown documents

For additional RMarkdown documents you just need to incorporate a file-returning target into your custom plan, e.g.

Rmd/my_report.Rmd

---
title: "My report"
---

```{r}
drake::loadd(sce_final_norm_clustering)
scater::plotReducedDim(sce_final_norm_clustering, dimred = "umap", colour_by = "cluster_graph_louvain")
```

plan_custom.R

drake::drake_plan(
  my_report = drake::target(
    rmarkdown::render(
      drake::knitr_in("Rmd/my_report.Rmd"),
      output_file = here::here("my_report.html"),
      knit_root_dir = here::here()
    ),

    format = "file"
  )
)

Let’s break down the plan above:

my_report target has format = "file" meaning drake expects the return value to be a character vector of file or directory paths, and rmarkdown::render() returns path to output file.
drake::knitr_in("Rmd/my_report.Rmd") is a special function which marks the Rmd file as a dependency. Internally, it scans active code chunks and search for calls to drake::loadd() and drake::readd(), and marks the targets inside as dependencies of the target (my_report).
By default, knitr, which is responsible for rendering of Rmd files, uses working directory the same as the location of the Rmd file. This is violating our project-based approach (everything is specified relative to project root), and so we are using here::here() to specify the output file and working directory. here::here() remembers the project root directory on its load and converts root-relative paths to absolute. You can try it yourself: call here::here() or here::here("Rmd/my_report.Rmd").

Just for curiosity, we can see the dependencies of my_target using drake::r_deps_target(my_report, source = "_drake_single_sample.R"):

  name                      type       hash            
1 here::here                namespaced NA              
2 rmarkdown::render         namespaced NA              
3 sce_final_norm_clustering loadd      911414192d751378
4 Rmd/my_report.Rmd         knitr_in   NA

Reusing the stage-specific RMarkdown documents

Another possibility is to reuse the machinery responsible for rendering of stage reports, that is:

The headmost part of Rmd documents (you can check e.g. Rmd/single_sample/01_input_qc.Rmd):

---
title: "`r params$title`"
author: "Your name"
institute: "Your institute"
date: "`r glue::glue('Document generated: {format(Sys.time(), \"%Y-%m-%d %H:%M:%S %Z%z\")}')`"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float: true
    number_sections: false
    theme: "flatly"
    self_contained: true
    code_download: true
    df_print: "paged"
params:
  css_file: !expr here::here("Rmd/common/stylesheet.css")
  drake_cache_dir: !expr here::here(".drake")
  title: "Your title"
css: "`r params$css_file`"
---

```{r, message = FALSE, warning = FALSE}
drake_cache_dir <- params_$drake_cache_dir
drake::loadd(your_target_1, your_target_2, ..., path = drake_cache_dir)
```

The example Rmd document above has several parameters:

css_file: a path to CSS file with HTML styling.
drake_cache_dir: used to load targets from nondefault cache directory.
title: a dynamic document title.

The second part of the machinery is the generate_stage_report() function. Internally, it’s a wrapper around rmarkdown::render() with some sensible defaults, and passing css_file, drake_cache_dir and other user-defined parameters to the Rmd document. For the Rmd document above, we would call this function as

generate_stage_report(
  ## -- We assume that the document is saved here.
  "Rmd/my_report.Rmd",
  "output/my_report.html",
  params = list(title = "My report")
)

And here is an example of target rendering a report for the 01_input_qc stage (source):

drake::drake_plan(
  report_input_qc = target(
      generate_stage_report(
        rmd_file = knitr_in(!!cfg$INPUT_QC_REPORT_RMD_FILE),
        out_html_file_name = file_out(!!cfg$INPUT_QC_REPORT_HTML_FILE),
        css_file = file_in(!!cfg_main$CSS_FILE),
        message = !!cfg$INPUT_QC_KNITR_MESSAGE,
        warning = !!cfg$INPUT_QC_KNITR_WARNING,
        echo = !!cfg$INPUT_QC_KNITR_ECHO,
        other_deps = list(
          file_in(!!here("Rmd/common/_header.Rmd")),
          file_in(!!here("Rmd/common/_footer.Rmd")),
          file_in(!!here("Rmd/single_sample/01_input_qc_children/empty_droplets.Rmd"))
        ),
        drake_cache_dir = !!cfg_pipeline$DRAKE_CACHE_DIR
      ),
      format = "file"
    )
)

You can see that almost all function parameters are dynamic, based on config file (cfg). Also, we force drake to watch for changes in other child Rmd documents by specifying them inside other_deps.

Note that drake::file_in() is watching for changes in file size/structure and not for calls to drake::loadd() and drake::readd() as drake::knitr_in() is doing (static code analysis).