Extending the pipeline
Document generated: 2024-09-14 14:01:46 UTC+0000
Source:vignettes/scdrake_extend.Rmd
In this vignette we will reuse the project initialized in the
Quick start guide (vignette("scdrake")), which should live in
~/scdrake_projects/pbmc1k.
drake’s pipeline definitions (plans) are R objects
that can be arbitrarily extended with new targets. There are several ways
to extend the plans provided by scdrake, but the easiest
is to use the built-in plan extension called inside the
drake init scripts _drake_single_sample.R and _drake_integration.R.
Before the plan (or more precisely, drake::drake_config()) is returned
from those scripts, an additional R script (plan_custom.R in the project
root by default) is sourced. The last value returned by this script must
be a valid drake plan (drake::drake_plan()), which is then merged with
the original plan.
The path to the R script with the custom plan is taken from the
SCDRAKE_PLAN_CUSTOM_FILE environment variable.
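As a minimal sketch, the variable can be set from R before the pipeline is started. The path plans/my_plan.R is a hypothetical example, not a file created by scdrake, and the sketch assumes the variable is read from the environment of the R process that sources the init script.

```r
# Hypothetical location of a custom plan script, relative to the project root.
custom_plan_file <- "plans/my_plan.R"

# Set the variable for the current R session (child processes inherit it).
Sys.setenv(SCDRAKE_PLAN_CUSTOM_FILE = custom_plan_file)

# Check the value that is now visible to code running in this session.
Sys.getenv("SCDRAKE_PLAN_CUSTOM_FILE")
```

Setting the variable this way lasts only for the current session.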
Defining a custom plan
Let’s try a dummy additional plan that is already present in
plan_custom.R. Just uncomment the lines under Example:, or, if you
prefer, define your own target(s). A quick intro to drake plans can be
found in the drake documentation.
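The exact contents of the example shipped in plan_custom.R are not reproduced here. As a rough sketch, a script of the required shape could look like the following; the command 1 + 1 is a placeholder assumption, chosen only to keep the example small.

```r
# plan_custom.R (sketch): the last evaluated expression must be a drake plan.
custom_plan <- drake::drake_plan(
  # Placeholder command; any R expression can define a target.
  my_target = 1 + 1
)

# Return the plan as the last value of the script.
custom_plan
```

A drake plan is a data frame with one row per target, so custom_plan$target lists the target names that can later be requested.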
We also need to tell drake to make this new target.
Open config/pipeline.yaml and set DRAKE_TARGETS to ["my_target"]
(or list more targets if you have defined them).
Now just run the pipeline as usual.
In the terminal you should see an informative message:
ℹ Extending the plan with a custom one defined in 'plan_custom.R'
When the pipeline finishes, you can load the new target to your session with
drake::loadd(my_target)
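drake::loadd() assigns the target into the session under its own name, while drake::readd() returns the value so that it can be assigned to any name. The sketch below shows both on a throwaway plan built into a temporary cache. The plan and the cache are assumptions made for illustration and are unrelated to the scdrake project cache.

```r
# A throwaway cache in a temporary directory, so no project cache is touched.
cache <- drake::new_cache(path = tempfile())

# A toy plan with a single target.
plan <- drake::drake_plan(my_target = 1 + 1)
drake::make(plan, cache = cache)

# readd() returns the value of the target.
value <- drake::readd(my_target, cache = cache)

# loadd() creates a variable named after the target in the current environment.
drake::loadd(my_target, cache = cache)
```

Inside the scdrake project directory the cache argument can be left out as long as the default cache location is used.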
Parametrized custom plan
In plan_custom.R you can use any variables defined in
_drake_single_sample.R or _drake_integration.R. Probably the most
important are the cfg and cfg_pipeline lists, which hold pipeline
parameters. Note that in plan_custom.R all variables from
the parent script are locked and cannot be modified.
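What "locked" means in practice can be illustrated with base R's lockBinding(). This is only a sketch of the general behaviour: whether scdrake uses exactly this mechanism is an assumption, and the environment, values and gene names are made up for illustration.

```r
# Simulate a parent environment holding a config list (values are made up).
parent_env <- new.env()
assign("cfg", list(MY_GENE = "NOC2L"), envir = parent_env)

# Lock the binding so that it cannot be reassigned.
lockBinding("cfg", parent_env)

# Reassigning a locked binding raises an error, which is caught here.
outcome <- tryCatch(
  {
    assign("cfg", list(), envir = parent_env)
    "modified"
  },
  error = function(e) "locked"
)

# Reading is still allowed, so a modified copy can live under a new name.
my_cfg <- get("cfg", envir = parent_env)
my_cfg$MY_GENE <- "ISG15"
```

If a parameter needs to be changed for a custom target, copying the list to a new variable as above leaves the original untouched.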
Let’s see how you can utilize the cfg list. Open
config/single_sample/01_input_qc.yaml and add a new line:
MY_GENE: "NOC2L"
Then replace the code in plan_custom.R with
drake::drake_plan(
  my_target = scater::plotExpression(
    sce_final_input_qc, cfg$input_qc$MY_GENE,
    exprs_values = "counts", swap_rownames = "SYMBOL"
  )
)
You can see we used the MY_GENE parameter defined in the
config file. Later, drake will replace cfg$input_qc$MY_GENE
with its value "NOC2L".
In case you don’t want to use the standard config files, you can create
your own, e.g. config/my_params.yaml:
MY_GENE: "NOC2L"
and use it in plan_custom.R:
my_cfg <- load_config("config/my_params.yaml")
drake::drake_plan(
  my_target = scater::plotExpression(
    sce_final_input_qc, my_cfg$MY_GENE,
    exprs_values = "counts", swap_rownames = "SYMBOL"
  )
)
Extending the RMarkdown documents
Stage-specific RMarkdown documents
All RMarkdown files used for stage reports are located in the
Rmd/ directory in the project’s root. Feel free to modify them
to your needs. Just keep in mind that when you call
update_project(), those files will be overwritten by the default ones.
To avoid this, save your modified file under a different name and
then change the parameter that specifies the path to the stage’s Rmd file.
For example, you modify Rmd/single_sample/01_input_qc.Rmd, save it as
Rmd/single_sample/01_input_qc_modified.Rmd, and change the
INPUT_QC_REPORT_RMD_FILE parameter in
config/single_sample/01_input_qc.yaml accordingly.
Custom RMarkdown documents
For additional RMarkdown documents you just need to incorporate a file-returning target into your custom plan, e.g.
Rmd/my_report.Rmd
---
title: "My report"
---
```{r}
drake::loadd(sce_final_norm_clustering)
scater::plotReducedDim(sce_final_norm_clustering, dimred = "umap", colour_by = "cluster_graph_louvain")
```
plan_custom.R
drake::drake_plan(
  my_report = drake::target(
    rmarkdown::render(
      drake::knitr_in("Rmd/my_report.Rmd"),
      output_file = here::here("my_report.html"),
      knit_root_dir = here::here()
    ),
    format = "file"
  )
)
Let’s break down the plan above:
- The my_report target has format = "file", meaning drake expects the
  return value to be a character vector of file or directory paths, and
  rmarkdown::render() returns the path to the output file.
- drake::knitr_in("Rmd/my_report.Rmd") is a special function which marks
  the Rmd file as a dependency. Internally, it scans active code chunks,
  searches for calls to drake::loadd() and drake::readd(), and marks the
  targets inside as dependencies of the target (my_report).
- By default, knitr, which is responsible for rendering Rmd files, uses
  the location of the Rmd file as its working directory. This violates
  our project-based approach (everything is specified relative to the
  project root), so we use here::here() to specify the output file and
  working directory. here::here() remembers the project root directory
  when it is loaded and converts root-relative paths to absolute ones.
  You can try it yourself: call here::here() or
  here::here("Rmd/my_report.Rmd").
Just for curiosity, we can see the dependencies of my_report using
drake::r_deps_target(my_report, source = "_drake_single_sample.R"):
  name                      type       hash
1 here::here                namespaced NA
2 rmarkdown::render         namespaced NA
3 sce_final_norm_clustering loadd      911414192d751378
4 Rmd/my_report.Rmd         knitr_in   NA
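The table above comes from static code analysis of the target's command. A similar analysis can be tried on any command with drake::deps_code(), which does not run the code or need a cache. In this sketch the command is quoted rather than evaluated, and my_gene is a hypothetical variable name used only for illustration.

```r
# Inspect the dependencies drake detects in a command, without running it.
deps <- drake::deps_code(
  quote(scater::plotExpression(sce_final_input_qc, my_gene))
)

# A data frame with one row per detected dependency (columns: name, type).
deps
```

Symbols that are not functions from a package namespace, such as sce_final_input_qc, are reported as globals. drake later matches these against targets and imported objects.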
Reusing the stage-specific RMarkdown documents
Another possibility is to reuse the machinery responsible for rendering of stage reports, that is:
- The header of the Rmd documents (see e.g. Rmd/single_sample/01_input_qc.Rmd):
---
title: "`r params$title`"
author: "Your name"
institute: "Your institute"
date: "`r glue::glue('Document generated: {format(Sys.time(), \"%Y-%m-%d %H:%M:%S %Z%z\")}')`"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float: true
    number_sections: false
    theme: "flatly"
    self_contained: true
    code_download: true
    df_print: "paged"
params:
  css_file: !expr here::here("Rmd/common/stylesheet.css")
  drake_cache_dir: !expr here::here(".drake")
  title: "Your title"
css: "`r params$css_file`"
---
```{r, message = FALSE, warning = FALSE}
drake_cache_dir <- params$drake_cache_dir
drake::loadd(your_target_1, your_target_2, ..., path = drake_cache_dir)
```
The example Rmd document above has several parameters:
- css_file: a path to a CSS file with HTML styling.
- drake_cache_dir: used to load targets from a non-default cache directory.
- title: a dynamic document title.
The second part of the machinery is the generate_stage_report()
function. Internally, it’s a wrapper around rmarkdown::render() with
some sensible defaults that passes css_file, drake_cache_dir and other
user-defined parameters to the Rmd document. For the Rmd document above,
we would call this function as
generate_stage_report(
  ## -- We assume that the document is saved here.
  "Rmd/my_report.Rmd",
  "output/my_report.html",
  params = list(title = "My report")
)
And here is an example of a target rendering a report for the
01_input_qc stage:
drake::drake_plan(
  report_input_qc = target(
    generate_stage_report(
      rmd_file = knitr_in(!!cfg$INPUT_QC_REPORT_RMD_FILE),
      out_html_file_name = file_out(!!cfg$INPUT_QC_REPORT_HTML_FILE),
      css_file = file_in(!!cfg_main$CSS_FILE),
      message = !!cfg$INPUT_QC_KNITR_MESSAGE,
      warning = !!cfg$INPUT_QC_KNITR_WARNING,
      echo = !!cfg$INPUT_QC_KNITR_ECHO,
      other_deps = list(
        file_in(!!here("Rmd/common/_header.Rmd")),
        file_in(!!here("Rmd/common/_footer.Rmd")),
        file_in(!!here("Rmd/single_sample/01_input_qc_children/empty_droplets.Rmd"))
      ),
      drake_cache_dir = !!cfg_pipeline$DRAKE_CACHE_DIR
    ),
    format = "file"
  )
)
You can see that almost all function parameters are dynamic, based on
the config file (cfg). Also, we force drake to watch for changes in
other child Rmd documents by specifying them inside other_deps.
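The !! operator used throughout the plan above is tidy evaluation's unquoting: the expression after it is evaluated when the plan is created, and its value is inlined into the command. A minimal sketch follows, with a made-up cfg list standing in for the real scdrake config and toupper() used only as a placeholder command.

```r
# A stand-in for the scdrake config list (values are assumptions).
cfg <- list(input_qc = list(MY_GENE = "NOC2L"))

# `!!` evaluates its operand immediately and inlines the result in the command.
plan <- drake::drake_plan(
  my_gene_upper = toupper(!!cfg$input_qc$MY_GENE)
)

# The stored command contains the literal value, not a reference to `cfg`.
command_text <- deparse(plan$command[[1]])
command_text
```

Because the value is part of the command itself, changing the parameter in the config changes the command, and drake then treats the target as outdated.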
Note that drake::file_in() watches for changes in the file itself
(its content) and not for calls to drake::loadd() and drake::readd(),
as drake::knitr_in() does (static code analysis).