Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/quarto identify source files #200

Merged
merged 17 commits into from
Nov 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ jobs:

- uses: r-lib/actions/setup-pandoc@v2

- uses: quarto-dev/quarto-actions/setup@v2
- uses: quarto-dev/quarto-actions/setup@v2.1.6

- name: Install Mac system dependencies
if: runner.os == 'macOS'
Expand Down
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
* Detect child quarto documents (#199, @mutlusun).
* Improve reporting of static branch names from `tar_map()` and `tar_map_rep()` (#201, @kkmann).
* Ensure compatibility with `targets` after https://github.com/ropensci/targets/issues/1368.
* Improve detection of Quarto files in `tar_quarto_inspect()` (#200, @multusun).

# tarchetypes 0.10.0

Expand Down
109 changes: 69 additions & 40 deletions R/tar_quarto_files.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@
#' * `output`: output files that will be generated during
#' `quarto::quarto_render()`.
#' * `input`: pre-existing files required to render the project or document,
#' such as `_quarto.yml`.
#' such as `_quarto.yml` and quarto extensions.
#' @inheritParams quarto::quarto_render
#' @param path Character of length 1, either the file path
#' to a Quarto source document or the directory path
#' to a Quarto project. Defaults to the Quarto project in the
Expand All @@ -33,7 +34,7 @@
#' writeLines(lines, path)
#' # If Quarto is installed, run:
#' # tar_quarto_files(path)
tar_quarto_files <- function(path = ".", profile = NULL) {
tar_quarto_files <- function(path = ".", profile = NULL, quiet = TRUE) {
assert_quarto()
targets::tar_assert_scalar(path)
targets::tar_assert_chr(path)
Expand All @@ -47,19 +48,22 @@
}
out <- if_any(
fs::is_dir(path),
tar_quarto_files_project(path),
tar_quarto_files_document(path)
tar_quarto_files_project(path, quiet),
tar_quarto_files_document(path, quiet)
)
for (field in c("sources", "output", "input")) {
out[[field]] <- sort(fs::path_rel(unique(as.character(out[[field]]))))
out[[field]] <- as.character(out[[field]])
out[[field]] <- as.character(fs::path_rel(out[[field]]))
out[[field]] <- unique(sort(out[[field]]))
}
out
}

tar_quarto_files_document <- function(path) {
info <- quarto::quarto_inspect(input = path)
out <- list(sources = path)
tar_quarto_files_document <- function(path, quiet) {
info <- quarto::quarto_inspect(input = path, quiet = quiet)
out <- list()

# Collect data about source files.
out$sources <- tar_quarto_files_get_source_files(info$fileInformation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As currently written, this function is based on detecting calls to tar_read() and tar_load(). tar_render() source files do not necessarily call these functions. Is there a more reliable way to detect source files? If we know they're all qmd files that will be run/rendered, can we just include all of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether I do understand your comment correctly. Is your question whether files that do not use tar_read() or tar_load() are still somehow detected and used as a dependency?

The doc says the following:

A named list of important file paths in a Quarto project or document:

  • sources: source files with tar_load()/tar_read() target dependencies in R code chunks.
  • output: output files that will be generated during quarto::quarto_render().
  • input: pre-existing files required to render the project or document, such as _quarto.yml.

I followed this documentation and collected only files that have tar_read or a tar_load in them in sources. All other included files that do not call tar_read/load are in input. See this test.

To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets as all of them are simply treated as file dependencies (at least this way I understand the code). In addition, it would be really easy for tar_quarto_files to detect which targets are loaded because fileInformation contains the code cells. This way, the call to knitr_deps in tar_quarto_raw could be eliminated. But I didn't change that to not alter the interface of these functions.

Did I understand your question correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your question is whether also files are detected that are included by the traditional knitr (tar_render) way ( ```{r child="file1"): this is the case.

Copy link
Member

@wlandau wlandau Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed this documentation and collected only files that have tar_read or a tar_load in them in sources.

Sources can have tar_read()/tar_load(), but they don't have to, and they don't need to be distinguished from source files that don't have tar_read()/tar_load(). tarchetypes just needs to know which files to scan for tar_read()/tar_load() using static code analysis, and it needs to know the inputs to quarto_render().

To be honest, I'm not sure whether the differentiation between these three list items is necessary for targets as all of them are simply treated as file dependencies (at least this way I understand the code).

The sources are the files scanned with knitr_deps(). The other input files do not necessarily even have code chunks that can be parse()'d as valid R code.

In addition, it would be really easy for tar_quarto_files to detect which targets are loaded because fileInformation contains the code cells.

That part could be useful for just extracting the code chunks, but we already can use knits::knit(tangle = TRUE), and it is a unified approach that also works with R Markdown and knitr.

This way, the call to knitr_deps in tar_quarto_raw could be eliminated.

Regardless of where we get the text of the code chunks, knitr_deps() is still needed because it runs static code analysis to robustly detect tar_load()/tar_read() calls and the targets that they load/read. It is much more reliable to walk the AST than manipulate text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback and the clarifications!

Sources can have tar_read()/tar_load(), but they don't have to, and they don't need to be distinguished from source files that don't have tar_read()/tar_load(). tarchetypes just needs to know which files to scan for tar_read()/tar_load() using static code analysis, and it needs to know the inputs to quarto_render().

Thanks for clarifying this. I did not understand the documentation in such a way. I changed the documentation to make this point clearer. Additionally, I reworked the code and the tests to reflect this. Things are much simpler now.

Regarding knitr_deps(): Sorry for not being clear here. My intention was not to completely remove the logic behind knitr_deps() but to not read the files once again as quarto::quarto_inspect has already done this. For example, one could create an AST from the code snippets provided by quarto inspect. However, in all my reports, report generation takes much longer than tarchetypes to decide whether to rerun the report or not. Thus, it probably doesn't matter and it is the easiest solution to keep everything as is.


# Collect data about output files.
for (format in info$formats) {
Expand All @@ -69,25 +73,20 @@
)
}

# Collect data about input files.
for (myfile in names(info$fileInformation)) {
out$input <- c(
out$input,
# `myfile` are relative paths starting from `path`.
myfile,
# `includeMap` contains relative paths starting from `myfile`.
file.path(
dirname(path),
info$fileInformation[[myfile]]$includeMap$target
)
)
# Collect data about input files. As this is not a project, there doesn't
# exist the `info$files` key. However, we can include resources if present.
if (length(info$resources) > 0) {
out$input <- info$resources
out$input <- out$input[file.exists(out$input)]

Check warning on line 80 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L79-L80

Added lines #L79 - L80 were not covered by tests
} else {
out$input <- character(0L)
}

out
}

tar_quarto_files_project <- function(path) {
info <- quarto::quarto_inspect(input = path)
tar_quarto_files_project <- function(path, quiet) {
info <- quarto::quarto_inspect(input = path, quiet)

Check warning on line 89 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L89

Added line #L89 was not covered by tests
targets::tar_assert_nonempty(
info$config$project$`output-dir`,
paste(
Expand All @@ -97,29 +96,59 @@
)
)

# Detect source files.
sources <- info$files$input[file.exists(info$files$input)]
out <- list(output = file.path(path, info$config$project$`output-dir`))

Check warning on line 99 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L99

Added line #L99 was not covered by tests

# Collect data about source files.
out$sources <- tar_quarto_files_get_source_files(info$fileInformation)

Check warning on line 102 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L102

Added line #L102 was not covered by tests

# Detect input files.
input <- info$files
input <- unlist(input)
input <- input[file.exists(input)]
for (myfile in names(info$fileInformation)) {
input <- c(
input,
# `myfile` is an absolute path.
# Detect input files like the config file (`_quarto.yml`) and resources like
# quarto extensions. Make sure in the end that these files exist.
out$input <- unlist(
c(
info$files$config,
info$files$resources
)
)
out$input <- out$input[file.exists(out$input)]

Check warning on line 112 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L106-L112

Added lines #L106 - L112 were not covered by tests

out

Check warning on line 114 in R/tar_quarto_files.R

View check run for this annotation

Codecov / codecov/patch

R/tar_quarto_files.R#L114

Added line #L114 was not covered by tests
}

#' @title Get Source Files From Quarto Inspect
#' @description Collects all files from the
#' `fileInformation` field that are used in the current report.
#' @details `fileInformation` contains a list of files. Each file entry contains
#' two data frames. The first, `includeMap`, contains a `source` column (files
#' that include other files, e.g. the main report file) and a `target` column
#' (files that get included by the `source` files). The `codeCells` data frame
#' contains all code cells from the files represented in `includeMap`.
#' @return A character vector of Quarto source files.
#' @param file_information The `fileInformation` element of the list
#' returned by `quarto::quarto_inspect()`.
tar_quarto_files_get_source_files <- function(file_information) {
ret <- character(0)

for (myfile in names(file_information)) {
# Collect relevant source files. The files in `includeMap$target` are always
# relative to the main entry point of the report. Thus, we need to add the
# corresponding paths to the entries.
#
# We don't need to include the `source` column as all files are also present
# in `target` or are `myfile`.
ret <- c(
ret,
myfile,
# `includeMap` files are relative starting from `myfile`.
file.path(
dirname(myfile),
info$fileInformation[[myfile]]$includeMap$target
file_information[[myfile]]$includeMap$target
)
)
}

list(
sources = sources,
output = file.path(path, info$config$project$`output-dir`),
input = input
)
# Check that these files actually exist.
ret <- ret[file.exists(ret)]

# We don't need to call `unique` here on `ret` as this happens on the main
# function.
ret
}
2 changes: 1 addition & 1 deletion R/tar_quarto_raw.R
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ tar_quarto_raw <- function(
targets::tar_assert_scalar(profile %|||% ".")
targets::tar_assert_chr(profile %|||% ".")
targets::tar_assert_nzchar(profile %|||% ".")
info <- tar_quarto_files(path = path, profile = profile)
info <- tar_quarto_files(path = path, profile = profile, quiet = quiet)
sources <- info$sources
output <- info$output
if (!is.null(output_file)) {
Expand Down
5 changes: 4 additions & 1 deletion R/tar_quarto_rep_raw.R
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,10 @@ tar_quarto_rep_raw <- function(
rep_workers <- as.integer(rep_workers)
name_params <- paste0(name, "_params")
sym_params <- as.symbol(name_params)
default_output_file <- utils::head(tar_quarto_files(path)$output, n = 1L)
default_output_file <- utils::head(
tar_quarto_files(path, quiet = quiet)$output,
n = 1L
)
default_output_file <- default_output_file %||%
fs::path_ext_set(path, "html")
target_params <- targets::tar_target_raw(
Expand Down
32 changes: 25 additions & 7 deletions man/tar_age.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

32 changes: 25 additions & 7 deletions man/tar_change.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

32 changes: 25 additions & 7 deletions man/tar_combine.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading