add new feature: pagebreak for every format #1626

Merged
merged 20 commits into from
Aug 27, 2019
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -9,3 +9,5 @@
^tests/testthat/site/.*_files/
^\.github$
^pkgdown$
^doc$
^Meta$
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,3 +2,6 @@
.Rhistory
.RData
.DS_Store
inst/doc
doc
Meta
6 changes: 5 additions & 1 deletion DESCRIPTION
@@ -65,7 +65,11 @@ Authors@R: c(
person(family = "Drifty", role = "cph",
comment = "Ionicons"),
person("Aidan", "Lister", role = c("ctb", "cph"),
comment = "jQuery StickyTabs")
comment = "jQuery StickyTabs"),
person("Benct Philip", "Jonsson", role = c("ctb", "cph"),
comment = "pagebreak lua filter"),
person("Albert", "Krewinkel", role = c("ctb", "cph"),
comment = "pagebreak lua filter")
)
Maintainer: Yihui Xie <xie@yihui.name>
Description: Convert R Markdown documents into a variety of formats.
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,6 +1,8 @@
rmarkdown 1.15
================================================================================

- It is now possible to add a pagebreak in HTML, Word, LaTeX and ODT documents using the `\newpage` or `\pagebreak` command in an Rmd file. This is possible thanks to [_pandoc's pagebreak lua filter_](https://github.com/pandoc/lua-filters/tree/master/pagebreak). See `vignette("pagebreak", package = "rmarkdown")`.

- Exclude `README.R?md` from files processed by `render_site()`,

- `html_document` with `code_folding: hide` supports showing individual source code chunks if they are assigned the `fold-show` class via the chunk option `class.source="fold-show"` (thanks, @atusy, #1602).
3 changes: 3 additions & 0 deletions R/html_document_base.R
@@ -112,6 +112,9 @@ html_document_base <- function(smart = TRUE,
if (pandoc2.0() && is.null(metadata$title) && is.null(metadata$pagetitle))
args <- c(args, "--metadata", paste0("pagetitle=", input_file))

# lua filters (added if pandoc >= 2.0)
args <- c(args, pandoc_lua_filters("pagebreak.lua"))

args
}

3 changes: 3 additions & 0 deletions R/odt_document.R
@@ -62,6 +62,9 @@ odt_document <- function(fig_width = 5,
# reference odt
args <- c(args, reference_doc_args("odt", reference_odt))

# lua filters (added if pandoc >= 2.0)
args <- c(args, pandoc_lua_filters("pagebreak.lua"))

# pandoc args
args <- c(args, pandoc_args)

14 changes: 14 additions & 0 deletions R/pandoc.R
@@ -673,6 +673,20 @@ pandoc_citeproc <- function() {
"pandoc-citeproc"
}

pandoc_lua_filters <- function(...) {
args <- c()
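# lua filters were introduced in pandoc 2.0
if (pandoc2.0()) {
# c(rbind(...)) interleaves the flag with each filter path:
# "--lua-filter", path1, "--lua-filter", path2, ...
args <- c(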
rbind(
"--lua-filter",
rmarkdown_system_file(file.path("rmd/lua", c(...)))
)
)
}
args
}


# quote args if they need it
quoted <- function(args) {
3 changes: 3 additions & 0 deletions R/pdf_document.R
@@ -157,6 +157,9 @@ pdf_document <- function(toc = FALSE,
# make sure the graphics package is always loaded
if (identical(template, "default")) args <- c(args, "--variable", "graphics=yes")

# lua filters (added if pandoc >= 2.0)
args <- c(args, pandoc_lua_filters("pagebreak.lua"))

# pandoc args
args <- c(args, pandoc_args)

3 changes: 3 additions & 0 deletions R/word_document.R
@@ -76,6 +76,9 @@ word_document <- function(toc = FALSE,
# reference docx
args <- c(args, reference_doc_args("docx", reference_docx))

# lua filters (added if pandoc >= 2.0)
args <- c(args, pandoc_lua_filters("pagebreak.lua"))

# pandoc args
args <- c(args, pandoc_args)

100 changes: 100 additions & 0 deletions inst/rmd/lua/pagebreak.lua
@@ -0,0 +1,100 @@
--[[
pagebreak – convert raw LaTeX page breaks to other formats

Copyright © 2017-2019 Benct Philip Jonsson, Albert Krewinkel

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
]]
local stringify_orig = (require 'pandoc.utils').stringify

local function stringify(x)
return type(x) == 'string' and x or stringify_orig(x)
end

--- configs – these are populated in the Meta filter.
local pagebreak = {
epub = '<p style="page-break-after: always;"> </p>',
html = '<div style="page-break-after: always;"></div>',
latex = '\\newpage{}',
ooxml = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>',
odt = '<text:p text:style-name="Pagebreak"/>'
}

local function pagebreaks_from_config (meta)
local html_class =
(meta.newpage_html_class and stringify(meta.newpage_html_class))
or os.getenv 'PANDOC_NEWPAGE_HTML_CLASS'
if html_class and html_class ~= '' then
pagebreak.html = string.format('<div class="%s"></div>', html_class)
end

local odt_style =
(meta.newpage_odt_style and stringify(meta.newpage_odt_style))
or os.getenv 'PANDOC_NEWPAGE_ODT_STYLE'
if odt_style and odt_style ~= '' then
pagebreak.odt = string.format('<text:p text:style-name="%s"/>', odt_style)
end
end

--- Return a block element causing a page break in the given format.
local function newpage(format)
if format == 'docx' then
return pandoc.RawBlock('openxml', pagebreak.ooxml)
elseif format:match 'latex' then
return pandoc.RawBlock('tex', pagebreak.latex)
elseif format:match 'odt' then
return pandoc.RawBlock('opendocument', pagebreak.odt)
elseif format:match 'html.*' then
return pandoc.RawBlock('html', pagebreak.html)
elseif format:match 'epub' then
return pandoc.RawBlock('html', pagebreak.epub)
else
-- fall back to insert a form feed character
return pandoc.Para{pandoc.Str '\f'}
end
end

local function is_newpage_command(command)
return command:match '^\\newpage%{?%}?$'
or command:match '^\\pagebreak%{?%}?$'
end

-- Filter function called on each RawBlock element.
function RawBlock (el)
-- Don't do anything if the output is TeX
if FORMAT:match 'tex$' then
return nil
end
-- check that the block is TeX or LaTeX and contains only
-- \newpage or \pagebreak.
if el.format:match 'tex' and is_newpage_command(el.text) then
-- use format-specific pagebreak marker. FORMAT is set by pandoc to
-- the targeted output format.
return newpage(FORMAT)
end
-- otherwise, leave the block unchanged
return nil
end

-- Turn paragraphs which contain nothing but a form feed
-- character into page breaks.
function Para (el)
if #el.content == 1 and el.content[1].text == '\f' then
return newpage(FORMAT)
end
end

return {
{Meta = pagebreaks_from_config},
{RawBlock = RawBlock, Para = Para}
}
27 changes: 27 additions & 0 deletions tests/testthat/test-lua-filters.R
@@ -0,0 +1,27 @@
.generate_md_and_convert <- function(content, output_format) {
input_file <- tempfile(fileext = ".Rmd")
output_file <- tempfile()
on.exit(unlink(c(input_file, output_file)), add = TRUE)
xfun::write_utf8(content, input_file)
res <- rmarkdown::render(input_file, output_format = output_format, output_file = output_file, quiet = TRUE)
xfun::read_utf8(res)
}

# lua filters exist only since pandoc 2.0
skip_if_not(rmarkdown::pandoc_available("2.0"))

test_that("pagebreak lua filters works", {
rmd <- "# HEADER 1\n\\newpage\n# HEADER 2\n\\pagebreak\n# HEADER 3"
res <- .generate_md_and_convert(rmd, "html_document")
expect_match(res[grep("HEADER 1", res)+1], "<div style=\"page-break-after: always;\"></div>")
expect_match(res[grep("HEADER 2", res)+1], "<div style=\"page-break-after: always;\"></div>")
# add a class instead of inline style
rmd2 <- paste0("---\nnewpage_html_class: page-break\n---\n", rmd)
res <- .generate_md_and_convert(rmd2, "html_document")
expect_match(res[grep("HEADER 1", res)+1], "<div class=\"page-break\"></div>")
expect_match(res[grep("HEADER 2", res)+1], "<div class=\"page-break\"></div>")
# For tex documents, the commands are left unchanged
res <- .generate_md_and_convert(rmd, "latex_document")
expect_match(res[grep("HEADER 1", res)+2], "\\newpage", fixed = TRUE)
expect_match(res[grep("HEADER 2", res)+2], "\\pagebreak", fixed = TRUE)
})
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
159 changes: 159 additions & 0 deletions vignettes/pagebreak.Rmd
@@ -0,0 +1,159 @@
---
title: "Add a pagebreak in an R Markdown document"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{pagebreak}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

## Usage

Adding a pagebreak to a document has always been possible using output-specific syntax in an R Markdown file, but such syntax only works for one output format.

Since rmarkdown 1.15, with RStudio >= 1.2 (or pandoc >= 2.0), you can add a `\newpage` or `\pagebreak` command on its own line to insert a pagebreak in any of these formats: `pdf_document()`, `html_document()`, `word_document()` and `odt_document()`.

```md
# Header 1

Some text

\newpage

# Header 2 on a new page

Some other text

\pagebreak

# Header 3 on a third page

```

rmarkdown converts these commands into the syntax of the target output format using a [lua filter](#lua-filter) during pandoc conversion.

### Using with PDF/LaTeX documents {#pdf}

As these are native LaTeX commands, they work as expected in a TeX output document, and thus in a PDF; the lua filter leaves them unchanged for TeX output. Adding a pagebreak this way was already possible with rmarkdown when the output is `pdf_document()` or `latex_document()`, regardless of the pandoc version.

### Using with HTML documents {#html}

In an R Markdown document with HTML output, a `\newpage` or `\pagebreak` command is converted by default into the following HTML code, which sets the CSS property [`page-break-after`](https://developer.mozilla.org/en-US/docs/Web/CSS/page-break-after) as an inline style:

```html
<div style="page-break-after: always;"></div>
```

The value `always` forces a pagebreak after this div when the document is printed.

For more flexibility, you can use an HTML class and custom CSS instead of an inline style. To set the class, add a `newpage_html_class` metadata field to your YAML header.

You can then control the behavior with custom CSS, as in this example:

````md
---
output:
  html_document: default
newpage_html_class: page-break
---

```{css, echo = FALSE}`r ''`
/* display the pagebreak only when printing the html page */
@media all {
  .page-break { display: none; }
}
@media print {
  .page-break { display: block; break-after: page; }
}
```

# Header 1

Some text

\newpage

# Header 2 on a new page

Some other text
````

Here, `\newpage` is converted to

```html
<div class="page-break"></div>
```

and the CSS rules defined in the chunk are applied to this class.

This customization can also be achieved by setting the environment variable `PANDOC_NEWPAGE_HTML_CLASS` in the R session that renders the document (or, for example, in an `.Renviron` file). If both are set, the `newpage_html_class` metadata field takes precedence.

Note that the CSS example above uses the [`break-after`](https://developer.mozilla.org/en-US/docs/Web/CSS/break-after) property instead of `page-break-after`: the former is the recommended replacement, while the latter is kept for [browser compatibility](https://caniuse.com/#search=page-break).
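If you prefer to set the environment variable from R, a minimal sketch is to call `Sys.setenv()` before rendering; `page-break` is just the example class name used in this section. This assumes the document is rendered in the same R session, since `rmarkdown::render()` runs pandoc as a child process that inherits the session's environment variables.

```r
# Set the pagebreak class for the current R session only; pandoc, started
# by rmarkdown::render() in this session, inherits the variable
Sys.setenv(PANDOC_NEWPAGE_HTML_CLASS = "page-break")
Sys.getenv("PANDOC_NEWPAGE_HTML_CLASS")
#> [1] "page-break"
```

HTML documents rendered afterwards in this session, e.g. with `rmarkdown::render("report.Rmd")` where `report.Rmd` is a placeholder name, then get `<div class="page-break"></div>` for each pagebreak command. To keep the setting across sessions, put the line `PANDOC_NEWPAGE_HTML_CLASS=page-break` in your `.Renviron` file instead.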

### Using with Word documents {#word}

In an R Markdown document with Word output, a `\newpage` or `\pagebreak` command is converted into a Word page break. Done manually, this would mean adding the following raw OpenXML block to your R Markdown file:

````md
```{=openxml}
<w:p><w:r><w:br w:type="page"/></w:r></w:p>
```
````

For example, with the pagebreak feature, the following puts the first header on the second page of the Word document:

````md
---
title: My main title
output: word_document
---

\newpage

# First Header
````

### Using with ODT documents {#odt}

To use the pagebreak feature with `odt_document()`, you need to provide a reference document that includes a paragraph style named, by default, _Pagebreak_. This paragraph style should have no extra space before or after it, and should have a pagebreak after it (see the [LibreOffice documentation](https://help.libreoffice.org/Writer/Text_Flow) on how to create a style).

The name of this paragraph style can be customized with the `newpage_odt_style` metadata field in the YAML header or with the `PANDOC_NEWPAGE_ODT_STYLE` environment variable (as for [HTML documents](#html)).

As in the previous example, this one produces a two-page document with the first header on the second page.

````md
---
title: My main title
output:
  odt_document:
    reference_odt: reference.odt
---

\newpage

# First Header
````
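The style name customization can be set in the same header. As a sketch, assuming your reference document defines the paragraph style under the hypothetical name `PageBreakAfter` instead of _Pagebreak_:

````md
---
title: My main title
output:
  odt_document:
    reference_odt: reference.odt
newpage_odt_style: PageBreakAfter
---

\newpage

# First Header
````

With this header, the filter emits `<text:p text:style-name="PageBreakAfter"/>` for each pagebreak command, so the style must exist in `reference.odt` under exactly that name.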


## About lua filters {#lua-filter}

Since pandoc 2.0, lua filters can be used to add extra functionality to pandoc's document conversion; making a pagebreak command in markdown work across several output formats is one example. You can find more information about lua filters in [pandoc's documentation](https://pandoc.org/lua-filters.html), and some examples in [a collection of lua filters for pandoc](https://github.com/pandoc/lua-filters). These examples, and any other lua filter, can be used directly in an R Markdown document by adding [a pandoc argument](https://bookdown.org/yihui/rmarkdown/html-document.html#pandoc-arguments) in the YAML header:

```md
---
output:
  html_document:
    pandoc_args: ["--lua-filter=filter.lua"]
---
```

The [rmdfiltr](https://github.com/crsh/rmdfiltr) package provides a collection of lua filters and helper functions to use them.

Before pandoc 2.0, [filters](https://pandoc.org/filters.html) were already available as external programs that modify pandoc's AST; `pandoc-citeproc`, which processes citations, is one example. The [pandocfilters](https://cran.r-project.org/web/packages/pandocfilters/) package helps to create such filters in R.