citeproc settings in YAML metadata block: Problems when using pandoc-zotxt.lua filter #2294

Closed
4 tasks done
njbart opened this issue Sep 5, 2022 · 16 comments
@njbart

njbart commented Sep 5, 2022

Bug description

This report describes an issue with obtaining bibliographic data directly from Zotero, bypassing the need for exporting .bib, .json, or .yaml bibliographic data, by using the Zotero add-on zotxt and the associated pandoc filter pandoc-zotxt.lua.

Using Rmarkdown, it is possible to knit/render files such as the following from within RStudio (provided Zotero is running, and an entry @bard:2015introduction does exist):

---
output:
  html_document: 
    pandoc_args:
      - '--lua-filter'
      - pandoc-zotxt.lua
      - '--citeproc'
---

Blah blah [@bard:2015introduction, 67].

Result:

Blah blah (Bard 2015, 67).

Bard, Kathryn A. 2015. An Introduction to the Archaeology of Ancient Egypt. 2nd ed. Hoboken, NJ: Wiley-Blackwell.

Using quarto from the command line, with this file (test.qmd)

---
format: html
---

Blah blah [@bard:2015introduction, 67].

and the command

quarto render test.qmd -L /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua -C

produces the exact same expected result.

(Note that I had to provide the full path to the filter here, but that seems to be a different issue.)

For qmd from within RStudio, one would expect to be able to achieve the same with appropriate settings in the YAML metadata block; however, this does not seem to be possible, as using pandoc-args (or pandoc_args) has no effect, and does not seem to be supported anyway.

Example:

---
format:
  html: 
    pandoc-args:
      - '--lua-filter'
      - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
      - '--citeproc'
---

Blah blah [@bard:2015introduction, 67].

Result:

Blah blah [@bard:2015introduction, 67].

(i.e., citation not resolved)

Including filters and citeproc: TRUE in the header has no effect either:

---
filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
citeproc: TRUE
format: html
---

Blah blah [@bard:2015introduction, 67].

Result:

Blah blah [@bard:2015introduction, 67].

(i.e., citation not resolved)

What does work is including the instruction to use a bibliography file (dummy.json) which includes only one (dummy) entry, different from @bard:2015introduction. This seems sufficient to trigger the activation of citeproc.

---
filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
format: html
bibliography: dummy.json
---

Blah blah [@bard:2015introduction, 67].

Result: as expected (note that not even citeproc: TRUE is required here).

Possible solutions:

  1. Allow the use of citeproc as a (pseudo-) filter (note that the order of filters matters)

    filters:
      - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
      - citeproc
    
  2. Include support for using pandoc-args (or pandoc_args) in a YAML metadata block.

  3. Make the following work:

    filters:
      - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
    citeproc: TRUE
    

    A problem I see here, though, is that in this particular case pandoc-zotxt.lua has to be run before citeproc, but there might be other filters that have to be run after citeproc, so the option to force a specific order of filters (and pseudo-filters) which would be possible with 1. and 2. might be best.


RStudio 2022.07.1+554 "Spotted Wakerobin" Release (7872775ebddc40635780ca1ed238934c3345c5de, 2022-07-22) for macOS
Mozilla/5.0 (Macintosh; Intel Mac OS X 11_6_8) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.10 Chrome/69.0.3497.128 Safari/537.36

quarto -v
1.1.179

Checklist

  • Please include a minimal, fully reproducible example in a single .qmd file? Please provide the whole file rather than the snippet you believe is causing the issue.
  • Please format your issue so it is easier for us to read the bug report.
  • Please document the RStudio IDE version you're running (if applicable), by providing the value displayed in the "About RStudio" main menu dialog?
  • Please document the operating system you're running. If on Linux, please provide the specific distribution.
@njbart njbart added the bug Something isn't working label Sep 5, 2022
@dragonstyle
Collaborator

dragonstyle commented Sep 6, 2022

Thanks for reporting this.

It looks like we were disabling the citeproc filter unless we determined that there are references or a bibliography. You should now be able to specify:

filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
citeproc: true

By default the citeproc filter will be run after user provided filters, but you can control its position using the pseudo filter as you suggest:

filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
  - citeproc

This will only position the citeproc filter if it is enabled (because there are references, a bibliography, or citeproc: true is specified). It doesn't force citeproc on; it just positions it if it is present.

See below

@dragonstyle
Collaborator

Correction (thanks @cderv!) - citeproc is always run after user provided filters no matter what (which allows filters to interact with citation before they are rendered).

@njbart
Copy link
Author

njbart commented Sep 6, 2022

Thanks for the update. This is not a problem for the pandoc-zotxt.lua use case described above, but it might be worth keeping in mind that there are others where filters need to be run before and after citeproc in a specific order (for one example, see here).

@cderv
Collaborator

cderv commented Sep 6, 2022

I don't know if this will help, but I think it is worth mentioning that Pandoc 2.19.1 now has a pandoc.utils.citeproc function that runs citeproc on the document. https://pandoc.org/releases.html#pandoc-2.19.1-2022-08-18

It could help with running filters both before and after citeproc.

@njbart
Author

njbart commented Sep 7, 2022

Thanks for pointing this out.

For future reference: This filter (taken from here), in a file citeproc.lua

-- Lua filter that behaves like `--citeproc`
function Pandoc (doc)
  return pandoc.utils.citeproc(doc)
end

combined with the following in a YAML metadata block

filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua
  - /Users/nb/.local/share/pandoc/filters/citeproc.lua

does run pandoc-zotxt.lua and then citeproc with the expected results.

I haven’t tried anything else so far, but it does seem it will allow you to run citeproc at any specific stage within an ordered sequence of filters and hence would solve all problems I can imagine in this context for now.
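Since pandoc.utils.citeproc is only available from Pandoc 2.19.1 (per the release notes linked above), a slightly more defensive variant of citeproc.lua could check the version first. This is only a sketch: the error message text is made up for illustration, and it has not been tested against Quarto's bundled Pandoc.

```lua
-- citeproc.lua, variant with a version guard (sketch, not an official filter).
function Pandoc(doc)
  -- Stop with a readable error if this pandoc predates pandoc.utils.citeproc.
  -- The message text is illustrative; the two %s are filled with the expected
  -- and the actual version.
  PANDOC_VERSION:must_be_at_least(
    '2.19.1',
    'citeproc.lua needs pandoc %s or later, got %s'
  )
  -- Same behaviour as the filter above: resolve citations at this point in
  -- the filter sequence.
  return pandoc.utils.citeproc(doc)
end
```

It would be listed in filters: in the same position as the original citeproc.lua.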

What doesn’t work as expected so far is that full paths to filters in the filters subdirectory in pandoc’s user data directory seem to be required, even though quarto pandoc -v does return User data directory: /Users/nb/.local/share/pandoc. I’ll open a separate issue for that.

@dragonstyle
Collaborator

Thanks for the additional context regarding filter order (especially with citeproc). It may be that we have to re-introduce the concept of filter ordering, or perhaps allow you to provide different classes of filters that target different positions in the AST rendering stack, for just this type of reason. We're planning to do some work on our own Lua filters and I'd expect we'll reconsider this as a part of that!

@bcdavasconcelos

@dragonstyle, just curious if there were any updates on this. In case not, perhaps a simple new field would suffice.

filters-after-citeproc:
  - filter1.lua

@cscheid
Collaborator

cscheid commented Mar 13, 2023

@bcdavasconcelos Thanks for the idea. We'll do something similar, but slightly different.

We're not going to be able to ship this in 1.3 because of regression risk (we're getting ready for a release).

A simple new field unfortunately does not suffice in the presence of our new Custom AST handlers, but this is a generally good idea. We'll do it in 1.4.

@njbart
Author

njbart commented Nov 23, 2024

Unfortunately, the solution described above seems to have stopped working. A file, test.qmd:

---
format: html
---

Blah blah [@bard:2015introduction, 67].

processed from the command line with

quarto render test.qmd -L /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua -C

still produces the expected result (i.e., includes a resolved citation), whereas a second file:

---
format: html
filters:
  - /Users/nb/.local/share/pandoc/filters/pandoc-zotxt.lua-1.2.0/pandoc-zotxt.lua
  - /Users/nb/.local/share/pandoc/filters/citeproc.lua
---

Blah blah [@bard:2015introduction, 67].

processed from the command line with

quarto render test.qmd

which used to work well back in September 2022 (see above), now emits a warning:

[WARNING] Citeproc: citation bard:2015introduction not found

and the result lacks a resolved citation.

Rendering the second file from within RStudio produces the same unsatisfactory result.

Since including pandoc-zotxt.lua and citeproc as command line parameters does lead to the expected result, it seems unlikely that there’s anything wrong with the other elements in the toolchain, including Zotero, zotxt, pandoc-zotxt.lua or BBT, or quarto per se – but maybe something has been changed with regard to quarto’s processing of filters listed in the YAML header?


  • OS: MacOS 15.1.1 (24B91)
  • IDE: Version 2024.09.1+394 (2024.09.1+394)
  • quarto check
Quarto 1.5.57
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.57
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      Chromium: 869685
      TinyTeX: (not installed)

[✓] Checking LaTeX....................OK
      Using: Installation From Path
      Path: /usr/local/texlive/2024/bin/universal-darwin
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.13.0
      Path: /opt/homebrew/opt/python@3.13/bin/python3.13
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

[✓] Checking R installation...........OK
      Version: 4.4.2
      Path: /Library/Frameworks/R.framework/Resources
      LibPaths:
        - /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
      knitr: 1.49
      rmarkdown: 2.29

[✓] Checking Knitr engine render......OK
  • Zotero 7.0.9
  • zotxt 7.0.0
  • pandoc-zotxt.lua 1.2.0
  • BetterBibTeX 6.7.253

@cderv
Collaborator

cderv commented Nov 25, 2024

@njbart as linked by @TomBener, this issue seems to be related to the improvement tracked in #7888.

It seems pandoc-zotxt.lua requires citeproc from Pandoc to have run before applying. And this is currently not possible without using -L at command line.

See also my reply in a related issue

So watch out for #7888!

which used to work well back in September 2022 (see above), now emits a warning:

Since 2022, I believe there were some changes in the complex Lua filter chain that messed this up. #7888 is about improving this. The current limitation is that any filter in filters: runs before citeproc, unfortunately.

@njbart
Author

njbart commented Nov 25, 2024

Many thanks. I will watch out for #7888 then.

One small thing however:

It seems pandoc-zotxt.lua requires citeproc from Pandoc to have run before applying. And this is currently not possible without using -L at command line.

I don't think that's accurate:

The logic is for pandoc-zotxt.lua to get the required biblio data from Zotero and write these into the YAML header before citeproc does its job.

Example:

This file:

@bard:2015introduction

processed with

pandoc -s test.md -L pandoc-zotxt.lua -t markdown

outputs

---
references:
- author:
  - family: Bard
    given: Kathryn A.
  edition: 2
  issued:
    date-parts:
    - - 2015
  language: en-US
  publisher: Wiley-Blackwell
  publisher-place: Hoboken, NJ
  title: An introduction to the archaeology of Ancient Egypt
  type: book
---

@bard:2015introduction

citeproc has not been involved so far here, but this intermediate result can be processed further by

pandoc -s test_zotxt_lua_citation.md -L pandoc-zotxt.lua -t markdown | pandoc --citeproc -t plain

to obtain the expected result:

Bard (2015)

Bard, Kathryn A. 2015. An Introduction to the Archaeology of Ancient
Egypt. 2nd ed. Hoboken, NJ: Wiley-Blackwell.

@cderv
Collaborator

cderv commented Nov 25, 2024

The logic is for pandoc-zotxt.lua to get the required biblio data from Zotero and write these into the YAML header before citeproc does its job.

Then I am surprised this does not work. Let me try to reproduce the error on my side. It could be something else.

@cscheid
Collaborator

cscheid commented Nov 25, 2024

FWIW, I've moved #7888 to 1.7 - that seems like a fairly important fix.

@cderv
Collaborator

cderv commented Nov 25, 2024

@njbart Looking more closely at your example, you mentioned the following:

citeproc has not been involved so far here, but this intermediate result can be processed further by

Is this filter expected to be used in two steps?

Because passing it under filters: in Quarto will insert the filter into the chain, with citeproc last, but it won't be a two-step process. That is possibly the problem here, compared to calling pandoc -L pandoc-zotxt.lua -C.

It seems pandoc-zotxt.lua requires citeproc from Pandoc to have run before applying

So this was probably the wrong conclusion, but #7888 is still required so that citeproc can be called at the right time. With the current logic, I don't think the filter is working.

We'll try to check this filter when working on #7888 to be sure this is solving the issue.

@njbart
Author

njbart commented Nov 25, 2024

Is this filter expected to be used in two steps?

No.

First ‘step’ is pandoc-zotxt.lua identifying which citekeys occur in the source document, fetching the corresponding biblio data from Zotero and inserting these in CSL YAML format into the source document’s YAML header. After this ‘step’ we have what I called the ‘intermediate result’: the biblio data are there, but have not been processed by citeproc yet.

Second ‘step’ is citeproc doing its job.

Reversing the order (i.e., citeproc before pandoc-zotxt.lua) can’t be expected to work, since in this case citeproc just doesn’t have any data to process, as shown by pandoc -s test.md -C -L pandoc-zotxt.lua -t plain.
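To make the order dependence concrete, here is a minimal sketch of the two ‘steps’ collapsed into a single Lua filter. It is not how pandoc-zotxt.lua is implemented: the LIBRARY table is a hypothetical stand-in for the lookup against Zotero, filled with the entry shown earlier in this thread, and the numeric fields are written as strings on the assumption that citeproc reads them the same way it reads YAML metadata.

```lua
-- Hypothetical stand-in for the data pandoc-zotxt.lua fetches from Zotero.
local LIBRARY = {
  ["bard:2015introduction"] = {
    id = "bard:2015introduction",
    type = "book",
    title = "An introduction to the archaeology of Ancient Egypt",
    author = { { family = "Bard", given = "Kathryn A." } },
    issued = { ["date-parts"] = { { "2015" } } },
    edition = "2",
    publisher = "Wiley-Blackwell",
    ["publisher-place"] = "Hoboken, NJ",
  },
}

function Pandoc(doc)
  -- 'Step' 1: find the cited keys and add matching entries to the
  -- `references` metadata field, keeping any references already present.
  local refs = doc.meta.references or {}
  local added = {}
  doc:walk({
    Cite = function(cite)
      for _, citation in ipairs(cite.citations) do
        local entry = LIBRARY[citation.id]
        if entry and not added[citation.id] then
          added[citation.id] = true
          table.insert(refs, entry)
        end
      end
    end,
  })
  if #refs > 0 then
    doc.meta.references = refs
  end

  -- 'Step' 2: citeproc resolves citations against the metadata set above.
  -- Swapping the two steps would leave citeproc with nothing to process.
  return pandoc.utils.citeproc(doc)
end
```

Because both steps run inside one Pandoc function, the metadata written in step 1 is guaranteed to be visible to citeproc in step 2, whatever happens elsewhere in the filter chain.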

@cderv
Collaborator

cderv commented Nov 25, 2024

inserting these in CSL YAML format into the source document’s YAML header.

The document is not processed a second time by Quarto with this new YAML; this is why I was asking about a two-step process. If this is supposed to be written in-document, then it won't be read again by Quarto.

Though my understanding of the Lua filter is that it puts the biblio in the metadata inside the AST, but it is not picked up by citeproc in our filter chain logic. I really think that #7888 will help with this problem.

Reversing the order (i.e., citeproc before pandoc-zotxt.lua) can’t be expected to work, since in this case citeproc just doesn’t have any data to process, as shown by pandoc -s test.md -C -L pandoc-zotxt.lua -t plain.

Yeah sorry to have been misleading on this. This is another problem for other filters that #7888 aims to fix. Not the case here. Problem is a bit different.
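One way to check whether the bibliographic data really is present in the AST metadata at a given point in the chain would be a small diagnostic filter along these lines. This is a sketch only: the file name report-refs.lua and the message wording are made up, and where exactly Quarto runs an entry from filters: relative to its own filters is not something this sketch can tell you.

```lua
-- report-refs.lua (hypothetical name): report on stderr which cited keys
-- have no entry in the `references` metadata field when this filter runs.
function Pandoc(doc)
  -- Index the ids of all references currently in the metadata.
  local have = {}
  for _, ref in ipairs(doc.meta.references or {}) do
    if type(ref) == "table" and ref.id then
      have[pandoc.utils.stringify(ref.id)] = true
    end
  end

  -- Collect cited keys that lack a matching reference, without duplicates.
  local missing, seen = {}, {}
  doc:walk({
    Cite = function(cite)
      for _, citation in ipairs(cite.citations) do
        if not have[citation.id] and not seen[citation.id] then
          seen[citation.id] = true
          table.insert(missing, citation.id)
        end
      end
    end,
  })

  if #missing > 0 then
    io.stderr:write("[report-refs] no metadata reference for: "
      .. table.concat(missing, ", ") .. "\n")
  else
    io.stderr:write("[report-refs] all cited keys have metadata references\n")
  end
  return doc -- the document itself is left unchanged
end
```

Listed directly after pandoc-zotxt.lua, it should show whether the references have been added by that point.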
