Added the pagelist entry #339

Merged
merged 6 commits into from
Oct 12, 2018

Conversation

iherman
Member

@iherman iherman commented Oct 9, 2018

This is the action recorded in minutes of 8th of October.

Some notes

  • @avneesh: I have added a text in 3.8 that I copied, essentially, from the similar entry in the EPUB document. I wonder if the pagelist as we have here would not be a bit more general, and not depend on "page boundaries present in a print source". A more general text would be therefore welcome.
  • @HadrienGardeur: I am aware of Simplifying the WebIDL #338, but I decided to keep this PR independent of that one, ie, follow the current draft and add the pagelist entry to the WebIDL. If that issue leads to overall changes we can remove that easily, but my goal was to keep things consistent.

If this PR is merged, then w3c/pub-vocab#1 should also be merged and processed; this adds the extra rel value to our vocabulary files.

Cc: @GarthConboy @llemeurfr @laudrain @HadrienGardeur

Fix #223



@HadrienGardeur
Member

@iherman aside from my comment in #338, I don't think that pagelist and toc can simply return an HTMLElement.

Since they can both be contained in resources other than the entry page, we need to know the URIs of such resources as well.
There's also the case of multiple resources marked as being the TOC and/or page list, which requires extra attention.

@HadrienGardeur
Member

Sidenote: we can usually preview such PRs but I'm not seeing any link for that right now. Is something broken in our pipeline?

@iherman
Member Author

iherman commented Oct 9, 2018

I left this intentionally open for now, because the whole issue of ToC is still open as far as I am concerned, and until we have a clear idea of what we want, I do not want to touch the WebIDL. That should come when all other issues around ToC are closed.

For pagelist, I think we should, for now, simply keep it as a clone of ToC, so to say, and handle them together. This should not be done as part of this PR imho...

@HadrienGardeur
Member

HadrienGardeur commented Oct 9, 2018

I think that the situation is quite different from last time. We were discussing whether the TOC could be machine friendly enough to result in a different representation in the WebIDL where it would already be parsed and ready to use for UAs.

In the current PR, there's a contradiction between:

  • the spec language that allows the use of a rel to identify a page list in resources or readingOrder
  • the WebIDL which would only work if the page list is contained in the entry page

This would be resolved by #338 if instead of HTMLElement we implemented a getter (TBD), but even in this case, the getter wouldn't return an HTMLElement.
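The point about needing the URI of the resource that holds the page list can be sketched as a lookup over the manifest. This is only an illustration: the `Manifest` and `LinkedResource` shapes are simplified, hypothetical types, the search order is arbitrary, and the rel strings `"pagelist"` and `"contents"` are assumed values, not quotations from the draft.

```typescript
// Hypothetical, simplified manifest shape; the field names echo the
// discussion (resources, readingOrder, rel) but are not spec text.
interface LinkedResource {
  url: string;
  rel?: string[];
}

interface Manifest {
  readingOrder: LinkedResource[];
  resources: LinkedResource[];
}

// Return the URL of the first resource carrying the given rel value,
// or undefined when no resource declares it (a UA would then have to
// fall back to looking inside the entry page).
function findByRel(manifest: Manifest, rel: string): string | undefined {
  const candidates = [...manifest.readingOrder, ...manifest.resources];
  const match = candidates.find((r) => (r.rel ?? []).includes(rel));
  return match?.url;
}

// Example data; the rel values "contents" and "pagelist" are assumptions.
const sampleManifest: Manifest = {
  readingOrder: [{ url: "chapter1.html" }, { url: "chapter2.html" }],
  resources: [
    { url: "toc.html", rel: ["contents"] },
    { url: "pages.html", rel: ["pagelist"] },
  ],
};

console.log(findByRel(sampleManifest, "pagelist"));
```

A getter along these lines would hand back a URL rather than an HTMLElement, which is the mismatch with the WebIDL raised above.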

@HadrienGardeur
Member

By the way, all of the previous comment is also true for the TOC and this is completely independent from any discussion about the machine readability of such documents.

@lrosenthol

I realize that I am jumping in here pretty late, but I would like to strongly recommend against the pagelist as defined here (and in EPUB). I believe, based on various implementation considerations, that it is backwards from what is really desired to provide rich reflow<->static mappings.

Instead, individual content elements should identify their source pages (eg. data-source-page). Using this approach, a UA is able to not only take the user to the correct page ("Jump to Page") but the right spot on the page (eg. paragraph 4 on page 3). Additionally, it enables the UA to map between the two models in publications - something extremely useful when you want to share (web) annotations between a fixed layout and a reflowable version of the same publication.
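A minimal sketch of the lookup this proposal implies, assuming each content element has already been reduced to an id plus the value of its data-source-page attribute. The `ContentElement` type and both function names are hypothetical and exist only for illustration.

```typescript
// Hypothetical flattened view of content elements, in document order.
// `sourcePage` stands for the value of a data-source-page attribute.
interface ContentElement {
  id: string;
  sourcePage: string;
}

// Group element ids by the source page they claim to come from.
function indexBySourcePage(elements: ContentElement[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const el of elements) {
    const ids = index.get(el.sourcePage) ?? [];
    ids.push(el.id);
    index.set(el.sourcePage, ids);
  }
  return index;
}

// Resolve "the n-th element on page p" (1-based) to an element id.
function locate(
  index: Map<string, string[]>,
  page: string,
  nth: number
): string | undefined {
  return index.get(page)?.[nth - 1];
}

const sampleElements: ContentElement[] = [
  { id: "para-5", sourcePage: "2" },
  { id: "para-6", sourcePage: "3" },
  { id: "para-7", sourcePage: "3" },
];

const pageIndex = indexBySourcePage(sampleElements);
console.log(locate(pageIndex, "3", 2));
```

The first id recorded for a page plays the role of a "jump to page" target, while later ids give the finer-grained position mentioned above.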

@avneeshsingh

avneeshsingh commented Oct 9, 2018 via email

@lrosenthol

@avneeshsingh great question! If we use data attributes, then the non-WP-aware UA would ignore them (since that's the rules - https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes). However, because they are "connected" with the content (being attributes), they could be used by any HTML UA, such as an assistive technology device, even if it wasn't WP aware.

@danielweck
Member

@lrosenthol isn't that abusing / misusing custom data-xxx attributes?

"User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values."

"Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements."

"These attributes are not intended for use by software that is not known to the administrators of the site that uses the attributes. For generic extensions that are to be used by multiple independent tools, either this specification should be extended to provide the feature explicitly, or a technology like microdata should be used (with a standardized vocabulary)."

"JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used."

etc.
https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes

@danielweck
Member

The pagelist is a list of links that provide navigation to positions in the content that correspond to the locations of page boundaries present in a print source being represented by the Web Publication

I am wondering whether it would be useful to clarify "print source" here, as this could be interpreted exclusively as "publications printed on paper, like traditional books or magazines", when this could in fact be referring to digital publications not designed to be fed through a printer at all (i.e. born-digital, stay-digital documents), such as fixed-layout / pre-paginated EPUBs (or perhaps even PDFs) with interactive features, videos, animations, etc. I have seen a few examples of sales brochures, technical documentation, children's talking books, cooking / recipe books, etc. that do not have a print equivalent, yet they are "paginated" works that would benefit from "pagelist" navigation in Web Publications.

@lrosenthol

@danielweck

isn't that abusing / misusing custom data-xxx attributes?

Probably... but HTML's limited model for attributes doesn't give us much other choice, short of going to the HTML committee and asking for a new one. But I still think either of those is a better choice than creating a completely new grammar/language that requires custom parsing...

@danielweck
Member

Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient.

I am not sure I understand the intent of this additional prose (which was inserted on top of the original proposal), and I am certainly not convinced by the claim "for making the navigation more convenient" which does not seem based upon a well-defined rationale.

Normally, we start with use cases / functional requirements, and if we agree on the needs, we translate that to well-specified / standardized mechanisms. With the additional prose I feel that we are introducing a tautological statement, and an imprecise one as well. Here is how I read the current prose: "page lists are important for publications that somehow have pages, but even if they don't, this feature is useful too because it provides navigation convenience".

Let's look at it from a different angle, and let's say that we're now debating the inclusion of EPUB's "list of landmarks" (e.g. figures, mathematical equations, etc.) in the Web Publications standard. Just like "pagelist", there would be a well-defined metadata vocabulary (or semantic role) that reading systems / user agents would use to discover the list of links identifying worthy locations within publication resources. It would be strange to state that "this feature can also be used for publications that do not actually contain figures, to make navigation more convenient".

I am sorry if I am missing the point :)
Perhaps I am just nit-picking.

@iherman
Member Author

iherman commented Oct 9, 2018

@danielweck : as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)

@iherman
Member Author

iherman commented Oct 9, 2018

@danielweck maybe the discussion should move back to #223...

@danielweck
Member

@danielweck maybe the discussion should move back to #223...

You mean the discussion about data-xxx attributes, right? (+ @lrosenthol )

I think this PR is the right place to discuss the prose proposed in this PR :)

@TzviyaSiegman
Contributor

@lrosenthol granular anchoring is not the topic of discussion here. Please open a separate issue if you'd like to see that addressed.

@iherman
Member Author

iherman commented Oct 9, 2018

@danielweck no. #223 is on any other navigation elements, including the pagelist. This PR implements the current resolution of including pagelist, but if we question this resolution, or want to extend it to other categories, the discussion should be done there, imho.

@lrosenthol

@TzviyaSiegman I disagree, as we are discussing the pagelist here... However, it appears that @iherman thinks that #223 would be a better spot, and I am happy to move it there since both @danielweck and I have concerns about the currently proposed pagelist mechanism.

@danielweck
Member

as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)

@iherman

(1) I am proposing a clarification of "print source" as per my comment here ( #339 (comment) ) which I feel is neither an objection to the resolution, nor a request for extending the semantics of "page list" (perhaps the latter is where our interpretations differ, in which case I will be happy to move to #223 ).

(2) Regarding the additional sentence "Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient", right now I am leaning towards removing it entirely, based on the reasons mentioned in comment #339 (comment) (I feel that the original prose clearly addresses the identified use-case / functional requirement, especially if the notion of "print source" is clarified)

@mattgarrish
Member

I agree with @danielweck that the wording right now needs improvement. What about something like:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They might correspond to a statically paginated source, for example, such as a print document, or might be a purely digital creation to simplify access into the content.

How we provide the convenience of pagination in a digital world is a problem that needs a better solution some day, but abusing attributes that we know are just going to lead to non-implementation is not the way forward. It's not a problem we're likely to solve on WP's timeline.

@TzviyaSiegman
Contributor

I agree with @danielweck 's concerns

  1. "print source" is a bit unclear. Perhaps we should phrase it as "paginated content" or simply remove the phrase. HTML can be paginated too.
  2. +1 to removing mention of when it might be helpful to use pagelist. We leave this as a MAY and allow the user to determine when to use it.

@danielweck
Member

To put it differently: once the notion of "print source" is clarified to encompass any type of publication that intrinsically provides / contains a "list of pages" (thus the "pagelist" keyword / semantic role proposed here), such as "printed physical publications", "digital fixed-layout publications", and otherwise "reflowable publication with some sort of virtual pagination markers" (which I believe was the primary motivator for the additional prose), then we can remove the additional prose.

@danielweck
Member

@mattgarrish I like the term "static" because I interpret its opposite meaning ("dynamic") as "pagination at rendering time, not baked into the content at authoring time". I am not sure everybody would use the same interpretation though.

@llemeurfr
Contributor

@avneeshsingh, I had further thoughts about the start/end issue; UAs will follow a link in a page list and display content. Such a link acts as a bookmark would: in reflow mode, the "synthetic page" (i.e. the screen) displayed will be the one the user would have accessed whilst moving from page to page from the start of the corresponding HTML resource. Therefore it's impossible to know if the page break will be at the start, middle or end of this page, as it depends on the screen size.

@mattgarrish, as it isn't tied to the current PR, I'll create a specific issue for the issue of page numbers as content or empty elements.

@llemeurfr
Contributor

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

@mattgarrish
Member

I'll create a specific issue for the issue of page numbers as content or empty elements.

Oh, okay, I misunderstood your request. One common issue is whether the numbers have to be present with the page list and the other is whether they have to be content or not. I can comment on the latter when you open the issue. :)

@mattgarrish
Member

It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document.

The wording is a bit confusing, as it seems to mix authoring and user agent requirements. I read it as the pagelist/toc MUST be identified by the appropriate role, and user agents MUST recognize the first one so designated. It would probably be better to split these statements, if so.

@iherman
Member Author

iherman commented Oct 11, 2018

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

"so designated" includes the choice of the nav based on the value of role (doc-toc vs. doc-pagelist). That is, the two choices are disjoint, and both elements can be in the same file.
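The "first element with that role" reading can be sketched as follows. The `NavInfo` type is a hypothetical stand-in for nav elements listed in document order; the role names doc-toc and doc-pagelist are the DPUB-ARIA roles the thread refers to.

```typescript
// Hypothetical stand-in for <nav> elements, listed in document order.
interface NavInfo {
  id: string;
  roles: string[]; // the role attribute, split on whitespace
}

// Pick the first nav carrying the given role. Selection is per role,
// so a ToC and a page list can sit side by side in the same document.
function firstWithRole(navs: NavInfo[], role: string): NavInfo | undefined {
  return navs.find((n) => n.roles.includes(role));
}

const entryPageNavs: NavInfo[] = [
  { id: "toc", roles: ["doc-toc"] },
  { id: "pages", roles: ["doc-pagelist"] },
  { id: "pages-alt", roles: ["doc-pagelist"] }, // ignored: not the first
];

console.log(firstWithRole(entryPageNavs, "doc-toc")?.id);
console.log(firstWithRole(entryPageNavs, "doc-pagelist")?.id);
```

In a browser, roughly the same selection could be written as `document.querySelector('nav[role~="doc-pagelist"]')`, since `querySelector` returns the first match in document order.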

@iherman
Member Author

iherman commented Oct 11, 2018

@HadrienGardeur

In the current PR, there's a contradiction between:

  • the spec language that allows the use of a rel to identify a page list in resources or readingOrder
  • the WebIDL which would only work if the page list is contained in the entry page

I do not see why the second statement would be true. It is an HTMLElement, ie, a DOM object, and that can be in any HTML resource, whether it is the entry page or not.

But, again, this is the same as the ToC. Finalizing the ToC would fix this, too.

@HadrienGardeur
Member

@iherman this means that in addition to having the DOM of the current document being displayed (for example chapter 1 instead of the entry page), you would need the DOM for potentially three additional documents as well:

  • the one containing the manifest
  • the one containing the TOC
  • the one containing the pagelist

This is quite heavy in terms of processing and I really don't think that's a good idea.

@iherman
Member Author

iherman commented Oct 11, 2018

Trying to find a consensus on this PR, based on the texts above. I extract the following possible changes before merge:

  1. The text in 3.8 should become:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.

  2. In 3.7 and 3.8, the part of the sentence

, and MUST be the first element in the document so designated.

should say

, and MUST be the first element in the document with that role value.


I have also looked at the issue of whether we would allow several pagelists. This requires some "stylistic" changes ("A pagelist" instead of "The pagelist"), and (2) above is moot for pagelists. However, it also raises the question of how to locate them. We could say:

  1. The UA tries to locate pagelists in the primary entry page
  2. Additionally, it will follow all entries, if available, in resources or readingOrder.

Do we agree to have this? Is it important to have several pagelists, in fact? Or should we forget about this for now?


Additionally, some other questions arose, to be discussed elsewhere, but should not prevent merging this PR:

  • should there be a definition or advice on identifying page number content
  • possible changes as a result of Simplifying the WebIDL #338 would affect the WebIDL referring to pagelist
  • have some text in the draft (or best practices) on what kind of targets should be in pagelists
  • usage of data-xxx attributes to identify targets

@HadrienGardeur
Member

Additionally, it will follow all entries, if available, in resources or readingOrder.

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.

@iherman
Member Author

iherman commented Oct 11, 2018

@HadrienGardeur, on #339 (comment) ("you would need the DOM for potentially three additional documents as well"): that is unfortunately correct. But this is related to the (still open) issue #291. If that were closed, we could simplify the WebIDL in some way, possibly defining a helper function that returns a ToC/pagelist in some JSON format for further processing. But as long as that issue is open, we are stuck.

@iherman
Member Author

iherman commented Oct 11, 2018

@HadrienGardeur

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.

That is what I meant. If we have several pagelists, then each of them should be treated as we have defined for one.

Note that I do not have strong feelings about whether we need this or not, ie, whether we should allow several page lists or not. I was just exploring what this would mean.

@avneeshsingh

@iherman
The changes you wrote are fine. Regarding multiple pagelists, I think it would be good to keep away from getting into too much detail. We heard about a use case from Dave, but we should know about more use cases to specify multiple pagelists properly.

@HadrienGardeur
Member

there are many types of publications which would be having different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels.

@avneeshsingh IMO this is exactly what makes it very difficult for UAs to implement good features based on page lists.

For example, as a UA developer I can't show a dialog that will let the user jump to page "9" because:

  • the pagelist might skip "9" entirely
  • or it is indicated as "IX" or "ix"

As a UA developer, I can only provide:

  • a long list of references (not super readable)
  • and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)

I frankly don't think that's enough for education where we need to easily share a location AND jump to it as well (teacher tells the class to read page 9).
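The "jump to page 9" difficulty can be made concrete with a small sketch. It assumes the page list has already been parsed into label/href pairs (the `PageEntry` type, function names, and sample hrefs are all hypothetical) and tries a literal match before falling back to roman-numeral labels. It illustrates the ambiguity described above rather than proposing UA behavior.

```typescript
// Hypothetical parsed form of a page list: one entry per link.
interface PageEntry {
  label: string;
  href: string;
}

// Convert a roman numeral to a number; undefined if the text is not
// made up solely of roman numeral letters. No validity checks beyond that.
function romanToInt(text: string): number | undefined {
  const values: Record<string, number> = {
    i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000,
  };
  const lower = text.trim().toLowerCase();
  if (!/^[ivxlcdm]+$/.test(lower)) return undefined;
  let total = 0;
  for (let i = 0; i < lower.length; i++) {
    const cur = values[lower.charAt(i)] ?? 0;
    const next = i + 1 < lower.length ? values[lower.charAt(i + 1)] ?? 0 : 0;
    // A smaller value before a larger one is subtracted (e.g. "ix" = 9).
    total += cur < next ? -cur : cur;
  }
  return total;
}

// Find a target for a user-typed page reference.
function findPage(entries: PageEntry[], query: string): PageEntry | undefined {
  const q = query.trim().toLowerCase();
  // 1. Literal label match, ignoring case ("IX" finds "ix").
  const exact = entries.find((e) => e.label.trim().toLowerCase() === q);
  if (exact) return exact;
  // 2. Numeric query matched against roman-numeral labels ("9" finds "ix").
  const n = Number(q);
  if (!Number.isInteger(n)) return undefined;
  return entries.find((e) => romanToInt(e.label) === n);
}

const samplePages: PageEntry[] = [
  { label: "ix", href: "front.html#p-ix" },
  { label: "1", href: "ch1.html#p1" },
  { label: "10", href: "ch1.html#p10" },
];

console.log(findPage(samplePages, "9")?.href);
```

Here "9" lands on the front-matter page "ix" only because the list skips an arabic "9", and a list that skips the page in both numbering schemes yields no target at all. A UA has no reliable way to tell which of these outcomes the user intended.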

@avneeshsingh

avneeshsingh commented Oct 11, 2018 via email

@laudrain

and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)

In the content, page breaks have to be identified semantically, for instance with the role "doc-pagebreak".

@iherman
Member Author

iherman commented Oct 12, 2018

Unless there is an outcry, I intend to modify the PR today along the lines of #339 (comment), except that I would keep the uniqueness of pagelists (ie, there should be only zero or one), following the remarks of Avneesh.

@avneeshsingh

@iherman
Good for now. If the group members find that multiple page lists are important, it would be good to open a new issue for it. I anticipate that identifying multiple pagelists with different print sources (as mentioned by Dave) will need additional work, so it would be good to document the use cases for it before we expand pagelist to pagelists.

@iherman
Member Author

iherman commented Oct 12, 2018

@mattgarrish I had to make some more changes to implement the first-come-first-served rule:-) Please check. Maybe merge after this?

@mattgarrish
Member

Looks good to me now.

@mattgarrish mattgarrish merged commit 1481110 into master Oct 12, 2018
@mattgarrish mattgarrish deleted the pagelist branch October 12, 2018 13:17