Added the pagelist entry #339

iherman · 2018-10-09T10:01:10Z

This is the action recorded in the minutes of 8 October.

Some notes

@avneesh: I have added text to section 3.8 that I copied, essentially, from the similar entry in the EPUB document. I wonder whether the pagelist as we have it here should not be a bit more general, and not depend on "page boundaries present in a print source". A more general text would therefore be welcome.
@HadrienGardeur: I am aware of Simplifying the WebIDL #338, but I decided to keep this PR independent of that one, i.e., follow the current draft and add the pagelist entry to the WebIDL. If that issue leads to overall changes we can easily remove it, but my goal was to keep things consistent.

If this PR is merged, then w3c/pub-vocab#1 should also be merged and processed; this adds the extra rel value to our vocabulary files.

Cc: @GarthConboy @llemeurfr @laudrain @HadrienGardeur

Fix #223


HadrienGardeur · 2018-10-09T11:11:10Z

@iherman aside from my comment in #338, I don't think that pagelist and toc can simply return an HTMLElement.

Since they can both be contained in resources other than the entry page, we need to know the URIs of such resources as well.
There's also the case of multiple resources marked as being the TOC and/or page list, which requires extra attention.

HadrienGardeur · 2018-10-09T11:13:19Z

Sidenote: we can usually preview such PRs but I'm not seeing any link for that right now. Is something broken in our pipeline?

iherman · 2018-10-09T11:13:58Z

I left this intentionally open for now, because the whole issue of ToC is still open as far as I am concerned, and until we have a clear idea of what we want, I do not want to touch the WebIDL. That should come when all other issues around ToC are closed.

For pagelist, I think we should, for now, simply keep it as a clone of ToC, so to say, and handle them together. This should not be done as part of this PR imho...

HadrienGardeur · 2018-10-09T11:26:20Z

I think that the situation is quite different from last time. We were discussing whether the TOC could be machine friendly enough to result in a different representation in the WebIDL where it would already be parsed and ready to use for UAs.

In the current PR, there's a contradiction between:

the spec language that allows the use of a rel to identify a page list in resources or readingOrder
the WebIDL which would only work if the page list is contained in the entry page

This would be resolved by #338 if instead of HTMLElement we implemented a getter (TBD), but even in this case, the getter wouldn't return an HTMLElement.

HadrienGardeur · 2018-10-09T11:30:06Z

By the way, all of the previous comment is also true for the TOC and this is completely independent from any discussion about the machine readability of such documents.

lrosenthol · 2018-10-09T13:23:39Z

I realize that I am jumping in here pretty late, but I would like to strongly recommend against the pagelist as defined here (and in EPUB). I believe, based on various implementation considerations, that it is backwards from what is really desired to provide rich reflow<->static mappings.

Instead, individual content elements should identify their source pages (e.g. data-source-page). Using this approach, a UA is able to take the user not only to the correct page ("Jump to Page") but to the right spot on the page (e.g. paragraph 4 on page 3). Additionally, it enables the UA to map between the two models in publications, something extremely useful when you want to share (web) annotations between a fixed-layout and a reflowable version of the same publication.

avneeshsingh · 2018-10-09T13:51:27Z

How would this approach work in a non-WP-aware user agent? It would be good to know the details.

lrosenthol · 2018-10-09T15:49:14Z

@avneeshsingh great question! If we use data attributes, then the non-WP-aware UA would ignore them (since that's the rules - https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes). However, because they are "connected" with the content (being attributes), they could be used by any HTML UA, such as an assistive technology device, even if it wasn't WP aware.

danielweck · 2018-10-09T16:01:13Z

@lrosenthol isn't that abusing / misusing custom data-xxx attributes?

"User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values."

"Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements."

"These attributes are not intended for use by software that is not known to the administrators of the site that uses the attributes. For generic extensions that are to be used by multiple independent tools, either this specification should be extended to provide the feature explicitly, or a technology like microdata should be used (with a standardized vocabulary)."

"JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used."

etc.
https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes

danielweck · 2018-10-09T16:35:40Z

The pagelist is a list of links that provide navigation to positions in the content that correspond to the locations of page boundaries present in a print source being represented by the Web Publication

I am wondering whether it would be useful to clarify "print source" here, as this could be interpreted exclusively as "publications printed on paper, like traditional books or magazines", when it could in fact refer to digital publications not designed to be fed through a printer at all (i.e. born-digital, stay-digital documents), such as fixed-layout / pre-paginated EPUBs (or perhaps even PDFs) with interactive features, videos, animations, etc. I have seen a few examples of sales brochures, technical documentation, children's talking books, cooking / recipe books, etc. that do not have a print equivalent, yet they are "paginated" works that would benefit from "pagelist" navigation in Web Publications.

lrosenthol · 2018-10-09T16:40:57Z

@danielweck

isn't that abusing / misusing custom data-xxx attributes?

Probably... but HTML's limited model for attributes doesn't give us much other choice, short of going to the HTML committee and asking for a new one. But I still think either of those is a better choice than creating a completely new grammar/language that requires custom parsing...

danielweck · 2018-10-09T16:58:11Z

Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient.

I am not sure I understand the intent of this additional prose (which was inserted on top of the original proposal), and I am certainly not convinced by the claim "for making the navigation more convenient" which does not seem based upon a well-defined rationale.

Normally, we start with use cases / functional requirements, and if we agree on the needs, we translate them into well-specified / standardized mechanisms. With the additional prose I feel that we are introducing a tautological statement, and an imprecise one as well. Here is how I read the current prose: "page lists are important for publications that somehow have pages, but even if they don't, this feature is useful too because it provides navigation convenience".

Let's look at it from a different angle, and let's say that we're now debating the inclusion of EPUB's "list of landmarks" (e.g. figures, mathematical equations, etc.) in the Web Publications standard. Just like "pagelist", there would be a well-defined metadata vocabulary (or semantic role) that reading systems / user agents would use to discover the list of links identifying worthy locations within publication resources. It would be strange to state that "this feature can also be used for publications that do not actually contain figures, to make navigation more convenient".

I am sorry if I am missing the point :)
Perhaps I am just nit-picking.

iherman · 2018-10-09T16:59:26Z

@danielweck: as I said on the call yesterday, I do need text for the paragraph you are referring to, and I welcome any input :-)

iherman · 2018-10-09T17:01:38Z

@danielweck maybe the discussion should move back to #223...

danielweck · 2018-10-09T17:06:22Z

@danielweck maybe the discussion should move back to #223...

You mean the discussion about data-xxx attributes, right? (+ @lrosenthol )

I think this PR is the right place to discuss the prose proposed in this PR :)

TzviyaSiegman · 2018-10-09T17:10:00Z

@lrosenthol granular anchoring is not the topic of discussion here. Please open a separate issue if you'd like to see that addressed.

iherman · 2018-10-09T17:22:00Z

@danielweck no. #223 is on any other navigation elements, including the pagelist. This PR implements the current resolution of including pagelist, but if we question this resolution, or want to extend it to other categories, the discussion should be done there, imho.

lrosenthol · 2018-10-09T17:38:44Z

@TzviyaSiegman I disagree, as we are discussing the pagelist here... However, it appears that @iherman thinks that #223 would be a better spot, and I am happy to move it there since both @danielweck and I have concerns about the currently proposed pagelist mechanism.

danielweck · 2018-10-09T17:45:04Z

as I said on the call yesterday, I do need text for the paragraph you are referring to, and I welcome any input :-)

@iherman

(1) I am proposing a clarification of "print source" as per my comment here ( #339 (comment) ), which I feel is neither an objection to the resolution nor a request for extending the semantics of "page list" (perhaps the latter is where our interpretations differ, in which case I will be happy to move to #223 ).

(2) Regarding the additional sentence "Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient", right now I am leaning towards removing it entirely, based on the reasons mentioned in comment #339 (comment) (I feel that the original prose clearly addresses the identified use case / functional requirement, especially if the notion of "print source" is clarified).

mattgarrish · 2018-10-09T17:58:24Z

I agree with @danielweck that the wording right now needs improvement. What about something like:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They might correspond to a statically paginated source, for example, such as a print document, or might be a purely digital creation to simplify access into the content.

How we provide the convenience of pagination in a digital world is a problem that needs a better solution some day, but abusing attributes that we know are just going to lead to non-implementation is not the way forward. It's not a problem we're likely to solve on WP's timeline.

TzviyaSiegman · 2018-10-09T17:58:41Z

I agree with @danielweck 's concerns

"print source" is a bit unclear. Perhaps we should phrase it as "paginated content" or simply remove the phrase. HTML can be paginated too.
+1 to removing mention of when it might be helpful to use pagelist. We leave this as a MAY and allow the user to determine when to use it.

danielweck · 2018-10-09T18:00:09Z

To put it differently: once the notion of "print source" is clarified to encompass any type of publication that intrinsically provides / contains a "list of pages" (thus the "pagelist" keyword / semantic role proposed here), such as "printed physical publications", "digital fixed-layout publications", and otherwise "reflowable publications with some sort of virtual pagination markers" (which I believe was the primary motivator for the additional prose), then we can remove the additional prose.

danielweck · 2018-10-09T18:02:41Z

@mattgarrish I like the term "static" because I interpret its opposite meaning ("dynamic") as "pagination at rendering time, not baked into the content at authoring time". I am not sure everybody would use the same interpretation though.

llemeurfr · 2018-10-10T19:25:57Z

@avneeshsingh, I had further thoughts about the start/end issue: UAs will follow a link in a page list and display content. Such a link acts as a bookmark would: in reflow mode, the "synthetic page" (i.e. the screen) displayed will be the one the user would have reached by moving from page to page from the start of the corresponding HTML resource. Therefore it's impossible to know whether the page break will be at the start, middle or end of this synthetic page, as it depends on the screen size.

@mattgarrish, as it isn't tied to the current PR, I'll create a specific issue for the issue of page numbers as content or empty elements.

llemeurfr · 2018-10-10T19:44:45Z

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

mattgarrish · 2018-10-10T19:47:12Z

I'll create a specific issue for the issue of page numbers as content or empty elements.

Oh, okay, I misunderstood your request. One common issue is whether the numbers have to be present with the page list and the other is whether they have to be content or not. I can comment on the latter when you open the issue. :)

mattgarrish · 2018-10-10T19:51:03Z

It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document.

The wording is a bit confusing, as it seems to mix authoring and user agent requirements. I read it as the pagelist/toc MUST be identified by the appropriate role, and user agents MUST recognize the first one so designated. It would probably be better to split these statements, if so.
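Read that way, the user-agent side is a first-match lookup keyed on the role value, so a `doc-toc` nav and a `doc-pagelist` nav in the same entry page never compete for "first". The TypeScript sketch below is hypothetical and DOM-free: `ElementInfo` is an assumed reduction of an element to its tag, `role` attribute and id, listed in document order. In a browser, `document.querySelector('nav[role~="doc-pagelist"]')` applies the same rule, since `querySelector` returns the first matching element in document order.

```typescript
// Hypothetical, DOM-free stand-in for an element of an HTML resource.
interface ElementInfo {
  tag: string;   // lower-case element name, e.g. "nav"
  role?: string; // raw value of the role attribute (space-separated tokens)
  id?: string;
}

// True if the element's role attribute contains `token` as one of its tokens.
function hasRoleToken(el: ElementInfo, token: string): boolean {
  return (el.role ?? "").trim().split(/\s+/).includes(token);
}

// "First element in the document with that role value": scan in document
// order and stop at the first nav carrying the requested role token.
function firstNavWithRole(elements: ElementInfo[], role: string): ElementInfo | undefined {
  return elements.find((el) => el.tag === "nav" && hasRoleToken(el, role));
}

// Example entry page: one ToC nav and two pagelist navs.
const entryPage: ElementInfo[] = [
  { tag: "nav", role: "doc-toc", id: "toc" },
  { tag: "section", id: "intro" },
  { tag: "nav", role: "doc-pagelist", id: "pages" },
  { tag: "nav", role: "doc-pagelist", id: "pages-2" },
];

// The two lookups are independent because the role values are disjoint.
console.log(firstNavWithRole(entryPage, "doc-toc")?.id);      // → toc
console.log(firstNavWithRole(entryPage, "doc-pagelist")?.id); // → pages
```

The second `doc-pagelist` nav (`pages-2`) is simply ignored, which is the first-come-first-served behaviour the discussion converges on.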

iherman · 2018-10-11T04:52:47Z

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

"so designated" includes the choice of the nav based on the value of role (doc-toc vs. doc-pagelist). I.e., the choices of these two are disjoint, and they can both be in the same file.

iherman · 2018-10-11T08:40:07Z

@HadrienGardeur

In the current PR, there's a contradiction between:

the spec language that allows the use of a rel to identify a page list in resources or readingOrder

the WebIDL which would only work if the page list is contained in the entry page

I do not see why the second statement would be true. It is an HTMLElement, i.e., a DOM object, and that can be in any HTML resource, whether it is the entry page or not.

But, again, this is the same as the ToC. Finalizing the ToC would fix this, too.

HadrienGardeur · 2018-10-11T08:44:45Z

@iherman this means that in addition to having the DOM of the current document being displayed (for example chapter 1 instead of the entry page), you would need the DOM for potentially three additional documents as well:

the one containing the manifest
the one containing the TOC
the one containing the pagelist

This is quite heavy in terms of processing and I really don't think that's a good idea.

iherman · 2018-10-11T09:12:04Z

Trying to find a consensus on this PR, based on the texts above. I extract the following possible changes before merge:

The text in 3.8 should become:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.

in 3.7 and 3.8 part of the sentence

, and MUST be the first element in the document so designated.

should say

, and MUST be the first element in the document with that role value.

I have also looked at the issue of whether we would allow several pagelists. This requires some "stylistic" changes ("A pagelist" instead of "The pagelist"), and (2) above is moot for pagelists. However, it also raises the question of how to locate them. We could say:

The UA tries to locate pagelists in the primary entry page
Additionally, it will follow all entries, if available, in resources or readingOrder.

Do we agree to have this? Is it important to have several pagelists, in fact? Or should we forget about this for now?

Additionally, some other questions arose, to be discussed elsewhere, but should not prevent merging this PR:

should there be a definition of, or advice on, how to identify page-number content
possible changes as a result of Simplifying the WebIDL #338 would affect the WebIDL referring to pagelist
have some text in the draft (or best practices) on what kind of targets should be in pagelists
usage of data-xxx attributes to identify targets

HadrienGardeur · 2018-10-11T09:15:59Z

Additionally, it will follow all entries, if available, in resources or readingOrder.

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.

iherman · 2018-10-11T09:19:29Z

@HadrienGardeur, on #339 (comment) ("you would need the DOM for potentially three additional documents as well"): that is unfortunately correct. But this is related to the (still open) issue #291. If that were closed, we could simplify the WebIDL in some way, possibly defining a helper function that returns a ToC/pagelist in some JSON format for further processing. But as long as that issue is open, we are stuck.

iherman · 2018-10-11T09:21:06Z

@HadrienGardeur

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.

That is what I meant. If we have several pagelists, then each of them should be treated as we have defined for one.
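A minimal sketch of the lookup order discussed here, assuming a single pagelist: the UA looks in the primary entry page first, then in any readingOrder or resources entry that carries the pagelist rel value. The `ManifestLink` and `Manifest` shapes and the literal `"pagelist"` rel token are assumptions for illustration, not the normative WebIDL or vocabulary; the function only returns the resources to search, in priority order, and leaves fetching and parsing them (the cost Hadrien points out) to the caller.

```typescript
// Hypothetical, simplified shapes for manifest links (not the normative WebIDL).
interface ManifestLink {
  url: string;
  rel?: string | string[];
}

interface Manifest {
  readingOrder: ManifestLink[];
  resources: ManifestLink[];
}

// Assumed rel token identifying a resource that holds the pagelist.
const PAGELIST_REL = "pagelist";

// Normalise `rel` (space-separated string or array) to tokens and test it.
function hasRel(link: ManifestLink, rel: string): boolean {
  const tokens = Array.isArray(link.rel) ? link.rel : (link.rel ?? "").trim().split(/\s+/);
  return tokens.includes(rel);
}

// Resources to search for the pagelist nav, in priority order: the primary
// entry page first, then every readingOrder/resources entry marked with the
// pagelist rel value. Duplicate URLs are dropped.
function pagelistCandidates(manifest: Manifest, entryPageUrl: string): string[] {
  const urls: string[] = [entryPageUrl];
  for (const link of [...manifest.readingOrder, ...manifest.resources]) {
    if (hasRel(link, PAGELIST_REL) && !urls.includes(link.url)) {
      urls.push(link.url);
    }
  }
  return urls;
}

const manifest: Manifest = {
  readingOrder: [{ url: "chapter1.html" }, { url: "pages.html", rel: "pagelist" }],
  resources: [{ url: "toc.html", rel: ["contents"] }, { url: "pages.html", rel: ["pagelist"] }],
};

console.log(pagelistCandidates(manifest, "index.html")); // → ["index.html", "pages.html"]
```

Combined with a first-match search inside each candidate, the first `doc-pagelist` nav found along this list would be the one the UA uses.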

Note that I do not have strong feelings about whether we need this or not, i.e., whether we should allow several page lists. I was just exploring what this would mean.

avneeshsingh · 2018-10-11T09:27:55Z

@iherman
The changes you wrote are fine. Regarding multiple pagelists, I think it would be good to avoid getting into too much detail. We heard about a use case from Dave, but we should know about more use cases to specify multiple pagelists properly.

HadrienGardeur · 2018-10-11T13:51:06Z

there are many types of publications which have different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels.

@avneeshsingh IMO this is exactly what makes it very difficult for UAs to implement good features based on page lists.

For example, as a UA developer I can't show a dialog that will let the user jump to page "9" because:

the pagelist might skip "9" entirely
or it is indicated as "IX" or "ix"

As a UA developer, I can only provide:

a long list of references (not super readable)
and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)

I frankly don't think that's enough for education where we need to easily share a location AND jump to it as well (teacher tells the class to read page 9).
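The label problem can be made concrete with a hedged TypeScript sketch of a "go to page" lookup. Everything here is hypothetical: `PageEntry` is an assumed reduction of a pagelist link to its visible label and target, and the matcher only understands arabic digits and leniently parsed roman numerals. It returns every plausible match rather than one, because a query such as "9" can hit both a front-matter page "ix" and body page "9", while a label such as "A-3" cannot be compared numerically at all.

```typescript
// One entry of a pagelist: its visible label and its link target (hypothetical).
interface PageEntry {
  label: string; // e.g. "9", "IX", "ix", "A-3"
  href: string;
}

const ROMAN: Record<string, number> = { i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000 };

// Convert a roman numeral (any case) to a number, or null if it is not one.
// Lenient: it does not reject non-canonical forms such as "iiii".
function romanToInt(s: string): number | null {
  const t = s.toLowerCase();
  if (!/^[ivxlcdm]+$/.test(t)) return null;
  let total = 0;
  for (let i = 0; i < t.length; i++) {
    const cur = ROMAN[t[i]];
    const next = i + 1 < t.length ? ROMAN[t[i + 1]] : 0;
    total += cur < next ? -cur : cur; // subtractive notation, e.g. "ix"
  }
  return total;
}

// Numeric value of a label: arabic digits or a roman numeral, else null.
function labelValue(label: string): number | null {
  const trimmed = label.trim();
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10);
  return romanToInt(trimmed);
}

// Every entry the user might mean by `query`: exact label matches first,
// then other entries with the same numeric value, in pagelist order.
function resolvePage(list: PageEntry[], query: string): PageEntry[] {
  const exact = list.filter((e) => e.label.trim() === query.trim());
  const value = labelValue(query);
  const numeric = value === null
    ? []
    : list.filter((e) => !exact.includes(e) && labelValue(e.label) === value);
  return [...exact, ...numeric];
}

const pagelist: PageEntry[] = [
  { label: "ix", href: "frontmatter.html#p-ix" },
  { label: "8", href: "chapter1.html#p8" },
  { label: "9", href: "chapter1.html#p9" },
  { label: "11", href: "chapter2.html#p11" }, // page 10 is missing from this list
];

console.log(resolvePage(pagelist, "9").map((e) => e.href));
// → ["chapter1.html#p9", "frontmatter.html#p-ix"]
console.log(resolvePage(pagelist, "10").length); // → 0
```

Even in this simplified form, the UA ends up either asking the user to disambiguate or reporting that the page does not exist, which is the limitation Hadrien raises for classroom use.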

avneeshsingh · 2018-10-11T14:14:56Z

In WP, we need to strike a balance between flexibility and precision. This is why the best practices are so important. I would love to have it defined precisely enough to address the main use cases, but that would restrict flexibility.

laudrain · 2018-10-11T17:41:59Z

and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)

In the content, page breaks have to be identified semantically, for instance with the role "doc-pagebreak".
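For the overlay, a UA could collect the page breaks it finds in the content (for example elements with `role="doc-pagebreak"`) and report the last break at or before a given position. A minimal sketch, assuming the breaks have already been extracted in document order and reduced to a label and a hypothetical linear `offset` (e.g. a character position within the resource); how positions are actually measured in a reflowable renderer is left open.

```typescript
// A page break found in the content (e.g. an element with
// role="doc-pagebreak"), reduced to its label and a hypothetical
// linear position within the resource.
interface PageBreak {
  label: string;
  offset: number;
}

// Label of the print page in effect at `position`: the last break at or
// before it. Assumes `breaks` is sorted by offset (document order).
// Returns undefined for content that precedes the first break.
function pageAt(breaks: PageBreak[], position: number): string | undefined {
  let lo = 0;
  let hi = breaks.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (breaks[mid].offset <= position) {
      found = mid; // candidate; keep looking for a later break
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found >= 0 ? breaks[found].label : undefined;
}

const breaks: PageBreak[] = [
  { label: "1", offset: 0 },
  { label: "2", offset: 1200 },
  { label: "3", offset: 2500 },
];

// A screen starting at offset 1800 and ending at 2600 spans print pages 2 and 3.
console.log(pageAt(breaks, 1800), pageAt(breaks, 2600)); // → 2 3
```

Because a reflowed screen can straddle a break, an overlay might show a range (here "2-3") rather than a single number, consistent with the earlier point that where a break falls on screen depends on the screen size.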

iherman · 2018-10-12T04:48:59Z

Unless there is an outcry, I intend to modify the PR today along the lines of #339 (comment), except that I would keep the pagelist unique (i.e., there should be either zero or one), following Avneesh's remarks.

avneeshsingh · 2018-10-12T05:18:43Z

@iherman
Good for now. If the group members find that multiple page lists are important, it would be good to open a new issue for it. I anticipate that identifying multiple pagelists with different print sources (as mentioned by Dave) will need additional work, so it would be good to document the use cases for it before we expand pagelist to pagelists.


iherman · 2018-10-12T12:47:40Z

@mattgarrish I had to make some more changes to implement the first-come-first-served rule :-) Please check. Maybe merge after this?

mattgarrish · 2018-10-12T13:17:36Z

Looks good to me now.

Added the pagelist entry

6f0cb6e

iherman requested review from mattgarrish and avneeshsingh October 9, 2018 10:01

This was referenced Oct 9, 2018

Added the pageslist rel value w3c/pub-vocab#1

Closed

Simplifying the WebIDL #338

Closed

Added Avneesh's text to the pagelist description.

645fd73

lrosenthol mentioned this pull request Oct 9, 2018

Other navigation elements beyond ToC #223

Closed

Proposed final version for the PR

1506389

This was referenced Oct 12, 2018

Additional text or hints on what would become pagelist targets. #341

Closed

Several pagelists? #342

Closed

mattgarrish reviewed Oct 12, 2018

iherman added 2 commits October 12, 2018 14:17

Remains of a TOC text

f2afaad

Removed ambiguity for several toc/pagelist entries

0ad5882

typo

a176c0a

mattgarrish merged commit 1481110 into master Oct 12, 2018

mattgarrish deleted the pagelist branch October 12, 2018 13:17
