Inventive Labs: Web Problem Solvers

My long, good-natured article about EPUB 2.1, part 1 of 3 [22062010]


Last week I wrote a witty essay, incomplete at several thousand words, about why I thought the goals of the EPUB 2.1 Working Group were ‘broadly antithetical’ to a ‘sensible and stable ebook specification’. Today I threw it away. Even after two rewrites, my attempts at wit came across as disparaging, and that’s not what I meant to do.

The people who have undertaken this work and are involved in these discussions are well-intentioned and capable. It seems to me they are earnest and as much in the dark as the rest of us about the future of books in digital form. So I didn’t publish the essay. Good for me. I still have a perspective on this stuff though, and I’m going to try again with a narrower purview.

Today I’m just going to list three concerns I have with the idea of interactivity in EPUB books, which is apparently the foremost consideration on the road to 2.1. Tomorrow I’ll make a couple of observations about the EPUB specification process, as it appears to an inattentive layperson on the outside. On Thursday I’ll close it out, in my bombastic way, with a dire word of warning.

Interactions and infractions

The EPUB 2.1 Working Group charter lists fourteen “main problems” with ebooks as the existing specification caters to them. Some of the problem descriptions are, in my view, cogent and urgent. Others are not. Most are worth thinking about.

I’m specifically interested in the first one, which is about interactivity and rich media in ebooks. The charter gives the example of an interactive crossword puzzle, to be achieved by embedding executable scripts in the book’s content. Indeed, the term “book” is too narrow, because as the example implies, this is intended to attract other formerly printed material (such as newspapers) to the EPUB format. The charter continues:

“These capabilities are necessary for interactive digital textbooks and digital magazines, and more generally to enable eBooks to evolve into a new medium, rather than simply be digital equivalents of paper books.”

In the previous incarnation of this post, it was after this quote that things got snarky. I’m not going to do that now; later I will come back to why I think the ambition expressed above is unhelpful for digital publishing as a distinct and profitable industry.

But here I will just note the premise: that EPUB-lications should become interactive; and the emphasis: that this is the very fabric of the future of digital publishing.

My three counterpoints are somewhat bland and technical. Perhaps set against that big thinking they may be unconvincing or obstructionist or pedantic and irrelevant. Those possibilities were why my initial essay was so ardently but unfairly argued. This time, good-naturedly though.


Security

For my own reasons, as the developer of an EPUB-friendly reading system, security is pretty much the start and end of the discussion regarding script-based interactivity within EPUB packages.

Monocle takes book-like content and embeds it, paginated and with navigation aids, in a webpage. It is, I often say, a way of making books a part of the web, just as videos and audio are now sharable, linkable, embeddable parts of the web.

Monocle must be able to manipulate the contents of parts of the book as it displays them. In doing so, due to the nature of JavaScript security measures, Monocle must expose its execution environment to the components of the book. If these components contain malicious scripts, they could exploit this open access to pinch private data out of cookies or other properties of the page state. Even if one managed to trap EPUBs in that security net, positing some future unidirectional window access or something, that doesn’t stop them annoying users with prompts, pop-unders, et cetera.

Since Monocle is an open solution, we’ll leave this choice in the hands of implementers. Scripts will not be stripped from book components, and component iframes will not use the proposed sandbox attribute. If you use Monocle and you absolutely trust the EPUBs you’re displaying in it (because you created them, for example), you don’t have to disable scripting. But the platform we are quietly building around Monocle to make book embedding easy: this will absolutely strip scripts. Trust is simply of greater importance to us.
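To make "strip scripts" concrete, here is a minimal sketch of what such a pass might look like. It assumes the component has already been parsed into a simple tree; the BookNode shape and the stripScripts name are hypothetical illustrations, not Monocle's API, and a real sanitiser would need to cover more cases than these three.

```typescript
// Hypothetical minimal tree shape for an already-parsed book component.
interface BookNode {
  tag: string;                    // element name, e.g. "p" or "script"
  attrs: Record<string, string>;  // attribute name -> value
  children: BookNode[];
}

// Returns a copy of the tree without <script> elements, inline on* event
// handlers, or javascript: URLs in href/src. Returns null if the node
// itself is a script element.
function stripScripts(node: BookNode): BookNode | null {
  if (node.tag.toLowerCase() === "script") {
    return null;
  }

  const attrs: Record<string, string> = {};
  for (const name of Object.keys(node.attrs)) {
    const lower = name.toLowerCase();
    const value = node.attrs[name];
    // Inline handlers such as onclick or onload.
    if (lower.indexOf("on") === 0) continue;
    // Script URLs smuggled into links or embedded resources.
    const isUrlAttr = lower === "href" || lower === "src";
    if (isUrlAttr && value.trim().toLowerCase().indexOf("javascript:") === 0) continue;
    attrs[name] = value;
  }

  const children = node.children
    .map(stripScripts)
    .filter((child): child is BookNode => child !== null);

  return { tag: node.tag, attrs, children };
}
```

An implementer who trusts their own EPUBs would simply skip this pass; a hosted embedding platform would run it on every component before display.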

Of course, that reason is Monocle-specific. Although for me it is the decisive consideration, we should move on. Next.


Interaction models

When you think about it, there are only two things that a reading system does with EPUB over and above a web browser. One is that it unpacks the somewhat byzantine file into constituent components, serving them up in the correct order. (This feat is perhaps less remarkable than you think.) The other is that it provides an interaction model between the person reading and the HTML being rendered.

What’s an interaction model? It’s the thing that might eventually make reading ebooks a joyful experience. It’s the flappy, wobbly, paper-like experience of turning a page in iBooks, which has probably done as much as anything to introduce less-than-avid readers to the notion of digital books. It includes the very idea of pagination itself, which is absent (or differently meant) in HTML.

Contemporary ebook interaction models include turning pages, scrubbing to different parts of the book, selecting and annotating text, creating bookmarks, magnifying text or adjusting the brightness (which in Stanza, for example, may be controlled with a simple vertical swipe), clicking internal links to advance or to view a footnote, et cetera. The experience isn’t very rich yet and ebook reading systems still have a lot of work to do making these models intuitive and enjoyable. So far the specification has mostly and properly stayed out of it. But the notion of “interactivity” scatters these pigeons like a large cat.

If that crossword puzzle were coded so that a downward swipe displayed the clue for 8-Down, Stanza’s brightness control would not work. The Kobo Reader, to which many people here in ebook-device-starved Australia have flocked, would not do anything, because it has no touchscreen or pointer.

The point is not that there’s an engineering problem, at least not for hobbyist reading system developers like me. With Monocle I could pick and choose which interactions go through to scripts and which are reserved for the system. Given Monocle’s open design, my hope would be to leave as many of those choices with implementers as possible.
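As a rough sketch of that picking and choosing, assume a reading system keeps a list of gestures it reserves for itself and passes everything else through to the book. The Gesture type, RoutingPolicy shape and routeGesture function below are hypothetical names for illustration, not part of Monocle or Stanza.

```typescript
type Gesture = "swipe-up" | "swipe-down" | "swipe-left" | "swipe-right" | "tap";
type Handler = "reading-system" | "book-script";

// Hypothetical per-implementation policy: gestures the reading system
// always keeps for itself (page turns, brightness, and so on).
interface RoutingPolicy {
  reserved: Gesture[];
}

// Decides who receives a gesture. A reserved gesture never reaches the
// book; an unreserved one goes to the book only if it is listening.
function routeGesture(
  gesture: Gesture,
  policy: RoutingPolicy,
  bookListens: boolean
): Handler {
  if (policy.reserved.indexOf(gesture) !== -1) {
    return "reading-system";
  }
  return bookListens ? "book-script" : "reading-system";
}

// A Stanza-like policy that keeps vertical swipes for brightness.
const brightnessFirst: RoutingPolicy = { reserved: ["swipe-up", "swipe-down"] };

// The crossword's "swipe down for the 8-Down clue" is swallowed here.
routeGesture("swipe-down", brightnessFirst, true); // "reading-system"
routeGesture("tap", brightnessFirst, true);        // "book-script"
```

The code is trivial; the trouble is that every reading system would ship a different reserved list, and the book has no way of knowing which one it is running under.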

The point is the uncertain landscape that emerges for book designers. The device capabilities and interaction models of existing reading systems are wildly divergent; far more divergent than the desktop web browsing landscape, for example. And what we’ve seen over the last decade on the web is that where features have uneven implementations, web designers have generally avoided them. They have fairly and reasonably stuck to the middle path of wide cross-browser support. In other cases they have favoured the dominant implementation, but for ebooks there is as yet no such thing.

That’s my second point, and a good part of my third point too: inherent inconsistency in the implementation will make things very difficult for book designers who attempt significant interactivity.


Implementation minutiae

This is the most bland and technical of all my bland and technical points. But I really think it is the rock against which interactivity in EPUB dashes itself, at least until the (not particularly improbable) day when reading systems are developed by software architects and engineers much cleverer than I.

At this time, more and more reading systems are being built on top of dedicated, drop-in renderers like WebKit. It seems clear that all the momentum for JavaScript-based interactivity comes from WebKit’s capabilities. You can see why: its HTML5 parser and feature-set, the typographic and animation possibilities arising from CSS3 standards and de facto standards as well as the canvas element, and its fast, featureful JavaScript engine. Why wouldn’t you unleash that?

However, even among the handful of reading systems built on top of WebKit of which I have any knowledge, the implementation details of those systems rupture the consistent surface that the renderer appears to provide.

Here is the problem: between loading a component out of an ebook and displaying it, every reading system modifies the shape of the document data (the DOM). Some do a lot of modification, some do as little as possible. The reasons they do it are generally good and entirely implementation-specific. To take a now-retired example, Liza Daly’s trailblazing epubjs goes looking for places to split paragraphs for the purposes of pagination. It creates new paragraph elements as necessary. The technique is ingenious, but since it modifies the DOM tree of the document substantially, it makes for unstable ground for scripters.
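To see why this is unstable ground, consider a deliberately simplified model in which a component is just a list of paragraph strings. The splitParagraph function below is a hypothetical stand-in for any pagination step that breaks one paragraph in two; it is not epubjs's actual algorithm.

```typescript
// Splits the paragraph at `index` into two paragraphs at character
// offset `at`, returning a new list (the input is left untouched).
function splitParagraph(paragraphs: string[], index: number, at: number): string[] {
  const target = paragraphs[index];
  return paragraphs
    .slice(0, index)
    .concat([target.slice(0, at), target.slice(at)])
    .concat(paragraphs.slice(index + 1));
}

const asAuthored = ["One.", "Two is long.", "Three."];
const asDisplayed = splitParagraph(asAuthored, 1, 4);

// A script written against the authored structure asks for "the third
// paragraph" and gets a different one after pagination.
asAuthored[2];  // "Three."
asDisplayed[2]; // "is long."
```

Any script that addresses content by position, sibling order or child count is exposed to the same shift, and each reading system shifts things differently.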

Nor do I mean to pick on epubjs: every implementation tweaks the DOM in its own way. Monocle’s only interesting advance on epubjs was to get the browser to do the pagination, using CSS columns. So in that regard, Monocle modifies the DOM very little. But Monocle’s default “page flipper” is configured to display two overlapping iframes, so that as the page you’re reading turns, you see the next page underneath. Even naively, what this means is that you have two execution environments for scripting, which do not share state. Increment an integer in one frame and it is unchanged in the other. It would be very difficult, even for brilliant book designers, to account for that.
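The two-frame problem can be modelled without a browser. In this sketch each call to the hypothetical createFrameEnvironment function stands in for one iframe's script environment, with its own private variables; real iframes are more involved, but the isolation of state is the same in spirit.

```typescript
// Stand-in for one iframe's script environment: every call creates
// fresh, private state that no other environment can see.
function createFrameEnvironment() {
  let counter = 0;
  return {
    increment(): number {
      counter += 1;
      return counter;
    },
    read(): number {
      return counter;
    },
  };
}

// The page being read, and the page waiting underneath it.
const upperPage = createFrameEnvironment();
const lowerPage = createFrameEnvironment();

upperPage.increment();
upperPage.read(); // 1
lowerPage.read(); // 0, the second frame never saw the change
```

For the crossword, this would mean answers filled in on the visible page are missing from the copy revealed when the page turns, unless the book's scripts somehow knew to synchronise two environments they were never told about.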

My guess, and it is only a guess, is that iBooks renders the current component to an off-screen WebView surface, then essentially blits distinct portals of that content onto a “paged” visual metaphor. If so, this would probably be the purest, least DOM-invasive technique I know of, but even in this case a book designer would need to be aware of many specific implementation details. Similarly for every other reading system, and more so for the very customisable systems like Stanza. Modifications to book CSS, such as Stanza loves to make, will be sufficient to break many scripts.

The troubling outcome here is not that interactivity will be ignored by book designers. What bothers me is that if book designers adopt these features, they will do so for a particular, presently dominant reading system. They will simply have to choose to support the biggest or the luckiest reading system. Their books won’t work, or won’t work as well, on anything else. Innovation will in this way be stifled, not stimulated.

The market and designers will probably do that anyway, but part of the role of the EPUB specification is to avert that.

Which brings us to some observations on how the spec is being specified. Tomorrow.