An Infrastructure-Agnostic Model of Hypertext

Authors
Jakob Voß
  • Verbundzentrale des GBV (VZG)
Published
2019-07-01
Modified
2019-07-18 (version 2.1.0)
Identifier
https://jakobib.github.io/hypertext2019/short-version.html
Status
Camera-ready version (https://doi.org/10.1145/3342220.3344922)
Repository
https://github.com/jakobib/hypertext2019/
Feedback
Annotate via hypothes.is
Open a GitHub issue
License
CC BY 4.0

Abstract

This short paper summarizes a new interpretation of the original vision of hypertext: infrastructure-agnostic hypertext is independent of specific formats and protocols. Nevertheless, references to existing technologies for possible implementations are included.

Categories and Subject Descriptors

  • CCS Information systems → Hypertext languages
  • CCS Information systems → Document representation
  • CCS Human-centered computing → Hypertext / hypermedia

Introduction

The original vision of hypertext as proposed by Ted Nelson [6, 9, 11] still waits to be realized. Hypertext (subsuming hypermedia and hyperdata) has also been understood differently in the literary community (which focused on simple links) and in the hypertext research community (which focused on tools) [14]. Infrastructure-agnostic hypertext is an attempt to recover the core parts of the original vision by concentrating on documents and their connections.

Outline of Hypertext

Documents (D) are all finite sequences of bytes, including

  • document identifiers (I) to reference individual documents,

  • content locators (C) to reference segments within documents,

  • edit lists (E) to combine (parts of) documents into new ones,

  • and document segments with S ⊂ C × D.

Documents are further grouped by a plethora of non-disjoint data formats, such as UTF-8, CSV, and SVG. A hypertext system needs

  • a retrieval function R: I → D to get documents,

  • a transclusion function T: S → D to get document segments,

  • an assemble function A: E → D to execute edit lists,

  • a segments usage function U: E → 𝒫(S) for backlinks,2

and practical methods to tell which content locators can be used with which documents to form an actual document segment. The core elements of hypertext, apart from the simple parts (documents and document identifiers), require some explanation.1
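
To make these definitions concrete, the following sketch renders them as Python type signatures. It is purely illustrative: the names are invented here, and Python is chosen only for readability.

    # Illustrative rendering of the model: documents are byte sequences
    # and the four required functions are given as abstract signatures.
    from typing import Callable

    Document   = bytes                      # D: all finite byte sequences
    Identifier = bytes                      # I ⊂ D: document identifiers
    Locator    = bytes                      # C ⊂ D: content locators
    EditList   = bytes                      # E ⊂ D: edit lists
    Segment    = tuple[Locator, Document]   # S ⊂ C × D

    Retrieval    = Callable[[Identifier], Document]    # R: I → D
    Transclusion = Callable[[Segment], Document]       # T: S → D
    Assemble     = Callable[[EditList], Document]      # A: E → D
    Usage        = Callable[[EditList], set[Segment]]  # U: E → 𝒫(S)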

Data formats

A data format is a set of documents that share a common data/document model and a common serialization (see fig. 1). The Dexter Hypertext Reference Model assigned formats to the “within-component” layer without further analysis because “it would be folly to attempt a generic model covering all of these data types” [3]. Infrastructure-agnostic hypertext, however, requires knowledge of data formats and models to support integration of any kind of document. The challenge that hypertext systems need to address is the unsolved problem of data modeling: ideas can be expressed in many models, models can be interpreted in many ways, and models and formats are often not given explicitly [4, 13].

Figure 1: Levels of data modeling
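
As a toy illustration of this problem (the data is invented), the same fact can be serialized in formats that imply different models:

    # One fact in two formats: a tabular model (CSV) and a tree model (JSON).
    fact_csv  = b"author,title\r\nVoss,Hypertext\r\n"
    fact_json = b'{"author": "Voss", "title": "Hypertext"}'
    # Both byte sequences express the same statement, but a hypertext
    # system needs to know the format to locate, say, the author value.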

Content locators

Hypertext documents are primarily grounded in content locators for transclusion of document segments and in edit lists to create new documents. A content locator is a document that can be used to select parts of another document via transclusion. Nelson refers to these locators as “reference pointers” [9], exemplified with spans of bytes or characters in a document. Whether and how parts of a document can be selected with a content locator language depends on which data format the document is interpreted in. For instance, an SVG document can be seen at least as a two-dimensional image, as an XML tree, and as a sequence of Unicode characters. Possible locator languages include IIIF for the first data format and XPath, XPointer, and XQuery for the second. Other locator languages apply, for instance, to tabular data (SQL, RFC 7111) or to graphs (SPARQL, GraphQL). Existing locator technologies further include URI Fragment Identifiers, patch formats (JSON Patch, XML Patch, LD Patch), and domain-specific query languages: content locators can be extended to all executable programs that reproducibly transform documents into other documents. This generalization can be useful to track data processing pipelines as hyperdata, as discussed for executable papers and reproducible research.
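
As a minimal example, consider a toy locator language for byte spans together with its transclusion function. The span: syntax is invented here for illustration and is not a standardized locator format.

    # Transclusion T: S → D for toy locators of the form b"span:<start>:<length>".
    def transclude(locator: bytes, document: bytes) -> bytes:
        scheme, start, length = locator.split(b":")
        assert scheme == b"span", "only the toy span locator is supported"
        i, n = int(start), int(length)
        return document[i:i + n]

    doc = b"Hypertext subsumes hypermedia and hyperdata."
    assert transclude(b"span:19:10", doc) == b"hypermedia"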

Edit Lists

Edit lists, known as Edit Decision Lists in Xanadu and borrowed from film making [7], are documents consisting of references to document segments and rules on how to combine them into new documents. Many applications for creating and modifying digital objects track changes, but this information, if recorded at all, is not provided in the form of reusable edit lists. Hypermedia authoring should be integrated into existing editing tools [2] and their changelogs. Simplified forms of edit lists are implemented in version control systems and in collaborative editing tools. Hypertext edit lists go beyond this one-dimensional case by supporting multiple source documents and more flexible methods of document processing (in addition to basic operations such as insert, delete, and replace). The actual processing steps tracked by an edit list depend on the data formats of transcluded documents. Just like content locators, edit lists can be extended to arbitrary executable programs that reproducibly emit documents.
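
Continuing the toy example above, an edit list can be modeled as an ordered list of document segments whose assembly is plain concatenation. This is a deliberate simplification for illustration, not Xanadu's actual Edit Decision List format.

    # Illustrative edit list: ordered (identifier, locator) pairs.
    edits = [
        (b"doc1", b"span:0:9"),     # placeholders, not real identifiers
        (b"doc2", b"span:19:10"),
    ]

    def assemble(edits, retrieve, transclude) -> bytes:
        # A: E → D — resolve each identifier, transclude the segment,
        # and concatenate the results into a new document.
        return b"".join(transclude(loc, retrieve(i)) for i, loc in edits)

    def usage(edits) -> set:
        # U: E → 𝒫(S) — the segments an edit list uses, e.g. for backlinks.
        return set(edits)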

Document retrieval

Access to documents via a retrieval function R can be implemented with existing network and identifier technologies (basically HTTP and URL) as done in OpenXanadu [10], but content-based identifiers better guarantee that a reference always points to the same document [5]. The “global address space” [9] could directly be derived by hash functions from the space of all digital content. Promising technologies for implementation include IPFS Multihashes and BitTorrent Merkle hashes. The challenge for hypertext systems lies less in the technical infrastructure to retrieve documents than in the normalization of documents to canonical forms to support content-based identifiers.
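
A minimal sketch of this idea, using a plain SHA-256 digest in place of a full Multihash and an in-memory dictionary in place of a distributed network:

    # Content-based identifiers: derived from the bytes themselves, so
    # every retrieval result can be verified against its identifier.
    import hashlib

    store: dict[bytes, bytes] = {}  # toy stand-in for a distributed store

    def identify(document: bytes) -> bytes:
        return hashlib.sha256(document).digest()

    def publish(document: bytes) -> bytes:
        identifier = identify(document)
        store[identifier] = document
        return identifier

    def retrieve(identifier: bytes) -> bytes:  # R: I → D
        document = store[identifier]
        assert identify(document) == identifier  # self-verifying reference
        return document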

Summary and conclusion

Infrastructure-agnostic hypertext focuses on hypermedia content (documents) and connections (hyperlinks). Its model allows for integration of all kinds of data. The agnosticism to ever-changing technologies excludes only some Xanadu requirements [12]:

  1. identified servers as canonical sources of documents,
  2. identified users and access control (both were part of Tumblers [8]),
  3. a copyright and royalty system via micropayment, and
  4. user interfaces to navigate and edit hypertexts.

With documents as primary elements, document identifiers are preferred over servers and users (OpenXanadu uses plain URLs as part of its edit list format, but these links may break [10]). Canonical sources of documents (authorship) could be implemented with blockchains or other trusted logfiles. User interfaces highly depend on use cases and data formats anyway. What is needed for xanalogical hypertext systems are efforts to understand, normalize, and process file formats in order to implement and popularize an ecosystem of content locators and edit lists. References to existing technologies show that hypertext as envisioned by Ted Nelson can be integrated into current information infrastructures, especially the Internet and the Web.

References

[1] Atzenbeck, C. et al. 2017. Revisiting Hypertext Infrastructure. Proceedings of the 28th ACM Conference on Hypertext and Social Media (2017), 35–44.

[2] Di Iorio, A. and Vitali, F. 2005. From the writable web to global editability. Proceedings of the sixteenth ACM conference on Hypertext and hypermedia. (2005), 35–45. DOI:https://doi.org/10.1145/1083356.1083365.

[3] Halasz, F.G. and Schwartz, M.D. 1990. The Dexter Hypertext Reference Model.

[4] Kent, W. 1988. The Many Forms of a Single Fact. Proceedings of IEEE COMPCON 89 (1988), 438–443.

[5] Lukka, T. and Fallenstein, B. 2002. Freenet-like GUIDs for implementing xanalogical hypertext. Proceedings of the thirteenth ACM conference on Hypertext and hypermedia. (2002), 194–195. DOI:https://doi.org/10.1145/513338.513386.

[6] Nelson, T. 1965. Complex information processing: a file structure for the complex, the changing and the indeterminate. Proceedings of the 1965 20th national conference (1965), 84–100.

[7] Nelson, T. 1967. Getting It Out of Our System. Information Retrieval: A Critical Review (1967), 191–210.

[8] Nelson, T. 1980. Literary Machines. Mindful Press.

[9] Nelson, T. 1999. Xanalogical structure, needed now more than ever: parallel documents, deep links to content, deep versioning, and deep re-use. ACM Computing Surveys. 31, 4es (Dec. 1999). DOI:https://doi.org/10.1145/345966.346033.

[10] Nelson, T. and Levin, N. 2014. OpenXanadu. http://xanadu.com/xanademos/MoeJusteOrigins.html.

[11] Nelson, T. et al. 2007. Back to the future: hypertext the way it used to be. Proceedings of the eighteenth conference on Hypertext and hypermedia (2007), 227–228.

[12] Pam, A.D. 2002. Xanadu FAQ. http://www.aus.xanadu.com/xanadu/faq.html.

[13] Voss, J. 2013. Describing Data Patterns. http://aboutdata.org/.

[14] Wardrip-Fruin, N. 2004. What hypertext is. Proceedings of the fifteenth ACM conference on Hypertext and hypermedia (2004), 126–127.


  1. An earlier and more detailed version of this publication (4 pages) is available at https://arxiv.org/abs/1907.00259 (PDF) and https://jakobib.github.io/hypertext2019/ (HTML). The illustrating poster will be published in September at https://doi.org/10.5281/zenodo.3339295.

  2. By search engines and by spammers. A criterion to judge the success of a hypertext system may be whether it is popular enough to attract link spam.

Changelog

See source code repository for detailed changes.

2.1.0 (2019-07-18)
Camera-ready version (https://doi.org/10.1145/3342220.3344922)
2.0.0 (2019-07-01)
Short version as poster abstract