I was recently preparing a document and wondered whether I could structure its information in the most universal way possible. Ideally, document elements would be tagged as what they are: header, tagline, chapter, etc. Naturally, documents vary, so people could define their own tag labels.
Basically, this is just HTML, except that I want every piece of information to be tagged and structured, so that the abstract “text information / data structure” can be switched between many presentation formats instantly, without modifying the source.
That means sentences should be tagged too, not just paragraphs with <p> tags, and ideally any diagram or figure should be just as easy to break into modular pieces. Why? Because the point of this format is that every element is identified at the smallest possible level, so that for each output medium you can design a different system for displaying the same data structure.
So, imagine we have this abstract document markup - title, author, section, paragraph, sentence, list, table, table header, etc.
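To make that concrete, here is a rough sketch of what such markup might look like. The XML vocabulary is hypothetical (the tag names `document`, `title`, `author`, `section`, `paragraph` and `sentence` are made up for illustration and are not an existing standard), and the `outline` helper is likewise just an assumption of mine. It is parsed with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical tag vocabulary, invented for illustration only.
SOURCE = """
<document>
  <title>Universal Markup</title>
  <author>A. Writer</author>
  <section>
    <paragraph>
      <sentence>Every element is tagged by its role.</sentence>
      <sentence>Even single sentences get their own tag.</sentence>
    </paragraph>
  </section>
</document>
""".strip()


def outline(element, depth=0):
    """Return one line per element: indented tag name plus any direct text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SOURCE)
print("\n".join(outline(root)))
```

Because every piece, down to the sentence, is an addressable node, a formatter can walk the tree and decide per node how to present it.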
Ideally, we could just plug it into some style/formatting package to produce a new type of presentation: a webpage, a single large sheet of paper like a research poster, a paginated book, a PowerPoint presentation, an automatically generated video, or even an interactive reading application that you navigate by pressing “next” or “back”. Each format knows how to handle the “universal markup”. For example, the universal markup may attach a citation to a sentence, or a link to a word. The formats would know that in a book, citations can go at the bottom; in an interactive reader, they can be omitted; on a website, they could be expressed as hyperlinks on the text, or as a parenthetical at the end of the sentence (“see here”). Likewise, a hyperlink would be expressed normally in a web document, whereas in a book it could become a footnote, be omitted, or appear as an annotation blurb in the margin. And so on.
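A minimal sketch of the idea, assuming a made-up `Sentence` type with an optional `citation` field and three hypothetical renderers (none of this is an existing library): the same tagged content goes in, and each format applies its own rule for citations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    text: str
    citation: Optional[str] = None  # hypothetical field: the source being cited


def render_web(sentences):
    """Website: show each citation as a parenthetical after its sentence."""
    return " ".join(
        f"{s.text} ({s.citation})" if s.citation else s.text for s in sentences
    )


def render_book(sentences):
    """Book: numbered markers in the text, citations collected at the bottom."""
    parts, notes = [], []
    for s in sentences:
        if s.citation:
            notes.append(s.citation)
            parts.append(f"{s.text}[{len(notes)}]")
        else:
            parts.append(s.text)
    footer = "\n".join(f"[{i}] {note}" for i, note in enumerate(notes, 1))
    return " ".join(parts) + ("\n\n" + footer if footer else "")


def render_reader(sentences):
    """Interactive reader: one sentence per screen, citations omitted."""
    return [s.text for s in sentences]


# Placeholder content; the citation string is not a real reference.
doc = [
    Sentence("Markup should describe roles.", citation="Placeholder Source"),
    Sentence("Formats decide the presentation."),
]

print(render_web(doc))
print(render_book(doc))
print(render_reader(doc))
```

The content never changes between calls; only the renderer does, which is the separation the question is after.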
So: has anyone considered whether a minor extension to HTML, focused on explicitly tagging every element with its abstract function or role in the document, could serve as a universal document markup language that can easily be formatted in myriad ways?