Printers and editors have long used the term markup to refer to marks added to a manuscript or typescript instructing the printer how to treat particular stretches of a document. Some material was to be printed in one font face, size, and weight, some in another. Some was to be inset left, some centered, and of course the very design of the page itself had to be marked. Each physically tagged stretch presumably had some particular function in the text (a book title, a technical term, a section heading, a verbatim citation, a date of composition, or some other such). A document, on this view, meant something very concrete, like "something that can be xeroxed." HTML gets its M for markup from a rather different, more abstract sense of a document as made up of structural units--as, for example, a business letter is made up of a date, inside address, salutation, body, closing, and signature. To be sure, each of these elements would most likely be distinguished with physical markup (with perhaps BODY as the unmarked case), but various layouts of a business letter are possible, and it remains a business letter because it is made up of these parts, regardless of their physical realization. We speak of a business letter as a document type; an explicit listing of the parts (ELEMENTS) of the document type and the rules for combining them we call a DOCUMENT TYPE DEFINITION (or DTD). Similarly, we can think of "play" as a document type (made up of acts, scenes, stage directions, a list of characters, and speeches). Clearly the DTD for PLAY would be very different from that for LETTER, since they are quite different types of documents. Notice that the DTD says nothing about how the individual elements of the document are to be displayed.
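To make the notion of a DTD concrete, here is a minimal Python sketch, assuming a much-simplified model in which each element's permitted children are written as a regular expression over their names. (Real SGML content models are also regular expressions, but they are written in SGML's own declaration syntax and checked by a validating parser.) The element names, the `LETTER_DTD` and `PLAY_DTD` tables, and the `conforms` helper are hypothetical illustrations, not an actual DTD.

```python
import re

# Hypothetical, much-simplified DTDs. Each maps an element name to a
# content model: a regular expression over the names of its children,
# in document order, separated by single spaces.
LETTER_DTD = {
    "letter": r"date inside-address salutation body closing signature",
    "body": r"para( para)*",  # one or more paragraphs
}

PLAY_DTD = {
    "play": r"title cast-list act( act)*",
    "act": r"scene( scene)*",
    "scene": r"(stage-direction|speech)( (stage-direction|speech))*",
}


def conforms(element: str, children: list[str], dtd: dict[str, str]) -> bool:
    """Return True if `children` match the content model of `element`.

    Nothing here says how any element is displayed: the DTD only lists
    the parts of the document type and the rules for combining them.
    """
    model = dtd.get(element)
    if model is None:
        return False  # element not declared in this document type
    return re.fullmatch(model, " ".join(children)) is not None


parts = ["date", "inside-address", "salutation", "body", "closing", "signature"]
print(conforms("letter", parts, LETTER_DTD))      # True
print(conforms("letter", parts[1:], LETTER_DTD))  # False: the date is missing
print(conforms("letter", parts, PLAY_DTD))        # False: PLAY has no letter element
```

Nothing in either table says how a date or a speech should look; that is left entirely to whatever displays the document.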
Either the browser has some instructions for display coded in (in which case it is dedicated to displaying only one type of document) or it receives instructions from the author in the form of specifications coded in the document or in a separate style sheet (as is the case with the general-purpose SGML browsers Panorama and Multidoc Pro). HTML--to end the suspense--is a document type.* It is far less specific than PLAY or LETTER, and thereby more serviceable for a wide range of applications. In fact, for text units, it contains only a few headings, paragraphs, inset block quotations, an address, and lists. It is not very good for marking up documents that have a lot of structure proper to their specific type. It does not have an element for "line" or "stanza," for example, so it is not very handy for marking up poetry, nor does it have one for "speaker," so it is not the natural choice for drama either.* But though simple in its inventory of elements, it is a structural markup language, and it enabled people to mark up documents without thinking about the platform or the processing program they were working with. "Platform independence" was the motto, and it was achieved when browsers capable of displaying HTML sensibly became available for all the major computing platforms. True, a document looked different on different platforms, in different browsers, and even on different machines, but there it was on all of them, without conversions or filters, its rough structure apparent to all. Another factor contributing to HTML's success was its impurity: alongside its structural elements it also offered some physical markup of the traditional kind (bold and italic, for example), so that one could still use old habits of word-processing markup. And once all browsers began to realize the <p> unit with a full line break, it could be (and quickly was) used with zero content to insert some extra line space.
And similarly the headings of various weights (H1 the biggest and heaviest, H6 teeny-tiny) began to be used for font-size control--i.e., they might be used without being the heading of anything. It is well to note that HTML's loose attitude toward layout is not the only solution to the problem posed by the diversity of hardware and operating systems. You could use a page-description language like PostScript to do layout exactly, if you provided browsers and fonts for all platforms that would display it the same way. Adobe Systems did and does offer this possibility for sending and displaying uniformly formatted documents.
The best argument for using conceptual markup, and using it consistently, is that it allows texts to be searched and transformed in terms of meaningful structural units. In most cases, it would be more useful to have the result of searching for "words emphasized" than for "words italicized," since the latter would include titles of plays and books and perhaps foreign words and words cited as words. And again, if we think of converting a document to another format, or even just changing its format, we might well want all emphasized items in the document to receive a certain treatment in the converted one, but not all the italicized ones. Similarly, we can call out and highlight all the uses of French words and phrases in a work (if marked up with TEILITE), or all the speeches of a character, and so on with any element of structure that has been marked up. One of the most compelling reasons for HTML to use content markup was the hope of making it platform independent, so that it could be viewed on X Window Unix workstations as well as Macintosh and MS-DOS desktop computers. These platforms all differ in their installed font sets, default screen sizes, and handling of color.
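The difference between searching for "words emphasized" and for "words italicized" can be sketched with Python's standard-library `html.parser`. The two sample sentences are invented, and marking the title with CITE and the French phrase with a SPAN carrying a `lang` attribute is an assumption about how a conceptually marked-up page might look; `TagTextCollector` and `text_in` are hypothetical helpers.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag.lower()  # html.parser reports tag names in lowercase
        self.depth = 0          # > 0 while inside the target tag
        self.found: list[str] = []
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:  # closed the outermost target tag
                self.found.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def text_in(tag: str, html: str) -> list[str]:
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.found


# The same three italicized items, marked up physically and conceptually.
PHYSICAL = ("<P>I <I>never</I> said <I>Hamlet</I> was dull; "
            "it has a certain <I>je ne sais quoi</I>.</P>")
CONCEPTUAL = ("<P>I <EM>never</EM> said <CITE>Hamlet</CITE> was dull; "
              "it has a certain <SPAN lang=\"fr\">je ne sais quoi</SPAN>.</P>")

print(text_in("i", PHYSICAL))     # ['never', 'Hamlet', 'je ne sais quoi']
print(text_in("em", CONCEPTUAL))  # ['never']
```

In the physically marked-up version a search for I returns all three italicized items; in the conceptually marked-up version a search for EM returns only the emphasized word, and the title and the French phrase can each be found by their own markup.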
One of the brilliant design decisions made at NCSA was to let the text scroll: the local browser handles the details of the display according to the resources and defaults of the system it is running on. The browser produces a certain uniformity in look and style of display across platforms, but in fact, in the early going there was not even a standard browser you could design for if you wanted to. Then Mosaic became the WWW browser of choice, only to be displaced by Netscape, which is now receiving some serious competition from Microsoft Internet Explorer. There has never been, and may never be, a single worldwide standard browser--or at least not for long--and the same is likely to be true of platforms. The prospect of knowing how your work would be laid out by a browser, and of controlling that layout by physical markup, seemed remote. So the simple gray screen and large black type of Mosaic seem to follow the centuries-long trend in learned and serious text away from the illuminated manuscript, away from children's books and comic books, and away, above all, from television and commercial advertising. Left to themselves as they largely were, academics and scientists gave the world--Mosaic.
Netscape Communications Corporation, one might say, sold the Internet
to American business. Part of their success came from expanding
the options for display--font sizes, colors, wallpapers, tables,
frames, alignment, floating images, graphics formats, animation
and movies--in short, the tools of graphic designers. Most recently, Netscape Communicator 4.0 added support for layers and absolute positioning. Designers
were quick to make use of the net equivalents of their art, and a
concern with Web design began to emerge alongside the concern for
content and structure. Among the most articulate of the design
advocates is David Siegel, who delights in opposing "information
purists" and "academic library scientists".* Siegel lavishes his contempt on the
gray ("any color but gray"), boring, fragmented pages with their
balls, horizontal lines, wall-to-wall print, and visual junk that
are the work of first- and second-generation web designers.
And his challenges to HTML orthodoxy have made him
controversial.
To understand the controversy, two points should be borne in mind. First, being a graphic designer, Siegel designs only for the later-model big browsers (Netscape and MSIE); viewed in a text browser (Lynx) or with automatic image loading off (quite common these days, to speed up browsing), his pages are a gibberish of little clear GIFs used for layout and of tables that don't work. While it is certainly not morally reprehensible to design that way, it does fly in the face of the ethos of accessibility that developed in the early years of HTML's rise and that has been reiterated countless times on the web and in web-writing handbooks: "Browse it in Lynx, make sure it makes sense to those readers." There is frankly a real tension webwriters experience between making the page available to others (and restricting themselves to lowest-common-denominator technology) and using the latest and greatest technical powers provided by the medium. One working compromise in many instances (say, in the case of frames) is to do two versions of a page, high- and low-tech, but this won't help Siegel very much, since for him design is of the essence of a page. (I should make it clear that Siegel does not design just for the highly technologically endowed--he is passionate on the subject of limiting the bandwidth demands of his pages, for example.) Second, breaking rules, and doing it combatively, does not, I think, fully account for the hostility Siegel has attracted: his view of web authoring is also anathema to many academics. He sees site-making as retail marketing, likening the attracting of viewers to enticing shoppers into a mall store. This flies in the face of academic and professional codes of disinterestedness and non-self-promotion. The ideas should speak for themselves--a recipe, to be sure, for drab and stodgy document design, but the drabness is itself authenticating.
Academics typically know nothing about layout, digital image processing, color, video (!), or digital sound, may not have time to learn, and resent being told that their pages of honest, well-intentioned work are boring and likely to be ignored online. It takes the marketplace-of-ideas metaphor in directions they don't want to go: the mallification of the web. Siegel urges his readers not to try to become graphic designers, but to get hold of one to help them develop their pages. But he then proceeds to teach the techniques and educate the eye as if he were in fact teaching a class in graphic design for the web. Teaching graphics and graphic design certainly does complicate the work of teaching writing for the web--especially if you don't know very much about it--but getting the help of a designer is not a realistic option in most cases, and is not one most people would take if they could, for a very attractive part of HTML writing is the control it gives the writer over the display screen. If you have grown up thinking that you can only be a consumer of what TV serves up, the chance to populate a video screen with your own text and images is thrilling. Think of it--who would want to take or teach a class where you browsed and taught with images off? It is a defensible approach, but it smacks of fanaticism. Of course, college and university writing programs have largely ignored desktop publishing: they don't teach finding and using images, fonts, boxes, sidebars, columns, tables, or other "enhancements" that the typical word processor plus deskjet combination can easily handle. So composition programs may continue to think of the text produced by writing as an abstract entity whose material embodiment, should it ever be published, lies on a far-distant horizon, beyond the control of the writer (as is the case for most writing in the humanities). But if you approach HTML that way, you had better disable the graphic display, not just deselect autoload images.
By pressing the notions of impact and eye candy hard, Siegel may in fact be conceding too much to the professors: it can certainly be argued, and it will be later, that the visual provides a mode of comprehension and meaning in itself--or can, if used intelligently. Siegel's bundle of tricks for coaxing HTML 2.0 into controlling layout was, he freely acknowledges, a hack that cried out to be replaced by something more rational, systematic, and thorough. The W3Consortium even put him on the committee to develop style sheets for HTML. The real problem with his hack and style of doing layout comes when you try to translate or convert a Siegel-marked page into some other format (say, LaTeX or RTF for typesetting). It would be hard to imagine any set of macros that would filter out just the markup-for-layout. Similarly, you would have a lot of handwork to do if you decided to change the formatting in a group of documents. Siegel thinks of a page and a site as unique (and as a contracting site designer, why wouldn't he?), like the display windows in the mall's flagship stores.*
Viewed in relation to the other major document types (TEI(LITE), DocBook), HTML developed anomalously in that it had no systematic way for writers to specify the physical realization of its structural units. That was left to the browser (some browsers did offer the reader optional resetting of font family and size). That certainly simplified things, at the cost of control over the document's design. The W3Consortium worked intensively for years developing a mechanism for style sheets and finally got browser support, first from Microsoft IE and then from Netscape. In their current versions the main browsers support over 90% of the CSS1 standard (version 2, however, is still in draft). People are beginning to discover what can be done with style sheets, though of course the results are visible only in the new browsers, so style sheets still have a narrow, avant-garde, elitist quality. CSS1 is only partially integrated into HTML 4.0, and not all of its features have browser support. The new standard allows you to set the physical realization of all HTML elements, including the wildcard SPAN element. In fact, you can set several realizations--alternate style sheets--and let people choose which "view" they like best. That is the mechanism that allows you to change from X Window to Windows 9X displays of this document. Further, it allows you to define subclasses of elements, and even to treat subclasses of different elements together.
This mechanism is very useful for supplementing the very skimpy set of text elements: we can define different classes of P, for example, as P.footnote, P.note, P.abstract, and any other P we have use for; we can then specify that P.note be set in a smaller font, for example, or otherwise change it, and the change will take effect everywhere in the document that P.note occurs.* I repeat, you can MAKE UP YOUR CLASSES as you want them, which goes quite a way toward having the full SGML power to write a modified DTD. If we want to, we can define a P.line for marking up poetry and a class of DIV, say, stanza. The main thing we can't do is to ensure automatically, for a given poem, that each <P class=line> occurs in a <DIV class=stanza>, since these classes are not themselves elements, and so no DTD rule governs where they may occur. As an example of how HTML plus Cascading Style Sheets can use the very general set of text elements given by HTML to mark up a specialized document like a play, consider this
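HTML and CSS offer no way to enforce the rule that every <P class=line> sits inside a <DIV class=stanza>, but a separate program can at least check it after the fact. The following Python sketch, using only the standard-library `html.parser`, is one hypothetical way to do so. The class names `line` and `stanza` come from the discussion of poetry above; `StanzaChecker`, `stray_lines`, and the sample poem are invented for illustration. It approximates what a DTD rule would guarantee in full SGML, but only as an external check.

```python
from html.parser import HTMLParser


class StanzaChecker(HTMLParser):
    """Record <P class=line> start tags not inside a <DIV class=stanza>.

    Only DIVs are tracked on the stack (HTML requires their end tags),
    so omitted </P> end tags do not throw the check off.
    """

    def __init__(self):
        super().__init__()
        self.div_stack: list[bool] = []  # one entry per open DIV: is it a stanza?
        self.offending: list[int] = []   # source line numbers of stray Ps

    @staticmethod
    def classes(attrs) -> set[str]:
        value = dict(attrs).get("class") or ""  # attribute may be absent or empty
        return set(value.split())

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_stack.append("stanza" in self.classes(attrs))
        elif tag == "p" and "line" in self.classes(attrs):
            if not any(self.div_stack):  # no enclosing stanza DIV at any depth
                self.offending.append(self.getpos()[0])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def stray_lines(html: str) -> list[int]:
    checker = StanzaChecker()
    checker.feed(html)
    checker.close()
    return checker.offending


POEM = ("<DIV class=stanza>\n"
        "<P class=line>first line of the stanza\n"
        "<P class=line>second line of the stanza\n"
        "</DIV>\n"
        "<P class=line>a line outside any stanza\n")

print(stray_lines(POEM))  # [5]
```

Tracking only the DIVs is a deliberate choice: HTML lets authors omit </P>, so a full element stack would drift out of step, whereas DIV end tags are required.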
Readers familiar with generative grammar will see that the organization of markup languages strongly parallels that of natural languages: the DTD can be read as a set of rewrite rules generating a set of trees with units of structure at the ends of the branches. Content is placed in these container units (according to implicit protocols of appropriateness), and this structure with content is physically realized by a third component (the style sheet) that gives display values to the units. Further, if we think of the basic structures of HTML as trees, then we can look at hypertext links as ways of going directly from one part of a tree to another part of it, or to a part of a different tree, that is, without climbing down (or up) to the root of one tree and over to the root of the other. Hence the term jump, much used in the earlier days of hypertext, for going from some portion of one tree to a portion of another, distinct tree. Note that the "portion" of a tree need not be a complete node--it can be some word or words, an image, or part of an image; in fact, hypertext A(nchors) cannot be placed around most structural ("block") elements such as P, DIV, TABLE, lists, or BLOCKQUOTE, or around H(eaders). In pithy, Chomskyan jargon, hypertext anchors are not structurally dependent--i.e., not defined in terms of structural elements.
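The reading of a DTD as a set of rewrite rules can be sketched in Python. Everything below is hypothetical: the rules loosely follow the business letter discussed earlier, and, unlike a real DTD (whose repetition operators would make the set of trees infinite), they are kept finite and non-recursive so that every tree can be listed. The three components are kept apart: `RULES` generates the trees, content would fill the leaves, and `STYLE` stands in for the style sheet that supplies display values.

```python
from itertools import product

# Hypothetical rewrite rules for a tiny letter-like document type. Each
# element rewrites as one of several alternative sequences of children;
# elements with no rule are terminal: the slots where content goes.
RULES = {
    "letter": [["date", "salutation", "body", "closing"]],
    "body": [["para"], ["para", "para"]],
    "closing": [["valediction", "signature"], ["signature"]],
}

# A separate, equally hypothetical "style sheet": display values for units.
STYLE = {"date": "flush right", "para": "indented", "signature": "flush left"}


def trees(symbol):
    """Every (name, children) tree the rules derive from `symbol`.

    The rules must not be recursive, so the set of trees is finite.
    """
    if symbol not in RULES:
        return [(symbol, [])]  # terminal unit: a container for content
    result = []
    for expansion in RULES[symbol]:
        # every combination of one subtree per child, in order
        for kids in product(*(trees(child) for child in expansion)):
            result.append((symbol, list(kids)))
    return result


def leaves(tree):
    """The units of structure at the ends of the branches, left to right."""
    name, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


def render(tree, style):
    """Pair each unit with the display value the style sheet gives it."""
    return [(unit, style.get(unit, "default")) for unit in leaves(tree)]


# Four trees: one for each combination of the body and closing alternatives.
for t in trees("letter"):
    print(leaves(t))
```

Changing `STYLE` alters how every tree would be displayed without touching `RULES`, which mirrors the division of labor between DTD and style sheet described above.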