Front-end Semantics

A sketch of the Semantic Web/LOD

Tim Berners-Lee's vision of the semantic web describes web content laden with semantic metadata. Web agents are envisioned interrogating the metadata and making decisions that might include, for example, linking the content of multiple records together to produce serendipitous benefits. The fulfillment of such a vision assumes, at a minimum, a commonly employed data architecture and a large amount of marked-up data in public web space.

The impulse to share scientific data is long-standing. Since 2007 the sharing of data on the web has been promoted by the Semantic Web Education and Outreach (SWEO) Interest Group, which organizes its activities at the Linking Open Data wiki. Rubrics such as "linking open data," "linked open data," "LOD" and the "web of data" all describe structured data in open web space that features links from one dataset to another. Metaphorically, one could find oneself in a local neighborhood in one dataset and then ride a link to another dataset, and in this fashion meander through the semantic web. A listing of currently available datasets indicates that there are more than seven billion records available (as of September 2009) for harvesting, linking and manipulation. Berners-Lee has declared the linked open data movement to be the "semantic web done right," which suggests, at least rhetorically, that the semantic web manifests itself in the twenty-first century as the LOD.

The information architecture for placing data in open web space is RDF - Resource Description Framework.

RDF - Resource Description Framework

RDF sounds fancy and perhaps obscure, but for our purposes it is little more than a relational database structure recast for the web as a set of linked statements, which can be drawn as a tree or, more generally, as a graph.

Consider the following image...to the left is the relational table that describes a product. The product ID is "1" and there are two attributes that describe "Model" and "Quantity". Now consider the equivalent RDF representation to the right. The RDF models the very same information but as a hierarchical tree structure.

To use the jargon of RDF: this image shows several "triples" made up of a Subject, Predicate and Object. Here's one triple that is being illustrated: Subject = "products id= 1", Predicate = "Model" and Object = "ZX-6".
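
As a rough sketch of the idea, the following Python turns one relational row into triples. The function name `row_to_triples` is made up for illustration, and the quantity value is a placeholder, since the text above does not give one.

```python
# One row of the "products" table described above.
# The Quantity value is a placeholder; the text does not state it.
row = {"id": 1, "Model": "ZX-6", "Quantity": 10}


def row_to_triples(table, row, key="id"):
    """Turn one relational row into (subject, predicate, object) triples.

    The subject is built from the table name and the row's key,
    each remaining column name becomes a predicate,
    and each cell value becomes an object.
    """
    subject = f"{table} id={row[key]}"
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key
    ]


triples = row_to_triples("products", row)
for triple in triples:
    print(triple)
```

Each non-key column of the row yields one triple, so a row with two attributes yields two triples that share the same subject.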

The following picture looks difficult, but shows the direct rapport between a relational table and a series of triples. Consider the first one in the table of triples to the right: Subject = "Prod1", Predicate = "rdfs:label" and Object = "Eeeepc".

In the picture below, you'll spy a number of predicate names and objects that are preceded by another name, for example "rdfs:label". Here the predicate is "label" and the namespace is "rdfs". For our course, we don't have to worry about namespaces. They become useful in large, complex systems where there may be a danger of the same predicate or object names being used in several systems and then being confused one for another.
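
Although we can ignore namespaces in this course, a small sketch may show why they prevent confusion. Here a prefixed name such as "rdfs:label" is expanded into its full name. The "shop" prefix and its example.org address are hypothetical, added only to show two different "label" predicates side by side.

```python
# Prefix table: short prefix -> namespace address.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "shop": "http://example.org/shop#",  # hypothetical namespace for illustration
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full name; leave other names unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


rdfs_label = expand("rdfs:label")
shop_label = expand("shop:label")

print(rdfs_label)
print(shop_label)
print(rdfs_label != shop_label)  # the two "label" predicates stay distinct
```

Both predicates are called "label", but once expanded they are different names, so two systems can use the same word without being confused one for another.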

RDFa - RDF in the Attributes of HTML elements

RDFa is a thin layer of markup you can add to your web pages that makes them understandable for machines as well as people. You could describe it as a CSS for meaning. By adding it, browsers, search engines, and other software can understand more about the pages, and in so doing offer more services or better results for the user. For instance, if a browser knows that a page is about an event such as a conference, it can offer to add it to your calendar, show it on a map, locate hotels or flights, or any number of other things.

Read through RDFa for HTML Authors and pay particular attention to this example:

	
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
   <title>John Smith's Fish of the World</title>
</head>
<body>
   <h1 property="dc:title">Fish of the World</h1>
   <p>by <span property="dc:creator">John Smith</span></p>

</body>
</html>

Note the <h1> element, which has a "property" attribute whose value is "dc:title". What sort of title is it? It is a dc, or Dublin Core, title. Where would you go to find out more about a Dublin Core title? To http://purl.org/dc/elements/1.1/, the namespace declared as xmlns:dc in the <html> element at the top of the document. Note also the <p> element that hosts a <span> element. The <span> element has a property attribute, "dc:creator", which indicates that the creator is John Smith.
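
To give a feel for what a machine might do with this markup, here is a minimal sketch that uses Python's standard html.parser module to collect each "property" attribute together with the text it wraps. It is not a conforming RDFa processor: the class name `PropertyReader` is invented for this example, and it assumes that elements carrying a property attribute are not nested inside one another.

```python
from html.parser import HTMLParser

PAGE = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
   <title>John Smith's Fish of the World</title>
</head>
<body>
   <h1 property="dc:title">Fish of the World</h1>
   <p>by <span property="dc:creator">John Smith</span></p>
</body>
</html>
"""


class PropertyReader(HTMLParser):
    """Collect (predicate, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace, read from xmlns:* attributes
        self.found = []      # list of (full predicate name, text)
        self._open = None    # (tag, property) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "property" in attrs:
            self._open = (tag, attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            prop = self._open[1]
            prefix, sep, local = prop.partition(":")
            if sep and prefix in self.prefixes:
                prop = self.prefixes[prefix] + local
            self.found.append((prop, "".join(self._text).strip()))
            self._open = None


reader = PropertyReader()
reader.feed(PAGE)
reader.close()
for predicate, value in reader.found:
    print(predicate, "=", value)
```

The sketch resolves the "dc" prefix through the xmlns:dc declaration, so the title and the creator are each reported under their full Dublin Core names.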

Read Introduction to RDFa and Introduction to RDFa II and pay particular attention to this example:

<html>
<head>
  <title>RDFa: Now everyone can have an API</title>
</head>
<body>
  <h1>RDFa: Now everyone can have an API</h1>
  Author: <em property="author" content="Mark Birbeck">
    Mark Birbeck</em>
  Published: <em property="created" content="2009-05-09">
    May 14th, 2009</em>
</body>
</html>

Note how the author's name and the date of creation are marked up semantically.
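
The "content" attribute carries the machine-readable value, which may differ from the text a human reader sees. The following sketch, again built on Python's standard html.parser, prefers the content attribute and falls back to the element's text only when the attribute is absent. The class name `ContentReader` is made up for this example, and the fallback is a simplification rather than the full RDFa processing rules.

```python
from html.parser import HTMLParser

SNIPPET = """
Author: <em property="author" content="Mark Birbeck">
  Mark Birbeck</em>
Published: <em property="created" content="2009-05-09">
  May 14th, 2009</em>
"""


class ContentReader(HTMLParser):
    """Map each property to its content attribute, or to its text if none."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._pending = None  # property still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The explicit machine-readable value wins over the visible text.
            self.values[prop] = attrs["content"]
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None:
            self.values[self._pending] = data.strip()
            self._pending = None


reader = ContentReader()
reader.feed(SNIPPET)
reader.close()
print(reader.values)
```

For the "created" property, software reading this page records the value of the content attribute, not the date printed on the page.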

Rich snippets and GoodRelations

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

Read Introducing Rich Snippets and pay particular attention to their example:

Of course, this example looks more complex than the ones we've examined so far, but that is because Google is illustrating RDFa with a namespace called "v" whose terms seem to relate to "stars" and the like. You get the idea, however: the Googlebot will sweep over this code, recognize the semantic RDFa markup, and then be able to show a rich snippet in its search results.

Read Yahoo! RDFa - GoodRelations and pay attention to this example.

<div typeof="product:Product"
    xmlns:product="http://search.yahoo.com/searchmonkey/product/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

    <span property="product:listPrice">27.99</span>
    <span property="product:currency" content="USD"></span>

    <span property="rdfs:label">Startech Serial ATA Cable - 45.72cm - Red</span>
</div>

Note that Yahoo also indexes RDFa markup. In the Yahoo example, three predicates describe the product: its list price (product:listPrice), its currency (product:currency) and its label (rdfs:label).