The Semantic Web is based on the relatively straightforward idea that to be able to integrate (link) data on the Web we must have some mechanism for knowing what relationships hold among the data, and how that relates to some “real world” context. The following is a lot of detail that comes from this simple idea.
Web Science - the World of the World Wide Web by James Hendler
The Web affects us all, but we know surprisingly little about it. It revolutionizes the sciences we practice, but its own science remains to be developed. In this blog, I explore areas of Web research of interest to the scientific community.
What is the Semantic Web really all about?
Date: Tuesday, 16 June 2009 - 23:10 UTC
To answer this question properly, let me start back in the early Web era. While I’m going to do some potentially boring personal history, I’ll note the key ideas as I go along.
Circa 1995, my research group began playing with an idea (first proposed by my then student Sean Luke, now a faculty member at GMU) that if web markup (it was all HTML back then) contained some machine-readable “hints” to the computer, then we could do a better job of Web tasks like search, query, and faceted browsing. It rapidly became clear to us that for this to work, the machine-readable tags used in one file would need to point at those in others, either to point back to definitions or to point at various kinds of related things. Since the Web provided a naming mechanism, URIs (think URL, but to specific parts of a document as opposed to the whole thing), we figured basing the markup on this would be of help. We created a language called the Simple HTML Ontology Extensions (SHOE), which was based on creating documents that contained useful terminologies and having other documents use those terms (pointing back to the original ones). Our slogan for the project, which continues to be a popular quote, was on shirts we had printed ca. 1998:

Slogan: A little semantics goes a long way
key idea 1: SIMPLE
key idea 2: URI-based
Another student, Jeff Heflin (now a faculty member at Lehigh), started playing with this idea and extended it to include a rule-based reasoner, a Web scraper for extracting SHOE from non-annotated Web sites, a visual query-by-example system and a bunch of other things. His thesis included the first formal description of the Semantic Web – defined in terms of multiple ontologies linked together (it was also the first thesis to my knowledge to use the term “Semantic Web” and to try to define it with some rigor).
key idea 3: Linked
In 1999, I went to the US Defense Advanced Research Projects Agency (DARPA) and convinced them to invest in the technology. My primary argument was that this could be used to help solve a lot of the DoD’s (and, of course, everyone else’s) data integration problems.
key idea 4: Data
A project called the DARPA Agent Markup Language was launched. MIT’s Semantic Web Advanced Development, led by Tim Berners-Lee, was funded under this program, with a proposal to base the emerging language more specifically on the then-recent Resource Description Framework language. Along the way, we came to rebrand the work under Tim’s name for it: The Semantic Web (with an emphasis on the Web).
key idea 5: Semantic Web
I also learned that many of the ideas in the paragraphs above were part of Tim’s original conception of the Web, and certainly by the late 90s of the Semantic Web. In fact, in a 1994 talk (Web Conference, Geneva) Tim said:
“in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them… For example, a document might describe a person. The title document to a house describes a house and also the ownership relation with a person. … This means that machines, as well as people operating on the web of information, can do real things. For example, a program could search for a house and negotiate transfer of ownership of the house to a new owner. The land registry guarantees that the title actually represents reality.”
In short, by early 2000, there were a number of people working on approaches to Tim’s idea of a simple, URI-based Linked-data approach called the “Semantic Web.” In 2001, I was lucky enough to be the “et” in Berners-Lee et al., a Scientific American article that outlined the vision.
That technology is now maturing, and becoming much easier to use. However, there are a lot of more advanced ideas that underlie an as-yet-unrealized research vision of a far more powerful Semantic Web, and I think that is where a lot of confusion has grown up. That vision is shaped to a greater degree by Artificial Intelligence researchers, by people working on advanced language-based search mechanisms, by people looking at expressive ontologies and complex rules, etc.
But that shouldn’t obscure this important fact: there is a lot of useful technology ready to be applied today in fairly simple ways based on the earlier and now-maturing vision.
In recent years some new terms have been coming into favor to help deconflict these different visions:
- “Linked Data” (or “Web of Data”) – close to the original vision. Get datasets rendered into RDF, use very simple ontologies (really more like thesauri or vocabularies than AI languages) to provide terms that can be used for linking and/or describing the data.
- “Web 3.0” – sometimes used as a synonym for Semantic Web, this term really grew up as a way of explaining how semantics (especially the simple ontologies we’ve been discussing here) can be integrated with Web 2.0 technologies to best effect. So it is coming to mean “Scalable Web applications that use Semantic Web languages or technologies” – examples of Web 3.0 apps include Twine, Freebase, Garlik, Bintro and many others (full disclosure – I’m on the board of Bintro and the advisory board for Twine). The term is also used for other kinds of Web applications that use these technologies in concert with powerful AI technologies, for example Powerset (now part of Microsoft) and the recently announced Siri. Again, these are large Web applications with semantics (and so far with less linking between them than in the linked data idea, but that is in part because they’re trying to figure out how to make money and are not yet focused on how to cooperate with each other).
- “Semantic Web” – here’s the tough one – some people use it to mean all of the above (I tend to), while other people use it only to mean the hard stuff that isn’t included in the previous two. This narrower sense is usually used by people who want to make absurd claims like “The Semantic Web will never work because it requires getting machines to think” and other such nonsense. It is this negative use that has mostly driven business people to call things Web 3.0 (to avoid the term) and has driven those of us still loyal to the original vision to use the linked data term more and more (to differentiate).
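To make the “Linked Data” item above a bit more concrete, here is a minimal Python sketch of the underlying mechanism: data is held as subject/predicate/object statements, and two independently published datasets can be combined because they name the same thing with the same URI. Everything in it is an assumption made up for illustration (the `example.org` URIs, the vocabulary terms, the sample values, and the helper names); a real application would normally use an RDF library and published vocabularies rather than plain tuples.

```python
# Illustrative sketch only: all URIs, terms and values are hypothetical.
EX = "http://example.org/"

# Dataset A: a lab's sample records as (subject, predicate, object) triples.
dataset_a = {
    (EX + "sample/42", EX + "vocab/collectedAt", EX + "site/lake-1"),
    (EX + "sample/42", EX + "vocab/ph", "6.8"),
}

# Dataset B: a separate gazetteer, imagined as published by someone else.
# It reuses the same URI for the site, which is what makes linking possible.
dataset_b = {
    (EX + "site/lake-1", EX + "vocab/label", "Lake One"),
    (EX + "site/lake-1", EX + "vocab/country", "US"),
}


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def site_labels_for_sample(triples, sample):
    """Follow sample -> site -> label across whatever triples are supplied."""
    labels = set()
    for site in objects(triples, sample, EX + "vocab/collectedAt"):
        labels |= objects(triples, site, EX + "vocab/label")
    return labels


# Merging triple-based data is just a set union; no schema alignment step.
merged = dataset_a | dataset_b

print(site_labels_for_sample(merged, EX + "sample/42"))
```

Run against `dataset_a` alone, the same query finds no label, because the label only exists in the second dataset. The shared URI is what does the integration work.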
So, if someone tells you something like “it is hard to build and maintain an ontology” they are wrong if what you are primarily trying to do is create enough context to do linked data applications. On the other hand, if you are trying to do more AI-like applications (think expert system on the Web), they can be right.
So I tend to advocate using mostly small, lightweight ontologies that are linked together and which are primarily used to describe datasets as the basis for a lot of exciting new applications that can take advantage of “mashed up data” (and eScience is a key one of these). And that is what the term Semantic Web meant back at the beginning, and being an “old fart” in this space, I think I’ll stick with that.
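As a rough illustration of what “small, lightweight ontologies that are linked together” can look like in practice, the Python sketch below links two tiny vocabularies with a single mapping statement and then queries data described with either one. The vocabularies, the `sameProperty` linking term, the data and the function names are all hypothetical, chosen only for this example; standard Semantic Web languages define their own terms for this kind of mapping.

```python
# Illustrative sketch only: vocabularies, linking term and data are made up.
VOCAB_A = "http://example.org/vocabA/"
VOCAB_B = "http://example.org/vocabB/"
SAME_PROPERTY = "http://example.org/link/sameProperty"  # hypothetical term

# One statement linking the two vocabularies:
# "vocabA:author means the same as vocabB:creator".
links = {(VOCAB_A + "author", SAME_PROPERTY, VOCAB_B + "creator")}

# Two records, each described with a different vocabulary.
data = {
    ("http://example.org/paper/1", VOCAB_A + "author", "Ada"),
    ("http://example.org/paper/2", VOCAB_B + "creator", "Grace"),
}


def equivalent_properties(prop, links):
    """Return prop plus any property directly linked to it, either direction.

    Only direct links are followed; chains of links are not resolved.
    """
    same = {prop}
    for s, p, o in links:
        if p != SAME_PROPERTY:
            continue
        if s == prop:
            same.add(o)
        if o == prop:
            same.add(s)
    return same


def values(data, links, prop):
    """Return sorted values of prop, treating linked properties as the same."""
    props = equivalent_properties(prop, links)
    return sorted(o for (s, p, o) in data if p in props)


print(values(data, links, VOCAB_A + "author"))  # ['Ada', 'Grace']
```

Without the link, a query on `vocabA/author` would return only the first record. Neither vocabulary had to change or grow to cover the other, which is the point of keeping them small and linking them.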
Hope that helps… more when time permits.
Last updated: Tuesday, 16 Jun 2009 - 23:10 UTC
Comments
Thanks, Jim, for a very informative post. I appreciate the way you describe these concepts to a nonspecialist. I’d be interested if you could write sometime about one scientific example of a “useful technology ready to be applied today in fairly simple ways based on the earlier and now-maturing vision”. I still find it hard to connect the principles that you outline here (which I can follow, thanks to your clear explanation), with a real example of (say) scientific research.
I like your “old fart” view too. And the history is interesting, especially to a n00b like me.
What about layered ontologies. An ontology of ontologies: keeping something as simple as possible for one application, but making it accessible to others. I guess, an ontological web. Then you don’t need an infinitely complex underlying ontology; it becomes almost limitlessly expandable. I’m still learning vocab here and it’s hard to get my ideas out!
Ian – exactly right in concept – but you don’t need the ontology of ontologies as it were; rather, the linking between ontologies described above is exactly to create a Web of linked ontologies – your ontological web. That hasn’t yet been used as much as the linking of data elements, but I think as we see more deployment of this, we’ll see that linking happen and that will be a big deal.
Thanks for putting 15 years in perspective in a brief post.
I will definitely reference this in my blog