• Web Science - the World of the World Wide Web by James Hendler

    The Web affects us all, but we know surprisingly little about it. It revolutionizes the sciences we practice, but its own science remains to be developed. In this blog, I explore areas of Web research of interest to the scientific community.

    • The Semantic Web - my personal (unofficial) FAQ

      Sunday, 30 Aug 2009

      I spend a lot of my time fielding questions from various people on what the Semantic Web is (and isn’t) and about its status – below are some of the questions I get asked the most, and my answers. This has no official status of any kind (I have been involved in some of W3C’s official FAQ activities, but this is not to be taken as related).

      Q. When will we see the Semantic Web emerge? (This question is also asked as "with all the hype, why haven’t we seen anything?")

      A. My answer to this one is that the question shows a certain ignorance of what the Semantic Web is all about (for more on that, see my previous blog entry). In particular, the Semantic Web is primarily an infrastructure technology that will bring new information to the Web (especially structured and semi-structured materials), that will help link sites to each other (via links through the “semantic space” – see below), and that will help create more information that can be, in some simple sense of the term, “understood” by the computer. Note that these things must be exposed through integration with the current Web, so the revolutionary capabilities offered by the Semantic Web (and I still do believe it will provide a revolution in capabilities) will primarily be seen through improvements in functionality to existing Web sites and through new Web applications – but they’ll still be deployed through the browser and look like the other stuff on the Web.

      So properly asked, the question is when Semantic Web technologies will be deployed widely on the Web. I think that is a more interesting question. We know that a lot of stuff is out there, and many sites already use the maturing RDF. Examples include the new Yahoo! Web home site and a number of their other pages; Google’s Rich Snippets and Yahoo!’s SearchMonkey, which use RDFa; and twine.com, freebase.com, and others that either use or export information in SW formats (there’s a good presentation on this that Frank van Harmelen prepared).

      So basically, I think Web 3.0 is here, but the Web is so big that you don’t always see it. More importantly, this is really more of an infrastructure technology, so without a “Web 3.0 Inside” sticker of some kind, you don’t know some of the sites you already use are using it. When John Markoff wrote his NY Times article more or less coining Web 3.0 in November 2006, he was responding to a panel that had three new companies on it – MetaWeb, RadarNetworks and Powerset. You might not have heard a lot about them by those names, but MetaWeb produced freebase.com and RadarNetworks did twine.com, both of which are quite successful and popular Web sites. Powerset was bought by Microsoft, and their technology is now reputed to be a significant contributor to bing.com, the new “decision engine” being promoted by Microsoft. Google supports something called “rich snippets” and Yahoo! something called “Search monkey” both of which expose semantic technologies in interesting ways. Several of the larger social networking sites are reputed to be working with the search engine companies to exploit these technologies, which has aroused a lot of interest in this community. The various blog posts saying that the Obama administration is starting to use some basic Semantic Web stuff (RDFa) on various sites have also kicked in a lot of interest. So much is happening right now, but a lot of it is under the hood – not hard to find, but you have to know how to look.

      The key thing about the above is that on the Web, once something starts to prove successful, it tends to grow. Every year since 2001, we’ve seen more and more of this under-the-hood stuff happening, but now with these sorts of breakthroughs, we’re really seeing things starting to heat up. I could claim it is already used daily by those on the Web, but I’d be exaggerating, as probably not more than a few hundred thousand of the Web’s billion users hit these sites on a regular basis. But that will grow rapidly in the next few years and I feel very comfortable saying we’re going to see a staggering amount of this stuff coming along. So I’d be surprised if, within the next five years, more than 10% of Web users are not regularly hitting an application with some sort of Semantic Web technologies involved (and some large percentage of those will be using Semantic Web apps on a regular basis).

      So my real answer to this question is that it is here, and it is growing, but that we still have plenty of space to grow. Exciting things are coming, and will continue to do so.

      (However, I hasten to add that the technologies seeing the biggest use now are those that were the first ones out of the research laboratories. There are a lot more interesting things coming. The day of “AI” and “intelligent agents” is still not going to be here in five years, but more and more of it will be coming each year, so it’s an exciting time to be in this area.)

      Q. Is Semantic Search part of the Semantic Web (same question for Semantic social networks, semantic match, semantic ?x for most ?x)?

      A. This one is harder than it might seem. The question is whether “AI on the Web” is inherently Semantic Web, and I think the answer to that is a clear no. There are many systems that use various models of learning across large statistical datasets, evolutionary or other techniques are used on “human computation” Web sites, and other AI techniques are deployed on the Web as well. In most of these cases, however, the missing part is linking – the Web stuff. Lots of techniques will make the Web better, but the real key is how the links work.

      But a lot of these systems are starting to use Semantic Web technologies and URIs within the application. Those are harder to answer. For example, Powerset (now part of Microsoft) was reputed to use Wordnet and various ontologies – whether those were explicitly in Semantic Web formats or not, is unclear, but it was clear that the creation of these things, the import and export of information was related to Sem Web formats etc. I work with a company called bintro.com which uses ontologies (in OWL and/or other formats) to help match job seekers to people offering jobs – so instead of keyword search, we try to do profile match – is this Semantic Web? I think so, because we are using the ontological stuff in a way that, eventually, we’ll be able to use and link to other people’s stuff. On the other hand, it’s currently in a single app – so I’d say we’re talking “Semantic IntraWeb” – an idea that is not well defined, but seems to be where a lot of the “web 3.0” players are right now.

      Q. So what is it that your research group at Rensselaer is doing?

      The field of AI has generally either stressed expressive knowledge representation (being able to say something like “a hand that belongs to a human has five fingers, one of which is the thumb”) or relied on having lots of data and no knowledge at all (as in the many machine learning projects currently being deployed). However, a small amount of knowledge applied to a large dataset seems like an extremely important, and largely ignored, area of Web development. My research is now looking at things like “very scalable” reasoning and also at “data on demand” systems – that is, in many applications there is so much data that it cannot be easily stored on a local machine (for example in science applications where we now see petabytes of data). We are looking at technologies that could, on the fly, find and merge appropriate pieces of very large datasets into custom “data caches” and make those available in Web applications. The key to a lot of this is that being able to scale these things requires some semantics, but not the traditional KR that AI people have explored, nor can you use only the relational model that has been the hallmark of database research.

      Q. How about describing one or two cool projects that you are focused on now and that you believe will lead to promising developments in this area in the near term?

      Here are two I’m particularly excited about right now:
      In http://data-gov.tw.rpi.edu we’ve been taking the data that the US government has been releasing in the Data.gov project and making it available in Semantic Web formats. This allows us to rapidly create visualizations, link it to other datasets (either from there or other govt sources), and to start linking it into Web information sources that live in what is known as the “Linked Open Data Cloud.” This is a set of datasets from a number of domains that have partial mappings to other datasets, so that, in essence, developers can mashup the data and then write Web Applications on top of it. In the past two months we’ve been able to convert a lot of data into Semantic Web formats and to show the power of data mashups, and we’ve got a lot of really cool things that’ll be along soon.

      The other project is at the other end of the scale. We’ve been using the supercomputers available to us at RPI’s Computational Center for Nanotechnology Innovations to explore scaling the algorithms that power Semantic Web applications to really large datasets. We’ve been playing with graphs that have over a billion RDF “triples” (essentially the assertions underlying the Semantic Web stuff) and exploring how we can process them in a number of different and interesting ways. There’s really only a small number of groups working on this approach, and we think we’re the only US group in the space, so it is great fun. Turns out we get really nice parallelization on a number of processes, which speaks well to these algorithms eventually moving to multicore machines and to the sorts of backend server farms that power large web applications with millions of users.
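      To make the notion of a “triple” concrete, here is a toy sketch in Python. It is an illustration only, not the code used on the supercomputers: each assertion is a (subject, predicate, object) tuple, and a query is a pattern in which None acts as a wildcard. The function name `match` and the `ex:` identifiers are invented for this example, and a linear scan like this would not come close to handling a billion triples.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical example data; the "ex:" identifiers are invented for illustration.
TRIPLES = [
    ("ex:dataset1", "ex:publishedBy", "ex:agencyA"),
    ("ex:dataset2", "ex:publishedBy", "ex:agencyA"),
    ("ex:dataset2", "ex:sameTopicAs", "ex:dataset3"),
]


def match(triples: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# All assertions saying that something was published by ex:agencyA.
published = list(match(TRIPLES, (None, "ex:publishedBy", "ex:agencyA")))
print(published)
```

      Each triple can be tested against a pattern independently of every other triple, which is one plausible reason this kind of work parallelizes well.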

      Q. What would you identify as the major challenges you are facing in your work, in the near term and in the long term?

      Near term, the issue is staying ahead of the commercial world. I mention above that we’re starting to play with billions of triples but, for example, the Open Calais project (http://www.opencalais.com/), which is just one of many new projects playing with these technologies, reports on its blog that it is creating about 750-800 million triples a week! So, in that way the Web has of making scale critical, the numbers are growing really big really fast. A second issue is that a lot of the power will come when applications start doing more linking to other applications through the Semantic Web layer. Just as the Web really became visible when a lot of “intranets” started opening up and linking to each other, the big Web 3.0 applications, as I mentioned above, are still mostly functioning as separate and non-linking apps. Getting people to understand why the linking is so important, and what the network effect gets you, is a major part of my current “evangelism” efforts.

      OK, so what does the future hold for the Web? (I’m often asked this, rarely are my replies published, but hey, this is my FAQ!)

      I think the Web doesn’t look all that different, but applications that are similar to the ones you use now will start seeming to have a lot more data available (expect to see graphs, tables and structured information in a lot more places), will seem to have search-like capabilities (such as bintro’s matching) that are way beyond the current capabilities, and will increasingly be able to exploit the context of your queries (i.e. right now when you search on the name of a restaurant, your search engine doesn’t know if you’re looking to choose a restaurant, find out more about a particular restaurant, or are in that restaurant looking for other things nearby).

      I also think there is another very important thing that will be different, which is that much more of your access to the Web will be from your mobile device (the thing currently known as a cell phone), and your location and social context will be much more available to the applications you’re willing to share them with (the way you can now give your iPhone permission to use your GPS location in various apps). We’re working on a demo in my current lab where a wine recommender is coupled with a location-aware phone and can access your Facebook information. So your phone could know you are in a particular restaurant, with a particular set of friends, and could use your and their preferences to pick appropriate wines from the wine list based on what each person is ordering.

      So, in essence, I sort of have this vision in my head of us, using our mobile devices, wandering through a Web of information with the ability to somehow find a lot of the right stuff at the right time, based on where we are, what we’re doing, and maybe even who we’re doing it with. When I moved to RPI, I took a chair called “Tetherless World Professor” – and the more I’ve come to think about this new vision, the more I like the title. This stuff is still new and exciting, but I look at it this way – I started playing with the Semantic Web back in the 1990s. As a researcher, I’m not content to sit around and exploit Web 3.0 – my job is to help create Web 4.0!

    • I just got turned down for a grant. That’s nothing new, you win some and you lose some, and every senior professor has gotten used to that over time. This time, however, I cannot find it in myself to just say “oh well” and let it go at that. This time, I think I need to go public, because I think what happened shows an endemic problem with the US National Science Foundation and, I hope, points out some things they could do to fix it.

      The proposal was for an NSF “Expeditions in Computing” grant, and we put together a team of which the final panel review said “the research team is among the best in the world” (as well as me, the PI team included three winners of MacArthur “genius” awards and one of the top researchers in theoretical computer science). They said our vision was “compelling,” and that the proposal “outlines an ambitious research program to attain this vision, while at the same time with realistic ambitions as to what can be done within the resources of the expedition.”

      There are, in fact, no real criticisms in the review until the summary, which, after saying more nice things about the team and the vision and the technology (a potential new breakthrough technology for the Web), said:
      “The panel was concerned with the management plan, in particular how to ensure collaboration among the diverse research communities, especially given the number of strong intellectual leaders among the team members.”
      Or, in my words, they were worried about how a team of such mavericks could pull together.

      In the end, NSF told us it had been “very very close” but that they were just not quite willing to deal with the risk and they could not fund us. But here’s the thing — this is supposed to be “high risk, high reward” research. In the end, NSF simply could not bring itself to take the risk!!

      OK, now if it just ended there I would expect, rightly, for you to say “sour grapes” – but I hope if you’re a US taxpayer, and looking at NSF turning down what the reviewers said could be breakthrough research in new ways of using the Web, you’re annoyed at this. “What,” you might ask, “could be done to improve the situation?”

      Here are some thoughts I have, based on having been both a researcher and a funding agent at different times in my life:
      1 – Many funding agencies in this situation would be able to come back to the team and work with us to craft a management plan that would make sense to both sides – we wanted enough freedom to use our judgment in which paths to follow, they wanted some better assurance we would pull together. In feedback from some NSF PDs they have ideas of exactly what we “should have said,” but they couldn’t give us this feedback until after we were rejected. In short, NSF cannot work with a “near miss” to craft a winner, they can only say yes or no.
      2 – Having been turned down, we can, of course, submit again next year. But the amount of effort that goes into a proposal of this magnitude (we had 36 researchers in 11 different fields at 8 universities) is quite high, and next year we would be starting again from scratch. NSF has no mechanism by which we could say “last year we almost made it and this year we fixed the management plan” – so we’d have to do preproposals, a proposal if invited (no sure thing), a site visit if we made it that far, etc. (and all for different review panels who would have different questions and issues). There would be no advantage to having made it so far in the first round. Most funding agencies have a way of giving some sort of extra consideration in these sorts of cases; NSF doesn’t. Starting from scratch with the high likelihood of the same outcome (because this team wouldn’t be interested in doing anything that isn’t high risk) is not a compelling idea, and we won’t be doing it, and NSF has nothing to offer to encourage us to do so.
      3 – NSF reviewing is now all done by panels. In the old days, it was all done by external reviewers (as many countries still do it). There is a lot of anecdotal, and some formal, evidence that panels tend to be more conservative. On the other hand, the workload of external review was way too high. The solution might be, particularly in situations like this, to solicit outside reviews to be compared with the panel’s (or to comment on specific matters such as “does the management plan fit what you know of these researchers?”). The panel at the site visit included only one person who has published research directly relating to our proposal, and a couple more who knew parts of it. Getting some people who would have pointed out that many of the people in the proposal had coauthored with each other, and that we were all known for working in large groups, might have helped alleviate NSF’s fear.

      I know NSF does a great job, putting up with constantly changing oversight from Congress, dealing with a public that doesn’t understand science, and trying their best to fund the best work while keeping the bureaucrats happy. Still, I must tell you that I am bitterly disappointed that they turned down a grant that had, as the reviewers themselves described it, “the potential to transform the use of widescale, networked computing” with “a dream team of Web researchers and also top researchers in foundational computer science and social science.” I would have loved to work with this team, and I’m sure many of us will now go and find other ways to pursue pieces of our vision. Knowing the team, this setback will delay this research getting into practice, but won’t stop us – we’re too “unmanageable” to give up. But I sure hope that the agency can understand why this turn-down is more of a condemnation of them than of us, and I hope the constructive solutions above can lead to NSF’s rethinking their approach to funding high-risk research.

      addendum July 16, 2009
      I was beginning to enter the “acceptance” stage of this rejection when I came across a 2005 report by the National Research Council (the organization that coordinates the National Academies in the US – the highest level scientific societies we have) entitled “Facilitating Interdisciplinary Research.” Their number one recommendation with respect to funding was

      Funding organizations should recognize and take into consideration
      in their programs and processes the unique challenges faced by
      IDR with respect to risk, organizational mode, and time.

      This is exactly what NSF did not do in this case – in fact, these challenges are the exact reason they gave for rejecting it. Ahh well, they didn’t trust our research team, which included a number of members of the National Academies, so why should I expect them to listen to the NRC? Sigh – back to denial, anger and depression…

    • I’m lucky enough to be at a point in my career where I get asked to give keynote talks at a number of conferences. While I have always liked travel, I must admit that most of the time these trips leave me wishing that more conferences would let me just “call it in” instead of increasing my carbon footprint all the time. In fact, one of the true frustrations of professordom is the people who say “wow, you’re going to can I come?” — I usually answer, “Yes, in fact you could go instead of me!” It’s hard to explain to people that the usual trip is spent inside a conference center or hotel and the best part is coming back home to one’s research group which has been ignored during the interim. (Quick aside – the only thing worse than these people who think the travel is wonderful is the people who ask “So what do you do during the summer?” – arrghh!)

      But, on rare, and I mean rare, occasions, there is a trip that is totally worth it and makes you glad you came. I got home from one of these a couple of weeks ago, and the glow has not quite worn off (neither has the backlog that grew while I was away, which is why this post was so delayed).

      I got invited to give the opening talk at the International Conference on Web Engineering (ICWE) in San Sebastian, Spain. I will tell you right now, if anyone invites you to anything there, say “yes!”

      I was lucky enough to hit the town on a week when the weather was perfect – sunny every day and not too hot but warm enough to hang on the beach and swim in the ocean. My wife came with me for once and we actually came a few days early (we decided to risk this given the travel brochures). San Sebastian is in the Basque region, just over the border from France in the foothills of the Pyrenees. Just point your camera at random and you’ll get a picture-postcard-worthy shot.

      San Sebastian also has some of the best food in the world – a mix of Basque and Provençal styles, with simple ingredients, wonderful seafood and very few of the heavy sauces that most of the world uses to hide the flavor of poorly cooked food. If you don’t believe me (why should you trust my taste in food?), I’d point you to the fact, which the folks from San Sebastian love to point out, that the town has the highest ratio of Michelin-starred restaurants to unit area anywhere in the world (I suspect the fact that they have three 3-star restaurants in the same area guarantees this).

      But I digress. The conference was held in a beautiful conference venue, and the quality of papers, presenters and discussion was quite good. I usually don’t enjoy myself at a conference I’m new to, as I am not great at the whole small-talk thing – but in this case, there were plenty of interesting people doing good work I wasn’t aware of. A good learning opportunity.

      The conference organizers went all out. The conference reception was in the city hall, one of those gorgeous old buildings that Europe has hidden in various corners, and this was the first time I was ever at a conference banquet held in a restaurant with a Michelin star (and the organizers made sure there was good food for those of us not eating meat).

      I’d say it’s been about 5 years since the last time I was at a conference I enjoyed as much in a place as nice, and it was one of those times when I was happy to be in a profession where I get invited to come tell people about my work — what more could you ask for?

      So for those of you students, post-docs and junior faculty, keep slogging. There’s a lot to complain about in the day to day life of a modern academic, but every few years something like this comes along that makes you feel special. And that (plus getting back to your research group with some new ideas) can make it all worthwhile.

      [and let me publicly thank Oscar Diaz, the conference chair, and his local conference team, for the great work they did!]

    • I know the issue of blogging by folks of more “mature” years has come up at Nature and elsewhere many times. Usually the shortage of such blogging is attributed to a generation gap, lack of technology awareness, and unwillingness to share – all of which are partially true, but I’m rapidly realizing there’s a lot more to it than that.

      As much as I’d like to be a frequent blogger, the demands on my time are such that it is difficult to make this a regular part of my life. As most of the readers on this site will know, the outside world has very little idea of what real professors and scientists do. I cannot tell you how often I’m asked “but what do you do during the summer” by those who assume teaching is the primary thing in my life (oh for those halcyon junior faculty days when, in fact, teaching was something I could give the attention it deserved!). Those who work in, and even more those who run, a research group know how silly this is. Our real work involves advising, publishing, funding (and funding and funding and funding), service commitments, travel and speaking (way too much) and on and on. My administrator is fond of telling people that her boss works a 60+ hour week, and that doesn’t include weekends!

      And now consider the time needed to simply get something like this out — this week we have a conference deadline, so I’ve been helping edit my students’ papers, last week I was in Washington DC on a “reverse site visit” to present some proposed research to NSF, next week I’m off to give a keynote at a conference in Spain. I spend hours a day in front of a computer screen, my email alone can take a few hours a day, and to blog for fun at midnight, as I’m doing now, is not always how I want to spend my time. But it isn’t because I don’t want to do it, or don’t understand the importance, rather it’s that time management is something that has gotten more and more complicated as my lab has grown.

      I’ve noticed lately that I twitter in bursts, Facebook in bursts, and yes, blog in bursts. I also write papers, grants and do most other things in bursts – I do have colleagues who manage to keep things more even, but I notice that as we all get more advanced in career, this seems to get harder and harder.

      I suspect that some folks in the younger generation can do more thinking and typing faster, and understand the level of formality for blogging more than many of the folks my age. But I think there’s also a tendency to underestimate the effort that goes into producing blogs and such in general.

      So there is definitely work to do – we need to help senior scientists understand the importance of blogging and informal communications in science. On the other hand, there also has to be an understanding that effort is effort, and finding time to balance all the things that must be balanced means that, whatever communication mechanism we’re talking about, senior scientists sometimes need to focus inwardly on our research groups (who do you think writes the grants?), our research communities (who do you think runs the professional organizations?), our research venues (who are the senior editors on the journals and the conference chairs?) and only then on the wider community (including blogging).

      So I suspect that eventually we will see that as more and more science bloggers advance career-wise, we will need to explore new models of how best to make blogging (and other social media) work without derailing so many of the other things that are making science work. Just my thought for the day as I run out for yet some more meetings, then head out to my next set of travels…

    • What is the Semantic Web really all about?

      Tuesday, 16 Jun 2009

      The Semantic Web is based on the relatively straightforward idea that to be able to integrate (link) data on the Web we must have some mechanism for knowing what relationships hold among the data, and how that relates to some “real world” context. The following is a lot of detail that comes from this simple idea.

    • some fun - graduation day in haiku

      Monday, 18 May 2009

      Apologies to those expecting Web Science, but I figured the “rules” of blogging allow an occasional “fun” one to be deployed among the serious. Having spent yesterday at my daughter’s college graduation, I felt it might be worth sharing the event, but since I originally did it on Twitter (@jahendler), I had to find a 140-character format, so here is my daughter’s graduation as a set of haikus.

      Graduation in Haiku

      Parents wait and wait
      Students anxious in the hall
      Commencement is nigh

      Pomp and Circumstance
      Way too many caps and gowns
      precess slowly by

      Chairman of trustees
      welcomes all to the event
      when will he be done?

      President makes jokes
      then he gets more serious
      “we need your money!”

      Now they proceed to
      honorary doctorates
      lots of famous folks

      (someone sends a text
      it’s a message from my kid:
      “Get me out of here!”)

      juris causus doc
      no one has heard of this guy
      must be a donor

      Commencement address
      starts with funny anecdote
      so what else is new?

      Speaker’s really good!
      Words of hope and cheer – too bad:
      grads are all asleep

      Cannot see my kid
      the talks all go on too long
      will it ever end?

      A problem with this:
      seeing your kid get her degree
      means watching the rest

      The Prez says the words
      the crowd stands up, cheers real loud
      kids now have B.A.s

      Alma mater sung
      speeches made, degrees conferred
      graduation’s done

    • Wolfram|Alpha -- Wow!

      Wednesday, 06 May 2009

      Before you read this there are two things worth knowing about me – I’m very hard to impress with new technology, and at DARPA (where I was a Program Manager for 3 years), I was known for hating demos unless I could get hands-on.

      Therefore, when I was invited to attend today’s webinar demonstration of Wolfram|Alpha, the new Web “computational knowledge engine,” I didn’t expect to be very impressed. On the other hand, since the demoer was going to be Stephen Wolfram himself, and since there’s been a lot of buzz going on about the system, I figured I should watch. I’m really glad I did.

      I was an early user of Web now-giants like Google, Facebook, Wikipedia and others, yet I cannot remember a time when I was so excited. Not because the product is so obviously ready for prime time (hard to tell that without putting one’s own hands on it), but because of the potential it opens up.

      The idea behind W|A is easy – the company collects, curates and manages huge amounts of data about all sorts of things, including scientific domains (I noticed physics, chemistry, mathematics, earth science) as well as general stuff (people, places, culture and media) – lots of areas where there is a lot of data. The system then functions like a search engine over these things, but instead of bringing back a list of documents, it finds appropriate data, processes it (apparently using Mathematica as a back end) and provides relevant answers and related facts, in various math, graphics and text forms.

      So if you type “GDP France” you get not just the number you might find elsewhere, but a graph over time, different currencies, etc. If you ask “integrate x^2 sin^3 x dx” you get the function, and the graph (and if you want, you can see the steps used in computing it). One of my favorites was “Height of Everest / length of Golden Gate Bridge” (the answer is 3.233).
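      The Everest query comes down to simple arithmetic once the two lengths are known. A quick check in Python, assuming the commonly quoted figures of 8,848 m for the height of Everest and 2,737 m for the total length of the Golden Gate Bridge (both numbers are assumptions here, not values read from the W|A output):

```python
# Assumed figures in metres (commonly quoted values, not taken from Wolfram|Alpha).
EVEREST_HEIGHT_M = 8848.0
GOLDEN_GATE_LENGTH_M = 2737.0

# Dividing two lengths in the same unit gives a dimensionless ratio.
ratio = EVEREST_HEIGHT_M / GOLDEN_GATE_LENGTH_M
print(round(ratio, 3))
```

      Under those assumptions the ratio rounds to 3.233, which agrees with the answer quoted above.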

      Impressive also was the query “ATTGTTAATTGGTTAA”, which returned a bunch of biological information about this particular sequence (and changing a character or two clearly changed the result – we’re not talking about a fixed set of inputs, best I could tell).

      I could go on for a long time – “2.5 molar H2SO4”, “uncle’s uncle’s grandson”, “3,7,15,31,63” (which showed the next entries and a formula for the series) and much more. In fact, what made it impressive was how many different things it was able to handle, and the quality of the results. (Oh, and “ISS” was pretty cool, as it showed the current position based on getting the data from an appropriate web site and then computing the known orbit of the space station.)
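      To make the “3,7,15,31,63” query concrete: each term is twice the previous one plus one, which is the same as 2^(n+1) - 1 for n = 1, 2, 3, and so on. The sketch below is only an illustration of that pattern, written under the assumption that this is the rule the series follows; it says nothing about how W|A actually recognizes sequences, and the function names are hypothetical.

```python
from typing import List


def next_terms(seq: List[int], count: int) -> List[int]:
    """Extend a sequence assumed to follow a(n+1) = 2*a(n) + 1."""
    # Check the assumed recurrence against the given terms first.
    for prev, cur in zip(seq, seq[1:]):
        if cur != 2 * prev + 1:
            raise ValueError("sequence does not follow a(n+1) = 2*a(n) + 1")
    out: List[int] = []
    last = seq[-1]
    for _ in range(count):
        last = 2 * last + 1
        out.append(last)
    return out


def closed_form(n: int) -> int:
    """Closed form a(n) = 2**(n + 1) - 1 for n = 1, 2, 3, ..."""
    return 2 ** (n + 1) - 1


given = [3, 7, 15, 31, 63]
print(next_terms(given, 3))                   # [127, 255, 511]
print([closed_form(n) for n in range(1, 6)])  # [3, 7, 15, 31, 63]
```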

      OK, is Google going out of business? Is Yahoo! done for? – No, W|A has a lot of work to do in improving its handling of queries; increasing its data sources; and working on its integration of sources across various fields. At the moment it looks like it would be a useful tool for some fields, and mainly a play toy beyond that — at least for now.

      But the potential is incredible. I really feel like it ushers in a new generation of Web applications and opens the door for getting people to realize that search is only the very beginning of what the Web is about.

      So keep your eye out for the public beta, and play with it — it may not be what you need tomorrow, but it will likely be something you use a lot someday…
      -Jim Hendler

      ADDED NEXT DAY:

      I just found out that last week ReadWriteWeb got permission to post some screen shots – they will give you a better idea of what this is all about.

    • WWW2009 - The Web at 20

      Tuesday, 28 Apr 2009

      While I must admit that I travel way too much (yes, Jennifer, Wanderlust Wanes but science and travel are becoming all too interlinked), last week’s trip to Madrid, Spain for WWW2009 was well worth the shlep. This year’s conference celebrated the 20th anniversary of Tim Berners-Lee’s original proposal, at CERN, for an open hypertext system – the proposal for what we now call the World Wide Web.

      After a couple of days of workshops and tutorials, the conference kicked off with a panel which included Berners-Lee, inventor of the Web, Vint Cerf, one of the inventors of the Internet, and a couple of other people who’ve been working on the Web since the very beginning. Dame Wendy Hall moderated the panel, and while nothing too earth-shattering came up, it was interesting to hear the people who made the Web happen discussing where it came from – it’s not that often that I go to a Web event where I feel like a newbie!

      The panel was followed by the arrival of the paparazzi, who came to see Prince Felipe and Princess Letizia of Spain. The Prince gave an opening welcome talk to the conference, and while my Spanish is not that good, it was clear he went off script for a bit to talk about “El Web Semantico” (the Semantic Web) and Web Science, the areas where I work — nice to hear from a royal who tracks the technology!

      The second day of the conference started with another plenary panel, this one about Web Science. Moderated by my colleague Nigel Shadbolt (one of my codirectors of the Web Science Research Initiative), the panel included Berners-Lee again, the research director from Yahoo! Barcelona (Ricardo Baeza-Yates), and Michael Brodie, Chief Scientist at Verizon. The panel included a lot of good discussion of the interdisciplinary nature of the Web, and why understanding economics, sociology and related fields is necessary to understand the Web.

      The best line in the panel was one from Mike Brodie. He said that in today’s world “the digital world is becoming real and the real world is becoming digital.” While it sounds like it’s just a throw-away line, it really contains a profound truth: to understand the world in which we live, we must understand the Web and its role. More and more, what is online affects what we know about the world around us, and more and more data from the real world is becoming available online. Where do we see this more than in science, where increasingly huge data sets are being created which must be analyzed by large teams of scientists, working together across the world using the Web as a critical part of the infrastructure of science?

      Oh, and speaking of the increasingly networked world, you can see much more about the conference by looking at last week’s Twitter stream with the tag #www2009 – for a while there on Wednesday morning, we were one of the top Twitter trends, even beating out Susan Boyle for a little while (and yours truly, jahendler, was listed in the Twitter stats as one of the main tweeters on the topic – not bad for an old guy).

    • The Science of the World Wide Web

      Monday, 13 Apr 2009

      I’ve recently returned from Athens, Greece, where I had the privilege of co-chairing the conference “WebSci09: Society On-Line,” the first real conference in the emerging area of Web Science. The conference went quite well – we had about 350 registrants, and close to 1000 people showed up for the special opening session/reception, which included a talk by Sir Tim Berners-Lee, inventor of the Web, and Turing Award winner Joseph Sifakis. For the ubergeeks among us, the coolest moment was when we learned that the tag #websci09 was one of the top ten Twitter labels for a little while during the meeting (and tweets on the meeting went out in seven languages and in English, Greek and Chinese character sets).

      I figured that this meeting is a good chance for me to introduce myself to more of the science-blogging community here on the Nature Network, because I believe I’ll be the first blogger representing this new field of Web Science. Which, of course, leads to two things I need to do – introduce myself, and introduce Web Science. I’m a computer science professor with a long background in artificial intelligence and, more recently, the Semantic Web. (You can see lots more about me at http://www.cs.rpi.edu/~hendler) About 4 years ago, I proposed to one of my colleagues, Wendy Hall (who recently became “Dame Wendy” – way to go DW), that we should put together a book on the future of the Web. She suggested instead that we might consider putting together a workshop on that topic, and in September 2005, the British Computer Society was nice enough to host an invitation-only workshop on the subject. We worked hard to make sure that people from many different areas were there, not just computer scientists, and it was a good meeting.

      One of the results of this workshop was a realization that not just society but virtually every field of science (social, physical and mathematical) was being impacted by the Web. It was changing how we published, how we communicated, how we found and kept students, how we taught, etc. However, it also became clear that there was no field of study that took the Web seriously as the primary area of study in its own right. Mathematicians, physicists and computer scientists were studying the Web graph and other aspects of the networks of connections on the Web, social scientists were analyzing Web-based systems, like Wikipedia, and the impacts of the Web on policy and politics, and engineers (in academe and industry) were creating new Web systems left and right. However, there was nowhere these folks were coming together to learn about each other’s work. Further, we realized just how intertwined these things were – for example, a change to the Web’s architecture could have huge impacts on the social systems supported by the Web, whether it resulted from a desire to engineer for better scalability (say a change to the underlying Internet stack) or for social reasons (like the providers wanting more money from differential packet handling). Like the Web itself, where social, scientific, and engineering sites are linked together without respect for disciplinary and methodological boundaries, the scientists studying the Web needed to be significantly more deeply intertwined.

      This idea led to a short article on “Creating a Science of the Web” (http://www.sciencemag.org/cgi/content/full/313/5788/769?ijkey=o66bodkFqpcCs&keytype=ref&siteid=sci), and thence to a longer Framework for Web Science book; various articles and workshops have followed. Sessions on Web Science have been held at the World Wide Web Conference and at various conferences in the subfields of the social, economic and computer sciences, and work now proceeds on the development of Web Science curriculum materials, as several schools throughout the world are starting programs in the area at the PhD, master’s, and most recently undergraduate levels.

      It should be noted that Web Science was never really intended to be a separate or separable field, and we work hard to keep it linked – whether physicist or psychologist, Web engineer or Wikipedia analyst, legal analyst of the laws of the Web or mathematician trying to develop the laws of the Web, there’s a unifying meme, that of the Web itself, that must be respected. Web Science is the place for us to come together to understand the interactions, not to separate ourselves from the others in our fields.

      And it is important to note that Web Science has some unique and exciting challenges of its own. Like a physical science, there is a real thing we study (the Web exists and can be analyzed); like a social science, there are indirect effects on the way people live their lives (and with 21% of humanity already using the Web, that’s a heck of an impact); and like an engineering discipline, what we study is made by people, not a natural system, so it can be manipulated. Further, unlike most other systems people study, this one has the annoying property of growing so fast that by the time we can say much, it may have changed – while you were reading this blog post, billions of Internet packets were generated on the Web. Creating methodologies for its study will be a major challenge to the emerging area.

      So I hope that, like me, you are ready to learn more about Web Science. I may be one of the first people to have changed his business card to read “Web Scientist” – but there’s lots of room for more.

      Happy Webbing
      Jim Hendler

      p.s. Keep an eye on this blog for discussion of lots of things Web — things like the difference between the Web and the Internet (no, they’re not the same!), the Semantic Web, eScience and data-intensive science and what the Web has to do with it, and other things that you as a scientist, and a citizen, should know about the Web. And, of course, this being the blogosphere, I will share some of my experiences as I head out into the world to learn about life as a “Web Scientist” – wait, no more scare quotes – we’re here to stay – make that life as a Web Scientist!

