This is a rough version of my comments at today’s conference on Cyberinfrastructure, University Policy and Innovation. Text after the jump.
john wilbanks' blog
Agitating for innovation through open licensing and good technology.
Cyberinfrastructure, University Policy, Innovation
- Date:
- Thursday, 21 February 2008 - 18:25 GMT
My topic for the panel is “what is the end goal of using Knowledge Infrastructure”?
We use content effortlessly today. Think about taking a digital picture of the National Academies and sending it to a web-based photo sharing site, where it is tagged into a “cloud” of similar photos under a Creative Commons license for sharing, and then incorporated into an open travel guide. This is possible because of public investment in infrastructures like the domain name system, the networks, and TCP/IP, as well as private investment in digital photo tools and in business models like Flickr. It also depends on the legal infrastructure of CC and the social investment of Flickr users, who spend their own time categorizing information. We’re swimming in cultural infrastructure that creates entirely new forms of collaborative innovation.
This was not an end goal of the ARPANET, I think. But it was an end result of the design choices made to achieve the shorter-term goal of getting computers to talk to each other. It could have been done in a heavy, feature-rich manner. Instead it was done in a simple, end-to-end way.
I’m arguing here not just for end-to-end but for something implicit in the end-to-end idea: the one-to-many offer. This is an incredibly powerful idea – I bind myself to fulfill the deal, no matter who accepts it. To reference Tony Hey’s wording, the transformation of data to information to knowledge is the key. But if we exact technical, legal, and social transaction costs at each step, the alchemical transformation from raw data to usable knowledge fails. The one-to-many offer is one of the only non-miraculous tools available to us…
The structure of the deal matters, because ontogeny is connected to phylogeny. From Wikipedia: “Ontogeny is the growth (size change) and development (shape change) of an individual organism; phylogeny is the evolutionary history of a species” and “one can explain connections between phylogeny and ontogeny if one assumes that one species changes into another by a sequence of small modifications to its developmental program.”
Thus, the web recapitulates aspects of the network itself. Rather than a feature-rich, IP-restricted hypermedia system like DynaText, the simple version with the minimum amount of control exploded into something it was not designed for – its end goal was to make it easy to get CERN data to other places, and to get data from other places to CERN. I once asked timbl what surprised him most about the web in 2004 and he said Google – not because of the size of the web or the power of the search, but simply that there was enough storage and processing power to download, store, and compute across something that big. The innovations frequently come from lateral thinking.
So perhaps our end goal is to enable things we can’t imagine, to create a world in which knowledge can be used the way we use content.
So what does that mean? And how do we engineer towards that end goal?
Here’s a strawman for what it means: three intermediate goals that we can measure that may get us to an end goal of enabling things we can’t imagine, three goals that recapitulate some of the phylogeny of the network.
1. How successful are we at making the system end-to-end, everywhere, as the default rule?
Another way of putting it: have we gone to rehab and broken our addiction to the idea of “paper” and the one database per child metaphor? We’re hooked on atomization and IP - they provide us with the illusion of control, either to keep stuff from being shared (the traditional approach) or to force others to share (GPL). We need to stop even talking about it all this way. These are mental artifacts of a world that is long gone…
So let’s take the ideas behind the existing ICTs and make them multidimensional. Apply the end-to-end principle to content, for example, as CC does. Apply the way the domain name system lets us uniquely identify a web site to things like genes, proteins, and other core entities that are the subject of research. If we can’t aggregate content and use the same names for things, we can’t even start. The point about standards here is well taken – the creative chaos of folksonomy is built on the controlled names of the DNS. If we want to enable that level of creative chaos in knowledge, it has to be built on some standards underneath. So one end goal is to make information aggregation and knowledge communication cheap.
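To make the naming point concrete, here is a minimal sketch in Python, not a description of any real system. The two "sources", the example.org URIs, and the property names are all made up for illustration. The only thing it shows is that once two independent sources use the same name for a gene, aggregation is a trivial merge.

```python
from collections import defaultdict

# Hypothetical shared identifiers; example.org is a placeholder namespace.
GENE_A = "http://example.org/gene/A"
GENE_B = "http://example.org/gene/B"

# Two made-up sources that never coordinated, except on names.
literature = [
    {"subject": GENE_A, "property": "mentioned_in", "value": "paper-1"},
    {"subject": GENE_B, "property": "mentioned_in", "value": "paper-2"},
]
expression_db = [
    {"subject": GENE_A, "property": "expressed_in", "value": "tissue-x"},
]


def aggregate(*sources):
    """Merge records from any number of sources, keyed by shared identifier."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for record in source:
            merged[record["subject"]][record["property"]].add(record["value"])
    return merged


merged = aggregate(literature, expression_db)
for prop, values in sorted(merged[GENE_A].items()):
    print(prop, sorted(values))
```

If the two sources had each invented their own name for gene A, the merge step would need a mapping table first, and that mapping is exactly the transaction cost the shared names avoid.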
Side point: not all knowledge is digital. The network has been very good at moving actual physical things around the world between actual physical people – we should be thinking about how to bring that efficiency to bear. Right now we’ve just digitized an analog process – instead of calling to beg for tools, we use email.
Since research and knowledge are part of an ecosystem, we have to think like ecologists – if we release just one point of resistance, all we do is move the location of the blockages. We need to think about pre-IP movement in the same moment we think about licensing IP, we need to think about data movement in the moment we think about data analysis, and we need to think about giving credit in the same moment we think about compelling sharing.
2. How successful are we at making it easy to get answers to research questions?
Not everyone can be a programmer or an ontology engineer. This has to be easy, the way the web makes it easy to browse and search content. Google’s great for searching content. It doesn’t work nearly as well for research, which is atomized into papers utterly devoid of hyperlinks. Building knowledge queries is hard. So another goal is the creation of query builders that let you stitch together a bunch of complex information resources like databases and ontologies and literature – but in a way that doesn’t require much more skill than learning to use a web browser. Queries may well be the web pages of the knowledge web, which means we also need to think about enabling view source on those queries – it’s easier to hack someone else’s stuff and change small bits than to write from scratch. This is a big piece of how the web exploded – a deliberate release of control at the view source level.
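As a rough illustration of "view source" for queries, here is a small Python sketch. Everything in it is hypothetical: the `Query` class, the record fields, and the sample records are invented for the example, and a real query builder would sit on top of actual databases and ontologies. The idea it tries to show is that a query is kept as plain data, so anyone can read it, copy it, and change one small bit.

```python
import json


class Query:
    """A toy query kept as a list of plain-data steps (hypothetical design)."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def where(self, field, value):
        # Return a new query with one more filter step appended.
        return Query(self.steps + [{"op": "where", "field": field, "value": value}])

    def source(self):
        # "View source": the whole query as readable JSON text.
        return json.dumps(self.steps, indent=2)

    @classmethod
    def from_source(cls, text):
        # Rebuild a query from source text that someone copied and edited.
        return cls(json.loads(text))

    def run(self, records):
        result = list(records)
        for step in self.steps:
            if step["op"] == "where":
                result = [r for r in result if r.get(step["field"]) == step["value"]]
        return result


# Made-up records standing in for literature and database entries.
records = [
    {"kind": "paper", "gene": "gene-A", "id": "paper-1"},
    {"kind": "db_entry", "gene": "gene-A", "id": "entry-7"},
    {"kind": "paper", "gene": "gene-B", "id": "paper-2"},
]

original = Query().where("gene", "gene-A")

# Hack someone else's query: copy its source and change one small bit.
copied_steps = json.loads(original.source())
copied_steps[0]["value"] = "gene-B"
tweaked = Query.from_source(json.dumps(copied_steps))

original_ids = [r["id"] for r in original.run(records)]
tweaked_ids = [r["id"] for r in tweaked.run(records)]
print(original_ids, tweaked_ids)
```

The design choice worth noticing is that the second person never touches the `Query` code at all. They only edit the published source, which is the same low barrier that copying and editing a web page offered.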
3. How successful are we at building capacity?
Invest in people, invest in training, invest in standards, invest in semantics. The SC work on the neurocommons is radically limited by the small number of people who know RDF and genes well enough to put them together. And now our people are in demand across a huge swath of science.
So the real end goal: design these systems well enough that we are stunned by the things that they enable. That the kids hacking in Harvard dorm rooms build knowledge systems, not Facebook – we need bait to bring the innovators over to our world, because right now they’re starting video game companies and social tagging systems. That a scientist in Brazil can ask questions with as much power as a scientist at Pfizer. Because that is the infrastructure for innovation. Where there isn’t competition on information control or aggregation, but instead on hypothesis and model building. Where collaboration is easy, and we dramatically increase the entire throughput of the systemic processes by which data is transformed into knowledge, and knowledge translated into actual innovations that help our lives and societies.
Last updated: Thursday, 21 Feb 2008 - 18:25 GMT
Comments
John, like you I am a great fan of applying abstract architectural principles, like the E2E of successful systems such as the internet, to other domains. Giving URLs to genes and proteins is a good example. Here is another core entity that could use the same treatment: scientists. Attribution, finding, asking questions, collaboration, all these things could benefit from web-inspired technologies such as OpenID. Let’s give everybody a couple of URLs to identify them and build systems on top of that.
I understand you will be giving a talk in São Paulo next Wednesday. I hope to see you there.