Last Friday we released a major update to Connotea with the aim of resolving Buggotea. Prior to this update, the main database tables looked something like this:
user
bookmark
user_bookmark
user_bookmark_comment
user_bookmark_details
user_bookmark_tag
and the unique key in the bookmark table was the hash of the URL. This is where the entire Buggotea problem stems from: all of Connotea's functionality was built on top of this structure. Philosophically, you could say the structure was set up for people to share web pages that represented articles, rather than articles represented by web pages.
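To make the problem concrete, here is a minimal sketch of keying bookmarks on a hash of the URL. The choice of MD5 and the `url_key` helper are assumptions made for illustration, not Connotea's actual code, and the two URLs are just example addresses for the same PubMed record.

```python
import hashlib


def url_key(url: str) -> str:
    """Hypothetical helper: hex digest used as a bookmark's unique key.

    MD5 is an assumption; any hash of the raw URL has the same weakness.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Two addresses that point at the same PubMed record (PMID 17822526).
short_url = "http://www.ncbi.nlm.nih.gov/pubmed/17822526"
long_url = (
    "http://www.ncbi.nlm.nih.gov/sites/entrez"
    "?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17822526"
)

# Same article, different strings, so the keys do not match
# and the two bookmarks are stored as unrelated rows.
same_article_same_key = url_key(short_url) == url_key(long_url)
print(same_article_same_key)
```

Because the key depends only on the URL string, two users bookmarking the same paper from different addresses never end up sharing a row.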
We have changed the tables to the following structure:
user
article
bookmark
user_article (with preferred bookmark and citation pointers)
user_article_comment
user_article_details
user_article_tag
Here the article is an abstract entity, reflecting the fact that people want to reference a piece of work that can exist in many different locations. We now normalise bookmark entries via PMID and DOI into a single article object. It seems to be working so far for entries in Connotea that have PMID and/or DOI information.
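The normalisation idea can be sketched roughly as follows. This is not the production code (which lives in MySQL); the `Bookmark` class, the `group_into_articles` function, and the example URLs and identifiers are all hypothetical. The sketch groups bookmarks that share a PMID or a DOI, using a small union-find so that a bookmark carrying both identifiers can link two groups together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookmark:
    url: str
    pmid: Optional[str] = None
    doi: Optional[str] = None


def group_into_articles(bookmarks):
    """Group bookmarks sharing a PMID or DOI; return a list of URL lists."""
    parent = list(range(len(bookmarks)))  # union-find over bookmark indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # ("pmid", value) or ("doi", value) -> bookmark index
    for i, b in enumerate(bookmarks):
        keys = []
        if b.pmid:
            keys.append(("pmid", b.pmid.strip()))
        if b.doi:
            # DOIs are case-insensitive, so compare them in lower case.
            keys.append(("doi", b.doi.strip().lower()))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the groups
            else:
                first_seen[key] = i

    articles = {}
    for i, b in enumerate(bookmarks):
        articles.setdefault(find(i), []).append(b.url)
    return list(articles.values())


# Hypothetical data: the identifiers and URLs below are made up.
bookmarks = [
    Bookmark("http://example.org/a", pmid="1"),
    Bookmark("http://example.org/b", doi="10.9999/X"),
    Bookmark("http://example.org/c", pmid="1", doi="10.9999/x"),
    Bookmark("http://example.org/d"),  # no identifiers: stays on its own
]
groups = group_into_articles(bookmarks)
```

In this example the first three bookmarks collapse into one article, because the third shares a PMID with the first and a DOI with the second. The fourth has no PMID or DOI and so remains a separate article, which matches the caveat that the approach only helps entries with PMID and/or DOI information.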
This is a first step, and there are certainly things that can be improved: support for normalisation based on other metadata, how the information about an article with multiple instances is displayed, and concatenation of the best metadata from multiple authoritative sources. It’s not rocket science, but this is a good step towards making Connotea more useful in the academic realm. I won’t be surprised if there are some hiccups at first, so if you spot any odd behavior please let us know.
Thanks again, Ian, for fixing this. I see Buggotea is a bit of a problem for CiteULike too ...
Hi Duncan,
It’s a first step. The original DB structure turned out not to be abstract enough for this problem. We are now resolving over PMID and DOI, but I hope we can extend this in the future to try smart matching based on other metadata. That’s a fuzzier problem, but if anyone has any suggestions I’m all ears.
Hi Ian,
Does this mean that if I use the API to request all posts corresponding to a PubMed abstract, with something like this URI:
http://www.connotea.org/data/uri/http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17822526&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum
I will get all posts for this abstract, regardless of their URIs?
Hi Mohsen,
Wow, sorry for the delay in getting back to you. At the moment I don’t think that the article aggregation works that way. The MySQL is a bit hairy, but we are working on adding a “doi” and “pmid” handler to the API that would do that for you.