The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) is being held this year in Antwerp in September, and part of the conference is the Discovery Challenge. This should be of interest to social bookmarking enthusiasts, as this year’s challenge is being run by Bibsonomy, one of the other services that do much the same kind of thing as Connotea. They are interested in looking at Spam Detection in Social Bookmarking Systems and Tag Recommendation in Social Bookmark Systems. Connotea already has a rudimentary implementation of related tags. My own feeling is that recommendations need to go further (a nice review of recommendations is given in this MIT tech review) and that we need to produce article recommendations. Nonetheless, I’ll be keeping an eye on the results of this challenge, and if anyone out there is interested in applying their methods to a complementary data set then we can arrange access to the Connotea data.
-
-
ECML PKDD Discovery Challenge 2008
- Date:
- Monday, 12 May 2008
- tags:
-
Social Software for Libraries.
- Date:
- Wednesday, 07 May 2008
Via the supernumerarypa blog I just found a book called
Social Software for Libraries by Meredith Farkas.
From the reviews it looks like a good nuts-and-bolts introduction to the Web 2.0 tools that currently have a place in libraries. I believe Connotea is given a mention.
Of course being in a book format has advantages and disadvantages, and one of the people providing a review on Amazon sums it up nicely:
“If I had a criticism, it would only be “book versus web”, as the web is a river and a book is an island. Printing it ‘fixes’ it in time, and the highly dynamic web will outrun the content of this book in a few years, maybe sooner. Meantime, its succint, direct and practical nature recommend it as a map out of the bewildering tangle of what’s out there. Now is the time to buy it”
- tags:
-
German Tutorial for using Connotea
- Date:
- Tuesday, 06 May 2008
A student from the Danube University Krems has created a short Wiki page with a concise description of how to use Connotea in German. You can have a look at the page here. Many thanks!
- tags:
-
Display your Connotea Bookmarks on your Site.
- Date:
- Friday, 25 April 2008
We have developed a little piece of JavaScript so you can now show off your recent Connotea bookmarks on your site! You can check out how to do it here. Below is a screen shot, and you can see it live (but unstyled, cos I’m old skool like that) on my own homepage.
This is a bit experimental at the moment, so we might change things around over the next few weeks. As ever, if you have any feedback let us know here, or you can mail me at i.mluvany@nature.com.

-
Improved import of RIS files to better handle PUBMED tag terms
- Date:
- Thursday, 24 April 2008
Ben Good pointed out that Connotea’s import mechanism for RIS files generated from EndNote was mangling tags, especially tag items like MeSH terms.
We have now fixed this, but the fix is somewhat non-trivial.
The main reason for the problem is that we were using the same tag comprehension code that is used in the “Add To Connotea” pop-up. This assumes that “multiple words in quotes” are one tag, and that lots, of, comma, separated, words, are individual tags. In addition, since you access tags in Connotea through the URL, we have to throw away forward slashes, since a tag with a forward slash in it is going to confuse the URL resolver in Connotea.
Now when it comes to PubMed records, it looks as if all of these rules were specifically chosen to break the way PubMed records describe tags :/
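To make those pop-up rules concrete, here is a minimal Python sketch of them. It is not Connotea's actual code: the function name `parse_tag_string` is made up for illustration, and it assumes that commas are the only separator outside quotes.

```python
import re


def parse_tag_string(text):
    """Split a typed tag string into tags (hypothetical sketch of the pop-up rules)."""
    tags = []
    # Each match is either a double-quoted phrase or a run of text up to the next comma.
    for quoted, bare in re.findall(r'"([^"]*)"|([^,]+)', text):
        tag = (quoted or bare).strip()
        # Forward slashes are thrown away so the tag can be used safely in a URL.
        tag = tag.replace("/", "")
        if tag:
            tags.append(tag)
    return tags


print(parse_tag_string('"multiple words in quotes", lots, of, tags'))
print(parse_tag_string("Models, Genetic"))
```

Rules like these suit tags typed by hand, but they treat every comma and slash in an imported keyword as something to split on or discard.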
Ben describes the problem very well:
In Pubmed records, and in the Endnote records, /’s are used to separate descriptors such as “Transcription Factors” from qualifiers such as “antagonists & inhibitors” and “metabolism”. For example, you might see a keyword listed as “Transcription Factors/antagonists & inhibitors/metabolism”. When imported, Connotea strips the slashes from the tag and thus adds the tag “Transcription Factorsantagonists & inhibitors metabolism” to the post.
So now we deal correctly with these tags, yay!
MeSH terms sometimes contain commas like “Models, Genetic”. When imported, these compound terms get split into multiple separate tags (Models and Genetic).
That’s because our comma-separation parsing used to take precedence over our parsing of collective terms, but we have fixed that now.
In addition, it appears that quite a few people have managed to import the “Research Support” aspect of PubMed records as well. This is why you see more than a thousand bookmarks with the rather misleading tag “Non-U.S. Gov’t”, often also tagged with the seemingly contradictory “U.S. Gov’t”. (This happens when the research in the paper had both U.S. and non-U.S. funding.)
We decided to leave this alone, as solving this problem requires understanding what the tags mean, and the context in which they appear. OK, so you can’t win every time. I guess we are just going to have to wait for the semantic web!
p.s. You will often see a ‘star’ prepended to tags imported in this manner, such as ‘star’Genes. This indicates that the starred term is a major topic (as opposed to a minor topic) in the manuscript according to MEDLINE indexing.
OK, so now we strip leading stars from tag names, so that an imported ‘star’gene becomes the tag ‘gene’ and can connect to all of the other items that users have tagged with ‘gene’.
In a way, everything above is pretty straightforward. Now things get a little interesting.
Martin, our developer, points out the following behaviour:
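The star stripping can be sketched in a few lines of Python. This is an illustration only, not the importer's real code: the function name is hypothetical, and it assumes the ‘star’ is a literal asterisk at the start of the keyword, as in MEDLINE exports.

```python
def strip_major_topic_marker(keyword):
    """Return (tag, is_major_topic) for one imported keyword (hypothetical helper)."""
    keyword = keyword.strip()
    # Assumption: a leading "*" marks a MEDLINE major topic.
    is_major = keyword.startswith("*")
    return keyword.lstrip("*").strip(), is_major


print(strip_major_topic_marker("*Genes"))
print(strip_major_topic_marker("gene"))
```

The sketch returns the major-topic flag as well as the tag, so a caller could keep that information if it ever became useful. Stripping the star alone is enough to make the imported tag line up with tags typed by users.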
“One of the annoying parts of import is that if the keywords are separated by newlines but a tag with commas was collapsed into two, it would likely merge with other tags on the first or second term and then be tedious for the user to pick out later no matter what the UI.
In the RIS importer I’ve added a heuristic test which allows splitting on commas, except where it sort of looks like newlines are being used to demarcate the tags.
Here are some examples:
(1)
KW - aaa, bbb, ccc, ddd, eee, fff, ggg, hhh,iii, jjj, kkk, lll, mmm, nnn, ooo, ppp
(2)
KW - aaa
bbb
ccc
ddd
Transcription, Genetic
eee
fff
ggg
hhh
iii
jjj
kkk
lll
mmm
nnn
ooo
ppp
(3)
KW - aaa, bbb
ccc, ddd
eee, fff
(Describing this in this blog is a bit hard: you have to ignore the extra lines between text lines, as the blog parser in this system treats whitespace in a funny way, but I hope you see what we mean.)
So in (1), sixteen tags are evident. In (2), the same tags appear, and I’ve added a comma-containing tag to show how they appear in dang.txt from our friend Ben. Clearly in (2) the newline is supposed to be the separator, not the comma. However, if you eat commas as part of tag names then (1) will fail.
The heuristic I came up with is that if there are at least three lines and no line runs longer than 60 characters then it should be treated as newline-separated and include the commas in the tag names. Otherwise, separate on commas as well.
This makes (3) not do the right thing, so it’s up to you if you think this will help or hurt. (3) IMHO is not likely to be computer generated… a computer would either write one per KW line (avoiding all this), split on newlines, or fill up lines to ~80 chars split on commas and add in newlines to keep going. All of which work with my test.”
And so that’s how we have left it for now.
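For anyone who wants to play with the idea, here is a minimal Python sketch of the heuristic Martin describes. It is not the code in our RIS importer: the function and constant names are invented, and it assumes the 60-character limit applies to the keyword text after the “KW -” prefix.

```python
MAX_LINE_LENGTH = 60  # "no line runs longer than 60 characters"
MIN_LINES = 3         # "at least three lines"


def split_keywords(value):
    """Split the text of a KW field into tags (hypothetical sketch of the heuristic)."""
    lines = [line.strip() for line in value.splitlines() if line.strip()]
    looks_newline_separated = (
        len(lines) >= MIN_LINES
        and all(len(line) <= MAX_LINE_LENGTH for line in lines)
    )
    if looks_newline_separated:
        # Newlines demarcate the tags, so commas stay inside tag names.
        return lines
    # Otherwise separate on commas as well as newlines.
    tags = []
    for line in lines:
        tags.extend(part.strip() for part in line.split(",") if part.strip())
    return tags


print(split_keywords("aaa, bbb, ccc"))
print(split_keywords("aaa\nbbb\nTranscription, Genetic\nccc"))
print(split_keywords("aaa, bbb\nccc, ddd\neee, fff"))
```

The last call is example (3): three short lines trip the newline rule, so “aaa, bbb” is kept as a single tag. That is the trade-off Martin accepts above.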
-
Annual Reviews now supported
- Date:
- Wednesday, 16 April 2008
We had a request recently from Annual Reviews for Connotea to support their site.
Martin, our developer, had a look at their site and came up with some options. He takes up the story:
“Although the DOI appears in the URL, and we could switch off to CrossRef for citation data, we lose the authors and full publication date, so that’s not as good.
They embed citation metadata using Dublin Core conventions inside HTML meta tags, but here we’re missing journal, volume, and issue data.
I noticed that their site allows download in RIS, BibTeX, etc. format, and in looking for a similar existing citation source module that does such a download of a secondary RIS file, I found that Blackwell.pm was actually a perfect module to do the work. Too perfect – it works with a domain name change! So then I realized that these two domains, plus more, are using software from Atypon (www.atypon.com) and that if we recognized the pages served from this software, we could immediately support all their current and future customers. Considering that their customer list includes places like MIT Press, that seemed like a good idea.
So I’ve produced Atypon.pm which is reasonably short and can replace Blackwell.pm as well as supporting many of their other customers.”
Atypon provide publishing software for the following (and so now we support the following too!):
Commercial publishers
- Alexandrine Press
- AOAC International
- Australian Academic Press
- Blackwell Publishing
- eContent Management Pty Ltd.
- Expert Opinion
- FDI World Dental Press Ltd.
- Future Drugs
- Future Medicine
- Guilford Publications, Inc.
- IAHS Press
- IFIS Publishing
- Intellect Ltd.
- Lawrence Erlbaum Associates, Inc.
- Logos
- Lynne Rienner Publishers
- Mary Ann Liebert, Inc., publishers
- Morgan & Claypool Publishers
- Oldenbourg Wissenschaftsverlag, GmbH
- Oxford Business Group
- PNG Publications
- S. Karger AG
- Salem Press
- Scientific Journal Publishers Ltd.
- SCR Publishing Ltd.
- Sheridan Press
- Thomas Telford Publishing
- Uitgeverij Boom
- Vathek Publishing
- VEETECH Ltd.
- Walter de Gruyter
Not-for-profit / society publishers
- AIS Educator Association
- Aluka
- American Academy of Periodontology
- American Accounting Association
- American Anthropological Association
- American Association of Cereal Chemists
- American Chemical Society
- American Economic Association
- American Marketing Association
- American Phytopathological Society
- American Society for Bone and Mineral Research
- American Statistical Association
- American Veterinary Medical Association
- Annual Reviews
- Association for Childhood Education International
- BioOne
- British Institute of Non-Destructive Testing
- CFA Institute
- Chartered Institute of Building
- Commonwealth Forestry Association
- Countertrade & Offset
- CrossRef
- Institute of Electrical and Electronics Engineers
- Journal of Neurosurgery Publishing Group
- JSTOR
- Modern Language Association
- Pharmacotherapy Publishing Inc.
- Production and Operations Management Society
- TASH
University presses
- American School of Classical Studies in Athens
- Cold Spring Harbor Laboratory Press
- Edinburgh University Press
- Govi Verlag
- Indiana University Press
- Johnson Graduate School, Cornell University
- MIT Press
- Monash University ePress
- University of California Press
- University of Chicago Press
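As a footnote to Martin's remark that the DOI appears in the URL, here is a rough Python sketch of pulling a DOI out of an article address. It is not taken from Atypon.pm: the “/doi/abs/…” path layout, the function name and the example.org address are all assumptions made for illustration.

```python
import re
from urllib.parse import unquote, urlparse

# Assumed layout: ".../doi/<view>/<DOI>", with an optional view segment such as "abs".
DOI_IN_PATH = re.compile(r"/doi/(?:[a-z]+/)?(10\.\d{4,9}/.+)$")


def doi_from_url(url):
    """Return the DOI embedded in an article URL, or None (hypothetical helper)."""
    path = unquote(urlparse(url).path)
    match = DOI_IN_PATH.search(path)
    return match.group(1) if match else None


# Hypothetical URL, not a real article address.
print(doi_from_url("http://journals.example.org/doi/abs/10.1234/example.2008.001"))
```

As Martin says, the DOI on its own is not enough for a good citation, which is why the module goes on to fetch the RIS file.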
- tags:
-
Nice introduction to Connotea in the JLMA
- Date:
- Tuesday, 08 April 2008
Melissa Rethlefsen has written a nice introduction to Connotea in the JLMA.
If you are interested in press clippings about Connotea you can follow a list of them here.
- tags:
-
Connotea hot 25
- Date:
- Thursday, 06 March 2008
Mitch Andre Garcia has just built a page that shows the Top 25 bookmarks on Connotea from the past week. He is working on some other scripts on top of Connotea, so keep an ear to the ground.
- tags:
-
Connotea is now OpenID enabled.
- Date:
- Friday, 22 February 2008
We have added support for OpenID on Connotea.
If you don’t know what OpenID is then head over to the OpenID page to get an introduction. The short version is that it is a system for managing access to sites through a trusted ID provider.
Why are we doing this? We are hoping that the introduction of OpenID on Connotea will help you guys with managing your online personas, and in addition we are talking with some other groups about using it as a way of creating bridges between Connotea and some other services.
If you know what OpenID is already then just have a go and log in at http://www.connotea.org/openid. At the moment we are a relying party. This means that we don’t host or generate OpenIDs, but if you have one you can use it to log in to Connotea.
Each Connotea account can have one OpenID associated with it. You can set this in the Advanced settings section of your account, now available in the Toolbox on the right hand side of the page.
If you don’t already have an account, one will be generated automatically for you when you log in with an OpenID.
- tags:
-
Short talk on Science and Web2.0
- Date:
- Wednesday, 20 February 2008
I was in The Netherlands last week and gave a short presentation to some PhD students from Utrecht University on Science and Web2.0. I’ve uploaded the slides to slideshare.
-