• The Scientist by Richard Grant

    Nature Network's answer to the paparazzi: 'awesome, but not always right'. Drinks well with scientists.

    • On Journal Disambiguation

      Wednesday, 24 Jun 2009 - 09:46 UTC

      Never mind author contributor ID, or DOIs for articles, or whatever (I can’t be bothered looking up the links): I’m currently trying to find correct names for and de-duplicate entire journals.

      ouch
      there must be a better way

      I have to match up all occurrences of a journal’s name, including misspellings and tyops, in our database and correct them to the canonical abbreviation. For further enjoyment I’d like the URL of the journal’s main page, where one exists.

      PubMed, frankly, is a bit crap at finding journal names and their homepages. Anyone know of a good resource? Preferably one with an API or at least a script-friendly interface.

      In the meantime, my favourite journal so far is

      Meded Rijksuniv Gent Fak Landbouwkd Toegep Biol Wet

      closely followed by the laconic

      Pain.

      Last updated: Wednesday, 24 Jun 2009 - 09:46 UTC

      • Comments

        • Date:
          Wednesday, 24 Jun 2009 - 09:53 UTC
          Henry Gee said:

          You mean, you have to do this by hand?

          I see it all, now – ‘Information Architect’ is one of those euphemistic job titles, like ‘Recycling Aggregation Engineer’ (dustman) or ‘Imperial Grand Mekon, Galactic Emperor And Absolute Ruler Of All Living Things’ (Nature editor).

        • Date:
          Wednesday, 24 Jun 2009 - 09:57 UTC
          Richard Grant said:

          Seriously.

          How would a computer know, for example, that
          Nat Struct Biol
          and
          Nat Struct Mol Biol

          are the same journal? I only know because I published in it and watched the name change.

          And in the example above, you might be able to write a program that identified the three instances of J App Cryst, but would the same program be able to tell that Mol Cell and Mol Cells are different journals? I thought that was a mistake until I looked them up.

          Hence the request for an online resource.
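A program cannot infer a rename from the strings alone, so one common workaround is a hand-maintained alias table. The sketch below is only an illustration: the `ALIASES` dictionary and the `canonical` helper are hypothetical names, and the single entry is the rename mentioned in this comment.

```python
# Hypothetical alias table: former or variant name -> current canonical abbreviation.
ALIASES = {
    "Nat Struct Biol": "Nat Struct Mol Biol",  # rename described in the comment above
}

def canonical(name, aliases=ALIASES):
    """Follow alias links until reaching a name that has no further alias."""
    seen = set()
    # The 'seen' set guards against accidental cycles in a hand-edited table.
    while name in aliases and name not in seen:
        seen.add(name)
        name = aliases[name]
    return name

print(canonical("Nat Struct Biol"))
print(canonical("Mol Cells"))
```

Names that are absent from the table, such as Mol Cell and Mol Cells, pass through unchanged, so two genuinely different journals are never merged by accident.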

        • Date:
          Wednesday, 24 Jun 2009 - 10:01 UTC
          Richard Grant said:

          Or even that ‘Neuorn’ is a tyop of ‘Neuron’?

          The Information Architect’s job is to make sure these mistakes do not occur in the rebuild of the input tool, but I have to fix the fubars that already exist, too.

        • Date:
          Wednesday, 24 Jun 2009 - 10:25 UTC
          Jennifer Rohn said:

          Like all Dutch journals, that one you mention sounds like something a Dutch person might yell after hammering his thumb during DIY.

          What you need is a sort of Google-esque “did you mean…” spelling approximator built into your dedupe routine. Then it can compile all the similar ones and ask for human input at the very end. Can’t one of your techy people set up a macro or something? I got one to do something like that when I was text-mining.

          Journals are hard to find sometimes. I find Wiley InterScience to be the worst: Google invariably leads you to the WIS pseudo-homepage that doesn’t let you do very much, and it’s very difficult to find the link to the real journal homepage within all the corporate mumbo-jumbo.
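A “did you mean…” pass of the kind suggested here could be sketched with Python's standard `difflib` module. This is a minimal sketch, not the site's actual tooling: the `CANONICAL` list, the `suggest` and `review_queue` helpers, and the 0.8 cutoff are all assumptions for illustration.

```python
import difflib

# Hypothetical canonical list; a real one would hold a few thousand abbreviations.
CANONICAL = ["Neuron", "J Mol Biol", "Mol Cell", "Mol Cells", "Nat Struct Mol Biol", "Pain"]

def suggest(raw_name, canonical=CANONICAL, cutoff=0.8):
    """Return up to three canonical abbreviations resembling raw_name, best first."""
    return difflib.get_close_matches(raw_name, canonical, n=3, cutoff=cutoff)

def review_queue(raw_names, canonical=CANONICAL):
    """Pair every non-canonical name with its suggestions for a human to confirm."""
    known = set(canonical)
    return {name: suggest(name, canonical) for name in raw_names if name not in known}

for name, options in review_queue(["Neuron", "Neuorn", "Mol Cels"]).items():
    print(name, "->", options)
```

The queue only proposes candidates. A slip such as ‘Mol Cels’ sits close to both Mol Cell and Mol Cells, so the final choice still needs human input at the end.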

        • Date:
          Wednesday, 24 Jun 2009 - 10:35 UTC
          Richard Grant said:

          Isn’t that more or less the definition of Dutch?

          You’re right, that is exactly what I need. However, not only are the techs already flat-out working towards the site relaunch, but you’d have to populate the dictionary, and then still check everything to see if the suggestion is really what we mean. We did this morning discuss a type-ahead type thing for the next iteration of the site (and not allow people to enter anything that’s not canonical) but again, this task will have to be completed first.

          Best get to it, then!

        • Date:
          Wednesday, 24 Jun 2009 - 11:41 UTC
          Maria Wolters said:

          I have a very similar problem in my journal database, which features input from many different bibliographical sources. (Incidentally, do you know who made the decision to UPPERCASE ALL JOURNAL TITLES ON WEB OF SCIENCE? There is a special circle of hell for them.) My policy is to keep all journal titles in their full form and abbreviate for medical publications using a giant search-and-replace script.
          Some links I found useful:
          Biological journals and abbreviations

          Medical journal list, very script friendly

          How friendly are F1000 with Thomson? They should maintain lists of who merged with whom for all the glamourmags for which they compute impact factors.

        • Date:
          Wednesday, 24 Jun 2009 - 11:50 UTC
          Richard Grant said:

          Heh. Thanks Maria, those links look shiny.

        • Date:
          Wednesday, 24 Jun 2009 - 12:01 UTC
          Steve Roughley said:

          You could try the CODEN or ISSN. CAS administer CODENs from here – along with ISSNs and ‘official’ abbreviations. (It’s all described here on Wikipedia.)
          There is also a short list (~1500) at the CAS website.

        • Date:
          Wednesday, 24 Jun 2009 - 12:06 UTC
          Richard Grant said:

          Ah… we’ve got a list of abbreviations, not useful things like ISSNs. And two and a half thousand non-chemical journals.

          Thanks for the thought, though…

        • Date:
          Wednesday, 24 Jun 2009 - 12:12 UTC
          Frank Norman said:

          Maybe talk to someone at Suncat?

        • Date:
          Wednesday, 24 Jun 2009 - 12:36 UTC
          Duncan Hull said:

          Hi Richard, sounds like you need Named Entity Recognition. The stuff that text-miners get excited about. It’s an “active area of research” – which as you probably know, means most of the available software isn’t very useful just yet…

        • Date:
          Wednesday, 24 Jun 2009 - 12:37 UTC
          Richard Grant said:

          snort

          Yeah. I’m the named entity, and I don’t recognize a bloody thing.

        • Date:
          Wednesday, 24 Jun 2009 - 14:18 UTC
          Richard Grant said:

          Oh! Just realized that Nature PG has a lot of these guys, with lots of lovely URLs: http://www.nature.com/siteindex/index.html

        • Date:
          Wednesday, 24 Jun 2009 - 14:20 UTC
          Ian Brooks said:

          Wish I could help. I always redux to Google et al.

          This kind of issue is exactly why, as we populate our database, or define ontology underlying metadata, users are given drop down menus for data entry.

          Thou Shalt Not Enter Free Text

          is my mantra.

        • Date:
          Wednesday, 24 Jun 2009 - 14:22 UTC
          Richard Grant said:

          I’ve just had that conversation with my head Developer. He’s a good bloke. I have to say that, my job depends on him.

        • Date:
          Wednesday, 24 Jun 2009 - 15:07 UTC
          Raf Aerts said:

          If this can be of any help:

          Meded Rijksuniv Gent Fak Landbouwkd Toegep Biol Wet

          =

          Comm Agr Appl Biol Sci Ghent Univ

          And Jenny, you’re absolutely right. The Dutch version does sound like something we would shout when hitting our thumb with a hammer :)

        • Date:
          Wednesday, 24 Jun 2009 - 15:09 UTC
          Richard Grant said:

          You’re not helping.

        • Date:
          Wednesday, 24 Jun 2009 - 15:54 UTC
          Frank Norman said:

          Sorry for my cryptic comment above – done in haste on the move.

          Suncat is the serials union catalogue for the UK, with serials records from all major UK research libraries (and NIMR!).

          I guess though that you are not just after a source of data, but a matching algorithm too? Can’t help there.

        • Date:
          Wednesday, 24 Jun 2009 - 15:55 UTC
          Richard Grant said:

          It’s actually turning out to be reasonably doable, if tedious. Got a tech to hit google and return the first hit for each abbreviation, which is helping populate my URL list sensibly.

          When I’ve made this list, I’m flogging it.

        • Date:
          Wednesday, 24 Jun 2009 - 20:23 UTC
          Steve Roughley said:

          Apparently, CAS, as the administrator of the CODENs list, assigns them to just about anything that looks vaguely like a journal, even if it’s not chemical, and not abstracted by them… apparently… although not tried it. Still, no-one reads anything that hasn’t got ‘chem.’ in its title somewhere do they???? [JOKE!!!]

        • Date:
          Thursday, 25 Jun 2009 - 06:18 UTC
          Richard Grant said:

          That’s interesting, because the first few I looked for weren’t there.

        • Date:
          Thursday, 25 Jun 2009 - 06:34 UTC
          Frank Norman said:

          CAS does have a very wide coverage – I think about 13,000 serials – but it doesn’t cover everything.

        • Date:
          Thursday, 25 Jun 2009 - 08:14 UTC
          Steve Roughley said:

          Thinking about it, CAS, in the guise of Scifinder, must have done something similar, as they have a ‘locate article’ feature, which in the journal field will figure things like BMCL, Bioorg Med Chem Lett etc to all mean Bioorganic & (or is that ‘and’?) Medicinal Chemistry Letters, for example – but not sure how well it deals with ‘common’ typos.

          And no, I’m not on any sort of commission with CAS – it’s just that they happen to be the ones I interact with most!

        • Date:
          Thursday, 25 Jun 2009 - 23:07 UTC
          Sabbi Lall said:

          Yes, others must have had to solve this problem at some point (so for example ISI and Scopus had to deal with NSB and NSMB at some point). If you asked it to search the first several characters in the names (e.g. Nature Struc*), you would find both names and any tyops, which might narrow things down a bit?
          My comment’s useless, so here’s a URL to help: www.nature.com/nsmb
          Life is Pain
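The prefix search described in this comment might look roughly like the sketch below. The `by_prefix` helper and the sample names (including the deliberate misspelling) are hypothetical, chosen only to show the idea on abbreviated titles.

```python
def by_prefix(names, prefix):
    """Return the names that start with prefix, ignoring case and outer whitespace."""
    key = prefix.strip().lower()
    return sorted(n for n in names if n.strip().lower().startswith(key))

# Hypothetical sample of database entries, one of them misspelled on purpose.
names = ["Nat Struct Biol", "Nat Struct Mol Biol", "Nat Strcut Biol", "Nat Genet"]
print(by_prefix(names, "Nat Str"))
```

This gathers both names of the journal plus nearby misspellings into one short list for review, although it misses any entry whose error falls inside the prefix itself.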

        • Date:
          Thursday, 25 Jun 2009 - 23:20 UTC
          Heather Etchevers said:

          Life = Pain
          Time = Life

          ergo…

        • Date:
          Thursday, 25 Jun 2009 - 23:29 UTC
          Cath Ennis said:

          I always thought Gut was a good journal name.

          It would be fun to cite Pain in Gut.

        • Date:
          Friday, 26 Jun 2009 - 00:32 UTC
          Austin Elliott said:

          I’m oddly proud of my solitary paper in Gut – mainly because Gut is the only scientific journal I’ve ever seen feature as “guest publication” (for the missing words in headlines round) in the TV show Have I Got News For You.

          PS “Neuorn” sounds like a kind of being in one of Tolkien’s books to me. Just thought I’d say that before Henry did.

        • Date:
          Friday, 26 Jun 2009 - 00:35 UTC
          Cath Ennis said:

          Austin, your paper looks from the abstract like it might involve the release of calcium from intracellular stores…?

        • Date:
          Friday, 26 Jun 2009 - 00:36 UTC
          Sabbi Lall said:

          or Pain in Blood

          Richard, Pain is here

        • Date:
          Friday, 26 Jun 2009 - 05:55 UTC
          Richard Grant said:

          Thanks guys. Only another 2392 to go.

          Actually, the NPG pages were really helpful, I didn’t realize so many journals were theirs, and they have a lovely page of them all.

          Sarbjit, good plan, but when you have things like
          Mol Cell and
          Mol Cells

          (two different journals)

          J Mol Biol and
          J Mol  Biol or even
          J MoL Biol
          (same journal, two misspellings)

          then it gets tricky.
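The spacing and capitalisation variants, at least, can be folded together mechanically while leaving Mol Cell and Mol Cells apart. The following is a minimal sketch under that assumption; `normalise` and `group_variants` are hypothetical helper names, not anything from the database in question.

```python
import re

def normalise(name):
    """Collapse runs of whitespace and fold case, so spacing and capitalisation slips compare equal."""
    return re.sub(r"\s+", " ", name).strip().casefold()

def group_variants(raw_names):
    """Map each normalised key to the set of raw spellings that share it."""
    groups = {}
    for name in raw_names:
        groups.setdefault(normalise(name), set()).add(name)
    return groups

raw = ["Mol Cell", "Mol Cells", "J Mol Biol", "J Mol  Biol", "J MoL Biol"]
for key, variants in group_variants(raw).items():
    print(key, sorted(variants))
```

Because the key keeps every letter, the trailing ‘s’ still separates the two Mol journals. Genuine misspellings need a fuzzier comparison and a human check on top of this.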

        • Date:
          Friday, 26 Jun 2009 - 10:39 UTC
          Austin Elliott said:

          Cath: yep, among other things. I am a calcium signaler / microscopy geek by scientific trade

          Re. journal names, you can’t get it confused with anything, but after sitting through another 40 minute seminar of myriad incomprehensible abbreviations, slides of unlabelled 20-lane Western blots, or handle-turning mutagenesis of every residue in a protein, I often think it is no accident that there is a journal whose abbreviation is Anal Biochem…

        • Date:
          Friday, 26 Jun 2009 - 11:46 UTC
          Frank Norman said:

          Don’t forget Biochemistry and Biochemistry. Yes, you can find two journals with the exact same title, so then we add on the place of publication to disambiguate, so the second of those becomes Biochemistry (Moscow). Strictly speaking the first one should be Biochemistry (Washington) but we tend to omit the qualifier for the more familiar title.

        • Date:
          Friday, 26 Jun 2009 - 13:05 UTC
          Richard Grant said:

          Yah, I have a couple like that, too.

        • Date:
          Wednesday, 22 Jul 2009 - 16:01 UTC
          Pamela Arroues said:

          My favorite site for figuring this stuff out is Genamics JournalSeek at http://journalseek.net/index.htm Most of the abbreviations are in there.

          I too am searching for the translation to Meded.Rijksuniv.Gent Fak.Landbouwkd.Toegep.Biol.Wet. What a pain

