The H-factor or H-index is beloved by many as an incorruptible measure of research productivity. I have begun an informal investigation of citation abuse to see what is really going on in the dark world of academic self-promotion. Self-citation was of course recognised as an issue by Hirsch when he proposed his index, but generally people have thought it would require intense effort by scientists to build up an impressive track record by self-citation alone. However, this may not be the case…
I have studied people in my own field (materials science), as I have my own opinions (prejudices) about who the good guys and bad guys are. If we take the Web of Science as a suitable database, we can use its Citation Report tool to eliminate self-citations from the body of citations. I have found that the good guys have self-citations making up < 50% of total cites (I am at 25%). Bad guys have > 50% self-citations. The best I have found so far is an h-factor of 12 based on > 80% self-citation. I need to do more research on this, so watch this space.
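To make the comparison concrete, here is a minimal sketch of how the h-index shifts once self-citations are stripped out. The per-paper numbers are invented purely for illustration and are not taken from any real author or from the Web of Science; the function and variable names are my own.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical record: (total citations, self-citations) for each paper.
papers = [(30, 25), (22, 20), (15, 12), (9, 8), (6, 5), (4, 4)]

total = [t for t, _ in papers]
external = [t - s for t, s in papers]  # citations left after removing self-cites

h_all = h_index(total)        # h-index as usually reported
h_ext = h_index(external)     # h-index from external citations only
self_share = sum(s for _, s in papers) / sum(total)

print(f"h-index with self-citations:    {h_all}")
print(f"h-index without self-citations: {h_ext}")
print(f"self-citation share:            {self_share:.0%}")
```

With these made-up numbers the headline h-index is 5 but falls to 2 once self-citations are removed, with about 86% of all cites being self-cites. That is the same kind of gap described above, on a smaller scale.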
Brian, please keep me posted on this. A related issue is citation networks, where a group of researchers cite each other. In this paper, the author takes a closer look at self-citation. One interesting result: self-citation rates are much higher in biology than in physics.
And then there is also journal self-citation. An editorial about research papers in the same issue would be the subtle way; urging authors to cite papers published in the same journal is the more obvious one.
Martin,
the link is broken! It says Session cookie error
Sorry for the broken link, here is the full reference:
Hyland K. Self-citation and self-reference: Credibility and promotion in academic publication. Journal of the American Society for Information Science and Technology 2003;54:251-259. DOI: 10.1002/asi.10204.
There is a paper coming out in TREE that suggests that self-citation doesn’t have a big effect in ecology and evolution. I can send you a pdf if you can’t get hold of it. Their analysis is rather indirect, so I’m not sure what exactly it means. But then I’m not sure how exactly the h-index should be interpreted anyway.
The h-index is used by appointment and promotion committees in a number of universities. Self-citation does have an effect. I have seen junior faculty whose self-citations massively outweigh their external citations and who have generated respectable h-index scores that way.
Of course, if you study the citations, it soon becomes easy to see who is really citing the person under study. However, with automated citation analysis available on ISI (with no automatic self-citation filter), people tend to look at the h-index that includes self-citation.
I think the whole problem with measuring status through citation analysis is that such simple measures are far too open to gaming. This isn’t too much of a problem if they are indicators only but when people’s careers are influenced by their specific values then the games begin. I tend to be most frustrated by the ISI’s Impact Factors for journals and their misuses but I wonder whether any form of citation analysis is really proof against similar abuses.
The most interesting and encouraging work in this area seems to me to come from Herbert Van de Sompel, Johan Bollen and others out of Los Alamos. For example:
I’ve been thinking about looking at citations as counting processes, for which there is a lot of statistical theory. It feels like an interesting project for a PhD – there is plenty of data, and an aim (“design a good metric”) that is well defined.
Now all I need is the money and the student.
Chris — there is quite a literature on download statistics (I’ve read at least one of those papers you mention, but also others – and I think Peter Murray-Rust has blogged about this) but they are presumably open to similar gaming, as well as scrapers, spiders, and other robotic web activities?
Scopus (Elsevier) is doing some work in this area and I believe (but have not tried) it is free to access for a while.
Google Scholar is another platform that could in principle be developed in this way, but it has been sadly quiet since its first launch and seems to have some intrinsic problems, as well as the fact that it counts more of the grey literature, certainly more than ISI does and possibly more than Scopus does.
There is certainly a lot of interest from all kinds of parties in alternatives to the impact factor, but the h-index seems to be a flawed measure, judging from the Nature articles I have read by Phil Ball and others.
An alternative to Impact Factors is the SJR Indicator. A recent FASEB paper compares the two indicators in more detail.
Impact Factors and other bibliometric measurements will always be subject to manipulation, and self-citation is just one example. I see only two ways out of this dilemma. Either we forgo quantitative bibliometric measurements completely (unlikely), or we take an active position and change the algorithm in response to perceived misuses.
Similar things happen with web searches, only here it is called Search Engine Optimization (SEO). Citation networks are called link farms and self-citation is also common. There is a lot of experience at Google and one would think that this could be used for Google Scholar.