June 21, 2010


Last week I dove headfirst into extracting information about journal policies on sharing and citing data. I used the Thomson Reuters Journal Citation Reports for 2008 and 2009 to compile all the journals with a calculated impact factor in the categories ‘Ecology’ and ‘Evolutionary Biology’.

I attempted to quantify these policies by creating fields with a binary response (yes/no, present/not present), each paired with a related field that could capture the text of such a policy. For instance, with the field “Journal Requires Accession number” I would record a yes or no response and then cite the text from the journal’s policy that stipulated this requirement. (With a field like “accession number”, as most can imagine, there was little coverage beyond GenBank.)
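The paired fields described above can be sketched as a small record type. This is only an illustration of the idea, not the actual spreadsheet or database I used: the names `PolicyField` and `as_row` are made up for the example, and the policy text is a placeholder rather than a quote from any journal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PolicyField:
    """One coded policy item: a yes/no answer plus the supporting policy text."""
    name: str                          # e.g. "Journal Requires Accession number"
    present: bool                      # the binary response (yes/no)
    policy_text: Optional[str] = None  # quoted policy wording, if any was found


def as_row(field: PolicyField) -> Tuple[str, str, str]:
    """Flatten a field into (name, 'yes'/'no', text) for a spreadsheet-style row."""
    return (field.name, "yes" if field.present else "no", field.policy_text or "")


# Hypothetical example; the text below is a placeholder, not a real policy quote.
example = PolicyField(
    name="Journal Requires Accession number",
    present=True,
    policy_text="<quoted text from the journal's instructions to authors>",
)
row = as_row(example)
```

Keeping the quoted text next to the yes/no answer means each coding decision can be rechecked against the wording that justified it.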

Like anyone collecting from a wide swath of information sources, I had some preconceptions that I figured would play out in the data I was gathering:

  1. I expected larger publishers to have little interest in fostering sharing.
  2. I expected journals with society affiliations to have a more nuanced stance on data sharing (and possibly be candidates for a data citation policy).
  3. I expected that a publisher’s size would have some effect (positive or negative I wasn’t sure, but I thought for some reason this would be important).
  4. I had a hunch I would see some correlation between sharing policy and impact factor (though I’m not expecting a clear correlation to be discernible at this point).

I can say that my expectation that larger publishers would lack any real semblance of a sharing policy proved generally true. Wiley-Blackwell, Elsevier, Springer and Taylor & Francis (The Major Players) all paid lip service to sharing data with sections like “include supplementary material”, but these were often limited to very basic file formats (.jpeg, .pdf or .xls) or extremely small file sizes (8 MB or less). With restrictions like these, the policies weren’t meant to facilitate the extensive sharing of a paper’s underlying data, but offered instead an opportunity to include a few supplementary tables or simple figures. It’s also worth noting that there was no statement about an archiving policy for such supplemental material.

However, this lack of a sharing policy was not universal among large publishers. Nature Publishing Group (NPG) has a well-developed and exceptionally valuable statement on sharing data that is linked to by all of its journals. NPG not only highlights the importance of sharing the underlying work, but also makes extensive recommendations about where to do so. The recommendations may not seem overwhelming (most university science librarians could whip up the same finding aid in an afternoon), but the mention of a repository other than GenBank or GEO was absent from every other Major Player’s policy. (Yet another example of why last week’s war of words between the University of California Library & NPG was so disheartening.)

The lack of a sharing policy was not necessarily consistent across all journals published by an individual Major Player. For example, Polar Research, published by Wiley, has this little bit tacked on to the end of its otherwise standard policy:

Data that are integral to the published article must be made available in such a way as to enable readers to replicate, verify and build upon the conclusions published in the paper. Restrictions on the availability of this data must be disclosed at the time of submission.

We recommend that data for which public repositories are widely used, and are accessible to all, should be deposited in such a repository prior to publication. The appropriate linking details and identifier(s) should then be included in the publication and, where possible, in the repository, to facilitate linking between the journal article and the data. If such a repository does not exist, data should be included as supporting information to the published paper or authors should agree to make their data available upon reasonable request.

Again, this is nothing overwhelming in terms of what domain practitioners and information professionals already know, but amid the bland and rather monotonous verbiage of most Major Player policies in this subset, it is a rather unique statement.

As for my presumption about journals with society affiliations, this proved to be a poor point of correlation. Other than ESA’s Data Registry, there was little evidence that sharing/citation policies and society affiliations have any immediately discernible relationship.

With respect to the last two expectations, I am going to hold off on making any declarations. My sample size (approx. 150) is still too small to support them, but this early in my gathering there does seem to be a positive correlation between a journal mentioning sharing/citation in its policy and having a high impact factor. This would be a fascinating conclusion to draw in a few weeks’ time, but right now it’s merely a trend in the data gathered so far.
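For the curious, here is a rough sketch of how a trend like this could be checked once the sample is larger. It assumes each journal is coded 1 if its policy mentions sharing/citation and 0 if not, and computes a plain Pearson correlation against impact factor (with 0/1 coding this is the point-biserial correlation). The function name `pearson` is my own, and the numbers in the usage lines are made-up placeholders, not values from my data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; with 0/1 values in xs this is the point-biserial r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT taken from the journals surveyed.
mentions_sharing = [0, 0, 1, 1]          # 1 = policy mentions sharing/citation
impact_factor = [1.0, 2.0, 3.0, 4.0]     # invented impact factors
r = pearson(mentions_sharing, impact_factor)
```

A coefficient on its own says nothing about significance, so with a sample this small it would still only describe a trend.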