02/11/16 Grey data liberation


Every once in a while, PhD students and young researchers find themselves needing to ask for external grey data to continue their research. This has always been a dilemma, and without clear rules it is hard to know how to proceed. As open science researchers we want to make most datasets widely and easily accessible to researchers all over the world, but why would somebody give you their data for free at the risk of it being misinterpreted? So, in today’s brainstorming session, we invented a fictional retired professor, ‘Shommi’, to discuss the following questions and to figure out strategies for obtaining the dataset from Dr. Shommi in a legitimate and polite way.

  • What is grey data?
  • Why is it desirable to see grey data published?
  • What sources of grey data exist?
  • Why might a data producer not publish on the data they’ve produced?
  • What sort of questions can we ask of grey data?
  • How do we convince producers of grey data to work with us?

Dr. Shommi is a 78-year-old retired professor. He worked in a medical school on veterinary science and entomology. Specializing in Lyme disease, he has a lot of tick genome data from more than 15 years of observation. However, he is very stubborn and reluctant to either publish or share his grey data. What can we do with him?

First of all, we discussed the definition of grey data: data that is not published, the underlying data behind a paper or public report, or data that can only vaguely be found in government reports and surveys. In medical trials, it’s almost an open secret that researchers sometimes selectively publish good data, while the whole dataset might reveal a more comprehensive picture of the effects.

Then comes the question of why we need grey data. Tax money was spent on collecting it, and long-term data records how things change over the years, which makes some grey data valuable.

If grey data is valuable, why don’t producers publish it? Well, we all know it takes a lot of effort and 100 reasons to get a paper published, but one small mistake can ruin the whole project. The grey data could be too messy for the owner to clean up; the owner might have too many projects on hand, so the dataset lost priority; or (one of the most common and unspoken reasons) the data is negative or goes against the original hypothesis.

Given the above concerns, what would motivate researchers to share grey data? 1. Authorship, but only if the researchers take an active role in explaining, cleaning or analyzing the data. 2. Altruism: some open scientists are always willing to share their unused data. 3. Legacy: it has often happened that an old retired professor’s valuable datasets lay in the basement until the professor passed away. In this case you need to reach out to the deceased researcher’s spouse to dig out the grey data.

Now assume that Dr. Shommi turned us down the first time. What can we do next to change his mind? 1. Flattery: will it thaw his stubbornness? 2. Bring some big name into the conversation, in the hope that it bypasses the politics behind the data. 3. Thoroughly explain how interesting your plan is, and how including his grey data will add the last piece of the puzzle. 4. Go through the university and request the data as university intellectual property. This sounds legitimate and ethical, but the community may not cooperate well.

What would we do if we were asked the same question as Dr. Shommi: would we give out our grey data? It is possible to put our unpublished data in the thesis, and readers who are interested can ask us for the data.

We also discussed the risks of liberating grey data. This has been a long-standing argument since the first days of modern scientific research. Community breakdown and personal conflicts have always been obstacles to sharing open data. Dealing with unethical people over grey data can put the young generation of researchers at career risk.

In the end, let’s come back to the open science topic: how should we get some open data to adopt for the class? One option would be to ask retired and soon-to-retire professors to donate their unused datasets. Masters students who never got to publish their data can be a potential resource too.
