2 Questions

As this is a workshop, we will work through a number of questions that we stop to consider together. These questions are introduced within the text itself, and you can move back and forth between the context where they are discussed and the brief descriptions provided here.

2.1 In your field, where do you get your data?

Do you think that your scientific field has a culture of using data collected by other people, or a culture of collecting your own dataset?

2.2 Do we have a culture of data reuse?

Are you aware of examples where one dataset is used, modified, and then redistributed as another dataset? In which context has this happened? Was the same team behind both resources?

2.3 Do you plan to publish your data?

No matter what kind of research we do, we use some kind of material to do it. Are you planning to publish your dataset, or have you already done so? Has anyone suggested that you do so, and who is leading this discussion? Who has been setting examples of how to do this?

2.4 Why would we want to track the use of a dataset?

It is interesting to know who has used a dataset and why. What are the benefits of knowing this?

2.5 Why does it make sense to say that old Public Domain data is ‘easiest’ to work with? What are the implications?

I have argued that when we move beyond Public Domain data, new questions arise very quickly, and the whole data processing pipeline becomes much more complicated. Is this really the case?

2.6 Do we have a balance between choosing what we research and conducting research because there is available data for it?

There are many questions we currently cannot ask because no suitable data is available. Do we see a problem in the way policies around data influence and direct our scientific process?

2.7 Openness at all cost?

Some data cannot be made open, or making it open would require processes, such as anonymization, that may render the resource unusable for your purposes. We can often make selections that help us publish data openly, but this may come at a cost. If the cost is that the data we can make open is less suitable for our research purposes, is this something we should accept? For example, I can do almost anything with newspapers published in the 1920s in the Soviet Union, but if I wanted to study newspapers from the 1980s, the limitations would be much greater. I can still use the materials, and I can document in various ways what is done, but I most likely cannot redistribute the whole material entirely openly. If I study how one grammatical case is used, both are probably fine, but what if I wanted to study political discourse in the 1980s?
