5 Open access

In this lecture we take it for granted that our research topics involve data that we can eventually make openly accessible. Naturally, this is often not the case, and here we try to clarify the picture a bit. Very often we want to work with copyrighted materials, or with materials that contain personal data, and these normally cannot be made openly available so easily.

When it comes to copyrighted material, practices differ between countries. Fair use is interpreted more broadly in some places than in others. In Finland, the Language Bank of Finland now has examples of text corpora that have been shuffled at the sentence level and then published under a CC-BY license. The EU's text mining legislation will probably also affect many of these questions.
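To make the idea of sentence-level shuffling more concrete, here is a minimal sketch in Python. It is not the Language Bank's actual procedure, which I do not describe here; it only illustrates the general principle. The function names are hypothetical, and the regular-expression sentence splitter is a deliberate simplification: a real corpus would need a language-aware segmenter.

```python
import random
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption for illustration): break after '.', '!'
    or '?' when followed by whitespace. Real corpora need a proper segmenter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def shuffle_corpus(documents: list[str], seed: int = 0) -> list[str]:
    """Pool the sentences of all documents and return them in random order.

    Shuffling destroys the running text, so no work can be read end to end,
    while each individual sentence stays intact for linguistic analysis.
    """
    pooled = [s for doc in documents for s in split_sentences(doc)]
    rng = random.Random(seed)  # fixed seed so the release can be regenerated
    rng.shuffle(pooled)
    return pooled


if __name__ == "__main__":
    docs = ["First sentence. Second one! Third?", "Another text. It ends here."]
    for sentence in shuffle_corpus(docs, seed=1):
        print(sentence)
```

The fixed seed is a design choice: it lets whoever publishes the corpus regenerate exactly the same shuffled release later, while readers still cannot reconstruct the original order without the unshuffled source.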

The easiest research data to work with is undoubtedly material in the public domain. This means that the copyright of the material has expired and can no longer be claimed. Old newspapers are surely as popular in research as they are largely because of this. I have also recently done one larger study in which my colleague and I used an entirely open dataset from the Fenno-Ugrica collection of the National Library of Finland.²

This doesn't mean that we cannot use more restricted materials; we just often cannot redistribute them openly. However, if the original materials are held by an organization that offers stable identifiers, e.g. for individual pages, then we can probably build a fairly reproducible case around them by referring in great detail to the locations of whatever observations we are studying. Much also depends on what kind of research we are doing.
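One way to picture this is a dataset that stores only pointers into the restricted source plus our own annotations, never the copyrighted text itself. The sketch below assumes a hypothetical page identifier (`urn:example:page-0001`) standing in for whatever stable identifier the holding organization provides; the field names and the `#char=start,end` locator format are conventions I chose for illustration, not any archive's actual scheme.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A shareable pointer into a restricted source (hypothetical schema)."""
    page_id: str    # stable identifier of the page in the holding archive
    start: int      # character offset where the observed passage begins
    end: int        # character offset where it ends (exclusive)
    note: str = ""  # the researcher's own annotation, which can be shared

    def __post_init__(self) -> None:
        # Reject ranges that cannot point at an actual passage
        if self.start < 0 or self.end <= self.start:
            raise ValueError("need 0 <= start < end")

    def locator(self) -> str:
        """Return a compact, citable reference to the passage."""
        return f"{self.page_id}#char={self.start},{self.end}"


def to_csv(observations: list[Observation]) -> str:
    """Serialize locations and notes only; no source text is included."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["page_id", "start", "end", "note"])
    for obs in observations:
        writer.writerow([obs.page_id, obs.start, obs.end, obs.note])
    return buffer.getvalue()


if __name__ == "__main__":
    obs = Observation("urn:example:page-0001", 120, 145, "possible dialect form")
    print(obs.locator())  # urn:example:page-0001#char=120,145
    print(to_csv([obs]))
```

Anyone with access to the original collection can then resolve each row back to the exact passage, which is what makes the analysis checkable even though the data itself is not redistributed.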

Other arrangements exist as well. For example, Yvette Oortwijn et al.³ share their research data with researchers who can show that they own copies of the books themselves.^[https://github.com/YOortwijn/Challenging_DMs] The idea is probably connected to the fact that scanned copies of personally owned books are legal for private use in many jurisdictions. This is the only time I have seen such a convention.