
Department of Law

Spring Term 2020

Master Programme in Intellectual Property Law

Master’s Thesis 30 ECTS

Text and Data Mining in EU Copyright Law


Table of Contents

INTRODUCTION

Subject and Purpose

Material and Method

Delimitations

Outline

1. A BRIEF INTRODUCTION TO TEXT AND DATA MINING

2. FUNDAMENTAL EU COPYRIGHT LAW

2.1 Protectable Subject Matter and Exclusive Rights

2.2 The DSM Directive and TDM

3. WHY AND HOW MIGHT COPYRIGHT BE AN OBSTACLE FOR TDM

3.1 Beneficiaries

3.2 Lawful Access

3.3 Retrieval and Analysis

3.4 Sharing Results and Spreading Knowledge

3.5 Storage and Verification


INTRODUCTION

Text and data mining (TDM) can be a useful tool in fields as diverse as scientific research, journalism, culture and, not least, the training of artificial intelligence (AI), and its importance is only likely to grow. Despite this potential, there are many indications that copyright law restricts the use of TDM and keeps users from applying it optimally. Copyright law should foster innovation and creativity; where it risks having the opposite effect, the restriction needs to be well justified.

In the summer of 2019 the Directive on Copyright and Related Rights in the Digital Single Market (the DSM Directive)1 was adopted and provided the EU with two new copyright exceptions for TDM, which along with the rest of the directive are currently being implemented into national law. Given the recent changes in European copyright law concerning TDM and the technique’s promising usefulness it is now relevant to investigate how they interrelate.

Subject and Purpose

The purpose of this thesis is to describe whether and to what extent copyright can be an obstacle for TDM, with a focus on the recent changes in EU law, and to comment critically on what has changed following the new directive and what is still missing for an efficient application of TDM. For the purpose of this thesis, efficient application is not to be understood as economic efficiency, but rather as practical application within the boundaries of the current framework with a satisfactory level of legal certainty.

The thesis will answer the following research questions:

• Who may benefit from the exceptions in the DSM Directive?

• How may contractual provisions restrict the efficient application of TDM?

• How may technological protection measures (TPM) restrict the efficient application of TDM?


• To what extent does the legal framework permit publication of the result including parts of the input material?

• To what extent does the legal framework permit storage of the copies generated during the TDM process?

• How well does the DSM Directive meet its objectives of increased legal certainty and harmonisation?

The above questions will be discussed throughout the text and critically analysed in search of inconsistencies and practical problems that might arise when using TDM.

Material and Method

The main focus will lie on EU copyright law, with examples from states in and outside of Europe where suitable. The main Union legislation to be discussed is:

• Dir. 2019/790 on Copyright and Related Rights in the Digital Single Market (DSM Directive)

• Dir. 2001/29/EC on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc)2

• Dir. 96/9/EC on the Legal Protection of Databases (Database Directive)3

The thesis will be based on a qualitative method: analysis of relevant provisions in the above legal texts and study of scholarly articles and reports that comment on the drafting of the DSM Directive or evaluate the final version, as well as related legal questions. Stakeholder and interest organisation views will be part of the material to reflect practical issues that might arise, but will be given only little space in order to avoid giving too much focus to the lobbying activity that surrounded the drafting process of the directive. Finally, it is not the author’s choice to exclude EU case law from the material, but relatively few questions regarding the admissibility of TDM under the abovementioned legislation have been referred to the CJEU4 – possibly as a result of an undeniable level of uncertainty surrounding the

2 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc) [2001] OJ L 167, 22.6.2001, p. 10–19

3 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases (Database Directive) [1996] OJ L 77, 27.3.1996, p. 20–28


relationship between copyright and TDM reducing the motivation to test the law. As a result, there will be few references made to EU case law throughout the following text.

Delimitations

It should be noted that TDM involving a minimum of copying, or the use of techniques that crawl through data and process each work separately, could be performed without infringing copyright or database law5, but the objective of this thesis is not to offer technical solutions to a legal problem. Rather, the choice of technique should be based on what is optimal to reach the desired result and not be dictated by law. Similarly, and for the same reason, the thesis will only describe the TDM process to the extent required to appreciate the legal discussion, limiting the technical detail to a minimum.

The main focus of the thesis is copyright exceptions6; hence the rightholder’s exclusive rights will only be described to provide the right context. Article 3 of the DSM Directive applies to cultural heritage institutions in addition to scientific research organisations, but the thesis is limited to discussing the latter as a beneficiary only. Finally, other legal areas such as data protection and the General Data Protection Regulation (GDPR)7 can naturally hinder TDM too, but only copyright law will be touched upon.

Outline

The thesis is divided into four chapters beginning with a brief introduction to text and data mining presenting the most central steps in the process and when it can be applied. The second chapter handles the fundamental EU copyright law that will form the basis for the following legal discussion as well as approaching the question of protectable subject matter. The third chapter forms the main body of the thesis and serves to identify legal barriers

5 Christophe Geiger, Giancarlo Frosio and Oleksandr Bulayenko, ‘Text and Data Mining: Articles 3 and 4 of the Directive 2019/790/EU’ (Centre for International Intellectual Property Studies (CEIPI) 2019) 2019–08 7 <https://papers.ssrn.com/abstract=3470653> accessed 26 February 2020; Maria Bottis and others, ‘Text and Data Mining in the EU Acquis Communautaire Tinkering with TDM & Digital Legal Deposit’ (2019) 12 Erasmus Law Review 179, 179.


1. A BRIEF INTRODUCTION TO TEXT AND DATA MINING

The purpose of this chapter is to define TDM within the scope of the thesis and to briefly present the stages of the TDM process. The aim is not a full technical description of TDM but a concise introduction sufficient to appreciate the legal discussion. To illustrate that TDM can be of great use to society and that, consequently, not only traditionally defined scientific research purposes would benefit from a strong exception, the chapter ends with examples of the diverse application areas for TDM.

The definition of TDM8 used throughout this thesis is the act of using computer code to analyse and recombine large amounts of digital information (hereinafter referred to as “input material”) in order to identify new patterns and associations, with the objective of extracting knowledge.9 This corresponds largely with the definition provided by the DSM Directive.10 A human with the same ambition would likely apply a verification-based approach, i.e. work out a hypothesis which they subsequently seek to confirm through testing. TDM means applying a discovery approach – that is, investigating multiple possible relationships in a pre-defined dataset at the same time in order to identify those which stand out. The advantage compared to manual work is the ability to handle complex combinations without being limited by human imagination.11 Additionally, since TDM research can be extended with very little additional risk it is possible to carry out more tests, including those less certain to produce results, at very small marginal costs.12
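The contrast between the verification-based approach and the discovery approach can be illustrated with a minimal sketch. It is not taken from any of the cited sources: the toy dataset, the variable names, the use of Pearson correlation as the measure of a relationship and the threshold of 0.9 are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy dataset: one list of observations per variable.
DATA = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ice_cream_sales": [2.0, 4.0, 6.0, 8.0, 10.0],
    "umbrella_sales": [5.0, 4.0, 3.0, 2.0, 1.0],
    "lottery_number": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def verify(data, a, b):
    """Verification approach: test one relationship hypothesised in advance."""
    return pearson(data[a], data[b])


def discover(data, threshold=0.9):
    """Discovery approach: scan every pair of variables, keep those that stand out."""
    findings = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            findings.append((a, b, round(r, 3)))
    return findings


# One hypothesis tested by hand versus all six pairs scanned at once.
single_result = verify(DATA, "temperature", "ice_cream_sales")
all_results = discover(DATA)
```

The point of the sketch is the difference in shape: `verify` answers only the question someone thought to ask, whereas `discover` examines every combination at negligible additional cost, including relationships nobody hypothesised.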

There are a number of different mining techniques, but their common starting point is the identification of input material – which can be both individual works as well as works organised in a database and, depending on area of application for the study, for instance

8 Also referred to as “text and data analysis”

9 Eleonora Rosati, ‘Copyright as an Obstacle or an Enabler? A European Perspective on Text and Data Mining and Its Role in the Development of AI Creativity’ (2019) 27 Asia Pacific Law Review 198, 1–2.

10 DSM Directive art. 2(2)

11 Francesca Bignami, ‘European Versus American Liberty: A Comparative Privacy Analysis of Anti-Terrorism Data-Mining’ (2011) 48 Boston College Law Review 609, 614–615.


consist of digitally stored text, data, images or sounds.13 Generally, the TDM process can be summarised in the following steps14:

1. Retrieval of input material – Copies of sources or databases, in whole or in part, are made and downloaded to the user’s own server or platform.15

2. Creation of a dataset – Data relevant for the study is identified, copied from the collected material and entered in a dataset. This step can involve adding metadata and/or adapting the format to one readable by machines16 – PDFs, for instance, are not machine readable.17 This is referred to as normalisation and annotation and is sometimes performed by publishers as a service, yet some researchers prefer to do it themselves.18

3. Analysis of the dataset – The computer recombines the data in the dataset and searches for patterns.19

4. Publication – The process is sometimes finished by publication of findings.
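The four steps above can be sketched in code to show where copies arise in the process. The following is a minimal illustration under stated assumptions rather than a description of any real TDM tool: the three short texts stand in for downloaded sources, the function names are hypothetical, and counting word pairs that co-occur in a sentence is only one of many possible analysis techniques.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for remote sources; a real project would download files.
SOURCES = {
    "doc-a": "Aspirin reduces fever. Aspirin may cause bleeding.",
    "doc-b": "Fever is common in influenza. Aspirin reduces fever.",
    "doc-c": "Influenza vaccines reduce influenza cases.",
}


def retrieve(sources):
    """Step 1: copy the input material to local storage (here, a new dict)."""
    return dict(sources)


def build_dataset(local_copies):
    """Step 2: normalise each sentence into lowercase tokens and add metadata."""
    dataset = []
    for doc_id, text in local_copies.items():
        for number, sentence in enumerate(re.split(r"[.!?]\s*", text)):
            tokens = re.findall(r"[a-z]+", sentence.lower())
            if tokens:
                dataset.append({"source": doc_id, "sentence": number, "tokens": tokens})
    return dataset


def analyse(dataset, min_count=2):
    """Step 3: count word pairs sharing a sentence and keep the recurring ones."""
    pairs = Counter()
    for record in dataset:
        vocabulary = sorted(set(record["tokens"]))
        pairs.update(combinations(vocabulary, 2))
    return {pair: count for pair, count in pairs.items() if count >= min_count}


def publish(findings):
    """Step 4: report the patterns found without reproducing the input texts."""
    return [f"{a} ~ {b}: {n} sentences" for (a, b), n in sorted(findings.items())]


local_copies = retrieve(SOURCES)
dataset = build_dataset(local_copies)
findings = analyse(dataset)
report = publish(findings)
```

In this sketch, copies of the input material exist in `local_copies` (step 1) and, in tokenised form, in `dataset` (step 2), while `report` contains only derived word pairs and counts. Whether a real publication would be equally free of input material depends on the project.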

According to a report from 2007, half of all published scientific papers are read only by their authors, referees and editors.20 However, just as people in the 18th century gained access to, and started to read, more books as printed works became widely available, we are now learning to read a million books at a time through software.21 Less time can now be spent on researching what someone else has already done and more time can be spent making new discoveries.22 This illustrates the evolution that the use of digital information has undergone, from mere information retrieval to value extraction. Algorithms can perform complicated comparisons and find patterns in collections of information which are too encompassing for the human brain to

13 Bottis and others (n 5) 179.

14 Marco Caspers and Lucie Guibault, ‘D3.3 Baseline Report of Policies and Barriers of TDM in Europe’ (FutureTDM 2016) 8–9 <https://www.futuretdm.eu/knowledge-library/> accessed 6 February 2020.

15 Bottis and others (n 5) 179; Caspers and Guibault (n 14) 8.

16 Caspers and Guibault (n 14) 8.

17 Michelle Brook, Peter Murray-Rust and Charles Oppenheim, ‘The Social, Political and Legal Aspects of Text and Data Mining (TDM)’ (2014) 20 D-Lib Magazine 2 <https://openaccess.city.ac.uk/id/eprint/4784/> accessed 25 February 2020.

18 Bottis and others (n 5) 179.

19 ibid.

20 Margoni and Kretschmer (n 12).

21 Martine Oudenhoven, ‘TDM and the Reading Revolution’ (FutureTDM, 12 April 2017)


handle and this has given large gatherings of data an additional value separate from that of each component making the collection.23

TDM makes it possible to draw new knowledge from literature for use in the natural and human sciences; computers can learn to recognise motifs by mining photographs, and audio recordings can be mined to create translation tools.24 A concrete example is how the BlueDot project used TDM to discover the outbreak of the coronavirus and managed to warn its clients to avoid the Wuhan area a little under two weeks before the WHO (World Health Organization) sent a similar public notice.25 The company uses AI to predict disease outbreaks and to track down the connection between outbreaks and travel; among other sources, 100,000 news reports in 65 languages were searched each day.26 Other initiatives use TDM to try to find a vaccine for the disease by analysing scientific texts about the coronavirus family.27

In journalism TDM can be used to check the accuracy of historical facts and thus expose fake news. It was also a central tool in exposing the Panama Papers scandal28.29 TDM has made its own contribution to culture as a tool to expose art forgeries30, in an attempt to

23 Maurizio Borghi and Stavroula Karapapa, Copyright and Mass Digitization (1st edn, Oxford University Press 2013) 50–51 <https://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199664559.001.0001/acprof-9780199664559> accessed 13 March 2020.

24 Sean Flynn and others, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for International Action’ (2020) 2020 European Intellectual Property Review 11, 8.

25 Marc Prosser, ‘How AI Helped Predict the Coronavirus Outbreak Before It Happened’ (Singularity Hub, 5 February 2020) <https://singularityhub.com/2020/02/05/how-ai-helped-predict-the-coronavirus-outbreak-before-it-happened/> accessed 30 April 2020.

26 ibid; Sean Flynn and João Pedro Quintais, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for Action at International Level’ (Kluwer Copyright Blog, 21 April 2020)

<http://copyrightblog.kluweriplaw.com/2020/04/21/implementing-user-rights-for-research-in-the-field-of-artificial-intelligence-a-call-for-action-at-international-level/> accessed 30 April 2020.

27 Flynn and others (n 24) 3.

28 The leak of 11.5m files which revealed offshore tax evasion among a high number of prominent individuals. Luke Harding, ‘What are the Panama Papers? A guide to history's biggest data leak’ The Guardian (London 5 April 2016) https://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers accessed 28 May 2020.

29 Margoni and Kretschmer (n 12); P Bernt Hugenholtz, ‘The New Copyright Directive: Text and Data Mining (Articles 3 and 4)’ (Kluwer Copyright Blog, 24 July 2019)

<http://copyrightblog.kluweriplaw.com/2019/07/24/the-new-copyright-directive-text-and-data-mining-articles-3-and-4/> accessed 13 January 2020; Geiger, Frosio and Bulayenko (n 5) 5.


complete Beethoven’s 10th symphony31 and to let code create “affordable” art32. Through the processing of text, it can be used to find grammatical patterns33, to create automated translation tools34, to process legal texts – for instance in order to identify common denominators in case law35 – and to build smart disclosure systems that warn consumers of the risks of accepting far-reaching terms and conditions online36. Another major area of use is the development of AI, which depends on TDM to train machine learning systems.37 Finally, TDM is used in intelligent crime analysis to try to predict criminal acts,38 in the search for terrorism suspects by mining call records or other digital information about individuals, and to stop possible terrorists from boarding a plane by cross-referencing airline records with other data.39

The wide range of applications and potential users shows TDM’s economic potential, but also its ability to work in the public interest in the service of many parts of society, which justifies facilitating the use of the technique.

31 Justin Huggler, ‘Computer Is Set to Complete Beethoven’s Unfinished Symphony’ The Telegraph (13 December 2019) <https://www.telegraph.co.uk/news/2019/12/13/computer-set-complete-beethovens-unfinished-symphony/> accessed 19 May 2020.

32 Per Kristian Bjørkeng, ‘Han syntes kunst var for dyrt i Oslo. Løsningen hans blir elsket av vennene hans, men avvist av eksperter.’ Aftenposten (Oslo, 22 January 2020) <https://www.aftenposten.no/article/ap-pLE8Rw.html> accessed 19 May 2020.

33 Hugenholtz (n 29).

34 Margoni and Kretschmer (n 12).

35 Adam Wyner and others, ‘Approaches to Text Mining Arguments from Legal Cases’ in Enrico Francesconi and others (eds), Semantic Processing of Legal Texts: Where the Language of Law Meets the Law of Language (Springer 2010) 61 <https://doi.org/10.1007/978-3-642-12837-0_4> accessed 19 May 2020.

36 Rossana Ducato and Alain Strowel, ‘Limitations to Text and Data Mining and Consumer Empowerment. Making the Case for a Right to “Machine Legibility”’ (Kluwer Copyright Blog, 19 March 2019)

<http://copyrightblog.kluweriplaw.com/2019/03/19/limitations-to-text-and-data-mining-and-consumer-empowerment-making-the-case-for-a-right-to-machine-legibility/> accessed 5 February 2020.

37 Hugenholtz (n 29); Flynn and Quintais (n 26).

38 Mohammad Reza Keyvanpour, Mostafa Javideh and Mohammad Reza Ebrahimi, ‘Detecting and Investigating Crime by Means of Data Mining: A General Crime Matching Framework’ (2011) 3 Procedia Computer Science 872, 872–873.


2. FUNDAMENTAL EU COPYRIGHT LAW

Since TDM frequently uses only parts of works, this chapter starts with a definition of protectable subject matter before moving on to a discussion of the exclusive rights that may be infringed during the TDM process, followed by the available exceptions for TDM with a focus on the DSM Directive.

2.1 Protectable Subject Matter and Exclusive Rights

Entire works such as a database or a digital library can be part of the input material, but so can isolated segments of them, such as selected parts of the database or only the sections of the library books that describe male and female characters40. It therefore makes sense to first investigate what kind of input material can constitute protectable subject matter and the status of fractions of it. For the input material to enjoy copyright protection under the InfoSoc Directive it needs to be original.41 According to the CJEU, the originality requirement is met if the work is the author’s own intellectual creation42 – an interpretation that should be understood as implicitly harmonised by the directive43. Words as such are not protected44, but the choice, sequence and combination of them can be, provided that they meet the originality requirement45. The CJEU has found that a combination of 11 words is capable of this46 and that a data capture process can therefore be within the scope of the directive.47 Relevant for TDM is that high-level data – books, newspapers, articles – is therefore more likely to be held covered by copyright than low-level data – such as phone numbers or scientific measurements.48 For a database to be eligible for copyright protection, the author’s own intellectual creation must be shown in the “selection and arrangement” of content49,

40 This example is inspired by a real project to investigate how gender was represented in literature during a 300-year period. Matthew Sag, ‘The New Legal Landscape for Text Mining and Machine Learning’ (2019) 66 Journal of the Copyright Society of the USA 64, 5

41 Berne Convention for the Protection of Literary and Artistic Works 1886 (Berne Convention) art. 2(5)

42 C-5/08 Infopaq 1 para 34-37

43 Isabella Alexander, ‘The Concept of Reproduction and the “Temporary and Transient” Exception’ (2009) 68 The Cambridge Law Journal 520, 521.


which applies regardless of whether the individual parts are protectable or not.50 It follows naturally that extracting parts from such a database runs a relatively low risk of copyright infringement, since infringement would require the extraction to mirror the selection and arrangement of data which made the database original.51 Merely organising a database in alphabetical or other logical order is not considered to show intellectual creation in selection and arrangement and is therefore not sufficient for copyright protection52, and since the structure of scientific databases is dictated by technical factors, they generally do not meet the originality requirement.53 On the other hand, the investment made in arranging “non-original” databases may enjoy protection under the sui generis right54 if the maker can show that a substantial investment (qualitatively or quantitatively speaking) has been made in obtaining, verifying or presenting the data.55 For instance, it would be difficult to prove that a database containing computer-generated airline schedules meets the requirement of a substantial investment56; it is an example of material which is protected neither by copyright nor by the sui generis right. Data generated by a satellite, on the other hand, is likely to be deemed observed rather than generated and would if so be eligible for protection under the sui generis right if “arranged in a systematic or methodical way”57.58 Raw data59 can be processed using TDM and is protected neither by copyright nor by the sui generis right, since it lacks originality and is not “arranged in a systematic or methodical way”60 and as such does not meet the requirements for a database.61 This corresponds with the general principle in copyright law that facts and data as such are not protectable subject matter, but the original expression of them is.62 As an example, scientific articles are generally protected by copyright, but not the

50 Justine Pila and Paul LC Torremans, European Intellectual Property Law (2nd edn, Oxford University Press 2019) 489.

51 P Bernt Hugenholtz, ‘Against “Data Property”’ in Hanns Ullrich, Peter Drahos and Gustavo Ghidini, Kritika : Essays on Intellectual Property, vol 3 (Edward Elgar Publishing 2018) 57

<https://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1855657&site=ehost-live> accessed 15 April 2020.

52 Pila and Torremans (n 50) 489; Caspers and Guibault (n 14) 16.

53 Lucie Guibault and others, Safe to Be Open - Study on the Protection of Research Data and Recommendations for Access and Usage (Universitätsverlag Göttingen 2013) 21 <http://eprints.gla.ac.uk/129335/1/129335.pdf> accessed 25 February 2020.

54 Pila and Torremans (n 50) 490.

55 Database Directive art. 7

56 C–30/14 Ryanair Ltd v PR Aviation BV EU:C:2015:10 (Ryanair v PR Aviation) para 22

57 Database Directive art. 1(2)

58 Hugenholtz (n 51) 59–60.

59 Data that has not been processed for use.

60 Database Directive art. 1(2)

61 Hugenholtz (n 51) 60.


research data they are based on.63 This can be and has been used as an argument as to why TDM should not be considered infringement – after all, there can be no infringement if there is no protectable subject matter in the first place. Additionally, it can be argued that during the TDM process the work(s) providing the input material is not used “as a work” – a public is not enjoying its expressive features (exploitative use) – rather, data is transformed and processed (non-exploitative/non-consumptive use).64 Scholars have used this line of reasoning to argue that “the right to read is the right to mine”65, but this is where the EU differs from, for instance, the US, where TDM, because of its transformative nature, is deemed fair use66. Due to the way copyright law has evolved in Europe67 and the broad interpretation68 applied to the “all-inclusive” reproduction right69, it is likely that the making of copies as part of the TDM process during the retrieval, selection and analysis of data would be considered copyright-relevant acts and constitute infringement of input material enjoying protection under the InfoSoc Directive70 or the Database Directive71.72 Additionally, if the result is published and the original input material quoted, there is a risk of infringement of the right of distribution73 (paper copies) or communication to the public74 (digital copies).75 It is worth noting that the scope of the exclusive rights for databases enjoying copyright protection is limited by the fact that a lawful user may do what is necessary in order to get access and for the sake of normal use76, even if this, for example, involves making reproductions.77

Corresponding exclusive rights under the sui generis protection are extraction (i.e. reproduction) and re-utilisation (i.e. distribution/communication to the public).78 The former might be infringed during the retrieving stage, when creating a target dataset and during the

<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3160586> accessed 6 February 2020; Margoni and Kretschmer (n 12).

63 Guibault and others (n 53) 21.

64 Geiger, Frosio and Bulayenko (n 62) 2; Margoni and Kretschmer (n 12); Hugenholtz (n 29).

65 Peter Murray-Rust, ‘The Right to Read Is the Right to Mine’ (Open Knowledge Foundation Blog, 1 June 2012) <https://blog.okfn.org/2012/06/01/the-right-to-read-is-the-right-to-mine/> accessed 25 February 2020.

66 Sag (n 40) 32.

67 Geiger, Frosio and Bulayenko (n 62) 2; Margoni and Kretschmer (n 12); Hugenholtz (n 29).

68 C–5/08 Infopaq International A/S v Danske Dagblades Forening [2009] ECR I-6569 (Infopaq 1) para 43

69 Hugenholtz (n 29).

70 InfoSoc art. 2

71 Database Directive art. 5(a)

72 Caspers and Guibault (n 14) 18.

73 InfoSoc art. 4, Database Directive art. 5(c)

74 InfoSoc art. 3, Database Directive art. 5(d)

75 Caspers and Guibault (n 14) 19–20.

76 Database Directive art. 6(1)


mining/analysis, while the latter might be infringed at publication.79 It is worth noting that these exclusive rights are also limited by the lawful user’s right to use insubstantial parts of the database for whatever purpose.80 Hence, the risk of infringement at publication is very small81, but since entire works are often copied during the previous stages of the TDM process, there is a high risk that the extraction reaches the level of a substantial part.82 Additionally, repeated and systematic extraction is not permitted if it conflicts with the normal use of a database or unreasonably prejudices the legitimate interests of the maker of the database.83

The paradox of copyright law is that reproduction is an exclusive right, yet generating some kind of reproduction has become essential to the very act of using certain works, in particular online, such as when browsing or searching the web.84 Applied to TDM this means that repeated reproduction/extraction of works – which might seem a clearly copyright-relevant act – is necessary to perform TDM and serves the underlying purpose of extracting information; that is, the use is transformative, the work is not treated as a work, and the act should as such be permissible.85 However, the CJEU’s formalistic interpretation of the law in the Infopaq cases has given the rightholder the possibility to prevent this kind of purely technical copying – arguably with detrimental consequences for copyright’s incentive effects of creating new content.86

2.2 The DSM Directive and TDM

The fact that the EU legislator chose to handle mining of protected sources through an exception signals that it sees TDM as infringement by default when it occurs outside the scope of a licence agreement and there is no applicable exception87, and there can no longer be any doubt that, in the eyes of the legislator, the right to read is not the right to mine.88 The DSM Directive fails to explicitly spell out that mining of subject matter which is neither

79 Caspers and Guibault (n 14) 21–22.

80 Database Directive art. 8(1)

81 Caspers and Guibault (n 14) 19–20.

82 ibid 21–22.

83 Database Directive art. 7(5)

84 Borghi and Karapapa (n 23) 51–52.

85 ibid 51.

86 European Copyright Society, ‘General Opinion on the EU Copyright Reform Package’ (24 January 2017) 5 <https://europeancopyrightsocietydotorg.files.wordpress.com/2015/12/ecs-opinion-on-eu-copyright-reform-def.pdf> accessed 19 February 2020.


protected by copyright nor the sui generis right is admissible – yet this must have been the intention.89 According to the recitals “Text and data mining can […] be carried out in relation to mere facts or data that are not protected by copyright, and in such instances no authorisation is required under copyright law…”90 (emphasis added), but since facts and data as a general principle cannot attract copyright protection, the legislator surely had in mind facts and data contained in works which for one reason or another are unprotected. This could for example be subject matter that does not qualify for copyright or sui generis protection – such as raw data – or subject matter for which protection has expired. Additionally, there is the reduced scope of exclusive rights in respect of the lawful user.

If the input material is protected by copyright or the sui generis right, one needs to obtain authorisation or rely on an exception in order not to infringe any exclusive rights. However, obtaining authorisation is problematic for the purpose of TDM because the process requires a lot of data, possibly from many different sources. Obtaining authorisation for each and every source can be both time-consuming and expensive and is likely to cause projects to stall. To further complicate the situation, it is not always clear who the rightholder is – this is particularly true for out-of-commerce works91 and when a database has been created with public funding92. With the above in mind, being able to rely on an exception is clearly preferable.

The objective of the legislative process concerning TDM under the new directive was to maximise legal certainty and minimise clearance costs for researchers.93 These goals could have been reached in a number of ways, such as the introduction of new licensing practices, a reinterpretation of the reproduction right or an exception for TDM in copyright and database law.94 The legislator chose the last of these by adding a new mandatory exception to the “list” of existing copyright exceptions, indicating a change of mindset in the sense that all previous exceptions, apart from the one for temporary reproductions, are optional. This can be interpreted as an intention to harmonise the rules on TDM and make it independent from

89 Geiger, Frosio and Bulayenko (n 5) 7–8.

90 DSM Directive Recital 9

91 Rosati (n 9) 8.

92 Guibault and others (n 53) 84.

93 Bottis and others (n 5) 183.


national borders. The exceptions for temporary reproductions95, teaching and research96, quotation97, press98 and the above-mentioned use of insubstantial parts99 are (depending on implementation and scope of implementation) commonly suggested as providing some protection for TDM outside of the DSM Directive.

In addition to functioning unrestrictedly across borders, the ideal legislation should be able to accommodate similar technological processes in the future, and not only TDM as it works today100, but a closed list is ill-equipped for that.101 Additionally, since it is impossible to predict future technical advancements, the wording of the exception can turn out to be more restrictive than intended. It would have been more flexible to use an exception in the shape of an open norm102 similar to the “fair use” regime in the US, since it could withstand technological development and social change103 while applying equally to all types of exclusive rights and users.104 It has been suggested that such an exception could include uses not yet covered by an exception but which can be justified by an important public interest and/or fundamental human rights105, or that it could allow use that does not interfere with markets for copyright-protected works, i.e. use which does not include expressive communication to a public.106 To comply with the EU’s obligations as a contracting party to the WCT article 10107, the three-step test would continue to apply as a control mechanism to make sure that a fair balance between rightholder and user is maintained.108 An even more radical option would have been to use – instead of an open norm – a “recalibrated” version of the three-step test to allow case-by-case assessment, with the list of exceptions109 working as a reference to identify situations which could be exempted if they pass the three-step test.110 The recalibration would be for courts to cease applying the test as a way to give existing

95 InfoSoc art. 5(1).

96 InfoSoc art. 5(3)(a), Database Directive art. 6(2)(b) and 9(b). 97 InfoSoc art. 5(3)(d).

98 InfoSoc art. 5(3)(c).

99 Database Directive art. 8(1). 100 Margoni and Kretschmer (n 12). 101 Geiger, Frosio and Bulayenko (n 62) 20.

102 Also referred to as general exception or opening clause

103 European Commission, ‘Expert Group Report’ (n 94) 66; Geiger, Frosio and Bulayenko (n 62) 20; European Copyright Society (n 86) 5; Bottis and others (n 5) 186.

104 Flynn and others (n 24) 7–8.

105 Geiger, Frosio and Bulayenko (n 62) 31. 106 Flynn and others (n 24) 7–8.

107 WIPO Copyright Treaty 1996 (WCT)

108 Martin Senftleben, ‘The Perfect Match: Civil Law Judges and Open-Ended Fair Use Provisions’ (2017) 33 American University International Law Review 231, 267.

109 InfoSoc art. 5(1)-(4)

(17)

limitations a narrow interpretation and to start using it to sanction new.111 In order to improve

the balance between rightholder and user a possible addition would be a fourth “step”

safeguarding the legitimate interest of third parties.112 The introduction of an open norm was

however not even mentioned as one of the options presented in the European Commission’s Impact Assessment113; they were all variations of a new exception except the alternative of

doing nothing. Scholars have been calling for the introduction of a general open norm in EU copyright law, but there is a longstanding concern that it would not fit the traditions of civil law countries (which make up the majority of Member States) and that civil law judges don’t have the necessary experience to handle open-ended defences114, which could possibly

explain why the EU legislator chose the more traditional option of adding two new exceptions to the list in order to accommodate for TDM.

The Member States have until 7 June 2021 to implement an exception for TDM as required by the DSM Directive115 but those that already apply a broader one concerning uses or fields covered by the DSM Directive may maintain such exceptions if compatible with the Database or the InfoSoc Directive. Interestingly, they may also adopt new broader exceptions under the same conditions.116 Hence, the Member States that have already implemented an exception for TDM under the exception for teaching and scientific research may continue to apply it and Member States wishing to pave the way as much as possible for TDM can supplement the DSM exceptions with national law as long as these derive from the exhaustive117 list of exceptions in the InfoSoc or Database directives. Additionally, such exceptions also need to conform with the three-step test118 to safeguard a fair balance between rightholders and users119. During the drafting process of the DSM Directive, the EU legislator rejected more encompassing TDM exceptions on the grounds that they would have a “negative impact on copyright as a fundamental right” and a significant negative impact on the licensing market and rightholders’ revenues.120 Naturally, it is the final version of the directive that decides what is permissible and what is not, but the reasoning of the legislator could give a hint of how a rightholder could argue before the CJEU to question the lawfulness of a broad national implementation. If nevertheless a balance is struck, this provision certainly opens up for fragmentation between Member States and the sought harmonisation will suffer. The DSM Directive takes the role of a minimum exception and, as is its habit, the EU legislator has drafted very general and malleable provisions and clearly stated that exceptions related to TDM may be introduced under other directives. On the one hand it’s laudable that the Member States that wish to support this type of research have been given the discretion to create more generous national exceptions for TDM, but on the other it might prove to be catastrophic in view of reaching the goal of harmonisation enabling cross-border activity.

111 Senftleben (n 108) 272.

112 European Commission, ‘Expert Group Report’ (n 94) 56.

113 European Commission, ‘Commission Staff Working Document - Impact Assessment on the Modernisation of EU Copyright Rules Part 1/3’ (2016) Text SWD(2016) 301 final 107–109 <https://ec.europa.eu/digital-single-market/en/news/impact-assessment-modernisation-eu-copyright-rules> accessed 24 February 2020.

114 Senftleben (n 108) 231–232.

115 DSM Directive art. 29

116 DSM Directive art. 25

117 InfoSoc Recital 32

118 DSM Directive art. 7(2), InfoSoc art. 5(5)

119 InfoSoc Recital 31

Hereinafter, Article 3 of the DSM Directive will be referred to as “the scientific research exception” and Article 4 as “the general exception”. The latter is sometimes referred to as “the commercial exception”, but I find that misleading since there is room for some commercial intent under Article 3 and for non-commercial use under Article 4.

The Scientific Research Exception under the DSM Directive Article 3

Article 3 of the DSM Directive provides an exception for research organisations to carry out TDM – including storage of the copies produced in the process – for the purpose of scientific research. The exception only applies where the user already has lawful access to the work or other subject matter, but where they do, the rightholder is prevented from contracting out of the exception121. Licence terms excluding TDM are thereby null and void.122 As opposed to the general exception, which applies to the InfoSoc Directive, Database Directive and Software Directive123, the scientific research exception only makes an explicit reference to the former two.124 It is unclear whether this is an oversight by the legislator and whether a reference to the Software Directive is in fact necessary, since software is considered a literary work in copyright law.125

121 DSM Directive art. 7(1)

122 Hugenholtz (n 29).

123 Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs (Software Directive) [2009] OJ L 111, 5.5.2009, p. 16–22

124 DSM Directive art. 4(1) and 3(1)

The exception was inspired by and partially overlaps126 the InfoSoc Directive’s teaching and scientific research exception127 which continues to govern all use for the purpose of scientific research besides TDM128. The InfoSoc exception encompasses any reproduction for the purpose of teaching or scientific research, hence simply making it mandatory would have effects reaching far beyond TDM. More importantly, input material can enjoy both copyright and sui generis protection and while the exception is available for both it is with slight variations: For the copyright exception to apply the sole purpose has to be scientific research without commercial purpose, hence any type of project with a commercial gain, including public-private cooperation, is excluded. Unlike the copyright exception, the corresponding provision under the sui generis protection129 lacks “sole” which indicates that projects with only partly scientific research as their purpose could use the exception. Hence, the sui generis exception could potentially cover the entire TDM process when performed for that purpose. A hurdle with the sui generis exception, on the other hand, is that it requires the source to be named, which is stricter than the copyright exception, and problematic for TDM since the process requires a lot of sources.130 Finally, the earlier exceptions are optional, and whether they have been implemented, as well as their scope, varies a lot between Member States131; the mandatory status of the scientific research exception for TDM is a step closer to removing this barrier. In addition it opens up for public-private cooperation through the definition of a research organisation and it explicitly handles storage of the input material.

The General Exception under the DSM Directive Article 4

In the first drafts of the directive there was only a TDM exception for scientific research; the exception for other purposes was added later and remained optional until the final law proposal, where it received mandatory status.132 The general exception applies to TDM of lawfully accessed works, regardless of the beneficiary’s purpose (other than performing TDM).133 As such, it addresses a wider group of potential beneficiaries and purposes compared to the scientific research exception, but it provides considerably weaker protection since it is subject to the rightholder’s express reservation of the right to make reproductions and extractions for the purpose of TDM134.135 This can be achieved both by contract and by technical means, meaning that the rightholder may actively prevent TDM without a scientific purpose through, for example, contract, machine-readable means or terms and conditions on their website.136 Copies generated during the mining process may only be kept as long as necessary for the purposes of TDM137, which is reminiscent of the temporary reproduction exception, both in wording and content.138 Indeed, it has been suggested that by exchanging “temporary” for “intermediary” copy to identify a stage in a process rather than to put a time limit on it, the temporary reproduction exception could encompass TDM until the stage of publishing. An intermediary copy should in this setting be understood as a copy that doesn’t have any independent significance in itself but is part of a technological process.139 However, the DSM recitals140 state that the exception for temporary reproductions continues to apply to TDM activity falling within said exception’s scope, which acknowledges that both of the TDM exceptions are intended to go beyond and not interfere with the temporary reproduction exception.

126 Bottis and others (n 5) 185.

127 InfoSoc art. 5(3)(a)

128 DSM Directive Recital 15

129 Database Directive art. 9(b)

130 Caspers and Guibault (n 14) 32.

131 ibid 30.

134 DSM Directive art. 4(3)

135 Hugenholtz (n 29).

136 ibid; Rossana Ducato and Alain Strowel, ‘Limitations to Text and Data Mining and Consumer Empowerment: Making the Case for a Right to Machine Legibility’ (2018) 2018 CRIDES Working Paper Series 45, 17.
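
To illustrate what a reservation “by machine-readable means” under the general exception could look like from the miner’s side, the following is a minimal sketch in Python. It rests on assumptions that are not taken from the Directive: that the rightholder expresses the reservation in a robots.txt file, that the miner identifies itself with the hypothetical user agent name tdm-research-bot, and that the example URLs exist. The Directive does not name any particular technical standard, and whether such a file amounts to an appropriate reservation under Article 4(3) is a legal question which the code cannot answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content published by a rightholder. This format is
# an assumption made for illustration; the DSM Directive does not prescribe it.
EXAMPLE_ROBOTS_TXT = """\
User-agent: tdm-research-bot
Disallow: /articles/

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text does not disallow this agent for this URL."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines, so no network
    # request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/articles/1"
    for agent in ("tdm-research-bot", "ordinary-browser"):
        allowed = may_mine(EXAMPLE_ROBOTS_TXT, agent, url)
        status = "no reservation found" if allowed else "reservation found"
        print(f"{agent}: {status}")
```

A result of True only means that this particular file contains no rule against the agent. A reservation may still have been made by contract or in the terms and conditions of the website, so a check of this kind could at most be one step in assessing whether the general exception is available.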


3. WHY AND HOW MIGHT COPYRIGHT BE AN OBSTACLE FOR TDM

This chapter aims to explain how copyright, despite recent additions, still might create barriers for the performance of TDM. It starts with a brief note on how legal uncertainty negatively affects the development of the digital single market and the EU’s competitiveness in relation to, for instance, the US. It proceeds by discussing various aspects where EU copyright law may hinder the efficient performance of TDM using the new DSM Directive as a starting point but with reference to the InfoSoc and Database directives where relevant.

TDM is part of the EU data economy together with for instance smart manufacturing and the internet of things141 and in 2015 the value of the EU data economy alone was 1.87% of EU GDP, an increase of 5.6% on the year before.142 The EU legislator has recognised that by not accommodating TDM the digital single market risks competitive disadvantages in relation to more accepting regimes; acquiring licences is burdensome and there is a risk that EU entities therefore would choose to perform their TDM activity outside of the Union, that companies based outside the Union would prefer not to invest in companies within the EU and finally, that talented European researchers seek occupation elsewhere (where they would be less restricted).143 Researchers are reluctant to test the law, even where there is an exception, if it is not well defined.144 Before the introduction of the DSM Directive, TDM was associated with a lot of legal uncertainty; it was not clear whether TDM required authorisation from the rightholder, could benefit from an exception or whether the intended use would constitute infringement of the rightholder’s exclusive rights at all.145 The DSM Directive aims to provide legal certainty in the digital environment and cross-border situations in general and regarding when TDM acts might infringe copyright or the sui generis database right in particular.146

141 Hugenholtz (n 51) 51.

142 European Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions “Building A European Data Economy”’ (2017) COM(2017) 9 final 2 <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM%3A2017%3A9%3AFIN> accessed 28 May 2020.

143 Pamela Samuelson, ‘The EU’s Controversial Digital Single Market Directive - Part II: Why the Proposed Mandatory Text- and Data-Mining Exception Is Too Restrictive - Kluwer Copyright Blog’ (Kluwer Copyright Blog, 12 July 2018) <http://copyrightblog.kluweriplaw.com/2018/07/12/eus-controversial-digital-single-market-directive-part-ii-proposed-mandatory-text-data-mining-exception-restrictive/> accessed 14 February 2020.

144 Brook, Murray-Rust and Oppenheim (n 17) 3.

145 Rosati (n 9) 6.

3.1 Beneficiaries

As mentioned above anyone with lawful access can benefit from the general exception, but as shall be further elaborated below, the scientific research exception provides a stronger defence for users and it is therefore desirable to try to become eligible for the latter. For a project to be able to benefit from the scientific research exception it needs to have a scientific research purpose and the beneficiary needs to be part of and pursue the project within a research organisation.147 This is sometimes referred to as the “double limitation”. It follows from the recitals that a scientific research purpose should be understood to include both the natural and human sciences148 implying a diverse range of possible application areas, which is welcome if the aim is to permit use of TDM in as many situations as possible, but at the same time the vague formulation opens up for legal uncertainty: How does one know if one is approaching the border of natural or human sciences and does it mean that the purpose needs to be solely scientific as opposed to mainly educational but partly scientific?149 Sometimes scientific discoveries are made by “stumbling across” something and TDM is a technique apt for finding connections or patterns in unexpected places. In that sense, a law demanding a clear and precise research plan for the exception to apply would restrict the potential of TDM.

As for the second limitation of the applicability of the exception, the directive defines “research organisations” based on their scope and structure. First, they need to have been established for the primary goal of conducting scientific research (possibly together with educational services). Second, their work needs to be either not-for-profit or part of a public-interest mission recognised by the state.150 Such recognition can be reflected through public funding, provisions in national laws or through public contracts. Universities and their libraries as well as hospitals which perform research are offered as examples. Finally, a negative example is provided: Organisations over which a commercial undertaking has a decisive influence, which could result in preferential access to the results, fall outside the scope of the definition.151 It is not entirely clear if this example is supposed to highlight an organisation which fails to meet the not-for-profit or the public-interest requirement or both, but it is tempting to use it as an example of an organisation that is not not-for-profit to get an indication of where the legislator intends the line to be drawn. Nor is it clear whether donations from private undertakings may change an entity’s status as a research organisation. The definition provided by the directive allows Member States to define research organisations in accordance with national law, so the question arises what the outcome would be if an entity in a state with very inclusive rules wants data from a state applying a more restrictive definition. Scholars have expressed concern regarding the definition of a research organisation due to its use of vague legal terms which are not defined, and which they fear will result in a lot of work for the CJEU.152 As an example, the research for a vaccine will likely meet the scientific research purpose requirement and it follows quite clearly that a medical institution of a university researching a vaccine constitutes a research organisation and can benefit from the scientific research exception. A scientist on secondment to a pharmaceutical company seems unlikely to meet the requirements due to the company’s strong underlying purpose of maximising income, unless it can be shown that the work is done within a public-private partnership. As indicated above the latter depends on how big an influence the pharmaceutical company has on the project. An independent researcher, on the other hand, falls quite clearly outside the scope of the exception, since they don’t belong to an organisation – even if the same work could have benefited from the exception had it been performed for instance at a university or in a hospital. Hence, it is the “who” and not the purpose of what they intend to do that puts most restrictions on the scope of beneficiaries and calling it “exception for research organisations” would have corresponded better with the content of the provision since “scientific research” promises a much wider scope than “research organisations”. (However, naturally the headings need not affect the content of national implementation.)

147 DSM Directive art. 3(1)

148 DSM Directive Recital 12

149 Geiger, Frosio and Bulayenko (n 5) 33; Geiger, Frosio and Bulayenko (n 62) 28.

150 DSM Directive art. 2(1)

152 Max Planck Institute for Innovation and Competition, ‘Position Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules: PART B Exceptions and Limitations: Chapter 1 Text and Data Mining’ (2017) 4 <https://www.ip.mpg.de/fileadmin/ipmpg/content/stellungnahmen/MPI_Position_Statement_Part_B_Chapter_1_Update23022017.pdf> accessed 22 April 2020.

The exception as adopted in the Directive is limited to not-for-profit scientific research and scientific research with a public interest objective153 but an alternative to limiting beneficiaries to research organisations would have been to limit them to non-commercial entities (an option which was considered by the Commission). However, restrictions on commerciality would be counterproductive in the sense that a lot of discoveries valuable to society are made by commercial entities or with a commercial interest. The above-mentioned BlueDot project was executed by a commercial entity, the Hathi Trust154 copies were made by a public-private partnership, journalism is often driven by private companies and research projects – in particular for medical treatments – are commercialised after the project.155 On a practical level it can also be difficult to distinguish between commercial and non-commercial in public-private partnerships.156 The scientific research exception has an expansive scope in that it includes non-commercial users as well as the work that commercial organisations do as part of a public-interest mission.157 Rather than excluding commercial entities the EU legislator has chosen to define who should be eligible for the exception – and in doing so they avoided the problems associated with interpreting “commercial”. However, as with a non-commercial requirement, a lot of possible contributors to societal gain are still excluded – only covering research organisations means that start-ups, individual researchers and journalism fall outside the scope of beneficiaries158 – and it has been argued that the double limitation therefore comes very close to a limitation to non-commercial purposes.159 The reason beneficiaries were limited to research organisations in the first place was to compensate rightholders for the lack of a non-commercial requirement160 and the rationale was that others should pay for a licence to mine161, leaving the purely commercial TDM market untouched162.163 While it is possible to understand why larger corporations could be required to pay, limiting the TDM exception to non-commercial scientific research would have had an unfavourable outcome for unaffiliated researchers such as independent data scientists and think-tank personnel.164 It has even been argued that not letting SMEs benefit from the exception is against the freedom of expression and information165 as well as the freedom to conduct a business166.167 Possibly this may be a result of the fact that in the Impact Assessment stakeholders were divided into researchers, corporate research users and rightholders. That means that everyone who might use TDM and who wasn’t a researcher was grouped together, creating a very diverse group, and I am very doubtful that it was possible to summarise their various needs and possibilities fairly. Hence, while the statement that corporate research users had “generally not asked EU intervention” and were deemed to have different needs than research organisations168 may apply to corporate research, this does not mean the entire group benefited from that. The definition of “research organisation” – despite trying to be very general and inclusive – to me reveals a very traditional view of where scientific research is conducted, by whom and for what purpose, and runs contrary to the objective of promoting innovation in scientific research.

154 A not-for-profit digital collaboration between academic and research libraries. https://www.hathitrust.org/ Accessed 28 May 2020.

155 Flynn and others (n 24) 9–10.

156 Bottis and others (n 5) 180.

157 Geiger, Frosio and Bulayenko (n 62) 27; Bottis and others (n 5) 180.

158 Geiger, Frosio and Bulayenko (n 62) 27–28.

159 Margoni and Kretschmer (n 12).

160 European Commission, ‘Assessment of the Impact of the European Copyright Framework on Digitally Supported Education and Training Practices: Final Report’ (Publications Office of the European Union 2016) 116 <http://op.europa.eu/en/publication-detail/-/publication/1ba3488e-1d01-4055-b49c-fdb35f3babc8> accessed 26 May 2020.

161 Margoni and Kretschmer (n 12).

162 European Commission, ‘Impact Assessment of the European Copyright Framework on Digitally Supported Education and Training Practices’ (n 160) 116–117.

163 This refers to the market for value added licences (including for instance normalisation of data) common in particular in life science and pharmaceuticals.

An additional drawback for scientific innovation in Europe is that both Japan and the US apply exceptions that are not limited to non-profit scientific research, hence the EU Digital Single Market could suffer a competitive disadvantage by applying a narrow exception.169 The general exception is, as we shall see later, not as strong. Finally, TDM is more common in the commercial sector and the insurmountable obstacle of obtaining licences from rightholders is no smaller for commercial entities than for research organisations170 – hence it is a group that should not be omitted.

The above objectives could easily be achieved by extending the group of beneficiaries to everyone with lawful access – as is the current practice in the UK.171 Introducing such a criterion ought not to be so controversial, in particular in comparison to other IP law. Similar provisions can be found in the right to use information obtained by reverse engineering in trade secret law as well as the exception for experimental use172 for the purpose of increasing scientific knowledge in patent law.173 It can even be said that increase of scientific knowledge through experimental use is encouraged through the publication of detailed descriptions of patented inventions174. Similarly, the Software Directive makes an exception “to observe, study or test the functioning of the program in order to determine the ideas and principles which underlie any element of the program”.175 Hence, just as the rightholder who allows others to use their software has to assume that the users might make reproductions for said reason, the rightholder of copyright protected works could be expected to assume that someone with lawful access might perform TDM.176 Copyright needs to continue to make room for follow-on creativity and the exercise of fundamental rights.177 Research is firmly justified by fundamental rights through the freedom of information and safeguarded through limitations in the scope of the exclusive rights and exceptions and limitations.178

165 Charter of Fundamental Rights of the European Union [2012] (Fundamental Rights Charter) OJ C 326, 26.10.2012, p. 391–407 art. 11

166 Fundamental Rights Charter art. 16

167 Margoni and Kretschmer (n 12).

168 European Commission, ‘Impact Assessment of the European Copyright Framework on Digitally Supported Education and Training Practices’ (n 160) 116.

169 Samuelson (n 143).

170 Eleonora Rosati, ‘EU Text and Data Mining Exception for the Few: Would It Make Sense?’ (2018) 13 Journal of Intellectual Property Law & Practice 429, 429–430.

3.2 Lawful Access

Common to both of the TDM exceptions in the DSM Directive is that the user is required to have lawful access to the work, which according to the recitals includes open access, licence and works freely available online.179 These examples alone are in my opinion too specific to give users a satisfactory definition to rely on for legal certainty. One situation which is not handled by the directive is whether lawful access to a public library is sufficient to mine material submitted to the library as legal deposits.180 Another situation which is unclear under the DSM Directive is sending copies across borders, which is important for cooperation and validation, and this applies even where both of the concerned nations permit TDM – for example it does not appear from the Directive whether a researcher in the EU may transfer a lawfully acquired database to a research partner in the US for mining.181 It is also unclear what relationship (if any) exists between “lawful access” and the concept of “lawful use” from the temporary reproduction exception182 – meaning use either authorised by the rightholder or not restricted by law183. If the concepts are intended to have the same meaning the same phrase should have been used.184

172 76/76/EEC: Convention for the European patent for the common market (CPC 1975) [1975] OJ L 17, 26.1.1976, p. 1–28 art. 31(b), Agreement on a Unified Patent Court 2013 [2013] (UPC Agreement) OJ C 175/1 art. 27(b)

173 Flynn and others (n 24) 5-6 supra note 16.

174 Pila and Torremans (n 50) 194.

175 Software Directive art. 5(3)

176 Max Planck Institute for Innovation and Competition (n 152) 6–7.

177 Flynn and Quintais (n 26).

178 Flynn and others (n 24) 5.

179 DSM Directive art. 3(1) and 4(1) and Recital 14

The Directive has been criticised because subjecting TDM to lawful access makes it possible for publishers (or other rightholders) to prevent beneficiaries from performing TDM by not granting them a licence or by increasing the price. Additionally, there is a fear that licensing prices may go up if publishers routinely start adding a fee for TDM, not least since it risks creating a gap between rich and poor research institutions and/or between Member States as regards innovation.185 Scholars argue that if a use does not harm a market, it should not matter if the source has been accessed in a lawful manner or not from the perspective of fairness towards exclusive rights.186 I would argue that the act of unlawfully accessing a work harms the market of the source. To make a harsh comparison: It is not illegal to read a stolen book, but it is illegal to steal it. On the other hand, it can be argued that purchase of a licence was never a feasible option due to high prices and that the market therefore was not harmed, but this is a larger discussion at the boundaries of this thesis, so I will leave the question open. The lawful access requirement is intended to shield “private actors from an obligation to open up their data to third parties”187 and it secures payment to the rightholders, which supposedly strikes a balance between them and users of a wide TDM exception. In other words: rightholders are compensated for the TDM exception through the lawful access requirement since it secures payment for access.188 This should also be seen in the light of the fact that, at least for the scientific research exception, rightholders are not allowed to exclude TDM from licence terms.

181 Flynn and others (n 24) 10.

182 InfoSoc art. 5(1)

183 InfoSoc Recital 33

184 Geiger, Frosio and Bulayenko (n 62) 32.

185 ibid 30; Geiger, Frosio and Bulayenko (n 5) 33–34; European Copyright Society (n 86) 4.

186 Flynn and others (n 24) 10.

187 Geiger, Frosio and Bulayenko (n 5) 33 supra note 138; European Commission, ‘Expert Group Report’ (n 94) 58.

Looking at how some Member States had already implemented a TDM exception before the DSM Directive, it is difficult to tell with certainty how lawfulness has been handled without making a closer study than this thesis allows, but the UK189 and France both require lawful access, the Estonian Copyright Act requires attribution – which could indirectly indicate a lawful access requirement – while Germany’s copyright law is silent on the matter190. Outside of Europe it seems national copyright law tends to focus on what you intend to do with the data, which purpose you have and whether that purpose is commercial or not.191 The American fair use doctrine is an example of this, but it should be noted that the lawfulness of unlicensed TDM has not yet been expressly ruled on.192

TDM is an important tool for research and as such it has therefore been argued that it lies in the public interest not to give rightholders control over its use.193 Arguably it is essential to

draft an exception that does not indirectly give the rightholder too much power forcing researchers to revert to sites dedicated to circumvent paywalls such as Sci-Hub and

LibGen194, but on the other hand it should not be forgotten that exceptions does not generally

grant access, but grants certain use(s) for defined purposes.

Contractual override

Rightholders commonly use contractual provisions to limit access to and use of their works by excluding TDM from permitted uses under a licence or making it subject to additional

payment. To guarantee TDM and make an exception truly efficient it should not be overridable by contact195 and contractual provisions contrary to the scientific research

exception have been made null and void196, or in plain English, if you have a licence to access

content, any provision saying that TDM is excluded from the allowed uses under the licence

189 Albeit, no longer a Member State.

190 Caspers and Guibault (n 14) 29–30; Geiger, Frosio and Bulayenko (n 5) 24–26.

191 European Commission, ‘Expert Group Report’ (n 94) Section 4.1 ‘TDM outside Europe’. 192 Rosati (n 9) 15.

193 European Copyright Society (n 86) 5.

194 Balázs Bodó, ‘The Science of Piracy, the Piracy of Science. Who Are the Science Pirates and Where Do They Come from: Part 1’ (Kluwer Copyright Blog, 6 March 2019)

<http://copyrightblog.kluweriplaw.com/2019/03/06/the-science-of-piracy-the-piracy-of-science-who-are-the-science-pirates-and-where-do-they-come-from-part-1/> accessed 4 May 2020.

195 Iain Hargreaves, ‘Digital Opportunity - Review of Intellectual Property and Growth’ (Department for Business, Innovation & Skills 2011) Independent Report 11/968 47

<https://www.gov.uk/government/publications/digital-opportunity-review-of-intellectual-property-and-growth> accessed 24 April 2020.


is unenforceable.197 The general exception, however, can as mentioned above only be used as long as the rightholder has not expressly reserved use of the work in an appropriate manner198, for instance in the terms and conditions of a website, through contractual agreements or by machine-readable means199. This would create problems for an independent researcher needing access to a source originating from a major publishing house if the licence agreement excludes TDM. A researcher affiliated with a research institution could use the scientific research exception as a defence for performing TDM, while an individual researcher or journalist would have to try to obtain an extended licence by paying more. However, for mining of sources protected by neither copyright nor the sui generis right, the power of contract is a general problem.200 The CJEU made it clear in Ryanair v PR Aviation that even if the definition of a database in the Database Directive201 is met, the provisions concerning copyright or sui generis protection do not apply if the database fails to meet the conditions202 for protection.203 In practice this means that the directive cannot prevent the rightholder of an unprotected work from limiting use through contract204, as long as the contractual terms comply with other provisions of national law.205 As an example, an online database consisting of automatically stored records of historical sales on the stock market would meet the definition of a database if sorted in chronological order, but would fail to meet both the originality and the substantial investment criteria and would therefore be ineligible for protection. As a result, none of the TDM exceptions would provide a defence for mining the website if its terms and conditions exclude TDM from permitted uses, regardless of whether the site is otherwise publicly available. In essence this means that the user of a protected database has more extensive possibilities to perform TDM than the user of an unprotected database.206 The situation has not changed since the introduction of the DSM Directive, and so mining data such as that in the above example would require rightholder authorisation (likely in exchange for compensation) for lawful access.

197 Hugenholtz (n 29); Geiger, Frosio and Bulayenko (n 62) 27.

198 DSM Directive art. 4(3)

199 DSM Directive Recital 18

200 Ducato and Strowel (n 136) 19.

201 Database Directive art. 1(2)

202 Database Directive art. 3(1) “the author’s own intellectual creation” and art. 7 “substantial investment”

203 C–30/14 Ryanair v PR Aviation para 35

204 C–30/14 Ryanair v PR Aviation para 39

205 Rosati (n 9) 12.
