The new exceptions to IPRs for Text and Data Mining in the Copyright Directive

by Caterina Bo

TDM and Big data

Text and data mining (TDM) is a software procedure that enables the extraction of implicit information from large amounts of text and data. With specific regard to data mining, it is also often the only way to make data collected in aggregate by digital devices ‘readable’ and usable by a human being.

The need to develop techniques to speed up the classic process of analysis and synthesis that leads to the discovery of new information derives from the exponential increase in available data, which has led to the coining of the expression ‘big data’. Big data, in fact, does not only mean that there are a lot of data, but that their volume, variety and speed of accumulation are such that traditional analysis techniques cannot extract useful information within the time in which it is necessary or desirable to obtain it.

TDM is thus the use of human-trained algorithms to make sense of otherwise inaccessible data (e.g. data stored by IoT devices) or to highlight possible correlations, recurrences or still implicit meanings by comparing numerous databases or human-made texts (think, for instance, of customer reviews of sponsored products on e-commerce platforms and the information that can be drawn from them in relation to potential improvements of the products or the delivery service).
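The customer-review example can be made concrete with a minimal sketch. It is only an illustration of the general idea (surfacing recurring themes that no single review states outright), not a description of any particular TDM tool. The reviews, the stop-word list and the function names are all hypothetical and invented for this purpose.

```python
import re
from collections import Counter

# Hypothetical sample reviews, invented purely for illustration.
REVIEWS = [
    "Great blender but the delivery was late and the box was damaged.",
    "Delivery was late again. The blender itself works fine.",
    "Lid is fragile and broke after a week. Delivery was on time.",
    "Fragile lid, noisy motor, but fast delivery.",
]

# Small hand-picked stop-word list (an assumption; real projects use larger lists).
STOP_WORDS = {"the", "and", "was", "but", "a", "is", "after", "on", "itself", "again"}


def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def recurring_terms(reviews, min_reviews=2):
    """Return the terms that appear in at least `min_reviews` distinct reviews."""
    doc_freq = Counter()
    for review in reviews:
        # Use a set so that each review counts a term at most once.
        doc_freq.update(set(tokenize(review)))
    return {term: n for term, n in doc_freq.items() if n >= min_reviews}


if __name__ == "__main__":
    ranked = sorted(recurring_terms(REVIEWS).items(), key=lambda kv: (-kv[1], kv[0]))
    for term, n in ranked:
        print(f"{term}: mentioned in {n} reviews")
```

Even on four short texts, terms relating to the delivery service and to a fragile component recur across reviews. Note that the sketch has to read and copy each review in full in order to count anything, which is exactly where the intellectual property questions discussed in the next section arise.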

TDM and Intellectual Property

Initially developed for military intelligence, data mining algorithms have been refined and adapted to different types of databases as data warehouses[1] and the internet have become more widespread.

However, users of this technology, primarily researchers and data scientists, but increasingly also companies and commercial entities, have faced the problem of possible conflict with the copyright on the creative works being extracted and their creative arrangement, as well as with the sui generis right on databases provided for by European and national legislation.

Although data as such cannot be protected or claimed by any particular party (copyright protects only the expressive form of a work), in practice TDM algorithms need, firstly, to access protected materials and, secondly, to copy, elaborate and transfer them onto media other than those on which they were originally stored. They may also make those materials accessible to a vast number of users, especially when they run in the cloud on platforms open to a potentially unlimited number of users.

Such operations are in theory capable of infringing the rights of reproduction[2], elaboration[3] and communication to the public[4] in copyrighted works, as well as the sui generis right on databases that required a substantial investment in quantitative or qualitative terms[5], a right specifically designed to ensure adequate remuneration for those who have spent time and resources systematically organising large data sets and making them accessible[6].

The risk of litigation related to the use of TDM on protected works or databases is therefore not trivial, especially since the case law of the Court of Justice has historically interpreted the exclusive right of reproduction in a strict manner (including also copies not intended for “human” use but which technically provide for the reproduction of the characters of the work) in order to ensure the widest possible protection for authors.[7]

On the other hand, this risk is not eliminated by the existence of the exceptions to copyright set out in the relevant European directives, in particular those for acts of temporary reproduction[8], scientific research[9] and insubstantial data extraction[10]. First, the adoption of many of these exceptions has been left to the discretion of the Member States, so the legislative landscape is fragmented and inconsistent. Second, it is difficult to establish unambiguously whether the different phases of reproduction and processing of data carried out by TDM algorithms fall within the scope of these exceptions.

These problems can, of course, be solved by direct negotiation with the rightsholders, but such negotiations are clearly time-consuming and costly, and therefore increase the transaction costs considerably, discouraging TDM projects.

The Copyright Directive and the new exceptions for text and data extraction

It is in this context of uncertainty that the so-called Copyright Directive, which entered into force on 13 June 2019, comes into play.

One of the objectives explicitly stated in the recitals of the directive is in fact the harmonisation of Member States’ copyright laws, with the specific aim of fostering the development of the European digital single market and the cross-border use of digital content.

In order to achieve this objective, two new exceptions were introduced to clarify the cases and conditions under which text and data mining operations are lawful, drawing a fundamental distinction between entities dedicated to research and to the protection of cultural heritage and entities pursuing a profit-making purpose.

Both exceptions also assume that TDM actors have had lawful access to the works and databases they are extracting information from, which in essence means making a first distinction between those who have the resources to negotiate licensing agreements and those who do not.

In particular, Article 3 of the Directive provides for a mandatory exception in the event that the TDM is carried out for scientific research purposes by research or cultural heritage protection organisations (provided that a profit-making entity does not exercise a decisive influence over them), which may therefore freely reproduce works protected by copyright and related rights in the context of such operations, as well as extract and re-use the contents of databases protected by the sui generis right. Moreover, the same provision also excludes research organisations from the application of the controversial Article 15 of the directive, which grants publishers the right to be remunerated for the online use of their publications of a journalistic nature.

Article 4, on the other hand, provides for a general exception allowing anyone to freely use protected works, other content and databases for TDM purposes, provided that they have lawful access to them and unless such use has been properly reserved by the relevant right holders.
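The interplay between the two exceptions, as described above, can be sketched as a simple decision procedure. This is a deliberately simplified reading of the conditions summarised in this article, not a statement of how a court would apply them. The class, its field names and the returned labels are hypothetical, and each boolean hides a legal assessment (for example, whether research counts as scientific) that the code takes as given.

```python
from dataclasses import dataclass


@dataclass
class TdmProject:
    """Hypothetical description of a TDM project; every field is an assumption."""
    lawful_access: bool                  # the miner has lawful access to the material
    research_or_heritage_org: bool       # research or cultural heritage organisation
    scientific_research_purpose: bool    # TDM is carried out for scientific research
    decisive_commercial_influence: bool  # a profit-making entity exercises decisive influence
    rights_reserved_by_holder: bool      # the right holder has properly excluded TDM


def applicable_exception(project: TdmProject) -> str:
    """Return 'article_3', 'article_4' or 'none' under the simplified reading above."""
    # Both exceptions presuppose lawful access to the works and databases.
    if not project.lawful_access:
        return "none"

    # Article 3: mandatory exception for qualifying organisations.
    if (
        project.research_or_heritage_org
        and project.scientific_research_purpose
        and not project.decisive_commercial_influence
    ):
        return "article_3"

    # Article 4: general exception, available unless the right holder opted out.
    if not project.rights_reserved_by_holder:
        return "article_4"

    return "none"


if __name__ == "__main__":
    university = TdmProject(True, True, True, False, True)
    company = TdmProject(True, False, False, False, True)
    print(applicable_exception(university))
    print(applicable_exception(company))
```

The ordering of the checks reflects the structure of the two provisions: a reservation by the right holder is only tested once Article 3 has been ruled out, because in this reading it is relevant to the general exception alone.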

The new Articles 70-ter and 70-quater of the Italian Copyright Law

In Italy, the Copyright Directive was implemented by Legislative Decree no. 177 of 8 November 2021, which introduced the new Articles 70-ter and 70-quater into the Italian Copyright Law. These provisions essentially reproduce the text of Articles 3 and 4 of the Directive and include the definitions of the entities entitled to benefit from the exceptions.

The extent of the impact that such lawful uses of third-party works and databases will have on the (public and private) research landscape will of course have to be verified at a greater distance from their adoption; however, it is already possible to outline the problems that those who intend to use them will face.

First of all, we note that, in balancing the various interests at stake, the European legislator has made a decisive choice in favour of scientific research carried out by entities that are financed mainly with public money or are non-profit-making, with a view to favouring those TDM projects whose results should (at least theoretically) benefit the entire community. Indeed, only in relation to such entities can the exception not be derogated from by contract.

Moreover, despite the definitional effort made by the Directive, uncertainties remain as to the attributes that would make research conducted through data mining ‘scientific’, just as it may not always be easy to determine whether or not a non-profit organisation receiving sponsorship from a commercial enterprise is subject to a decisive influence. Researchers who do not belong to a recognised organisation are also totally excluded from the scope of the exception.

In addition, the directive does not protect a fundamental aspect of the research process, i.e. the validation of the results through peer review. Clearly, in order to be able to check the coherence of the information resulting from the TDM research, it is necessary to have access to the same set of texts and data, but this will only be possible in practice if the reviewer has also obtained access independently, since the exception does not allow the communication of the researched datasets to third parties.

As regards the general exception provided for by Article 70-quater, instead, the positive effort made by the European legislator to eliminate the pre-existing legal uncertainty as to whether automated extraction operations are lawful must be acknowledged. The inertia of right holders in taking a position on the matter is in fact overcome: absent a proper reservation, mining is permitted, which gives greater certainty as to when the use of mining techniques is lawful.

However, it remains doubtful whether in practice this exception will lead to any significant increase in data processing activity by commercial entities, even lato sensu, since the opt-out clause in the Directive allows right holders to reserve the reproduction of their works and databases to themselves.

Ultimately, although the new exceptions introduced in the copyright system contribute positively to providing a more certain basis for the adoption of text and data mining techniques, and consequently an incentive for research activity, at first glance there is certainly the risk that the area of permissibility envisaged is still too restricted for a substantial benefit for European scientific and technological innovation to be achieved.

In this respect, a starting point for improvement may perhaps come from the actual application of these exceptions in practice, and in particular from the courts called upon to interpret them, in the hope that they will seize the opportunity to guarantee the widest possible scope of operation.


[2] Article 2 of Directive 2001/29/EC of 22 May 2001 and article 13 L. 633/1941 (Italian Copyright Law).

[3] Article 5 of Directive 2001/29/EC of 22 May 2001 and articles 4 and 18 L. 633/1941 (Italian Copyright Law).

[4] Article 3.1 of Directive 2001/29/EC of 22 May 2001 and articles 15, 15-bis, 16 and 16-bis L. 633/1941 (Italian Copyright Law).

[5] Article 7 of Directive 96/9/EC of 11 March 1996, transposed by Legislative Decree n. 169/99, which introduced new articles 102-bis and 102-ter into Law 633/1941 (Copyright Law).

[6] For a definition of the constituent elements of a database, see Advocate General Stix-Hackl’s opinion in the Fixtures Marketing case:

[7] For an in-depth analysis of European harmonisation of copyright in the light of European case law:

[8] Article 5.1 of Directive 2001/29/EC of 22 May 2001.

[9] Article 5.3 (a) of Directive 2001/29/EC of 22 May 2001, articles 6.2 (b) and 9 (b) of Directive 96/9/EC of 11 March 1996.

[10] Article 8.1 of Directive 96/9/EC of 11 March 1996.

