Data and copyright - newtech.law

Is copyright a path to take to protect data? Can data be regarded as a result of creativity and, consequently, a protected work? Does the protection of a data filing system also include the data collected in it?

Non-personal data, in particular data collected or generated by machines, has great economic and scientific value. Increasingly, it is a key business asset, providing the basis for launching new goods and services. It helps improve methods of detecting and treating diseases, determine where to set up a wind farm or where to cut down trees so they do not interfere with power lines, and ease traffic congestion in cities.

Do entities collecting such data or creating algorithms that harvest data, e.g. from the internet, have rights to the data? Can such data be freely traded, e.g. sold or licensed?

This issue raises many unanswered questions. This article will address whether data is protected by copyright and, as a result, whether the data “owner” can use the instruments provided by the Polish Copyright Act to protect it against infringement by third parties.

What is a “work”?

International agreements indicate that copyright protection covers literary, scientific and artistic works, whatever their mode or form of expression (Art. 2 of the Berne Convention) and that copyright protection applies only to expressions and not to ideas, procedures, methods of operation or mathematical concepts (Art. 9(2) of the Agreement on Trade-Related Aspects of Intellectual Property Rights and Art. 2 of the WIPO Copyright Treaty).

A similar provision is found in the Polish Act on Copyright and Related Rights. It states that a work is any manifestation of creative activity of individual character, fixed in any form, regardless of its value, purpose and mode of expression, and that protection covers only the mode of expression, not discoveries, ideas, procedures, methods and principles of operation, or mathematical concepts.

An analysis of these definitions leads to the conclusion that a work must first and foremost be a result of creation, creativity or intellectual activity, and must be characterised by a certain originality: it must bear the creator’s stamp. Only the manner of presentation is protected, not the idea or method as such. Thus, for example, the layout, appearance and wording of a process manual are protected, but the method described in it is not.

Is data a “work”?

The answer to this question is crucial: only if a given data point, body of data or data set is recognised as a work can it be protected by copyright.

Manifestation of creative activity

To determine whether data can be a work, the first question to be answered is whether it can be considered a result of creativity.

Data is specific information that can be presented in a particular form (digits, words, geographical location, measurement or time record). It can be collected, e.g. by autonomous vehicles, or can constitute machine-generated information transformed by artificial intelligence after analysis of certain resources, such as the internet. It can be systematised in the form of a database or simply be a collection of information stored in the cloud, for example. Regardless of the method by which it was created, data is simply a fact and not a creative product of the human mind. An algorithm used to collect or generate data can be considered such a product, but its output, i.e. “pure” data, cannot. (The protection of algorithms is an important but separate topic, to which we will devote a separate article.)

Excluding data as such from copyright protection is uncontroversial. A scientific fact (a finding of a single event or property of a natural or social nature) falls outside copyright protection (W. Machała & R.M. Sarbiński (eds.), Copyright and Related Rights: Commentary (WKP 2019), commentary on Art. 1). Copyright protects only a mode of expression and not the skills and labour involved in finding and identifying a source of information (J. Barta & R. Markiewicz, Copyright and Related Rights (WKP 2019), commentary on Art. 1).

Such an approach to data can be found in most legal systems. For example, data is not subject to copyright protection under US law; as one source puts it, “Data are considered ‘facts’ under U.S. law. They are not copyrightable because they are discovered, not created as original works.” European Union law also indicates that copyright protection cannot be extended to “mere facts or data” (Recital 45 of the Database Directive (96/9/EC)).

Individual character

Although the fact that data cannot be considered a manifestation of creativity generally excludes its copyright protection, it is also worth considering whether data has the individual character required of works. The question is whether data is the result of a technical activity whose effects are predicted in advance, or of a personal approach to an issue, an author’s vision (D. Flisak, Commentary on selected provisions of the Act on Copyright and Related Rights (LEX/el. Commentary 2018), commentary on Art. 1). (Defining who can be a data creator goes beyond the scope of this article.)

Data is not subject to processes aimed at giving it an individual character. It is collected or generated as information about specific processes or events. Therefore, data is the result of a schematic, repetitive and purely mechanical process, not of a creative vision.

Is a collection or compilation of different data a work?

This assessment is the same whether we are dealing with one data point or a huge amount of data. However, if we collect a large amount of data and create a collection or database from it, the creator’s selection, arrangement or compilation of the data may be protected by copyright. The data contained in it is still not protected under copyright law: only the selection of the data or the way it is presented can be a work, not the data itself.

To illustrate this, consider the example of board games. Copyright protection does not cover the idea of a game, its logic or rules; only the board (specific graphics, details, colours, etc.) and other elements used in the game (e.g. playing pieces, cards, markers) that have a creative and individual character are works. Applying this to a set of data, only the mode of expression bestowed by the author can be protected, e.g. colourful charts that aggregate data generated from online sources in an unusual way, as well as the layout, arrangement or composition resulting from the author’s creative approach. Protection may also extend to the way the elements of a data set are selected, if the author makes a creative, non-obvious selection based on specific, subjective criteria (e.g. a list of the one hundred most important books in the history of literature, in the opinion of the list’s creator).

However, this applies only to a selection made by a human. If data is selected by a machine, the selection will not be protected, because only human creations are subject to copyright protection (some time ago we considered who actually owns the copyright to a work created by a robot).

Moreover, a set of data cannot be considered creative if it exhaustively lists all possible elements of the set, e.g. a catalogue of all pharmacies operating in a given locality (D. Flisak in Commentary on selected provisions of the Act on Copyright and Related Rights (Gdańsk 2018), commentary on Art. 3).

In principle, the same copyright protection rules for sets of data apply in the EU and the US (see Art. 3 of the Database Directive (96/9/EC); as an American source advises, “Although data itself cannot be copyrighted, you may be able to own a copyright in the compilation of the data. Creative arrangement, annotation, or selection of data can be protected by copyright.”). Comparable rules are also adopted in international agreements (Art. 10(2) of the Agreement on Trade-Related Aspects of Intellectual Property Rights and Art. 5 of the WIPO Copyright Treaty).

Conclusions under current regulations

Non-personal data, in particular data collected or generated by machines, does not constitute a “work” and is therefore not protected by copyright. It represents facts that cannot be considered a manifestation of human creative activity with an individual character. It is information about reality and ongoing processes, and establishing it involves discovering or deducing information, not creating a solution in a creative process. The fact that large amounts of data may be used to create something innovative or improve a service does not mean that the data itself is protected by copyright. Under the Polish Copyright Act, protection may at most extend to a creative selection, arrangement or combination of data, i.e. an original way to express a set of data, and not to the data contained in it.

What does this mean for companies collecting or generating data? From a copyright point of view, their data is unprotected: no one can claim copyright in such data. Copyright law does not prohibit the collection or use of data generated by third parties, and the “owner” of such data has no copyright instruments to prohibit others from using it.

This leads to the conclusion that under current regulations, protection of data must be sought outside the scope of copyright law. This much is clear. But where exactly to find protection, and whether the available protection meets the needs of the market, is debatable. It seems that protection can be found in the rules on trade secrets and know-how and, more narrowly, in the specific law governing databases. Whether the protection provided there covers data as such (and if so, to what extent it meets the needs of entities whose business model is based on data) will be addressed in separate articles in this series.

Proposals for the future

It can already be argued that lawmakers should immediately address the issue of data and consider a new law regulating non-personal data, in particular machine-collected or machine-generated data. The specific nature of data and how it is created requires special solutions, and its increasing economic value calls for work on such solutions to begin without delay.

For a new right to reflect real economic conditions as closely as possible, it must be subject to broad discussion, both of the content of such a right and of who should be entitled to it. On the latter issue, if natural persons are the source of certain data which companies later use to develop their own offerings (for example, data collected by apps for runners), should the interests of the individuals who generate the data be taken into account, and their contribution adequately rewarded? If so, how should the benefits be identified, quantified and distributed?

We are faced with many questions that need answers. The lack of regulation and legal clarity can only slow the potential growth of the economy.

Paulina Mleczak, Lena Marcinoska