Data and copyright

10 December 2020

Is copyright a way to protect data? Can data be regarded as a result of creativity and, consequently, a protected work? Does the protection of a data filing system also include the data collected in it?

Non-personal data, in
particular data collected or generated by machines, has great economic and
scientific value. Increasingly, it is a key business asset, providing the basis
for launching new goods and services. It helps improve methods of detecting and
treating diseases, determine where to set up a wind farm or where to cut down
trees so they do not interfere with power lines, and ease traffic congestion
in cities.

Do entities collecting
such data or creating algorithms that harvest data, e.g. from the internet,
have rights to the data? Can such data be freely traded, e.g. sold or licensed?

This issue raises many unanswered questions. This article will address whether data is protected by copyright and, as a result, whether the data “owner” can use the instruments provided by the Polish Copyright Act to protect it against infringement by third parties.

What is a “work”?

International agreements
indicate that copyright protection covers literary, scientific and artistic
works, whatever their mode or form of expression (Art. 2 of the Berne
Convention) and that copyright protection applies only to expressions and not
to ideas, procedures, methods of operation or mathematical concepts (Art. 9(2)
of the Agreement on Trade-Related Aspects of Intellectual Property Rights and
Art. 2 of the WIPO Copyright Treaty).

A similar provision is
found in the Polish Act on Copyright and Related Rights. It states that a work
is any manifestation of creative activity of individual character, fixed in any
form, regardless of the value, purpose and mode of expression, with protection
only for the mode of expression and not for discoveries, ideas, procedures,
methods and principles of operation, and mathematical concepts.

An analysis of these
definitions leads to the conclusion that a work must first and foremost be a
result of creation, creativity or intellectual activity, and must be
characterised by a certain originality: it must bear the creator’s stamp. Only the
manner of presentation is protected, not the idea or method as such. Thus,
for example, the layout, appearance and content of a process manual are protected,
but not the method described in it.

Is data a “work”?

The answer to this
question is crucial. Only the recognition of a given data point, data or set of
data as a work would allow it to be protected by copyright.

Manifestation of creative activity

To determine whether
data can be a work, the first question to be answered is whether it can be
considered a result of creativity.

Data is specific
information that can be presented in a particular form (digits, words,
geographical location, measurement or time record). It can be collected, e.g.
by autonomous vehicles, or can constitute machine-generated information
transformed by artificial intelligence after analysis of certain resources, such
as the internet. It can be systematised in the form of a database or simply be
a collection of information stored in the cloud, for example. Regardless of the
method by which it was created, data is simply a fact and not a creative product
of the human mind. An algorithm used to collect or generate data can be
considered such a product, but its output, i.e. “pure” data, cannot.
(The protection of algorithms is an important but separate topic, to which we
will devote a separate article.)

Excluding data as such from
copyright protection does not raise any doubts. A scientific fact (a finding of
a single event or property of a natural or social nature) falls outside
copyright protection (W. Machała & R.M. Sarbiński (eds.), Copyright and Related Rights: Commentary (WKP
2019), commentary on Art. 1). Copyright protects only a mode of expression and
not the skills and labour involved in finding and identifying a source of
information (J. Barta & R. Markiewicz, Copyright and Related Rights (WKP 2019), commentary on Art. 1).

Such an approach to data
can be found in most legal systems. For example, data is not subject to
copyright protection under US law (as one source puts it, “Data are considered ‘facts’
under U.S. law. They are not copyrightable because they are discovered, not created
as original works.”). European Union law also indicates that copyright
protection cannot be extended to “mere facts or data” (Recital 45 of the
Database Directive (96/9/EC)).

Individual character

Although the inability
to consider data as a manifestation of creativity generally excludes its
copyright protection, it is worth considering whether data has the individual
character required of works. This is a question of whether data is a result of
a technical activity whose effects are predicted in advance, or results from a
personal approach to an issue, an author’s vision (D. Flisak, Commentary on selected provisions of the Act
on Copyright and Related Rights (LEX/el. Commentary 2018), commentary on Art.
1). (Defining who can be a data creator goes beyond the scope of this article.)

Data is not subject to
processes aimed at giving it an individual character. It is collected or
generated as information about specific processes or events. Therefore, data is
the result of a schematic and repetitive process, which is a purely mechanical
activity and not an effect of a creative vision.

Is a collection or compilation of different data a work?

This assessment is not affected
by whether we are dealing with one data point or a huge amount of data.
However, if we collect a large amount of data and create a collection or
database from it, the selection of data adopted by the creator, its arrangement
or compilation, may be protected by copyright. Nevertheless, the data contained in it
is still not protected under copyright law. Only the selection of data or the
way it is presented will be a work, not the data itself.

To illustrate this, we
might consider the example of board games. Copyright protection does not cover
the idea of a game, its logic or rules; only the board (specific graphics,
details, colours, etc.) and other elements used in the game (e.g. playing pieces,
cards, markers) having a creative and individual character are a work. Applying
this to a set of data, only the mode of expression bestowed by the author can
be protected, e.g. colourful charts that aggregate data generated from online
sources in an unusual way, as well as the layout, arrangement or
composition resulting from the author’s creative approach. Protection can also
be sought in the way the elements of a data set are selected, if a creative, non-obvious
selection of data is made on the basis of specific, subjective factors adopted
by the author (e.g. a list presenting the one hundred most important books in
the history of literature, in the opinion of the creator of the list).

However, it must be pointed out that this applies only to a selection made by a human. If data is selected by a machine, the selection will not be protected, because only human creations are subject to copyright protection (some time ago we considered who actually owns the copyright to a work created by a robot).

Moreover, a set of data
cannot be considered creative if the collected data exhaust the list of all
possible elements of the set, e.g. a catalogue of all pharmacies operating in a
given locality (D. Flisak in Commentary on
selected provisions of the Act on Copyright and Related Rights (Gdańsk 2018), commentary on Art. 3).

In principle, the same
copyright protection rules for sets of data apply in the EU and the US (see
Recital 3 of the Database Directive (96/9/EC); and as an American source advises, “Although data itself
cannot be copyrighted, you may be able to own a copyright in the compilation of
the data. Creative arrangement, annotation, or selection of data can be protected
by copyright.”). Comparable rules are also laid down in international agreements
(Art. 10(2) of the Agreement on Trade-Related Aspects of Intellectual
Property Rights and Art. 5 of the WIPO Copyright Treaty).

Conclusions under current regulations

Non-personal data, in
particular data collected or generated by machines, does not constitute a “work”
and is therefore not protected by copyright. It represents facts that cannot be
considered a manifestation of human creative activity with an individual
character. It is information about reality and ongoing processes, and establishing
it involves discovering or deducing information, not creating a solution in a creative
process. The fact that large amounts of data may be used to create something
innovative or improve a service does not mean that the data itself is protected
by copyright. Under the Polish Copyright Act, protection may at most apply to a
creative selection, arrangement or combination of data, i.e. an original way to
express a set of data, and not the data contained in it.

What does this mean for
companies collecting or generating data? From a copyright point of view, the data
they hold is unprotected. As a result, no one can claim copyright in such data.
Copyright law does not prohibit the collection or use of data generated by third
parties, and the “owner” of such data has no copyright instruments to prohibit
others from using it.

This leads to the
conclusion that under current regulations, protection of data must be sought outside
the scope of copyright law. This is clear. But where exactly to find protection,
and whether the available protection meets the needs of the market, is
debatable. It seems that protection can be found in the rules on trade secrets and
know-how and, more narrowly, in the specific law governing databases. Whether
the protection provided for there covers data as such (and if so, to what
extent it meets the needs of entities whose business model is based on data)
will be addressed in separate articles in this series.

Proposals for the future

It can already be argued
that lawmakers should immediately address the issue of data and consider a new
law regulating non-personal data, in particular machine-collected or -generated
data. The specific nature of data and how it is created requires special
solutions, and its increasing economic value calls for work on such solutions to
start at full speed.

For a new right to be
adapted to real economic conditions as much as possible, it must be subject to
broad discussion, on both the content of such a right and on who should be
entitled to it. On the latter issue, if natural persons are the source of
certain data which companies later use to develop their own offers (for
example, data collected by apps for runners), should the interests of the
individuals from whom the data is generated be taken into account, and their
contribution adequately rewarded? If so, how to identify, quantify and
distribute the benefits?

We are faced with many questions that need answers. The lack of regulations and legal clarity can only slow the potential growth of the economy.

Paulina Mleczak, Lena Marcinoska-Boulangé