My most recent publication, The Copyright Surveillance Industry, appears in a special surveillance-themed issue of the open-access journal Media and Communication. In it, I examine the industry that has developed to monitor the unauthorized use and distribution of copyrighted works online. The same companies often help to facilitate copyright enforcement, targeting either allegedly infringing content or the persons allegedly engaged in infringement. These enforcement actions include the sending of vast numbers of algorithmically generated takedown requests to service providers, the blocking of uploaded content that matches the characteristics of certain files, and the lawsuits filed by “copyright trolls” and law firms engaged in “speculative invoicing”.
The scale and scope of the copyright surveillance industry
An interesting fact about the copyright surveillance industry, given the scale of its interventions (for example, hundreds of millions of Google takedown requests, and copyright trolls targeting hundreds of thousands of defendants in both the US and Germany), is the industry’s relatively small size. It is certainly much smaller than the multibillion-dollar industry that develops technological defenses against infringement (known as digital rights management [DRM]), or the billions of dollars flowing through surveillance companies that serve police, security, and military clients. Copyright surveillance companies with just a handful of employees can leverage algorithmic methods to achieve online coverage on a massive scale. While some of their methods are closely guarded (notably, copyright trolls typically avoid proceeding to trial, where their evidence would be subject to scrutiny), small teams of academics working with limited resources to track online file-sharing have achieved similar results.
The first wave of copyright surveillance companies was founded in 1999 and 2000, during the rapid rise of Napster. As file-sharing moved to other platforms, new firms sprang up and some were bought out by larger players. In 2005, MediaDefender (one of the more notable firms at the time, with major music, film, and software clients) was bought for $43 million. Another notable surveillance company, Media Sentry, was bought for $20 million in the same year. This appears to have been a time when enthusiasm for the industry was high. Four years later, Media Sentry was sold to MediaDefender’s owner for less than $1 million. Subsequent acquisitions have involved undisclosed amounts of money, but this is generally an industry that deals in millions and tens of millions of dollars, and in which a large company might have several dozen employees.
Today, larger and more notable copyright surveillance companies include Irdeto and MarkMonitor, both the product of industry mergers and buyouts. MarkMonitor, which bought the prominent tracking firm DtecNet in 2010, was reported to have 400 employees in five countries in 2012. Irdeto entered the copyright surveillance market in 2011 when it bought the monitoring firm BayTSP and its 53 employees. These companies offer copyright monitoring and enforcement as just part of their “anti-piracy” or “brand protection” services. There are also smaller and more dedicated companies such as Evidenzia in Germany and Canipre in Canada, and more shadowy players such as Guardaley and its various alleged “shell companies”. Copyright owners (or the law firms that represent them) will seek out and hire these firms. Alternatively, surveillance companies drum up business by approaching content owners, informing them that their content is being “pirated”, and offering their services.
Algorithmic surveillance
I’ll discuss copyright trolling and identification based on IP addresses in a subsequent post, but I want to take this post to discuss the sort of algorithmic surveillance commonly used in copyright enforcement. We see algorithmic surveillance wherever there is lots of data to scan and not enough discerning sets of eyeballs to go around, but the copyright surveillance industry has, since its beginnings, been driven by the need to comb through vast online domains, and to do so quickly and inexpensively (ideally, with as little human intervention and supervision as possible).
Much of what is reported, removed, blocked, or flagged as a result of these algorithms is rather uncontroversial from the perspective of copyright law. That is to say, a court might support the algorithm’s judgement that a particular act or piece of content counts as copyright infringement. But algorithms inevitably make mistakes, some of which are so ridiculous that it is clear no thinking human was involved in the process. These include misidentifying promotional content, such as official websites and advertisements, as copyright infringement. In at least one instance, a copyright enforcement company misidentified its own notices of infringement as actual instances of infringement and issued a takedown notice for them, resulting in a sort of algorithmic feedback loop. These automated misidentifications also result in the removal of legitimate content belonging to other copyright owners. In one 2011 case, Warner Brothers was accused of repeatedly and willfully issuing mistaken takedown requests. In response, the company essentially argued that it believed its identifications were accurate at the time, and that the mistakes were not willful because the volume of infringement meant that human beings were unable to fully supervise its automated monitoring.
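To make these failure modes concrete, here is a minimal sketch of how a purely keyword-based matcher could produce them. It is an illustration only, not the method of any company mentioned here: the `Page` type, the `naive_flag` function, the example URLs, and the title "Example Movie" are all hypothetical, and real systems may rely on other signals such as file hashes or audio and video fingerprints.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A hypothetical crawled page: its address and its visible text."""
    url: str
    text: str


def naive_flag(pages, protected_title):
    """Return the URL of every page whose text mentions the title.

    Assumption for illustration: the matcher looks only at the words on
    the page, with no check of who published it or why.
    """
    needle = protected_title.lower()
    return [page.url for page in pages if needle in page.text.lower()]


pages = [
    # A plausible true positive.
    Page("https://example.org/torrent/123",
         "Download Example Movie full HD free"),
    # Promotional content from the copyright owner itself.
    Page("https://example.com/official",
         "Example Movie: official site and trailer"),
    # An earlier infringement notice that quotes the title.
    Page("https://example.net/notices/456",
         "Takedown notice: infringing copy of Example Movie removed"),
    # An unrelated page.
    Page("https://example.org/blog", "A post about gardening"),
]

flagged = naive_flag(pages, "Example Movie")
for url in flagged:
    print(url)
```

In this sketch the matcher flags the official site and the earlier notice alongside the download page, because all three mention the title. Flagging the notice is the feedback loop described above: each new notice quotes the title and so becomes a candidate for the next takedown.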
While there are plenty of examples of algorithms behaving badly in the world of copyright enforcement, it is important to remember that what counts as copyright infringement is often not an easy determination to make. Courts continue to struggle with copyright law’s grey areas, with judges disagreeing on a variety of issues. This is particularly the case with various kinds of “user-generated content”, such as mashups, home videos, or parodies uploaded to YouTube. To make things worse, copyright owners often tolerate or even encourage unauthorized uses of their work online (such as fan videos and other forms of fan culture). Expecting algorithms to adjudicate what counts as infringement in these circumstances has more to do with the business models of the web and media industries than with copyright law. The same can be said for the expectation that users can identify in advance which of their actions count as infringement, and that users who are mistakenly targeted can appeal algorithmic errors when they occur. Ultimately, however, copyright law supports and legitimates these practices, given that the potential penalties for not playing ball with copyright owners far exceed the consequences for abuse or automated carelessness in copyright enforcement.
Internet and digital technologies have opened new possibilities for individuals to create, consume, and distribute content. However, areas of contact between individuals and copyright owners have also increased. Legal and extra-judicial copyright enforcement mechanisms are being employed on a mass scale, based on questionable identifications of individuals and content, and often with limited recourse for those affected. We are likely to see continued calls to make the algorithms involved more accountable, and for ways to determine who can be held accountable for an algorithm’s decisions.