Jan. 19, 2022

Academic piracy

Author: Georg Fischer

Researchers and students sometimes take illicit paths to access scientific texts

Over the past two years I have visited my university library only once: the day I had to return a book that I borrowed in January 2020 before the pandemic restrictions began. By “my university library” I mean the library of the university where I graduated some time ago – and where I have been an alumnus ever since. With alumnus status, I can use the services of the library as I did previously as a doctoral student – for research, loans, interlibrary loans, copy and scan orders, etc. This is a great help, almost a necessity in my scientific and journalistic work. Especially before the pandemic, I made active use of these offers. But what the university library could not fully do even before the pandemic was to provide me with all desired essays, anthologies and monographs at short notice.

Between intellectual desires and material possibilities

As a sociologist and journalist interested in both history and the present, I would like to look at offsite historical sources as well as new publications. I want to identify trends and turnarounds and new fields of research. I want to do targeted research, to stay up to date. I want to know what colleagues publish and where citation cartels are formed. I want to check sources and see what else can be gleaned from them. And, sometimes, I want to drift in the ocean of references without having a goal, to lose myself in rabbit holes, to descend into footnote cellars.
In short: I need open access and a wide selection of scientific literature. But I will not, and cannot, spend 30 euros or more on every book chapter or essay that could be interesting – and which I want to view beyond its abstract – as the (digital) offerings of the major academic publishers require.
This gap between my intellectual desires and material possibilities is a problem. I became aware of it while researching and writing my dissertation between 2014 and 2018. While working from home due to the pandemic, the problem continued to worsen for me. Colleagues related similar difficulties. Almost all my colleagues who study, research, or teach and are unable to access the physical library seem to be affected.

A fantastically profitable business model

Simon Frith wrote about the music industry at the end of the 1980s:
For the music industry the age of manufacture is now over. Companies (and company profits) are no longer organised around making things but depend on the creation of rights. […] [T]he company task is to exploit as many of these rights as possible […].
For the academic media industry (i.e., the major scientific publishers), this insight applies almost ideally. Elsevier, for example, achieves profit margins of more than 30%. This is now also perceived as a problem within science itself.
The basis for this is a business model that makes maximum use of the copyrights of the published authors: scientists write academic texts, in most cases ensure their quality, and give them to scientific publishers who formally prepare these texts – and then sell back access to the texts to the libraries (or to interested individuals) by means of subscription models.
Unlike the music industry or the newspaper industry, for example, the academic media industry was able to transfer its business model into the digital age relatively unscathed. For some years now, however, cracks have been forming in the walls that scientific publishers have built around the texts they market. Similar to the MP3 crisis of the music industry, users are applying digital tools to overcome the paywalls of publishers.

Designing accesses, laying tunnels, overcoming paywalls

Among these users are not just researchers, students and teaching staff, but also journalists and the interested public who wish for open, free access to scientific findings. To this end, they use resistant, sometimes guerrilla-like strategies to circumvent, undermine and open up the (digital) restrictions of the major publishers.
The best-known example is probably the shadow library SciHub (short for Science Hub), which currently provides access to more than 75 million documents. It was founded in 2011 by Alexandra Elbakyan, who was frustrated that she could not access the required texts from her place of residence in Kazakhstan. Elbakyan decided to program an automatic circumvention of restrictions: in order to obtain a certain text, SciHub pretends to be a library that has already acquired access to this text. SciHub tricks the publisher's website into believing that the request comes from an IP address that belongs to the library in question. The text becomes accessible via this digital tunnel and the user can download it as a PDF. Access granted.
To ensure the highest possible availability, SciHub works together with the Russian shadow library LibGen (short for Library Genesis). Beginning in 2008, LibGen aggregated various collections of texts circulating in Russia and put the entire corpus online. In 2014, LibGen offered about 25 million documents, including scientific literature and works of fiction in various languages, resulting from mass downloads of repositories and leaks from university networks and publishers.
Together with LibGen, SciHub is an example of a technically delegated, automated circumvention strategy allowing users to circumvent the copyright issues of accessing literature. This gives them flexibility in the short term but does not solve the underlying problem in the long term.
The whole thing remains a cat-and-mouse game: the sites have to regularly change their top-level domains to evade the reach of governmental authorities. Instructions on how to bypass the locks circulate quickly.
Of course, shadow libraries are a thorn in the side of the major publishers – so much so that they have even considered having university libraries install surveillance software in order to track (illegal) access from university networks. A paper by the German Research Foundation (DFG) recently described the practice of data tracking and addressed it as a problem.
Following a lawsuit by several major publishers, SciHub did not add any new texts to its database for a few months in 2021. On Reddit, a rescue operation formed for almost 80 terabytes of scientific texts, which are meant to remain available to users via the file-sharing system BitTorrent. In addition, the programmer Elbakyan has been targeted by governmental authorities: according to Elbakyan, her Apple account, for example, has been monitored by the FBI for the last two years. In September 2021, SciHub celebrated its ten-year anniversary by uploading more than 2 million new articles.

From peer to peer

SciHub and LibGen are well-established technical solutions based on a central distribution principle. In contrast, decentralized peer-to-peer practices have also established themselves on social media.
Under the hashtag #IcanhazPDF users make search queries for scientific texts. Colleagues with access to the desired texts can see who is looking for what and help. The associated Twitter account formulates an etiquette for such search queries containing three rules:

  1. DOI link
  2. Your email address
  3. Delete tweet once PDF received

The third rule in particular serves to protect sources and to prevent possible legal warnings. In most cases, sending a text violates the copyright of the publisher – although the authors are allegedly less likely to object to it, as they benefit from their work being known, read and quoted by colleagues.
The practice around the hashtag appeals to the community idea of globally networked science and entails a particular principle of the exchange of gifts: the greater the willingness of individuals to provide texts, the higher the general chance of receiving the desired text. At the same time, #IcanhazPDF expands the common practice among researchers to recommend and send each other literature by email.
#IcanhazPDF does not set any limit to scientific disciplines or years of literature sought. In using the hashtag, researchers also provide some insight into their own work, exhibit their own consumption of literature, and show which texts they (want to) receive.
A similar practice takes place in the exchange of literature via Telegram groups. The messenger service was originally developed by Russian programmers and has officially relocated its headquarters to Dubai. Telegram sometimes receives public criticism: not only can short messages be sent free of charge, but the service also provides the technical infrastructure for illegal activities, such as drug trafficking, the spread of conspiracy ideologies, or the organization of political upheavals.
Telegram is obviously weakly regulated. And this is one apparent reason why the sharing of scientific literature also takes place within self-organized groups on the platform. Similar to #IcanhazPDF on Twitter, students and researchers make a text request in one of the various groups, often combined with an indication of the library at which the text is digitally accessible.
The sending of the desired texts usually takes place in private messages. Presumably, this is a safety feature so that the senders cannot be prosecuted for making a copyrighted text publicly available to a large audience.
If you are interested in a text that another person has requested, you can express this via text abbreviations or emoticons. This shows that academic piracy, while legally questionable at best, serves the general goal of scientific exchange and can lead to substantive recommendations of literature.
Telegram groups arose in part as a reaction to the pandemic situation, which cut many students off from access to scientific literature and thus made a technical and solidarity-based solution necessary. In some cases, comparable offers existed even before the pandemic.

Shadow libraries as a viable but illegal practice

I admit that I not only research copyright, but also break copyright laws from time to time. Without the techniques and offers listed above, which are largely in the shadow of the law, I would not have been able to write my dissertation, for example. And while the pandemic has confined me to working from home, for me – as for many others – legal access to scientific literature through libraries is deteriorating. For the year 2020, academic libraries in Germany recorded an average decline of 25 percent in the number of loans compared to the previous year.
Regardless of the pandemic, the shadow libraries and techniques described above are an indication that formal copyright rules (and their application by publishers) collide with the actual wishes of users; students, researchers and the interested public want fast, direct, easy and affordable access to scientific literature. This is needed for studying, research, teaching, and pursuing education.
To achieve their own goals – such as writing a scientific qualification thesis – many students and researchers are willing to take advantage of legally questionable or even clearly illegal offers. They value the benefits of the illegal offers more than the resulting damage to themselves or the scientific publishing industry. Organizational researchers refer to this phenomenon as “useful illegality”, by which they mean the various practices, strategies and mechanisms that are illegitimate or illegal, but necessary to keep an organization running and give it flexibility in the application of formal rules.
Many people may also see no point in paying out of their own pockets for access to research that has already been publicly funded. They bypass the helpful and absolutely necessary, but usually insufficient, offers of their university libraries and forge new paths that help them better reach their access goals. In the case of peer-to-peer exchanges, users sometimes slip into the role of librarians themselves to do their colleagues and fellow students a favor, helping to obtain the desired literature and, if necessary, recommending more. A para-librarian structure is thus created and consolidated.

Desire paths: Not seen as part of the problem, but as part of the solution

Bypassing hurdles relates to a phenomenon that urban planners call “desire paths”: these are defined as “paths and tracks made over time by the wishes and feet of walkers, especially those paths that run contrary to design or planning”. Urban planners have recognized the potential of such desire paths: for example, on a university campus in Michigan, users were explicitly allowed to create the paths they wanted between buildings. Later, these organically generated trails and sneaky backways were developed into official paths.
With digital ways of sharing academic texts (such as SciHub or #IcanhazPDF), desire paths also emerge in the digital sphere. They provide shortcuts between the official structures of libraries, albeit through informal, illicit, or even illegal means. The music industry was the first media industry to be turned upside down by the media break of digitization and increased copying possibilities of users; user-generated side paths became the main arms of digital music distribution that could no longer be ignored.
After the music industry initially fought hard against MP3s, illegal file sharing was gradually transformed into a legal business model. iTunes and Spotify appeared as external players with convenient digital offerings, and labels and publishers had to move. Gradually, the understanding spread that MP3 streaming on platforms could fulfill many users' desires for easy and fast access to music – a convenient shortcut compared to the ways of the CD business. In addition, it activates fans as resources; they can recommend music to their friends, curate it in playlists, and rate, collect, comment on, and pass on music on social media.
In science, a similar process seems to have started with shadow libraries. It is certainly worth considering them not as part of the problem, but as part of a solution to an underlying problem in the procurement of scientific literature. For this, the wishes of users for a useful and legal system for obtaining scientific literature would have to be taken much more seriously – and not fought as copyright infringements.

This text was originally published in June 2021 at Verfassungsblog. For the SFB Blog, it was translated into English and slightly adapted. Both the original and the adapted version are licensed under CC BY-SA 4.0.