Whose data is it anyway?

Paparazzi. Flickr/Trevor Butcher. Some rights reserved.

Yesterday I walked to the supermarket, like I do every Tuesday morning. All of a sudden I noticed a few people starting to follow me. I tried to convince myself that it was probably just my imagination, and carried on walking. After a few minutes, I crossed the road and made another turn, but then I looked behind me and saw that there were now dozens of people following me, taking pictures of me and writing rapidly, documenting my every move. After a couple more steps, they became hundreds. My heart was racing, I could hardly breathe, and I started to panic. Freaking out, I shouted at them, “Who are you? What do you want from me?” I tried to get a clearer view of this huge group - some looked a bit familiar but I didn’t remember where I’d seen them before. They shouted back at me, “Don’t worry, we don’t really know who you are, we just need some information on you, so we can show you different ads on billboards”. Puzzled by their response I screamed, “What do you mean you don’t know who I am!? You know my gender, skin/eyes/hair color, height, weight, where I live, the clothes and glasses I wear, that I have 10 piercings in one ear and that I shop at Sainsbury’s on Tuesday mornings!” They smiled and tried to reassure me, “But we don’t know your NAME, silly! So stop being so paranoid, we do this to everyone walking on the street, it’s public space you know...”.

This scenario might seem like science fiction to some people: a dystopian reality, a horror film or a South Park episode. But for those who recognise the situation, this is what actually happens every day when you browse the internet. Kind of creepy? Back in 2000, a marketing company CEO apologised for a similarly creepy practice. On March 2, Kevin O'Connor, DoubleClick’s CEO, released a statement saying that:

“It is clear from these discussions that I made a mistake by planning to merge names with anonymous user activity across Web sites in the absence of government and industry privacy standards… We commit today, that until there is agreement between government and industry on privacy standards, we will not link personally identifiable information to anonymous user activity across Web sites.”

O’Connor’s statement came after the huge backlash his online marketing company experienced following its $1.7 billion merger with Abacus Direct on November 24, 1999. The merger meant combining the dataset that one marketing company (DoubleClick, which was acquired by Google in 2008) held on people’s online behaviour with the dataset that another marketing company (Abacus Direct) held on people’s offline behaviour. The Electronic Privacy Information Center (EPIC) filed a complaint with the Federal Trade Commission (FTC) on February 10, 2000, arguing that the problems behind this merger “stem from the inability of the vast majority of consumers to either control the collection of information concerning Internet user behaviour or the linking of profiles with real identities”. Today, 16 years on, there are still no such privacy standards in place. Yet experimenting with people’s data for various reasons in ‘public’ spaces on the internet is still commonplace, usually without their knowledge or consent.

It’s not OK! Cupid

The truth is that different forms of experimentation are being carried out all the time on your online data in combination with your offline data, including your personal details and behaviour. Whether by governments, commercial companies, or online services, everything you do online is being analysed and sometimes sold.

On May 14, for example, it was revealed that a group of Danish researchers had released a data set of 70,000 OkCupid users on the Open Science Framework, containing details such as their user names, age, gender, location, and their political and religious views. The dataset also contained these people’s answers to the many questions that the dating service asks in order to algorithmically find suitable matches, answers that include very intimate details of their sexual preferences. The main researchers involved - Emil Kirkegaard and Julius Daugbjerg Bjerrekær - argue in an interview on Vox that “Some may object to the ethics of gathering and releasing this data… However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it [in] a more useful form”. When confronted by Twitter users who asked whether they had asked OkCupid for permission beforehand, Kirkegaard answered, “Don’t know. Don’t ask”, sealed with a smiley emoticon.

Since then the Open Science Framework has removed the data set, which apparently violated OkCupid’s terms and conditions and fell foul of the DMCA (Digital Millennium Copyright Act). OkCupid, it should be said, conducts experiments on its own subscribers as well. But many people had already copied the data, and it is circulating in various torrents across the world.

Several scholars have made fantastic contributions on this issue, pointing to various ethical aspects that researchers, whether working in academia or in business, need to take into consideration. Michael Zimmer discusses the problems of big data science and argues that ‘public’ does not mean consent. Zimmer even approached the Danish researchers with important questions about how they managed to get this supposedly public data, but received no answer. Annette Markham argues that more courses in ethics need to be integrated into academic fields, specifically computer science.

Importantly, Markham argues that this ethical distinction between data being in public or private places is a myth maintained by legal and regulatory systems. As she argues: “This is no longer a viable distinction, if it ever was. When we define actions or information as being either in the private or the public realm, this sets up a false binary that is not true in practice or perception”.

Privacy as experience

But privacy problems in media and communications are not a new thing. Esther Milne, for example, writes in her book Letters, Postcards, Email: Technologies of Presence about notions of privacy in the postal system dating back to the seventeenth century. Postcards, as she argues, presented a new challenge to perceptions of the postal system because the content of communication was ‘open’ for everyone to read. Milne points to a postcard correspondence of two British brothers during World War I, William and Elsie Fuller, who managed to maintain intimacy although there was censorship on the kind of things soldiers could write. They knew that all of their messages would be read thoroughly, and were able to adjust them accordingly. Privacy, as Milne argues, is an experience which is performed and therefore does not pre-exist in particular designated spaces. Privacy, then, is dependent on context and intimacy; it is negotiated between the people who participate in a communication process. So contrary to many legal discourses, privacy is not an objective situation that happens in particular places which were constructed as private or public.

Similarly, Kate Crawford and Jacob Metcalf rightly point out in their research that data is a matter of contextuality and temporality. What you say to a particular group of people is adjusted to the particular context, time, and place. If young people post drunk photos of themselves after a night out, they do not necessarily know that this action will influence their job prospects a few years later. This is exacerbated because the default setting of many services is to archive people’s behaviour in order to build richer profiles, which are then monetised. This is also why many young people are moving to services such as Snapchat and WhatsApp: they refuse to accept the assumption that everything they do needs to be documented forever. They also want to be able to control who is part of the communication experience, so they can avoid the ‘context collapse’ that, as danah boyd and Alice Marwick argue, often happens on Facebook, Twitter and other services.

Not only about privacy

These notions of protecting only private spaces can be seen in the Charter of Fundamental Rights of the European Union, where people have the right to respect for private and family life (Article 7), and specifically to the protection of their personal data (Article 8). In the Universal Declaration of Human Rights (UDHR) this right is phrased slightly differently in Article 12, whereby people have a right not to be subjected to arbitrary (whatever that may mean) interference with their privacy, family, home or correspondence. As these rights indicate, the perception that data about ourselves and our lives can be either personal, private or public is ingrained in legislation and rooted in wrong perceptions that have a long history.

On a technical level, the problem with this, as other researchers have shown, is that any anonymised data can be re-identified or de-anonymised in combination with other data sets. This makes the notion of ‘personal data’ flawed. It is flawed not only because a person’s data can be ‘singled out’ from other people’s, as computer scientists and European law argue, but importantly because people’s experience of a (private) communication situation depends on context, time and their imagined audience. As David Beer argues, people are disciplined and shaped by measurements and multiple metrics conducted on their data, and this makes them feel anxious and insecure. Therefore, much beyond the ‘privacy’ issue, this is a matter of human autonomy, which links to a much less discussed human right: freedom of expression (Article 11 of the Charter, and Article 19 of the UDHR). It is about what we can do and say in the online environment, and about our rights to know, understand, agree or withdraw (if we later regret) and control what others can do with this data. It is about autonomy.
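The re-identification point above can be made concrete with a minimal sketch of what is often called a linkage attack. Everything in it is invented for illustration: the records, the field names and the choice of postcode, birth year and gender as matching fields are assumptions, not details taken from any of the cases discussed here. It only shows the general idea that a data set with no names can regain them once it is joined with a second data set that shares a few ordinary attributes.

```python
# Hypothetical illustration: every record, field name and value here is invented.

# An "anonymised" online data set: names removed, ordinary attributes kept.
anonymised_profiles = [
    {"postcode": "AB1 2CD", "birth_year": 1985, "gender": "F", "interest": "dating sites"},
    {"postcode": "EF3 4GH", "birth_year": 1990, "gender": "M", "interest": "payday loans"},
    {"postcode": "AB1 2CD", "birth_year": 1972, "gender": "M", "interest": "health forums"},
]

# A separate data set that does carry names, e.g. an offline loyalty-card list.
named_records = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1985, "gender": "F"},
    {"name": "Bob Example", "postcode": "EF3 4GH", "birth_year": 1990, "gender": "M"},
]

# Attributes shared by both data sets (assumed for this sketch).
SHARED_FIELDS = ("postcode", "birth_year", "gender")


def link(anonymised, named, keys=SHARED_FIELDS):
    """Attach a name to each anonymised record whose shared fields
    match exactly one record in the named data set."""
    index = {}
    for person in named:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for profile in anonymised:
        names = index.get(tuple(profile[k] for k in keys), [])
        if len(names) == 1:  # a unique match singles the person out
            matches.append((names[0], profile["interest"]))
    return matches


reidentified = link(anonymised_profiles, named_records)
for name, interest in reidentified:
    print(name, "->", interest)
```

In this toy example two of the three ‘anonymous’ profiles get a name back, even though neither data set on its own links a name to an interest. Real data sets are messier, but the more attributes two data sets share, the more often a match is unique.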

The problem

“The problem”, as Eben Moglen explained recently, “is that the Internet today undermines human autonomy... The problem begins with advertising”. Collection, categorisation, and experimentation on people’s data are presented as legitimate because online advertising funds the free internet. These practices were also, of course, carried out in the ‘offline world’ of marketing, but the digital affordances of the internet (easy, fast and cheap access, storage, duplication, interception, and transmission of information) allow them on a much larger scale. These practices, in turn, influence the way people behave and feel online, and the environment they operate in changes accordingly (mostly in a ‘personalised’ way). Such issues also arise in the protocols and architecture that govern the internet, as Article 19 show in their recent documentary A Net of Rights, which connects human rights to internet protocols.

Trying to offer guidelines for considering these new ethical dilemmas, the Council for Big Data, Ethics and Society has recently published a White Paper discussing the problematic issues of different (big) data research practices and norms. It provides valuable recommendations for institutions and companies conducting such research, urging them to look beyond the binary private/public paradigm and focus more on the potential use of these datasets. Another useful source is the Internet Rights & Principles Coalition’s (IRPC) Charter of Human Rights and Principles for the Internet, which provides a framework for internet rights and extends the UDHR to other important aspects specifically related to the internet. These include expanding freedom of expression to a freedom of online protest, freedom of choice of system and software use, the right to use encryption, and the right to privacy policies and settings that are easy to find, use and manage.

What dodgy experiments like that of the Danish researchers do is to open up much-needed discussion on the ethics and norms of online experimentation and the need for global standards that apply to all actors concerned. So don’t creep about, keep calm, and fight for your digital human rights, smiley emoticon.