Please mind the datachasm

Überwachungskameras in Berlin. Wikimedia Commons/Autorenkollectiv. Some rights reserved.

It was early morning when the German police stormed the house of Andrej Holm and his family. Armed police searched the home for 15 hours and Holm spent three weeks in pre-trial detention. Andrej Holm is an urban sociologist. His crime? Well, he committed no crime, or no more than the rest of us. He was finally released after public pressure, and the discovery, presumably, that he wasn’t a terrorist after all.

At a time when we are flooded with government demands that social media companies yield more information and that encryption be open to the state, when the UK government is extending the already vast surveillance powers it took under RIPA with dire laws like DRIP, and when the internet is harvesting ever more of our data, this should give us pause for thought.

This is, in a way, a call to more thought, much more thought, than has previously been given to these issues in public discourse. It is shocking to witness the resignation or apathy at the vast amounts of data harvesting revealed by Edward Snowden.

Holm had been noted by the security services, perhaps initially by automatic systems, for using words like ‘marxist-leninist’ and ‘gentrification’. He fitted the profile of some bombers the security services were hunting down. Unknown to him, he and his entire family had been under close scrutiny for a year before the raid took place. The police had gathered large amounts of data on him, put it all together, and come up with an entirely erroneous conclusion: that Holm was a terrorist. The mistake inflicted serious trauma on him and his family.

Statisticians will tell you that as a data set gets larger, the likelihood of false correlations appearing rises. The police and security services of most developed countries now collect so much data on their citizens that a vast amount of false correlation is surely bound to occur. But perhaps the really scary aspect of Holm’s story is that it was under intense human-led scrutiny that Holm’s life rendered to the police the picture of a terrorist. They began to interpret things like him leaving the house without his mobile phone as indications that their suspicions were correct. Welcome to one half of the datachasm.
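The statisticians' point can be tried out in a few lines. What follows is only a rough sketch, not a model of any real surveillance system: it assumes that a 'larger' data set means more attributes recorded per person, and the function names, the 50 imaginary people and the 0.3 threshold are all invented for illustration. Every attribute is pure random noise, so any pair that appears to move together is a false correlation by construction.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious_pairs(n_attributes, n_people=50, threshold=0.3, seed=0):
    """Count attribute pairs that look correlated purely by chance.

    Hypothetical setup: each attribute is independent random noise, so
    every pair whose correlation clears the threshold is a false one.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_people)]
        for _ in range(n_attributes)
    ]
    return sum(
        1
        for a, b in combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    # More attributes means many more pairs to compare, and so more
    # chances for noise to masquerade as a pattern.
    for n_attributes in (5, 20, 80):
        pairs = n_attributes * (n_attributes - 1) // 2
        hits = count_spurious_pairs(n_attributes)
        print(f"{n_attributes:>3} attributes, {pairs:>5} pairs, {hits} look correlated")
```

The number of pairs grows roughly with the square of the number of attributes, so the count of chance 'patterns' climbs quickly even though nothing in the data is related to anything else.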

The datachasm is the gulf between us and our data. As available-to-the-state tracking systems grow – Google, Facebook, your phone, PayPal, email, credit cards, travelcards, ANPR, facial recognition – we have ever less control of our data. The datachasm has two cliffs to it: the first is the infinite complexity of everyday social life and the embeddedness of humans. Data, as it is managed, can only simplify and individualise. It has to be separated from relationships, which deal in complexity and embeddedness. There is a massive gap between our actual selves and our selves as represented in data. As Holm discovered, the surveillance systems only see a dataself, not you.

The second cliff of the chasm is that you are not in control of your data, nor in control of how people use it. Your data flows from one system to another, often without your knowledge, for purposes you have not sanctioned. The gulf between us and our data, the datachasm, could be said to be a new type of alienation in the world.

Data creep and the real you

The datachasm was not inevitable: it opened up partially through technology, but there are at least two other tendencies that bear some of the blame. One is the tendency in western society to believe in data on the page as truth. The other is the phenomenon of data creep.

I recently stumbled across an excellent illustration of the lack of control we have over our data – or should we say ‘their data’. A Freedom of Information request was made by James Bridle to Transport for London (TfL) about Metropolitan Police Service (MPS) use of their Automatic Number Plate Recognition (ANPR) data. It was already known that the police had been asking TfL for data for ‘national security purposes’, the only agreed reason for which they could ask for it. They then used the data to catch ordinary criminals, even though their contract with TfL explicitly restricted its use to national security purposes. This is data creep between and within official bodies – which in theory have the strictest rules controlling data. But the scariest bit about the story is not that it happened; it is the correspondence about it. Here is an email from TfL to the Met:

Notwithstanding that, if it came to TfL’s attention that the data was being used for other purposes and that the MPS was, in effect, making section 29 disclosures of the data internally, I do not consider it likely in present circumstances that TfL would seek to enforce the parts of the contract that restrict the use of the data to national security purposes. This reflects that fact that the contract does not, of course, take account of the Mayor’s intention that the MPS and TfL should begin sharing TfL’s ANPR cameras so that the MPS gain routine access to the data for non-national security purposes.

Do you like that ‘if it came to our attention’? Senior officials write all correspondence very aware that it might be subject to an FOI request, which raises the question of what they say in private, and the imbalance of data monitoring between us and them, but that is perhaps a side issue. What this email is saying is that the police ignored the law on sharing of data and use of data, and rather than apologise, they lobbied to get the law changed. Apologise? That would be the last thing they’d do, as an email to TfL by a Detective Superintendent Winterbourne makes abundantly clear:

We request data all the time from numerous bodies both public and private for use in investigating a particular crime. TfL is one of those bodies. Having made countless such requests over my years I don't recall ever having been asked for a legal position around how we intend to use it. These things seem to fall within a corporate and common sense understanding of what Police do on behalf of society.

And if you don’t find that comforting, Winterbourne has a more precise and belligerent explanation of his thinking in responding to particular lines from TfL (in bold):

Firstly, if we had a legal route by which to change the use to which data is put that would be sufficient in this case. That isn't how the MPS sees it. We have a long term plan for using ALL data for Crime.

Do TfL seek clarification of the legal provision by which Police can use data once we have it? This would be like asking for the legal position OF Police to BE Police. We use data and information to solve Crime, that is what we are for. Nobody else seeks reassurance that we have a legal provision to use data once we have it. I think we need to be careful here not to start including massive mission drift away from the real issues - which I would say don't include the legality of Police using data to solve crime once we are in lawful possession of it.

Got that? The police will use any and all data available in their mission to BE POLICE, both once they are in lawful possession of it, and, implicitly given the context of this exchange, when they aren’t. The police know where you drive. They will use that data wisely. Sleep safe.

The email exchange then moves on to talk of changing road signs to make sure drivers will know their data is being collected as they drive around the UK. But much of that data is already collected by the police, and I suspect many people don’t understand what the signs mean.

Are they really meant to? And what would they be able to do about it if they did know? One of the worst aspects of the datachasm is that it can’t be avoided: data will be collected on you, even if you are willing to pay the price of staying off social media, email, mobile phones and so on.

Your data is of course not just available to the state in vast quantities. It is also available to corporations, with Facebook and Google being the most obvious data harvesters. Christian Rudder of dating site OkCupid recently published a book, Dataclysm, based on data the site had collected, with his interpretations of what it all meant. Controversy briefly stirred over the fact that no users had given permission for their data to be used in that way, nor had they agreed to be experimented upon using methods such as fiddling match percentages. OkCupid has been completely unapologetic about this, as was Facebook when it admitted to having experimented on users. The companies aren’t even embarrassed.

It really isn’t your data, it’s theirs. This isn’t just data creep between different organisations or different parts of organisations, it is manipulation of your online self, and it is data creep from personal confessions on the internet into published books.

Rudder’s book brings us full circle back to the first cliff of the datachasm. The subtitle of the book is Who We Are (When We Think No One's Looking). You aren’t your real self in daily life, says Rudder, but your online selves, behind the anonymous usernames, are your real selves. And so he begins to draw conclusions about what people are really like, even though he has not done a single peer-reviewed study, nor had the data replicated or verified. I would urge anyone who thinks that internet personalities are the ‘real person’ to spend a few hours on YouTube comment threads, and then imagine all those people in their daily lives. Which is the real person? Philosophers have pondered for centuries whether there’s any such thing, but Rudder already knows – he has data.

Beyond the vague datacreeps

Any number of academics have, over many decades, attacked the use of statistics and other data as a form of control, particularly as part of what Foucault called ‘biopower’ – the control of populations and their bodies, supposedly for their own good. This confirms that the phenomenon of control-by-data isn’t entirely new, but we are undergoing a huge step change in data collection, due to the reduced cost of storing and processing data, that brings a new urgency to the debate.

This presents us with new challenges and casts old challenges in a new light. For instance, is the British concern about ID cards now a bit…outdated? While I still don’t want the state to demand that I carry a particular bit of plastic everywhere, the reality is that the tracking of my life already dwarfs anything that could be achieved with ID cards. Should we not begin to talk of a new ‘right’ here? Surely we could have the right to control data about us?

But we have not yet developed either the concepts or the public discourse to deal with the datachasm. People have gradually and belatedly become aware that their data is being harvested on a grand scale, but most of us are unsure what it might mean. All we feel so far is a vague unease. The datachasm gives us small moments of fear: adverts for things we hate but recently googled appear on an unrelated website, Facebook suggests links based on a site we visited but wouldn’t talk about publicly. Data creep gives us the datacreeps, but it does not yet give us the real fear.

If we do not develop the language to talk about this, there is no doubt that the gap between us and our data will deliver more of us to big, terrifying moments in the datachasm. As Holm and his family discovered, the datachasm is deep and wide, and grows deeper and wider every day. To have even a hope of closing it, we must have a much broader discussion about both what data means and who controls it.

Read more from our 'Closely observed citizens' series here.