
Weapons of maths destruction

How big data increases inequality and threatens democracy through opacity, scale and damage. An interview.

Leo Hollis (LH): In your book there are three keywords that you use early on to summarise the problems that big data and access to it bring about: opacity, scale, and damage. In terms of opacity, it's the ability to collect information on a level that we've never seen before and that is something of a black box to ‘us’, normal citizens. Do you feel that that is something that has occurred without us noticing?

Cathy O’Neil (CO): It depends on who ‘us’ is referring to, but I would say yes. I would say that we've been hearing the same line coming from the architects of the internet, which is essentially, “Well, you get something for your data. You get this free service,” for whatever reason. Even though they call it a trade, they say, “It's free.” You're paying for this service with your data, and people are willing to do that.

That's been the line, but, of course, the real problem is that there's a lag of maybe 10 years or so between when people start collecting data and when they start really using it; preying on people because of what they've learned from that data. We haven't hit the lag yet, so we don't actually know if that trade has worked for us.

LH: So if you're not paying for the product, you are the product?

CO: That's true too, yes. One thing I learned from Occupy is the lens of power. Most of those situations where you're giving data to the people that may or may not score or see you fairly, it's an important decision that you have no control over, no view into, and no appeals process.

That opacity you're talking about – and destructiveness, for that matter – are in the context of people who you need to make happy. They ask you for your data and you have to give it to them. When you're trying to get a job, you have to answer the questions they ask you. Or when you're being sentenced to prison. Or when you're trying to get into college. When you have a power disadvantage, your data is up for grabs.

LH: And the people who are asking for that information are still a wide variety of power sources? Quite a lot of the examples you use in the book are the state bodies that have been provided with a certain kind of algorithm, usually commissioned or bought from a private company. It seems to me that what we've been sold is a sense that all software is basically neutral. It's just how you use it.

CO: The example I give in the book is about making dinner for my kids, because actually algorithms happen in our own heads. We don't have to formalise them. I first of all talk about curating the data, which in this case is just the ingredients in my kitchen: what am I going to cook dinner with? I will not cook dinner with ramen noodles. That's not food to me. That's my teenager's favourite food, but it's not my favourite food.

Every algorithm has a definition of success, and we optimise to the definition of success, obviously. Just by its very name, it carries our values: what do we think matters? I define success for a meal to be if my kids eat vegetables.

My sons would not agree with that, especially my eight-year-old, who loves Nutella. His definition of success would be, “I get to eat Nutella.”


That's kind of the perfect metaphor for any time you see an algorithm. It's called an objective, but there's always an embedded agenda in its definition of success – and, for that matter, how it treats errors. That's a very, very important thing. We usually optimise to accuracy, but we also optimise to the false-positive and the false-negative error rates, which can really, really matter, depending on what kind of errors we're talking about.

If it's a hiring algorithm, a false positive is when a company hires someone even though they aren't going to be good for the job. That's probably a mistake that the company wants to avoid, so they're going to optimise away from false positives. A false negative is when somebody who is totally qualified doesn't get the job.

The company will probably not even be able to measure the false negatives, if you think about it. If they never hire someone, they'll never even know whether that person was qualified, and as long as the company is getting qualified people, they don't really care.
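The asymmetry described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of any real hiring system: the `Applicant` class, the scores, the thresholds and the applicant list are all invented for the example. It assumes a company that hires anyone whose score clears a threshold and only ever learns the truth about the people it hires.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    score: float     # hypothetical model score (made up for this sketch)
    qualified: bool  # ground truth; the company only learns this for hires


def confusion_counts(applicants, threshold):
    """Count errors of a hire-if-score>=threshold rule, using full ground truth."""
    false_pos = sum(1 for a in applicants if a.score >= threshold and not a.qualified)
    false_neg = sum(1 for a in applicants if a.score < threshold and a.qualified)
    return false_pos, false_neg


def company_view(applicants, threshold):
    """What the company can actually observe: outcomes for hired people only."""
    hired = [a for a in applicants if a.score >= threshold]
    bad_hires = sum(1 for a in hired if not a.qualified)
    return {"hired": len(hired), "bad_hires": bad_hires}


# Invented applicant pool, for illustration only.
APPLICANTS = [
    Applicant(0.9, True), Applicant(0.8, True), Applicant(0.7, False),
    Applicant(0.6, True), Applicant(0.5, False), Applicant(0.4, True),
    Applicant(0.3, True), Applicant(0.2, False),
]

if __name__ == "__main__":
    for threshold in (0.5, 0.75):
        fp, fn = confusion_counts(APPLICANTS, threshold)
        print(f"threshold={threshold}: false positives={fp}, false negatives={fn}, "
              f"company sees {company_view(APPLICANTS, threshold)}")
```

With these made-up numbers, raising the threshold from 0.5 to 0.75 takes false positives from 2 to 0 while false negatives rise from 2 to 3. The company's own view records only the first change, because the qualified people it turned away never appear in its data.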

LH: The harm in that case falls upon the people who don't get the job. That's why I suspect that most of these hiring algorithms are ridiculously discriminatory, because there's every incentive for them to be and because there's no monitor on them. There's also no reason to think that they're just inherently fair, except for the marketing that we've been exposed to.

CO: The mistake that people make is this: they think that when you build an algorithm you're going to be following the data, which is true, but they think that means it is somehow less biased. There's no reason to think that. It's exactly as biased as the data that you feed to it.

LH: And in your book you talk about the implications of that biased data further – for example with policing.

CO: To be a bit clearer about that, it continues the past practices. It doubles down in the sense that if you also believe it to be a fair and objective process, like this algorithm, then you don't question yourself any more.

So, if the computer is telling you, “Go ahead and be as racist as you've always been,” then you don't ask yourself: “Why do we send so many more cops to black neighbourhoods?”

I think of the predictive policing algorithms as more of a police prediction than crime prediction. It's predicting what the police will do. Every algorithm should be a learning algorithm, which just means you refresh the data; you add more data all the time. In this case, the data is where the arrests are: the locations of arrests. If the police started practising policing differently, if they stopped over-policing some neighbourhoods and they started expanding their reach – and, furthermore, they actually arrested white people for crimes of poverty like they arrest black people – then the algorithms themselves would look bad. They'd look inaccurate, but as you refresh them they would learn: “The police behaved differently and now here is how it works.”

I'm not saying it wouldn't be possible to change our practices, our policies. The question is how much are we learning from the algorithm, and how much are we teaching the algorithm?

The answer is that it depends, but people who really believe these algorithms work will have the police follow the algorithms, so the algorithms will be taking the lead rather than the humans. That's the problem – the problem of following the rules set out by mistakes in the past.
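The loop between teaching the algorithm and learning from it can be made concrete with a toy simulation. This is a rough sketch under loud assumptions, not how any deployed predictive policing product works: there are two hypothetical neighbourhoods, "A" and "B", with the same underlying offence rate; the "model" is simply each neighbourhood's share of past arrests; and arrests are taken to be proportional to patrols. Every name and number here is invented for illustration.

```python
def predicted_share(arrests):
    """The toy 'model': each neighbourhood's share of recorded arrests."""
    total = sum(arrests.values())
    return {name: count / total for name, count in arrests.items()}


def arrests_from_patrols(patrols, offence_rate):
    """Assumption: recorded arrests are proportional to patrols sent."""
    return {name: patrols[name] * offence_rate[name] for name in patrols}


# Invented inputs: identical offence rates, but a skewed arrest history.
OFFENCE_RATE = {"A": 0.1, "B": 0.1}
TOTAL_PATROLS = 100
history = {"A": 80.0, "B": 20.0}

# Phase 1: the police follow the model, and the model is refreshed on
# the arrests that result from following it.
for _ in range(3):
    share = predicted_share(history)
    patrols = {name: TOTAL_PATROLS * s for name, s in share.items()}
    history = arrests_from_patrols(patrols, OFFENCE_RATE)
followed = predicted_share(history)

# Phase 2: the police change practice and patrol evenly; the refreshed
# model then learns the new behaviour.
even_patrols = {name: TOTAL_PATROLS / 2 for name in history}
history = arrests_from_patrols(even_patrols, OFFENCE_RATE)
after_change = predicted_share(history)

if __name__ == "__main__":
    print("share while following the model:", followed)
    print("share after patrols are evened out:", after_change)
```

While the police follow the model, the arrest share stays at 80/20 even though the two offence rates are equal, so the model looks accurate about something the police themselves produced. Once patrols are evened out, the old prediction looks wrong, and the refreshed share becomes 50/50.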

LH: Then you get companies like Facebook, with data that has become extraordinarily powerful. The first question is on ownership of that data. At the moment, clearly, they have complete ownership of that. Is that ever going to change? Would we ever be able to get back our information, do you think? Secondly, as they gather more and more big data, are they going to become increasingly powerful and, as a result, more and more dangerous?

CO: Yes. It's up to us, first of all. Second of all, it's a really interesting ongoing conversation about data governance and how we could possibly approach regulating. We don't know right now and it's not obvious. I don't have a sound-bite answer to that, but I do think that we have to think about it. I think it's antidemocratic, the kind of power that Facebook already has, for that matter.

LH: Because they are monopolies now?

CO: Yes, they are monopolies. But, more importantly, they are propaganda machines and they have the power to swing elections. They might have already swung elections without even attempting to. That's even scarier in some sense because what they haven't done is acknowledge their power, and they haven't started monitoring their power in a way that is accountable to the public.

The Silicon Valley companies are very powerful, and they have a lot of lobbyists, and they have an infinite amount of money. If you wanted to sum up their ideology in one sentence, it is that technology is better than and will replace politics. Of course, what they mean by that is: “It will be replaced by our politics.” That looks like an ignorance of class, ignorance of gender and race, an assumption that we will all transcend and become one with the machine and we will never die.

We need to demand accountability. I don't know if that looks like, “Give us our data back and stop tailored advertising,” because that would close them down. I'm totally fine with that, by the way; we should definitely consider it. I don't have any limits on what we should do, but I don't know what actually makes sense to do.

I'm not a particularly open data, transparent data type of person. What I want is accountability, which is different from openness. I want to know: how is this algorithm affecting us? Are we losing sight of what truth is? How do we measure that? It's hard. Democracy is not an easy thing to quantify.

This article is an excerpt from IPPR’s latest Progressive Review issue published on September 13, 2017. See the full interview on openDemocracy in October.

About the authors

Cathy O'Neil, a former Wall St. analyst, is a mathematician and author of several books on data science, including most recently Weapons of Math Destruction (Allen Lane, 2017). She tweets @mathbabedotorg.


Leo Hollis is a writer and urban historian.
