Weapons of maths destruction

Leo Hollis (LH): Thinking about your own story, the one that you talk about in Weapons of Math Destruction – you started off in academia studying maths, then went into the City, a fairly common move. But you were going into the hedge fund, into the quant world, until the financial crash. How did you first experience the crash?

Cathy O’Neil (CO): I actually walked into the crash. I left my professorship in June 2007, started at a hedge fund, and in the summer, very soon after I came in – that was when the financial crisis started, from the perspective of people inside finance.

It was called a ‘kerfuffle’ where I worked, but really it was a big, big deal. The Equity Group, which was probably the largest trading group, had to liquidate their books. They got freaked out, and they lost a lot of money because they were unwinding all their trades. Some of the trades were very big, so that meant a lot of loss.

From that moment on for the next year, everyone was panicking. Then a year later that’s when Lehman Brothers fell and everyone else in the world noticed, but for that whole year there was a really clear problem in the overnight lending markets between banks.

LH: Did you see it as a structural problem straightaway, or was there the feeling that this was human error?

CO: I didn’t see it as anything, to be honest, because I didn’t know what I was doing. But at some point in that year, quantitative analysts like me had an invitation to talk to former Federal Reserve chair Alan Greenspan, former Treasury Secretary Robert Rubin and his deputy Larry Summers about macroeconomic issues at the Rockefeller Centre, and they very clearly were worried about the securities market.

They were talking about it being unstable, it being like a bomb about to go off. Then, after that, we started really talking about it within the firm. I remember our managing director describing how the mortgage-backed securities were structured, and how a terrible group of mortgages could end up being called ‘Triple A’. And I remember at that exact moment I felt sick to my stomach. I just remember thinking, “Wait a second. That’s mathematically stupid. How could that possibly be true? It doesn’t sound right. It sounds like it’s a sausage factory,” right?

LH: And you weren’t able to discuss this with your colleagues?

CO: That particular thing we did talk about, because one of the things that we felt protected by was the fact that we didn’t, ourselves, invest in those terrible mortgage-backed securities. We thought of ourselves as relatively walled off from that problem, which we weren’t at all, as we found out.

LH: And so you moved out of that business?

CO: I wanted to fix the problem. I tried to get a job as a regulator; I applied to the SEC (Securities and Exchange Commission), the CFTC (Commodity Futures Trading Commission) and all of these places. Nobody answered my call, so I ended up getting a job at a risk firm because I was thinking, “Maybe, if we had a better risk model, we could avoid these problems.”

I actually ended up working on the ‘Credit Default Swap’ instrument class, to try and understand the risk, but I pretty much decided that they didn’t want to know. The world didn’t want to know. People were using the risk evaluations by the company I was working at – RiskMetrics, a very widely used company – as a rubber stamp to allow themselves to keep doing the same practices.

It’s actually that they wanted to not know the truth, so it was a different kind of bullshit algorithm. One of them, the first one, was lying in order to sell more bad mortgages. The second one was lying in a different way to hide risk. Either way I was fed up, because, as a mathematician, I want to use my math to clarify, not to lie.

I understood the power of the mathematical brand and how people will be trusting. They consider mathematics trustworthy and intimidating at the same time, and I thought, “I don’t want any of this. I don’t want to be brandishing mathematics. I want to be helping people with maths.” I thought data science would be a way to do that, so I went into data science, but very quickly figured out that the same thing was happening in data science.

The difference was (and this is post-financial crisis – I was actually a member of Occupy by this time) everyone noticed when the financial crisis happened. The kinds of failures I was seeing with algorithms outside of finance – they were silent. The failures themselves were silent. Even though they were happening in my estimation all the time, all over the place, there was no organised way of investigating it or even noticing it.

Sometimes I’d use the metaphor that big data as an industry is like the beginning of a transportation industry, like car manufacturing, or even airplane manufacturing. But that’s not really true, because, with big pileups on the highway, or airplane crashes, everyone knows about it. But, when these algorithms fail, nobody knows – almost never.

LH: Was it around Occupy that you started, I suppose, seeing that there was a conversation going on about algorithms, or was it still something that they had missed?

CO: I would say Occupy didn’t give me insight into algorithmic problems per se, but what it allowed me to do was connect the dots from finance to inequality, and to see these problems through the historical lenses of racism – slavery, even – and sexism.

I started learning a little more history. I’m not a historian at all. I was really only interested in maths when I started all this, but Occupy started making me realise: “Wait, this is not a coincidence that the people who were screwed most by the financial crisis are African Americans,” who once again were given outrageous APRs on their mortgages that they couldn’t possibly pay. The financial crisis was the biggest loss of wealth for the black community that we’ve ever seen. That was the end result of those mortgages. I’m not saying that a larger part of the population didn’t also suffer from the financial crisis, but, if you looked at how the suffering was distributed, it wasn’t distributed equally.

That’s what Occupy gave me: this conversation, and it’s still going on. I still meet with our Occupy group on Sundays about how these things come about and how they are all connected, through the Committee of Alternative Banking. Basically, we now talk about social justice through the lens of finance, not the other way around.

We had that conversation, and I slowly but surely realised, “Hey, this applies to what I’m doing right now.” I’m working on online advertising at this point – this is 2011/2012 – and I realise I am manufacturing the most concerted effort to put people in marketing silos and to extract as much money as we possibly can.

Nothing has ever been this organised or this comprehensive, and it’s because of big data, it’s because of social media. It’s because we can surveil people, and keep track of them and profile them that we’re able to do this. For that reason, payday lenders are able to find the most desperate people and offer them terrible, terrible loans.

LH: In your book there are three keywords that you use early on to summarise the problems that big data and access to it bring about: opacity, scale, and damage. In terms of opacity, it's the ability to collect information on a level that we've never seen before and that is something of a black box to ‘us’, normal citizens. Do you feel that that is something that has occurred without us noticing?

CO: It depends on who ‘us’ is referring to, but I would say yes. I would say that we've been hearing the same line coming from the architects of the internet, which is essentially, “Well, you get something for your data. You get this free service,” for whatever reason. Even though they call it a trade, they say, “It's free.” You're paying for this service with your data, and people are willing to do that.

That's been the line, but, of course, the real problem is that there's a lag of maybe 10 years or so between when people start collecting data and when they start really using it; preying on people because of what they've learned from that data. We haven't reached the end of that lag yet, so we don't actually know if that trade has worked for us.

LH: So if you're not paying for the product, you are the product?

CO: That's true too, yes. One thing I learned from Occupy is the lens of power. Most of those situations where you're giving data to the people that may or may not score or see you fairly, it's an important decision that you have no control over, no view into, and no appeals process.

That opacity you're talking about – and destructiveness, for that matter – are in the context of people who you need to make happy. They ask you for your data and you have to give it to them. When you're trying to get a job, you have to answer the questions they ask you. Or when you're being sentenced to prison. Or when you're trying to get into college. When you have a power disadvantage, your data is up for grabs.

LH: And the people who are asking for that information are still a wide variety of power sources? Quite a lot of the examples you use in the book are the state bodies that have been provided with a certain kind of algorithm, usually commissioned or bought from a private company. It seems to me that what we've been sold is a sense that all software is basically neutral. It's just how you use it.

CO: The example I give in the book is about making dinner for my kids, because actually algorithms happen in our own heads. We don't have to formalise them. I first of all talk about curating the data, which in this case is just the ingredients in my kitchen: what am I going to cook dinner with? I will not cook dinner with ramen noodles. That's not food to me. That's my teenager's favourite food, but it's not my favourite food.

Every algorithm has a definition of success, and we optimise to the definition of success, obviously. Just by its very name, it carries our values: what do we think matters? I define success for a meal to be if my kids eat vegetables.

My sons would not agree with that, especially my eight-year-old, who loves Nutella. His definition of success would be, “I get to eat Nutella.”

That's kind of the perfect metaphor for any time you see an algorithm. It's called objective, but there's always an embedded agenda in its definition of success – and, for that matter, how it treats errors. That's a very, very important thing. We usually optimise to accuracy, but we also optimise to the false-positive and the false-negative error rates, which can really, really matter, depending on what kind of errors we're talking about.
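To make the point about definitions of success concrete, here is a toy Python sketch; it is an illustration only, not anything from the book. The same hypothetical menu (`MENU`, with made-up serving counts) is run through one selection rule under two different objective functions, `parent_success` and `child_success`. Only the objective changes, and so does the answer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Meal:
    name: str
    vegetable_servings: int
    nutella_servings: int


# Hypothetical options: the "curated data" in the dinner metaphor.
MENU: List[Meal] = [
    Meal("stir-fry", vegetable_servings=3, nutella_servings=0),
    Meal("pasta with salad", vegetable_servings=2, nutella_servings=0),
    Meal("crepes", vegetable_servings=0, nutella_servings=2),
]


def parent_success(meal: Meal) -> float:
    # Success = the kids eat vegetables.
    return meal.vegetable_servings


def child_success(meal: Meal) -> float:
    # Success = "I get to eat Nutella."
    return meal.nutella_servings


def best_meal(menu: List[Meal], success: Callable[[Meal], float]) -> Meal:
    # The "algorithm" is identical in both cases: pick whatever maximises the objective.
    return max(menu, key=success)


print(best_meal(MENU, parent_success).name)  # stir-fry
print(best_meal(MENU, child_success).name)   # crepes
```

The selection code is neutral in the narrow sense that it does the same thing every time; the values live entirely in the function passed in as `success`.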

If it's a hiring algorithm, a false positive is when a company hires someone even though they aren't going to be good for the job. That's probably a mistake that the company wants to avoid, so they're going to optimise away from false positives. A false negative is when somebody who is totally qualified doesn't get the job.

The company will probably not even be able to measure the false negatives, if you think about it. If they never hire someone, they'll never even know that person was qualified, and as long as the company is getting qualified people, they don't really care.
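The asymmetry between the two error types can be shown in a few lines of Python. This is a minimal sketch under stated assumptions: the `Applicant` records and the applicant pool are invented, and `company_view` is a hypothetical helper standing in for what an employer can actually observe, namely outcomes for the people it hired. `error_counts` is what an all-seeing auditor would compute.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Applicant:
    hired: bool      # the algorithm's decision
    qualified: bool  # ground truth, never observed for rejected applicants


def error_counts(applicants: List[Applicant]) -> Dict[str, int]:
    """Full confusion-matrix counts, as an auditor with ground truth would see them."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a in applicants:
        if a.hired and a.qualified:
            counts["tp"] += 1
        elif a.hired and not a.qualified:
            counts["fp"] += 1  # bad hire: visible to the company
        elif not a.hired and a.qualified:
            counts["fn"] += 1  # qualified but rejected: invisible to the company
        else:
            counts["tn"] += 1
    return counts


def company_view(applicants: List[Applicant]) -> Dict[str, int]:
    """What the company can measure: outcomes only for the people it hired."""
    hired = [a for a in applicants if a.hired]
    return {
        "tp": sum(a.qualified for a in hired),
        "fp": sum(not a.qualified for a in hired),
    }


# Hypothetical pool: 2 good hires, 1 bad hire, 3 qualified rejects, 4 correct rejects.
pool = (
    [Applicant(True, True)] * 2
    + [Applicant(True, False)]
    + [Applicant(False, True)] * 3
    + [Applicant(False, False)] * 4
)

print(error_counts(pool))  # {'tp': 2, 'fp': 1, 'fn': 3, 'tn': 4}
print(company_view(pool))  # {'tp': 2, 'fp': 1}
```

In this made-up pool the qualified people who were turned away outnumber the bad hires, yet they show up only in the auditor's tally; the company's own numbers contain no trace of them.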

The harm in that case falls upon the people who don't get the job. That's why I suspect that most of these hiring algorithms are ridiculously discriminatory, because there's every incentive for them to be and because there's no monitor on them. There's also no reason to think that they're just inherently fair, except for the marketing that we've been exposed to.

The mistake that people make is this: they think that when you build an algorithm you're going to be following the data, which is true, but they think that means it is somehow less biased. There's no reason to think that. It's exactly as biased as the data that you feed to it.

LH: And in your book you talk about the implications of that biased data further – for example with policing.

CO: To be a bit clearer about that, it continues the past practices. It doubles down in the sense that if you also believe it to be a fair and objective process, like this algorithm, then you don't question yourself any more.

So, if the computer is telling you, “Go ahead and be as racist as you've always been,” then you don't ask yourself: “Why do we send so many more cops to black neighbourhoods?”

I think of the predictive policing algorithms as more of a police prediction than crime prediction. It's predicting what the police will do. Every algorithm should be a learning algorithm, which just means you refresh the data; you add more data all the time. In this case, the data is where the arrests are: the locations of arrests. If the police started practising policing differently, if they stopped over-policing some neighbourhoods and they started expanding their reach – and, furthermore, they actually arrested white people for crimes of poverty like they arrest black people – then the algorithms themselves would look bad. They'd look inaccurate, but as you refresh them they would learn: “The police behaved differently and now here is how it works.”
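The feedback loop described here can be sketched as a small deterministic simulation. Everything in it is an assumption for illustration, not a model of any real product: two hypothetical neighbourhoods, "A" and "B", with identical underlying offence rates; a historical arrest record skewed towards A; and a "model" that simply predicts future crime in proportion to past arrests. Arrests can only be recorded where patrols are sent, and each round's arrests are added back into the training data.

```python
from typing import Dict, Optional


def simulate(
    rounds: int,
    history: Dict[str, float],
    offence_rate: Dict[str, float],
    total_patrols: float = 100.0,
    override_patrols: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return each area's share of all recorded arrests after `rounds` refreshes."""
    arrests = dict(history)  # cumulative arrest counts: the model's training data
    for _ in range(rounds):
        if override_patrols is None:
            # The "prediction": send patrols in proportion to past arrests.
            total = sum(arrests.values())
            patrols = {k: total_patrols * v / total for k, v in arrests.items()}
        else:
            # The police choose to deploy differently from what the model suggests.
            patrols = override_patrols
        # Arrests are only recorded where officers are present, then fed back in.
        for k in arrests:
            arrests[k] += patrols[k] * offence_rate[k]
    total = sum(arrests.values())
    return {k: v / total for k, v in arrests.items()}


same_rate = {"A": 0.5, "B": 0.5}  # identical underlying behaviour in both areas
past = {"A": 60.0, "B": 40.0}     # skewed historical arrest record

model_led = simulate(20, past, same_rate)
print({k: round(v, 3) for k, v in model_led.items()})  # {'A': 0.6, 'B': 0.4}

police_changed = simulate(20, past, same_rate, override_patrols={"A": 50.0, "B": 50.0})
print({k: round(v, 3) for k, v in police_changed.items()})  # {'A': 0.509, 'B': 0.491}
```

When the model leads, the 60/40 split never moves, even though the two areas offend at the same rate: the refreshed data only confirms where the police already were. When the police deploy evenly instead, the refreshed data drifts towards 50/50, which corresponds to the humans teaching the algorithm rather than following it.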

I'm not saying it wouldn't be possible to change our practices, our policies. The question is how much are we learning from the algorithm, and how much are we teaching the algorithm?

The answer is that it depends, but people who really believe these algorithms work will have the police follow the algorithms, so the algorithms will be taking the lead rather than the humans. That's the problem – the problem of following the rules set out by mistakes in the past.

LH: Then you get companies like Facebook, with data that has become extraordinarily powerful. The first question is on ownership of that data. At the moment, clearly, they have complete ownership of that. Is that ever going to change? Would we ever be able to get back our information, do you think? Secondly, as they gather more and more big data, are they going to become increasingly powerful and, as a result, more and more dangerous?

CO: Yes. It's up to us, first of all. Second of all, it's a really interesting ongoing conversation about data governance and how we could possibly approach regulating. We don't know right now and it's not obvious. I don't have a sound-bite answer to that, but I do think that we have to think about it. I think it's antidemocratic, the kind of power that Facebook already has, for that matter.

LH: Because they are monopolies now?

CO: Yes, they are monopolies. But, more importantly, they are propaganda machines and they have the power to swing elections. They might have already swung elections without even attempting to. That's even scarier in some sense because what they haven't done is acknowledge their power, and they haven't started monitoring their power in a way that is accountable to the public.

The Silicon Valley companies are very powerful, and they have a lot of lobbyists, and they have an infinite amount of money. If you wanted to sum up their ideology in one sentence, it is that technology is better than and will replace politics. Of course, what they mean by that is: “It will be replaced by our politics.” That looks like an ignorance of class, ignorance of gender and race, an assumption that we will all transcend and become one with the machine and we will never die.

We need to demand accountability. I don't know if that looks like, “Give us our data back and stop tailored advertising” because that would close them down. I'm totally fine with that, by the way, we should definitely consider it. I don't have any limits on what we should do, but I don't know what actually makes sense to do.

I'm not a particularly open data, transparent data type of person. What I want is accountability, which is different from openness. I want to know: how is this algorithm affecting us? Are we losing sight of what truth is? How do we measure that? It's hard. Democracy is not an easy thing to quantify.

LH: So, how is oversight achieved? Is that a personal obligation: to be aware of the way that one’s information is being used? Or are there some other ways that we can organise as a civil society?

CO: It has to be agitated at the civil society level. It has to be a political campaign, I think. It has to end up at policy. There’s this guy Ben Shneiderman. He was a Maryland computer science professor, but he came to the Alan Turing Institute a couple of months ago and he suggested that we institute something called the National Algorithmic Safety Board, modelled after the National Transportation Safety Board. They investigate plane crashes – and, for that matter, traffic crashes – and they suggest improvements in safety.

He made a very important point, and he said this was crucial to emphasise: they hold people responsible – human beings. I think that is probably by far the biggest problem we have right now with respect to algorithmic harm: that almost no algorithm has a human being who is responsible for it.

What we need is regulation. If everyone has the same regulation, then there’s no race to the bottom. That’s why regulations are sometimes very reasonable; we don’t allow chemical companies to pour their waste into the rivers, even though it would be cheaper for them to do so.

LH: So, regulation, and some kind of ethics commission coming out of civil society, forcing policy?

CO: I think accountability isn’t the same thing as ethics. When I say “accountability”, I mean things like, “Does this work?”

When we accept an algorithm from a builder, we should say, “Show me evidence this works. Show me evidence this is legal.” You could say, “Show me evidence this is ethical,” but I think that’s actually a little bit further down the road. Let’s just first rely on our laws. If the laws are inadequate, let’s improve our laws. We have antidiscrimination laws, but we have all these algorithms that aren’t being checked against them. I would like that to start happening. That’s accountability, for me.
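One existing yardstick of the "show me evidence this is legal" kind, which is not mentioned in the interview and is swapped in here purely as an example, is the "four-fifths rule" of thumb from the US Uniform Guidelines on Employee Selection Procedures: a selection rate for any group that is less than 80 per cent of the rate for the most-selected group is generally regarded as evidence of adverse impact. A minimal Python sketch, with hypothetical group names and counts:

```python
from typing import Dict, List, Tuple


def impact_ratios(outcomes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """outcomes maps group -> (number selected, number of applicants).

    Returns each group's selection rate divided by the highest group's rate.
    """
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group had anyone selected")
    return {group: rate / best for group, rate in rates.items()}


def flag_adverse_impact(
    outcomes: Dict[str, Tuple[int, int]], threshold: float = 0.8
) -> List[str]:
    """Groups whose impact ratio falls below the four-fifths threshold."""
    return [g for g, r in impact_ratios(outcomes).items() if r < threshold]


# Hypothetical audit data for a screening algorithm.
outcomes = {"group_a": (48, 100), "group_b": (30, 100)}
print(impact_ratios(outcomes))        # group_b's ratio is about 0.625
print(flag_adverse_impact(outcomes))  # ['group_b']
```

A check like this answers one narrow, measurable question; passing it does not establish that an algorithm is lawful or fair in any broader sense.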

But I do think it’s a different category altogether where you have criminal algorithms. You have all sorts of examples in my book – mostly unintentional mistakes, unintentional bias – but you also have VW with the emissions scandal. That is an algorithm. It’s a lying algorithm, and it’s a criminal algorithm. Uber had that, too, for avoiding government officials when it was operating illegally in cities. That is a totally different can of worms, and it’s much closer to hacking.

The chemical companies shouldn’t put shit in the river, but these rivers, they’re not in our backyards any more. They’re not easy to look at. Where is the river of democracy? How do you see whether it’s polluted and who polluted it? It’s hard to track.

I’m sure big data companies are in the process of establishing their own standards on algorithms for internal use. Soon, then, they’re going to declare the war against bias in algorithms as over: “We won.” That’s going to happen and I’m glad they’re going to do it, but I know they’re not going to do it the way I would do it or the way the public should do it.

LH: Do you feel that progress is being made?

CO: There’s progress since I started worrying about it, which was six years ago, and I was the only person I knew who was worried about it. Everyone is worried about it now – not everyone, but communities are being set up, conferences are being set up to talk about it.

One problem I’m hitting up against is that there is basically a data science institute for every university popping up, and none of them want to think about this stuff. They’re all thinking about smart cities and investigating partnerships with local big data companies.

LH: Is it that there is no funding for a critical position on this? Where is the institutional investment?

CO: I’ll explain to you why I’ll never get a job doing this. I’m talking about a risk that, as far as the people who are doing this stuff are concerned, is not imminent. Until it becomes an imminent risk, there will be no money in it, because there’ll be no money saved by preventing it.

As soon as it becomes a shareholder value issue where they have to put aside $1 billion in potential litigation fees – at that moment I will have a job. But right now they’re like, “Who’s going to sue us? Who’s going to win? What is this fine going to look like? In the meantime we’re going to use this algorithm, even if it’s illegal.”

This is the full interview from IPPR’s latest Progressive Review issue, published on September 13, 2017. Thanks go to Wiley for the original version.