Friday, May 2, 2014

Disparate impact and data science, Part 2: garbage in, garbage out

I recently wrote a blog post describing how disparate impact doctrine and data science are on a collision course. Today, the Obama Administration released a report, Big Data: Seizing Opportunities, Preserving Values, that illustrates this point nicely. The report states:

    The fusion of many different kinds of data, processed in real time, has the power to deliver exactly the right message, product, or service to consumers before they even ask. Small bits of data can be brought together to create a clear picture of a person to predict preferences or behaviors. These detailed personal profiles and personalized experiences are effective in the consumer marketplace and can deliver products and offers to precise segments of the population -- like a professional accountant with a passion for knitting, or a home chef with a penchant for horror films.

    Unfortunately, "perfect personalization" also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities. For example, one study found web searches involving black-identifying names (e.g., "Jermaine") were more likely to generate ads with the word "arrest" in them than searches with white-identifying names (e.g., "Geoffrey"). This research was not able to determine exactly why a racially biased result occurred, recognizing that ad display is algorithmically generated, based on a number of variables and decision processes. [emphasis added]

The study cited is "Discrimination in Online Ad Delivery" by Latanya Sweeney. Ms. Sweeney writes:

    [T]he EEOC uses an "adverse impact" test, which measures whether practices, intentional or not, have a disproportionate effect. If the ratio of the effect on groups is less than 80%, the employer may be held responsible for discrimination. ... Notice that racism can result, even if not intentional and that online activity may be so ubiquitous and intimately entwined with technology design that technologists may now have to think about societal consequences like structural racism in the technology they design. [emphasis added]
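
For concreteness, the four-fifths test is simple arithmetic: divide the selection (or outcome) rate of the group in question by the rate of the group with the highest rate, and treat a result below 0.80 as evidence of adverse impact. Here is a minimal sketch in Python; the counts are made up purely for illustration.

    # Minimal sketch of the EEOC "four-fifths" (80%) adverse impact test.
    # The counts below are made up purely for illustration.

    def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
        """Ratio of group A's selection rate to group B's.

        Under the four-fifths rule of thumb, a ratio below 0.80 is taken
        as evidence of adverse (disparate) impact against group A.
        """
        return (selected_a / total_a) / (selected_b / total_b)

    # Hypothetical numbers: 30 of 100 applicants selected from group A,
    # 50 of 100 selected from group B.
    ratio = adverse_impact_ratio(30, 100, 50, 100)
    print(f"Adverse impact ratio: {ratio:.2f}")  # 0.60
    print("Fails the 80% test" if ratio < 0.8 else "Passes the 80% test")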

Ms. Sweeney then goes on to demonstrate that, yes, web searches for "black-identifying" names do, in fact, more frequently result in the display of ads with the word "arrest" in them than searches for "white-identifying" names:

    This study raises more questions than it answers. Here is the one answer provided. ... A greater percentage of ads having "arrest" in the ad text appeared for black identifying first names than for white identifying first names. ... There is discrimination in delivery of these ads.
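
To see what that kind of comparison looks like mechanically, here is a minimal sketch of a two-proportion chi-squared test in Python. The counts are hypothetical placeholders, not the figures from Ms. Sweeney's paper, and SciPy is assumed to be available.

    # Minimal sketch: test whether "arrest" ads appear at different rates
    # for two groups of name searches. The counts are hypothetical
    # placeholders, not the figures from Sweeney's study.
    from scipy.stats import chi2_contingency

    # Rows: black-identifying names, white-identifying names
    # Columns: searches that showed an "arrest" ad, searches that did not
    table = [[60, 40],
             [25, 75]]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
    # A small p-value says the difference in proportions is unlikely to be
    # due to chance alone; it says nothing about anyone's intent.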

The problem with the conclusion that there is "discrimination in delivery of these ads" is that it rests on a single premise: that statistical analysis alone can determine whether discrimination has taken place, regardless of whether any intent to discriminate can be identified. The validity of this premise is dubious at best. How can discrimination take place if there is no intent to discriminate? Is there any such thing as "structural racism"? Or does racism require human intention?

Roger Clegg lays out the argument against this premise:

    "Disparate impact” is the favored but dubious legal theory of the Obama administration. It’s being used to attack everything from election integrity to the financial industry when DOJ doesn’t have any evidence of intentional discrimination. This theory lets DOJ attack completely race-neutral laws and practices that it doesn’t like for policy reasons. ... The 14th and 15th Amendments prohibit state actions only where there is “disparate treatment” on the basis of race. The U.S. Supreme Court has made clear that in the context that means actions undertaken with racially discriminatory intent. Thus, congressional legislation must be aimed at preventing intentional racial discrimination, not just actions that may have just an effect that disproportionately affects racial minorities. This is especially so in light of federalism concerns and the fact that, as Justice Antonin Scalia noted in Ricci v. DeStefano, the disparate-impact approach actually encourages race-based decision making, which would violate the Constitution’s guarantee of equal protection.

Ms. Sweeney goes on to suggest some proposals for "remedying" disparate impact:

    [T]echnology can do more to thwart discriminatory effects and harmonize with societal norms. ... Search and ad technology already reason extensively about context and appropriateness when deciding to deliver the best content to the reader. With some expansion, technology could additionally reason about the social and legal implications of content and context, too.

In other words, if algorithmically generated ad displays produce politically incorrect results, we should alter the algorithms. Such an action may seem rather innocuous in the case of ad placement, but if the same principle were applied to other areas, such as the algorithmic evaluation of loan applications, it could have seriously detrimental effects. As I wrote in my previous blog post on disparate impact and data science (a code sketch of the scenario follows the excerpt):

    Suppose, for example, that you have an algorithm that mines data in an attempt to determine the set of features that are best able to predict whether a prospective borrower will default on a loan or commit some kind of fraud. (Such algorithms are a well-known part of machine learning and statistics and are referred to under the general heading of feature selection.) Suppose further that the algorithm, presumably operating quite impartially and without human intervention [that is, with no intention to discriminate], discovers that the features that are most predictive of default are gender, race [or, since race may not be recorded, a proxy for race, such as a race-identifying name], and marital status. That is, the algorithm may find that if you are a single black mother then the probability of you defaulting or committing fraud is predicted to be high, whereas if you are a married white male, the predicted probability is small. What, then, are you supposed to do with the model?

    I have actually heard professional data scientists say that they would suppress consideration of these features in the predictive model. In other words, data scientists are actually being forced to pervert software algorithms so that they produce corrupt results, simply because they are afraid that the uncorrupted results will be politically unacceptable and subject them to attacks from the race and gender Stasi. There are even some misguided data scientists who willingly embrace the corruption of their science as the price that has to be paid in exchange for the advancement of certain "protected groups."

    This is the corrupt state of affairs that we have arrived at because of policies like disparate impact, which promote race- and gender-based prejudice over scientific understanding. Imagine what would happen in a field like Physics if scientists distorted their results to achieve such political ends.
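
To make the scenario in that excerpt concrete, here is a minimal sketch of an off-the-shelf feature-selection step, using scikit-learn on entirely synthetic data. The column names and numbers are hypothetical; the point is only that the scorer ranks features by predictive signal alone, with no notion of which features are socially sensitive.

    # Minimal sketch of feature scoring for default prediction on entirely
    # synthetic data. Column names and numbers are hypothetical; the outcome
    # is constructed to depend on the sensitive columns, so the scorer will
    # surface them without any human intervention or intent.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    n = 5000

    income = rng.normal(50_000, 15_000, n)
    debt_ratio = rng.uniform(0, 1, n)
    group = rng.integers(0, 2, n)    # stand-in for a protected attribute or a proxy for one
    marital = rng.integers(0, 2, n)

    # Synthetic default outcome, driven by debt ratio and income and, by
    # construction here, by the sensitive columns as well.
    logit = -2 + 3 * debt_ratio - income / 50_000 + 0.8 * group + 0.5 * marital
    default = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = np.column_stack([income, debt_ratio, group, marital])
    names = ["income", "debt_ratio", "group", "marital"]

    scores = mutual_info_classif(
        X, default,
        discrete_features=[False, False, True, True],
        random_state=0,
    )
    for name, score in sorted(zip(names, scores), key=lambda t: -t[1]):
        print(f"{name:12s} {score:.4f}")
    # The scorer reports statistical association, nothing more. Dropping the
    # group and marital columns before this step is exactly the kind of
    # intervention described in the excerpt above.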

When Ms. Sweeney talks about "expanding" search technology so that it can "additionally reason about the social and legal implications of content and context," she is coming very close to recommending that algorithms be modified to produce politically correct results. She should think again. As every computer scientist knows: garbage in, garbage out.

Update: Another problem with Ms. Sweeney's analysis is that it emphasizes that general algorithmic decisions may be unjust with regard to particular individuals. For example, Ms. Sweeney demonstrates that when web searches are performed for the black-identifying names of black professionals with advanced degrees, those searches still produce more ads with the word "arrest" in them. Ms. Sweeney is concentrating on individuals and ignoring aggregate results. It may be the case that Ms. Sweeney, an individual with a black-identifying name who holds a PhD in Computer Science, is unlikely to default on a mortgage. However, such a fact would not necessarily invalidate the results of an algorithm that found that black-identifying names were, on average, a feature predictive of default. There are always exceptions to rules, but that does not mean the rules are useless. On occasion you will find an individual Italian who doesn't like spaghetti; that does not mean that Italians in general do not eat a lot of spaghetti.
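
The aggregate-versus-individual point can be made with a toy calculation (the numbers are made up purely for illustration): a feature can shift the average default rate between groups even though most individuals in the higher-rate group never default.

    # Toy illustration of the aggregate-vs-individual point. A feature can
    # shift the *average* outcome between groups while the large majority of
    # individuals in the higher-rate group never have the outcome at all.
    # The rates below are made up purely for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    group_a = rng.random(n) < 0.12   # hypothetical 12% default rate
    group_b = rng.random(n) < 0.06   # hypothetical 6% default rate

    print(f"Group A default rate: {group_a.mean():.1%}")
    print(f"Group B default rate: {group_b.mean():.1%}")
    print(f"Share of group A that never defaults: {1 - group_a.mean():.1%}")
    # The group-level difference is real and predictive in aggregate, yet
    # most individuals in group A are "exceptions to the rule" -- which is
    # the point of the spaghetti analogy above.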
