Friday, May 2, 2014

Disparate impact and data science, Part 2: garbage in, garbage out

I recently wrote a blog post describing how disparate impact doctrine and data science are on a collision course. Today, the Obama Administration released a report, Big Data: Seizing Opportunities, Preserving Values, that illustrates this point nicely. The report states:

    The fusion of many different kinds of data, processed in real time, has the power to deliver exactly the right message, product, or service to consumers before they even ask. Small bits of data can be brought together to create a clear picture of a person to predict preferences or behaviors. These detailed personal profiles and personalized experiences are effective in the consumer marketplace and can deliver products and offers to precise segments of the population -- like a professional accountant with a passion for knitting, or a home chef with a penchant for horror films.

    Unfortunately, "perfect personalization" also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities. For example, one study found web searches involving black-identifying names (e.g., "Jermaine") were more likely to generate ads with the word "arrest" in them than searches with white-identifying names (e.g., "Geoffrey"). This research was not able to determine exactly why a racially biased result occurred, recognizing that ad display is algorithmically generated, based on a number of variables and decision processes. [emphasis added]

The study cited is "Discrimination in Online Ad Delivery" by Latanya Sweeney. Ms. Sweeney writes:

    [T]he EEOC uses an "adverse impact" test, which measures whether practices, intentional or not, have a disproportionate effect. If the ratio of the effect on groups is less than 80%, the employer may be held responsible for discrimination. ... Notice that racism can result, even if not intentional and that online activity may be so ubiquitous and intimately entwined with technology design that technologists may now have to think about societal consequences like structural racism in the technology they design. [emphasis added]
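
For concreteness, the four-fifths test is simple arithmetic: divide the selection (or outcome) rate of the group in question by the rate of the group with the highest rate, and treat a result below 0.80 as evidence of adverse impact. Here is a minimal sketch in Python; the counts are made up purely for illustration.

    # Minimal sketch of the EEOC "four-fifths" (80%) adverse impact test.
    # The counts below are made up purely for illustration.

    def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
        """Ratio of group A's selection rate to group B's.

        Under the four-fifths rule of thumb, a ratio below 0.80 is taken
        as evidence of adverse (disparate) impact against group A.
        """
        return (selected_a / total_a) / (selected_b / total_b)

    # Hypothetical numbers: 30 of 100 applicants selected from group A,
    # 50 of 100 selected from group B.
    ratio = adverse_impact_ratio(30, 100, 50, 100)
    print(f"Adverse impact ratio: {ratio:.2f}")  # 0.60
    print("Fails the 80% test" if ratio < 0.8 else "Passes the 80% test")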

Ms. Sweeney then goes on to demonstrate that, yes, web searches for "black-identifying" names do, in fact, more frequently result in the display of ads with the word "arrest" in them than searches for "white-identifying" names:

    This study raises more questions than it answers. Here is the one answer provided. ... A greater percentage of ads having "arrest" in the ad text appeared for black identifying first names than for white identifying first names. ... There is discrimination in delivery of these ads.
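
To see what that kind of comparison looks like mechanically, here is a minimal sketch of a two-proportion chi-squared test in Python. The counts are hypothetical placeholders, not the figures from Ms. Sweeney's paper, and SciPy is assumed to be available.

    # Minimal sketch: test whether "arrest" ads appear at different rates
    # for two groups of name searches. The counts are hypothetical
    # placeholders, not the figures from Sweeney's study.
    from scipy.stats import chi2_contingency

    # Rows: black-identifying names, white-identifying names
    # Columns: searches that showed an "arrest" ad, searches that did not
    table = [[60, 40],
             [25, 75]]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
    # A small p-value says the difference in proportions is unlikely to be
    # due to chance alone; it says nothing about anyone's intent.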

The problem with the conclusion that there is "discrimination in delivery of these ads" is that it rests on a single premise: that statistical analysis alone can determine whether discrimination has taken place, regardless of whether any intent to discriminate can be identified. The validity of this premise is dubious at best. How can discrimination take place if there is no intent to discriminate? Is there any such thing as "structural racism"? Or does racism require human intention?

Roger Clegg lays out the argument against this premise:

    "Disparate impact” is the favored but dubious legal theory of the Obama administration. It’s being used to attack everything from election integrity to the financial industry when DOJ doesn’t have any evidence of intentional discrimination. This theory lets DOJ attack completely race-neutral laws and practices that it doesn’t like for policy reasons. ... The 14th and 15th Amendments prohibit state actions only where there is “disparate treatment” on the basis of race. The U.S. Supreme Court has made clear that in the context that means actions undertaken with racially discriminatory intent. Thus, congressional legislation must be aimed at preventing intentional racial discrimination, not just actions that may have just an effect that disproportionately affects racial minorities. This is especially so in light of federalism concerns and the fact that, as Justice Antonin Scalia noted in Ricci v. DeStefano, the disparate-impact approach actually encourages race-based decision making, which would violate the Constitution’s guarantee of equal protection.

Ms. Sweeney goes on to suggest some proposals for "remedying" disparate impact:

    [T]echnology can do more to thwart discriminatory effects and harmonize with societal norms. ... Search and ad technology already reason extensively about context and appropriateness when deciding to deliver the best content to the reader. With some expansion, technology could additionally reason about the social and legal implications of content and context, too.

In other words, if algorithmically generated ad displays produce politically incorrect results, we should alter the algorithms. Such an action may seem rather innocuous in the case of ad placement, but if the same principle were applied to other areas, such as the algorithmic evaluation of loan applications, it could have seriously detrimental effects. As I wrote in my previous blog post on disparate impact and data science (a code sketch of the scenario follows the excerpt):

    Suppose, for example, that you have an algorithm that mines data in an attempt to determine the set of features that are best able to predict whether a prospective borrower will default on a loan or commit some kind of fraud. (Such algorithms are a well-known part of machine learning and statistics and are referred to under the general heading of feature selection.) Suppose further that the algorithm, presumably operating quite impartially and without human intervention [that is, with no intention to discriminate], discovers that the features that are most predictive of default are gender, race [or, since race may not be recorded, a proxy for race, such as a race-identifying name], and marital status. That is, the algorithm may find that if you are a single black mother then the probability of you defaulting or committing fraud is predicted to be high, whereas if you are a married white male, the predicted probability is small. What, then, are you supposed to do with the model?

    I have actually heard professional data scientists say that they would suppress consideration of these features in the predictive model. In other words, data scientists are actually being forced to pervert software algorithms so that they produce corrupt results, simply because they are afraid that the uncorrupted results will be politically unacceptable and subject them to attacks from the race and gender Stasi. There are even some misguided data scientists who willingly embrace the corruption of their science as the price that has to be paid in exchange for the advancement of certain "protected groups."

    This is the corrupt state of affairs that we have arrived at because of policies like disparate impact, which promote race- and gender-based prejudice over scientific understanding. Imagine what would happen in a field like Physics if scientists distorted their results to achieve such political ends.
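
To make the scenario in that excerpt concrete, here is a minimal sketch of an off-the-shelf feature-selection step, using scikit-learn on entirely synthetic data. The column names and numbers are hypothetical; the point is only that the scorer ranks features by predictive signal alone, with no notion of which features are socially sensitive.

    # Minimal sketch of feature scoring for default prediction on entirely
    # synthetic data. Column names and numbers are hypothetical; the outcome
    # is constructed to depend on the sensitive columns, so the scorer will
    # surface them without any human intervention or intent.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    n = 5000

    income = rng.normal(50_000, 15_000, n)
    debt_ratio = rng.uniform(0, 1, n)
    group = rng.integers(0, 2, n)    # stand-in for a protected attribute or a proxy for one
    marital = rng.integers(0, 2, n)

    # Synthetic default outcome, driven by debt ratio and income and, by
    # construction here, by the sensitive columns as well.
    logit = -2 + 3 * debt_ratio - income / 50_000 + 0.8 * group + 0.5 * marital
    default = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = np.column_stack([income, debt_ratio, group, marital])
    names = ["income", "debt_ratio", "group", "marital"]

    scores = mutual_info_classif(
        X, default,
        discrete_features=[False, False, True, True],
        random_state=0,
    )
    for name, score in sorted(zip(names, scores), key=lambda t: -t[1]):
        print(f"{name:12s} {score:.4f}")
    # The scorer reports statistical association, nothing more. Dropping the
    # group and marital columns before this step is exactly the kind of
    # intervention described in the excerpt above.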

When Ms. Sweeney talks about "expanding" search technology so that it can "additionally reason about the social and legal implications of content and context," she is coming very close to recommending that algorithms be modified to produce politically correct results. She should think again. As every computer scientist knows: garbage in, garbage out.

Update: Another problem with Ms. Sweeney's analysis is that it emphasizes that general algorithmic decisions may be unjust with regard to particular individuals. For example, Ms. Sweeney demonstrates that when web searches are performed for the black-identifying names of black professionals with advanced degrees, those searches still produce more ads with the word "arrest" in them. Ms. Sweeney is concentrating on individuals and ignoring aggregate results. It may be the case that Ms. Sweeney, an individual with a black-identifying name who holds a PhD in Computer Science, is unlikely to default on a mortgage. However, such a fact would not necessarily invalidate the results of an algorithm that found that black-identifying names were, on average, a feature predictive of default. There are always exceptions to rules, but that does not mean the rules are useless. On occasion you will find an individual Italian who doesn't like spaghetti; that does not mean that Italians in general do not eat a lot of spaghetti.
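
The aggregate-versus-individual point can be made with a toy calculation (the numbers are made up purely for illustration): a feature can shift the average default rate between groups even though most individuals in the higher-rate group never default.

    # Toy illustration of the aggregate-vs-individual point. A feature can
    # shift the *average* outcome between groups while the large majority of
    # individuals in the higher-rate group never have the outcome at all.
    # The rates below are made up purely for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    group_a = rng.random(n) < 0.12   # hypothetical 12% default rate
    group_b = rng.random(n) < 0.06   # hypothetical 6% default rate

    print(f"Group A default rate: {group_a.mean():.1%}")
    print(f"Group B default rate: {group_b.mean():.1%}")
    print(f"Share of group A that never defaults: {1 - group_a.mean():.1%}")
    # The group-level difference is real and predictive in aggregate, yet
    # most individuals in group A are "exceptions to the rule" -- which is
    # the point of the spaghetti analogy above.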
