Critics allege big data can be discriminatory, but is it really bias?


Big data is increasingly viewed as a strategic asset that can transform organizations through its use of powerful predictive technologies.

But the methods behind the systems that help make such decisions may not always be fair and just, according to a panel of social researchers who study the impact of big data on the public and society.

The event, organized recently by New York University's Politics Society and Students for Criminal Justice Reform, centered on issues arising from the use of big data in machine learning and data mining to drive public- and private-sector decisions.

The panel, which included a mix of policy researchers, technologists, and journalists, discussed how big data, even as it enhances our ability to make evidence-based decisions, can inadvertently encode rules and processes that are inherently biased and discriminatory.

The rules, in this case, are algorithms: sets of mathematical procedures coded to achieve a particular goal. Critics argue these algorithms can perpetuate biases and reinforce built-in assumptions.

Government agencies have recently begun scrutinizing the ethical implications of the emerging field. Last week, a White House report cautioned that data collection could undermine civil rights, if not applied correctly. The report called for a conversation to determine "how best to encourage the potential of these technologies while minimizing risks to privacy, fair treatment, and other core American values."

In his 2014 paper "Big Data's Disparate Impact", Solon Barocas, a panel member and a research fellow with the Center for Information Technology Policy at Princeton University, points out that "advocates of algorithmic techniques like data mining argue that they eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with."

Barocas studies the impact of emerging applications of machine learning and the ethical and epistemological issues that they raise. He added that "data mining can inherit the prejudices of prior decision-makers or reflect the widespread biases that persist in society at large."

In other words, machine learning systems learn from data produced by humans and run on algorithms designed by humans. That data carries the implicit biases of the people who created it, and the systems trained on it can reproduce those biases.
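As a rough illustration of that mechanism, the sketch below (not from the panel; all data and group labels are synthetic and hypothetical) trains an off-the-shelf classifier on "past decisions" that systematically penalized one group. Even though the two groups are identical by construction, the model learns to recommend the favored group more often.

```python
# A minimal sketch of how a model trained on biased historical decisions
# can reproduce that bias. All data here is synthetic and hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two groups (0 and 1) with identical underlying "skill" distributions.
group = rng.integers(0, 2, size=n)
skill = rng.normal(0, 1, size=n)

# Hypothetical past decisions: skill matters, but group 1 was
# systematically penalized by prior decision-makers.
past_hired = (skill - 1.0 * group + rng.normal(0, 0.5, size=n)) > 0

# Train on the biased labels, using group membership as a feature
# (in practice this is often an innocuous-looking proxy variable).
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, past_hired)

# The learned model recommends group 0 far more often, even though the
# groups were generated with identical skill.
preds = model.predict(X)
for g in (0, 1):
    print(f"group {g}: predicted-hire rate = {preds[group == g].mean():.2f}")
```

The point of the sketch is simply that the model has no notion of fairness; it faithfully reproduces whatever pattern, biased or not, sits in its training labels.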

In its own study two years ago, the Federal Trade Commission raised similar issues. Although the regulator acknowledged the benefits of big data, it said the process of compiling such information carried the risk of companies misusing data to discriminate against certain segments of the population.

Allegations of data discrimination

The most frequently cited case of big data discrimination points to research conducted a few years ago by Latanya Sweeney, who heads the Data Privacy Lab at Harvard University.

The case involves the Google ad results returned when searching for certain kinds of names on the internet. In her research, Sweeney found that distinctive names often associated with black people appeared alongside a disproportionately higher number of arrest-record ads than white-sounding names did, by roughly 18 percent. Google has since fixed the issue, although the company never publicly stated what it did to correct the problem.
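To make "disproportionately higher" concrete, disparities like this are commonly checked with a chi-square test on a contingency table of how often the ad appeared for each group of names. The sketch below uses made-up counts purely for illustration; they are not Sweeney's data.

```python
# Illustrative only: testing whether ad delivery differs by group using a
# chi-square test on a 2x2 table. The counts are hypothetical.
from scipy.stats import chi2_contingency

#                  arrest-record ad shown, ad not shown
table = [[90, 110],   # black-sounding names (hypothetical counts)
         [66, 134]]   # white-sounding names (hypothetical counts)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value suggests the gap is unlikely to be due to chance alone.
```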

The proliferation of big data in the last few years has seen other allegations of improper use and bias. These allegations run the gamut, from online price discrimination and the consequences of geographic targeting to the controversial use of crime-prediction technology by law enforcement and insufficiently representative data samples used in some public-works decisions.

The benefits of big data need to be balanced against the risks of applying modern technologies to societal issues. Yet data advocates believe that the democratization of data has in essence given power to the people to effect change by transferring 'tribal knowledge' from experts to data-savvy practitioners.

Big data is here to stay

According to some advocates, the problem is not so much that 'big data discriminates' as that data professionals risk misinterpreting the findings at the heart of data mining and statistical learning. They add that the benefits far outweigh the concerns.

"In my academic research and industry consulting, I have seen tremendous benefits accruing to firms, organizations and consumers alike from the use of data-driven decision-making, data science, and business analytics," Anindya Ghose, the director of Center for Business Analytics at New York University's Stern School of Business, said.

"To be perfectly honest, I do not at all understand these big-data cynics who engage in fear mongering about the implications of data analytics," Ghose said.

"Here is my message to the cynics and those who keep cautioning us: 'Deal with it, big data analytics is here to stay forever'."