
What data reveals about Hillary Clinton's emails

Hillary Clinton. Photo: Joshua Roberts | Reuters

Vermont Senator Bernie Sanders may think the American people are sick and tired of hearing about Hillary Clinton's emails, but the controversy has also inspired data wonks to find out what the fuss is all about.

The email dump and open access provided by the State Department have spurred analysts everywhere to look for additional insights. Kaggle, for instance, which bills itself as the "world's largest community of data scientists," has posted the released emails on its website, allowing anyone to download the data and share their results.

Recently, a team of graduate students in data analytics at New York University's Stern School of Business took on the task of dissecting the emails Hillary Clinton handled through a private server, an arrangement that may have violated federal law because the server was used for classified information. The NYU analysis sheds light on her close communication network, the most important email topics, significant words associated with those emails and the frequency of critical words in her correspondence.


Using data provided by the U.S. Department of State, the NYU team analyzed more than 4,000 emails. The resulting visualizations provide additional insight into the former secretary's inner circle, her communications, what she shared and whom she shared it with.

"We wanted to apply statistical methods to identify interesting patterns from Secretary Clinton's emails since it's been such a hot topic this election season. With the public release of the data, we wanted to better understand the issue in analytical terms," said Eugene Kwak, whose team at Stern conducted the analysis.


Using network analysis, which measures how individuals and groups interact within a network, the team's chart identifies Monica Hanley, Cheryl Mills, Huma Abedin and Sidney Blumenthal as the most important nodes. These four show a high degree of centrality: apart from Clinton herself, they were the people who communicated the most within the network. Blumenthal's role has been especially controversial because he was not a government employee but appeared to have handled classified information.
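The NYU team's exact method is not described here, so the following is only a minimal sketch of one common centrality measure, normalized degree centrality, assuming the emails have already been reduced to (sender, recipient) pairs. The function names and the toy data are hypothetical; "owner" stands in for the mailbox owner, who is excluded from the ranking much as Clinton is set aside above.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected contact graph.

    edges: iterable of (sender, recipient) pairs. Direction is ignored and
    repeated exchanges between the same pair are counted once.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-addressed messages
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Share of all other people in the network that each person exchanged mail with.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_contacts(edges, exclude, k=3):
    """Rank people by degree centrality, skipping `exclude` (e.g. the mailbox owner)."""
    scores = degree_centrality(edges)
    ranked = sorted(
        (node for node in scores if node != exclude),
        key=lambda node: (-scores[node], node),  # highest score first, ties by name
    )
    return [(node, scores[node]) for node in ranked[:k]]


# Hypothetical toy data, not drawn from the released emails.
edges = [
    ("owner", "A"), ("owner", "B"), ("owner", "C"), ("owner", "D"),
    ("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"),
]
print(top_contacts(edges, exclude="owner"))  # [('A', 0.6), ('B', 0.6), ('C', 0.6)]
```

Degree centrality only counts direct contacts; other measures, such as betweenness, would reward people who bridge otherwise separate groups and could produce a different ranking.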

The former Secretary of State and 2016 presidential contender has come under continued attack for using a private email server to send and receive important documents while holding public office. To answer her critics, Hillary Clinton's team has put out a fact sheet that lays out the circumstances surrounding the content and exchange of emails.

Also mentioned in the fact sheet are details about emails submitted to the U.S. Department of State. As stated, "over 30,000 copies of work-related emails were provided, totaling roughly 55,000 pages. More than 90 percent of her work or potentially work-related emails provided to the Department were already in the State Department's record-keeping system because those e-mails were sent to or received by 'state.gov' accounts."

Under the Freedom of Information Act (FOIA), the State Department began releasing Hillary Clinton's emails to the public last year. Since then, the department has released documents monthly, with the latest set made public in February.

"Visualizing the [email] network shows us who the important contacts were to Clinton. This gives us a preview of what Clinton's network looks like. It was interesting to see names like Tony Blair show up among the top contacts," said Prasant Sudhakaran, another member of the team that conducted the analysis.

In addition to mapping the network, the analysis digs into the text of the emails and identifies overarching themes in the language related to the 2012 attack in Benghazi, Libya, in which four Americans were killed. The issue has been a sore spot for Clinton.

The report notes that words such as "waiver," "select," "redaction" and "sensitive" clustered together in emails that mentioned both "Benghazi" and "terrorism." Also notable: among emails related to countries, Libya was the most discussed, followed by Afghanistan, Haiti and Iraq.

Further down the list, but still part of significant conversations, were Israel, the United Kingdom, Brazil and India.
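The report's precise text-mining technique is not spelled out, so this is only a rough sketch of two simple counts that could produce findings of this kind: which words appear in emails that mention every required term (email-level co-occurrence, a simpler stand-in for any word-proximity measure the team may have used), and how many emails mention each country. The tokenizer, the function names and the sample emails are all hypothetical, and the country matcher handles single-word names only.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z]+")


def tokens(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return WORD_RE.findall(text.lower())


def cooccurring_words(emails, required, stopwords=frozenset()):
    """Count words in emails that mention every term in `required`.

    Each word is counted at most once per email, so a single long
    email cannot dominate the tally.
    """
    required = {t.lower() for t in required}
    counts = Counter()
    for body in emails:
        words = set(tokens(body))
        if required <= words:  # email mentions all required terms
            counts.update(words - required - stopwords)
    return counts


def country_mentions(emails, countries):
    """Number of emails mentioning each single-word country name."""
    counts = Counter()
    for body in emails:
        words = set(tokens(body))
        counts.update(c for c in countries if c.lower() in words)
    return counts


# Hypothetical sample emails, not taken from the released cache.
emails = [
    "Benghazi update: terrorism review, redaction pending, sensitive",
    "Terrorism briefing on Benghazi; sensitive, waiver requested",
    "Libya and Haiti schedule",
    "Notes on Libya and Afghanistan",
]
print(cooccurring_words(emails, {"Benghazi", "terrorism"}).most_common(1))
# [('sensitive', 2)]
print(country_mentions(emails, ["Libya", "Afghanistan", "Haiti"]).most_common())
# [('Libya', 2), ('Haiti', 1), ('Afghanistan', 1)]
```

In practice a stopword list (words like "on" and "and") and a phrase matcher for multi-word names such as "United Kingdom" would be needed before counts like these mean much.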

Clinton's released emails can be read in full online, including on The Wall Street Journal's website.

—By CNBC's Pradip Sigdyal