Monday, June 10, 2013

How to investigate the IRS; what's good for the goose is good for the gander

Today's WSJ contains an article entitled "How to Investigate the IRS." According to the article, Cleta Mitchell, the woman who helped expose IRS abuse of conservative activists, thinks:

    [T]he lever of potential monetary penalties could be useful in persuading senior government officials to come clean. Ms. Mitchell is hopeful that, even if the Justice Department sits on its hands, a combination of private lawsuits and congressional investigations can help ascertain who gave the order to target conservatives.

Here's an idea. How about if the same "big data" techniques are used to investigate the IRS that the IRS and the NSA use to investigate individual citizens?

It was revealed last week, for example, that the NSA has in its possession massive amounts of "metadata" for telephone calls placed by private individuals. What better use could be made of this data than to mine it to discover which phone calls IRS employees placed to which other individuals during the period under investigation? Metadata for emails should be subjected to similar analysis. Whom did IRS employees send emails to and whom did they receive them from? Social media data should also be mined to determine whether individual IRS employees, perhaps as members of public service employee unions, showed any signs that they harbored prejudices against Tea Party groups.

For example, one question that could be answered is: Did any telephone calls take place or were any emails exchanged between individuals in the Cincinnati office of the IRS and members of the Obama campaign's analytics team during the time when the IRS was targeting conservative groups? If there were any such calls, this would suggest that members of the Obama campaign team exercised political influence on the IRS.

Since it is just "metadata" or data in the public domain, all of this information should be obtainable without a search warrant. Once various pieces of data have been joined together and "suspicious patterns" have been identified, search warrants can be obtained to find out more about the contents of individual phone calls or emails.
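
To make the idea concrete, here is a minimal sketch in Python of the kind of cross-matching I have in mind. It is an illustration only: the record layout (caller, callee, start time, duration), the phone numbers, and the function name calls_between are all made up, and I am not claiming that any agency's system works this way.

```python
from datetime import datetime

# Hypothetical call-detail records: (caller, callee, start time, seconds).
# All numbers and times are invented for illustration.
CALLS = [
    ("513-555-0101", "312-555-0199", datetime(2012, 3, 5, 10, 15), 34),
    ("513-555-0101", "202-555-0123", datetime(2012, 3, 6, 9, 0), 120),
    ("312-555-0199", "513-555-0102", datetime(2012, 4, 1, 14, 30), 300),
    ("513-555-0102", "312-555-0199", datetime(2013, 1, 9, 11, 0), 60),
]

def calls_between(calls, group_a, group_b, start, end):
    """Return calls in [start, end) linking group_a and group_b, in either direction."""
    hits = []
    for caller, callee, when, seconds in calls:
        if not (start <= when < end):
            continue  # outside the period under investigation
        a_to_b = caller in group_a and callee in group_b
        b_to_a = caller in group_b and callee in group_a
        if a_to_b or b_to_a:
            hits.append((caller, callee, when, seconds))
    return hits

# Two hypothetical sets of phone numbers to cross-match.
office = {"513-555-0101", "513-555-0102"}
campaign = {"312-555-0199"}

matches = calls_between(
    CALLS, office, campaign, datetime(2012, 1, 1), datetime(2013, 1, 1)
)
for caller, callee, when, seconds in matches:
    print(f"{when:%Y-%m-%d %H:%M}  {caller} -> {callee}  ({seconds}s)")
```

Note that nothing in this sketch touches the content of a call. The "suspicious pattern" falls out of who called whom and when.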

What's good for the goose is good for the gander.

Sunday, June 9, 2013

Hadoop

The talking heads and politicians have finally begun to catch on about Hadoop. In today's WSJ:

    The NSA's advances have come in the form of programs developed on the West Coast—a central one was known by the quirky name Hadoop—that enable intelligence agencies to cheaply amplify computing power, U.S. and industry officials said. ... A computing and software revolution, launched in Silicon Valley a few years earlier, made sifting all that data easier. That was particularly true with the development of Hadoop, a piece of free software that lets users distribute big-data projects across hundreds or thousands of computers. ... The NSA also became an early adopter. At a 2009 conference on so-called cloud computing, an NSA official said the agency was developing a new system by linking its various databases and using Hadoop software to analyze them, according to comments reported by the trade publication InformationWeek. ... Mr. Garrett now runs RTRG's successor program, which was moved to the Defense Advanced Research Projects Agency and renamed Nexus 7. That effort has been using Hadoop and similar software to help manage large masses of data. One of the pieces of software, called Accumulo, was developed by the NSA using technology from Google, said a person briefed on the program.

Hadoop and its predecessor MapReduce have been around for about 10 years. It's always entertaining to see how "cutting edge" the mainstream media and the politicians are.
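
For readers who have never seen the MapReduce idea, here is a toy sketch in Python. It is not Hadoop's API (Hadoop itself is written in Java) and it runs in a single process. The function names map_phase, shuffle, and reduce_phase and the sample records are my own inventions, meant only to show the shape of the programming model that Hadoop spreads across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map step: emit a (caller, 1) pair for one (caller, callee) record."""
    caller, _callee = record
    return [(caller, 1)]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)

# Made-up (caller, callee) records.
records = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "B")]

mapped = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # number of calls placed by each caller
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be split across as many machines as you have, which is the whole point.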

Metadata

You know the world has taken a very strange turn indeed when politicians in Washington start talking about "metadata." (For a reasonable description of what exactly metadata is, see the Wikipedia article on it.)

By applying the label "metadata" to information collected about individuals, Washington apparatchiks are attempting to convince you that this information is of little significance. The New Yorker quotes, for example, Senator Dianne Feinstein:

    [Feinstein] assured the public earlier today that the government’s secret snooping into the phone records of Americans was perfectly fine, because the information it obtained was only “meta,” meaning it excluded the actual content of the phone conversations, providing merely records, from a Verizon subsidiary, of who called whom when and from where. In addition, she said in a prepared statement, the “names of subscribers” were not included automatically in the metadata (though the numbers, surely, could be used to identify them). “Our courts have consistently recognized that there is no reasonable expectation of privacy in this type of metadata information and thus no search warrant is required to obtain it,” she said.

The affidavit recently filed to obtain a search warrant against Fox correspondent James Rosen shows exactly how significant this so-called "harmless metadata" can be and precisely how it can be used. The affidavit states:

    Telephone call records demonstrate that earlier on the same day, multiple telephone communications occurred between multiple phone numbers associated with Mr. Kim and [Mr. Rosen.] Specifically:
    • at or around 10:15 a.m, an approximate 34 second call was made from [Rosen's] DoS desk telephone to Mr. Kim's desk telephone;
    • [there follow 3 similar descriptions, listing the source and destination phones and the times and durations of the phone calls]
    Thereafter, telephone records for Mr Kim's office phone reveal that at or around the same time as Mr Kim's user profile was viewing the TS/SCI Intelligence Report two telephone calls were placed from his desk phone to [Rosen].
    ...
    In the hour following those calls, the FBI's investigation has revealed evidence that Mr. Kim met face-to-face with [Rosen] outside the DoS. Specifically, DoS security badge access records demonstrate that Mr. Kim and [Rosen] departed the DoS building ... at nearly the same time, they were absent from the building for nearly 25 minutes, and then they returned to the building at nearly the same time. Specifically, the security badge access records indicate:

    • Mr. Kim departed DoS at or around 12:02 p.m. followed shortly thereafter by Rosen at 12:03 p.m. and;
    • [there follows a description of the time of their returns.]
Aside: could the fact that Kim and Rosen both departed the building around 12 noon merely indicate that they, along with many others in the building, were going out to lunch?

So, three sources of "metadata" were accessed and related to each other:

  • records describing the time and duration of phone calls between specific phones;
  • records describing the time and duration of Mr. Kim's login to the TS/SCI system;
  • records describing the times of exit from and entry back into the DoS building.

From these three sources of "mere" metadata, a fairly complete picture of the activities of the two individuals could be reconstructed.
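
Technically, there is nothing hard about this. Here is a minimal sketch in Python of merging three such logs into one timeline. Everything in it is an assumption for illustration: the placeholder date, the persons "A" and "B", the event tuples, and the function name build_timeline are invented, and most of the times are made up rather than taken from the affidavit.

```python
from datetime import datetime

DAY = "2000-01-01 "  # placeholder date, not a real one

def at(hhmm):
    """Parse an HH:MM string on the placeholder date."""
    return datetime.strptime(DAY + hhmm, "%Y-%m-%d %H:%M")

# Three independent "metadata" sources; each event is (time, source, detail).
phone_log = [(at("10:15"), "call", "A's desk phone -> B's desk phone, 34s")]
login_log = [(at("11:20"), "login", "B opens restricted document")]
badge_log = [
    (at("12:02"), "badge", "B exits building"),
    (at("12:03"), "badge", "A exits building"),
    (at("12:26"), "badge", "B enters building"),
    (at("12:27"), "badge", "A enters building"),
]

def build_timeline(*logs):
    """Merge several event logs into one list sorted by timestamp."""
    merged = [event for log in logs for event in log]
    return sorted(merged, key=lambda event: event[0])

timeline = build_timeline(badge_log, phone_log, login_log)
for when, source, detail in timeline:
    print(f"{when:%H:%M}  [{source:5}]  {detail}")
```

The only thing the three sources need to have in common is a timestamp and some way of tying a record to a person.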

It is one thing when this kind of monitoring is performed on individuals with security clearances. It would be quite another thing if this kind of monitoring were performed on ordinary citizens. But, the fact that the government now possesses all your telephone metadata (a complete record of all the calls you have made to any other individual), all your email metadata (a complete record of all the emails you have sent to or received from any other individual), all your credit card transactions, and who knows what other data, makes it quite possible that the government could construct this kind of timeline of your various activities.

The New Yorker continues:

    The answer, according to the mathematician and former Sun Microsystems engineer Susan Landau, ... the author of Surveillance or Security?, is that it’s worse than many might think.

    “The public doesn’t understand,” she told me, speaking about so-called metadata. “It’s much more intrusive than content.” She explained that the government can learn immense amounts of proprietary information by studying “who you call, and who they call. If you can track that, you know exactly what is happening—you don’t need the content.”

    For example, she said, in the world of business, a pattern of phone calls from key executives can reveal impending corporate takeovers. Personal phone calls can also reveal sensitive medical information: “You can see a call to a gynecologist, and then a call to an oncologist, and then a call to close family members.” And information from cell-phone towers can reveal the caller’s location. Metadata, she pointed out, can be so revelatory about whom reporters talk to in order to get sensitive stories that it can make more traditional tools in leak investigations, like search warrants and subpoenas, look quaint. “You can see the sources,” she said. When the F.B.I. obtains such records from news agencies, the Attorney General is required to sign off on each invasion of privacy. When the N.S.A. sweeps up millions of records a minute, it’s unclear if any such brakes are applied.

In sum, then, the fact that the government claims that it is only collecting "metadata" does not by any means assure that the privacy of an individual has not been invaded or that the collected data is not being used for some pernicious purpose. We need much more analysis into what data sources are being collected, how the various sources of data are being integrated, and how the resulting pictures of individual activity are being used.

Monday, June 3, 2013

Obama, Obamacare, and the IRS: Big Data meets Big Brother

Someone ought to pose the following question to IRS officials: Did employees of the IRS use software to search for terms like “Tea Party” or “Patriot” in determining which 501(c)(4) applicants should be targeted for special scrutiny? Anyone who works in the software industry knows that they did. (If they didn't, they should be fired for gross incompetence!)
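
To show how little software this takes, here is a sketch in Python of a keyword filter of the sort I am describing. I have no knowledge of what the IRS actually ran. The applicant names are made up, and the names WATCH_TERMS and flag_applicants are hypothetical.

```python
import re

# Search terms taken from the question above.
WATCH_TERMS = ["Tea Party", "Patriot"]

# One case-insensitive pattern that matches any of the terms.
PATTERN = re.compile(
    "|".join(re.escape(term) for term in WATCH_TERMS), re.IGNORECASE
)

def flag_applicants(names):
    """Return the names that contain at least one watch term."""
    return [name for name in names if PATTERN.search(name)]

# Made-up applicant names for illustration.
applicants = [
    "Springfield Tea Party Association",
    "Riverside Garden Club",
    "Patriots for Lower Taxes",
    "Friends of the Library",
]

flagged = flag_applicants(applicants)
print(flagged)
```

A dozen lines, in other words. Any competent programmer could write this in a few minutes.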

But, this kind of software searching and filtering is primitive in comparison to what the IRS apparently is capable of doing. In an article in USNews, Richard Satran writes:

    The Internal Revenue Service is collecting a lot more than taxes this year—it's also acquiring a huge volume of personal information on taxpayers' digital activities, from eBay auctions to Facebook posts and, for the first time ever, credit card and e-payment transaction records, as it expands its search for tax cheats to places it's never gone before. The IRS, under heavy pressure to help Washington out of its budget quagmire by chasing down an estimated $300 billion in revenue lost to evasions and errors each year, will start using "robo-audits" of tax forms and third-party data the IRS hopes will help close this so-called "tax gap." ... In presentations, IRS officials have said they may use the big data for:

    • Charting and analyzing social media such as Facebook
    • Targeting audits by matching tax filings to social media or electronic payments
    • Tracking individual Internet addresses and emailing patterns
    • Sorting data in 32,000 categories of metadata and 1 million unique "attributes"
    • Machine learning across "neural" networks
    • Statistical and agent-based modeling
    • Relationship analysis based on Social Security numbers and other personal identifiers

(BTW, as I pointed out in a blog post yesterday, I don't understand why it is a criminal offense for banks to "robo-sign" foreclosure applications, but it is OK for a government agency to use "robo-audits.")

Now consider some additional facts. During the 2012 campaign, Dan Wagner of the Democratic National Committee (DNC) also built big data analytics models that allowed the DNC to understand voters at the level of the individual. In an article in the MIT Technology Review entitled "How President Obama's campaign used big data to rally individual voters," Sasha Issenberg writes:

    Starting in June [of 2010], [Wagner] began predicting the elections’ outcomes, forecasting the margins of victory with what turned out to be improbable accuracy. But he hadn’t gotten there with traditional polls. He had counted votes one by one. His first clue that the party was in trouble came from thousands of individual survey calls matched to rich statistical profiles in the DNC’s databases. ... Wagner's techniques marked the fulfillment of a new way of thinking, a decade in the making, in which voters were no longer trapped in old political geographies or tethered to traditional demographic categories, such as age or gender, depending on which attributes pollsters asked about or how consumer marketers classified them for commercial purposes. Instead, the electorate could be seen as a collection of individual citizens who could each be measured and assessed on their own terms.

BusinessWeek describes the activities of Wagner's team as follows:

    Wagner’s team pursued a bottom-up strategy of unifying vast commercial and political databases to understand the proclivities of individual voters likely to support Obama or be open to his message, and then sought to persuade them through personalized contact via Facebook (FB), e-mail, or a knock on the door.

Dan Wagner has now gone on to found Civis Analytics, a big data analytics company that hopes to use the team and technology developed during the Obama campaign to influence the outcome of future campaigns. The sole investor in Civis Analytics is Google's Eric Schmidt. BusinessWeek elaborates on Schmidt's well-known connections with the Democratic Party:

    During the 2012 campaign, Barack Obama’s reelection team had an underappreciated asset: Google’s (GOOG) executive chairman, Eric Schmidt. He helped recruit talent, choose technology, and coach the campaign manager, Jim Messina, on the finer points of leading a large organization. “On election night he was in our boiler room in Chicago,” says David Plouffe, then a senior White House adviser. Schmidt had a particular affinity for a group of engineers and statisticians tucked away beneath a disco ball in a darkened corner of the office known as “the Cave.” The data analytics team, led by 30-year-old Dan Wagner, is credited with producing Obama’s surprising 5 million-vote margin of victory.

The fact that Schmidt is a well-known supporter of the Democratic Party raises the obvious question of whether Civis will be working exclusively on behalf of Democratic campaigns. Other obvious questions to ask are: Did the software built by Wagner's team also search for terms like "Tea Party" or "Patriot" (or the negation of these terms) in order to identify the characteristics and "proclivities" of individual voters? Did Wagner's team or the DNC share the list of search terms they were using (or any other aspects of the data models they developed) with the IRS?

If you connect the dots, the following picture begins to emerge: The founder of the biggest big data company in the world, Eric Schmidt, a well-known supporter of the Democratic Party, is working together with former members of Obama's campaign organization to develop big data analytics software to support Democratic campaigns. This software will be used to "measure and assess" voters at the level of the individual. It will seek to do this by integrating information about these individual voters from a wide variety of digital sources, in essence, tracking the behaviour of individuals as they move around on the World Wide Web or engage in e-commerce transactions or send e-mails. This same kind of analytics is being used by the IRS to develop profiles of individual taxpayers so that these individuals can be subjected to audits.

In sum, we are entering a brave new world. Obama used big data analytics to snoop into the private lives of individuals and influence how they vote. Once elected, Obama recruited the IRS to support Obamacare. The IRS, in turn, has built big data analytics capabilities for the same purpose of snooping into the personal attributes and activities of individuals.

Big Data meets Big Brother.

Sunday, June 2, 2013

Steven Runciman's Crusades, the liberal arts, and the Syrian civil war

I have been reading Steven Runciman’s masterful three-volume History of the Crusades.

When the term genius is used these days, almost invariably it is applied to a student of mathematics or the sciences, an Albert Einstein, Francis Crick, or Alan Turing. Most assuredly, these individuals are geniuses of the highest order. Steven Runciman is a genius of a different kind, a genius of the liberal arts.

In order to understand the greatness of Runciman, one might start by looking at the appendices of his Crusades, in which he lists the principal sources for his history. There are Greek, Latin, Arabic, Armenian, and Syriac sources. These are just the principal sources. Runciman’s history also integrates the work of modern historians writing in languages as diverse as English, French, Italian, German, and Russian. The Wikipedia article on Runciman states:

    It is said that he was reading Latin and Greek by age five. In the course of his long life he would master an astonishing number of languages, so that, for example, when writing about the Middle East, he relied not only on accounts in Latin and Greek and the Western vernaculars, but consulted Arabic, Turkish, Persian, Hebrew, Syriac, Armenian and Georgian sources as well.

In brief, Runciman was a genius at learning languages, both living and dead. He reminds us that the study of the liberal arts begins with the study of language. It is not possible to be a serious student of the liberal arts without being a serious student of languages.

But, it would be a slight and an insult to Runciman to label him as just a genius of languages. His command of the geographical, ethnic, political, religious, military, artistic, architectural, and economic factors in the patchwork that was Europe and the Middle East at the end of the first millennium is breathtaking. That is, Runciman's mastery of many languages enabled him to become a master of history, too.

As a historian, Runciman reminds us why the Middle East is such a complicated place, a region where waves of Persian, Jewish, Greek, Latin, Byzantine, Arabic, Turkish, Islamic (Sunni and Shiite), Christian (Orthodox, Monophysite, and Nestorian), Armenian, Mongol, and European (Frankish, German, Italian, Norman) influences have washed over the land at various times. We come away from his history convinced that there are no easy answers, that all attempts to cut the Gordian knot of the Middle East are in vain. The forces that shape the Middle East of today are the same forces that have been in play for centuries and they will remain in play for centuries to come. Our only hope is to fully understand all sides (for example, through the study of ethnicity, religion, and language) and to try gradually and gently to shape and influence them.

This evening, I also happened to view a video on the Weekly Standard entitled “Who Killed the Liberal Arts.” In this video, essayists Joseph Epstein and Andrew Ferguson discuss the “origin and value of a classic education.” I think Runciman’s Crusades provides a clear answer to the question of why the study of the liberal arts is so important. At this very moment, Syria is once again being torn apart by war. Certainly, the picture is complicated and we do not know for certain all the forces that are involved. But, it is apparent that one element in the struggle is the enmity between Bashar al-Assad’s Alawite co-religionists of the coastal region around Latakia and the Sunni majority in the inland. Wikipedia reports:

    The Alawites, also known as ... Nusayris ... are a prominent mystical religious group centred in Syria who follow a branch of the Twelver school of Shia Islam. They were long persecuted for their beliefs by the various rulers of Syria, until Hafez al-Assad took power there in 1970. Today they represent 12% of the Syrian population and for the past 50 years the political system has been dominated by an elite led by the Alawite Assad family. During the Syrian civil war, this rule has come under significant pressure.

Latakia, or, as Runciman refers to it, Lattakieh (the ancient city of Laodicaea), is one of the main strongholds of Assad’s Alawite followers and was one of the cities that witnessed the struggle over the Holy Lands that played out during the First Crusade. The 10th and 11th centuries, the period leading up to and including the First Crusade, were also the time when the Alawite sect sprang into existence. Wikipedia goes on:

    The Alawites themselves trace their origins to the followers of the eleventh Imām, Hassan al-'Askarī (d. 873), and his pupil ibn Nuṣayr (d. 868). The sect seems to have been organised by a follower of Muḥammad ibn Nuṣayr known as al-Khasibi, who died in Aleppo about 969. In 1032 Al-Khaṣībī's grandson and pupil al-Tabarani moved to Latakia, which was then controlled by the Byzantine Empire. Al-Tabarani became the perfector of the Alawite faith through his numerous writings. He and his pupils converted the rural population of the Syrian Coastal Mountain Range to the Alawite faith.

It is clear, then, that we cannot understand what is happening in Syria today without understanding its history. In America, we naively tend to think of the events in Syria in terms of an “Arab Spring”: the “masses, yearning for freedom,” are being “oppressed” by a “tyrant,” and are staging a “revolution” to overthrow him; all will be well once the despot has been removed. In reality, what we are witnessing is merely the latest episode in an age-old struggle between Shiite Alawites and Sunni Muslims; the removal of Assad may do nothing to cure this enmity. Consider, for example, the words of a Sunni cleric:

    Yusef al-Qaradawi, who is based in Qatar and has been a leading voice supporting the Arab Spring, warned that Iranian Shia were trying to "eat" Sunni Muslims, who are a majority in the Muslim world. He referred to Alawites, the followers of the Muslim sect to which President Bashar al-Assad of Syria belongs, as being "worse infidels than Christians or Jews". He also used the deliberately contemptuous term "Nusayris" when talking about them. He was particularly critical of the roles played by Iran, which is largely Shia, and the Lebanon Shia militia Hizbollah whose name translates as Party of God but which he called "Party of Satan", in supporting the Assad regime. "There is no common ground between the two sides because the Iranians, especially conservatives, want to eat the Sunni people," he said.

We are incapable, then, of understanding the events in Syria today without a thorough understanding of the dynamics of the centuries-old history of Shiite Alawites and Sunnis in Syria. It is through the work of great historians and polyglots and masters of the liberal arts like Steven Runciman that we gain this understanding. That is why the liberal arts are still so important.

Did AG Holder perjure himself or merely "robo-approve" the Rosen affidavit?

The Justice Department has admitted that Attorney General Holder approved an application for the seizure of the email records of Fox News correspondent James Rosen. The affidavit supporting the application clearly indicates that the Justice Department was considering prosecution of Rosen.

The affidavit states:

    [Reginald B. Reyes, Special Agent of the FBI, believes] there is probable cause to conclude that the contents of the wire and electronic communications pertaining to [James Rosen's email account] are evidence, fruits and instrumentalities of criminal violations of 18 U.S.C Section 793 (Unauthorized Disclosure of National Defense Information) and that there is probable cause to believe that [Rosen] has committed or is committing a violation of section 793(d), as an aider and abettor and/or a co-conspirator, to which the materials relate.

And yet, in a hearing before the House Judiciary Committee on May 15, Holder stated:

    With regard to the potential prosecution of the press for the disclosure of material, that is not something that I’ve ever been involved in, heard of or would think would be a wise policy. In fact, my view is quite the opposite ... there should be a shield law with regard to the press’s ability to gather information and to disseminate it.

Either Holder was fully cognizant of the contents of the affidavit or he was not. If he was, he perjured himself before the Judiciary Committee. If he was not, then, in effect, he robo-approved the application to seize Rosen's email.

A couple of years ago, banks and financial institutions were prosecuted by the Justice Department for robo-signing mortgage foreclosure documents. The Office of Attorney General is no place for a person who engages in the same kind of robotic practice that his department once prosecuted.

Holder either perjured himself or he robo-approved the investigation of Rosen. In either case, he must go.