Wrestling with Data in the Big Apple: Strata Summit 2011
By: Shane Shifflett
Hundreds of developers and data junkies descended on Times Square last Tuesday for Strata Summit, a two-day conference to grapple with the problems and possibilities of ever-growing data sets. The Bay Citizen was in attendance — here are some highlights:
Disaster detection
Robert Kirkpatrick, director of the United Nations' Global Pulse Initiative, came to Strata calling corporations and global citizens to action.
Kirkpatrick wants businesses to release anonymous data that can be used to identify and predict crises like famines, droughts or economic depressions. In his own investigations into publicly available data, he found that tweets from Indonesia revealed power outages and lines of villagers waiting for fuel. Prices scraped from e-commerce sites in Latin America allowed him to track food inflation in real time.
“People share so many things online today that we care deeply about at the UN,” Kirkpatrick said, including data that could influence policy makers' decisions, if it were readily available. But he said much of the important data is in the hands of private companies.
Kirkpatrick announced a new social network, Hunchworks, to allow experts diving into data sets to connect and discuss ideas without the formalities of academic research. See Kirkpatrick's talk here or read about his agenda here.
Real-life Moneyball
You can watch Jonah Hill portray a fictional version of Paul DePodesta — former assistant to the general manager of the Oakland Athletics who was responsible for revolutionizing baseball management practices through rigorous data analysis — in the new movie "Moneyball," but Strata attendees heard the story first-hand.
DePodesta, vice president of player development and scouting for the New York Mets, discussed his experiences implementing sabermetrics — a form of empirical baseball analysis — to make decisions less subjective.
"If we like a guy, all we do is talk about what he does well," DePodesta said. "If we don't like a guy, all we talk about is what he doesn't do well and we make it sound like these guys shouldn't have made their high school team."
DePodesta institutionalized the use of objective data analysis to determine whether a player will remain obscure his entire career or sign a multimillion-dollar contract with a major league team. These guidelines apply to anyone who works with data:
1. Information overkill. We live in a world with too much data. As you cut through the noise, look for real causal relationships by learning everything about the problem domain and consolidating your search through experience.
2. Affirmation bias. Take an objective look at what the data is telling you and throw out your biases.
3. Peer perception. Consider how your peers will judge you and manage your responses with your goals in mind.
4. Ask the naive question. In the face of uncertainty, DePodesta urges you to accept your limitations and look at the problem with fresh eyes. Ask the question as if you were asking it for the first time.
Listen to DePodesta's speech in its entirety here.
The social networks
With Facebook's recent changes fresh in attendees' minds, Tim O'Reilly, founder of O’Reilly Media, invited Bradley Horowitz, product manager of Google Plus, to talk about the state of Google's social fabric. O’Reilly wasted no time addressing the elephant in the room: traffic and Google’s competition with Facebook.
"We're not worried about traffic," Horowitz said. "We will bring on big numbers with time."
He downplayed talk of competition and seemed content to answer in generalities. Horowitz moved on to the latest features released for Google Plus: users on phones with front-facing cameras and enough bandwidth can now join Hangouts conversations. He added that Google was turning Hangouts into a platform by adding support for Google Docs and allowing developers to build applications using Hangouts as a base.
The dark side of data
Generally speaking, Strata Conference was a place for optimists. Attendees shivered with excitement when the words "opportunity" and "data" were mentioned in the same sentence. The only person willing to publicly address the dark side of data was Mark Goodman.
"The more data you produce, the more organized crime is happy to consume," he said.
Goodman, chief cybercriminologist of the Germany-based Cybercrime Research Institute, is an expert at tracing exotic crimes online. Facing a room full of data analysts and programmers, he painted a picture of ruthless thugs deploying the same data-processing tactics used by his audience. Stolen data sold on the black market yielded an estimated $2 trillion in profits in 2010, he said.
Illegal vendors offer crime as a service, he said, so computer-illiterate criminals can steal your data. Some even offer tech support to walk bullies through graphical user interfaces that can launch attacks on innocent citizens with a single keystroke.
More frightening, Goodman recounted how terrorists coordinated the 2008 Mumbai attacks by analyzing data from Google, Twitter and news outlets. Terrorists sent and received messages via their Blackberries to verify targets or pose as counterterrorism forces, he said. In one troubling case, suspects were able to identify a high-profile target for execution through Google searches.
Goodman's point: Data is just as valuable to ruthless criminals as it is to advertisers, and there are already examples of deadly abuses. See Goodman's full talk here.
But wait, there's more
Simon Rogers, editor of The Guardian's Data Blog; Quentin Hardy, deputy tech editor of The New York Times; and Aneesh Chopra, federal chief technology officer for the Obama administration, all gave fascinating talks. You can read coverage from other outlets here and watch presentations here.

