It comes in “torrents” and “floods” and threatens to “engulf” everything that stands in its path.
No, it is not a tsunami; it is Big Data, the incomprehensibly large amount of raw, often real-time data that keeps piling up faster and faster from scientific research, social media, smartphones: virtually any activity that leaves a digital trace.
The sheer size of the pile (measured in petabytes, each one million gigabytes, or even exabytes, each one billion gigabytes), combined with its complexity, has threatened to overwhelm just about everybody, including the scientists who specialize in wrangling it.
“It’s easier to collect data,” said Michael Franklin, a professor of computer science at the University of California, Berkeley, “and harder to make sense of it.”
Making sense of Big Data is, in fact, a holy grail of computer science these days — and technology companies, academic institutions and the federal government are investing heavily in the endeavor.
And with Google, Facebook, Twitter and many other leading data-heavy technology companies based in the Bay Area, many local researchers are on the cutting edge of Big Data research.
Last month, the National Science Foundation awarded $10 million to Berkeley’s A.M.P. Expedition, a team of Cal professors and graduate students who take an interdisciplinary approach to advancing Big Data analysis. A.M.P. stands for “algorithms, machines, people.”
The group is working to build a new set of open-source tools for the era of Big Data and is collaborating with the University of California, San Francisco, on cancer research and with UrbanSim, an urban planning tool, among other partners.
The challenge is to combine traditional database science with new techniques that harness the power of cloud and cluster computing to handle the massive scale of today’s data landscape.
“We’ll judge our success by whether we build a new paradigm of data,” said Franklin, director of A.M.P. Expedition.
The Berkeley group was founded in early 2011 and counts Google, SAP and Oracle among its sponsors.
The grant is part of the Obama administration’s “Big Data Research and Development Initiative,” which will distribute $200 million.
One of the more innovative aspects of the Berkeley group is its emphasis on the people part of dealing with Big Data.
“We recognize that humans do play an important part in the system,” said Ken Goldberg, an artist, a professor of robotics and new media, and a faculty member of A.M.P.
Goldberg has developed Opinion Space, a tool for online discussion and brainstorming that uses algorithms and data visualization to help gather meaningful ideas from a large number of participants.
Meanwhile, for every minute it took you to read this article, 48 hours of video were uploaded to YouTube. According to the site, users add an overwhelming amount of material, about eight years of content, every day.
“We’re trying to move from data as a problem to data as a resource,” Franklin said.
This article also appears in the Bay Area edition of The New York Times.