It’s that time of the week again, when we round up the big data topics that have kept us on our toes all week long! For you, these stories may just be good fodder for Friday happy hour talk. For us, they are the heart and soul of our industry.
This week, read up on interactive big data visualizations that help our brains understand what all the 1s and 0s are trying to tell us, the debate over big data scientific studies conducted without consent, data selfies (yes, seriously!) and more.
EU Researchers Help Our Brains Digest Big Data
Every single minute the world generates 1.7 million billion bytes of data – that’s equal to 360,000 DVDs. Thus, we ask the question: how can our brain deal with increasingly big and complex data sets?
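As a quick sanity check on that comparison, the two quoted figures can be divided to see what disc size they imply. This is only a back-of-the-envelope sketch: the variable names are ours, and it assumes decimal gigabytes (1 GB = 1e9 bytes).

```python
# Figures quoted in the paragraph above.
BYTES_PER_MINUTE = 1.7e15   # "1.7 million billion bytes"
DVDS_PER_MINUTE = 360_000

# Implied capacity of one disc, in decimal gigabytes (1 GB = 1e9 bytes).
gb_per_dvd = BYTES_PER_MINUTE / DVDS_PER_MINUTE / 1e9

print(f"{gb_per_dvd:.2f} GB per DVD")  # prints "4.72 GB per DVD"
```

That lands close to the nominal 4.7 GB capacity of a single-layer DVD, so the two numbers are consistent with each other.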
EU researchers within the Collective Experience of Empathetic Data Systems (CEEDs) project are developing an interactive system that not only presents big data in an appealing, aesthetic way but also constantly changes its presentation to avoid brain overload.
Transposing big data into an interactive environment allows the human mind to generate new ideas more efficiently. Researchers have built what they are calling an eXperience Induction Machine (XIM) that uses virtual reality to enable a user to literally “step inside” large data sets.
This immersive multi-modal environment, located at Pompeu Fabra University in Barcelona, also contains an array of sensors that allow the system to present the information in a more user-friendly way, constantly tailoring the display to the user’s reactions as the data is examined. These reactions, such as gestures, eye movements or heart rate, are monitored by the system and used to adapt the way in which the data is presented.
Jonathan Freeman, Professor of Psychology at Goldsmiths, University of London and coordinator of CEEDs, explains: “The system acknowledges when participants are getting fatigued or overloaded with information and it adapts accordingly. It either simplifies the visualizations so as to reduce the cognitive load, thus keeping the user less stressed and more able to focus. Or it will guide the person to areas of the data representation that are not as heavy in information.”
The CEEDs team began its experiments on a group of neuroscientists using a system called BrainX3, which took the typically huge data sets generated in this scientific discipline and animated them with visual and sound displays. By providing subliminal clues, such as flashing arrows, the machine guided the neuroscientists to areas of the data that were potentially more interesting to each person.
Possible applications for CEEDs abound, from inspection of satellite imagery and oil prospecting, to astronomy, economics and historical research.
“Anywhere where there’s a wealth of data that either requires a lot of time or an incredible effort, there is potential,” says Professor Freeman.
The project could potentially enable students to study more efficiently or journalists to cross-check sources more quickly. Several museums in Germany, the Netherlands, the United Kingdom and the United States have already shown interest in the new technology.
“We are seeing that it’s physically impossible for people to analyze all the data in front of them, simply because of the time it takes. Any system that can speed it up and make it more efficient is of huge value.”
EU action to take advantage of big data extends beyond research projects. The European Commission recently called on national governments to wake up to the big data revolution and is using the full range of policy and legal tools to make the most of the data-driven economy.
Vice-President of the European Commission Neelie Kroes, responsible for the Digital Agenda, says: “Big data doesn’t have to be scary. Projects like this enable us to take control of data and deal with it so we can get down to solving problems. Leaders need to embrace big data.”
Read more about the CEEDs project.
But First, Let’s Take a Data Selfie
In the 2013 book Who Owns the Future?, virtual reality pioneer Jaron Lanier poses a question: can internet users reclaim their data? Instead of giving it away to enrich tech companies, Lanier argues, users should sell their data, disrupting the data mining revenue model of Google and Facebook.
Austin-based artist Laurie Frick’s new app FRICKbits, which transforms your user data into art, goes some way toward that idea. Like Lanier, Frick wants to give people the ability to use their own data for their own purposes, and she built the app to do exactly that.
The app is proving to be popular: by the third day of its Kickstarter campaign, FRICKbits had already exceeded its $7,500 fundraising goal.
FRICKbits’ data art shares some aesthetics with Arthur Buxton’s Colourstory app, which visualizes personal experiences and events as color wheels users can then print and sell as fine art. There’s also Julie Freeman’s We Need Us project that explores big user data in a real-time animated form.
While both are interesting projects, neither pulls in a ton of users, which is something to consider with FRICKbits’ launch. You have to wonder how effective user data empowerment can be at such small scales. Will it take thousands of these apps to create a sea change in how users see their own data, or will it require something more massive and popular?
Frick, for her part, is of the mind that every little bit helps.
Originally trained as an engineer, Frick found the data artist calling after measuring sleep with self-tracking data. This was after she’d risen through the ranks of tech companies, then quit to attend graduate school for art, eventually fusing the two fields into her work.
It was only after Frick had built a studio full of hand-crafted patterns from weight, sleep, internet use, mood, walking, and location data that she realized humans unconsciously create very “eloquent rhythms.”
“It just hit me,” she said, “self-tracking data is like a pattern portrait of you.”
FRICKbits, which runs natively on iOS, features an algorithm and pattern based on Frick’s own hand-drawn ink and watercolor pattern portraits. She said it’s a vector-based system that “mimics squiggly lines and the feel of something hand-crafted.”
Frick describes current smartphone and mobile data mining as a “one-handed handshake.” All of this data is tracked and analyzed, but it’s hidden or simply not shared. As a data activist, she thinks we’ll make some progress in data privacy if we don’t just hide, but demand more of our information.
“I bet people would be astonished with how much is known about them, and the patterns that are extrapolated, along with predictions made about their behavior,” she said. “People look at me and ask ‘What would I possibly do with my data?’”
For Frick, the answer is simple: turn it into art, and allow a shift in equilibrium to occur that gives users a chance to fight back and have a say about data mining. “It’s not like eating vegetables because art makes data sticky,” she said, hoping that users will gain some self-awareness in the process.
In the end, Frick hopes many small startups and apps benefit from people giving private access to slivers of their personal data. This won’t change Google and Facebook’s incentive to data mine, but if smaller companies give data back to users in a transparent way, users might find clever ways to extract meaning from their own lives.
Data Ethics via the New York Times
Once forced to conduct painstaking personal interviews with subjects, scientists can now sit at a screen and instantly play with the digital experiences of millions of internet users. It’s the frontier of social science — experiments on people who may never even know they are subjects of study, let alone explicitly consent.
But this new era has brought some controversy with it. Jeffrey T. Hancock, a Cornell University professor, was a co-author of the Facebook study in which the social network quietly manipulated the news feeds of nearly 700,000 people to learn how the changes affected their emotions. When the research was published in June, the outrage was immediate.
Now Professor Hancock and other university and corporate researchers are grappling with how to create ethical guidelines for this kind of research. In his first interview since the Facebook study was made public, Professor Hancock said he would help develop such guidelines by leading a series of discussions among academics, corporate researchers and government agencies like the National Science Foundation.
“As part of moving forward on this, we’ve got to engage,” he said. “This is a giant societal conversation that needs to take place.”
Microsoft Research, a quasi-independent arm of the software company, is a prominent voice in the conversation. It hosted a panel last month on the Facebook research with Professor Hancock and is offering a software tool to scholars to help them quickly survey consumers about the ethics of a project in its early stages.
The Federal Trade Commission, which regulates companies on issues like privacy and fair treatment of internet users, is also planning to get involved. Although the agency declined to comment specifically on the Facebook study, the broader issues touch on principles important to the agency’s chairwoman, Edith Ramirez.
“Consumers should be in the driver’s seat when it comes to their data,” Ms. Ramirez said in an interview. “They don’t want to be left in the dark and they don’t want to be surprised at how it’s used.”
The Facebook emotion experiment and similar testing raise fundamental questions: what types of experiments are so intrusive that they need prior consent or prompt disclosure after the fact? How do companies make sure that customers have a clear understanding of how their personal information might be used? Who even decides what the rules should be?
Here at Umbel, we’ve made a pledge to put the user at the center of data collection and use, requiring that our clients follow best practices so that users always know what is being collected and why. But to ensure data privacy and security for all, and to keep the internet a data democracy, Umbel cannot be alone in enforcing proper, user-friendly data collection methods.
You hold the key to your data. Take a stand.
Oops! DEA Pays Hefty Price For Free Data
Earlier this week, The Verge reported that the Drug Enforcement Administration (DEA) paid an Amtrak employee hundreds of thousands of dollars over two decades to obtain confidential information it could have gotten for free, according to a watchdog report.
According to the report, released Monday by Amtrak’s inspector general, Tom Howard, the DEA paid an Amtrak secretary $854,460 to be an informant. Howard adds that the employee handed over the information “without seeking approval from Amtrak management or the Amtrak Police Department.”
The employee was not publicly identified except as a “secretary to a train and engine crew” and has since been allowed to retire instead of facing administrative discipline.
However, Amtrak’s own police agency is already in a joint drug enforcement task force that includes the DEA. According to the inspector general, that task force can obtain confidential Amtrak passenger reservation information (such as emergency contacts, passports, credit card numbers, gender, date of birth, travel itineraries and baggage details) at no cost.
Under an agreement with the DEA, the Amtrak Police Department provides such information for free in exchange for receiving a share of funds seized through resulting investigations. Thus, the DEA’s purchase of the records deprived Amtrak police of money the department could have received by supplying the data.
Sen. Chuck Grassley, the senior Republican on the Senate Judiciary Committee, called the $854,460 an unnecessary expense and asked for further information about the incident, claiming it “raises some serious questions about the DEA’s practices and damages its credibility to cooperate with other law enforcement agencies.”
DEA spokeswoman Dawn Dearden declined to comment.
Facebook Autoscale Saves Energy
This week, social network giant Facebook revealed a new load balancing system for its data centers that results in some serious energy savings.
The system for balancing web requests on web server clusters, called Autoscale, cuts power consumption by “10-15% for different web clusters,” according to a blog post today from Qiang Wu, an infrastructure software engineer at Facebook.
Autoscale achieves the saving by automating the management of server workloads so that servers are never running at low capacity, which is when they are at their least energy efficient.
The system’s goal is to keep servers either idling or running at medium capacity.
In short, Facebook has switched its load-balancing policy from a modified round-robin algorithm (every server receives roughly the same number of page requests and utilizes roughly the same amount of CPU) to one that concentrates workload on a server until it has at least a medium-level workload. The typical web server at Facebook consumes about 60 watts of power when idle (0 requests-per-second, or RPS), 130 watts when at low-level CPU utilization (small RPS), and 150 watts at medium-level CPU utilization. In other words, going from idle to a light load costs 70 watts, while going from a light load to a medium one costs only 20 more.
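To see why packing traffic onto fewer servers helps, here is a minimal sketch that plugs the three wattage figures above into both policies. The function names, the 100-server cluster, and the assumption that off-peak traffic fits on 50 servers at medium load are all ours, chosen for illustration; none of it comes from Facebook’s post.

```python
# Per-server power draw in watts, as quoted above.
POWER_W = {"idle": 60, "low": 130, "medium": 150}


def round_robin_power(num_servers: int) -> int:
    """Light traffic spread evenly: every server sits at low utilization."""
    return num_servers * POWER_W["low"]


def concentrated_power(num_servers: int, active: int) -> int:
    """Traffic packed onto `active` servers at medium load; the rest idle."""
    if not 0 <= active <= num_servers:
        raise ValueError("active must be between 0 and num_servers")
    idle = num_servers - active
    return active * POWER_W["medium"] + idle * POWER_W["idle"]


# Hypothetical off-peak scenario: a 100-server cluster whose traffic
# would fit on 50 servers running at medium utilization.
spread = round_robin_power(100)        # 100 * 130 = 13,000 W
packed = concentrated_power(100, 50)   # 50 * 150 + 50 * 60 = 10,500 W
saving = (spread - packed) / spread
print(f"{spread} W vs {packed} W: {saving:.0%} saved")
```

In this made-up scenario the packed cluster draws 10,500 W instead of 13,000 W. The real saving depends on how much traffic there actually is at a given hour, so treat the percentage as an illustration of the mechanism rather than a measurement.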
If the overall workload is low during a given time, the load balancer will use only a subset of servers, leaving the rest at idle or using them for batch-processing workloads. Autoscale can also dynamically adjust the active pool size such that each active server will get at least medium-level CPU utilization regardless of the overall workload level.
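One simple way to picture the pool-sizing step is sketched below. The source does not say how Autoscale actually computes the pool size, so the function, its parameters, and the `medium_rps` threshold (the request rate at which a server counts as medium-loaded) are hypothetical stand-ins.

```python
import math


def active_pool_size(total_rps: float, medium_rps: float, total_servers: int) -> int:
    """Largest pool in which every active server still gets >= medium_rps.

    `medium_rps` is an assumed per-server threshold, not a Facebook figure.
    """
    if total_rps <= 0:
        return 0
    # floor() keeps per-server load at or above the medium threshold;
    # at least one server must stay active to serve any traffic at all.
    wanted = max(1, math.floor(total_rps / medium_rps))
    return min(wanted, total_servers)


# Hypothetical cluster of 100 servers with a medium threshold of 100 RPS.
quiet = active_pool_size(4_500, 100, 100)    # 45 servers active, 55 left idle
peak = active_pool_size(25_000, 100, 100)    # capped at all 100 servers
```

With these made-up numbers, a quiet period needs only 45 active servers, while at peak the whole cluster is in use and nothing is left idle.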
In a test, Wu reported, Autoscale created a 27% power saving in a low period of activity (after midnight). Power saving during peak hours, however, was 0%. The average power saving over a 24-hour cycle was estimated at between 10 and 15% for different web clusters.
“Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system,” Wu said.
Facebook already does a lot to improve energy efficiency and reduce its environmental footprint via its hardware and data center design through the Open Compute Project, saving $1.2 billion in just two years.
Yet the fact that improvements can also be achieved with software alone should help motivate other companies to consider doing the same without having to change anything in their physical infrastructure.
A time of change is upon us, my fellow data-loving brethren. May we all follow Facebook’s energy-conscious lead.