There’s been tons of tech talk this week concerning net neutrality, Verizon, Google and so much more. Frankly, if you’ve been working at all, it’s been hard to keep track of all the updates. No need to worry, though; we’ve got you covered. Here are the top six tech news stories we’re watching, given their record-breaking, data-rights-abiding (or not) status. At the very least, they’ll give you some good talking fodder for your weekend festivities.
Setting New World Records
A research group at the Technical University of Denmark (DTU), which was the first to break the one-terabit barrier in 2009, has now managed to squeeze 43 terabits per second over a single optical fiber with just one laser transmitter.
In more user-friendly units, 43 terabits per second is equivalent to a transfer rate of around 5.4 terabytes per second (5,375 gigabytes, to be exact). Yes, if you had your hands on DTU’s new fiber-optic link, you could transfer the entire contents of a 1TB hard drive in about a fifth of a second, or download a 1GB file in roughly 0.2 milliseconds.
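A quick back-of-the-envelope check of those conversions in Python:

```python
# Convert DTU's record link speed into byte rates and transfer times.
link_bits_per_s = 43e12              # 43 terabits per second

bytes_per_s = link_bits_per_s / 8    # 5.375e12 bytes/s, i.e. ~5.4 TB/s
tb_drive = 1e12                      # a 1 TB hard drive, in bytes
gb_file = 1e9                        # a 1 GB file, in bytes

print(bytes_per_s / 1e9)             # 5375.0 GB/s
print(tb_drive / bytes_per_s)        # ~0.186 s: about a fifth of a second
print(gb_file / bytes_per_s * 1000)  # ~0.186 ms: roughly 0.2 milliseconds
```

The numbers line up with the article’s figures once you divide bits by eight to get bytes.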
The previous record over a single optical fiber — 26 terabits per second, set by Karlsruhe Institute of Technology back in 2011 — had remained unbroken for a surprisingly long period of time. DTU set a series of single-fiber world records in 2009 and 2011, but had since been forced to sit in Karlsruhe’s shadow — until now.
This was obviously a pain point for the DTU researchers — in fact, the press release [Danish] announcing the new world record actually calls out Karlsruhe by name. I guess a bit of friendly competition never hurt anyone though, right?
“The techniques used by DTU to hit 43Tbps actually have a chance of making it into real-world networks.”
The key takeaway from this world record is DTU’s use of a single laser over a single fiber. In the past there have been plenty of network demonstrations of hundreds or even thousands of terabits per second with multiple lasers over multiple fibers — but those demos are so far removed from the reality of fiber-optic networking that they’re not really worth our time. When we talk about commercial fiber-optic links, we’re nearly always talking about single-laser-single-fiber, because that’s what the entire Internet backbone is built upon.
In other words, the techniques used by DTU to hit 43Tbps actually have a chance of making it into real-world networks in the next few years. You might soon be able to download a TV show or movie in quite literally the blink of an eye.
Currently, the fastest commercial single-laser-single-fiber network connections max out at just 100Gbps (100 Gigabit Ethernet). The IEEE is currently investigating the feasibility of either a 400Gbps or 1Tbps Ethernet standard, with ratification not due until 2017 or later. Obviously DTU’s 43Tbps won’t have much in the way of real-world repercussions for now — but it’s a very good sign that we’re not going to run out of internet bandwidth any time soon. Unless, of course, your service provider has any say in it.
The Data Lake Dream
The data lake dream is of a place with a data-centered architecture, where silos are minimized and processing happens with little friction in a scalable, distributed environment. Applications are no longer islands; they exist within the data cloud, taking advantage of high-bandwidth access to data and scalable computing resources. Data itself is no longer constrained by initial schema decisions, and can be exploited more freely by the enterprise.
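That “no initial schema decisions” idea is usually called schema-on-read: raw records land in the lake untouched, and structure is imposed only when someone queries them. Here is a minimal sketch of the pattern, with invented field names and data:

```python
import json

# Raw events are dumped into the "lake" as-is; no upfront schema is enforced,
# so records with extra or missing fields coexist happily.
raw_events = [
    '{"user": "a", "action": "click", "ts": 1}',
    '{"user": "b", "action": "buy", "ts": 2, "amount": 9.99}',
]

def purchases(events):
    """Apply a schema only at read time: pick out the fields this consumer needs."""
    for line in events:
        rec = json.loads(line)
        if rec.get("action") == "buy":
            yield rec["user"], rec.get("amount", 0.0)

print(list(purchases(raw_events)))  # [('b', 9.99)]
```

A different consumer could read the same raw lines with a completely different schema, which is exactly the flexibility (and, per Gartner below, the governance risk) the lake promises.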
I call it a dream because we have a way to go before the vision comes true. And recently, a new report from Gartner Inc. called the very possibility of a data lake into question.
The study, called The Data Lake Fallacy: All Water and Little Substance, notes that while many vendors have signed on to the data lake concept, few companies agree on a definition of what data lakes are or the value they provide.
“Few companies agree on a definition of what data lakes are or the value they provide.”
“Data lakes are marketed as enterprise wide data management platforms for analyzing disparate sources of data in their native formats,” wrote Gartner’s Nick Heudecker. “This eliminates the up-front costs of data ingestion, like transformation. Once data is placed into the lake, it’s available for analysis by everyone in the organization.”
But, co-author Andrew White pointed out that while data lakes might benefit certain parts of an organization, no one has yet realized the value proposition of enterprise-wide data management.
The authors said there are other risks associated with data lakes as well, including access control and security considerations. Data may also be restricted by regulatory or privacy requirements, and just dumping it into a lake could lead to legal exposure.
Nevertheless, Gartner’s analysts don’t write off data lakes altogether.
“The question your organization has to address is this: Do we encourage one-off, independent analysis of information in silos or a data lake, bringing said data together, or do we formalize to a degree that effort, and try to sustain the value-generating skills we develop,” wrote White.
“Data lakes are likely to appeal to an organization that prefers the first scenario, but those that want to consolidate information should move beyond data lakes to focus on building a more robust data warehouse.”
“As business is increasingly digital, access to data will become a critical priority, as will speed of development and deployment.”
Regardless of where you are now, take some time to look to the future. We’re on a journey toward connecting enterprise data together. As business is increasingly digital, access to data will become a critical priority, as will speed of development and deployment.
Interested in the future of big data and where it’s headed? Let us show you what Umbel can do.
Foursquare’s Always-On Tracking
A popular topic of discussion this week: Foursquare 8.0 has hit the market. While the majority of headlines focus on the app’s aim at Yelp, we noticed a slightly different – and quite frightening – feature.
Hiding in Foursquare’s revamped mobile app is a feature some users might find creepy: it tracks your every movement, even when the app is closed.
Starting today, users who download or update the Foursquare app will automatically let the company track their GPS coordinates any time their phone is powered on. Previously, Foursquare required users to give the app permission to turn on location-tracking. Now users must change a setting within the app to opt out.
“Users who download or update the Foursquare app will automatically let the company track their GPS coordinates.”
A new version of Foursquare’s eponymous app, released today, is a radical departure for the company. Once a kind of online bragging system, the app is now more of a tracking machine. Gone is Foursquare’s best-known feature, a large “check in” button that users clicked to voluntarily share their location. Now, the app is keeping tabs on you at all times, sending your location back to Foursquare’s servers, which then push recommendations back to your smartphone, suggesting restaurants and stores to visit — and stuff to order and buy once you get there.
“To actually get an app to talk to you like a friend would talk to you. That’s what we’re going at here, and I think we’ve done a really good job of it,” says Foursquare CEO Dennis Crowley.
“Your real-time location is not shared on the Foursquare app. If you write a tip, like or otherwise interact with a place, users may infer that you have been to that location. Some content, like tips, are time stamped and other users could use that information to infer when you were at a place even though tips can be posted when you aren’t at the place you are leaving a tip about.”
Tracking user whereabouts could arm Foursquare with more valuable data it can sell to partners and advertisers as it searches for new streams of revenue. According to Crowley, the company hopes to analyze trends in where users go and what destinations are popular, and may sell that data to its partners.
“The company hopes to analyze trends in where users go and what destinations are popular, and may sell that data to its partners.”
Crowley explains that Foursquare doesn’t share private user actions “with anyone,” but it does approach government snooping and private data brokers like “a lot of other companies” — small comfort given how pervasive information leakage and sharing has become in the tech industry.
Regarding the data Foursquare collects, Crowley said trend data provided to partners would never include users’ real names.
“We might look at anonymized trends and say, there’s a high density of people who like ribs and Arnold Palmers in the East Village,” he said. Advertisers “might be really excited about getting their hands on that data,” he said.
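Trend data like this is typically produced by grouping visits by area and category and reporting only counts, never identities. A hypothetical sketch of that kind of aggregation (invented data and field names; note that real anonymization requires far more care than simply dropping names):

```python
from collections import Counter

# Hypothetical raw visit records: (user_id, neighborhood, category).
visits = [
    ("u1", "East Village", "ribs"),
    ("u2", "East Village", "ribs"),
    ("u3", "East Village", "ribs"),
    ("u1", "SoHo", "coffee"),
]

# Aggregate the identities away: only (area, category) counts leave the system.
trends = Counter((area, cat) for _, area, cat in visits)
print(trends.most_common(1))  # [(('East Village', 'ribs'), 3)]
```

Even count-based aggregates can leak information when groups are small, which is one reason privacy advocates remain wary of schemes like this.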
But this type of persistent location tracking could scare off users who are growing increasingly wary of threats to their mobile privacy.
Such concerns seem particularly warranted in light of recent revelations of extensive domestic spying by U.S. intelligence agencies. The data you share with Foursquare today could conceivably end up in the hands of the NSA, hackers or private data brokers tomorrow.
“These location data collection schemes create a honeypot for malicious actors,” says Adi Kamdar, a staff activist at the Electronic Frontier Foundation. “People tend to forget that these features are on, providing little benefit to the user while sending heaps of interesting — and personal — data over to companies.”
Now is the time to champion data rights. Take a stand.
Chris Wiggins on Data Literacy
ICYMI: earlier this year, the New York Times hired Chris Wiggins, an associate professor of applied mathematics at Columbia University and an expert in machine learning and biological modeling, as its chief data scientist. Safe to say, Wiggins has brought a new edge to the long-running publication.
Last month, Wiggins participated in a talk at GigaOm’s Structure Conference in San Francisco about how data science works, how it is helping to change the Times, and why he believes data literacy is essential for news-gathering companies and contemporary global citizens alike.
“Data literacy is essential for news-gathering companies and contemporary global citizens alike.”
It seems as though Wiggins is far from alone in this thinking: new operations by upstarts such as former Timesman Nate Silver’s FiveThirtyEight and ex-Washington Post reporter Ezra Klein’s site, Vox – not to mention the Post itself, which just this week launched a new data-driven initiative called Storyline – have caused some observers to dub this the “wonk wars.” The Times has its fair share of competition.
Wiggins’s newest call to action: the need for data literacy, now.
“In order for there to exist critical literacy – the ability to take apart somebody else’s argument based on the way they analyze data – you need for there to be enough people who are savvy and able to use the data, to make sense of the data,” Wiggins explains.
“In the same way two different reporters might find a source and ask totally different questions and come to different interpretations, two different data journalists might encounter the same data set and have a variety of ways of trying to learn from those data. You really need there to be enough people doing data journalism for there to exist some sort of peer review.”
“You really need there to be enough people doing data journalism for there to exist some sort of peer review.”
In general, whenever you’re analyzing a data set, it’s important to consider how it was created and what biases or assumptions went into creating it. In the same way, when you’re reading somebody else’s data journalism, you need to think through what assumptions were made in that model or analysis.
And of course, we can’t do that if we don’t have a group of journalists who are sufficiently literate in algorithms and analysis to be critically literate.
The truth has been revealed: data science is changing the business. In the vast and ever-changing data ecosystem, a data literacy that is consistent and all-encompassing is key to our survival. Thus, a stand for data literacy is a bold yet necessary move.
In that, we join you on your mission, Chris Wiggins. Big data is a big opportunity, and we must all take full advantage.
Microsoft Aids in Arrest
Google isn’t alone in its email scanning for child porn. In this week’s news, you’ll find that software giant Microsoft recently tipped off police to a man in Pennsylvania who has now been arrested and charged with receiving and sharing child porn through his OneDrive account.
The arrest comes just days after news of Google’s own tip off to police, resulting in a 41-year-old restaurant worker being placed in custody for possessing child pornography.
“Companies like Microsoft and Google are using big data for the good, and this is just the start.”
Microsoft’s police tip-offs should come as no surprise, though. Court records last year showed that the company made a similar tip-off to alert authorities about child pornography on a OneDrive account. Microsoft scans emails and cloud storage using its PhotoDNA technology, which calculates a mathematical hash of a known image of child sexual abuse, allowing it to recognize that photo automatically even if it has been altered. Google, Twitter and Facebook all use Microsoft’s PhotoDNA tech, helping to build up a database of illegal photos.
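PhotoDNA’s actual algorithm is proprietary, but the general idea of a perceptual hash (a fingerprint that stays stable when an image is slightly altered, unlike a cryptographic hash) can be illustrated with a toy “average hash.” This is only a sketch of the concept, not PhotoDNA itself:

```python
# Toy average hash: threshold each pixel against the image's mean brightness
# and pack the results into a bit fingerprint. Small edits barely move pixel
# values relative to the mean, so the fingerprint tends not to change.

def average_hash(pixels):
    """pixels: 2D list of grayscale values (a stand-in for a decoded image)."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

original = [[10, 200], [30, 220]]
tweaked  = [[12, 198], [29, 221]]   # a slightly altered copy

# The fingerprints match exactly, whereas a cryptographic hash of the two
# images would differ completely.
print(hamming(average_hash(original), average_hash(tweaked)))  # 0
```

Production systems work on real decoded images and use far more robust transforms, but the matching principle (compare fingerprints by distance, not equality) is the same.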
Thus, we must remember that online services are far from private; certain illegal activities are targeted, monitored and reported. Companies like Microsoft and Google are using big data for good, and this is just the start.
Fortune’s 2014 Big Data All-Stars
Big data is more than 1s and 0s. This week, Fortune released a list of 20 extraordinary people in the data industry, marking the first-ever big data “A-list” by a major publication.
So what, you say? It has never been clearer that big data is making big waves, and even Fortune recognizes its impact.
Such a prestigious list marks the beginning of a new era – one that is dominated by big data. And we expect that Fortune’s list won’t be the last.
Want to learn more? Check out Umbel’s 41 Big Data Names You Need to Know.