
Exclusive │ The Economist's Kenneth Cukier at TED: The "Good" and the "Bad" of Big Data

Author: 大数据观察 | Source: 大数据观察 | Date: 2017-05-27 17:26:20

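Below are the subtitles in parallel English and Chinese. This article is an exclusive feature of 36大数据; reproduction is prohibited.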

Kenneth Cukier: Big data is better data

Self-driving cars were just the start. What’s the future of big data-driven technology and design? In a thrilling science talk, Kenneth Cukier looks at what’s next for machine learning — and human knowledge.

无人驾驶汽车只是一个开始。大数据驱动的技术和设计的未来是什么?在这场精彩的科学演讲中,Kenneth Cukier 探讨了机器学习和人类知识的下一步。

America’s favorite pie is?

美国人最喜欢的派是什么?

0:15

Audience: Apple. 苹果

Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone’s second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn’t see when you only had smaller amounts of it.

当然是苹果,我们怎么知道的?因为数据。如果你查阅超市销售数据,你会发现在30厘米的速冻派中,苹果派毫无争议地最受欢迎,占据了大部分销售份额。但是,当超市开始销售更小的11厘米派的时候,突然,苹果派掉落到第四或第五位。为什么?发生了什么?好吧,想想看,当你要买一个30厘米的派的时候,全家人都得同意,而苹果派是每个人的第二选择。(笑声)但当你给自己买一个11厘米的派的时候,你可以买你想要的那个,得到你的首选。你有了更多的数据,就可以看见原来只有少量数据的时候看不到的东西。
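The pie example can be made concrete with a toy calculation. The sketch below is an illustration only: the households, their rankings, and the "lowest total rank" rule for how a family agrees on one large pie are all assumptions invented here, not supermarket data from the talk.

```python
from collections import Counter

# Hypothetical ranked preferences (best first), one list per family member.
households = [
    [["cherry", "apple", "pecan", "blueberry"],
     ["pecan", "apple", "blueberry", "cherry"],
     ["blueberry", "apple", "cherry", "pecan"]],
    [["pecan", "apple", "blueberry", "cherry"],
     ["cherry", "apple", "blueberry", "pecan"]],
    [["cherry", "apple", "pecan", "blueberry"],
     ["cherry", "apple", "blueberry", "pecan"],
     ["pecan", "apple", "cherry", "blueberry"]],
]

def family_pick(rankings):
    """Assumed agreement rule: the flavor with the lowest total rank wins."""
    scores = Counter()
    for ranking in rankings:
        for position, flavor in enumerate(ranking):
            scores[flavor] += position
    return min(scores, key=scores.get)

# One large pie per household: everyone has to agree.
large_pie_sales = Counter(family_pick(h) for h in households)

# One small pie per person: everyone buys their own first choice.
small_pie_sales = Counter(ranking[0] for h in households for ranking in h)

print("large pies:", large_pie_sales.most_common())
print("small pies:", small_pie_sales.most_common())
```

With this made-up data, apple is the large pie in two of the three households, yet none of the eight individuals picks it first. The finer-grained record shows a preference the aggregated one hides.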

1:24

Now, the point here is that more data doesn’t just let us see more, more of the same thing we were looking at. More data allows us to see new. It allows us to see better. It allows us to see different. In this case, it allows us to see what America’s favorite pie is: not apple.

现在,更多的数据不仅仅让你看见同一件事情上更多的东西。更多的数据可以让你找到新的发现,可以让你看得更清晰,可以让你看见不同的东西。在刚才的话题上,让你发现了一个事实:什么是美国人最喜欢的派?非苹果派。

1:49

Now, you probably all have heard the term big data. In fact, you’re probably sick of hearing the term big data. It is true that there is a lot of hype around the term, and that is very unfortunate, because big data is an extremely important tool by which society is going to advance. In the past, we used to look at small data and think about what it would mean to try to understand the world, and now we have a lot more of it, more than we ever could before. What we find is that when we have a large body of data, we can fundamentally do things that we couldn’t do when we only had smaller amounts. Big data is important, and big data is new, and when you think about it, the only way this planet is going to deal with its global challenges — to feed people, supply them with medical care, supply them with energy, electricity, and to make sure they’re not burnt to a crisp because of global warming — is because of the effective use of data.

现在,你们大概都听过“大数据”这个词,实际上,你们可能已经听腻了。的确,围绕这个词有很多炒作,这很不幸,因为大数据是推动社会进步的一件极其重要的工具。过去,我们观察少量数据,思考它对理解世界意味着什么;现在,我们拥有的数据比以往任何时候都多得多。我们发现:当我们有了海量数据,我们可以做一些过去在少量数据环境下根本没法做的事情。大数据很重要,大数据也很新。想想看,这个星球要应对全球性的挑战(让人们有饭吃、有医疗、有能源和电力,并保证人们不会因全球变暖而被烤焦),唯一的途径就是有效地使用数据。

2:50

So what is new about big data? What is the big deal? Well, to answer that question, let’s think about what information looked like, physically looked like in the past. In 1908, on the island of Crete, archaeologists discovered a clay disc. They dated it from 2000 B.C., so it’s 4,000 years old. Now, there’s inscriptions on this disc, but we actually don’t know what it means. It’s a complete mystery, but the point is that this is what information used to look like 4,000 years ago. This is how society stored and transmitted information.

大数据到底有什么新鲜的,为什么大数据重要?为了回答这个问题,首先我们想一想信息在过去从物理上讲到底是什么样的。1908年,在克里特岛上,考古学家发现了一个陶盘。他们测定它的年代为公元前2000年,也就是说它有4000年的历史了。盘子上有一些铭文,但我们不知道它们是什么意思。这完全是一个不解之谜,但重要的是,4000年前的信息就是这个样子,当时的社会就是这样储存和传递信息的。

3:30

Now, society hasn’t advanced all that much. We still store information on discs, but now we can store a lot more information, more than ever before. Searching it is easier. Copying it is easier. Sharing it is easier. Processing it is easier. And what we can do is we can reuse this information for uses that we never even imagined when we first collected the data. In this respect, the data has gone from a stock to a flow, from something that is stationary and static to something that is fluid and dynamic. There is, if you will, a liquidity to information. The disc that was discovered off of Crete that’s 4,000 years old, is heavy, it doesn’t store a lot of information, and that information is unchangeable. By contrast, all of the files that Edward Snowden took from the National Security Agency in the United States fits on a memory stick the size of a fingernail, and it can be shared at the speed of light. More data. More.

现在,社会也没发生多大变化,我们仍然把信息储存在盘子上(硬盘),但我们可以存储远比以前更多的信息。更易于查找,更易于拷贝,更易于分享,更易于处理。我们还可以反复使用这些信息,用途甚至是我们最初收集这些信息时从未想过的。从这个角度上讲,数据已经从存量变成了流量,从静止的、静态的变成流动的、动态的。可以说,信息具有了流动性。在克里特岛发现的那个有4000年历史的陶盘很重,存储不了多少信息,而且信息无法变更。相比之下,斯诺登从美国国家安全局带走的所有文件可以存储在指甲大小的U盘里,而且可以以光速共享。更多的数据,更多。

4:50

Now, one reason why we have so much data in the world today is we are collecting things that we’ve always collected information on, but another reason why is we’re taking things that have always been informational but have never been rendered into a data format and we are putting it into data. Think, for example, the question of location. Take, for example, Martin Luther. If we wanted to know in the 1500s where Martin Luther was, we would have to follow him at all times, maybe with a feathery quill and an inkwell, and record it, but now think about what it looks like today. You know that somewhere, probably in a telecommunications carrier’s database, there is a spreadsheet or at least a database entry that records your information of where you’ve been at all times. If you have a cell phone, and that cell phone has GPS, but even if it doesn’t have GPS, it can record your information. In this respect, location has been datafied.

现在,当今世界之所以有如此之多的数据,一个原因是我们在收集那些一直以来都在收集的信息,另一个原因是,我们把很多本来就带有信息、但从未以数据形式呈现的东西转化成了数据。以位置为例。拿马丁·路德来说,如果我们想知道16世纪的马丁·路德在哪,我们只好时时刻刻跟着他,也许还要带上羽毛笔和墨水瓶把它记录下来。想想现在是什么样子:你知道在某处,也许是电信运营商的数据库里,有一张表格,或者至少是一条数据库记录,记录着你时时刻刻在哪。只要你有手机,不管这个手机有没有GPS功能,它都能记录你的位置信息。从这个角度讲,位置被数据化了。

5:47

Now think, for example, of the issue of posture, the way that you are all sitting right now, the way that you sit, the way that you sit, the way that you sit. It’s all different, and it’s a function of your leg length and your back and the contours of your back, and if I were to put sensors, maybe 100 sensors into all of your chairs right now, I could create an index that’s fairly unique to you, sort of like a fingerprint, but it’s not your finger.

现在,再想一下坐姿的问题,你们现在的坐姿,你的坐姿,你的坐姿,你的坐姿,每个人都不一样,它取决于你的腿长、你的背部以及背部的轮廓。如果我现在在你们所有人的椅子里装上传感器,比如100个,我就可以创造出一个几乎专属于你的指数,有点像指纹,只是不在手指上。

6:14

So what could we do with this? Researchers in Tokyo are using it as a potential anti-theft device in cars. The idea is that the carjacker sits behind the wheel, tries to stream off, but the car recognizes that a non-approved driver is behind the wheel, and maybe the engine just stops, unless you type in a password into the dashboard to say, “Hey, I have authorization to drive.” Great.

这个东西有什么用呢?东京的研究人员正把它用作一种潜在的汽车防盗装置。设想盗车贼坐到方向盘后面,试图把车开走,但汽车识别出方向盘后面是未授权的驾驶者,于是引擎也许就直接熄火,除非你在仪表盘上输入密码,表示“嘿,我有权驾驶。”挺好。
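One way to picture the posture "fingerprint" and the approved-driver check is as a nearest-profile lookup over seat-sensor readings. The sketch below is only an assumption about how such a check might work: the four-sensor readings, the enrolled profiles, the distance threshold, and the `identify` helper are all hypothetical, and nothing here describes the Tokyo researchers' actual system.

```python
import math

# Hypothetical enrolled drivers: one normalized pressure reading per seat sensor.
# The talk imagines about 100 sensors; four keep the example readable.
ENROLLED = {
    "owner": [0.82, 0.40, 0.15, 0.63],
    "spouse": [0.55, 0.70, 0.30, 0.20],
}

# Assumed tolerance: a reading farther than this from every profile is rejected.
THRESHOLD = 0.15

def identify(reading, profiles=ENROLLED, threshold=THRESHOLD):
    """Return the closest enrolled driver, or None if nobody is close enough."""
    name, distance = min(
        ((n, math.dist(reading, p)) for n, p in profiles.items()),
        key=lambda item: item[1],
    )
    return name if distance <= threshold else None

print(identify([0.80, 0.42, 0.16, 0.61]))  # close to the owner's profile
print(identify([0.20, 0.20, 0.90, 0.90]))  # matches nobody
```

A car built on this idea would start for the first reading and ask for the dashboard password for the second. A real system would need far more sensors and a tolerance tuned on real data.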

What if every single car in Europe had this technology in it? What could we do then? Maybe, if we aggregated the data, maybe we could identify telltale signs that best predict that a car accident is going to take place in the next five seconds. And then what we will have datafied is driver fatigue, and the service would be when the car senses that the person slumps into that position, automatically knows, hey, set an internal alarm that would vibrate the steering wheel, honk inside to say, “Hey, wake up, pay more attention to the road.” These are the sorts of things we can do when we datafy more aspects of our lives.

如果欧洲的每一辆车都应用了这个技术会如何?那时我们能做什么呢?如果我们把数据汇总起来,也许就能找出最能预测未来5秒内将发生车祸的征兆。这样一来,我们数据化的就是驾驶者的疲劳程度,相应的服务是:当汽车感知到驾驶者瘫坐成那种姿势时,自动触发车内警报,让方向盘震动,在车内鸣笛:“嘿,醒醒,多注意路况!”这些都是我们把生活的更多方面数据化之后可以做的事情。

7:28

So what is the value of big data? Well, think about it. You have more information. You can do things that you couldn’t do before. One of the most impressive areas where this concept is taking place is in the area of machine learning. Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. The general idea is that instead of instructing a computer what to do, we are going to simply throw data at the problem and tell the computer to figure it out for itself.

所以,大数据的价值何在?想想看,你有了更多的信息,你可以做过去做不到的事情。这一理念最令人印象深刻的应用领域之一就是机器学习。机器学习是人工智能的一个分支,人工智能本身又是计算机科学的一个分支。其基本思路是:我们不再指示计算机该做什么,而是直接把数据扔给问题,让计算机自己去搞定。

And it will help you understand it by seeing its origins. In the 1950s, a computer scientist at IBM named Arthur Samuel liked to play checkers, so he wrote a computer program so he could play against the computer. He played. He won. He played. He won. He played. He won, because the computer only knew what a legal move was. Arthur Samuel knew something else. Arthur Samuel knew strategy. So he wrote a small sub-program alongside it operating in the background, and all it did was score the probability that a given board configuration would likely lead to a winning board versus a losing board after every move. He plays the computer. He wins. He plays the computer. He wins. He plays the computer. He wins. And then Arthur Samuel leaves the computer to play itself. It plays itself. It collects more data. It collects more data. It increases the accuracy of its prediction. And then Arthur Samuel goes back to the computer and he plays it, and he loses, and he plays it, and he loses, and he plays it, and he loses, and Arthur Samuel has created a machine that surpasses his ability in a task that he taught it.

了解它的起源有助于大家理解。20世纪50年代,IBM的一位计算机科学家Arthur Samuel喜欢下跳棋,于是他写了个程序,好让自己和计算机对弈。下一局,他赢;下一局,他赢;下一局,他赢。因为计算机只知道什么是合规的走法,而Arthur Samuel还知道策略。所以他又写了一个在后台运行的小子程序,它只做一件事:每走一步之后,为当前棋盘局面最终通向赢棋而非输棋的概率打分。然后他又和计算机下,他赢;再下,他赢;再下,他还是赢。之后,Arthur Samuel让计算机自己和自己下,这样计算机收集了越来越多的数据,预测的准确性也随之提高。之后Arthur Samuel再和计算机下,他输了,再下,再输,还下,还输。Arthur Samuel创造了一台机器,它在他亲手教会的任务上超越了他自己的能力。
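The loop described here (score how likely a position is to lead to a win, then let the program play itself to collect more data) can be sketched in a few lines. The code below is not Samuel's checkers program. It swaps in a much smaller game as an assumption: a pile of stones from which players alternately take one to three, where taking the last stone wins. The win-rate table, the exploration rate, and all names are illustrative choices.

```python
import random
from collections import defaultdict

MAX_TAKE = 3   # a player may remove 1 to 3 stones per turn
START = 10     # stones on the table at the start of each game

wins = defaultdict(int)    # wins[s]: games won by the player who left s stones
visits = defaultdict(int)  # visits[s]: games in which a player left s stones

def win_rate(stones_left):
    """Estimated probability that leaving `stones_left` stones leads to a win."""
    if visits[stones_left] == 0:
        return 0.5  # no data yet: treat the position as a coin flip
    return wins[stones_left] / visits[stones_left]

def choose(stones, explore, rng):
    """Usually pick the move with the best estimate; sometimes try a random one."""
    moves = list(range(1, min(MAX_TAKE, stones) + 1))
    if rng.random() < explore:
        return rng.choice(moves)
    return max(moves, key=lambda take: win_rate(stones - take))

def self_play(games, explore=0.1, seed=0):
    """Let the program play itself and update the win-rate table after each game."""
    rng = random.Random(seed)
    for _ in range(games):
        stones, player, history = START, 0, []
        while stones > 0:
            stones -= choose(stones, explore, rng)
            history.append((player, stones))  # who moved, and what they left
            player = 1 - player
        winner = history[-1][0]  # the player who took the last stone
        for mover, left in history:
            visits[left] += 1
            if mover == winner:
                wins[left] += 1

self_play(5000)
for s in range(START):
    print(f"leave {s} stones: estimated win rate {win_rate(s):.2f}")
```

Nobody tells the program that leaving four stones is a strong move and leaving three is a weak one. The estimates separate on their own as self-play adds data, which is the point of the story: the strategy comes from the data rather than from explicit instructions.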

9:29

And this idea of machine learning is going everywhere. How do you think we have self-driving cars? Are we any better off as a society enshrining all the rules of the road into software? No. Memory is cheaper. No. Algorithms are faster. No. Processors are better. No. All of those things matter, but that’s not why. It’s because we changed the nature of the problem. We changed the nature of the problem from one in which we tried to overtly and explicitly explain to the computer how to drive to one in which we say, “Here’s a lot of data around the vehicle. You figure it out. You figure it out that that is a traffic light, that that traffic light is red and not green, that that means that you need to stop and not go forward.”

机器学习这个理念正在走向各个领域。你觉得我们是怎么拥有自动驾驶汽车的?是因为我们这个社会更擅长把所有道路规则写进软件了吗?不是。存储更便宜了?不是。算法更快了?不是。处理器更好了?不是。这些都很重要,但都不是原因。原因是我们改变了问题的本质:从试图明确、详尽地向计算机解释如何驾驶,变成对它说:“这是车辆周围的大量数据,你自己去弄明白。你自己弄明白那是一个红绿灯,那个灯是红的而不是绿的,这意味着你需要停下而不是前进。”

10:17

Machine learning is at the basis of many of the things that we do online: search engines, Amazon’s personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they’ve asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn’t need to look for, but that the machine spotted.

机器学习是很多我们线上在做的事情的基础:比如搜索引擎,亚马逊的个性化算法,机器翻译,语音识别系统。研究人员最近研究了癌症活检的问题,他们要求计算机通过查看数据和存活率来判定细胞是否真的癌变。果然,当你把数据扔给机器学习算法,机器找出了最能预测乳腺癌活检样本确实癌变的12个征兆。问题是,医学文献只记载了其中9个。另外3个特征是人们此前没有去寻找、但机器发现了的。

11:23

Now, there are dark sides to big data as well. It will improve our lives, but there are problems that we need to be conscious of, and the first one is the idea that we may be punished for predictions, that the police may use big data for their purposes, a little bit like “Minority Report.” Now, it’s a term called predictive policing, or algorithmic criminology, and the idea is that if we take a lot of data, for example where past crimes have been, we know where to send the patrols. That makes sense, but the problem, of course, is that it’s not simply going to stop on location data, it’s going to go down to the level of the individual. Why don’t we use data about the person’s high school transcript? Maybe we should use the fact that they’re unemployed or not, their credit score, their web-surfing behavior, whether they’re up late at night. Their Fitbit, when it’s able to identify biochemistries, will show that they have aggressive thoughts. We may have algorithms that are likely to predict what we are about to do, and we may be held accountable before we’ve actually acted. Privacy was the central challenge in a small data era. In the big data age, the challenge will be safeguarding free will, moral choice, human volition, human agency.

现在,我们来谈谈大数据的阴暗面。大数据会改善我们的生活,但也带来了我们需要警惕的问题。第一个问题是我们可能会因预测而被惩罚,警察可能会出于他们的目的使用大数据,有点像《少数派报告》。现在有个说法叫做预测性警务,或者算法犯罪学,其思路是:如果我们掌握大量数据,比如过去犯罪发生的地点,我们就知道该向何处派遣巡逻队。这说得通,但问题当然是,它不会仅仅止于位置数据,它会深入到个人层面。为什么不用一个人的高中成绩单呢?也许我们还应该用上他们是否失业、他们的信用评分、他们的上网行为、他们是否熬夜。他们的Fitbit手环,一旦能够识别生化指标,就会显示出他们有攻击性的想法。我们可能会有算法预测出我们将要做什么,我们也许要为尚未实施的行为负责。隐私是小数据时代的核心挑战;在大数据时代,挑战将是保卫自由意志、道德选择、人类意志和人类的能动性。

12:53

There is another problem: Big data is going to steal our jobs. Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it’s cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person’s job, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated. Now, we like to think that technology creates jobs over a period of time after a short, temporary period of dislocation, and that is true for the frame of reference with which we all live, the Industrial Revolution, because that’s precisely what happened. But we forget something in that analysis: There are some categories of jobs that simply get eliminated and never come back. The Industrial Revolution wasn’t very good if you were a horse. So we’re going to need to be careful and take big data and adjust it for our needs, our very human needs. We have to be the master of this technology, not its servant. We are just at the outset of the big data era, and honestly, we are not very good at handling all the data that we can now collect. It’s not just a problem for the National Security Agency. Businesses collect lots of data, and they misuse it too, and we need to get better at this, and this will take time. It’s a little bit like the challenge that was faced by primitive man and fire. This is a tool, but this is a tool that, unless we’re careful, will burn us.

还有一个问题:大数据会抢走我们的工作。大数据和算法将挑战21世纪的白领和专业知识工作,正如工厂自动化和流水线在20世纪挑战了蓝领劳动一样。试想一位实验室技术员,他通过显微镜观察癌症活检样本,判断它是否癌变。这个人上过大学,购置了房产,他或她会投票,他或她是社会的利益相关方。这个人的工作,连同一大批类似的专业人员的工作,将发生根本性的改变,甚至被彻底淘汰。我们总喜欢认为,经过一段短暂的错位期之后,技术会在一段时间内创造出就业。对于我们都身处其中的参照系,也就是工业革命,这是对的,因为事实正是如此。但我们在这一分析中忘了一点:有一些类别的工作就这样被淘汰了,再也没有回来。如果你是一匹马,工业革命对你可不怎么友好。所以,我们必须小心,要根据我们的需求、我们人类自己的需求来运用和调整大数据。我们必须做这项技术的主人,而不是它的仆从。我们刚刚跨入大数据时代,实话实说,我们并不擅长处理我们如今能够收集到的所有数据。这不只是美国国家安全局的问题,企业也收集了大量数据,它们同样在滥用。我们必须在这方面做得更好,而这需要时间。这个挑战有点像原始人面对火:这是个工具,但如果我们不小心,它会烧伤我们。

14:55

Big data is going to transform how we live, how we work and how we think. It is going to help us manage our careers and lead lives of satisfaction and hope and happiness and health, but in the past, we’ve often looked at information technology and our eyes have only seen the T, the technology, the hardware, because that’s what was physical. We now need to recast our gaze at the I, the information, which is less apparent, but in some ways a lot more important. Humanity can finally learn from the information that it can collect, as part of our timeless quest to understand the world and our place in it, and that’s why big data is a big deal.

大数据将会变革我们的生活、工作和思维方式,帮助我们管理职业生涯,过上充满满足、希望、快乐和健康的生活。但是在过去,我们看待信息技术(IT)时,眼里往往只有T,也就是技术、硬件,因为那是有形的。现在,我们需要把目光重新投向I,也就是信息,它不那么显眼,但在某些方面重要得多。人类终于可以从自己收集到的信息中学习,这是我们理解世界以及我们在其中位置的永恒探索的一部分。这就是为什么大数据是件大事。

15:45 (Applause)

Kenneth Cukier

Kenneth Cukier is the Data Editor of The Economist. From 2007 to 2012 he was the Tokyo correspondent, and before that, the paper’s technology correspondent in London, where his work focused on innovation, intellectual property and Internet governance. Kenneth is also the co-author, with Viktor Mayer-Schönberger, of Big Data: A Revolution That Will Transform How We Live, Work, and Think (2013), which was a New York Times bestseller and has been translated into 16 languages.

关于:肯尼斯·库克耶

《经济学人》杂志数据编辑。2007年至2012年任驻东京记者,此前在伦敦担任该刊科技记者,主要关注创新、知识产权和互联网治理。肯尼斯还与维克托·迈尔-舍恩伯格合著了《大数据时代:生活、工作与思维的大变革》,该书出版于2013年,是《纽约时报》畅销书,已被翻译成16种语言。
