
If Big Data Is Anything at All, This Is It

Author: 大数据观察 | Source: 大数据观察 | Date: 2017-05-08 19:20


The first time that all but a few of us heard the term “Big Data,” we heard it in the context of a marketing campaign by information technology vendors to promote their products and services. It is this marketing campaign that has made the term popular, leading eventually to the household name that it is today. Despite its popularity, it remains a term seeking a definitive meaning. There are as many definitions of Big Data as there are individuals and organizations that would like to benefit from the belief that it exists. My objective in this brief blog article is to ask, “Does Big Data signify anything that is actually happening, and if so, what is it?”

Long before the term came into common usage around the year 2010, it began to pop up here and there in the late 1990s. It first appeared in the context of data visualization in 1997 at the IEEE 8th Conference on Visualization in a paper by Michael Cox and David Ellsworth titled “Application-controlled demand paging for out-of-core visualization.” The article begins as follows:

Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.
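For readers unfamiliar with out-of-core techniques: the “demand paging” in the paper’s title refers to touching only the portion of a dataset needed at any moment, rather than loading it all into memory. Below is a minimal Python sketch of that general idea using numpy’s memory-mapped arrays; the file name and sizes are hypothetical, and this illustrates the technique in spirit, not the authors’ actual system.

```python
import numpy as np

# Create a sample binary file standing in for a dataset too large to
# hold comfortably in memory. (Hypothetical file name, for illustration.)
data = np.arange(10_000_000, dtype=np.float64)
data.tofile("big_dataset.bin")

# np.memmap maps the file into virtual memory; the OS pages in only the
# regions we actually touch, instead of loading the whole array into RAM.
mm = np.memmap("big_dataset.bin", dtype=np.float64, mode="r")

# Process the data in fixed-size chunks so peak memory stays bounded.
chunk = 1_000_000
total = 0.0
for start in range(0, mm.shape[0], chunk):
    total += mm[start:start + chunk].sum()

# Mean computed without ever holding the full dataset in memory at once.
print(total / mm.shape[0])
```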

Two years later, at the 1999 IEEE Conference on Visualization, a panel was convened titled “Automation or interaction: what’s best for big data?”

In February of 2001, Doug Laney, at the time an analyst with the Meta Group, now with Gartner, published a research note titled “3D Data Management: Controlling Data Volume, Velocity, and Variety.” The term Big Data did not appear in the note, but a decade later, the “3Vs” of volume, velocity, and variety became the most common attributes that are used to define Big Data.

The first time that I personally ran across the term was in a 2005 email from the software company Insightful, the maker of S+, a derivative of the statistical analysis language S, in the title of a course “Working with Big Data.”

By 2008 the term was being used enough in scientific circles to warrant a special issue of Nature magazine. Still, it didn’t begin to be used more broadly until February 2010, when Kenneth Cukier wrote a special report for The Economist titled “Data, Data Everywhere,” in which he said:

…the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly… The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: “big data.”

It was around this time that the term was snatched from the world of academia to become the most successful information technology marketing campaign of the current decade. (I found most of the historical references to the term Big Data in the Forbes June 6, 2012 blog post by Gil Press titled “A Very Short History of Big Data.”)

Because Big Data has no commonly accepted definition, discussions about it are rarely meaningful or useful. Not once have I encountered a definition of Big Data that actually identifies anything that is new about data or its use. Doug Laney’s 3Vs, which describe exponential increases in data volume, velocity, and variety, have been happening since the advent of the computer many years ago. You might think that technological milestones such as the advent of the personal computer, Internet, or social networking have created exponential increases in data, but they have merely sustained exponential increases that were already happening. Had it not been for these technological advances, increases in data would have ceased to be exponential. Recently, definitions have emphasized the notion that Big Data is data that cannot be processed by conventional technologies. What constitutes conventional vs. unconventional technologies? My most recent encounter with this was the claim that Big Data is that which cannot be processed by a desktop computer. Based on this rather silly definition, Big Data has always existed, because personal computers have never been capable of processing many of the datasets that organizations collect.
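To see why the “desktop” criterion is so slippery: any desktop can process a dataset far larger than its RAM if the data is streamed rather than loaded whole. A minimal Python sketch, assuming a hypothetical CSV file and column name:

```python
import csv

def streaming_mean(path, column):
    """Compute the mean of one numeric column in constant memory.

    The file is read one row at a time, so it can be arbitrarily
    larger than the machine's RAM.
    """
    count, total = 0, 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")

# Hypothetical file and column names, purely for illustration:
# print(streaming_mean("transactions.csv", "amount"))
```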

So, if Big Data hasn’t been defined in an agreed-upon manner and if none of the existing definitions identify anything about data or its use that is actually new, does the term really describe anything? I’ve thought about this a great deal and I’ve concluded that it describes one thing only that has actually occurred in recent years:

Big Data is a rapid increase in public awareness that data is a valuable resource for discovering useful and sometimes potentially harmful knowledge.

Even if Big Data is this and nothing more, you might think that I’d be grateful for it. I make my living helping people understand and communicate information derived from data, so Big Data has produced a greater appreciation for my work. Here’s the rub: Big Data, as a term with no clear definition, which serves as a marketing campaign for technology vendors, encourages people to put their faith in technologies without first developing the skills that are needed to use those technologies. As a result, organizations waste their money and time chasing the latest so-called Big Data technologies—some useful, some not—to no effect because technologies can only augment the analytical abilities of humans; they cannot make up for our lack of skills or entirely replace our skills. Data is indeed a valuable resource, but only if we develop the skills to make sense of it and find within the vast and exponentially growing noise those relatively few signals that actually matter. Big Data doesn’t do this, people do—people who have taken the time to learn.

Take care,

This article is a commentary published on perceptualedge. It was translated by 36大数据’s partner 北理大数据教育 and edited by 36大数据.

Please credit the source when reposting.
