The Key to Monetizing Big Data: Visualization

Author: Big Data Observation (大数据观察) | Source: Big Data Observation | Date: 2016-12-26 14:38:30

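[Editor's note] The author, David Hoffer, is an accomplished designer and currently heads the user experience (UE) department at Declara. As big data moves out of the data scientists' "ivory tower" and into everyday life, Hoffer draws on his own experience and thinking to make the case for data visualization. Infographics, dashboards, and maps can all serve as data visualization solutions, although dashboard design is sometimes not human enough. Visua.ly, Google Maps, the Racial Dot Map, The New York Times' Kepler mission graphic, and a growing number of other visualization projects are opening up a richer, more colorful world of data.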

Original English text:

A simple Google image search on “big data” reveals numerous images of three-dimensional ones and zeros, a few explanatory infographics, and even the interface from The Matrix. So what does “big data” actually look like, in terms humans can comprehend?

Ask the CEO of a major company what “big data” is, and they’ll likely describe something akin to a black box (the flight recorder on an airplane) or draw a cloud on a whiteboard. Ask a data scientist and you might get an explanation of the four V’s, itself an attempt at an infographic (but really just a visual collection of facts), along with a corresponding explanation. The reason for this is that “big data” is a nebulous term with different meanings, representations, and uses for different organizations.

Understandably, it’s hard to know where to start when there’s so darn much of it. From the beginning of recorded time until 2003, humans had created 5 exabytes (5 billion gigabytes) of data. In 2011, the same amount was created every two days. It’s true that we’ve made great strides in displaying earlier generations of data. However, when it comes to today’s big data, how it looks can help convey information, but it needs to be more than beautiful and superficial. It has to work, show multiple dimensions, and be useful.
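To put those growth figures in perspective, the short calculation below converts them. It assumes decimal (SI) units and a 365-day year, neither of which the article specifies, so treat the yearly figure as a rough extrapolation rather than a sourced statistic.

```python
# Back-of-the-envelope check of the data-growth figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 EB = 10**18 bytes, 1 GB = 10**9 bytes.
EXABYTE = 10**18
GIGABYTE = 10**9

total_until_2003_bytes = 5 * EXABYTE

# "5 exabytes (5 billion gigabytes)"
gigabytes = total_until_2003_bytes // GIGABYTE

# "In 2011, the same amount was created every two days."
daily_rate_eb = 5 / 2                  # exabytes per day
yearly_rate_eb = daily_rate_eb * 365   # assumption: a non-leap, 365-day year

print(f"5 EB = {gigabytes:,} GB")
print(f"2011 rate: {daily_rate_eb} EB/day, roughly {yearly_rate_eb:,.1f} EB/year")
```

At that pace, 2011 alone would have produced on the order of 900 exabytes, nearly two hundred times everything recorded up to 2003.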

New software and technologies have given us a higher level of access to these enormous data sets and a better understanding of them. However, the only way we’re going to truly gather and extract all the value big data holds is to take data visualization to a relatively unprecedented level. How do we get to actionable analysis, deeper insight, and visually comprehensive representations of the information? The answer: we need to make data more human.

New Visualizations, New Challenges

One of the most valuable ways to make sense of big data, and thus make it more approachable to most people, is data visualization. Data visualization is wayfinding, both literally, like the street signs that direct you to a highway, and figuratively, where the color, size, or position of abstract elements conveys information. In either sense, a well-aligned visual can offer a shorter route to a decision and become a tool for conveying the information that is critical in any data analysis. However, to be truly actionable, data visualizations should contain the right amount of interactivity. They have to be well designed, easy to use, understandable, meaningful, and approachable.
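The "figurative" wayfinding described above comes down to mapping data values onto visual channels such as position, size, and color. The sketch below is one minimal way to do that with linear scales. The `encode` function, the `Mark` record, the canvas width, and the light-to-dark blue ramp are all hypothetical choices for illustration, not part of any tool the article mentions.

```python
from dataclasses import dataclass


def linear_scale(domain, range_):
    """Return a function mapping [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, range_
    if d0 == d1:
        raise ValueError("domain must have non-zero width")

    def scale(value):
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


def lerp_color(c0, c1, t):
    """Interpolate between two RGB tuples; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


@dataclass
class Mark:
    x: float        # horizontal position on the canvas
    radius: float   # size channel
    color: tuple    # RGB color channel


def encode(values, width=400.0, max_radius=20.0,
           low=(222, 235, 247), high=(8, 81, 156)):
    """Encode each value redundantly as position, size, and color.

    Raises ValueError if all values are equal (the scale has no width).
    """
    lo, hi = min(values), max(values)
    to_x = linear_scale((lo, hi), (0.0, width))
    to_r = linear_scale((lo, hi), (2.0, max_radius))
    marks = []
    for v in values:
        t = (v - lo) / (hi - lo)  # normalized position within the data range
        marks.append(Mark(x=to_x(v), radius=to_r(v), color=lerp_color(low, high, t)))
    return marks
```

For example, `encode([0, 50, 100])` places the middle value at x = 200 with radius 11 and a mid-blue color. Encoding the same value on three channels at once is redundant by design; a real chart would usually give each channel its own data dimension.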

According to Michal Migurski, “data visualization is a relative term…always referring to the next thing coming over the horizon.” It changes as technology changes and we’re constantly developing new tools in hopes of harnessing its value for application across industries. Some familiar visualizations include infographics, the notorious dashboard, and certainly maps.

Infographics, turning up everywhere these days, are a great way to clarify the complex. In this category, Visua.ly is a great source. Infographics are typically carefully crafted as a poster or presentation to convey meaning, but because they are fixed at a point in time, they fall short of supplying real-time information. Dashboards can be a useful tool, but they’re so often poorly designed: the same charts and graphs appear again and again.

Worse still is when dashboards are literal interpretations of vehicle dashboards, complete with speedometers. Most importantly, when a dashboard tries to convey information about people, it often lacks any humanity at all. Finally, maps, which of course rely on geography as an essential layer of information, are one of my favorite visualizations. When you can rely on a very identifiable shape, like a country or your province, to ground your data, it’s quite helpful, but what if the data isn’t geographic?

Think about Google Maps. It is arguably the most comprehensive and successful set of data visualizations on the planet right now. It offers a comprehensive data set in multiple forms, it’s constantly being updated, and it’s fairly easy to use. Its display provides multiple views of the data to suit individual needs and queries, it’s available across devices, and it has a robust API that takes it beyond software and makes it a platform. That API allows for anything from basic map functionality to an almost infinite number of geographic representations.

Take a look at the Racial Dot Map (built on the Google API) from the Weldon Cooper Center for Public Service, which depicts the distribution of racial diversity in the United States using color coding (similar to the heat map you see in the morning weather report). You can also zoom in on a specific area or region for an entirely granular view: literally one dot per person, color-coded by race.
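The "one dot per person, color-coded by category" idea is a dot-density map. The sketch below shows the core technique under simplifying assumptions: each area is a rectangle, dots are scattered uniformly at random inside it, and zooming is just filtering dots to a viewport. The category labels, colors, and function names are hypothetical, and this is not the method actually used to build the Racial Dot Map.

```python
import random

# Hypothetical category colors; the real map's palette is not reproduced here.
COLORS = {"A": "#1f77b4", "B": "#2ca02c", "C": "#d62728"}


def dot_density(areas, seed=0):
    """Place one dot per person uniformly at random inside each area's
    bounding box and tag it with its category's color.

    `areas` is a list of (bbox, counts) pairs, where bbox is
    (min_x, min_y, max_x, max_y) and counts maps category -> number of people.
    """
    rng = random.Random(seed)  # seeded so the same input gives the same layout
    dots = []
    for (min_x, min_y, max_x, max_y), counts in areas:
        for category, n in counts.items():
            for _ in range(n):
                x = rng.uniform(min_x, max_x)
                y = rng.uniform(min_y, max_y)
                dots.append((x, y, COLORS[category]))
    return dots


def zoom(dots, bbox):
    """Keep only the dots inside a viewport, as when zooming into a region."""
    min_x, min_y, max_x, max_y = bbox
    return [d for d in dots if min_x <= d[0] <= max_x and min_y <= d[1] <= max_y]
```

With two non-overlapping areas, for instance `[((0, 0, 10, 10), {"A": 3, "B": 2}), ((20, 0, 30, 10), {"C": 4})]`, `dot_density` returns nine dots, and zooming to `(0, 0, 10, 10)` keeps only the five from the first area. Seeding the generator keeps the layout stable between renders, so dots do not jump around when a user pans or zooms.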

With Google, great care goes into how the information is displayed and how the form presents the data. But it takes a village to be this robust (Google employs more than 400 people to work on its Geo product); data visualizations supported by fewer resources risk falling short.

On the other end of the data spectrum, take a look at how The New York Times augments its reporting with visuals that tell a story. For example, an article on NASA’s Kepler mission, which tallied more than 190 confirmed planets orbiting distant stars, incorporated comprehensible visualizations across several dimensions of data: the speed at which the planets orbit, the distance they travel from their star, the star’s temperature, and the system’s size.

Another example is its graphic depicting the Silk Road, which tells the story of the modern-day version of this famous trade route. Colorful photographs and well-edited video, grouped by key stops along the route, convey the road’s essence, and alongside them is an infographic that places the photos and video geographically.

Through these visualizations, you can also begin to recognize a few limitations, whether in presenting the whole of imaginable data (think about examining 1.9 billion exoplanets rather than 190), or the resources needed to comprehend it on multiple dimensions. These examples serve as guideposts in the development of big data visualization. What can we learn from these discrete examples to apply towards larger datasets?

Big data is just beginning to emerge, and the way we manage the backend is evolving. We need robust tools to visualize the data in ways that are both meaningful and interactive. We need cross-disciplinary teams rather than a single data scientist, designer, or data analyst, and we need to reconsider what we think of as data visualization. Charts and graphs aren’t sufficient to convey meaning beyond one or two dimensions, so how can they be combined with interactivity along other dimensions to convey the depth of big data? Our Big Data Visualization (BDV) tools need to be functional and updatable, much like software.

In this process, data becomes more malleable, actionable, and, ultimately, more human. Through flexible data and visualization frameworks, we want to accommodate multiple biases and make it possible to leverage data to fit our changing needs and queries. Embrace the nebulous nature of big data, but provide and seek the tools that make it relevant to you. The visual interpretations of the data will vary depending on your objectives and the questions you’re aiming to answer; thus, although visual similarities will exist, no two visualizations will be the same. Think of the difference between cirrus and altocumulus clouds.
