World Watch | Foreign Media Forecast Big Data Trends for 2019

2018-12-06 12:31  Source: 数据观

  Big Data Trends in 2019

  数据观 | Translated by 黄玉叶

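  [Editor's note] In 2019, new Big Data concepts and technologies will keep appearing on the market, while older technologies will gradually fade away or be put to new uses. The continued growth of the Internet of Things supplies fresh resources for Big Data, and new technologies can change not only how business intelligence is gathered but also how business is done...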

  The accessibility of data has provided a new generation of technology and has shifted the business focus towards data-driven decision making. Big Data Analytics is now an established part of gathering Business Intelligence. Many businesses, particularly those online, consider Big Data a mainstream practice. These businesses are constantly researching new tools and models to improve their Big Data utilization.

  数据的可访问性衍生出新一代技术,并将商务重头转向数据驱动的决策制定。现下,大数据分析已成为收集商业情报的组成部分。许多企业,尤其是线上企业,都认为大数据是主流标配。这些企业马不停蹄地研究新工具、新模型,以提高他们的大数据利用率。

  In 2019, some tools and trends will be more popular than others. New Big Data concepts and technologies are constantly appearing on the market, and older technologies fade away, or get used in new ways. The continuous growth of the Internet of Things (IoT) has provided several new resources for Big Data. New technologies change not only how Business Intelligence is gathered, but how business is done.

  2019年,一些工具和趋势将脱颖而出,更受青睐。新的大数据概念及技术将陆续浮出市面,老旧技术会逐步消失,或者出现旧术新用的情况。物联网的持续壮大为大数据提供了新的资源,新技术不仅改变了商业情报的收集方式,同样也改变了商业运作模式。

  Streaming the IoT for Machine Learning

  将物联网(IoT)串联至机器学习

  There are currently efforts to use the Internet of Things (IoT) to combine Streaming Analytics and Machine Learning. In 2019, we can anticipate significant research on this theme, and possibly a startup or two marketing their services or software.

  当前,相关研究正努力让物联网和流分析、机器学习结合起来。2019年,我们可以对这一主题的重大研究翘首以盼,一两家初创企业有望从事相关服务或软件营销。

  Typically, Machine Learning uses “stored” data for training, in a “controlled” learning environment. In this new model, streaming data from the Internet of Things feeds Machine Learning in real time, in a less controlled environment. A primary goal in this process is to provide more flexible, more appropriate responses to a variety of situations, with a special focus on communicating with humans.

  通常,机器学习使用“存储”数据在“受控”的学习环境中进行训练。在新的模型中,物联网中的流数据提供有用信息,在一个不那么“受控”的环境中实时支持机器学习。这个过程的主要目的是重点关注人机交流,让机器面对各种情况可以作出更灵活更适当的反应。
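
  The contrast between training on stored data and learning from a stream can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor feed is simulated, the names OnlineLinearModel, learn_one, and simulated_sensor_stream are hypothetical, and plain stochastic gradient descent stands in for whatever algorithm a real system would use.

```python
import random


class OnlineLinearModel:
    """Linear regressor that updates its weights one reading at a time."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def learn_one(self, features, target):
        # One stochastic-gradient step on the squared error of this reading.
        error = self.predict(features) - target
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        return error


def simulated_sensor_stream(n_readings, seed=0):
    """Yield (features, target) pairs; stands in for a real IoT feed."""
    rng = random.Random(seed)
    for _ in range(n_readings):
        temperature = rng.uniform(0.0, 1.0)
        humidity = rng.uniform(0.0, 1.0)
        # Hidden relationship the model has to discover from the stream.
        load = 2.0 * temperature - 1.0 * humidity + 0.5 + rng.gauss(0.0, 0.01)
        yield [temperature, humidity], load


model = OnlineLinearModel(n_features=2)
for features, target in simulated_sensor_stream(5000):
    model.learn_one(features, target)  # no stored training set is needed

print(model.weights, model.bias)
```

  Each reading is used once and then discarded, so the model keeps adapting as conditions change, at the cost of the tighter control a fixed training set provides.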

  Changing from a training model that uses a controlled environment and limited training data to a much more open training system requires more complex algorithms. Machine Learning then trains the system to predict outcomes with reasonable accuracy. As the primary model adjusts and evolves, models at the edge or in the Cloud will coordinate to match the changes, as needed. Ted Dunning, the Chief Application Architect at MapR said:

  从一种使用受控环境加有限训练数据的训练模型到一个更加开放的训练系统,需要更复杂的算法。机器学习继而训练系统以合理的精度预测结果,随着初级模型的调整和演进,边缘计算或云计算中的模型将根据需要进行协调以匹配这些变化。MapR(知名大数据企业)的首席应用程序设计师Ted Dunning说:

  “We will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows capture key business events and mirror business structure. A unified data fabric will be the foundation for building these large-scale flow-based systems.”

  “我们将看到越来越多的企业以数据流的方式来处理计算,而不是仅仅处理数据并将其存入数据库。这些数据流捕获关键业务事件并反映业务结构,要构建这些大型的,基于流的系统,统一的数据结构是基础。”

  AI Platforms

  人工智能平台

  Big Data as a tool of discovery continues to evolve and mature, with some enterprises accessing significant rewards. A recent advancement is the use of AI (Artificial Intelligence) platforms. AI platforms will have significant impact over the next decade. Using AI platforms to process Big Data is a significant improvement in gathering Business Intelligence and improving efficiency. Anil Kaul, CEO and Co-Founder of Absolutdata stated:

  大数据作为一种探索工具不断发展趋向成熟,一些企业因此获得了可观回报。最近的一项进展是人工智能平台的使用,人工智能平台将在未来十年产生重大影响。利用人工智能平台处理大数据,是收集商业情报,提高效率的一个重要改进。Anil Kaul,Absolutdata(知名大数据企业)的首席执行官和联合创始人说:

  “We started an email campaign, which I think everybody uses Analytics for, but because we used AI, we created a 51 percent increase in sales. While Analytics can figure out who you should target, AI recommends and generates what campaigns should be run.”

  “我们发起了一个电子邮件活动,我认为每个人都要用到大数据分析,但是通过使用人工智能,我们创造了51%的销售增长额。当大数据分析找出你的既定目标对象时,人工智能会建议并生成应该发起的活动。”

  AI platforms will gain in popularity in 2019. AI platforms are frameworks designed to work more efficiently and effectively than more traditional frameworks. When an AI platform is designed well, it will provide faster, more efficient communications with Data Scientists and other staff. This can help reduce costs in several ways—such as by preventing the duplication of efforts, automating basic tasks, and eliminating simple, but time-consuming activities (copying, data processing, and constructing ideal customer profiles).

  人工智能平台将在2019年普及。人工智能平台比传统框架更有效,平台的设计,能够建立与数据科学家和其他工作人员之间快速、高效的交流方式,多方降低成本,比如防止重复工作、自动完成基础任务、消除简单又耗时的内容(复制、数据处理和构建理想客户档案)。

  AIs will also provide Data Governance, making best practices available to Data Scientists and staff. The AI becomes a trusted advisor, and can also help to ensure work is spread more evenly, and completed more quickly. Artificial Intelligence platforms are arranged into five layers of logic:

  人工智能系列还将提供数据治理,为数据科学家和工作人员带来最佳实践。人工智能会成为一个值得信赖的顾问,帮助确保均匀分工并快速完成工作。人工智能平台可以分为五层逻辑:

  ·The Data & Integration Layer gives access to the data. (Critical, as developers do not hand-code the rules. Instead, the rules are being “learned” by the AI.)

  ·The Experimentation Layer lets Data Scientists develop, test, and prove their hypotheses.

  ·The Operations & Deployment Layer supports model governance and deployment. This layer offers tools to manage the deployment of various “containerized” models and components.

  ·The Intelligence Layer organizes and delivers intelligent services and supports the AI.

  ·The Experience Layer is designed to interact with users through the use of technologies such as augmented reality, conversational UI, and gesture control.

  ①数据和集成层:提供对数据的访问。(关键是,开发人员不会手工编写规则;相反,人工智能正在“学习”这些规则)

  ②实验层:允许数据科学家开发、测试和验证他们的假设。

  ③操作和部署层:支持模型管理和部署。这一层提供了管理各种“集装箱化”模型和组件部署的工具。

  ④智能层:组织和交付智能服务,支持人工智能。

  ⑤体验层:旨在通过使用增强现实、对话界面和手势控制等技术与用户交互。

  The Data Curator

  数据管理员

  In 2019, many organizations will find the position of Data Curator (DC) has become a new necessity. The Data Curator’s role will combine responsibility for managing the organization’s metadata with Data Protection, Data Governance, and Data Quality. Data Curators not only manage and maintain data, but may also be involved in determining best practices for working with that data. Data Curators are often responsible for presentations, with the data shown visually in the form of a dashboard, chart, or slideshow.

  2019年,大众会发现数据管理员(DC)的职位将成为一种新的需要。数据管理员的角色将把管理元数据的责任和数据保护、数据治理和数据质量结合起来。数据管理员不仅管理和维护数据,而且还可能参与确定与该数据的最佳工作实践。数据管理员通常负责演示,数据显示在仪表板、图表或幻灯片的形式中。

  The Data Curator regularly interacts with researchers, and also schedules educational workshops. The DC communicates with other curators to collaborate and coordinate when appropriate. (Good communication skills are a plus.) Tomer Shiran, co-founder and CEO of Dremio, said:

  数据管理员定期与研究人员进行互动,并安排教育研讨会。在适当的情况下,数据管理员与其他策展人交流合作和协调。Dremio(知名大数据企业)的联合创始人兼首席执行官Tomer Shiran说:

  “The Data Curator is responsible for understanding the types of analysis that need to be performed by different groups across the organization, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform. The data curator uses systems such as self-service data platforms to accelerate the end-to-end process of providing data consumers access to essential datasets without making endless copies of data.”

  “数据管理员负责理解跨组织中不同组执行的分析类型,什么数据集适配什么工作,以及数据消费者将数据从原始状态转换为执行形态时所涉及的步骤。数据管理员使用自助数据平台等系统加速端到端的流程,为数据消费者提供对基础数据集的访问,而非无休止地复制数据。”

  Politics and GDPR

  政治与《通用数据保护条例》(GDPR)

  The European Union’s General Data Protection Regulation (GDPR) went into effect on May 25, 2018. While GDPR is focused on Europe, some organizations, in an effort to simplify their business and promote good customer relations, have stated they will provide the same privacy protections for all their customers, regardless of where they live. This approach, however, is not the general position taken by businesses and organizations outside of Europe. Many corporations have chosen to revamp their consent procedures and data handling processes, and to hire new staff, all in an effort to maximize the private data they “can” gather.

  欧洲联盟的通用数据保护条例(GDPR)已于2018年5月25日生效。虽然GDPR针对欧洲国家,但一些企业为了简化业务,促进良好客户关系,也声明他们将为所有客户提供同样的隐私保护,不管他们来自哪个国家。然而,这种方法并不是欧洲以外的企业和组织所采取的基本立场,许多公司选择修改他们的同意程序和数据处理流程,并雇佣新员工,这一切做法都是为了使他们“可以”最大化收集私人数据。

  Businesses relying on “assumed consent” for all processing operations can no longer make this assumption when doing business with Europeans. Businesses have had to implement new procedures for notices and receiving consent, and many are currently trying to plan for what’s next, while simultaneously struggling with problems in the present.

  所有业务运作都依赖于“假定同意”的企业,在与欧洲人做生意时,不能再做出假定同意了。企业不得不实施通知和征求同意的新程序,许多企业目前正在努力为下一步做计划,同时也在努力解决当前问题。

  Several organizations have assigned GDPR responsibilities to their Chief Security Officers. (The CSO should be responsible for having these changes made.) Though GDPR fines can be quite large (as high as 20 million Euros or four percent of annual global turnover, whichever is higher), many businesses, especially in the United States, are still not prepared.

  一些组织已经将GDPR的责任交给了他们的首席安全官(首席安全官应对这些变化负责)。虽然GDPR的罚款金额可能相当大(罚款金额可能高达2000万欧元或4%的年度全球营业额,这取决于两者哪个更高),但许多企业,尤其在美国,仍然没有准备好。
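
  The ceiling described above is a simple "whichever is higher" rule, and a short Python sketch makes the arithmetic concrete. The function name max_gdpr_fine_eur and the sample turnover figures are hypothetical; the sketch covers only the upper limit quoted in this article, not how a regulator would set an actual fine.

```python
FIXED_CAP_EUR = 20_000_000      # the 20 million Euro figure quoted above
TURNOVER_SHARE_PERCENT = 4      # the four percent figure quoted above


def max_gdpr_fine_eur(annual_global_turnover_eur):
    """Return the higher of the fixed cap and 4% of annual global turnover."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    turnover_based_cap = annual_global_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based_cap)


print(max_gdpr_fine_eur(100_000_000))    # 20000000: the fixed cap is higher
print(max_gdpr_fine_eur(2_000_000_000))  # 80000000: 4% of turnover is higher
```

  The crossover sits at 500 million Euros of turnover: below that the fixed cap dominates, above it the percentage does.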

  In 2019 the U.S. government could make an effort to imitate the GDPR and hold businesses accountable for how they handle privacy and personal data. In the short term, it would make sense for online businesses to begin implementing new privacy policies or simply make the shift to a GDPR policy format. Making the shift now, and advertising it on the company’s website, has the potential to develop a good relationship with the customer base.

  2019年,美国政府可能会努力模仿GDPR,让企业对他们如何处理隐私和个人数据负责。从短期来看,在线企业开始实施新的隐私政策,或者干脆改用GDPR政策模式,都是有意义的。现在,在公司网站上做广告,有可能与客户建立良好的关系。

  5G Not Likely in 2019

  2019年5G不太可能实现

  Switching to a 5G (fifth generation) system is expensive and comes with some potential issues. While the expense may not stop 5G implementation in 2019, other problems might.

  切换到5G(第五代)系统相当昂贵,并且存在一些潜在的问题。虽然高昂的费用可能不会阻挡2019年实施5G的步伐,但其他问题也许会。

  Though the U.S. Federal Government completely supports the implementation of a 5G system, some communities have passed ordinances halting the installation of a 5G infrastructure. It seems likely this will become a standard practice for blocking 5G systems.

  虽然美国联邦政府完全支持实施5G系统,但一些社区已经通过了阻止5G基础设施安装的条例,这似乎将成为阻止5G系统的标准做法。

  An additional factor blocking 5G is a decision by the United States FCC, which eliminated regulations supporting net neutrality. Net neutrality offered internet providers, and their users, a level playing field, and promoted competition. Net neutrality is the concept that internet providers should treat all data, and people, equally, without discrimination and without charging different users different rates based on such things as speed, content, websites, platforms, or applications.

  阻碍5G的另一个因素是美国联邦通信委员会(FCC)的一项决定,该决定取消了支持网络中立性的法规。网络中立为互联网提供商及其用户提供了一个公平的竞争环境,促进公平竞争。网络中立性是指互联网供应商应该平等对待所有数据和人,不歧视,不根据速度、内容、网站、平台或应用程序向不同的用户收取不同的费用。

  Hybrid Clouds Will Gain in Popularity

  混合云或将普及

  Clouds and Hybrid Clouds have been steadily gaining in popularity and will continue to do so. While an organization may want to keep some data secure in its own data storage, the tools and benefits of a hybrid system make it worth the expense. Hybrid Clouds combine an organization’s private Cloud with the rental of a public Cloud, offering the advantages of both. Expect a significant increase in the use of Hybrid Clouds in 2019.

  云和混合云一直在稳步增长,并将继续这样做。虽然企业可能希望在自己的数据存储中保持某些数据的安全性,但是混合系统的工具和优点使其值得付出代价。混合云将企业的私有云与租用公共云结合在一起,提供了两者的优点,预计混合云的使用将在2019年显著增加。

  Generally speaking, the applications and data in a Hybrid Cloud can be transferred back and forth between on-premises (private) Clouds and IaaS (public) Clouds, providing more flexibility, deployment options, and tools. A public Cloud, for example, can be used for the high-volume, low-security projects, such as email advertisements, and the on-premises Cloud can be used for more sensitive projects, such as financial reports.

  一般来说,混合云中的应用程序和数据可以在本地云(私有)和IaaS云(公共)之间来回传输,从而提供更多的灵活性、部署选项和工具。例如,公共云可以用于高容量、低安全性的项目,如电子邮件广告,而本地云可以用于更敏感的项目,如财务报告。
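
  The placement rule in that example can be written down as a tiny Python sketch. The Workload type, the sensitive flag, and choose_cloud are hypothetical names introduced for illustration; a real deployment would weigh many more factors than a single flag.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    sensitive: bool  # True for data that should stay on-premises


def choose_cloud(workload):
    """Return the placement target for a workload in this simplified model."""
    return "private" if workload.sensitive else "public"


workloads = [
    Workload("email advertisements", sensitive=False),
    Workload("financial reports", sensitive=True),
]
placements = {w.name: choose_cloud(w) for w in workloads}
print(placements)
```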

  “Cloud Bursting” is a feature of Hybrid Cloud systems. An application runs within the on-premises Cloud until there is a spike in demand (think Christmas shopping online, or filing taxes); the application then “bursts” into the public Cloud and taps into additional resources.

  “云爆发”这一术语是混合云系统的功能,描述了一个运行在本地云上的应用程序,当该应用程序遇到一个激增的需求(例如圣诞节网上购物,或申请税等情况),通过“爆发”至公共云,攫取和利用额外的资源。
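
  A minimal Python sketch of the bursting idea, assuming demand and capacity can both be counted in requests per hour. The function place_requests, the capacity of 100, and the demand figures are all hypothetical.

```python
def place_requests(incoming_requests, private_capacity):
    """Split a load between the on-premises cloud and the public cloud."""
    # Fill the on-premises cloud first; only the overflow bursts out.
    on_premises = min(incoming_requests, private_capacity)
    burst_to_public = incoming_requests - on_premises
    return on_premises, burst_to_public


hourly_demand = [60, 80, 250, 90]  # the third hour simulates a seasonal spike
for demand in hourly_demand:
    on_premises, burst = place_requests(demand, private_capacity=100)
    print(f"demand={demand} on_premises={on_premises} public={burst}")
```

  Only the spike hour sends traffic to the public Cloud; in the other hours the on-premises Cloud carries the full load.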

Author: Keith D. Foote  Editor: 武淑