By human standards, big data is now old enough to get married, drive for Grab or roll the dice at the casino. And while we’re not seeing servers starting families just yet, the latter two examples aren’t far from reality.

Autonomous transport powered by big data is edging closer to general availability, including the self-driving tuk-tuk. Similarly, big data-driven AI is regularly taking on and defeating humanity’s best at games like Go and, more recently, StarCraft II.

While the evolving use of big data for AI applications like autonomous transport and gaming may give the impression that big data is still in its infancy, the truth is that it has reached early adulthood, and its implications for advertising are profound.

The precise origins of the term “big data” are still debated, but it is generally attributed to John Mashey, a computer scientist and former employee of Silicon Graphics (SGI), who coined and popularized it in the ’90s. The first known appearance of “big data” comes from a 1998 SGI deck called “Big Data and the Next Wave of InfraStress”. Notably, the first published academic mention of the term occurred that same year, in a data mining book by Weiss and Indurkhya.

Since those early days, the relevance and application of big data practices have widened considerably across disciplines and datasets. 

Today, the concept broadly encompasses operating with vast volumes of data, accelerating data velocity and using a broader variety of data. In particular, it means operating on datasets that require specific and often unique handling methods to extract value.

Increased volume, velocity and variety of data are three trends that have also had a big impact on digital advertising over the past two decades. Looking back over the big data milestones in advertising, 1998 resurfaces as an important year, as the first digital advertising networks were born then. Those first networks initiated a fundamental shift, as media buying went from a one-to-one transaction model between advertisers and media owners to a one-to-many model, paving the way for the modern programmatic ecosystem.

The one-to-many model has yielded immense time and cost efficiency gains for digital advertisers and media owners compared to earlier methods, and the more-is-better design of those data-driven systems drove an accelerating demand for volume, velocity and variety of advertising data. 

While the one-to-many programmatic model still has many advantages, complexity arguably became a by-product of the scale it allowed. That complexity also let bad actors and technical failures creep into systems, prompting many of the modern challenges around transparency in digital advertising, including viewability, fraud and brand safety, which we as an industry are still working to address.

As the data-driven digital advertising supply chain matured, the industry began producing far more data at greater velocity than ever before. The net impact of this has been twofold: advertisers and media owners can make more informed decisions, but information overload and decision paralysis have become a constant threat.

If the last 21 years have taught us to expect anything, it’s that the volume, velocity and variety of data are going to keep increasing. So in the years ahead, the challenge for big data in adtech will be to keep pace with that growth while still extracting value and avoiding decision paralysis.


The volume aspect of the future big data equation may be the element that can be forecast most directly. Cisco’s Visual Networking Index offers a perspective on how this has evolved over time and a glimpse of the years to come.

According to the report, internet traffic per capita in 2000 was a mere 10 megabytes (MB) per month. By 2017 that had risen more than a thousand-fold to 13 gigabytes (GB) per capita, and by 2022 total internet traffic is expected to be 150,700 GB per second, or another 5x per capita, based on current estimates of the global population and the internet-connected population. Traffic growth on that scale will require the advertising industry to prepare for corresponding growth in its data management needs.
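
For readers who want to check the arithmetic behind those figures, here is a minimal sketch. It assumes decimal units (1 GB = 1,000 MB) and a 30-day month, and the population figure is a hypothetical placeholder rather than a number from the report, so the 2022 result is only illustrative.

```python
# Back-of-the-envelope check of the traffic figures quoted above.
# Assumptions (not from the report): decimal units and a 30-day month.

MB_PER_GB = 1_000
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def growth_multiple(start_mb: float, end_gb: float) -> float:
    """Return how many times larger end_gb is than start_mb."""
    return (end_gb * MB_PER_GB) / start_mb


def monthly_gb_per_person(gb_per_second: float, people: float) -> float:
    """Convert an aggregate GB-per-second rate to GB per person per month."""
    gb_per_month = gb_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH
    return gb_per_month / people


# Figures quoted in the article.
PER_CAPITA_2000_MB = 10
PER_CAPITA_2017_GB = 13
TOTAL_2022_GB_PER_SECOND = 150_700

# Hypothetical population base, chosen only for illustration.
ASSUMED_PEOPLE_2022 = 8_000_000_000

multiple_2000_to_2017 = growth_multiple(PER_CAPITA_2000_MB, PER_CAPITA_2017_GB)
per_capita_2022 = monthly_gb_per_person(TOTAL_2022_GB_PER_SECOND, ASSUMED_PEOPLE_2022)

print(f"2000 -> 2017 growth: {multiple_2000_to_2017:,.0f}x")
print(f"2022 estimate: {per_capita_2022:.1f} GB per person per month")
print(f"2017 -> 2022 growth: {per_capita_2022 / PER_CAPITA_2017_GB:.1f}x")
```

The 2017-to-2022 multiple shifts considerably depending on whether the total is divided by the global population or only by the internet-connected population, which is why the per capita figure above is tied to specific population estimates.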

Velocity may be the most interesting element to watch. In coming years, the speed of data will increase dramatically with technologies like 5G that can deliver theoretical speeds up to one gigabyte per second. Access to high-speed devices will also increase in emerging markets as the cost of ownership falls, allowing more high-speed connections than ever before.

Similarly, as more devices are developed and new data processing workflows are created, the direction of these data streams will become increasingly diverse. The onus will then be on technology providers and data processing applications to service the increased demand and processing needs for new channels for data delivery.

Perhaps the area that’s hardest to forecast is what the variety of data will look like. Variety of data is something that tends to grow organically as people start to ask new and important questions about the data they have and the data they need. 

In the same way that viewability data arose from questions around whether an ad was seen, new types of data and ways to understand advertising will continue to emerge as we collectively keep asking questions about the data that we find valuable, culminating in the eternal question in advertising: “What works?”

The adtech ecosystem has certainly grown up alongside big data over the past 21 years, but there’s a lot more growing to do, and hopefully those future birthdays will also be worth celebrating.

This article was written by Dave Goodfellow, Business Development & Partnerships Strategy Director, International, Oracle and IAB SEA+India Regional Board Member.