What is big data?

Big data refers to any data set that is too large or complex to be dealt with by traditional data-processing stacks. But is connected vehicle data 'big'?

Cloud
Rob
July 14, 2020

There is a well-established test for whether your problem is a big data problem: does it exhibit any of the "Three Vs"? You have a big data workload if the data you're handling matches at least one of these characteristics:

  1. Volume - the data sets are so vast that traditional relational database tools (RDBMS) struggle to store and process them
  2. Velocity - data arrives into the system at a high rate
  3. Variety - the data varies in schema (structure) and format (JSON, positional, CSV, Parquet)

In any decent-sized fleet deployment, the telemetry data collected by a connected vehicle platform would typically have the Volume and Velocity characteristics. There are also scenarios where we see Variety in the data, often when consolidating data from multiple generations of a connected vehicle platform, or when combining it with data sources other than the vehicles themselves, e.g. warranty or ERP systems.
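To get a feel for how Volume and Velocity add up, here is a rough back-of-envelope sketch in Python. The fleet size, message rate and message size are purely illustrative assumptions, not measurements from any real platform, and the function name `fleet_telemetry_estimate` is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def fleet_telemetry_estimate(vehicles, messages_per_vehicle_per_sec, bytes_per_message):
    """Return (messages per second, bytes per day) for a whole fleet."""
    # Velocity: how many messages hit the platform every second.
    messages_per_sec = vehicles * messages_per_vehicle_per_sec
    # Volume: how many raw bytes accumulate over one day.
    bytes_per_day = messages_per_sec * bytes_per_message * SECONDS_PER_DAY
    return messages_per_sec, bytes_per_day


# Hypothetical fleet: 10,000 vehicles, each sending one 200-byte message per second.
rate, daily_bytes = fleet_telemetry_estimate(10_000, 1, 200)
print(f"{rate:,} messages/s, {daily_bytes / 1e9:.1f} GB/day")
# prints "10,000 messages/s, 172.8 GB/day"
```

Plug in your own numbers. Even with modest assumptions like these, the raw data grows by well over a hundred gigabytes a day, before any indexing or replication overhead.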

Needless to say, traditional relational database tools are not well suited to running analytics across these data sets, which puts us firmly in the realm of big data.

Toy car emitting 1s and 0s - is that big data?