Don’t move data to the analytics — instead, we move analytics to the data. This is Federated Analytics.
Let’s discuss what federated analytics means and how it can help change your data approach in the context of the data challenges facing your organization.
Federated Analytics is the concept and practice of performing data analysis tasks on decentralized devices, connected data sources, or cloud points locally — instead of moving that data to a central location for analysis.
The federated analytics approach preserves privacy because the analytics models process raw information on the device itself. The approach does not require active collaboration between devices: user data is not transmitted between devices or to a centralized third-party analytics server.
Each device processes its local data to update the parameters of its local analytics model. The model updates from all devices are aggregated and then shared with a backend system that updates the parameters of a global analytics model.
This process is iterative: the global model updates guide the next cycle of model parameter updates for the local device models.
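The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the two-parameter linear model, the learning rate, and the function names (`local_update`, `aggregate`, `run_rounds`) are all assumptions made for the example. Only the parameters travel between device and backend; the raw `(x, y)` points stay inside `local_update`.

```python
from typing import List, Tuple

Params = Tuple[float, float]        # (slope, intercept) of a hypothetical linear model
Dataset = List[Tuple[float, float]]  # raw (x, y) points that never leave the device


def local_update(params: Params, data: Dataset,
                 lr: float = 0.05, steps: int = 5) -> Params:
    """Run a few gradient steps on one device, starting from the global parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean squared error over this device's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def aggregate(updates: List[Params], sizes: List[int]) -> Params:
    """Combine device parameters into global ones, weighted by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def run_rounds(devices: List[Dataset], rounds: int = 300,
               init: Params = (0.0, 0.0)) -> Params:
    """Iterate: broadcast global parameters, update locally, aggregate, repeat."""
    global_params = init
    for _ in range(rounds):
        updates = [local_update(global_params, d) for d in devices]
        global_params = aggregate(updates, [len(d) for d in devices])
    return global_params
```

If every device holds points from the same underlying line, the global parameters settle on that line even though no device ever shares a single raw observation.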
At all stages of the federated analytics process, privacy is preserved for the raw user information. That’s possible because the data itself never leaves the device. The model parameter information extracted from each device is also typically insufficient, on its own, to reverse engineer sensitive user information. You can make the process even more secure by encrypting this information at the device level prior to transmission.
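Beyond ordinary transport encryption, one way to protect updates before they leave the device is additive masking, a simplified form of secure aggregation. The sketch below uses that technique as a stand-in; the article does not prescribe it. The names (`pairwise_masks`, `mask_update`, `server_sum`) and the central seed are assumptions for illustration. In a real deployment each pair of devices would agree on its shared mask through a key exchange rather than a shared seed.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # all arithmetic is done modulo this value


def pairwise_masks(device_ids: List[int], seed: int = 0) -> Dict[int, int]:
    """Return one net mask per device; the masks sum to 0 modulo MODULUS.

    Assumption: a central seed stands in for per-pair key agreement.
    """
    rng = random.Random(seed)
    net = {d: 0 for d in device_ids}
    for i, a in enumerate(device_ids):
        for b in device_ids[i + 1:]:
            m = rng.randrange(MODULUS)
            net[a] = (net[a] + m) % MODULUS  # one side of the pair adds the mask
            net[b] = (net[b] - m) % MODULUS  # the other side subtracts it
    return net


def mask_update(value: int, mask: int) -> int:
    """Hide an integer-encoded update before it leaves the device."""
    return (value + mask) % MODULUS


def server_sum(masked_updates: List[int]) -> int:
    """The backend sums masked values; the pairwise masks cancel in the total."""
    return sum(masked_updates) % MODULUS
```

The backend sees only masked values, yet their sum equals the sum of the true updates, which is all the aggregation step needs.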
When we speak of federated analytics as a generalized version of distributed intelligence and a collaborative computing paradigm, it includes non-ML models used for:
These models are lightweight and require little compute power to analyze data in real time. They suffice for analyzing trends and extrapolating from historical data that may not be very high-dimensional (where each data observation describes a high number of features or variables).
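To show how light such a model can be, here is a sketch of trend extrapolation with an ordinary least-squares line, using only the Python standard library. The function names (`fit_trend`, `extrapolate`) and the assumption of evenly spaced readings are illustrative choices, not part of any specific product.

```python
from typing import List, Tuple


def fit_trend(values: List[float]) -> Tuple[float, float]:
    """Fit a least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_t = (n - 1) / 2            # mean of the time indices 0..n-1
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return slope, intercept


def extrapolate(values: List[float], steps_ahead: int = 1) -> float:
    """Predict the reading `steps_ahead` time steps after the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)
```

A device running this keeps only two numbers, the slope and the intercept, as its model parameters.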
For example, IoT sensors measuring individual characteristics at a network node, such as:
Federated Analytics is different from the core distributed intelligence paradigm of Federated Learning, which follows the same principles for:
However, the difference is that the models are typically large neural networks. The global model combining all parameter updates typically exceeds millions of parameters, or billions in the case of Large Language Models.
(Related reading: federated data, federated search & federated AI.)
In this keynote, Gary Steele, EVP, General Manager, Splunk, discusses the role of federated analytics in modernized SOCs at RSA Conference, 2024.
Consider the following key challenges:
To solve these challenges, we need to change our approach. That means the model of centralizing all your data, whether for IT operations or the SOC, is fundamentally gone.
Data remains critical, of course — instead of moving data to the centralized analytics platform, let’s move analytics to the data.
For example, in the context of a SOC, a modern one will be built on the following three pillars:
Indeed, it is precisely the state of technology available today that allows us to change our approach in many systems and drive analytics to the edge.
Now, let's review some of the key downstream applications of Federated Analytics. The demands for federated analytics come from a few primary drivers.
For an analytics model to impute data from a true data distribution, it must have access to either:
Access to this information is limited in privacy-sensitive use cases, such as serving ads based on recent financial transactions. This information is also widely distributed across devices and client endpoints such as browsers and apps, each with its own layers of separation for privacy and security.
A federated analytics service can access distributed data sources and perform the necessary computing operations locally within all distributed devices and client endpoints, which allows for real-time analytics processing.
Only the model parameter updates are communicated to a backend server. It is this piece that allows for the efficient execution of decision controls between a centralized analytics service running a global model at the back end and the front-end user device.
This means that federated analytics and AI models can be trained on domain-specific and contextual problems that require continual learning.
As people become more aware of privacy issues, they are opting out of features and services that provide unrestricted data to third-party services. These services access limited information from sources such as:
Privacy regulations also restrict organizations from sharing sensitive end-user data with their partners. Any loss of user information or a data breach incident can lead to hefty fines, as well as loss of brand loyalty and consumer interest.
This is where federated analytics can fill the gap: data never leaves a user’s device — the model parameters are updated based on true user information to guide analytics services. The service provider and their partners never access user information outside of the user-controlled device.
That means federated analytics could help narrow the scope of your regulatory compliance.
Of course, ML-based analytics models are large, but did you know that even non-ML analytics models can be large in several ways?
The feature space and the number of variables captured by device sensors locally can be large and complex.
User information and sensor data are stored in hierarchical structures with underlying dependencies and relationships, which must be captured and analyzed separately for every model deployment. For a large collective user base, the global models are large enough to warrant dedicated computing resources in a cloud-based data center.
This is where federated analytics presents itself as a mechanism for performance optimization: It reduces the need (and cost, and resources) for large-scale data transfer. That’s because the analytics processing takes place on smaller datasets locally to update a relatively small set of model parameters for faster real-time insights.
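To make the data-transfer point concrete, here is a minimal sketch in which each device reduces its raw readings to three numbers and the backend merges those summaries into a global mean and variance. The summary fields (`n`, `sum`, `sum_sq`) and the helper names are assumptions chosen for illustration.

```python
import json
from typing import Dict, List


def summarize(readings: List[float]) -> Dict[str, float]:
    """Runs on the device: reduce raw readings to a fixed-size summary."""
    return {
        "n": len(readings),
        "sum": sum(readings),
        "sum_sq": sum(r * r for r in readings),
    }


def merge(summaries: List[Dict[str, float]]) -> Dict[str, float]:
    """Runs on the backend: combine device summaries into global statistics."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return {"n": n, "mean": mean, "variance": variance}


def payload_bytes(obj) -> int:
    """Size of an object when serialized as JSON for transmission."""
    return len(json.dumps(obj).encode("utf-8"))
```

The summary stays the same size no matter how many readings the device has collected, whereas the raw payload grows with every new observation.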
In this model, compliance and privacy controls are strictly enforced at the user end.
The key challenge for federated analytics relates to data quality. The training data on each device is heavily skewed toward that device’s user. This is different from the conventional approach to training or updating an analytics model, in which the training process runs on the entire data distribution collectively.
The challenge for engineering teams is to understand the patterns of bias and then regulate model updates in ways that compensate for the model bias.
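One simple way to regulate updates, sketched below, is to cap how much weight any single device can carry during aggregation so that a handful of very active users do not dominate the global model. This is only one possible approach, and the names (`capped_weights`, `aggregate`) and the `cap` parameter are assumptions for illustration.

```python
from typing import List


def capped_weights(sample_counts: List[int], cap: int) -> List[float]:
    """Weight each device by its sample count, limited to at most `cap`."""
    capped = [min(c, cap) for c in sample_counts]
    total = sum(capped)
    if total <= 0:
        raise ValueError("at least one device must contribute samples")
    return [c / total for c in capped]


def aggregate(updates: List[float], sample_counts: List[int], cap: int) -> float:
    """Weighted average of device updates using the capped weights."""
    weights = capped_weights(sample_counts, cap)
    return sum(w * u for w, u in zip(weights, updates))
```

Choosing the cap is a judgment call: a low cap treats devices more equally, while a high cap trusts devices with more data.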