
Federated search is the practice of retrieving information from multiple distributed search engines and databases through a single user interface. Think of it as a one-stop shop for data search.
The user interface acts as a centralized site that connects siloed information sources and search engines. Every search query, from every user, aims to find distinct pieces of information and serve them with the highest possible precision and relevance.
Federated vs unified search engines
In general, we can compare federated search to a single database system like so:
- Federated search offers an efficient mechanism to search across multiple database systems.
- A single database system may grow large enough to hold every information asset, but retrieving a given asset may then require searching through the entire database.
Now let’s go a bit deeper and see exactly how federated search works. While it’s an important goal for overall user experience, it is not without challenges.
Phases in how federated search works
A federated search system can consist of the following phases:
Query transformation & broadcasting
First, the query is transformed into the syntax each search engine expects and broadcast to all of them. At this stage, the query is not yet associated with a particular text, since that would require searching the entire database.
Because a full search, combined with delays in network transmission, would be slow, an efficient discovery process is used to select regions of interest in the database systems.
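As a rough illustration of this phase, here is a minimal Python sketch. The engine names (docs_engine, tickets_db, wiki_engine), their query syntaxes, and the send function are all hypothetical stand-ins, not real APIs; a real system would replace send with a network call to each backend.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical translators: each maps the user's terms to the syntax
# one (made-up) backend expects.
def to_boolean_syntax(terms):
    return " AND ".join(terms)

def to_plus_syntax(terms):
    return " ".join(f"+{t}" for t in terms)

def to_phrase_syntax(terms):
    return '"' + " ".join(terms) + '"'

TRANSLATORS = {
    "docs_engine": to_boolean_syntax,
    "tickets_db": to_plus_syntax,
    "wiki_engine": to_phrase_syntax,
}

def send(engine, native_query):
    # Stand-in for a network call: just echoes what would be sent.
    return engine, native_query

def broadcast(user_query):
    """Translate one user query per engine and send all copies concurrently."""
    terms = user_query.lower().split()
    with ThreadPoolExecutor(max_workers=len(TRANSLATORS)) as pool:
        futures = [
            pool.submit(send, name, translate(terms))
            for name, translate in TRANSLATORS.items()
        ]
        return dict(f.result() for f in futures)
```

Calling broadcast("Disk Failure") returns one native query string per engine, which shows why a single user query cannot simply be forwarded unchanged.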
Resource representation
A variety of methods may be used to represent search engine resources:
- Extracting search terms from the query interface of the search engine.
- Generating a summary of the content on relevant pages listed by the search engine.
- Query-based sampling that goes beyond database crawling to find relevant resource descriptions.
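Query-based sampling can be sketched as follows. This is a simplified illustration, not a reference implementation: the toy engine, its documents, and the max_probes parameter are assumptions made for the example. The idea is to probe an engine with a seed term, record the words in whatever comes back, and reuse those words as further probes.

```python
from collections import Counter

# Hypothetical toy engine: three documents, matched on whole words.
TOY_DOCS = [
    "disk failure on storage node",
    "storage quota exceeded",
    "network latency alert",
]

def toy_search(term):
    return [d for d in TOY_DOCS if term in d.split()]

def sample_resource(search, seed_term, max_probes=4):
    """Build a term-frequency description of an engine by probing it."""
    description = Counter()
    seen_docs = set()
    used = set()
    queue = [seed_term]
    while queue and len(used) < max_probes:
        term = queue.pop(0)
        if term in used:
            continue
        used.add(term)
        for doc in search(term):
            if doc in seen_docs:
                continue
            seen_docs.add(doc)
            words = doc.split()
            description.update(words)
            # Words found in sampled documents become future probes.
            queue.extend(w for w in words if w not in used)
    return description
```

The resulting counter describes only the part of the engine the probes managed to reach, which is the trade-off of sampling: it is cheap, but the description is incomplete.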
Resource ranking
Once the resources are discovered, they are ranked in order of relevance and precision. At this point, multiple resources may point to similar or duplicate text results. The goal is to collectively optimize search result precision across the best search engines.
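One simple way to picture resource ranking: score each engine by how much of its term-frequency description matches the query. The scoring rule below is a deliberately naive heuristic chosen for illustration, and the engine names and counts are invented; production systems use more elaborate selection algorithms.

```python
def rank_resources(query, descriptions):
    """Rank engines by the share of their described terms that match the query."""
    terms = query.lower().split()
    scores = {}
    for name, counts in descriptions.items():
        total = sum(counts.values()) or 1  # avoid division by zero
        scores[name] = sum(counts.get(t, 0) for t in terms) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical resource descriptions (term -> frequency).
DESCRIPTIONS = {
    "docs_engine": {"disk": 8, "failure": 2, "storage": 10},
    "tickets_db": {"disk": 1, "network": 9},
    "wiki_engine": {"holiday": 5},
}
```

With these made-up descriptions, rank_resources("disk failure", DESCRIPTIONS) places docs_engine first, tickets_db second, and wiki_engine last.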
Distributed search
The quality of output is compared and the best search engines are selected for the query. The query is performed and relevant search data is extracted.
Merging
Here, the results returned by the selected search engines are combined. Common types of merging are:
- Search-time merging. Each index is searched separately and the results are combined at query time. No unified indexing standard is required.
- Index-time merging. All searchable data is available in a central indexing system, which makes searching through the indices more efficient.
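To make search-time merging concrete, here is a small sketch. It assumes each engine returns (document ID, raw score) pairs on its own scale, so scores are min-max normalized per engine before merging, and duplicates keep their best score. Both the normalization choice and the sample data are assumptions for illustration.

```python
def normalize(results):
    """Min-max scale one engine's raw scores into [0, 1]."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(doc, 1.0 if span == 0 else (s - lo) / span) for doc, s in results]

def merge(per_engine):
    """Merge per-engine result lists, keeping each document's best score."""
    best = {}
    for results in per_engine.values():
        if not results:
            continue
        for doc, score in normalize(results):
            if score > best.get(doc, -1.0):
                best[doc] = score
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical raw results on two different score scales.
RAW = {
    "engine_a": [("a", 12.0), ("b", 6.0), ("c", 0.0)],
    "engine_b": [("b", 3.0), ("d", 2.0), ("e", 1.0)],
}
```

Without the normalization step, engine_a's larger raw scores would dominate the merged list regardless of actual relevance.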
Presentation & sorting
Finally, the relevant results are combined and presented to the end user through a unified interface. The results are sorted according to precision scores or other metrics that better describe the relevance of the output, such as results from similar search queries, user base, location, context, industry and time.
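The choice of sorting metric matters. The sketch below sorts the same three results twice: once by relevance alone, and once by relevance discounted by age. The Result fields, the sample data, and the 30-day half-life are all assumptions picked for the example.

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    relevance: float  # assumed already normalized to [0, 1]
    age_days: int

def by_relevance(r):
    return r.relevance

def by_relevance_and_recency(r, half_life_days=30):
    # Halve the score for every half_life_days of age.
    return r.relevance * 0.5 ** (r.age_days / half_life_days)

def present(results, metric):
    """Return result titles sorted by the chosen metric, best first."""
    return [r.title for r in sorted(results, key=metric, reverse=True)]

RESULTS = [
    Result("Old runbook", 0.9, 90),
    Result("Recent incident", 0.6, 0),
    Result("Last month's FAQ", 0.7, 30),
]
```

Sorting by relevance alone puts the old runbook first, while the recency-aware metric puts the recent incident first, so the same data yields a reversed ranking.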
Challenges with federated search
In any federated search system, the technology aims to solve two key problems:
- Understanding the search query in the context of the searcher’s intent.
- Classifying data with the highest precision and relevance.
Now, where federated search relies on AI and machine learning, which is increasingly the case, these two key issues are even more difficult to solve. Here are some of the reasons behind these challenges.
- Language nuances. Search queries are not always self-explanatory. The search process may need to consider nuances in language, based on various demographics and context that may not be available.
- Data structure. Relevant data may take different forms; it may be challenging to compare data content of different structures for relevance. For example, is a text response better than a video result?
- Selecting scoring metrics. The federated search system will only rank based on the selected scoring metric. Different metrics can return significantly different result rankings.
- Query features & robustness. Search engines may allow characters such as quotation marks and hyphens to better describe the search intent. However, not all search engines support the same query features. Without a unified, standardized query syntax, the same query may behave differently across engines, which reduces the effectiveness of the search process.
- Availability & timeout. Users expect to see search results within seconds. Any search response that takes excessive time may be left out of the search result even when the content is relatively important.
- Restricting search scope. If a search engine requires authentication, users must be able to log in to these systems. This may require handling sensitive login credentials, applying security and privacy protocols, and meeting regulatory compliance requirements.
- Data pipeline. An efficient data pipeline is required to store data of various formats, provide scalability and make querying an efficient process.
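The availability and timeout challenge from the list above can be sketched in a few lines. Here, two simulated engines (both hypothetical, as are the 0.1-second deadline and the sleep duration) are queried in parallel, and any engine that misses the deadline is left out of the results.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Simulated engines; real ones would make network calls.
def fast_engine(query):
    return [f"fast hit for {query}"]

def slow_engine(query):
    time.sleep(0.5)  # simulates a slow backend
    return [f"slow hit for {query}"]

ENGINES = {"fast": fast_engine, "slow": slow_engine}

def search_with_deadline(query, deadline_s=0.1):
    """Query all engines in parallel and drop any that miss the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(ENGINES))
    futures = {pool.submit(fn, query): name for name, fn in ENGINES.items()}
    done, not_done = wait(futures, timeout=deadline_s)
    results = {futures[f]: f.result() for f in done}
    skipped = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block on the slow engines
    return results, skipped
```

The slow engine's content is dropped even if it would have been the most relevant, which is exactly the trade-off described above between responsiveness and completeness.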
Solving these challenges = maturing federated search
Let’s look back at the two problems: understanding search queries and developing an efficient classification system. In the context of the challenges described above, solving the first problem means going beyond traditional federated search practice.
The search system must incorporate advanced AI capabilities that help associate context to a search query. The search process needs to be personalized and relevant, yes, but returning the most relevant search results is not simply a matter of fixing data output based on score metrics.
A mature federated search system delivers search results based on context, stitching together the search journey using relevant information in a secure and privacy-friendly environment. It is also unified across digital channels, platforms and devices. A reactive federated search system only returns data that responds to the query, whereas a mature search system also returns recommendations and personalized results to complement the expected search output.
This posting does not necessarily represent Splunk's position, strategies or opinion.