AUTHOR: @Marko Budiselic DATE: November 22, 2024
The in-memory approach excels in knowledge retrieval due to its speed and low latency, as data stored in RAM can be accessed far faster than from disk. This is critical for real-time applications like semantic search, chatbots, or any other information retrieval system where response times in milliseconds are essential. In-memory systems support efficient random access and use optimized data structures, such as direct pointers, hash tables, or vector indices, enabling both rapid lookups and complex queries.
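To make the idea concrete, here's a minimal sketch (plain Python, not Memgraph's actual internals) of a property graph held entirely in hash tables, where a lookup or neighbor expansion is a couple of pointer hops in RAM rather than a disk seek. The node ids and properties are made up for illustration.

```python
# Sketch only: an in-memory property graph kept in hash tables (dicts),
# so lookups and neighbor expansion are O(1)-ish pointer chasing.

nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Company", "name": "Memgraph"},
}

# Adjacency stored per node as direct references (here: node ids).
adjacency = {
    1: [2, 3],  # Alice -> Bob, Alice -> Memgraph
    2: [3],
    3: [],
}

def neighbors(node_id):
    """Random access: one hash lookup per hop, no disk I/O."""
    return [nodes[n] for n in adjacency[node_id]]

print(neighbors(1))  # Alice's neighborhood, fetched in microseconds
```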
Regular database indexes and tightly integrated text or vector search indexes are essential for quickly finding the most relevant candidates for the content of interest. In a system where indexing is decoupled, it's hard to ensure the right level of consistency and low latency (again, the latency bit 😃).
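A toy example of what "tightly integrated" buys you (this is a hypothetical mini-engine, not any real database's code): the index is updated in the same step as the data, so a read issued right after a write always sees consistent results, with no lag from replicating changes to an external search service.

```python
# Hypothetical sketch: primary storage and a text index updated together,
# so there is no window where data exists but the index doesn't know it.

class TinyStore:
    def __init__(self):
        self.docs = {}       # primary storage: doc_id -> text
        self.inverted = {}   # integrated text index: token -> {doc_ids}

    def insert(self, doc_id, text):
        # Data and index change in one place -- no sync delay to an
        # external indexing service, hence consistent low-latency reads.
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.inverted.setdefault(token, set()).add(doc_id)

    def search(self, token):
        return [self.docs[d] for d in self.inverted.get(token.lower(), set())]

store = TinyStore()
store.insert(1, "knowledge graphs love fast indexes")
print(store.search("indexes"))  # visible immediately after the write
```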
Once there are candidates for the content/entities of interest, deep path traversals like BFS, DFS, Shortest Path, and A* can be used to explore the neighborhood and find relevant connected entities. E.g., vector search is extremely powerful at finding the most similar digital content, but it can't tell you how that content relates to the rest of your data. Learn more about deep-path traversals in Memgraph's official documentation.
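In Memgraph itself you'd express this as a Cypher deep-path traversal; below is the same idea in plain Python. Starting from a candidate node (imagine it's the top hit from a vector index; the ids and graph are made up), a BFS pulls in everything connected within a hop limit.

```python
# Sketch: expand vector-search candidates with a plain BFS to surface
# connected entities within `max_hops` of each candidate.
from collections import deque

def bfs_neighborhood(adjacency, start, max_hops=2):
    """Return all nodes reachable from `start` within `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in adjacency.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

adjacency = {"doc1": ["authorA"], "authorA": ["doc2"], "doc2": []}
candidates = ["doc1"]  # e.g., the most similar document by embedding
for c in candidates:
    print(c, "->", bfs_neighborhood(adjacency, c))
```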
In general, graph algorithms like Community Detection or PageRank are interesting because they help group or order (rank) entities inside a knowledge graph.
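As a flavor of what "ranking entities" means, here's a textbook PageRank power iteration (deliberately simplified, not the implementation Memgraph ships): every node's score depends on the scores of the nodes pointing at it, which is exactly why the whole graph has to be processed.

```python
# Textbook PageRank via power iteration, for illustration only.

def pagerank(adjacency, damping=0.85, iterations=50):
    nodes = list(adjacency)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in adjacency.items():
            if not outs:  # dangling node: spread its rank evenly
                for m in nodes:
                    new_rank[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new_rank[m] += damping * rank[n] / len(outs)
        rank = new_rank
    return rank

adjacency = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(adjacency))  # "c" ranks highest: two nodes point to it
```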
Usually, when people mention graph algorithms, they refer to static graph algorithms. The word static means the algorithm assumes the underlying data won't change during execution. The tricky thing is that these algorithms usually take a long time because the results depend on processing the whole graph. As the dataset grows, the time to recompute the results when data changes also grows. To make things worse, the growth is often not just linear; in many cases it's quadratic or worse.
To compute results in real time, dynamic graph algorithms come to the rescue. Sometimes these algorithms are also called streaming, incremental, or online (NOTE: there are subtle differences between all these terms, but they are very, very similar). The basic idea is to assume the underlying data will change over time and implement each static algorithm so that only a small portion of the whole graph is considered during the recompute. Of course, there is a clear tradeoff between real-time latency and accuracy, but for pure analytical use cases, 100% accuracy can be traded away to get a faster response from the system.
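A simple way to feel the difference is incremental connected components with a union-find structure (again, an illustrative sketch of the dynamic idea, not Memgraph's algorithms): each streamed edge costs near-constant amortized work instead of a full graph re-scan. It also shows the tradeoff mentioned above; this particular structure handles insertions cheaply but not deletions.

```python
# Sketch: dynamic connected components via union-find. Each new edge
# touches only two short root chains instead of the whole graph.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_edge(self, a, b):
        """Called on every streamed edge; near-O(1) amortized work."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

uf = UnionFind()
for edge in [("a", "b"), ("c", "d"), ("b", "c")]:  # streaming updates
    uf.add_edge(*edge)
print(uf.find("a") == uf.find("d"))  # True: one component, no full recompute
```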
Graphs are a perfect abstraction for modeling the real world because everyone can reason about nodes and edges (even though not everyone knows this). If you start describing your pipelines as a graph, non-tech people can understand them and provide feedback, while tech people can easily understand the intentions of different stakeholders and focus on the implementation details.