<aside> ❗ The content of this guide was transferred to Memgraph docs on August 2nd, 2022. https://memgraph.com/docs/memgraph/next/under-the-hood/storage (and later) https://memgraph.com/docs/memgraph/under-the-hood/storage
This page will no longer be maintained.
</aside>
Estimating Memgraph's storage memory usage is not entirely straightforward because it depends on a lot of variables, but it is possible to do so quite accurately. Below is an example that will try to show the basic reasoning.
If you want to estimate the storage memory usage, use the following formula:
$$ StorageRAMUsage = NumberOfNodes \times 260B + NumberOfEdges \times 180B $$
Let's test this formula on the Marvel Comic Universe Social Network dataset, which is also available as a dataset inside Memgraph Lab and contains 21,723 nodes and 682,943 edges.
According to the formula, storage memory usage should be:
$$ StorageRAMUsage = 21{,}723 \times 260B + 682{,}943 \times 180B $$
$$ StorageRAMUsage = 5{,}647{,}980B + 122{,}929{,}740B = 128{,}577{,}720B \approx 125MB $$
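The same arithmetic can be scripted to try other dataset sizes. The following is a minimal Python sketch; the function and constant names are chosen here purely for illustration and only encode the 260B-per-node and 180B-per-edge averages from the formula above.

```python
# Average per-object costs taken from the formula above (in bytes).
BYTES_PER_NODE = 260
BYTES_PER_EDGE = 180


def estimate_storage_bytes(num_nodes: int, num_edges: int) -> int:
    """Return a rough estimate of storage memory usage in bytes."""
    return num_nodes * BYTES_PER_NODE + num_edges * BYTES_PER_EDGE


if __name__ == "__main__":
    # Dataset from the text: 21,723 nodes and 682,943 edges.
    estimate = estimate_storage_bytes(21_723, 682_943)
    print(f"{estimate:,} B")  # 128,577,720 B
```

The result is an estimate of storage memory only; it does not include the baseline runtime overhead or query execution memory.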
Now, let's run an empty Memgraph instance on an x86 Ubuntu machine. It consumes ~75MB of RAM due to baseline runtime overhead. Once the dataset is loaded, RAM usage rises to ~260MB. Memory usage primarily consists of storage and query execution memory usage. After executing the `FREE MEMORY` query to force the cleanup of query execution memory, RAM usage drops to ~200MB. Subtracting the 75MB baseline runtime overhead from this 200MB total leaves ~125MB of storage memory usage, which matches the estimate given by the formula.
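As a rough illustration of this check, the sketch below subtracts the baseline from the reading taken after `FREE MEMORY` and compares the result with the formula's estimate. The helper names are hypothetical, and the inputs are the approximate readings quoted above, so the computed gap is only indicative.

```python
def measured_storage_mb(total_after_free_mb: float, baseline_mb: float) -> float:
    """Storage memory = RAM after FREE MEMORY minus the empty-instance baseline."""
    return total_after_free_mb - baseline_mb


def relative_gap(estimate_mb: float, measured_mb: float) -> float:
    """Relative difference between the formula estimate and the measurement."""
    return abs(estimate_mb - measured_mb) / measured_mb


if __name__ == "__main__":
    measured = measured_storage_mb(total_after_free_mb=200, baseline_mb=75)
    # Formula estimate from the text: 128,577,720 B, expressed in decimal MB.
    estimate = 128_577_720 / 1_000_000
    print(f"measured ~{measured} MB, estimated ~{estimate:.1f} MB")
    print(f"relative gap: {relative_gap(estimate, measured):.1%}")
```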
Let's dive deeper into the memory usage values. Because Memgraph works on the x86 architecture, calculations are based on the x86 Linux memory usage.
<aside> ℹ️ For precise/latest memory layout please clone Memgraph and use, e.g., pahole to discover accurate info.
</aside>
Each `Vertex` and `Edge` object has a pointer to a `Delta` object. The `Delta` object stores all changes on a certain `Vertex` or `Edge`, and that's why `Vertex` and `Edge` memory usage will be increased by the memory of the `Delta` objects they are pointing to. If there are few updates, there are also few `Delta` objects because the latest data is stored in the object. But if the database has a lot of concurrent operations, many `Delta` objects will be created. Of course, the `Delta` objects will be kept in memory as long as needed, and a bit more, because of the internal GC inefficiencies.
Each `Delta` object has at least 104B.
Each `Vertex` object has at least 112B + 104B for the `Delta` object, in total, a minimum of 216B.
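To see how the per-vertex minimum grows when more `Delta` objects are attached, here is a small Python sketch. It assumes, as a simplification, that every `Delta` costs exactly the 104B minimum and that the cost adds up linearly; real `Delta` objects may be larger, and the names used are illustrative only.

```python
# Minimum sizes quoted in the text (in bytes).
VERTEX_MIN_BYTES = 112
DELTA_MIN_BYTES = 104


def vertex_min_bytes(num_deltas: int = 1) -> int:
    """Lower bound on memory for one vertex with `num_deltas` attached deltas.

    Assumption: each delta costs the 104B minimum and costs add linearly.
    """
    if num_deltas < 0:
        raise ValueError("num_deltas must be non-negative")
    return VERTEX_MIN_BYTES + num_deltas * DELTA_MIN_BYTES


if __name__ == "__main__":
    print(vertex_min_bytes(1))  # 216, the minimum stated in the text
    print(vertex_min_bytes(3))  # 424
```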