Unveiling GELATO: A Novel Approach to Sparse Link Prediction

By João Mattos, Zexi Huang, Mert Kosan, Ambuj Singh, and Arlei Silva

KDD 2025 Oral Presentation Highlight

Link prediction, the task of identifying missing or potential connections in graphs, is a cornerstone of modern machine learning with applications spanning recommendation systems, protein-protein interaction networks, and social media analysis. However, the real world often presents a significant challenge: graphs are sparse. In these scenarios, the vast majority of possible connections do not exist, leading to extreme class imbalance and rendering traditional link prediction methods ineffective. Our recent work, presented at KDD 2025, introduces GELATO, a novel framework designed to overcome these limitations and deliver superior performance in sparse link prediction.

Biased sampling in link prediction
Figure 1. Traditional biased sampling with equal positives and negatives.
Real-world graph sparsity
Figure 2. Real-world graphs are extremely sparse, with negatives far outweighing positives.

Traditional link prediction methods often rely on a biased sampling approach (first image above), where an equal number of existing (positive) and non-existing (negative) links are used for training. This approach, however, fails to capture the reality of most real-world graphs, which are incredibly sparse. In a sparse graph, the number of non-existent links vastly outweighs the number of existing ones (see second image above). This imbalance poses a significant challenge for machine learning models, which can become overwhelmed by the majority class (negative links) and fail to learn the subtle patterns that define the minority class (positive links).

Our research has identified two key limitations of state-of-the-art link prediction methods when confronted with sparse graphs:

  1. The “Hard Negative” Problem: Existing methods struggle to distinguish between positive links and “hard negatives.” These are non-existent links that share many characteristics with positive links, such as being located within the same community in a social network. The inability to differentiate these hard negatives from true positives is a major bottleneck for accurate link prediction.

  2. Binary Classification vs. Ranking: Many current methods treat link prediction as a binary classification problem (a link either exists or it doesn’t). This approach fails to capture the nuances of link prediction in the real world, where the goal is often to rank potential links by their likelihood. By treating all negative links as equal, these methods miss out on valuable information that could be used to improve their ranking performance.

Illustration of ranking vs classification
Figure 3. Ranking-based link prediction captures nuances missed by binary classification.
Overview of GELATO framework
Figure 4. The GELATO framework: unbiased sampling, attribute-enhanced learning, autocovariance, and ranking loss.

To address these challenges, we developed GELATO, a novel framework that reframes link prediction as a ranking problem and incorporates several key innovations:

Scaling GELATO for the Real World

Scalability challenges in GELATO
Figure 5. Scaling GELATO with hard negatives and sparse matrix operations.

While GELATO’s approach is powerful, applying it to massive real-world graphs requires careful consideration of scalability. To address this, we have developed several techniques to make GELATO efficient and practical for large-scale applications:

GELATO in Action: Superior Performance and Key Insights

Our experiments demonstrate that GELATO consistently outperforms state-of-the-art link prediction methods, particularly in sparse graph settings. GELATO not only excels at distinguishing between hard negatives and positive links but also highlights the surprising effectiveness of similarity-based learners and classic heuristics, which can even outperform more complex GNN-based methods in certain scenarios.

Our work with GELATO has yielded several key insights:

In conclusion, GELATO offers a robust, scalable, and highly effective solution for sparse link prediction. By rethinking the fundamental assumptions of traditional link prediction methods and introducing a novel set of techniques, GELATO paves the way for more accurate and reliable link prediction in a wide range of real-world applications.

We invite you to read our full paper for a more detailed discussion of GELATO and our experimental results.

Full Paper | Source Code