Breaking the Dyadic Barrier: Rethinking Fairness in Link Prediction Beyond Demographic Parity

By João Mattos, Debolina Halder Lina, and Arlei Silva (Rice University)
Accepted at AAAI-26 (Oral)


Motivation

Link prediction is the task of predicting missing or future connections in a graph. This task drives key applications from social recommendations to knowledge graph completion.

But biased link predictions can reinforce structural inequalities, for example by under-connecting minorities in professional networks such as LinkedIn or by creating “filter bubbles” in online communities.

Existing work defines fairness dyadically, comparing inter-group vs. intra-group edges under Demographic Parity (ΔDP).
However, we show that this dyadic framing hides biases within edge groups and is insensitive to both subgroup proportions and ranking exposure, limiting its effectiveness.


Problem: The Dyadic Fairness Trap

1. Limited Expressive Power of GNNs

Standard Graph Neural Networks (GNNs) operate within the expressivity limits of the 1-Weisfeiler-Lehman test, producing similar embeddings for structurally symmetric nodes, even if they belong to different demographic groups.

Hence, fair node embeddings do not guarantee fair link predictions.

Example: Two nodes with similar neighborhoods but different genders may still receive indistinguishable embeddings; since edge scores are computed from these embeddings, the resulting predictions cannot account for the demographic makeup of the pair, which leads to biased edge predictions.


2. Hidden Bias Within Edge Groups

Demographic parity aggregates sensitive subgroups into broad categories (e.g., “intra” vs “inter” edges), hiding subgroup imbalances.
For instance, male–male links may dominate over female–female links, yet ΔDP will remain low.
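As a toy illustration (all numbers below are made up), the intra/inter split can look perfectly balanced while one subgroup dominates another within the intra edges:

# Toy prediction set: 450 male-male, 50 female-female, and 500 inter-group edges.
intra_mm, intra_ff, inter = 450, 50, 500
intra = intra_mm + intra_ff

# Dyadic view: intra vs. inter proportions are balanced (0.5 vs. 0.5).
print(intra / (intra + inter), inter / (intra + inter))

# Subgroup view: within the intra edges, male-male links outnumber female-female links 9 to 1.
print(intra_mm / intra, intra_ff / intra)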

Property 1 (Non-Dyadic Distribution-Preserving Fairness): requires the predicted edge-type distribution $\hat{\pi}$ to match the original one $\pi$:

$$\min_R \; \mathrm{dist}(\hat{\pi}, \pi)$$
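As a minimal sketch of how Property 1 can be checked (the edge-type ids, the toy predictions, and the use of total variation distance as dist are our illustrative choices, not fixed by the paper):

import numpy as np

def edge_type_distribution(edge_types, num_types=3):
    # Empirical distribution of edge types among the predicted edges.
    counts = np.bincount(np.asarray(edge_types), minlength=num_types)
    return counts / counts.sum()

pi = np.array([0.5, 0.3, 0.2])              # original edge-type distribution (assumed)
predicted = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]  # edge types of a toy set of predicted edges
pi_hat = edge_type_distribution(predicted)

# Property 1 asks for a small dist(pi_hat, pi); here: total variation distance.
print(pi_hat, 0.5 * np.abs(pi_hat - pi).sum())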


3. Missing Ranking Awareness

ΔDP treats predictions as an unordered set, ignoring exposure bias: certain groups can dominate the top of a ranked list even when global proportions look similar.
This limitation leads to unfair recommendation visibility.

To fix this, we formalize:

Property 2 (Rank-Aware Fairness): Fairness metrics must consider both proportions and positions of subgroup edges in the ranking.


Solution: Distribution-Preserving and Rank-Aware Fairness

We adapt ideas from fair information retrieval and introduce the Normalized Discounted Cumulative KL-Divergence (NDKL) metric:

$$\mathrm{NDKL} = \frac{1}{Z} \sum_{k=1}^{|C|} \frac{1}{\log_2(k+1)} \, D_{\mathrm{KL}}\!\left(\hat{\pi}_k \,\|\, \pi\right)$$

NDKL penalizes deviations between the exposure of each group in the top-k ranks and their expected proportions.
It satisfies both key properties:

  1. Non-Dyadic (Subgroup-Aware)
  2. Rank-Aware (Exposure-Sensitive)
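A minimal sketch of how NDKL can be computed for a ranked list of edge types, following the formula above; the variable names are ours, the KL divergence is taken in base 2, and we take the normalizer Z to be the sum of the rank-discount weights (an assumption, since the formula above leaves Z implicit):

import numpy as np

def ndkl(ranked_types, pi, num_types=3):
    # ranked_types[i] is the edge type of the i-th ranked prediction; pi is the target distribution.
    pi = np.asarray(pi, dtype=float)
    counts = np.zeros(num_types)
    total, Z = 0.0, 0.0
    for k, t in enumerate(ranked_types, start=1):
        counts[t] += 1
        pi_hat_k = counts / k                 # edge-type distribution of the top-k
        nz = pi_hat_k > 0                     # convention: 0 * log(0 / p) = 0
        d_kl = np.sum(pi_hat_k[nz] * np.log2(pi_hat_k[nz] / pi[nz]))
        w = 1.0 / np.log2(k + 1)              # rank discount
        total += w * d_kl
        Z += w                                # normalizer
    return total / Z

# A ranking whose top is dominated by a single edge type scores higher (worse):
print(ndkl([0, 0, 0, 0, 0, 1, 1, 2, 1, 2], [0.5, 0.3, 0.2]))
print(ndkl([0, 1, 0, 2, 0, 1, 0, 2, 1, 0], [0.5, 0.3, 0.2]))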


The MORAL Algorithm

We introduce MORAL (Multi-Output Ranking Aggregation for Link Fairness) — a lightweight, post-processing framework compatible with any link prediction model.

How MORAL Works

  1. Decoupled Training:
    Train separate predictors for each sensitive-group edge type
    (e.g., male–male, female–female, male–female); see the sketch after this list.

  2. Fair Ranking Aggregation:
    During inference, MORAL greedily builds a ranked list that minimizes divergence from the desired exposure distribution π using group-specific scores.
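A minimal sketch of the decoupled training step, under simplifying assumptions: pair features X, binary labels y, and an edge-type id t in {0, 1, 2} for every candidate pair are assumed to be given, and a scikit-learn LogisticRegression stands in for an arbitrary base link predictor (MORAL itself works with any base model):

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_decoupled_scorers(X, y, t, num_types=3):
    # One predictor per sensitive-group edge type (e.g., male-male, female-female, male-female).
    scorers = {}
    for k in range(num_types):
        mask = t == k
        scorers[k] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return scorers

def score_pairs(scorers, X, t):
    # Each candidate pair is scored by the predictor trained for its own edge type.
    scores = np.empty(len(X))
    for k, clf in scorers.items():
        mask = t == k
        scores[mask] = clf.predict_proba(X[mask])[:, 1]
    return scores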

The greedy assignment routine used in step 2 is sketched below.

Inputs: the ranking length n (the number of top predictions to return) and the target edge-type proportions p = π.

Output: an array of length n assigning an edge type to each rank position, such that every prefix approximately preserves π.

import numpy as np

def MORAL(n, p):
    """
    Greedy probabilistic approach to generate an array that maintains the
    target distribution (p0, p1, p2, ...) for any prefix length k.

    This code assumes a distribution of 3 link types.

    Args:
        n (int): Length of the array.
        p (tuple): Target proportions (p0, p1, p2), summing to 1.

    Returns:
        list: Generated array of length n with values in {0, 1, 2}.
    """
    target_counts = np.array([p_i * n for p_i in p])  # Expected counts for 0,1,2
    actual_counts = np.zeros(3)  # Current counts of 0s, 1s, 2s
    arr = []

    for i in range(n):
        # Compute probabilities to minimize deviation from target counts
        remaining = n - i
        probs = (target_counts - actual_counts) / remaining  # Adjusted probabilities
        
        # Avoid numerical errors
        probs = np.clip(probs, 0, 1)
        probs /= probs.sum()  # Normalize to sum to 1

        # Sample from adjusted probabilities
        choice = np.random.choice([0, 1, 2], p=probs)
        arr.append(choice)
        actual_counts[choice] += 1

    return arr

This ensures fairness without retraining the base model, and balances accuracy and exposure simultaneously.
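As a quick sanity check (the target proportions below are made up), every prefix of the generated array approximately follows the target distribution, with short prefixes being noisier since the assignment is probabilistic:

np.random.seed(0)  # fix the randomness of the greedy sampling for reproducibility

positions = MORAL(1000, (0.5, 0.3, 0.2))
for k in (10, 100, 1000):
    prefix = np.array(positions[:k])
    # Empirical proportion of each edge type within the top-k prefix.
    print(k, [round(float((prefix == t).mean()), 2) for t in range(3)])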

One can input the top links of each pair type using a mask:

import torch

final_output = torch.zeros(K)

# Edge type assigned to each rank position (converted to a tensor for boolean masking).
output_array_positions = torch.tensor(MORAL(K, pi.numpy()))

for sens_value in range(3):
    mask = output_array_positions == sens_value   # rank positions reserved for this edge type
    sens_mask = sens == sens_value                # candidate pairs of this edge type
    sens_val_outputs_sorted, _ = output[sens_mask].sort(descending=True)
    final_output[mask] = sens_val_outputs_sorted[:mask.sum()]

In this snippet, output contains the score of each candidate pair, sens indicates the edge type of each pair in output, pi is the target edge-type distribution π, and K is the length of the final ranked list.


Experiments

We evaluated on six real-world datasets:

Dataset             Sensitive Attr.   Domain    Type
Facebook            Gender            Social    Periphery
Credit              Age               Finance   Periphery
German              Age               Credit    Periphery
NBA                 Nationality       Twitter   Periphery
Pokec-n / Pokec-z   Gender            Social    Community

Results Highlights

Table 1. Fairness performance comparison of all approaches considered (k = 1000).
Lower NDKL and higher prec@1000 are better.

Method          Metric       Facebook       Credit         German         NBA            Pokec-n        Pokec-z
UGE             NDKL         0.05 ± 0.00    0.80 ± 0.07    0.08 ± 0.03    0.07 ± 0.02    0.06 ± 0.00    0.06 ± 0.00
                prec@1000    0.97 ± 0.00    1.00 ± 0.00    0.69 ± 0.01    0.58 ± 0.01    0.90 ± 0.04    0.91 ± 0.04
EDITS           NDKL         0.21 ± 0.07    0.04 ± 0.02    0.08 ± 0.02    0.08 ± 0.01    OOM            OOM
                prec@1000    0.96 ± 0.00    0.36 ± 0.17    0.42 ± 0.20    0.49 ± 0.00    OOM            OOM
GRAPHAIR        NDKL         0.13 ± 0.03    0.67 ± 0.22    0.07 ± 0.02    0.09 ± 0.01    0.09 ± 0.03    0.26 ± 0.25
                prec@1000    0.96 ± 0.01    1.00 ± 0.00    0.73 ± 0.01    0.69 ± 0.01    0.97 ± 0.03    1.00 ± 0.00
FairWalk        NDKL         0.06 ± 0.01    0.06 ± 0.03    0.11 ± 0.02    0.06 ± 0.01    0.07 ± 0.01    0.07 ± 0.00
                prec@1000    0.96 ± 0.00    1.00 ± 0.00    0.94 ± 0.00    0.55 ± 0.01    1.00 ± 0.00    1.00 ± 0.00
FairAdj         NDKL         0.10 ± 0.05    OOM            0.10 ± 0.01    0.11 ± 0.05    OOM            OOM
                prec@1000    0.42 ± 0.01    OOM            0.54 ± 0.01    0.50 ± 0.01    OOM            OOM
DetConstSort    NDKL         0.15 ± 0.00    0.06 ± 0.00    0.04 ± 0.00    0.09 ± 0.00    0.07 ± 0.00    0.23 ± 0.00
                prec@1000    0.00 ± 0.00    0.00 ± 0.00    0.55 ± 0.00    0.21 ± 0.00    0.07 ± 0.00    0.01 ± 0.00
DELTR           NDKL         0.10 ± 0.03    0.03 ± 0.00    0.09 ± 0.06    0.09 ± 0.02    0.23 ± 0.23    0.22 ± 0.20
                prec@1000    0.91 ± 0.05    0.56 ± 0.29    0.31 ± 0.44    0.43 ± 0.24    0.65 ± 0.01    0.48 ± 0.28
MORAL (ours)    NDKL         0.04 ± 0.00    0.01 ± 0.00    0.03 ± 0.00    0.02 ± 0.00    0.03 ± 0.00    0.04 ± 0.00
                prec@1000    0.95 ± 0.01    1.00 ± 0.00    0.96 ± 0.00    0.80 ± 0.00    0.98 ± 0.00    0.98 ± 0.00

OOM = Out of Memory.

MORAL achieves the lowest NDKL on every dataset while maintaining competitive precision, consistently providing the best fairness–utility trade-off across all datasets.


Key Takeaways

  1. Dyadic demographic parity can hide subgroup imbalances and ignores where predictions appear in the ranking.
  2. Fair link prediction needs metrics that are both non-dyadic and rank-aware; NDKL satisfies both properties.
  3. MORAL, a lightweight post-processing step, achieves the lowest NDKL on every dataset while preserving precision, without retraining the base model.


Broader Impact

MORAL offers a general blueprint for fairness in pair-wise prediction tasks, relevant to recommender systems, graph completion, and link discovery.
By addressing fairness beyond dyadic approaches, it challenges the community to rethink fairness in structured data and graph learning.


Resources