Tor is a widely used anonymity network that
conceals user identities by routing traffic through encrypted
relays, yet it remains vulnerable to traffic correlation attacks
that deanonymize users by matching patterns in ingress and
egress traffic. However, existing correlation methods suffer from
two major limitations: limited robustness to noise and partial
observations, and poor scalability due to computationally
expensive pairwise matching. To address these challenges, we
propose RECTor, a machine learning-based framework for traffic
correlation under realistic conditions. RECTor employs attention-based
Multiple Instance Learning (MIL) and GRU-based temporal
encoding to extract robust flow representations, even when
traffic data is incomplete or obfuscated. These embeddings
are mapped into a shared space via a Siamese network, and
efficiently matched using approximate nearest neighbor (aNN)
search. Empirical evaluations show that RECTor outperforms
state-of-the-art baselines such as DeepCorr, DeepCOFFEA, and
FlowTracker, achieving up to 60% higher true positive rates
under high-noise conditions and reducing training and inference
time by over 50%. Moreover, RECTor demonstrates strong
scalability: inference cost grows near-linearly as the number
of flows increases. These findings reveal critical vulnerabilities
in Tor’s anonymity model and highlight the need for advanced
model-aware defenses.
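The abstract only names RECTor's components, so the following is a minimal, untrained sketch (Python standard library only) of how they could fit together. Everything here is an assumption for illustration, not the paper's implementation. A flow is modeled as a list of windows, which act as the MIL "instances". Each window is a list of hypothetical per-packet [size, inter-arrival] features. A standard GRU cell (biases omitted) encodes each window, and attention pooling in the style of Ilse et al. (2018) combines the windows into one flow embedding. The Siamese step is approximated by applying one shared-weight encoder to every flow and L2-normalizing the result. Random-hyperplane LSH stands in for the aNN index, whose library the abstract does not name. All names (`GRUCell`, `attention_mil_pool`, `HyperplaneLSH`, `encode_flow`) and dimensions are hypothetical. The training objective is not given, so the weights are random.

```python
import math
import random

HID = 4  # hypothetical embedding size; the abstract gives no dimensions


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU update/reset equations, biases omitted, random untrained weights."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz, self.Uz = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wh, self.Uh = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, packets):
        """Run the GRU over one window of per-packet features; return the last hidden state."""
        h = [0.0] * self.hid_dim
        for x in packets:
            h = self.step(x, h)
        return h


def attention_mil_pool(instances, V, w):
    """Attention MIL pooling: a_k = softmax_k(w . tanh(V h_k)), bag = sum_k a_k h_k."""
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in instances]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(len(instances[0]))]
    return bag, weights


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (a single table), a stand-in aNN index."""

    def __init__(self, dim, n_planes, rng):
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.vectors, self.buckets = [], {}

    def _key(self, v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in self.planes)

    def add(self, v):
        self.vectors.append(v)
        self.buckets.setdefault(self._key(v), []).append(len(self.vectors) - 1)

    def query(self, v):
        """Return the best match within v's bucket, or None if that bucket is empty."""
        candidates = self.buckets.get(self._key(v), [])
        if not candidates:
            return None
        # Embeddings are unit-length, so the dot product equals cosine similarity.
        return max(candidates, key=lambda i: sum(a * b for a, b in zip(self.vectors[i], v)))


rng = random.Random(0)
gru = GRUCell(in_dim=2, hid_dim=HID, rng=rng)
V, w = rand_matrix(HID, HID, rng), [rng.uniform(-1.0, 1.0) for _ in range(HID)]


def encode_flow(flow):
    """Shared (Siamese-style) encoder: the same weights embed ingress and egress flows."""
    bag, _ = attention_mil_pool([gru.encode(window) for window in flow], V, w)
    norm = math.sqrt(sum(x * x for x in bag)) or 1.0
    return [x / norm for x in bag]


def random_flow(rng, n_windows=3, n_packets=5):
    # Hypothetical per-packet features: [normalized size, normalized inter-arrival time].
    return [[[rng.random(), rng.random()] for _ in range(n_packets)] for _ in range(n_windows)]


egress = [random_flow(rng) for _ in range(50)]
index = HyperplaneLSH(HID, n_planes=3, rng=rng)
for flow in egress:
    index.add(encode_flow(flow))

print(index.query(encode_flow(egress[7])))  # sanity check: a flow should retrieve itself
```

With three hyperplanes the index has at most 8 buckets. A query therefore scores only the egress flows that share its bucket, rather than all 50, which illustrates the kind of saving over exhaustive pairwise matching that the abstract describes. A single LSH table can miss a true match when noise flips one sign bit of a perturbed ingress embedding. Practical aNN indexes address this with multiple tables or other index structures.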
License type: Publisher Copyright
Funding Info:
This research is supported by the National Research Foundation, Singapore, and the Cyber Security Agency of Singapore under the National Cybersecurity R&D Programme and the CyberSG R&D Programme Office.
Grant Reference No.: CRPO-GC2-ASTAR-001