MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
Authors
- Jiaxun Cui*
- Xiaomeng Yang*
- Mulong Luo*
- Geunbae Lee*
- Peter Stone
- Hsien-Hsin S. Lee*
- Benjamin Lee*
- G. Edward Suh*
- Wenjie Xiong*
- Yuandong Tian*
* External authors
Venue
- ICLR'23
Date
- 2023
Abstract
Security vulnerabilities in computer systems raise serious concerns as computers process an unprecedented amount of private and sensitive data today. Cache-timing attacks (CTA) pose an important practical threat as they can effectively breach many protection mechanisms in today’s systems. However, current detection techniques for cache-timing attacks rely heavily on heuristics and expert knowledge, which can lead to brittleness and an inability to adapt to new attacks. To mitigate the CTA threat, we propose MACTA, a multi-agent reinforcement learning (MARL) approach that leverages population-based training to train both attackers and detectors. Following best practices, we develop a realistic simulated MARL environment, MA-AUTOCAT, which enables training and evaluation of cache-timing attackers and detectors. Our empirical results suggest that MACTA is an effective solution without any manual input from security experts. MACTA detectors can generalize to a heuristic attack not exposed in training with a 97.8% detection rate and reduce the attack bandwidth of RL-based attackers by 20% on average. Meanwhile, MACTA attackers are qualitatively more effective than the other attacks studied, and their average evasion rate against an unseen state-of-the-art detector can reach up to 99%. Furthermore, we find that, in this environment, agents equipped with a Transformer encoder can learn effective policies in situations where agents with multi-layer perceptron encoders cannot, suggesting the potential of Transformer structures in CTA problems.
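To make the population-based training idea concrete, the sketch below pairs a small population of attacker policies against a small population of detector policies in a toy, one-step stand-in for the cache-timing game. It is not MA-AUTOCAT or the MACTA training code: the `HedgeAgent` class, `play_episode` game, reward shaping, and all constants are illustrative assumptions chosen so the loop runs end to end, and exponential-weights updates stand in for the deep RL learners used in the paper.

```python
"""Toy sketch of population-based attacker-vs-detector co-training.

This is NOT the MACTA / MA-AUTOCAT implementation. The cache simulator is
replaced by a one-step stand-in game and the deep RL learners by
exponential-weights (Hedge) updates, purely to show the loop structure:
sample attacker/detector pairs from two populations, roll out an episode,
and update both sides against each other.
"""
import math
import random

NUM_PROBE_LEVELS = 5    # attacker action: how aggressively to probe (0 = idle)
NUM_THRESHOLDS = 6      # detector action: flag if observed activity >= threshold
BENIGN_ACTIVITY = 1.0   # average cache activity of benign background traffic
NOISE = 0.7             # observation noise magnitude
LEARNING_RATE = 0.2
POP_SIZE = 4
ITERATIONS = 2000


class HedgeAgent:
    """Softmax distribution over a discrete action set, updated with
    importance-weighted exponential-weights steps on realized payoffs."""

    def __init__(self, num_actions):
        self.logits = [0.0] * num_actions

    def probs(self):
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self):
        return random.choices(range(len(self.logits)), weights=self.probs())[0]

    def update(self, action, payoff):
        p = self.probs()[action]
        self.logits[action] += LEARNING_RATE * payoff / max(p, 1e-3)


def play_episode(attacker, detector):
    """One round of the stand-in game; returns payoffs and the actions taken."""
    malicious = random.random() < 0.5                # half the traffic is benign
    probe = attacker.act() if malicious else None
    base = probe if malicious else BENIGN_ACTIVITY
    activity = base + random.uniform(-NOISE, NOISE)  # what the detector observes
    threshold = detector.act()
    flagged = activity >= threshold

    detector_payoff = 1.0 if flagged == malicious else -1.0
    # The attacker leaks more with aggressive probing but loses the episode if caught.
    attacker_payoff = None if probe is None else (-1.0 if flagged else 0.5 * probe)
    return attacker_payoff, detector_payoff, probe, threshold


def train():
    attackers = [HedgeAgent(NUM_PROBE_LEVELS) for _ in range(POP_SIZE)]
    detectors = [HedgeAgent(NUM_THRESHOLDS) for _ in range(POP_SIZE)]
    for _ in range(ITERATIONS):
        atk = random.choice(attackers)    # cross-play: sample one member of
        det = random.choice(detectors)    # each population per episode
        a_payoff, d_payoff, probe, threshold = play_episode(atk, det)
        if probe is not None:
            atk.update(probe, a_payoff)
        det.update(threshold, d_payoff)
    return attackers, detectors


if __name__ == "__main__":
    attackers, detectors = train()
    print("attacker 0 probe-level probs:", [round(p, 2) for p in attackers[0].probs()])
    print("detector 0 threshold probs:  ", [round(p, 2) for p in detectors[0].probs()])
```

In this toy game the two populations tend to settle into a stealth-versus-coverage trade-off, which loosely mirrors the attacker-versus-detector tension the paper studies; any quantitative behavior here is an artifact of the stand-in rewards, not a MACTA result.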