PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/21246

Title:

PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

Journal Title:

33rd USENIX Security Symposium (USENIX Security 24)

Publication URL:

https://www.usenix.org/conference/usenixsecurity24/presentation/deng

Authors:

Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, Stefan Rass

Keywords:

Large language model, Cybersecurity

Publication Date:

14 August 2024

Citation:

Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y., Pinzger, M., Rass, S. (2024). PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing. In 33rd USENIX Security Symposium (USENIX Security 24) (pp. 847–864). Philadelphia, PA: USENIX Association.

Abstract:

Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation due to the extensive expertise required by human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, and their emergent abilities suggest their potential to revolutionize industries. In this work, we establish a comprehensive benchmark using real-world penetration testing targets and further use it to explore the capabilities of LLMs in this domain. Our findings reveal that while LLMs demonstrate proficiency in specific sub-tasks within the penetration testing process, such as using testing tools, interpreting outputs, and proposing subsequent actions, they also encounter difficulties maintaining a whole context of the overall testing scenario. Based on these insights, we introduce PENTESTGPT, an LLM-empowered automated penetration testing framework that leverages the abundant domain knowledge inherent in LLMs. PENTESTGPT is meticulously designed with three self-interacting modules, each addressing individual sub-tasks of penetration testing, to mitigate the challenges related to context loss. Our evaluation shows that PENTESTGPT not only outperforms LLMs with a task-completion increase of 228.6% compared to the GPT-3.5 model among the benchmark targets, but also proves effective in tackling real-world penetration testing targets and CTF challenges. Having been open-sourced on GitHub, PENTESTGPT has garnered over 6,500 stars in 12 months and fostered active community engagement, attesting to its value and impact in both the academic and industrial spheres.

License type:

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Funding Info:

This research/project is supported by the National Research Foundation, Singapore, and the Cyber Security Agency of Singapore under the National Cybersecurity R&D Programme.
Grant reference no.: NCRP25-P04-TAICeN

URI:

https://oar.a-star.edu.sg/communities-collections/articles/21246

ISBN:

978-1-939133-44-1

Collections:

Institute for Infocomm Research

Files uploaded:

https://www.usenix.org/system/files/usenixsecurity24-deng.pdf