Zhang, H., Hua, G., Wang, X., Jiang, H., & Yang, W. (2023). Categorical Inference Poisoning: Verifiable Defense Against Black-Box DNN Model Stealing Without Constraining Surrogate Data and Query Times. IEEE Transactions on Information Forensics and Security, 1–1. https://doi.org/10.1109/tifs.2023.3244107
Abstract:
Deep Neural Network (DNN) models offer powerful solutions for a wide range of tasks, but the cost of developing such models is nontrivial, which calls for effective model protection. Although black-box distribution can mitigate some threats, model functionality can still be stolen via black-box surrogate attacks. Recent studies have shown that surrogate attacks can be launched in several ways, while existing defense methods commonly assume attackers with insufficient in-distribution (ID) data and restricted attack strategies. In this paper, we relax these constraints and assume a practical threat model in which the adversary not only has sufficient ID data and query times but can also adjust the surrogate training data labeled by the victim model. We then propose a two-step categorical inference poisoning (CIP) framework, featuring both poisoning for performance degradation (PPD) and poisoning for backdooring (PBD). In the first poisoning step, incoming queries are classified into ID and out-of-distribution (OOD) ones using an energy score (ES) based OOD detector, and the latter are further classified into high-ES and low-ES ones, which are subsequently passed to a strong and a weak PPD process, respectively. In the second poisoning step, difficult ID queries are detected by a proposed reliability score (RS) measurement and are passed to PBD. In doing so, the first-step OOD poisoning leads to substantial performance degradation in surrogate models, the second-step ID poisoning further embeds backdoors in them, and both preserve model fidelity. Extensive experiments confirm that CIP not only achieves promising performance against state-of-the-art black-box surrogate attacks such as KnockoffNets and data-free model extraction (DFME) but also works well against stronger attacks with sufficient ID and deceptive data, outperforming the existing dynamic adversarial watermarking (DAWN) and deceptive perturbation defense methods.
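The first-step routing described in the abstract can be illustrated with a short, hedged sketch: each query is scored with a free-energy score and either routed to an OOD poisoning branch (strong or weak PPD) or kept as ID for the later RS/PBD step. The thresholds, function names, and sign convention below are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the ES-based query routing described in the abstract.
# Thresholds (tau_ood, tau_strong) and function names are hypothetical.
import torch


def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Free-energy score E(x) = -T * logsumexp(f(x) / T).

    With this convention, ID inputs tend to receive lower (more negative)
    energy than OOD inputs.
    """
    return -T * torch.logsumexp(logits / T, dim=-1)


def route_query(logits: torch.Tensor,
                tau_ood: float = -5.0,
                tau_strong: float = -2.0) -> str:
    """Route a single query's logits to a poisoning branch.

    tau_ood / tau_strong are placeholder thresholds; the paper tunes its
    own decision boundaries on the victim's training distribution.
    """
    es = energy_score(logits).item()
    if es > tau_ood:              # flagged as OOD by the ES detector
        if es > tau_strong:       # high ES -> strong PPD (heavier poisoning)
            return "strong_PPD"
        return "weak_PPD"         # low ES -> weak PPD
    return "ID"                   # kept for the second step (RS check, then PBD)
```

In this sketch, only the routing decision is shown; the actual strong/weak poisoning of returned labels and the RS-based backdoor embedding for difficult ID queries are described in the paper itself.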
License type:
Publisher Copyright
Funding Info:
This work was supported in part by the National Natural Science Foundation of China under Grant 62272347 and Grant U19B2004.