In this chapter we describe reverse-engineering attacks (REAs) on classifiers and defenses against them. REAs involve querying (probing) a classifier to discover its decision rules. One primary application of REAs is to enable TTEs. Another is to reveal a private (e.g., proprietary) classifier’s decision-making. For example, an adversary may seek to discover the workings of a military automated target-recognition system. Early work demonstrates that, with a modest number of random queries, made without any knowledge of the nominal data distribution, one can learn a surrogate classifier on a given domain that closely mimics an unknown classifier. A critical weakness of this attack, however, is that random querying makes it easily detectable: randomly selected query patterns typically look nothing like legitimate examples and are likely to be extreme outliers of all the classes. Each such query is thus individually highly suspicious, and thousands or millions of them (as required for accurate reverse-engineering) are even more so. More recent REAs, which are akin to active learning strategies, are stealthier. Here, we use the ADA method (developed in Chapter 4 for TTE detection) to detect REAs. This method is demonstrated to provide significant detection power against stealthy REAs.
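As a rough illustration of the random-query attack described above (and not the chapter's own implementation), the sketch below probes a stand-in black-box classifier with queries drawn without knowledge of the nominal data distribution and fits a surrogate model to the returned labels. The victim model, feature dimension, and query budget are all illustrative assumptions.

```python
# Illustrative sketch of a random-query reverse-engineering attack.
# All models, dimensions, and budgets below are assumptions for the example.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d = 20            # assumed feature dimension of the victim's input space
n_queries = 5000  # assumed query budget

# Stand-in for the unknown (black-box) classifier the attacker can only query.
X_victim = rng.normal(size=(1000, d))
y_victim = (X_victim[:, 0] + X_victim[:, 1] > 0).astype(int)
victim = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_victim, y_victim)

# Random querying: samples drawn with no knowledge of the nominal data
# distribution -- typically far from legitimate examples, hence detectable.
X_query = rng.uniform(-5.0, 5.0, size=(n_queries, d))
y_query = victim.predict(X_query)  # class labels returned by the black box

# Train a surrogate that mimics the victim's decision rules on the queried region.
surrogate = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
surrogate.fit(X_query, y_query)

# Agreement between surrogate and victim on held-out, in-distribution data.
X_test = rng.normal(size=(2000, d))
agreement = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"surrogate/victim agreement: {agreement:.3f}")
```

In this random-query setting the queries themselves are the attack's weak point: because they are sampled without regard to the legitimate data distribution, a detector (such as the ADA method referenced above) can flag them as outliers; the stealthier, active-learning-style REAs discussed in the chapter are designed to avoid exactly this exposure.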