PR-118 "Black-Box Adversarial Attacks with Limited Queries and Information" Review (ICML 2018) (Adversarial Example)

1. Reading the Citations & Abstract

Citations: 579 (as of 2022.01.03)

Authors

Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin - Massachusetts Institute of Technology, LabSix

Abstract

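Current neural-network-based classifiers are vulnerable to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model, in which the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.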

2. Presentation Summary

https://youtu.be/AMPpOFtg3Q4

Official paper links

http://proceedings.mlr.press/v80/ilyas18a/ilyas18a.pdf

https://arxiv.org/abs/1804.08598


Presentation Slide

https://drive.google.com/file/d/1ML0m55-dqK8WaPvS_ETviCvkkQIFZkxq/view


A paper explaining how adversarial attacks can be carried out under various restricted settings.

Adversarial example: an input that looks no different from the original to the human eye but fools the model.

Formulated as an optimization problem:

$$\min_{x^\prime} \lVert x^\prime - x \rVert$$

$$\text{subject to}\quad f(x^\prime)=l^\prime,\ f(x)=l,\ l^\prime \ne l$$

$x$: Original Image

$x^\prime$: Adversarial Example

$f$: Classifier

$l$, $l^\prime$: labels of the original image and of the adversarial example

Traditionally, adversarial examples have been found with gradient-based adversarial attacks.

If the model's internals are known (white-box), adversarial examples can be generated by backpropagating the loss to the input.
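As a point of contrast with the black-box case, here is a minimal white-box sketch. It uses FGSM (the fast gradient sign method), a standard gradient-based attack that the presentation itself does not cover. The target is a hypothetical toy logistic-regression model, so the input gradient that backpropagation would return has a closed form. The weights, `fgsm_logistic`, and `eps` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(w, b, x, y, eps):
    """One FGSM step against a white-box logistic-regression model.

    With w and b known, d(cross-entropy)/dx = (p - y) * w is exactly what
    backpropagation to the input would compute.
    """
    p = sigmoid(w @ x + b)             # model's P(y = 1 | x)
    grad_x = (p - y) * w               # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)   # move every input feature by eps to raise the loss

# Hypothetical toy model and input (true label y = 1)
w, b = np.array([1.0, -2.0, 0.5]), 0.0
x = np.array([0.5, -0.2, 0.1])
x_adv = fgsm_logistic(w, b, x, y=1.0, eps=0.5)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))  # ≈ 0.72, then ≈ 0.31
```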

If we do not know how the model works internally, we are in the black-box setting.

Solution 1: Train an approximate substitute model and use its gradients as an estimate.

However, this is inefficient and is rarely used.

Solution 2: Gradient estimation, i.e., estimate only the gradient (simple).

Gradient Estimation - Random Search

The search concentrates on the regions where the gradient is large.
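The paper estimates gradients with NES (natural evolution strategies). It queries the model at the current image plus and minus small Gaussian perturbations, then weights each random direction by the resulting change in output. Below is a minimal sketch of that antithetic estimator under our own assumptions. `nes_gradient`, `sigma`, `n_pairs`, and the linear test function are hypothetical, not the authors' code.

```python
import numpy as np

def nes_gradient(f, x, sigma=0.001, n_pairs=50, rng=None):
    """Antithetic NES estimate of grad f(x) using 2 * n_pairs black-box queries."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros(x.shape, dtype=float)
    for _ in range(n_pairs):
        u = rng.standard_normal(x.shape)            # random search direction
        diff = f(x + sigma * u) - f(x - sigma * u)  # two queries per direction
        grad += diff * u                            # weight direction by output change
    return grad / (2 * n_pairs * sigma)

# Hypothetical black box whose true gradient is w everywhere
w = np.array([1.0, -2.0, 3.0, 0.5, -1.5])
f = lambda x: float(w @ x)
g = nes_gradient(f, np.zeros(5), n_pairs=5000, rng=np.random.default_rng(0))
```

Only scalar outputs of `f` are used, so the same estimator works when `f` is a class probability returned by a remote API. The price is query count: here 10,000 queries for a 5-dimensional input.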

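The authors' method (partial-information setting: only the top-k classes and their probabilities are visible)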

Suppose the original image is a cat and we want it classified as the target class, car.

Step 0: Start at an image of the target class

Start from an image that the model classifies as the target class with (near) 100% probability.

Step 1: Take a step towards the original image, keeping "car" in the top-k classes

Overlay a little of the cat image onto the car image (P(car) drops to, say, 80%), overlaying as much as possible while car stays in the top k.
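Step 1 can be sketched as a bisection over the blend factor between the current image and the original. The search keeps the largest blend for which the target class stays in the top k. This is a hypothetical illustration of the overlay idea described above; the paper itself shrinks an L-infinity box around the original instead. `predict`, `blend_toward_original`, and the 3-class toy model are assumptions. Bisection also assumes that top-k membership flips only once along the blend path.

```python
import numpy as np

def in_top_k(probs, label, k):
    # True if `label` is among the k highest-probability classes
    return label in np.argsort(probs)[::-1][:k]

def blend_toward_original(predict, x_cur, x_orig, target, k, tol=1e-3):
    """Largest alpha with (1 - alpha) * x_cur + alpha * x_orig keeping target in the top k."""
    if in_top_k(predict(x_orig), target, k):
        return x_orig.copy(), 1.0
    lo, hi = 0.0, 1.0  # invariant: lo keeps target in top k, hi does not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if in_top_k(predict((1 - mid) * x_cur + mid * x_orig), target, k):
            lo = mid   # still in top k: try overlaying more of the original
        else:
            hi = mid   # dropped out: overlay less
    return (1 - lo) * x_cur + lo * x_orig, lo

# Hypothetical 3-class "black box": softmax over linear logits
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]])
def predict(x):
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

x_cat, x_car = np.array([1.2, 0.0]), np.array([0.0, 1.0])  # classes 0 and 1
x_new, alpha = blend_toward_original(predict, x_car, x_cat, target=1, k=1)
```

Each bisection probe costs one query, so the tolerance trades query budget against how far each step moves toward the original.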

Step 2: Take an (estimated) gradient step on P(car), keeping the distance from the original image

Add adversarial noise that increases the probability of the target class.

P(car) was 0.8 after Step 1; adding the adversarial noise raises it to 0.9.

This step must not move the image farther from the original (the gradient is estimated by random search).

Step 3: Take a step towards the original image, keeping "car" in the top-k classes

Step 4: Take an (estimated) gradient step

Add adversarial noise to raise P(car) again.

Final Step: Stop when image = original image

Repeat until the image matches (is close enough to) the original image.

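(1) Overlay the original image repeatedly, as long as the target class's rank stays within the top k.

(2) Each overlay step moves the image as far toward the original as possible.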
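Combining Steps 0 to 4 gives the loop below. This is a simplified sketch, not the authors' implementation. It assumes query access to top-k probabilities through a hypothetical `predict`. It moves toward the original by shrinking an L-infinity box of radius `eps` around it, then takes signed NES gradient steps on log P(target) inside the box. The paper's algorithm also adapts its step sizes; the fixed `delta_eps`, `lr`, and `max_iters` here are assumptions.

```python
import numpy as np

def nes_gradient(f, x, sigma=0.001, n_pairs=25, rng=None):
    # Antithetic NES estimate of grad f(x); costs 2 * n_pairs black-box queries.
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros(x.shape, dtype=float)
    for _ in range(n_pairs):
        u = rng.standard_normal(x.shape)
        grad += (f(x + sigma * u) - f(x - sigma * u)) * u
    return grad / (2 * n_pairs * sigma)

def partial_info_attack(predict, x_orig, x_start, target, k, eps_goal,
                        delta_eps=0.1, lr=0.01, max_iters=500, seed=0):
    """Simplified partial-information attack; predict(x) returns class probabilities."""
    rng = np.random.default_rng(seed)
    x_adv = x_start.astype(float)                  # Step 0: image of the target class
    eps = float(np.max(np.abs(x_adv - x_orig)))   # L_inf radius around the original
    log_p = lambda z: float(np.log(predict(z)[target] + 1e-12))
    for _ in range(max_iters):
        if eps <= eps_goal and np.argmax(predict(x_adv)) == target:
            break  # close enough to the original and classified as the target
        # Steps 1/3: shrink the box toward the original if the target stays in the top k
        new_eps = max(eps - delta_eps, eps_goal)
        x_try = np.clip(x_adv, x_orig - new_eps, x_orig + new_eps)
        if target in np.argsort(predict(x_try))[::-1][:k]:
            x_adv, eps = x_try, new_eps
        else:
            delta_eps /= 2  # shrink was too aggressive; use a smaller one next time
        # Steps 2/4: signed NES ascent on log P(target), projected back into the box
        g = nes_gradient(log_p, x_adv, rng=rng)
        x_adv = np.clip(x_adv + lr * np.sign(g), x_orig - eps, x_orig + eps)
    return x_adv, eps

# Usage with a hypothetical 4-class linear-softmax "black box"
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))
def predict(x):
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

x_orig = rng.standard_normal(8)
x_start = 3.0 * W[2]
x_adv, eps = partial_info_attack(predict, x_orig, x_start, target=2, k=2,
                                 eps_goal=0.5, max_iters=200)
```

The loop always keeps `x_adv` within `eps` of the original, and `eps` only shrinks. Success (target ranked first at `eps_goal`) depends on the model and the query budget and is not guaranteed.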

Label-only setting: only the names of the top-k labels are visible, with no probabilities.

How can an adversarial attack work here?

Suppose we want to turn a cat into guacamole.

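Discretized score: $R(x) = k - \operatorname{rank}(l^\prime \mid x)$, or 0 if the target label is not in the top k. Query several randomly perturbed copies of $x_t$ (here $k = 4$ with three samples):

1) target ranked 2nd: $R = 4 - 2 = 2$

2) target ranked 1st: $R = 4 - 1 = 3$

3) target not in the top k: $R = 0$

Averaging gives the proxy score:

$$S(x_t)=\frac{2+3+0}{3}=\frac{5}{3}$$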

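This score replaces P(car) in Step 2: the gradient step is taken on the (estimated) gradient of S instead.

This yields the adversarial noise, from which an adversarial example can be generated.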
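The label-only score can be checked in a few lines of Python. `discretized_score` reproduces the worked example above, where k = 4 and ranks 2, 1, and absent give 5/3. `proxy_score` sketches a Monte Carlo estimate of S by averaging R over uniformly perturbed copies of the image. `top_k_labels`, `mu`, and `n` are hypothetical names for the black-box query and its parameters.

```python
import numpy as np
from fractions import Fraction

def discretized_score(rank, k):
    # R(x) = k - rank of the target label; 0 if the target is not in the top k (rank None)
    return k - rank if rank is not None else 0

# Worked example: k = 4, three noisy queries rank the target 2nd, 1st, and not at all
ranks = [2, 1, None]
S = Fraction(sum(discretized_score(r, 4) for r in ranks), len(ranks))
print(S)  # 5/3

def proxy_score(top_k_labels, x, target, k, mu=0.01, n=50, rng=None):
    """Monte Carlo estimate of S(x) = E_{delta ~ U[-mu, mu]}[R(x + delta)] from top-k labels only."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0
    for _ in range(n):
        labels = list(top_k_labels(x + rng.uniform(-mu, mu, size=x.shape)))
        rank = labels.index(target) + 1 if target in labels else None
        total += discretized_score(rank, k)
    return total / n
```

Passing `proxy_score` to a gradient estimator such as NES in place of P(car) is what replaces Step 2. The random perturbations smooth the step-like rank signal so that its gradient becomes informative.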

References

GitHub

https://github.com/labsix/limited-blackbox-attacks


https://journal-home.s3.ap-northeast-2.amazonaws.com/site/netsec2020/netsec-file/%EB%B0%95%ED%98%B8%EC%84%B1.pdf

Blog

LabSix (the authors' research group)

https://www.labsix.org/


LabSix is an independent, entirely student-run AI research group composed of MIT undergraduate and graduate students. We engage in a wide range of theoretical and practical research in deep learning.
