SHAPFUZZ: Efficient Fuzzing via Shapley-Guided Byte Selection

Abstract

Shapley analysis 적용하여, byte position의 효과와 특정 byte position이 여러 다른 seed에 대해서도 퍼징 성능에 우월하게 적용됨을 보임. mutation시킬 때, contextual multi-armed bandit(CMAB) algorithm을 사용해서 가장 적게 선택되는 바이트와 Shapely value가 가장 높은 바이트 사이에서 trade-off를 수행함.

Introduction

기존 문제점: 1) 모든 constraint-related bytes가 new code를 discover하는 것은 아니다 & 2) extra analysis stage가 매우 time-consuming하다.
해결책: Cooperative Game Theory를 사용
fuzzing의 byte scheduling part : Shapley analysis -> input byte의 중요도를 정량화하여(new code를 더 잘 찾을 byte를 식별), 해당 바이트를 mutation에 더 자주 활용을 한다. -> extra analysis stage 불필요해짐
결과: discovery of new code는 매우 적은 byte position에 의존하고 있었으며 해당 byte를 repetitive mutation 했을때, 여러 다른 path constraint를 해결하여 new code를 찾아냈다.
Contribution: AFL++에 기반하여 SHAPFUZZ 구현, UNIFUZZ와 MAGMA 로 평가를 하였고, edge coverage와 bug discovery에서 기존 state-of-the-art fuzzers보다 우수한 성능을 보임(https://github.com/ShapFuzz/ShapFuzz)

Background and Motivation

Shapley Analysis
- cooperative game theroy중에 하나
- Shapely value : weighted average of each player's marginal contributions to all possible coalitions of players
- gain: the number of new edges
- Shapely value 계산하는 방벙: AFL++로 48시간동안 a single seed에 대해서 random mutation을 돌리고, {1,3,4} position의 mutation이 new edge(gain R)를 찾았다면 subset인 {1} / {3} / {1,3} / {1,4} / {3,4}를 recover 시켜서 테스트 하여 1,3,4 position에 대한 Shapley value를 구할 수 있다.
- 대부분의 bytes는 여러개의 CMP들과 관련이 있다.(GreyOne의 FTI이용)
- small portion of bytes가 new code를 찾는데 기여한다.
CMAB
trade -offs between exploring new knowledge and exploiting the current reliable knowledge
(https://sungkee-book.tistory.com/14)

Design of SHAPFUZZ

player: a byte in the seed S0

coalition : some certain bytes are mutated together

gain: self-new edges discovered by an input i generated by mutating the seed S0

CMAB에서는 강화학습에 나오는 Upper Confidence Bound 개념이 들어감!

Evaluation

AFL++기반으로 구현(mutation process, byte schedule을 수정)
타겟: UNIFUZZ, MAGMA
실험 환경: Ubuntu 18.04 with 103 cores (Intel(R) Xeon(R) Gold 6230R CPU) and 256GB of memory. Each fuzzer uses one CPU core to run one target program, 24 hours and repeat 5 times
analysis time, edge coverage, bug discovery 분석 -- edge 수 계산: afl-showmap 사용
select top three stack frames to de-duplicate bugs

1) Byte Scheduling 퍼저와의 비교: Inference-based fuzzer(Greyone, ProFuzzer), Neural network-based fuzzer(NEUZZ,PreFuzz), Taint-based fuzzer(Angora) -- constraint-related bytes를 식별하는 데에는 다음과 같이 3가지 종류의 퍼저가 있다.

2) Commonly Used 퍼저와의 비교: AFL, AFL++, AFLFast FairFuzz, MOPT

3) Byte Mutation Solution과 비교->EMS(the latest fuzzer focusing on byte mutation)

4) Ablation Study of ShapFuzz : calculation of Shapley values와 Shapley-guided byte selection을 나눠서 Shapley-guided를 하지 않고 calculate만 하는 모델 만들어서 비교

5) Inference-based 퍼저와 비교 : GreyOne, ProFuzzer
결과:: code coverage와 bug discovery에서 모두 우수했다!

참고

byte schedule: 변이될 시드의 바이트를 선택하는 빈도를 결정

Shapley Value:

https://zephyrus1111.tistory.com/271

https://yjjo.tistory.com/53

저작자표시

'논문리딩 > fuzzing' 카테고리의 다른 글

*[Usenix ‘24 fall] SDFuzz: Target States Driven Directed Fuzzing (0)	2024.05.15
NSFuzz : Towards Efficient and State-Aware Network Service Fuzzing (0)	2024.03.27
AFLNET: A Greybox Fuzzer for Network Protocols (0)	2024.03.26
Large Language Model guided Protocol Fuzzing (0)	2024.03.18

sonyCYDFsame

SHAPFUZZ: Efficient Fuzzing via Shapley-Guided Byte Selection

Abstract

Introduction

Background and Motivation

Design of SHAPFUZZ

Evaluation

참고

'논문리딩 > fuzzing' 카테고리의 다른 글

티스토리툴바

SHAPFUZZ: Efficient Fuzzing via Shapley-Guided Byte Selection

Abstract

Introduction

Background and Motivation

Design of SHAPFUZZ

Evaluation

참고

'논문리딩 > fuzzing' 카테고리의 다른 글

관련글

티스토리툴바