DevNet

Guansong Pang / Deep Anomaly Detection with Deviation Networks / 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

Deep Anomaly Detection with Deviation Networks

The paper covered in this post is Deep Anomaly Detection with Deviation Networks by Pang, G. et al.

Pang, G., Shen, C., & van den Hengel, A. (2019, July). Deep anomaly detection with deviation networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 353-362).

๋‹ค์Œ ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ์™€ ๊ด€๋ จํ•ด์„œ๋Š” ์ œ ๊ฐœ์ธ ๋ธ”๋กœ๊ทธpersonal blog post(1)arrow-up-right personal blog post(2)arrow-up-right์™€ ์œ ํŠœ๋ธŒ ์˜์ƒpersonal Youtube Reviewarrow-up-right์œผ๋กœ๋„ ์˜ฌ๋ ค๋†“์•˜์œผ๋‹ˆ ์ฐธ๊ณ  ๋ฐ”๋ž๋‹ˆ๋‹ค.

0. What is Anomaly Detection

์šฐ์„  ๋ณธ ๋…ผ๋ฌธ์˜ ๊ตฌ์ฒด์ ์ธ ๋‚ด์šฉ์— ๋“ค์–ด๊ฐ€๊ธฐ ์•ž์„œ Anomaly Detection์˜ ๊ฐœ์š” ๋ถ€๋ถ„์„ ๋ง์”€๋“œ๋ฆฌ๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋Ÿผ ์ฐจ๊ทผ์ฐจ๊ทผ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ž ๊ทธ๋ ‡๋‹ค๋ฉด Anomaly Detection (AD)๋Š” ๋ฌด์—‡์„ ์˜๋ฏธํ•˜๋Š” ๊ฑธ๊นŒ์š”? Survey ๋…ผ๋ฌธ์— ๋‚˜์™€์žˆ๋Š” ๋ฌธ๊ตฌ๋ฅผ ์ธ์šฉํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Anomaly Detection (AD) is the task of detecting samples and events which rarely appear or even do not exist in the available training data

์ฆ‰, ๋ง ๊ทธ๋Œ€๋กœ ์ผ๋ฐ˜์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๋Š” event๋“ค๊ณผ๋Š” ๋‹ค๋ฅธ ํŠน์ดํ•œ, ์ผ๋ฐ˜์ ์ธ ํŠน์ง•์„ ๋„๊ณ  ์žˆ์ง€ ์•Š์€ ์ƒ˜ํ”Œ์„ ํƒ์ง€ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์ด๋Ÿฐ ๊ถ๊ธˆ์ฆ์ด ๋“ค๊ฒ๋‹ˆ๋‹ค.

How is this different from ordinary classification?

Looking closely at the English sentence above, the key phrase is 'rarely appear or even do not exist in training data'.

์ฆ‰, ๊ฑฐ์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ์— ์กด์žฌํ•˜์ง€ ์•Š๊ฑฐ๋‚˜ ์‹ฌ์ง€์–ด ์•„์˜ˆ ์กด์žฌํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ์˜ ์ด์ƒ์น˜ ๋ฐ์ดํ„ฐ๋ฅผ ํƒ์ง€ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ Supervised ๊ธฐ๋ฐ˜ ๋ถ„๋ฅ˜ ๋ชจํ˜•์€ ๋ถ„๋ฅ˜ํ•  ํด๋ž˜์Šค์— ๋Œ€ํ•œ ์ถฉ๋ถ„ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๊ฒƒ์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ Anomaly Detection์€ anomaly๋กœ ๋ถ„๋ฅ˜๋˜๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฑฐ์˜ ์—†๊ธฐ ๋•Œ๋ฌธ์—, ๋˜ anomaly ๋ฐ์ดํ„ฐ ๊ฐ„์˜ distribution์ด ์œ ์‚ฌํ•˜๋Š” ๊ฒƒ์„ ๋ณด์žฅํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ์ผ๋ฐ˜์ ์ธ classification๊ณผ ์ฐจ์ด์ ์ด ๋ฐœ์ƒํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์•„๋ž˜ ๊ทธ๋ฆผ์„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

reference : Mohammadi, B., Fathy, M., & Sabokrou, M. (2021). Image/Video Deep anomaly detection: A survey. arXiv preprint arXiv:2103.01739

\

๋ณธ ๊ทธ๋ฆผ์„ ๋ณด๋ฉด, ์ผ๋ฐ˜์ ์œผ๋กœ ํŒŒ๋ž€์ƒ‰ ์› ์•ˆ์— ๋“ค์–ด๊ฐ€ ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋“ค์ด normal data๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด ๋ฐ์ดํ„ฐ๋“ค์„ F๋ผ๊ณ  ํ•˜๋Š” feature representation์„ ํ†ตํ•ด F1, F2๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์€ plot์„ ๊ทธ๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋…น์ƒ‰ ์ ๋“ค์˜ ๊ฒฝ์šฐ ์ผ๋ฐ˜์ ์œผ๋กœ ์˜คํ† ๋ฐ”์ด์˜ ์ด๋ฏธ์ง€๋“ค์„ ํ‘œํ˜„ํ•œ ๋ฐ์ดํ„ฐ๋กœ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๋นจ๊ฐ„์ƒ‰ ์ž๋™์ฐจ์˜ ๊ฒฝ์šฐ ์šฐ๋ฆฌ๊ฐ€ ์ผ๋ฐ˜์ ์œผ๋กœ ๊ด€์ฐฐํ•œ ์˜คํ† ๋ฐ”์ด์™€๋Š” ์กฐ๊ธˆ ๋‹ค๋ฅธ ํŠน์ง•์„ ๊ฐ€์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ F๋ฅผ ํ†ตํ•ด representation์„ ํ•˜๊ฒŒ ๋˜๋ฉด ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹ค๋ฅธ ์œ„์น˜์— ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๊ฒŒ ๋˜๊ณ  ๊ทธ '๊ฒฉ์ฐจ'๊ฐ€ ๋ฐ”๋กœ ์ด ๋ฐ์ดํ„ฐ๋ฅผ anomaly๋ผ๊ณ  ์ธก์ •ํ•˜๊ฒŒ ๋˜๋Š” ๊ธฐ์ค€์ด ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด anomaly detection์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์ˆ˜ํ–‰๋˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Anomaly detection can be performed on a wide variety of datasets, and depending on the setting it falls into several cases: broadly, 'Supervised', 'Unsupervised', and 'Semi-supervised'.

For convenience, let's define the following notation.

U : Unlabeled data

N : Normal labeled data

A : Abnormal labeled data

U denotes unlabeled data, while N and A denote labeled normal and labeled abnormal data, respectively. With this notation, the three cases can be organized as follows.

[1] Supervised Learning ( N + A )

Supervised Learning์˜ ๊ฒฝ์šฐ ๋ฐ์ดํ„ฐ๊ฐ€ ์ถฉ๋ถ„ํžˆ ๋งŽ์„ ๊ฒฝ์šฐ ์„ธ๊ฐ€์ง€ ์ผ€์ด์Šค ์ค‘ ๊ฐ€์žฅ ๊ฐ•๋ ฅํ•œ ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ์ƒํƒœ์—์„œ์˜ ์˜ˆ์ธก์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ๋ณด๋‹ค ์˜ˆ์ธก์ด ์ •ํ™•ํ•œ ๊ฑด make senseํ•˜์ฃ . ํ•˜์ง€๋งŒ, ๋ฌธ์ œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฑฐ์˜ ์—†๋‹ค๋Š” ๊ฒƒ์ด ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ Real world์—์„œ๋Š” labeled ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์ง€๋„ ์•Š์„ ๋ฟ๋”๋Ÿฌ labeled ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๊ณ  ํ•ด๋„, abnormal ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ๋Š” ๊ทนํžˆ ๋“œ๋ฌธ ๊ฒฝ์šฐ๊ฐ€ ๋งŽ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ supervised learning์€ ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜• ๋ฌธ์ œ๋ฅผ ๋งž์ดํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ generalized๋œ ํŒ๋‹จ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์—†๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์‚ฌ์‹ค anomaly ๋ฐ์ดํ„ฐ๋Š” ์šฐ๋ฆฌ๊ฐ€ ๊ด€์ธกํ•œ ์ผ€์ด์Šค ์™ธ์—๋„ ๋‹ค์–‘ํ•œ ํ˜•ํƒœ๋กœ ์กด์žฌํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ ๋ชจํ˜•์€ ์šฐ๋ฆฌ๊ฐ€ ๊ด€์ธกํ•œ anomaly ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํŒ๋‹จ์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šตํ•  ๋•Œ ๋ณด์ง€ ์•Š์€ unseen anomaly์— ๋Œ€ํ•ด์„œ๋Š” ๋Œ€์‘ํ•  ์ˆ˜ ์žˆ๋Š” ํž˜์ด ๋ถ€์กฑํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ผ๋ฐ˜์ ์œผ๋กœ AD์—์„œ๋Š” supervised learning์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์—๋Š” ํ•œ๊ณ„์ ์ด ์กด์žฌํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.


[2] Unsupervised Learning ( U )

๊ทธ๋ž˜์„œ ์ผ๋ฐ˜์ ์œผ๋กœ labeled์ด ์ •์˜๋˜์ง€ ์•Š๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํ•™์Šต์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์™œ๋ƒํ•˜๋ฉด ์ด๋Ÿฐ ๊ฒฝ์šฐ๊ฐ€ ์‹ค์ œ real world ์ƒํ™ฉ์— ์กฐ๊ธˆ ๋” ์œ ์‚ฌํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ labeled ๋ฐ์ดํ„ฐ๋ฅผ ์–ป๊ธฐ ์–ด๋ ค์šธ ๋ฟ๋”๋Ÿฌ abnormal ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฑฐ์˜ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•ด์„œ ๊ฐ€์žฅ real world์™€ ์œ ์‚ฌํ•œ ๊ฒƒ์ด ๋ฐ”๋กœ unsupervised learning์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์œ„ Supervised Learning์—์„œ ํ•œ๊ณ„์ ์ธ generalizability์— ๋Œ€ํ•ด์„œ๋„ ๋ณด์žฅ์ด ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๋Ÿฌ ๋‹จ์ ์„ ๋ณด์™„ํ•œ ๋ฐฉ๋ฒ•์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


[3] Semi-supervised Learning ( N + A + U, N + A << U )

But does unsupervised learning have no drawbacks of its own? Its biggest weakness is the lack of 'pre-knowledge' about the anomalous data. Anomalies are rare to begin with, and without any information about their characteristics it is hard for a model to fully capture them. So we have an imbalance problem when using labeled data and a lack of prior knowledge when using unsupervised learning — semi-supervised learning arises from the idea of exploiting both.

This approach strengthens learning by using a limited number of labeled data as prior knowledge.


With this background in place, let's look at the contents of the paper.

1. Problem Definition

Let's start briefly with the paper's Introduction.

Before deep learning was applied in earnest to the AD task, traditional methods ( such as SVMs ) faced the following two limitations.

  • high dimensionality

  • highly non-linear feature relation

์ฒซ ๋ฒˆ์งธ๋กœ ์ฐจ์›์˜ ์ €์ฃผ ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ๊ณ  ๋‘ ๋ฒˆ์งธ๋กœ๋Š” anomaly๋ฅผ detectํ•˜๊ธฐ ์œ„ํ•œ feature๊ฐ„์˜ linearํ•˜์ง€ ์•Š์€ ๊ด€๊ณ„๋กœ ์ธํ•ด ์˜จ์ „ํ•œ ๋ชจํ˜• ์„ค์ •์ด ์–ด๋ ต๊ฒŒ ๋œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋Š” non-linear ๋ฐฉ๋ฒ•์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•œ neural net ๋ฐฉ๋ฒ•์ด ๋“ฑ์žฅํ•˜๋ฉด์„œ ์œ„ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ neural net์ด ์ ์šฉ๋˜๊ณ  ๋‚˜์„œ ๋ด‰์ฐฉํ•˜๊ฒŒ ๋œ ๋ฌธ์ œ๋Š” ํฌ๊ฒŒ 2๊ฐ€์ง€๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

  • anomaly data๊ฐ€ ๋งค์šฐ ์ ๋‹ค๋Š” ๊ฒƒ

it is very difficult to obtain large-scale labeled data to train anomaly detectors due to the prohibitive cost of collecting such data in many anomaly detection application domains

  • anomaly data ๊ฐ„์˜ ์œ ์‚ฌ์„ฑ์ด ์—†๋‹ค๋Š” ๊ฒƒ

anomalies often demonstrate different anomalous behaviors, and as a result, they are dissimilar to each other, which poses significant challenges to widely-used optimization objectives that generally assume the data objects within each class are similar to each other

First, the amount of labeled anomaly data is very small, and obtaining it is costly.

Moreover, since a model generally learns from the training data, there is a high chance that anomalies exist whose distribution is entirely different from that of the anomalies seen during training.

Modern deep-learning-based AD models therefore tried to solve this by applying unsupervised rather than supervised methods.

Specifically, they rely on representation learning, taking the following two-step approach.

  1. They first learn to represent data with new-representation

That is, a stage that learns how to extract the core features that represent the data well.

  2. They use the learned representations to define anomaly scores using reconstruction error or distance metrics space

As representation learning advanced, AD adopted two kinds of metrics on top of it, most notably 'Reconstruction Error' and 'Distance-based measures'.

e.g., the intermediate representation in an AE, the latent space in a GAN, the distance metric space in Deep SVDD

2. Motivation

ํ•˜์ง€๋งŒ ์ €์ž๋Š” ์ด๋Ÿฌํ•œ two-step์˜ approach๊ฐ€ ๊ฐ–๋Š” ๋ฌธ์ œ์ ์œผ๋กœ์„œ, representation learning์„ ํ•˜๋Š” ๋ถ€๋ถ„๊ณผ anomaly detection์„ ํ•˜๋Š” ๋ถ€๋ถ„์ด separate๋˜์–ด ์žˆ๋‹ค๋Š” ์ ์„ ์ง€์ ํ•ฉ๋‹ˆ๋‹ค.

๋˜ํ•œ prior-knowledge์˜ ๋ถ€์กฑ์œผ๋กœ Unsupervised learning์„ ๋ฐ”ํƒ•์œผ๋กœ anomaly detection์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ, data noise๋‚˜ uninteresting data๋ฅผ anomaly ๋ฐ์ดํ„ฐ๋กœ ์ธ์‹ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋ฐœ์ƒํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ๋ฌธ์ œ์— ๋Œ€ํ•œ ํ•ด๊ฒฐ์ฑ…์œผ๋กœ ์ €์ž๋Š” ์ œํ•œ๋œ ์ˆ˜์˜ labeled data๋ฅผ ํ™œ์šฉํ•ด์„œ ์‚ฌ์ „ ์ง€์‹์ด ๋ถ€์กฑํ•œ ๋ฌธ์ œ๋ฅผ ๋ณด์™„ํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ์œ„ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋ฌธ์ œ์ ์„ ๊ทน๋ณตํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ์ €์ž๋Š” 'Anomaly Scores'๋ฅผ ํ•™์Šตํ•˜๋Š” end-to-end learning ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Let's look at the figure above: (a) depicts the conventional approach of performing anomaly detection on top of learned representations, while (b) depicts the method proposed in this paper. As the figure shows, the conventional approach extracts features from the data and applies some metric to them ( reconstruction error, a distance metric ) to perform detection — which, the authors point out, optimizes the model only indirectly. In (b), by contrast, the model takes the data as input and produces the anomaly score directly, end-to-end, so the model can be optimized directly.

Another novelty of the proposed method is that it defines a reference score against which the degree of anomalousness is judged: the decision is based on how much the current input deviates from the average anomaly score derived from normal data, which distinguishes it from prior methods.


๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์ด ๊ฐ–๋Š” novelty๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด 2๊ฐ€์ง€๋กœ ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. With the original data as inputs, we directly learn and output the anomaly scores rather than the feataure representations.

  2. Define the mean of anomaly scores of some normal data objects based on a prior probability to serve as a reference score for guiding the subsequent anomaly score learning.

3. Method

์ž ๊ทธ๋Ÿฌ๋ฉด ๋ณธ๊ฒฉ์ ์ธ methodology๋ฅผ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

End-to-End Anomaly Score Learning

์œ„์—์„œ ์–ธ๊ธ‰ํ•œ prior knowledge๊ฐ€ ๋ถ€์กฑํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด unlabeled ๋ฐ์ดํ„ฐ์™€ limited labeled ๋ฐ์ดํ„ฐ๋ฅผ ํ˜ผ์žฌํ•œ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

N์˜ ๊ฒฝ์šฐ unlabeled ๋ฐ์ดํ„ฐ์˜ ์ˆ˜๋ฅผ ์˜๋ฏธํ•˜์—ฌ K์˜๊ฒฝ์šฐ ๋งค์šฐ ์†Œ๋Ÿ‰์˜ labeled๋œ anomaly ๋ฐ์ดํ„ฐ๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ์„ธํŒ…ํ•˜๊ฒŒ ๋˜๋ฉด,

์šฐ๋ฆฌ์˜ ๊ฐ€์žฅ ํฐ ๋ชฉ์ ์€ ๋ฐ”๋กœ anomaly score๋ฅผ ๋„์ถœํ•˜๋Š” ์ด ํŒŒ์ด ํ•จ์ˆ˜๋ฅผ ์ž˜ ํ•™์Šตํ•ด์„œ anomaly์™€ normal ๊ฐ„์˜ anomaly scoring ์ฐจ์ด๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๊ทธ๋ ‡๋‹ค๋ฉด ๊ฑฐ์‹œ์ ์ธ Framework๋ถ€ํ„ฐ ์‚ดํŽด๋ณผ๊นŒ์š”?


The architecture has two main parts: an 'Anomaly Scoring Network' that produces the anomaly score, and a part that generates the reference score.

The anomaly scoring network itself has two sub-structures: an 'intermediate representation' layer that builds a representation from the input, and a layer that derives the anomaly score directly from it. The details are covered further below.

The reference-score part draws l random samples R = { x1, .., xl } from the normal data, computes their mean score ยต_r, and the model then judges anomalies by how far their scores fall from it.

Let's see how this is implemented concretely.

Deviation Network

์ €์ž๊ฐ€ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์„ ํ•œ ๋ฌธ์žฅ์œผ๋กœ ์š”์•ฝํ•˜๋ฉด ๋‹ค์Œ ๋ฌธ์žฅ์œผ๋กœ ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

"The proposed framework is instantiated into a method called Deviation Networks (DevNet), which defines a Gaussian prior and a Z Score-based deviation loss to enable the direct optimization anomaly scores with an end-to-end neural anomaly score learner"

๋‹ค์Œ ๋นจ๊ฐ„์ƒ‰์˜ ๋‹จ์–ด๋“ค์ด ํ•ต์‹ฌ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ด๋ฃจ๋Š” ๋ถ€๋ถ„์ธ๋ฐ์š”. ์šฐ์„ , ์ฒซ๋ฒˆ์งธ Deviation Network์˜ backborn์„ ์ด๋ฃจ๋Š” Anomaly Scoring Network๋ถ€ํ„ฐ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

Anomaly Scoring Network๋Š” โˆ…ํ•จ์ˆ˜๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋Š” ํฌ๊ฒŒ Intermediate representation space์ธ Q๋ฅผ ๋งŒ๋“œ๋Š” ฯˆ๋„คํŠธ์›Œํฌ์™€ anomaly score๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ฮท ๋„คํŠธ์›Œํฌ๋กœ ๊ตฌ์„ฑ์ด ๋ฉ๋‹ˆ๋‹ค.

  • Intermediate representation space (Q)

  • Total anomaly scoring network

์ด โˆ… ๋„คํŠธ์›Œํฌ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” 2๊ฐ€์ง€ sub network

[1] Q๋ฅผ ๋งŒ๋“œ๋Š” ฯˆ ๋„คํŠธ์›Œํฌ ( feature learner )

[2] Q์—์„œ anomaly score๋ฅผ ๋„์ถœํ•˜๋Š” ฮท ๋„คํŠธ์›Œํฌ ( anomaly score learner )

๊ทธ๋ฆฌ๊ณ  ฯˆ ๋„คํŠธ์›Œํฌ๋Š” Feature learner๋กœ์„œ H๊ฐœ์˜ hidden layer๋กœ ๊ตฌ์„ฑ.

๋”ฐ๋ผ์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œ๊ธฐ๋ฅผ ๊ฐ€๋Šฅ.

Feature learner๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” hidden layer๋Š” ๋“ค์–ด์˜ค๋Š” input, ์ˆ˜ํ–‰ํ•˜๋ ค๋Š” task์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒŒ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค. ๊ฐ€๋ น ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ feature representation์„ ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๋Š” CNN ๋„คํŠธ์›Œํฌ๋ฅผ, sequence data๊ฐ™์€ ๊ฒฝ์šฐ๋Š” RNN ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๊ณ  ฮท ๋„คํŠธ์›Œํฌ์˜ ๊ฒฝ์šฐ simple linear neural unit์„ ์‚ฌ์šฉํ•ด์„œ ์Šค์ฝ”์–ด๋ฅผ ๊ณ„์‚ฐํ•˜๋„๋ก ๋„คํŠธ์›Œํฌ๋ฅผ ๊ตฌ์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ „์ฒด anomaly scoring network๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๊ฒŒ๋ฉ๋‹ˆ๋‹ค.

์ €์ž๋Š” ์ด์™€ ๊ฐ™์€ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๊ณผ ๋‹ฌ๋ฆฌ direclyํ•˜๊ฒŒ data๋ฅผ anomaly score๋กœ mappingํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ฃผ์žฅํ•ฉ๋‹ˆ๋‹ค.


์ €์ž๋Š” ์—ฌ๊ธฐ์„œ ์ถ”๊ฐ€์ ์œผ๋กœ ์ด score๊ฐ€ ์ •๋ง anomalyํ•œ์ง€ ์•ˆํ•œ์ง€๋ฅผ ๊ฐ™์ด ์ฐธ๊ณ ํ•  ์ˆ˜ ์žˆ๋Š” reference score๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๊ธฐ๋ณธ์ ์œผ๋กœ ์ด reference score๋Š” normal objects์ธ R์—์„œ ๋žœ๋ค์œผ๋กœ ๋ฝ‘์•„ ๊ทธ score๋ฅผ ๊ธฐ์ค€์œผ๋กœ optimization์— ํ™œ์šฉํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋™์ž‘ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์ด ๋ฐฉ๋ฒ•์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” 2๊ฐ€์ง€๋กœ ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. Data-driven approach

  2. Prior-driven approach

The data-driven approach derives anomaly scores from the training data X and uses their mean ยต_r. But this approach has a serious limitation: ยต_r shifts slightly every time X changes.

The paper therefore adopts a reference score drawn from a prior probability F and computes ยต_r from it. The authors give two reasons for this choice.

  1. The chosen prior allows us to achieve good interpretability of the predicted anomaly scores

  2. It can generate ยต_r constantly, which is substantially more efficient than the data-driven approach

์ฒซ ๋ฒˆ์งธ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์ด anomaly score๋ฅผ ์˜ˆ์ธกํ•  ์‹œ good interpretability(์ข‹์€ ํ•ด์„?)์„ ๊ฐ–๋Š”๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ ยต_r ์„ constantํ•˜๊ฒŒ ๊ณ ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ์žฅ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ์ž˜ ์™€๋‹ฟ์ง€๊ฐ€ ์•Š์ฃ . ์‚ฌ์ „ ํ™•๋ฅ  F๋ผ๋Š” ๊ฒƒ์ด ๋Œ€์ฒด ๋ฌด์—‡์ธ์ง€ ๊ฐ์„ ์žก์„ ์ˆ˜ ์—†์œผ๋‹ˆ๊นŒ์š”.

ํ•˜์ง€๋งŒ ์ž˜ ์™€๋‹ฟ์ง€๊ฐ€ ์•Š์ฃ . ์‚ฌ์ „ ํ™•๋ฅ  F๋ผ๋Š” ๊ฒƒ์ด ๋Œ€์ฒด ๋ฌด์—‡์ธ์ง€ ๊ฐ์„ ์žก์„ ์ˆ˜ ์—†์œผ๋‹ˆ๊นŒ์š”.

๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ •๋ง normal data์˜ anomaly score๋ฅผ ์ž˜ ๋Œ€๋ณ€ํ•  ์ˆ˜ ์žˆ์„๊นŒ์š”?

์ €์ž๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ฃผ์žฅํ•ฉ๋‹ˆ๋‹ค.

Gaussian distribution fits the anomaly scores very well in a range of data sets. This may be due to that the most general distribution for fitting values derived from Gaussian or non-Gaussian variables is the Gaussian distribution according to the central limit theorem.

We said earlier that in AD most of the data is normal. The anomaly scores of these normal data objects then follow some distribution — and once we draw enough samples of those scores, they will follow a Gaussian distribution, thanks to the Central Limit Theorem.

This can be pictured easily: the more normal an object is, the closer its score will land to ยต_r, while the more abnormal it is, the farther its score will fall from ยต_r. This degree of deviation is what the loss function is defined on.

Each r_i is drawn from a standard normal distribution and represents the anomaly score of a random normal data object. The paper sets ยต_r = 0 and ฯƒ_r = 1, and states that any number of random samples large enough to satisfy the CLT will do; the authors used 5,000 samples.

์ž ๊ทธ๋Ÿฌ๋ฉด ์ด๋Ÿฌํ•œ reference score๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๊ตฌ์ฒด์ ์œผ๋กœ ์–ด๋–ป๊ฒŒ loss function์„ ์ •์˜ํ•˜๋Š” ์ง€๋ฅผ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” deviation ์ •๋„๋ฅผ ์ธก์ •ํ•˜๋Š” ์ง€ํ‘œ๋ฅผ Z-score ๋ฐฉ๋ฒ•์„ ์ฐจ์šฉํ•ด์„œ ํ‘œํ˜„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜์˜ loss๋ฅผ ์ €์ž๋Š” contrastive loss๋ผ๊ณ  ๋ช…๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Contrastive Loss

dev(x) = ( ฯ†(x; ฮ˜) - ยต_r ) / ฯƒ_r

For a normal data object the value ฯ†(x) will be close to ยต_r; for an abnormal one it won't.

On top of this contrastive loss, the final deviation loss is defined as follows.

Deviation Loss

L( ฯ†(x; ฮ˜), ยต_r, ฯƒ_r ) = (1 - y) · |dev(x)| + y · max( 0, a - dev(x) )

y = 1 when x is an anomaly, and y = 0 when x is normal.

And 'a' above is the confidence-interval parameter of the Z-score.
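The deviation and the loss above translate directly into code. A sketch, assuming ยต_r = 0, ฯƒ_r = 1, and a = 5 as in the paper:

```python
import numpy as np

def deviation_loss(scores, y, mu_r=0.0, sigma_r=1.0, a=5.0):
    """Z-score-based deviation loss from the paper.

    scores : phi(x) for each object in the batch
    y      : 1 for labeled anomalies, 0 for (assumed-normal) unlabeled objects
    a      : confidence-interval parameter ( a = 5 in the paper )
    """
    dev = (scores - mu_r) / sigma_r                              # Z-score deviation
    loss = (1 - y) * np.abs(dev) + y * np.maximum(0.0, a - dev)  # contrastive form
    return loss.mean()

# Normal objects are pulled toward mu_r; anomalies are pushed to deviations >= a.
print(deviation_loss(np.array([0.0]), np.array([0])))   # 0.0: a perfect normal score
print(deviation_loss(np.array([5.0]), np.array([1])))   # 0.0: an anomaly at the margin
print(deviation_loss(np.array([-1.0]), np.array([1])))  # 6.0: an anomaly with negative deviation
```

Note how the third case makes the anomaly-with-negative-deviation penalty explicit, which is exactly the behavior discussed below.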


What does this mean? Let's walk through the terms once more.

Note that if x is an anomaly and it has a negative dev(x), the loss is particularly large, which encourages large positive deviations for all anomalies.

๋งŒ์•ฝ์— x๊ฐ€ anomalies์ธ๋ฐ deviation์ด ์Œ์ˆ˜์ด๋ฉด ์ „์ฒด loss ๊ฐ’์€ ์ปค์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ชจํ˜•์€ ์ด anomaly data์˜ deviation์ด ํฐ ์–‘์ˆ˜ ๊ฐ’์„ ๊ฐ€์ง€๊ฒŒ๋” ๋งŒ๋“ค๋ ค๊ณ  ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ฑฐ์˜ a๊ฐ’์— ๊ทผ์‚ฌํ•ด์งˆ ๋งŒํผ์œผ๋กœ ๋ง์ด์ฃ .

์ด ๋ง์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ ๊ทธ๋ฆผ์œผ๋กœ๋„ ์‰ฝ๊ฒŒ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ฆ‰, dev ๊ฐ’์ด 0 ๊ทผ์ฒ˜๋กœ ์˜ค๋Š” ๊ฒฝ์šฐ๋Š” ์ •์ƒ ๋ฐ์ดํ„ฐ๋“ค์ด์ง€๋งŒ a=5๋กœ ์ฃผ์—ˆ์„ ๋•Œ๋Š” ์ด ๊ฒฝ์šฐ anomaly๋ฅผ ์ € ๊ณณ์œผ๋กœ ๊ทผ์‚ฌ์‹œํ‚ค๋Š” ๊ฒฝ์šฐ๋กœ ์ ์šฉ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋„ a=5๋กœ ์ฃผ์–ด ์‚ฌ์šฉํ–ˆ๋‹ค๊ณ  ํ•˜๋„ค์š”

์›๋ฌธ์€ ์ •ํ™•ํžˆ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์–ธ๊ธ‰ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Therefore, the deviation loss is equivalent to enforcing a statistically significant deviation of the anomaly score of all anomalies from that of normal objects in the upper tail. We use a = 5 to achieve a very high significance level for all labeled anomalies.

At this point some of you will surely be raising an eyebrow: we keep saying normal, normal — but apart from the limited anomalies we don't actually know which data is labeled normal, so why talk about normal data at all? This is resolved as follows.

We address this problem by simply treating the unlabeled training data objects in U as normal objects.

That is, unlabeled data is simply regarded as normal — even though there is no guarantee that it all is ( the case where unlabeled data actually contains abnormal objects is said to be contaminated ).

Rather odd, isn't it? Why do that?

In fact, many semi-supervised learning methods fit their models by treating all unlabeled data as normal, for two reasons. First, this closely matches the real world: we really do have lots of unlabeled data, yet anomalous events occur extremely scarcely, so the premise that most of our data is normal holds — treating unlabeled data as normal simply carries that real-world situation over. Second, the very few anomalies hiding in the unlabeled data are assumed to have little influence on SGD-based optimization during backpropagation, so they should not hurt the model's performance much. Treating unlabeled data as normal has therefore become almost a rule of thumb in semi-supervised learning ( though this point certainly deserves further study ).

๋”ฐ๋ผ์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์ด DevNet ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์Šˆ๋„ ์ฝ”๋“œ๋กœ ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋ ‡๋‹ค๋ฉด ์ด์ œ loss function๊นŒ์ง€ ๋””์ž์ธํ•˜์—ฌ trainingํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์œผ๋‹ˆ ๋‹ค์Œ์œผ๋กœ check ํ•ด๋ด์•ผํ•˜๋Š” ๋ถ€๋ถ„์€ ๋ฐ”๋กœ, 'Interpretability'์ž…๋‹ˆ๋‹ค. ์ฆ‰, ์–ด๋–ค ๊ฒฝ์šฐ์ผ ๋•Œ normal, abnormal์ด๋ผ๊ณ  ํŒ๋‹จํ•˜๋ƒ๋Š” ๊ฑฐ์ฃ .

๋ณธ ์—ฐ๊ตฌ๋Š” ๋‹ค์Œ Proposition์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.


์ด ๋ง์˜ ์˜๋ฏธ๊ฐ€ ๋ญ˜์ง€ ํ•œ๋ฒˆ ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค.

์ผ๋ฐ˜์ ์ธ ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์„ฑ์งˆ์ด ์žˆ์ฃ .

๋งŒ์ผ ยต_r๊ฐ€ 0์ด๊ณ  ฯƒ_r๊ฐ€ 1์ธ ์ •๊ทœ๋ถ„ํฌ๊ฐ€ ์žˆ๊ณ , p=0.95๋ผ๊ณ  ํ•˜๋ฉด, z(0.95)=1.96์ด ๋˜๋ฏ€๋กœ=

ยต_r = 0์„ ๊ธฐ์ค€์œผ๋กœ ( ยต_r - z ~ ยต_r + z ) ๊ตฌ๊ฐ„์ด ๊ฒฐ๊ตญ ์‹ ๋ขฐ ๊ตฌ๊ฐ„์ด ๋ฉ๋‹ˆ๋‹ค.


But what if a newly received anomaly score maps beyond this boundary? It means the following:

The object only has a probability of 0.05 generated from the same mechanism as the normal data objects.

In other words, the probability that the object is normal becomes very low.

์™œ ์ด๋Ÿฐ form์„ ๊ธฐ์ค€์œผ๋กœ anomaly๋ฅผ ํŒ๋‹จํ•˜๋Š” threshold๋ฅผ ์„ค์ •ํ–ˆ๋ƒ๋ฉด,

This proposition of DevNet is due to the Gaussian prior and Z-Score-based deviation loss.

์ฆ‰, Z-score ๊ธฐ๋ฐ˜์˜ deviation loss๋ฅผ ์ •์˜ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ํ•™์Šต์ด ์ˆ˜ํ–‰๋˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

4. Experiment

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” 9๊ฐ€์ง€์˜ real-world ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

  • Fraud Detection ( fraudulent credit card transaction ) : ์‹ ์šฉ ์นด๋“œ ๊ฑฐ๋ž˜ ์‚ฌ๊ธฐ ํƒ์ง€

  • Malicious URLs in URL : ์ด์ƒ URL ํƒ์ง€

  • The thyroid disease detection : ๊ฐ‘์ƒ์„  ๋น„๋Œ€์ฆ ์‚ฌ์ง„ ํƒ์ง€

  • ...

์ž์„ธํ•œ ์‚ฌํ•ญ์€ ์ €์ž๊ฐ€ Appendix์— ์ถ”๊ฐ€ํ•ด๋†“์€ ์•„๋ž˜ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ•˜๋ฉด ์ข‹์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.


๋น„๊ต ์ง‘๋‹จ์œผ๋กœ ์‚ฌ์šฉํ•œ ๋ชจํ˜•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

[1] REPEN\

[2] adaptive DeepSVDD\

[3] Prototypical Network (FSNet)\

[4] iForest\

REPEN์˜ ๊ฒฝ์šฐ limited labeled data๋ฅผ ์‚ฌ์šฉํ•˜๋Š” neural net ๊ธฐ๋ฐ˜์˜ AD network์ด๊ณ , FSNet์˜ ๊ฒฝ์šฐ few show classification์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋„คํŠธ์›Œํฌ์ž…๋‹ˆ๋‹ค. ๋‘ ๋„คํŠธ์›Œํฌ ๋ชจ๋‘ limited labeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ํŠน์ง•์ž…๋‹ˆ๋‹ค. DevNet๊ณผ ๋™์ผํ•œ ์กฐ๊ฑด์ด์ฃ . ๋ฐ˜๋ฉด Unsupervised ๋ฐฉ๋ฒ•์œผ๋กœ AD๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ์•™์ƒ๋ธ” ๊ธฐ๋ฐ˜์˜ ๋ชจํ˜• iForest๋„ ๋น„๊ต์ง‘๋‹จ์œผ๋กœ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

DeepSVDD ๊ฐ™์€ ๊ฒฝ์šฐ๋Š” ๊ต‰์žฅํžˆ ์œ ๋ช…ํ•œ AD๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ๋ฐ์š”, ์—ฌ๊ธฐ์„œ ์ €์ž๋Š” ์–ด๋– ํ•œ ์กฐ์ž‘์„ ๊ฐ€ํ•ด DeepSVDD๋ฅผ DevNet๊ณผ ๋น„๊ตํ•  ์ˆ˜ ์žˆ๋Š” ์กฐ๊ฑด์œผ๋กœ ๋งŒ๋“ค์–ด ๋†“์Šต๋‹ˆ๋‹ค. ๋ฐ”๋กœ semi-supervised learning์„ ํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ๋กœ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด์ฃ .

We modified DSVDD to fully leverage the labeled anomalies by adding an additional term into its objective function to guarantee a large margin between normal objects and anomalies in the new space while minimizing the c-based hypershere's volume.

์ฆ‰, labeled๋œ anomalies๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ๋กœ loss function์„ ์กฐ๊ธˆ adjustํ–ˆ๋‹ค๊ณ  ์–ธ๊ธ‰ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ๋กœ ์ด๋Ÿฌํ•œ ์ˆ˜์ •ํ•˜๋Š” ๊ณผ์ •์„ ๊ฑฐ์นจ์œผ๋กœ์„œ original SVDD๋ณด๋‹ค ๋” ์„ฑ๋Šฅ์ด ์ž˜ ๋‚˜์™”๋‹ค๊ณ  ์–ธ๊ธ‰ํ•ฉ๋‹ˆ๋‹ค.

์•„์‹œ๋Š” ๋ถ„๋„ ๊ณ„์‹œ๊ฒ ์ง€๋งŒ DeepSVDD์˜ semi-supervised learning ๋ฒ„์ ผ์€ 2019-2020๋…„์— ๋™์ผํ•œ ์ €์ž๊ฐ€ ์ž‘์„ฑํ•œ DeepSAD์ด๋ผ๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ DevNet์ด ๋‚˜์˜ฌ ๋• ์•„์ง DeepSAD์ด ๋‚˜์˜ค๊ธฐ ์ „์ด๊ธฐ ๋•Œ๋ฌธ์— ๋‹น์‹œ DevNet ์ €์ž๋Š” ์ด๋Ÿฌํ•œ heuristic์„ ์ ์šฉํ–ˆ๋˜ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ์ด 4๊ฐœ์˜ ๋น„๊ต ์ง‘๋‹จ์„ ์‚ฌ์šฉํ•˜์—ฌ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.


Metric์€ ์–ด๋–ป๊ฒŒ ๋ ๊นŒ์š”?

์ผ๋ฐ˜์ ์œผ๋กœ AD์—์„œ ์‚ฌ์šฉํ•˜๋Š” metric์€ AUROC์™€ AUC-PR์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

์ด ๋‘˜์— ๋Œ€ํ•ด์„œ๋Š” ๊ฐ„๋žตํ•˜๊ฒŒ ์ •๋ฆฌํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


  • AUROC ( Area Under the Receiver Operating Characteristic curve )

This is the familiar ROC curve. To understand it, we first need the confusion matrix: 'actual' is the true label and 'pred' is the prediction result. From these we can define True Positive, True Negative, False Positive, and False Negative, and from those counts the following quantities.


Sensitivity ( True Positive Rate ) = True Positive / ( True Positive + False Negative )

Specificity ( True Negative Rate ) = True Negative / ( False Positive + True Negative )

False Positive Rate = False Positive / ( False Positive + True Negative )

Precision = True Positive / ( True Positive + False Positive )

Recall ( = Sensitivity ) = True Positive / ( True Positive + False Negative )

Accuracy = ( True Positive + True Negative ) / ( True Positive + True Negative + False Negative + False Positive )
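The definitions above, computed from raw confusion-matrix counts (the counts here are made-up numbers, purely for illustration):

```python
# Hypothetical confusion-matrix counts for a binary detector.
TP, FN, FP, TN = 80, 20, 30, 870

sensitivity = TP / (TP + FN)          # = recall = true positive rate
specificity = TN / (FP + TN)          # true negative rate
fpr = FP / (FP + TN)                  # false positive rate = 1 - specificity
precision = TP / (TP + FP)
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(sensitivity, precision)
```

Sweeping a decision threshold traces out (fpr, sensitivity) pairs for the ROC curve and (sensitivity, precision) pairs for the PR curve.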


์ด ๊ฐœ๋…๋“ค ์ค‘์—์„œ TPR(True Positive Rate)๊ณผ FPR(False Positive Rate)์„ ์‚ฌ์šฉํ•˜์—ฌ Curve๋ฅผ ๊ทธ๋ฆฌ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ Curve๋ฅผ ๊ทธ๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

\

๋นจ๊ฐ„ ์ ์„ ์ฒ˜๋Ÿผ ์„ ์ด ํ˜•์„ฑ๋˜๋Š” ๊ฒฝ์šฐ randomํ•˜๊ฒŒ classifierํ•œ ๊ฒฝ์šฐ์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋นจ๊ฐ„ ์  ์•„๋ž˜์—์„œ ์˜์—ญ์ด ํ˜•์„ฑ๋˜๋Š” ๊ฒฝ์šฐ๋Š” 0.5์ฃ . ๋ฐ˜๋ฉด ๋ณด๋ผ์ƒ‰์˜ ์„ ์˜ ๊ฒฝ์šฐ ๊ฐ€์žฅ ๋ชจํ˜•์ด ๊ฐ•๋ ฅํ•œ ๊ฒฝ์šฐ, ์ฆ‰ ๋ชจ๋“  ๊ฒฝ์šฐ๋ฅผ ๋งž์ถ˜ ๊ฒฝ์šฐ๋กœ ์•„๋ž˜ ๋ฉด์ ์€ 1์ด ๋ฉ๋‹ˆ๋‹ค.

๋ฐ”๋กœ ์ด ๋ฉด์ ์˜ ํฌ๊ธฐ์— ๋”ฐ๋ผ ์„ฑ๋Šฅ์„ ์ธก์ •ํ•˜๋Š” ๊ฒƒ์ด ๋ฐ”๋กœ AUROC์ž…๋‹ˆ๋‹ค.

  • AUC-PR

ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์€ ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ”๋กœ minor class์˜ error์— ๋Œ€ํ•œ ๋น„์ค‘์ด ๋‚ฎ๊ฒŒ ์žก๊ธฐ ๋•Œ๋ฌธ์ธ๋ฐ์š”.

๊ฐ์ด ์ž˜ ์•ˆ์˜ค์‹œ์ฃ ?

์ดํ•ด๋ฅผ ์œ„ํ•ด ์˜ˆ๋ฅผ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด normal ๋ฐ์ดํ„ฐ๊ฐ€ 30,000๊ฐœ, abnormal ๋ฐ์ดํ„ฐ๊ฐ€ 100๊ฐœ ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ณธ๋‹ค๊ณ  ํ•ด๋ด…์‹œ๋‹ค.

์—ฌ๊ธฐ์„œ ๋˜‘๊ฐ™์ด 50๊ฐœ๋ฅผ ํ‹€๋ฆฐ๋‹ค๊ณ  ํ•˜๋ฉด, normal์—์„œ๋Š” 50๊ฐœ ํ‹€๋ฆฐ ๊ฒƒ์ด ์ƒ๋Œ€์ ์œผ๋กœ ์ ๊ฒŒ ๋˜์ง€๋งŒ, abnormal์—์„œ๋Š” 50๊ฐœ๊ฐ€ ํ‹€๋ฆฌ๋ฉด ์ „์ฒด 2๋ถ„์˜ 1์ด ํ‹€๋ฆฌ๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์ ์€ ์ˆ˜์น˜๋ผ๊ณ  ๋ณผ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ์ฆ‰ ์ด ํ‹€๋ฆฌ๋Š” ์ •๋„๋ฅผ ๊ฐ–๊ฒŒ ํ•ด์„œ๋Š” ์•ˆ๋˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๋ถ€๋ถ„์„ ์ž˜ ๋ฐ˜์˜ํ•ด์ฃผ๋Š” ๊ฒƒ์ด ๋ฐ”๋กœ Precision๊ณผ Recall์˜ ์กฐํ•ฉ์ž…๋‹ˆ๋‹ค.

์ด ๋‘˜์˜ ์กฐ๊ธˆ ๋” ๋ช…ํ™•ํ•œ ์ดํ•ด๋ฅผ ์œ„ํ•ด ๋‹ค์Œ ์˜ˆ์‹œ๋ฅผ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


Take an object detection task: suppose a photo contains two people, and the model detects one of them and predicts 'person'. From the Precision point of view, every prediction it attempted was correct, so it claims 100% accuracy. Recall sees it differently: there was one more person the model failed to find, so the accuracy is 50%.

Combining these two different perspectives into one metric overcomes the problem above — it gives the minor class more appropriate weight. The concept that came out of this is AUC-PR.


์ž ๊ทธ๋Ÿผ ๋‹ค์‹œ ๋ณธ๋ก ์œผ๋กœ ๋Œ์•„์™€์„œ, ์‹คํ—˜์— ๋Œ€ํ•œ ๊ตฌ์ฒด์ ์ธ ๋‚ด์šฉ์„ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์šฐ์„  ์‹คํ—˜ ํ™˜๊ฒฝ ์„ค์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ค์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์‹ ๊ฒฝ๋ง์˜ ๊นŠ์ด๋Š” ํ•œ ๊ฐœ์˜ hidden layer๋ฅผ ์‚ฌ์šฉํ•˜์˜€๊ณ  ๊ตฌ์ฒด์ ์ธ ํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ค์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค

  • 20 neural units

  • RMSProp optimizer

  • mini-batch size 20

  • ReLU activation

  • L2-norm regularization

Considering that most of the datasets are unordered multidimensional data, the network is a multilayer perceptron.

The performance comparison between models under these settings is shown below.


census ๋ฐ์ดํ„ฐ๋ฅผ ์ œ์™ธํ•˜๊ณ ๋Š” ๋ชจ๋“  ๊ฒฝ์šฐ์—์„œ AUROC์™€ AUC-PR ๋ชจ๋‘ DevNet์˜ ์„ฑ๋Šฅ์ด ๊ฐ€์žฅ ์šฐ์ˆ˜ํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

๋ณธ ๊ทธ๋ฆผ์—์„œ Data Characteristic์—์„œ ๋‚˜์˜จ notation์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • '# obj' -> ๋ฐ์ดํ„ฐ์˜ ์ˆ˜

  • 'D' -> ๋ฐ์ดํ„ฐ์˜ dimension

  • 'f1' -> ํ•™์Šต ๋ฐ์ดํ„ฐ์—์„œ anomaly ๋ฐ์ดํ„ฐ ๋น„์ค‘

  • 'f2' -> ์ „์ฒด ๋ฐ์ดํ„ฐ์—์„œ anomaly ๋ฐ์ดํ„ฐ ๋น„์ค‘


But it would be a shame to stop here. The authors verify DevNet's effectiveness through several more targeted experiments.


[1] Data Efficiency

์ฒซ ๋ฒˆ์งธ๋กœ ์ €์ž๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ถ๊ธˆ์ฆ์„ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

  • How data efficient are the DevNet and other deep methods?

  • How much improvement can the deep methods gain from the labeled anomalies compared to the unsupervisd iForest?

์ฆ‰, ์–ผ๋งˆ๋‚˜ ๋ณธ์ธ๋“ค์˜ ๋ชจํ˜•์ด label์ด ์ถ”๊ฐ€๋˜๋Š” anomlies๋ฅผ ์ž˜ ํ™œ์šฉํ•˜๋ƒ, prior knowledge๋ฅผ ์ž˜ ํ™œ์šฉํ•˜๋ƒ๋ฅผ ์ธก์ •ํ•˜๊ณ ์ž ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ด ๋•Œ base line์œผ๋กœ ์‚ฌ์šฉ๋œ iForest์˜ ๊ฒฝ์šฐ labeled์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— label์„ ์ฃผ๋“  ์•ˆ์ฃผ๋“  ๋ชจํ˜•์˜ ์„ฑ๋Šฅ ์ฐจ์ด๋Š” ์—†๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๋‹ค์Œ ๊ทธ๋ฆผ์„ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

\

์ถ”๊ฐ€๋˜๋Š” anomalies์˜ ์ˆ˜์— ๋Œ€ํ•ด ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ DevNet์ด ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠน๋ณ„ํžˆ campaign, census, news20, thyroid ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ์ถ”๊ฐ€๋˜๋Š” anomalies label์— ๋Œ€ํ•ด ๋” ํฐ ํญ์˜ ์„ฑ๋Šฅ์˜ ํ–ฅ์ƒ์„ ๋ณด์ž„์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

[2] Robustness w.r.t. Anomaly Contamination

The second question is how robustly the model copes with anomaly contamination.

์•ž์—์„œ ์ž ๊น ์–ธ๊ธ‰ํ–ˆ์—ˆ์ง€๋งŒ Contamination์ด ์ •ํ™•ํžˆ ๋ญ˜๊นŒ์š”? ๋‹ค์Œ ๊ธ€์„ ์ฐธ๊ณ ํ•ด๋ด…์‹œ๋‹ค.

Perturbing the data by sampling anomalies and adding them to the unlabeled training data, or by removing some anomalies

์ฆ‰ ์ผ๋ฐ˜์ ์ธ unlabeled ๋œ ๋ฐ์ดํ„ฐ๋ฅผ normal ๋กœ ๊ฐ„์ฃผํ•˜๋Š” ์ƒํ™ฉ์—์„œ ๊ทธ unlabeled๋œ ๋ฐ์ดํ„ฐ์— anomaly ๋น„์ค‘์ด contamination๋œ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ๋ฐ”๋กœ ์ด ์–‘์„ ๋Š˜๋ ค๊ฐ์— ๋”ฐ๋ผ ๋ชจํ˜•์ด ์–ผ๋งˆ๋‚˜ sensitiveํ•œ ์ง€๋ฅผ ์ธก์ •ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

์ €์ž๋Š” 0~20% ์ •๋„๋กœ contamination์„ ์ฃผ๋ฉด์„œ ์„ฑ๋Šฅ์˜ ๋ณ€ํ™”๋ฅผ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋‹ค์Œ ๊ทธ๋ฆผ์„ ์ฐธ๊ณ ํ•ด๋ด…์‹œ๋‹ค.

(Figure: performance under contamination levels from 0% to 20%, per dataset)

๋Œ€์ฒด์ ์œผ๋กœ attack์— ๋Œ€ํ•ด robustํ•œ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค๋งŒ ์˜๋ฌธ์ ์€ 'news20' ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ์œ ๋‹ฌ๋ฆฌ DevNet์ด drasticํ•œ ๊ฐ์†Œ๋ฅผ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. Text data๋กœ ๊ตฌ์„ฑ๋œ news20์— ๋Œ€ํ•ด contamination์— ๋Œ€ํ•ด์„œ๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ๋” ํฐ ๊ฐ์†Œํญ์„ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

[3] Ablation Study

Third, an ablation study is carried out.

์ €์ž๋Š” DevNet์˜ ์•„ํ‚คํ…์ฒ˜์— ์‚ฌ์šฉ๋˜๋Š” ์—ฌ๋Ÿฌ ๊ตฌ์„ฑ์š”์†Œ { intermediate representation, FC layer, One hidden layer } ๊ฐ™์€ ๊ฒƒ๋“ค์ด ์‹ค์ œ ๊ฐ๊ฐ์˜ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ง€๋ฅผ ๊ฒ€์ฆํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ € ์š”์†Œ๋“ค์ด ๊ผญ ๋‹ค ํ•„์š”ํ•œ ์ง€๋ฅผ ํ™•์ธํ•˜๊ณ ์ž ํ•œ๊ฑฐ์ฃ .

๊ทธ๋ž˜์„œ ์ด 3๊ฐœ์˜ ๋น„๊ต ์ง‘๋‹จ์„ ๋งŒ๋“ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์กฐ๊ธˆ ๋” ๊ฐ„๋‹จํ•˜๊ฒŒ ๋„์‹ํ™”ํ•ด๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

(Figure: schematic of the compared variants Def, DevNet-Rep, DevNet-Linear, and DevNet-3HL)

์ฆ‰, ๊ธฐ์กด์˜ Def๊ฐ€ DevNet์ด๋ผ๋ฉด, ๊ฐ๊ฐ์˜ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ํ•˜๋‚˜์”ฉ ์ œ๊ฑฐํ•˜๊ฑฐ๋‚˜ layer๋ฅผ 3๊ฐœ๋กœ ๋Š˜๋ฆฌ๋Š” ์กฐ์ž‘์„ ๊ฐ€ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

DevNet-Rep์˜ ๊ฒฝ์šฐ ๋งˆ์ง€๋ง‰์— anomaly score๋ฅผ scala ํ˜•ํƒœ๋กœ ๋„์ถœํ•˜๋Š” FC layer๋ฅผ ์ œ๊ฑฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ฆ‰ 20๊ฐœ์˜ dimension์„ ๊ฐ–๋Š” ๋ฒกํ„ฐ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์„ฑ๋Šฅ์„ ๋„์ถœํ•˜๊ฒŒ ๋˜๋Š”๊ฑฐ์ฃ . ( ์ด๋ถ€๋ถ„์—์„œ ์–ด๋–ป๊ฒŒ anomaly score๋ฅผ ๋„์ถœํ–ˆ๋Š” ์ง€๊ฐ€ ๋ช…ํ™•ํžˆ ์–ธ๊ธ‰๋˜์–ด ์žˆ์ง€ ์•Š๋„ค์š”. ) DevNet-Linear์˜ ๊ฒฝ์šฐ feature representation์„ ์ˆ˜ํ–‰ํ•˜๋Š” network๋ฅผ ์ œ๊ฑฐํ•˜๊ณ  ๋ฐ”๋กœ linearํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด anomaly score๋ฅผ ๋„์ถœํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ DevNet-3HL์€ 3๊ฐœ์˜ hidden layer๋Š” 20๊ฐœ์˜ ReLU๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ•˜๋‚˜์˜ layer๊ฐ€ ์•„๋‹Œ 1000 - 250 - 20 ๊ฐœ์˜ ReLU๋ฅผ ์‚ฌ์šฉํ•˜๋Š” 3๊ฐœ์˜ hiddne layer๋กœ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

์ด์— ๋Œ€ํ•œ ์„ฑ๋Šฅ์˜ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

(Figure: AUROC and AUC-PR of the ablation variants on each dataset)

๋Œ€๋ถ€๋ถ„์˜ ๋ฐ์ดํ„ฐ์…‹์—์„œ AUROC๋Š” ๋ณธ DevNet์˜ ์„ฑ๋Šฅ์ด ์šฐ์ˆ˜ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ AUC-PR์˜ ๊ฒฝ์šฐ ์ผ๋ถ€๋ถ„์—์„œ Rep๋‚˜ 3HL์ด ๋” ์šฐ์ˆ˜ํ•˜๊ฒŒ ๋‚˜์˜จ ๊ฒฝ์šฐ๋„ ์กด์žฌํ•˜์˜€์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ census ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ๋Š” Rep์—์„œ ์„ฑ๋Šฅ์ด ๊ฐ€์žฅ ์šฐ์ˆ˜ํ–ˆ์Œ๋„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

Still, since Def performed best overall, the authors argue that each of the components making up the end-to-end learning pipeline contributes appropriately.

They also explain why a deeper network such as 3HL does not work better: with only a handful of labeled anomalies, a very deep network easily loses their characteristics. That is, when normal data makes up most of the dataset, stacking the network deeper washes out the features of the anomalies, so the authors conclude that one hidden layer is the best fit.

[4] Scalability

๋งˆ์ง€๋ง‰์œผ๋กœ ๋ฐ์ดํ„ฐ์˜ size์™€ dimension์— ๋”ฐ๋ฅธ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์ด ์–ด๋А์ •๋„ ๋˜๋Š”์ง€, ์ฆ‰ complexity๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•œ ์‹คํ—˜์„ ์ถ”๊ฐ€์ ์œผ๋กœ ์ง„ํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค. size๋ฅผ ๊ณ ์ •์‹œ์ผœ๋†“๊ณ  dimension์˜ ๋ณ€ํ™”์— ๋”ฐ๋ฅธ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์„ ์ธก์ •ํ•˜์˜€๊ณ  dimension์„ ๊ณ ์ •์‹œ์ผœ๋†“๊ณ  size์˜ ๋ณ€ํ™”์— ๋”ฐ๋ฅธ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์„ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

(Figure: runtime versus data size and versus dimensionality)

๋Œ€๋ถ€๋ถ„์˜ ๋ชจํ˜•์—์„œ linear time์ด ์†Œ๋ชจ๋˜์—ˆ์ง€๋งŒ, DevNet์˜ ๊ฒฝ์šฐ data size์— ๋Œ€ํ•ด์„œ๋Š” ํฌ๊ธฐ๊ฐ€ 10๋งŒ์ด ๋„˜์–ด๊ฐ€๋Š” ์˜์—ญ์—์„œ๋„ ์ค€์ˆ˜ํ•œ ์†๋„๋ฅผ ๋ณด์ž„์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. dimension์— ๋Œ€ํ•ด์„œ๋„ FSNet๋ณด๋‹ค๋Š” ๋А๋ฆฌ์ง€๋งŒ ๋‹ค๋ฅธ ํƒ€ ๋ชจํ˜•์— ๋น„ํ•ด ๋” ๋น ๋ฅธ ์„ฑ๋Šฅ์„ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

5. Conclusion

๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์ด ์ œ๊ธฐํ•˜๋Š” Contribution์— ๋Œ€ํ•ด ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ 3๊ฐ€์ง€๋กœ ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

1. _This paper introduces a novel framework and its instantiation DevNet for leveraging a few labeled anomalies with a prior to fulfill an end-to-end differentiable learning of anomaly scores_

๋Œ€๋ถ€๋ถ„์˜ Unsupervised learning์˜ ํ•œ๊ณ„์ ์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ ์€ ์†Œ๋Ÿ‰์˜ limited anomaly labeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค๋Š” ์ 

2. _By a direct optimization of anomaly scores, DevNet can be trained much more data-efficiently, and performs significantly better in terms of both AUROC and AUC-PR compared to other two-step deep anomaly detectors that focus on optimizing feature representations_

Instead of the indirect optimization that results from splitting representation learning and detection into two steps, it proposes optimizing the anomaly scores directly.

3. _Deep anomaly detectors can be well trained by randomly sampling negative examples from the anomaly contaminated unlabeled data and positive examples from the small labeled anomaly set._

Another distinctive aspect, in my view, is the attempt to measure the degree of anomaly using a reference score drawn from a normal (Gaussian) distribution.
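This reference-score mechanism, as the paper describes it (the anomaly score is converted into a z-score against reference scores drawn from a standard normal prior, with a margin-based loss), can be sketched in numpy as follows; the variable names, sample inputs, and number of reference draws are illustrative.

```python
import numpy as np

def deviation_loss(scores, labels, a=5.0, n_ref=5000, rng=None):
    """Deviation loss sketch: z-score of each anomaly score against
    reference scores sampled from N(0, 1).
    labels: 0 = unlabeled (treated as normal), 1 = labeled anomaly."""
    rng = rng or np.random.default_rng(0)
    ref = rng.standard_normal(n_ref)          # reference scores ~ N(0, 1)
    dev = (scores - ref.mean()) / ref.std()   # deviation of each sample
    # pull normals toward zero deviation, push anomalies past margin a
    return np.mean((1 - labels) * np.abs(dev)
                   + labels * np.maximum(0.0, a - dev))

scores = np.array([0.05, 8.0])   # one normal-looking, one anomalous score
labels = np.array([0, 1])
print(deviation_loss(scores, labels))
```

Normal samples are penalized for any nonzero deviation, while labeled anomalies incur no loss once their deviation exceeds the margin.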

6. Code Review

As for the code walkthrough, I did not think a written post would add much, so for those who want to study further I am leaving the timestamp of the YouTube video I recorded. I apologize for not writing it up separately!

https://www.youtube.com/watch?v=1lEtPCn-lcY

Code review: from 55:30 to the end

That concludes the DevNet review!

Thank you for reading this long post!


Author Information

  • Yesung Cho (์กฐ์˜ˆ์„ฑ)

    • Knowledge Service Engineering, M.S. Course, KS Lab (Prof. Mun Y. Yi)

    • Anomaly Detection in Computer Vision

Reference & Additional materials

GitHub code

[1] The original authors' GitHub repository is linked below and is implemented in Keras.

github code : https://github.com/GuansongPang/deviation-network

[2] ๋™์ผ ์ €์ž์˜ ๋…ผ๋ฌธ์œผ๋กœ 'DevNet'์„ ๋ณด๋‹ค ๋” ๊ฐœ์„ ํ•œ ๋…ผ๋ฌธ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค (ํ•˜์ง€๋งŒ ํ•ต์‹ฌ์ ์ธ ์•„ํ‚คํ…์ฒ˜๋Š” ๊ฑฐ์˜ ๋™์ผํ•ฉ๋‹ˆ๋‹ค). ์•„๋ž˜ github ๋งํฌ๋ฅผ ๊ฐ€๋ฉด DevNet ๊ตฌํ˜„ ์ฝ”๋“œ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๊ณ  'Pytorch'๋กœ ๊ตฌํ˜„๋˜์–ด์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ ํŠœ๋ธŒ ์˜์ƒ์˜ ๋ฆฌ๋ทฐ๋„ ํ•ด๋‹น ์ฝ”๋“œ๋กœ ๋ฆฌ๋ทฐํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Pang, G., Ding, C., Shen, C., & Hengel, A. V. D. (2021). Explainable Deep Few-shot Anomaly Detection with Deviation Networks. arXiv preprint arXiv:2108.00462.

paper link : https://arxiv.org/abs/2108.00462 (this is what I reviewed on YouTube)


Other materials

๋ณธ ํฌ์ŠคํŒ…์„ ์œ„ํ•ด ์ถ”๊ฐ€์ ์œผ๋กœ ๋‹ค์Œ reference๋“ค์„ ์ฐธ๊ณ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

[3] Mohammadi, B., Fathy, M., & Sabokrou, M. (2021). Image/video deep anomaly detection: A survey. arXiv preprint arXiv:2103.01739.

[4] Ruff, L., Vandermeulen, R. A., Görnitz, N., Binder, A., Müller, E., Müller, K. R., & Kloft, M. (2019). Deep semi-supervised anomaly detection. arXiv preprint arXiv:1906.02694.

[5] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S. A., Binder, A., ... & Kloft, M. (2018, July). Deep one-class classification. In International conference on machine learning (pp. 4393-4402). PMLR.

[6] Shi, P., Li, G., Yuan, Y., & Kuang, L. (2019). Outlier Detection Using Improved Support Vector Data Description in Wireless Sensor Networks. Sensors, 19(21), 4712.

Feedback

๋ณธ ๋…ผ๋ฌธ์„ ๋ฆฌ๋ทฐํ•œ ๋‹ค์Œ ์—ฌ๋Ÿฌ ํ•œ๊ณ„์ ์„ ๋А๊ผˆ์Šต๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ ๋Œ€ํ‘œ์ ์ธ indirectํ•œ optimize ๋ฐฉ๋ฒ•์œผ๋กœ ์†Œ๊ฐœ๋˜๋Š” representation learning์˜ ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ AE/GAN ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์„ SOTA ๋น„๊ต๋กœ ์‚ฌ์šฉํ•˜์ง€ ์•Š์•˜๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ๋…ผ๋ฌธ์ด ๋‚˜์™”๋˜ ๋‹น์‹œ 2019๋…„๋„ GAN ๊ธฐ๋ฐ˜์˜ AD ๋ชจ๋ธ์ด ๊ฐ€์žฅ ๋งŽ์ด ๋‚˜์˜ค๋Š” ์‹œ์ ์ด์—ˆ๋Š”๋ฐ ์ด๋ฅผ ๋น„๊ต์ง‘๋‹จ์œผ๋กœ ํ™œ์šฉํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์ด ์˜๋ฌธ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, normal distribution์—์„œ ๋ฝ‘์€ anomaly score๊ฐ€ ์ •๋ง ์‹ค์ œ normal ๋ฐ์ดํ„ฐ์˜ anomaly score๋ฅผ ๊ณ„์‚ฐํ–ˆ์„ ๋•Œ์™€ ์œ ์‚ฌํ•˜๋‹ค๋Š” ๊ฒƒ์„ CLT๋ฅผ ํ†ตํ•ด์„œ๋งŒ ๋ณด์žฅ์ด ๋ ์ง€์— ๋Œ€ํ•œ ์˜๋ฌธ๋„ ํ’ˆ๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ๋ณธ ๋…ผ๋ฌธ์˜ ๋ฐฉํ–ฅ์„ ์ƒ๊ฐํ•ด๋ณด์•˜์„ ๋•Œ ์ฐธ๊ณ ํ•  ๋งŒํ•œ ๋ถ€๋ถ„์€ ๋ฐ”๋กœ 'Reference Score'๋ฅผ ํ™œ์šฉํ–ˆ๋‹ค๋Š” ๋ถ€๋ถ„์ด์—ˆ์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ตญ AD์—์„œ ํ•ต์‹ฌ์€ ๊ฑฐ์˜ Normal ๋ฐ์ดํ„ฐ ๋ฐ–์— ์—†๋Š” ๋ฌธ์ œ์˜€๋Š”๋ฐ, ์ด๋ฅผ reference score๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ anomaly๋ฅผ ๊ตฌ๋ถ„ํ•˜๋Š” ๊ฒƒ์ด research idea๋กœ์„œ ์ฐธ๊ณ ๋ฅผ ํ•˜๊ฒŒ ๋˜์—ˆ๊ณ  ๋ณธ ์ˆ˜์—…์—์„œ ์ €ํฌ๋Š” ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ Normal ๋ฐ์ดํ„ฐ๋ฅผ unsupervised learning์œผ๋กœ ์ถฉ๋ถ„ํžˆ ํ•™์Šตํ•œ ์–ด๋–ค ๋ชจํ˜•์ด Abnormal์˜ ํŠน์ง•์„ ์ž˜ ํ•™์Šตํ•œ ๋ชจํ˜•์ด ๋‚ด๋†“๋Š” ๊ฒฐ๊ณผ๋ฅผ referenceํ•˜๋ฉด ๋” ์ข‹์„ ๊ฒƒ์ด๋‹ค๋ผ๋Š” ์•„์ด๋””์–ด๋ฅผ ์ฐฉ์•ˆํ•ด Team Project ์ฃผ์ œ๋ฅผ ์ƒ๊ฐํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Moreover, since this paper, many studies have continued to appear that try to tackle the class-imbalance problem, and I felt I should study this area further.
