Points as queries: Weakly semi-supervised object detection by points

Liangyu Chen et al. / Points as Queries: Weakly Semi-supervised Object Detection by Points / CVPR 2021

I am currently working on a project that detects safety harnesses and lifelines in real time to prevent falls at construction sites. I plan to use supervised deep learning for object detection, which requires a labeled dataset. Many labeled datasets are already available online, but none of them covered safety harnesses and lifelines. I therefore had to collect the images myself and label them by hand.

After drawing bounding boxes on more than 700 photos, I realized how much time labeling costs. That made me wonder whether a deep learning model could be trained with little or no labeling work.

The paper I introduce here was presented at CVPR 2021. It proposes a weakly semi-supervised object detection method that uses a dataset made up of a small number of fully-labeled images and a large majority of images weakly labeled by points.

Paper link: https://arxiv.org/abs/2104.07434

1. Problem Definition

To reduce the labeling cost of supervised deep learning, this paper proposes a dataset setting built from images weakly annotated with points. Unlike image-level labels (an image with only its categories), point-level labels (a point on each object plus its category) keep the labeling cost low while still providing object-level information, which makes them suitable for object detection.

Building on this setting, the paper also proposes Point DETR, a new method that addresses the shortcomings of existing object detectors. It takes points inside objects as input, converts them into object queries, and predicts an object box for each query. The method fits point-labeled datasets well and demonstrates the effectiveness of the weakly semi-supervised detection task.

2. Motivation

Object detection is one of the central problems in computer vision. Labeling large amounts of data, however, takes a lot of time. For example, drawing an accurate bounding box takes roughly 10 to 35 seconds per object.

To reduce this labeling cost, weakly supervised object detection (WSOD) and semi-supervised object detection (SSOD) methods have been proposed.

WSOD (weakly supervised object detection) uses a large amount of data with weak annotations such as image-level labels (categories only). These are easier to produce than accurate bounding boxes.

SSOD (semi-supervised object detection) trains a model on a small number of box-level labeled images (bounding boxes plus categories) and a large number of unlabeled images.


These methods lower the labeling cost, but their performance still falls short of fully supervised learning. To close the gap, weakly semi-supervised object detection (WSSOD) methods, which combine WSOD and SSOD, are being studied.

WSSOD uses a small number of box-level labeled images together with a large number of weakly labeled images (here, labeled at the image level).

However, image-level labels do not carry instance-level information for every object, so they are not well suited to object detection.


As a solution, the paper proposes labeling images with points.

Labeling images with points has two advantages.

  1. Compared with image-level labels, a point provides not only the object's category but also prior information about the instance position

  2. It matters little whether the point is placed at the center of the object or near its edge, so the labeling cost is not much higher than for image-level labels

However, most current detectors have difficulty predicting object boxes from point labels.

This is because most of them are built on an FPN (Feature Pyramid Network). An FPN predicts object boxes from multi-level feature maps, whereas a point annotation carries no information about which feature level the object belongs to.

As a solution, the paper proposes Point DETR, a new detector that adds a point encoder to DETR (DEtection TRansformer). The new model can predict object boxes accurately from labeled points. In particular, it uses a single-level feature map to predict the boxes.

The difference from the original DETR is that the position and category of each labeled point are encoded into an object query by the point encoder. This creates a one-to-one correspondence between points and object queries. In addition, to improve detection performance, the box is predicted relative to the point location rather than directly as in DETR.

To demonstrate the model's advantage, it is compared on the MS-COCO dataset against FCOS, another point-based detector.

The three main contributions are as follows.

  1. The paper proposes a new dataset setting for the weakly semi-supervised object detection task, consisting of a small number of fully annotated images and a large number of images weakly annotated by points. Compared with image-level labels, this setting provides instance-level information at almost the same labeling cost.

  2. Based on this setting, the paper analyzes the shortcomings of existing object detectors and proposes Point DETR, a simple and easy-to-use model.

  3. The new detector outperforms most existing detectors across a range of dataset configurations.

3. Method

WSSOD (weakly semi-supervised object detection) trains on a small number of instance-level (box-level) labeled images and a large number of weakly, image-level labeled images. However, image-level labeled images contain no per-object information, so they are a poor fit for WSSOD.

Is there, then, a new approach that does not add to the labeling burden?

The paper introduces point annotations for the weakly labeled images. Point annotations have been used in weakly supervised semantic segmentation but have rarely been used for object detection.

For object detection, the paper defines a point annotation as follows:

The point lies inside the object, and the object's class is taken as the point's category.

In other words, an object can be represented as (x, y, c). In this work the point may be placed anywhere on the object, which eases the labeling burden.
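As a minimal sketch of the (x, y, c) definition above, a point annotation can be stored as a small record and checked against a ground-truth box. The class and function names here are hypothetical, chosen only for illustration, and pixel coordinates are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointAnnotation:
    x: float  # horizontal position in pixels (assumed unit)
    y: float  # vertical position in pixels (assumed unit)
    c: int    # category index of the object the point lies on


@dataclass(frozen=True)
class BoxAnnotation:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    c: int


def is_valid_point(point: PointAnnotation, box: BoxAnnotation) -> bool:
    """A point is valid for a box if it lies inside it and shares its category."""
    inside = box.x_min <= point.x <= box.x_max and box.y_min <= point.y <= box.y_max
    return inside and point.c == box.c


box = BoxAnnotation(10, 20, 110, 220, c=3)
print(is_valid_point(PointAnnotation(15, 25, c=3), box))   # near a corner, still valid
print(is_valid_point(PointAnnotation(5, 25, c=3), box))    # outside the box
```

The check deliberately accepts any location inside the box, which mirrors the statement that the point does not need to be at the object center.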

The overall framework is as follows.

Given a small number of fully labeled images and a large number of point-labeled images, self-training, as used in semi-supervised learning, is adopted as the default training scheme.

  1. Train a teacher model on the available labeled images

  2. Use the trained teacher model to generate pseudo-labels for the weakly point-annotated images

  3. Train a student model on the fully labeled images and the pseudo-labeled images
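The three steps above can be sketched as a short function. This is only an outline under stated assumptions: `train_teacher` and `train_student` are hypothetical callables supplied by the caller, and the toy stand-ins below exist only so the sketch runs; they are not the paper's models.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, int]                 # (x, y, c)
Box = Tuple[float, float, float, float, int]     # (x_min, y_min, x_max, y_max, c)


def self_training(fully_labeled: List[Tuple[str, List[Box]]],
                  point_labeled: List[Tuple[str, List[Point]]],
                  train_teacher: Callable,
                  train_student: Callable):
    # Step 1: train the teacher on the fully labeled images.
    teacher = train_teacher(fully_labeled)
    # Step 2: the teacher turns each image's points into pseudo-boxes.
    pseudo_labeled = [(image, teacher(image, points))
                      for image, points in point_labeled]
    # Step 3: train the student on real boxes plus pseudo-boxes.
    student = train_student(fully_labeled + pseudo_labeled)
    return student, pseudo_labeled


# Toy stand-ins (hypothetical): the "teacher" draws a fixed 10x10 box
# around each point, and the "student" is just the training-set size.
def toy_train_teacher(data):
    def teacher(image, points):
        return [(x - 5, y - 5, x + 5, y + 5, c) for x, y, c in points]
    return teacher


def toy_train_student(data):
    return len(data)


student, pseudo = self_training(
    [("img_a", [(0, 0, 1, 1, 0)])],
    [("img_b", [(10, 10, 2)])],
    toy_train_teacher,
    toy_train_student,
)
print(student, pseudo)
```

Passing the training routines in as arguments keeps the sketch independent of any particular detector, which matches the way the review later swaps different teacher models into the same pipeline.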

Existing detectors fall into two categories.

  1. Multi-level detector (FCOS): point annotations carry no feature-level information, so it is hard to predict an object box from a point annotation

  2. Single-level detector (Faster R-CNN): although there is no need to choose among feature map levels, it either performs poorly or imposes strict requirements on the point annotations.

3.1 Point DETR

To avoid the weaknesses of existing detectors in WSSOD with point annotations, the paper proposes a new detector, Point DETR. It converts point annotations into object queries, extracts image features for each object query, and outputs the corresponding object box.

First, a look at DETR.

DETR is an end-to-end, set-based object detector. It consists of a CNN backbone, an encoder-decoder transformer, and a prediction head.

DETR first extracts a single-level 2D feature map from the CNN backbone, flattens it, and supplements it with positional encodings. The encoder-decoder transformer then takes the flattened 1D image feature embeddings and a fixed number of object queries as input and produces output embeddings. Finally, the transformer's output embeddings are passed to the prediction head, which predicts the class and box of each object.

Point DETR reuses most of DETR. The difference is that Point DETR has a point encoder, which encodes point annotations into object queries. Unlike DETR's object queries, these queries are specific instance embeddings that contain the position and category of an object instance. As a result, the object queries correspond one-to-one with object instances. Moreover, the number of object queries is not fixed as in DETR but varies with the number of object instances in the image.

During training, the loss for each object query is defined as L_box, because the category is already given and only the object box has to be regressed. L_box is the same as defined in DETR.
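For reference, the box loss in the DETR paper is a weighted sum of a generalized IoU loss and an L1 loss. Written for the setting described here, where $b_i$ is the ground-truth box paired with the $i$-th point-derived query and $\hat{b}_i$ is its prediction (this indexing is my reading of the one-to-one correspondence, not a formula quoted from the Point DETR paper):

```latex
\mathcal{L}_{box}(b_i, \hat{b}_i)
  = \lambda_{iou}\, \mathcal{L}_{iou}(b_i, \hat{b}_i)
  + \lambda_{L1}\, \lVert b_i - \hat{b}_i \rVert_1
```

Here $\mathcal{L}_{iou}$ is the generalized IoU loss and $\lambda_{iou}$, $\lambda_{L1}$ are scalar weights.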


Point encoder: In Point DETR, the key role of the point encoder is to encode point annotations into object queries.

A point annotation (x, y, c) is split into a 2D coordinate (x, y) and a category index c. Based on (x, y), the position embedding is taken from fixed spatial positional encodings. The category embedding is looked up by the category index c from a set of predefined category embeddings. Finally, the two embeddings are combined with a sum operation to obtain the object query.
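The encoding described above can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the sine/cosine formula, the embedding size of 8, normalized coordinates in [0, 1], and the randomly initialized category table (which would be learned in a real model) are all choices made here for the example.

```python
import math
import random


def sine_position_embedding(x, y, dim=8, temperature=10000.0):
    """Fixed sine/cosine encoding of a normalized point (x, y in [0, 1])."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2  # half of the channels encode y, the other half x

    def encode(value):
        out = []
        for i in range(half // 2):
            freq = temperature ** (2 * i / half)
            angle = value * 2 * math.pi / freq
            out.extend([math.sin(angle), math.cos(angle)])
        return out

    return encode(y) + encode(x)


def make_category_embeddings(num_classes, dim=8, seed=0):
    """One vector per category; random here, trainable in a real model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]


def encode_point(x, y, c, category_embeddings):
    """Object query = position embedding + category embedding (element-wise sum)."""
    category = category_embeddings[c]
    position = sine_position_embedding(x, y, dim=len(category))
    return [p + q for p, q in zip(position, category)]


def encode_points(points, category_embeddings):
    """One object query per point, so the query count follows the object count."""
    return [encode_point(x, y, c, category_embeddings) for x, y, c in points]


table = make_category_embeddings(num_classes=3, dim=8)
queries = encode_points([(0.2, 0.3, 1), (0.7, 0.1, 2)], table)
print(len(queries), len(queries[0]))
```

Because each point yields exactly one query, an image with two annotated objects produces two queries, in contrast to DETR's fixed-size query set.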

4. Experiment

Dataset

  • COCO 2017 detection dataset (118k training images, 5k val images)

  • For the point-annotated setting, 5%, 10%, 20%, 30%, 40%, or 50% of the training images form the fully labeled set, and the rest form the weakly labeled set

  • For the weakly labeled set, the point annotation of each object is generated in one of two ways

    1. If the object has an instance segmentation, a point is randomly sampled from the instance mask

    2. Otherwise, a point is randomly sampled from the bounding box
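The two sampling rules above can be sketched as follows. The function names, the nested-list mask format, and the uniform sampling distribution are assumptions made for this illustration; the review only says that the point is drawn at random.

```python
import random


def sample_point_from_mask(mask, rng):
    """mask: 2D list of 0/1 values. Returns the (x, y) of a random foreground pixel."""
    foreground = [(x, y)
                  for y, row in enumerate(mask)
                  for x, value in enumerate(row) if value]
    if not foreground:
        raise ValueError("mask has no foreground pixels")
    return rng.choice(foreground)


def sample_point_from_box(box, rng):
    """box: (x_min, y_min, x_max, y_max). Returns a random (x, y) inside it."""
    x_min, y_min, x_max, y_max = box
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))


def make_point_annotation(category, box, mask=None, rng=None):
    """Prefer the instance mask when one exists; fall back to the bounding box."""
    rng = rng or random.Random()
    if mask is not None:
        x, y = sample_point_from_mask(mask, rng)
    else:
        x, y = sample_point_from_box(box, rng)
    return (x, y, category)


mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]
print(make_point_annotation(7, (1, 1, 3, 2), mask=mask, rng=random.Random(0)))
print(make_point_annotation(7, (10, 20, 30, 40), rng=random.Random(0)))
```

Sampling from the mask guarantees that the point lands on the object itself, whereas a point sampled from the box may fall on background inside the box.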

Training

  • Two models are used

    • Teacher model: Point DETR, FCOS, Faster R-CNN

    • Student model: FCOS (the student model is used only to evaluate the effectiveness of the teacher model)

    • To train the student model, the fully labeled images are combined with the pseudo-labeled images generated by the teacher model


Results

  • "Supervised" denotes the student model trained on the fully annotated images only.

  • Both FCOS and Point DETR outperform Supervised, which confirms the benefit of the pseudo-boxes. In other words, **images with point annotations improved performance on the detection task.**

  • In addition, Point DETR outperformed FCOS.


Ablation study

  • Effect of Point Encoder

    table3

    • A point encoder with only the positional embedding performs better than a point encoder with only the category embedding.

    • Since the method only regresses object boxes, this shows that it is hard to learn a bounding box relative to the point without positional embeddings

    • The category embedding also contributes to performance because it provides prior information such as object shape

  • Effect of Student Model

    table3

    • FCOS and RetinaNet were compared as student models to test the robustness of the approach

    • The proposed model scores 2.1 AP higher than FCOS, which indicates that it is robust to the choice of student model

  • Comparison with another single-level detector

    • Compared with Faster R-CNN, a single-level feature detector, the proposed model is 1.9 AP higher

  • Effect of Point Location

    • Comparing points at the object center with points away from the center showed no difference in performance

    • In other words, the model is robust to the location of the point

  • Absolute vs. Relative Regression

    • The proposed method uses relative regression to predict object boxes

    • DETR uses absolute regression, which can fail to match the point with its bounding box, as in the green clock example
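To make the contrast concrete, here is a small sketch of the two decoding schemes. The (left, top, right, bottom) distance parameterization used for the relative case is an assumption chosen for illustration; the review only states that the box is predicted relative to the point, and the exact form used in the paper may differ.

```python
def decode_absolute(pred):
    """pred = (cx, cy, w, h): box predicted directly in image coordinates."""
    cx, cy, w, h = pred
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def decode_relative(point, pred):
    """pred = (left, top, right, bottom): non-negative distances from the
    annotated point to the four box sides (assumed parameterization)."""
    x, y = point
    left, top, right, bottom = pred
    return (x - left, y - top, x + right, y + bottom)


def contains(box, point):
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


point = (0.30, 0.40)  # normalized image coordinates

# An absolute prediction is free to land anywhere, including far from the point.
absolute_box = decode_absolute((0.70, 0.70, 0.20, 0.20))
print(contains(absolute_box, point))

# With non-negative distances, a relative prediction always encloses the point.
relative_box = decode_relative(point, (0.10, 0.10, 0.20, 0.10))
print(contains(relative_box, point))
```

Under this parameterization the annotated point cannot fall outside its own box, which is one plausible reading of why relative regression avoids the point-to-box mismatch described above.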

  • Effect of Point Annotations

    • Point DETR outperforms DETR in terms of both mAP and recall

    • With point annotations, the proposed method is not affected by the quality of the classification scores


5. Conclusion

This paper demonstrated the effectiveness of point annotations in the weakly semi-supervised detection task. It also showed that point annotations do not fit existing detectors well. To address this, the paper proposes the Point DETR model. Unlike the original DETR, it applies a point encoder to establish a one-to-one correspondence between point annotations and objects. The approach has the advantage of being simple and easy to apply. Experiments on the COCO dataset showed that it outperforms other existing models.

Personal opinion

I liked that the paper describes in detail a detector that reduces labeling cost, the main drawback of supervised learning, while maintaining performance. I found it interesting that, to solve the problem of image-level weakly labeled images in weakly semi-supervised object detection, the authors adopted point-based supervision that had been used in semantic segmentation rather than in object detection. The network architecture is also illustrated clearly in figures, showing the inputs, how they are processed, and the form of the outputs, which made it easy to understand. The included images were helpful for following the process and results of the study.

Point-based detection had rarely been used before. Seeing a technique from another field applied so well to object detection made me think that I should also pay attention to techniques from other fields and consider how they could be put to good use in object detection.

I also think the fact that the point location does not matter much contributes greatly to reducing labeling cost. That said, objects I want to label such as helmets and safety harnesses are clearly visible in size, whereas lifelines are often thin, and I wonder whether training from points would still work for them.

One disappointment is that, although this work uses relative regression while existing models use absolute regression, the paper does not explain why this reduces matching errors between points and bounding boxes. The optimal proportion of weakly labeled images in the whole dataset also remains to be studied. Finally, it would have been helpful if the process of generating pseudo bounding boxes from point-annotated images had been described in more detail, for example with pseudocode.


Author Information

  • Doil Kim

    • Affiliation : Master Course in KAIST KSE program

    • Research Topic : Data science, Object detection, Human factors

6. Reference & Additional materials

  • Reference :

    • Points as Queries: Weakly Semi-supervised Object Detection by Points

    • https://arxiv.org/abs/1612.03144

    • https://wikidocs.net/145910
