Interpretability in AI

Why do we need interpretability in AI?

It's not JUST about being responsible.
Of course, interpretability is a tool to improve responsibility,
+ and the more we know about what we do, the more conscious we become of what we are doing.
But interpretability is a broader concept: it deals with a fundamental underspecification in the problem
(humans often don't know exactly what they want).

ex) safety: it is hard to spell out in advance exactly what "safe" behavior means in every situation.

More data or a cleverer algorithm may not be what you need if your problem is underspecified.

In other words, even with more data or a better algorithm, an underspecified problem like this cannot be solved.

Good intentions are not enough.

So a spirit of "Let's just work hard on interpretability research! Let's do it!" may not be enough...
This is not a problem that good intentions alone can solve.
We have to go beyond that.
We have to think critically about what we are doing at every single step along the way.

So what should we do?

Investigating post-training interpretability methods.

Given an input bird image, a deep learning/machine learning model, and its prediction <Junco Bird>, find the evidence for that prediction.
Leave the model in the middle untouched and find the evidence with post-training methods.
Given a fixed model, find the evidence for the prediction. Why was this a Junco bird?
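- Method) To see how sensitive the prediction is, perturb the input features (pixels) slightly.
- Change a single pixel and observe how much the Junco bird probability changes.
- If it changes a lot, the pixel is important; if it barely changes, the pixel is unimportant.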
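The pixel-perturbation idea above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the exact method from the lecture: `toy_junco_prob` is a hypothetical stand-in for a trained classifier (a logistic function of a fixed weighted pixel sum), and `pixel_sensitivity` and the step size `delta` are illustrative names and values.

```python
import numpy as np

def toy_junco_prob(image: np.ndarray) -> float:
    """Hypothetical stand-in for a trained classifier: P('Junco') as a
    logistic function of a fixed weighted sum of pixel values."""
    weights = np.zeros_like(image)
    weights[1, 1] = 3.0  # this pixel drives the prediction strongly
    weights[0, 2] = 0.5  # this pixel matters a little
    logit = float(np.sum(weights * image))
    return 1.0 / (1.0 + np.exp(-logit))

def pixel_sensitivity(model, image: np.ndarray, delta: float = 0.1) -> np.ndarray:
    """Nudge one pixel at a time by +delta and record how far the predicted
    probability moves. Larger values mark more important pixels."""
    base = model(image)
    sensitivity = np.zeros_like(image, dtype=float)
    for idx in np.ndindex(image.shape):
        perturbed = image.copy()
        perturbed[idx] += delta
        sensitivity[idx] = abs(model(perturbed) - base)
    return sensitivity

image = np.full((3, 3), 0.5)  # tiny 3x3 "image" of uniform gray pixels
s = pixel_sensitivity(toy_junco_prob, image)
print(np.round(s, 4))
```

In this toy setup, pixels with zero weight get a sensitivity of exactly 0, and pixel (1, 1) gets the largest value.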
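If the model changes (and with it the prediction), the explanation should of course change too.
- Is that actually the case? No.
- Even when one of the model's layers was changed, e.g. by filling it with random values, the resulting explanations stayed the same.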
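A check like the one described above can be sketched as follows, similar in spirit to the model-randomization test in "Sanity Checks for Saliency Maps" (Adebayo et al., 2018). This is a minimal sketch assuming a tiny two-layer ReLU network in NumPy: compute an explanation, replace the top layer with random weights, recompute, and compare the two maps by rank correlation. All names (`gradient_saliency`, `input_only_explanation`, `rank_corr`) and sizes are hypothetical. `input_only_explanation` is a deliberately bad "explanation" that ignores the model entirely, so it cannot react when the layer is randomized; that is the failure mode the notes describe.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_saliency(W1, w2, x):
    """|d out / d x| for a hypothetical tiny net: out = w2 . relu(W1 @ x)."""
    relu_grad = (W1 @ x > 0).astype(float)  # derivative of ReLU at each hidden unit
    return np.abs(W1.T @ (w2 * relu_grad))  # chain rule back to the input

def input_only_explanation(W1, w2, x):
    """Deliberately bad 'explanation': depends only on the input, not the weights."""
    return np.abs(x)

def rank_corr(a, b):
    """Spearman rank correlation (assumes no ties, which holds for random floats)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

n_in, n_hidden = 64, 32
W1 = rng.normal(size=(n_hidden, n_in))
w2 = rng.normal(size=n_hidden)
x = rng.normal(size=n_in)
w2_random = rng.normal(size=n_hidden)  # the "randomized" top layer

results = {}
for name, explain in [("gradient", gradient_saliency), ("input-only", input_only_explanation)]:
    before = explain(W1, w2, x)
    after = explain(W1, w2_random, x)
    results[name] = rank_corr(before, after)
    print(f"{name:>10}: rank correlation before vs. after randomization = {results[name]:.2f}")
```

A correlation near 1.0 (which the input-only explanation gives by construction) means the explanation did not react to the model changing; an explanation that really reflects what the model learned should show a clearly lower value.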

This post was written after watching a lecture by Google Brain researcher Been Kim at the Seoul National University AI Summer School.

In Pursuit of Interpretability - Been Kim

nongdevlog