[Online] 10th Summer School of the Image Understanding Research Group, Signal Processing Society, Institute of Electronics Engineers

Lecture Summaries

Speaker	Lecture Content
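Dr. Eunsoo Park (박은수, MODULABS)	CNN Architecture & Transfer Learning with TensorFlow 2 (includes hands-on practice with Jupyter Notebook) Email: es.park@modulabs.co.kr Transfer learning is one of the most commonly used techniques in computer vision research. This course studies the CNN architectures most often used for transfer learning, with the goal of understanding CNNs in greater depth. Hands-on exercises in layer implementation and transfer learning with TensorFlow 2 are interspersed throughout the lecture so that participants can apply what they have learned.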
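Prof. Changick Kim (김창익, KAIST)	Photometric Stereo Email: changick@gmail.com Photometric stereo is an area of 3D visual perception that recovers the surface normals of an object using several light sources in different directions. It rests on the fact that the amount of light reflected from an object's surface depends on the directions of the light source and the surface. This lecture first reviews Lambertian reflectance and then explains three approaches to estimating surface normals in photometric stereo: calibrated, uncalibrated, and semi-calibrated. It also explains how to solve a homogeneous linear equation according to the rank of the linear system's matrix, and shows that this method is widely used not only in photometric stereo but also in 3D reconstruction in multi-view geometry.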
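Prof. Byoung Chul Ko (고병철, Keimyung University)	Image Registration Based on Conventional Features to Convolutional Features Email: niceko@kmu.ac.kr In image registration, traditional algorithms such as SIFT and SURF are increasingly being replaced by methods based on CNN feature extraction and CNN feature descriptors. This lecture reviews conventional image registration techniques and then introduces how registration between transformed images is performed with CNN feature descriptors, covering approaches from basic CNNs to deep graphical feature learning.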
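Prof. Chan-Su Lee (이찬수, Yeungnam University)	Deep Learning Based Facial Expression Recognition (includes hands-on practice with Jupyter Notebook) Email: chansu@ynu.ac.kr This seminar walks step by step, using OpenCV and Python, through the process of deep learning based facial expression recognition: detecting faces, normalizing the detected faces to build training and test data, and training an expression recognition model on that data. It is designed so that participants learn the basics of deep learning training for facial expression recognition.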
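Prof. Hyeyoung Park (박혜영, Kyungpook National University)	Deep Learning Models for Few-shot Classification Email: hypark@knu.ac.kr Typical deep learning models achieve generalization by learning from large amounts of data, so the amount of data directly determines model performance. This contrasts with humans, who can recognize new patterns from only a handful of samples. Few-shot classification aims to overcome this limitation of existing pattern recognition systems and to develop classifiers that perform reliably with very few samples. This lecture introduces the concept of few-shot classification, the meta-learning techniques used to realize it, and a variety of deep learning models for few-shot classification.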
Prof. Jonghyun Choi (최종현, GIST)	Data and visual recognition with and without deep learning Email: jhc@gist.ac.kr The role of data in visual recognition has been critical both before and after deep learning swept into computer vision. I will discuss representative techniques that use data for visual recognition, both before and during the deep learning era.
Prof. Hamid Krim (North Carolina State University, USA)	From Subspace to Deep Structure Learning: Playing with Pixels and Atoms Email: ahk@ncsu.edu In this lecture, we show that structure learning may be carried out at different scales, which in turn reflect the intrinsic data structure of interest for specific applications. Given the typically limited number of degrees of freedom of any data, we propose a lower-rank structure for the information space relative to its embedding space. We further argue that the self-representative nature of the data strongly suggests the flexible union-of-subspaces (UoS) model as a generalization of a linear subspace model. This proposed structure preserves the simplicity of linear subspace models, with the additional capacity for piecewise linear approximation of nonlinear data. We show a sufficient condition for using l1 minimization to reveal the underlying UoS structure, and further propose a bi-sparsity model (RoSure) as an effective strategy for recovering the UoS characterization of the given data from non-conforming errors and corruptions. This structural characterization, albeit powerful for many applications, can be shown to be limited for large-scale data (images) with commonly shared features and across different applications. We make a case for further refinement by invoking a joint and principled scale-structure atomic characterization, which is demonstrated to improve performance. The resulting Deep Dictionary Learning approach is based on symbiotically formulating a classification problem regularized by a reconstruction problem. A theoretical rationale is also provided to contrast this work with Convolutional Neural Networks, with demonstrably competitive performance. We also propose a novel form of Deep Structure Learning, which we refer to as Volterra Neural Networks; these do away with the nonlinear transformations (ReLU, Sigmoid, etc.) and are shown to be capable of outperforming state-of-the-art CNNs in video processing problems. Substantiating examples are provided, and the application and performance of these approaches are shown for a wide range of problems such as video segmentation and object classification. Bio: Hamid Krim (ahk@ncsu.edu) is presently Professor of Electrical Engineering in the ECE Department, North Carolina State University, Raleigh, leading the Vision, Information and Statistical Signal Theories and Applications Laboratory. His research interests are in statistical signal and image analysis and mathematical modeling, with a keen emphasis on applied problems in classification and recognition using geometric and topological tools. His research has been funded by many federal and industrial agencies, including an NSF CAREER award. He has served on the IEEE editorial board of SP and on the TCs of SPTM and the Big Data Initiative, and as an AE of the new IEEE Transactions on Signal and Information Processing over Networks and of the IEEE SP Magazine. He is also one of the 2015-2016 Distinguished Lecturers of the IEEE SP Society.
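Prof. Injung Kim (김인중, Handong Global University)	Attention Models and Memory Networks Email: ijkim@handong.edu The attention model is a core deep learning technique used widely across fields such as image processing, natural language processing, and speech processing. This lecture explains the principles of the encoder-attention-decoder model used in many of these fields and introduces memory networks, which are closely related to attention models. It also covers the Transformer, which shows the strongest performance in natural language and speech processing, and the visual attention models that have recently become common in CNNs.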
Prof. Chih-Chung Hsu (National Pingtung University of Science and Technology)	Supervised and semi-supervised learning for image recognition Email: m121754@gmail.com With the rapid growth of deep learning-based applications for computer vision and image processing, several effective and efficient models such as ResNet, DenseNet, ResNeXt, and EfficientNet have been proposed to achieve SOTA performance on various tasks in a supervised manner. However, few studies focus on semi-supervised learning for computer vision applications. In this talk, I would like to introduce a particular semi-supervised learning strategy, called pairwise learning, which learns a common feature representation for different image processing and computer vision applications. I will show how pairwise learning benefits various computer vision tasks and suggest some potential research topics on deep pairwise learning for the future. Bio: Chih-Chung Hsu has been an Assistant Professor in the Department of Management Information Systems (MIS) at the National Pingtung University of Science and Technology (NPUST) since February 2018. He earned his Ph.D. in Electrical Engineering from the National Tsing Hua University (NTHU) in 2014. His research interests lie mainly in computer vision, image/video processing, machine learning, and deep learning. Dr. Hsu received a top 10% paper award from the IEEE International Workshop on Multimedia Signal Processing (MMSP) in 2013. Dr. Hsu won the best/top performance awards from ACM Multimedia (ACM MM) in 2017-2019. He received the best grand challenge paper award at ACM MM in 2017. Dr. Hsu received the best student paper award from the IEEE International Conference on Image Processing (ICIP) 2019. In 2018-2019, Dr. Hsu received several challenge awards at ICCV, MMSP, VCIP, and ACM MM.
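
As a small illustration of the calibrated setting mentioned in the Photometric Stereo abstract, the following is a minimal sketch, not material from the lecture itself. It assumes a Lambertian surface with no shadows or specular highlights, known unit light directions, and NumPy; the function name `estimate_normals` and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np

def estimate_normals(intensities, light_dirs):
    """Calibrated Lambertian photometric stereo (illustrative sketch).

    intensities: (K, P) array, one row per image, one column per pixel.
    light_dirs:  (K, 3) array of known unit light directions.
    Returns unit normals of shape (P, 3) and albedo of shape (P,).
    """
    # Lambertian model without shadows: I = L @ g, where g = albedo * normal.
    # Solve for g in the least-squares sense (needs K >= 3 and rank(L) = 3).
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, P)
    albedo = np.linalg.norm(g, axis=0)
    normals = (g / np.maximum(albedo, 1e-12)).T
    return normals, albedo

# Synthetic check with two pixels and four lights (assumed values).
true_normals = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]])
true_albedo = np.array([0.5, 0.9])
lights = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, 0, 1]], dtype=float)
lights /= np.linalg.norm(lights, axis=1, keepdims=True)

# Render intensities; every light faces both pixels, so no clamping is needed.
images = lights @ (true_normals * true_albedo[:, None]).T  # (4, 2)

normals, albedo = estimate_normals(images, lights)
```

With noise-free data and at least three non-coplanar lights, the recovered `normals` and `albedo` match the synthetic ground truth up to floating-point error. The uncalibrated and semi-calibrated settings named in the abstract need additional constraints and are not covered by this sketch.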