{"id":832,"date":"2019-06-20T06:33:09","date_gmt":"2019-06-20T06:33:09","guid":{"rendered":"http:\/\/wpcharming.wpengine.com\/construction\/?p=1"},"modified":"2020-03-19T02:14:04","modified_gmt":"2020-03-19T02:14:04","slug":"introduction-to-image-matching-principles","status":"publish","type":"post","link":"https:\/\/anlab.jp\/en\/introduction-to-image-matching-principles\/","title":{"rendered":"Introduction to image-matching principles"},"content":{"rendered":"<p>MARCH 27TH, 2014<\/p>\n<p>At A.N.Lab, we develop various applications based on the image-matching and AR (Augmented Reality) technology.<\/p>\n<p>For example, we have developed applications to detect celebrities\u2019 faces from recorded TV programmes, to do book stocktaking from bookshelf pictures, or to recognize corporate logos on paper printings.<\/p>\n<p><!--more--><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3299 size-full aligncenter\" src=\"https:\/\/picmatch.io\/home3\/picmatch\/public_html\/wp-content\/uploads\/2014\/03\/Blog-Principle-1.png\" alt=\"\" width=\"603\" height=\"150\" srcset=\"https:\/\/anlab.jp\/wp-content\/uploads\/2014\/03\/Blog-Principle-1.png 603w, https:\/\/anlab.jp\/wp-content\/uploads\/2014\/03\/Blog-Principle-1-300x75.png 300w, https:\/\/anlab.jp\/wp-content\/uploads\/2014\/03\/Blog-Principle-1-600x149.png 600w\" sizes=\"auto, (max-width: 603px) 100vw, 603px\" \/><\/p>\n<p>The base technology for these application is image-matching.<\/p>\n<p>Here I would like to introduce certain basics of the technology.<br \/>\nDon\u2019t expect to be able to build commercial-level products just by reading these basics, but at least these will give you an idea of how the technology works.<\/p>\n<hr \/>\n<h3><span style=\"color: #bb0000;\">Matching a pair of images<\/span><\/h3>\n<p>Please look at the below two images.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1230 size-full\" src=\"https:\/\/picmatch.io\/home3\/picmatch\/public_html\/wp-content\/uploads\/2015\/02\/6.png\" alt=\"\" width=\"425\" height=\"150\" srcset=\"https:\/\/anlab.jp\/wp-content\/uploads\/2015\/02\/6.png 425w, https:\/\/anlab.jp\/wp-content\/uploads\/2015\/02\/6-300x106.png 300w\" sizes=\"auto, (max-width: 425px) 100vw, 425px\" \/><\/p>\n<p>You can see that the mountain on the right side of the first image and the mountain on the left side of the second image are actually the same one.<\/p>\n<p>Where does a human look at to determine these two mountains are the same ?<br \/>\nMost of the time, humans look at the\u00a0<span style=\"color: #0000ff;\">shapes<\/span>\u00a0and the\u00a0<span style=\"color: #0000ff;\">colors<\/span>.<\/p>\n<p>The image-matching method simulates this behavior. It uses \u201c<span style=\"color: #0000ff;\">feature points<\/span>\u201d that combine the concepts of shapes and colors.<br \/>\nFeature points essentially describe the shapes and the colors of the images.<\/p>\n<p>By matching these feature points, we can see the similarities between the two images.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1231 size-full\" src=\"https:\/\/picmatch.io\/home3\/picmatch\/public_html\/wp-content\/uploads\/2015\/02\/8.png\" alt=\"\" width=\"420\" height=\"150\" srcset=\"https:\/\/anlab.jp\/wp-content\/uploads\/2015\/02\/8.png 420w, https:\/\/anlab.jp\/wp-content\/uploads\/2015\/02\/8-300x107.png 300w\" sizes=\"auto, (max-width: 420px) 100vw, 420px\" \/><\/p>\n<p>There are two main steps in the image-matching process.<br \/>\nA. 
---

### A. Extracting feature points

![Building with corner points marked in yellow](https://anlab.jp/wp-content/uploads/2015/02/9.png)

The shape of the building in this image could be sketched from the positions of its **corners** (in yellow). Image matching uses these corners as feature points.

There are algorithms for extracting such corner points; the "[Harris corner response function](http://en.wikipedia.org/wiki/Corner_detection#The_Harris_.26_Stephens_.2F_Plessey_.2F_Shi.E2.80.93Tomasi_corner_detection_algorithm)" is a well-known one.

Let me skip the mathematical details. Essentially, when we apply the Harris corner function to the images below,

![Input images for the Harris corner function](https://anlab.jp/wp-content/uploads/2015/02/10.png)

we get the following results.

![Harris corner response maps](https://anlab.jp/wp-content/uploads/2015/02/11.png)

**Corners** have positive values (red).
**Lines** have negative values (deep blue).
**Flat areas** have values close to zero (light blue).

The **local maxima** (red points) of this response map become the "feature points".

![Local maxima of the Harris response](https://anlab.jp/wp-content/uploads/2015/02/12.png)

Note: a local maximum is a point whose value is higher than that of any neighboring point. It is not necessarily the point with the highest value in the whole image.

You can see that these local maxima roughly describe the shape of the object in the image.
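As a rough sketch of this step, one might compute the Harris response with OpenCV's `cv2.cornerHarris` and pick the local maxima with SciPy's `maximum_filter`. Both library choices and all parameter values here are my assumptions, not something the post prescribes.

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)

# Harris corner response: positive at corners, negative along lines,
# near zero in flat areas.
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# A feature point is a local maximum: a pixel whose response is not
# exceeded anywhere in its neighborhood (a 5x5 window here) and that
# is also comfortably above zero.
is_local_max = response == maximum_filter(response, size=5)
strong = response > 0.01 * response.max()
ys, xs = np.nonzero(is_local_max & strong)
print(len(xs), "corner feature points")
```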
#### Not only corners can be used as feature points

In the image below, it is not the shapes but the colors that best describe the characteristics of the image. In this case, we use **color areas** as the image features.

![Field of flowers where color areas are the features](https://anlab.jp/wp-content/uploads/2015/02/13.png)

This method is called "blob detection". The areas of uniform color are approximated by **circles**, and the centers of the circles become the feature points.
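The post doesn't say which blob detector is used; as one possible example, scikit-image's `blob_log` (Laplacian of Gaussian) returns exactly the circle centers and radii described above. The file name and parameter values are placeholders.

```python
import numpy as np
from skimage import io
from skimage.feature import blob_log

# Load the image as a 2-D grayscale intensity array.
img = io.imread("flowers.png", as_gray=True)

# Each detected blob approximates a region of uniform color by a circle:
# (row, col) is the circle center, i.e. the feature point, and
# sigma * sqrt(2) approximates the circle's radius.
blobs = blob_log(img, max_sigma=30, num_sigma=10, threshold=0.1)
for row, col, sigma in blobs:
    print(f"feature point at ({col:.0f}, {row:.0f}), radius {sigma * np.sqrt(2):.1f}")
```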
---

### B. Matching the feature points

Assume that we have extracted the feature points from the two images.

![Feature points extracted from both images](https://anlab.jp/wp-content/uploads/2015/02/14.png)

The more matching feature points there are, the more "similar" the two images are.

The steps for matching feature points are as follows:

i) Choose two feature points (one from each image) that are "**close**", and make them a **pair**. Let the feature point from image A be *a*, and the feature point from image B be *b*.

ii) Compute the **transformation** from *a* to *b*. The transformation is something like "rotate xx degrees, move left yy pixels, move up zz pixels, ...".

iii) Apply this transformation to the **other** feature points in image A. For each transformed feature point of image A, look into image B to see whether any feature point of B is "close" to the transformed point. (The definition of being "close" will be explained later.)

![Feature points of image A after transformation](https://anlab.jp/wp-content/uploads/2015/02/15.png)

The example above shows the result after the feature points in image A have been transformed. If a feature point of B is "close" to the position a point was transformed to, we say the two points are "matching".

iv) Repeat steps i) to iii) with different initial pairs in step i), searching for the transformation that produces the most matching pairs.

v) With the **transformation that produces the most matching pairs**, look at how many pairs there are and how "close" they are to determine whether the two images "match". In practice, there would be thresholds on the number of pairs and on the "close"-ness.
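The hypothesize-and-verify loop in steps i) to v) is essentially what the RANSAC algorithm does, and OpenCV bundles the whole search into one call. Below is a sketch under that assumption, reusing `kp_a`, `kp_b`, and `matches` from the ORB example earlier; `cv2.estimateAffinePartial2D` estimates a "rotate, scale, and shift" transformation, and the 5-pixel threshold is an arbitrary choice.

```python
import cv2
import numpy as np

# Candidate pairs: the coordinates of the matched feature points
# (kp_a, kp_b, matches come from the ORB sketch at the top).
pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])

# RANSAC repeats steps i)-iii): pick a few candidate pairs, compute the
# transformation they imply, apply it to the other points of image A,
# and keep the transformation under which the most points land "close"
# (within 5 pixels here) to a feature point of image B.
M, inlier_mask = cv2.estimateAffinePartial2D(
    pts_a, pts_b, method=cv2.RANSAC, ransacReprojThreshold=5.0)

# Step v): count the pairs that survive under the best transformation,
# then threshold that count to decide whether the images "match".
num_pairs = int(inlier_mask.sum())
print(num_pairs, "matching pairs under the best transformation")
```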
#### "Close"-ness definition

So, how do we define that two feature points are "close"?

A simple way is to take the **neighboring points** of each feature point and compare their color values point by point.

However, if the images are stretched, as in the following example, comparing point-to-point does not work.

![The same pattern stretched in different directions](https://anlab.jp/wp-content/uploads/2015/02/16.png)

These are the same image patterns, but with slightly different stretch directions. In this case, it is not easy to find the corresponding points to compare.

Therefore, we need some **direction alignment (normalization)**: we compute the **color gradients** in each direction and resize the images so that the color gradients become the same.

Since the brightness of the images may also differ, instead of comparing absolute color values it is better to compare the differences in color value between each point and its neighbors.

The result of the normalization looks like the following.

![Normalization of the stretched patches](https://anlab.jp/wp-content/uploads/2015/02/17.png)

(The left images are the originals. The middle ones are the images after resizing. The right ones are the color differences with the neighboring points.)

Even though the areas around the feature points in the original images are stretched in different directions, after normalization you can see that they have approximately the same color values and are therefore essentially the same image.

---

That was an introduction to the basics of image-matching technology.

This article used examples from [The University of Illinois](http://www.cs.illinois.edu/~slazebni/spring11/). You can refer to [this page](http://www.cs.illinois.edu/~slazebni/spring11/) for more detailed material.