Keypoints are the same thing as interest points. They are spatial locations, or points in the image, that define what is interesting or what stands out in the image. What makes keypoints special is that no matter how the image changes, whether it is rotated, shrunk/expanded, or translated (all of these are affine transformations, by the way) or subjected to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in the modified image when comparing it with the original. Here's an example from a post I wrote a while ago:

The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.

Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.

I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.

You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.

Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do keypoint detection, while other frameworks are purely description frameworks and don't detect the points. There are also some that do both: they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.

Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the concept of keypoints, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:

You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
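To make the scale idea concrete, here is a rough numpy sketch (my own illustration, not from the answer: the synthetic blob, the sigma values, and the scale-normalised Laplacian score are all assumptions). It blurs the image at several scales, scores each scale at the keypoint location, and keeps the scale with the strongest response; that winning scale is what the circle radius encodes.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # separable Gaussian blur: filter the rows, then the columns
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def scale_response(img, sigma):
    # scale-normalised Laplacian magnitude: sigma^2 * |discrete Laplacian|
    L = blur(img, sigma)
    lap = (np.roll(L, 1, 0) + np.roll(L, -1, 0) +
           np.roll(L, 1, 1) + np.roll(L, -1, 1) - 4 * L)
    return sigma**2 * np.abs(lap)

# synthetic blob of scale sigma0 = 6, centred in the image
n, sigma0 = 129, 6.0
yy, xx = np.mgrid[0:n, 0:n] - n // 2
img = np.exp(-(xx**2 + yy**2) / (2 * sigma0**2))

sigmas = [3, 4, 5, 6, 7, 8, 9, 10]
resp = [scale_response(img, s)[n // 2, n // 2] for s in sigmas]
best = sigmas[int(np.argmax(resp))]
print(best)  # the strongest response appears near the blob's own scale
```

Real detectors like SIFT do this over a full difference-of-Gaussians pyramid and look for extrema in both space and scale, but the principle is the same.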

Also, I found out that they have an orientation, but I couldn't understand what it actually is. It is an angle, but between the radius and what?

Basically, if you want to detect keypoints regardless of scale and orientation: when people talk about the orientation of a keypoint, what they really mean is that they search a pixel neighbourhood surrounding the keypoint and figure out how this neighbourhood is oriented, or what direction the patch points in. It depends on which descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching, so that you can match keypoints together. Take a look at the first figure I have with the two cameramen, one rotated while the other isn't. How do we figure out that a point in one image matches a point in the other? We can identify that the top of the cameraman as an interest point matches the rotated version because we look at the points that surround the keypoint, see what orientation all of those points are in, and from there, that's how the orientation is computed.
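The gradient-histogram idea can be sketched in a few lines of numpy (my own illustration; real SIFT additionally Gaussian-weights the votes by distance from the keypoint and interpolates the histogram peak):

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Dominant gradient orientation of a square patch, SIFT-style sketch."""
    gy, gx = np.gradient(patch.astype(float))        # image gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0     # gradient angle in [0, 360)
    bins = (ang / (360.0 / n_bins)).astype(int) % n_bins
    # magnitude-weighted histogram of gradient directions
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    peak = int(np.argmax(hist))
    return (peak + 0.5) * (360.0 / n_bins)           # centre of the winning bin

# a patch whose intensity increases left-to-right has gradients near 0 degrees;
# its transpose has gradients near 90 degrees
ramp = np.tile(np.arange(16, dtype=float), (16, 1))
print(dominant_orientation(ramp))
print(dominant_orientation(ramp.T))
```

Rotating the patch rotates the histogram, which is exactly why the dominant orientation lets you match a keypoint against its rotated counterpart.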

Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
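For the matching side, here is a minimal sketch of how descriptors get compared, assuming plain brute-force nearest neighbours plus Lowe's ratio test (the descriptors below are random vectors, purely for illustration; in OpenCV a DescriptorMatcher does this for you):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.
    d1, d2: (N, D) and (M, D) descriptor arrays; returns (i, j) index pairs."""
    matches = []
    for i, d in enumerate(d1):
        dist = np.linalg.norm(d2 - d, axis=1)   # distance to every candidate
        j, k = np.argsort(dist)[:2]             # best and second-best
        # keep the match only if the best is clearly better than the runner-up
        if dist[j] < ratio * dist[k]:
            matches.append((i, int(j)))
    return matches

rng = np.random.default_rng(1)
d2 = rng.normal(size=(20, 8))                    # descriptors from image 2
d1 = d2[[3, 7, 11]] + 0.01 * rng.normal(size=(3, 8))  # noisy copies in image 1
print(match_descriptors(d1, d2))
```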

opencv - What are keypoints in image processing? - Stack Overflow


If you have managed to find matches between the image and the scene, then I suggest you apply cv::findHomography(). It will calculate the homography matrix from your matches (it needs at least 4 of them as input).

You can convert to camera pose from the homography matrix directly.
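A sketch of that conversion, assuming the matched points lie on the plane z = 0 and the camera intrinsics K are known (the K, R, and t values below are made up for the round-trip check). OpenCV's cv::decomposeHomographyMat does a more careful job; the bare math looks roughly like this:

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover camera pose (R, t) of a z = 0 plane from a homography.
    Assumes H maps plane coordinates (x, y, 1) to pixel coordinates."""
    M = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(M[:, 0])   # homography is only known up to scale
    if M[2, 2] < 0:                       # keep the plane in front of the camera
        lam = -lam
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    r3 = np.cross(r1, r2)
    R = np.column_stack([r1, r2, r3])
    # re-orthonormalise R via SVD: noisy homographies rarely give an exact rotation
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t

# round-trip check with a synthetic pose (hypothetical intrinsics)
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
th = np.radians(20)
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0,           0,          1]])
t = np.array([0.1, -0.2, 2.0])
H = K @ np.column_stack([R[:, 0], R[:, 1], t])
R2, t2 = pose_from_homography(H, K)
```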

c++ - Sift Extraction - opencv - Stack Overflow



Are the images taken standing from the same position but you're just rotated a bit so that they're not aligned correctly? If so then the images are related by a homography - i.e. a projective transformation. Given a set of correspondences between the images (you need at least 4 pairs), the standard way to find the homography is to use the DLT algorithm.
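A minimal numpy sketch of the DLT (my own illustration; a production version would first normalise the points, per Hartley, for numerical stability):

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct Linear Transform: fit H so that dst ~ H @ src in homogeneous
    coordinates. src, dst: (N, 2) arrays of matched points, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the solution is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# round-trip: project 4 corners through a known homography and recover it
H_true = np.array([[1.2, 0.1, 5], [-0.05, 0.9, 3], [1e-4, 2e-4, 1]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
pts = np.column_stack([src, np.ones(4)]) @ H_true.T
dst = pts[:, :2] / pts[:, 2:]
H_est = dlt_homography(src, dst)
```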

Calculate offset/skew/rotation of similar images in C++ - Stack Overfl...


The most elegant and fastest solution would be to find the homography matrix, which maps rectangle coordinates to photo coordinates.

With a decent matrix library it should not be a difficult task, as long as you know your math.

However, the recursive algorithm above should also work; if your resources are limited, though, projective geometry is probably the only way to go.
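A sketch of that mapping, assuming the rectangle is the unit square and fixing H[2,2] = 1 so the eight remaining entries come out of a plain linear solve (the quad corners below are made up for illustration):

```python
import numpy as np

def square_to_quad(quad):
    """Homography taking the unit square (0,0),(1,0),(1,1),(0,1) to `quad`
    (a 4x2 array of photo-space corners), with H[2,2] fixed to 1."""
    src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
    A, b = [], []
    for (x, y), (u, v) in zip(src, quad):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp(H, pts):
    """Apply a homography to (N, 2) points, dividing out the third coordinate."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:]

# map a 5x5 grid of the unit square onto a perspective quad
quad = np.array([[20.0, 30], [200, 40], [180, 160], [40, 150]])
H = square_to_quad(quad)
u = np.linspace(0, 1, 5)
grid = np.array([[x, y] for y in u for x in u])
photo_pts = warp(H, grid)   # perspective-correct grid positions in the photo
```

Drawing lines through `photo_pts` row by row and column by column gives the perspective-correct grid; the division by the third homogeneous coordinate is what makes interior lines converge correctly, which plain bilinear interpolation of the corners would get wrong.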

algorithm - How to draw a Perspective-Correct Grid in 2D - Stack Overf...


How do we find the homography transform, then?

I would say that you don't have enough overlap between your images. If you look at your matches (what you call "drawed features"), most of them are wrong. As a first test, try to stitch two images that have, say, 80% overlap.

When you stitch two images, you assume that there exists a projective transform (your "homography") that will project features from one image onto the other one. When you know this transform, then you know the relative position of your images and you can "put them together". If the homography transform that you find is bad, then the stitching will be bad as well.

• First of all, you detect features (with your FeatureDetector) on both images.
• Then, you describe them (with your DescriptorExtractor). Basically this creates a representation of your features, so that you can compare two features and see how similar they are.
• You match (using your DescriptorMatcher) features from the first image to the features from the second image. It means that for each feature in the first image, you try to find the most similar one in the second image. Those are your "drawed features".
• From those matches, you use an algorithm called "RANSAC" to find the homography transform corresponding to your data. The idea is that you try to find a set of matches from all your "drawed features" that makes sense geometrically.
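The four steps above can be sketched in numpy on synthetic matches (my own illustration; in OpenCV you would just call cv::findHomography with the RANSAC flag, which bundles the last step):

```python
import numpy as np

def fit_homography(src, dst):
    """Exact homography from 4 correspondences, with H[2,2] fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        try:
            H = fit_homography(src[idx], dst[idx])  # hypothesis from a minimal sample
        except np.linalg.LinAlgError:
            continue                                # degenerate (collinear) sample
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh                      # matches that agree geometrically
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# synthetic data: 30 good matches under a known homography plus 20 bad ones
H_true = np.array([[1.1, 0.05, 10], [-0.02, 0.95, 5], [1e-4, 5e-5, 1]])
rng = np.random.default_rng(42)
src = rng.uniform(0, 100, size=(50, 2))
dst = project(H_true, src)
dst[30:] = rng.uniform(0, 300, size=(20, 2))        # corrupt the last 20 matches
H_est, inliers = ransac_homography(src, dst)
```

This is the "makes sense geometrically" part: the winning homography is the one that the largest set of matches agrees with, and the bad matches simply fall outside the threshold.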

If you look at your "drawed features", you will see that only a few, on the "Go" part of "Google" and some in the bookmarks, correspond, while the others are wrong. It means that most of your matches are bad, which makes it possible to find a homography that fits this data but is nonetheless wrong.

In order to have a better homography, you would need much more "good" matches. Consequently, you probably need to have more overlap between your images.

It seems you are right! But is there a better way to find good matches? It is very bad if you have to use images with at least 80% overlap.

It depends on the situation. But in this case, you had something like 10%, which is not enough...

You will always have a proportion of bad matches. You just need enough good matches. In this case, increase the overlap...

First of all, thank you for your attention. I tried to draw good matches by means of features2d, but it crashes at run time. I asked about it here: stackoverflow.com/questions/27699524/ Do you have any suggestion about that problem?

android - Opencv4Android,stitching two images - Stack Overflow


RQDecomp3x3 has a problem returning rotation about axes other than Z, so this way you only find the spin about the z axis correctly. If you find the projection matrix and pass it to decomposeProjectionMatrix, you will get better results. Note that the projection matrix is different from the homography matrix; you should pay attention to this point.

For the rotation axis this is OK, I only need the Z rotation in my case...

decomposeProjectionMatrix calls RQDecomp3x3 as well, but only after calculating some stuff.
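For the Z-rotation-only case, here is a sketch of pulling the spin out of a homography, assuming it comes from a pure camera rotation (H ~ K R K^-1) and the intrinsics K are known; the K value below is made up for the round-trip check:

```python
import numpy as np

def z_rotation_from_homography(H, K):
    """If the homography comes from a pure camera rotation (H ~ K R K^-1),
    recover R and read off the spin about the z axis, in degrees."""
    R = np.linalg.inv(K) @ H @ K
    R /= np.cbrt(np.linalg.det(R))      # fix the arbitrary homography scale
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

K = np.array([[700.0, 0, 320], [0, 700, 240], [0, 0, 1]])  # hypothetical intrinsics
th = np.radians(33)
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0,           0,          1]])
H = 2.5 * (K @ R @ np.linalg.inv(K))    # homography is only known up to scale
print(z_rotation_from_homography(H, K))
```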

opencv - findHomography() / 3x3 matrix - how to get rotation part out ...

