## algorithm Finding largest subset of points forming a convex polygon?

unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind) { ind_t child = sorted[from][to_ind]; while (child < i_left) child = sorted[from][++to_ind]; if (child == i_left) return 1; if (auto v = memorize[from][child]) return v; const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1, dpStep(i_left, from, static_cast<ind_t>(to_ind + 1))); memorize[from][child] = static_cast<ind_t>(x); return x; }

- Periodically prune all points to the left of the "leftmost" point from preprocessed arrays.

- Preprocess the points to get two arrays: (a) each point sorted by direction relative to each other point and (b) position in first array of the end of line segment making least possible left turn relative to the direction of "parent" segment.

- Sort the points in left-right direction

- Take the longest of paths found on step 3.

- Use each point as the leftmost point and find the longest convex polygon with "simplified" algorithm.

@Naman: It looks clear enough. I could only try to explain it in different words. On each DP step there are 3 groups of points: (a) 4 points mentioned (first/last pairs), (b) points already in the hull (they are already chosen by DP algorithm, so they satisfy convexity condition and they are optimal), (c) all other points. On each step algorithm tries every point from group (c) independently and checks convexity condition for this point relative to points of group (a). If the points fits, there is no need to check its convexity relative to points of group (b), it is guaranteed to be satisfied.

@Naman: Now I understand the last part of your question. Yes, points are arranged. In your example first point in the pair (and first in sequence) is (3,1) and the last point is (7,1). So (5,3) cannot be inserted after (7,1). If you do this you'll get {(8,2), (7,1), (5,3), (3,1), (2,2)} which is not convex.

All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N2) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N2) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).

And its solution (O(N3) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.

Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.

Both these alternatives need O(N2 log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N2) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.

General problem (where the leftmost point is not fixed) could be solved this way:

Here is C++ implementation of this dynamic programming algorithm:

Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.

I am sorry to bother but I am unable to comprehend one thing on the given URL cerberus.delos.com:790/TESTDATA/DEC08.fence.htm It's written as 'the first and last two points in the hull are sufficient to check that the next point doesn't break the convexity condition'. Can you please explain it? Why all the points are not required for that? Are the points arranged in particular order?

Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left) and preprocessed array containing left childs for the graph (lefts[][]).

Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N2) time.

Or we could exclude atan2 and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.

Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).

So you mean before the given algorithm I need to sort/order the given points? If so order how(by x,y or clockwise?). I am sorry if I am asking silly questions. If you can help me with some link that has a detailed explanation, that would also be really helpful.

Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.

Somehow I am still unable to understand as why is it guaranteed. Can I please explain a case. Consider first pair (2,2), (3,1), last pair (8,2), (7,1) and points already in hull (6,6),(7,5). Now a new point (5,3) comes. It will succeed convexity condition from first and last pair but not when we consider it against all the points including in group (b). I know I am making some trivial mistake in understanding but I would really appreciate if you can correct me.

Step 4 is optional. It does not improve O(N3) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.

The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.

There are several alternatives for preprocessing. We could compute every angle (with atan2) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2 may be too expensive, especially for smaller point sets.

To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:

## algorithm Finding largest subset of points forming a convex polygon?

upper_hull(most left point, previous point, current point) { ret = 0 if (current point != most left point) /* at anytime we can decide to end the upper hull and build lower_hull from current point ending at most left point */ ret = min(ret,lower_hull(most left point, -1, current point)) for all point on the right of current point /* here we try all possible next point that still satisfy the condition of convex polygon */ if (cross(previous point,current point,next point) >= 0) max(ret,1+upper_hull(most left point, current point, next point)) return ret; } lower_hull(most left point, previous point, current point) { if (current point == most left point) return 0; ret = -INF /* it might be impossible to build a convex hull here, so if it does it will return -infinity */ for all point on the left of current point and not on the left of most left point if (cross(previous point,current point,next point) >= 0) max(ret,1+lower_hull(most left point, current point, next point)) return ret; }

First sort the point based on x axis then for tie sort by y axis, then try all point as most left point to run the upper_hull(p,-1,p) , please tell me if there's any flaw in this algorithm

One possibility is to precalculate all of the cross-product results (O(N^3)) and break them into two graphs based on if the result is positive or negative, then use depth first search starting on the left most point to find the longest of the shortest paths. It seems like it should be doable in O(E), which since edge is a triple (a,b),(b,c), is O(N^3). However, you then have to do this for O(N) points (for each left-most point), results in an overall time complexity of O(N^4). Therefore, I'm fairly sure now that you can't get better than O(N^4).

Thanks for the bounty, glad I could be of help.

This is my a Dynamic Programming O(N^4) algorithm with idea from Andrew's Algorithm posted by Nuclearman, i'm still looking for a O(N^3) algorithm

## algorithm Finding largest subset of points forming a convex polygon?

def maximal_convex_hull(points): # Expects a list of 2D points in the format (x,y) points = sorted(set(points)) # Remove duplicates and sort if len(points) <= 1: # End early return points def cross(o, a, b): # Cross product return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]) # Use a queue to extend Andrew's algorithm in the following ways: # 1. Start a new convex hull for each successive point # 2. Keep track of the largest convex hull along the way Q = [] size = 0 maximal_hull = None for i,p in enumerate(points): remaining = len(points) - i + 1 new_Q = [] for lower, upper in Q: # Go through the queue # Build upper and lower hull similtanously, slightly less efficent while len(lower) >= 2 and cross(lower[-2], lower[-1], p) < 0: lower.pop() lower.append(p) while len(upper) >= 2 and cross(upper[-2], upper[-1], p) > 0: upper.pop() upper.append(p) new_size = len(lower) + len(upper) if new_size > size: # Check for a new highest maximal convex hull size = new_size maximal_hull = (lower, upper) # There is an odd bug? that when one or both if statements are removed # xqg237.tsp (TSPLIB) gives slightly different solutions and is # missing a number of points it should have. # Seems like if the opposite should be true if anything since they # were intended to be easy optimizations not required code. if remaining + new_size >= size: # Don't bother if not enough new_Q.append((lower, upper)) # Add an updated convex hulls if remaining > size: # Don't bother if not enough new_Q.append(([p], [p])) # Add a new convex hull Q = new_Q # Update to the new queue # Extract and merge the lower and upper to a maximium convex hull # Only one of the last points is ommited because it is repeated # Asserts and related variables used for testing # Could just have "return lower[:-1] + list(reversed(upper[:-1]))[:-1]" lower, upper = maximal_hull max_hull_set = set(lower) | set(lower) | set(upper) lower = lower upper = list(reversed(upper[:-1]))[:-1] max_convex_hull = lower + upper assert len(lower) + len(upper) == len(max_hull_set) assert max_hull_set == set(max_convex_hull) return max_convex_hull

Andrew's algorithm is O(N) (but requires a sort, making it O(N log N)), this algorithm runs O(N) versions of andrew's algorithm. Making for O(N^2) runtime. Although, the algorithm may not go far enough.

Edit: This algorithm isn't correct, but it served as inspiration for the correct algorithm and thus I'm leaving it here.

Originally, I came up with a O(N^4) algorithm, but noticed was way over-complicating it and brought it down to O(N^2) algorithm. It seems like there maybe a bug in the code somewhere that causes issues in at least the case of a vertical line of points. It might be a special case (requiring changing a few lines of code), or a sign of a larger flaw in the algorithm. If it's the latter, then I'm inclined to say it's NP-hard, and offer the algorithm as a heuristic. I've spent all the time I care to coding and debugging it. In any case it seems to work fine in other cases. However, it's difficult to test for correctness when the correct algorithms seem to be O(2^N).

Thanks! i for the idea, by extending Andrew's algorithm, i get a O(N^4) algorithm using Dynamic Programming that i think doesn't have any flaw, I'm still have some doubt in your O(N^2) but i will check it out again in the weekend :)

What's the point set for that example? You might have a point that it should be considering the upper and lower hulls separately. Although that would still result in a O(N^2) algorithm, just with a larger hidden coefficients.

Yes, but i have some doubt in correctness of that algorithm, can you explain a little how the algorithm work since i'm not really familiar with phython

like in this picture imgur.com/DsJhF71, wasn't EFG pop-ed when C inserted, although the optimal upper hull is AEFGD

## Discussion