
## Why do we care about the asymptotic bound of an algorithm?

First let's understand what big O, big Theta and big Omega are. They are all sets of functions.

Big O gives an asymptotic upper bound, while big Omega gives a lower bound. Big Theta gives both.

T(n) is Theta(f(n)) if and only if it is both O(f(n)) and Omega(f(n)); that is, Theta(f(n)) is the intersection of O(f(n)) and Omega(f(n)).

For example, merge sort's worst case is both O(n*log(n)) and Omega(n*log(n)) - and thus is also Theta(n*log(n)). It is also O(n^2), since n^2 is asymptotically "bigger" than it. However, it is not Theta(n^2), since the algorithm is not Omega(n^2).

O(f(n)) is an asymptotic upper bound. If T(n) is O(f(n)), it means that from a certain n0 on, there is a constant C such that T(n) <= C * f(n). Big-Omega, on the other hand, says there is a constant C2 such that T(n) >= C2 * f(n).

Not to be confused with worst, best and average case analysis: all three notations (Omega, O, Theta) are unrelated to the best, worst and average case analysis of algorithms. Each notation can be applied to each kind of analysis.

We usually use it to analyze the complexity of algorithms (like the merge sort example above). When we say "Algorithm A is O(f(n))", what we really mean is "the algorithm's complexity under the worst-case¹ analysis is O(f(n))", meaning it scales similarly to (or formally, no worse than) the function f(n).

Well, there are many reasons for it, but I believe the most important of them are:

- It is much harder to determine the exact complexity function, thus we "compromise" on the big-O/big-Theta notations, which are informative enough theoretically.

- The exact number of operations is also platform dependent. For example, suppose we have a vector (list) of 16 numbers and want to sum them. How many operations will it take? It depends: some CPUs allow vector additions while others don't, so the answer varies between implementations and machines, which is an undesired property. The big-O notation, however, is much more constant between machines and implementations.

To demonstrate this issue, have a look at the following graphs:

It is clear that f(n) = 2*n is "worse" than f(n) = n. But the difference is not nearly as drastic as it is from the other functions. We can see that f(n) = log(n) quickly gets much lower than the other functions, and f(n) = n^2 quickly gets much higher than the others. So, for the reasons above, we "ignore" the constant factors (the 2* in the graphs example) and keep only the big-O notation.

In the above example, f(n)=n and f(n)=2*n will both be in O(n) and in Omega(n) - and thus will also be in Theta(n). On the other hand, f(n)=log(n) will be in O(n) (it is "better" than f(n)=n), but will NOT be in Omega(n) - and thus will also NOT be in Theta(n). Symmetrically, f(n)=n^2 will be in Omega(n), but NOT in O(n), and thus is also NOT Theta(n).

¹ Usually, though not always: when the analysis class (worst, average or best) is missing, we really mean the worst case.

+1 for the nice explanation. I have a confusion: you have written in the last line that f(n)=n^2 will be in Omega(n) but not in O(n), and thus is also not Theta(n). How?

@krishnaChandra: f(n) = n^2 is asymptotically stronger than n, and thus is Omega(n). However, it is not O(n) (because for large n values, it is bigger than c*n, for any constant c). Since we said Theta(n) is the intersection of O(n) and Omega(n), and it is not O(n), it cannot be Theta(n) either.

It's great to see someone explain how big-O notation isn't related to the best/worst case running time of an algorithm. There are so many websites that come up when I google the topic that say O(T(n)) means the worst case running time.

@almel It's 2*n (2n, two times n) not 2^n

@VishalK 1. Big O is the upper bound as n tends to infinity. 2. Omega is the lower bound as n tends to infinity. 3. Theta is both the upper and lower bound as n tends to infinity. Note that all bounds are only valid "as n tends to infinity", because the bounds do not hold for low values of n (less than n0). The bounds hold for all n >= n0, but not below n0, where lower order terms become dominant.

## algorithm - What exactly does big Ө notation represent? - Stack Overfl...

This is one of the faster algorithms, up to 170!. It fails inexplicably beyond 170!, and it's relatively slow for small factorials, but for factorials between 80 and 170 it's blazingly fast compared to many algorithms.

curl http://www.google.com/search?q=170!

Let me know if you find a bug, or faster implementation for large factorials.

This algorithm is slightly slower, but gives results beyond 170:

curl http://www58.wolframalpha.com/input/?i=171!

Use MPFR (mpfr.org). It allows floats with exponents in the 2^(2^32) range, or so...

... and the formatting of the page is destroyed!

## Factorial Algorithms in different languages - Stack Overflow

There is precisely one algorithm with runtime O(1/n), the "empty" algorithm.

For an algorithm to be O(1/n) means that it executes asymptotically in fewer steps than the algorithm consisting of a single instruction. If it executes in fewer steps than one step for all n > n0, it must consist of precisely no instructions at all for those n. Since checking 'if n > n0' costs at least 1 instruction, it must consist of no instructions for all n.

Summing up: The only algorithm which is O(1/n) is the empty algorithm, consisting of no instruction.

This is the only correct answer in this thread, and (despite my upvote) it is at zero votes. Such is StackOverflow, where "correct-looking" answers are voted higher than actually correct ones.

No, it's rated 0 because it is incorrect. Expressing a big-O value in relation to N when it is independent of N is incorrect. Second, running any program, even one that just exits, takes at least a constant amount of time, O(1). Even if that weren't the case, it'd be O(0), not O(1/n).

Any function that is O(0) is also O(1/n), and also O(n), also O(n^2), also O(2^n). Sigh, does no one understand simple definitions? O() is an upper bound.

@kenj0418 You managed to be wrong in every single sentence. "Expressing a big-Oh value in relation to N when it is independent of N is incorrect." A constant function is a perfectly good function. "Second, running any program, even one that just exits, takes at least a constant amount of time, O(1)." The definition of complexity doesn't say anything about actually running any programs. "it'd be O(0), not O(1/n)". See @ShreevatsaR's comment.

it seems that you are not a plumber. good answer.

## theory - Are there any O(1/n) algorithms? - Stack Overflow

To describe a permutation of n elements, you see that for the position that the first element ends up at, you have n possibilities, so you can describe this with a number between 0 and n-1. For the position that the next element ends up at, you have n-1 remaining possibilities, so you can describe this with a number between 0 and n-2. Et cetera until you have n numbers.

As an example for n = 5, consider the permutation that brings abcde to caebd.

- a, the first element, ends up at the second position, so we assign it index 1.

- b ends up at the fourth position, which would be index 3, but it's the third remaining one, so we assign it 2.

- c ends up at the first remaining position, so we assign it 0.

- d ends up at the last remaining position, which (out of only two remaining positions) is 1.

- e ends up at the only remaining position, indexed at 0.

So we have the index sequence {1, 2, 0, 1, 0}.

Now you know that for instance in a binary number, 'xyz' means z + 2y + 4x. For a decimal number, it's z + 10y + 100x. Each digit is multiplied by some weight, and the results are summed. The obvious pattern in the weight is of course that the weight is w = b^k, with b the base of the number and k the index of the digit. (I will always count digits from the right and starting at index 0 for the rightmost digit. Likewise when I talk about the 'first' digit I mean the rightmost.)

The reason why the weights for digits follow this pattern is that the highest number that can be represented by the digits from 0 to k must be exactly 1 lower than the lowest number that can be represented by only using digit k+1. In binary, 0111 must be one lower than 1000. In decimal, 099999 must be one lower than 100000.

**Encoding to variable-base**

The spacing between subsequent numbers being exactly 1 is the important rule. Realising this, we can represent our index sequence by a variable-base number. The base for each digit is the number of different possibilities for that digit. For decimal, each digit has 10 possibilities; in our system the rightmost digit would have 1 possibility and the leftmost would have n. But since the rightmost digit (the last number in our sequence) is always 0, we leave it out. That means we're left with bases 2 to n. In general, the k'th digit will have base b[k] = k + 2. The highest value allowed for digit k is h[k] = b[k] - 1 = k + 1.

Our rule about the weights w[k] of digits requires that the sum of h[i] * w[i], where i goes from i = 0 to i = k, is equal to 1 * w[k+1]. Stated as a recurrence, w[k+1] = w[k] + h[k] * w[k] = w[k] * (h[k] + 1). The first weight w[0] should always be 1. Starting from there, we have the following values:

| k   | h[k] | w[k] |
|-----|------|------|
| 0   | 1    | 1    |
| 1   | 2    | 2    |
| 2   | 3    | 6    |
| 3   | 4    | 24   |
| ... | ...  | ...  |
| n-1 | n    | n!   |

(The general relation w[k-1] = k! is easily proved by induction.)

The number we get from converting our sequence will then be the sum of s[k] * w[k], with k running from 0 to n-1. Here s[k] is the k'th (rightmost, starting at 0) element of the sequence. As an example, take our {1, 2, 0, 1, 0}, with the rightmost element stripped off as mentioned before: {1, 2, 0, 1}. Our sum is 1 * 1 + 0 * 2 + 2 * 6 + 1 * 24 = 37.

Note that if we take the maximum position for every index, we'd have {4, 3, 2, 1, 0}, and that converts to 119. Since the weights in our number encoding were chosen so that we don't skip any numbers, all numbers 0 to 119 are valid. There are precisely 120 of these, which is n! for n = 5 in our example, precisely the number of different permutations. So you can see our encoded numbers completely specify all possible permutations.

**Decoding from variable-base**

Decoding is similar to converting to binary or decimal. The common algorithm is this:

```
int number = 42;
int base = 2;
int[] bits = new int[n];
for (int k = 0; k < bits.Length; k++)
{
    bits[k] = number % base;
    number = number / base;
}
```

For our variable-base number, the only change is that the base increases by one each iteration:

```
int n = 5;
int number = 37;
int[] sequence = new int[n - 1];
int base = 2;
for (int k = 0; k < sequence.Length; k++)
{
    sequence[k] = number % base;
    number = number / base;

    base++; // b[k+1] = b[k] + 1
}
```

This correctly decodes our 37 back to {1, 2, 0, 1} (sequence would be {1, 0, 2, 1} in this code example, but whatever ... as long as you index appropriately). We just need to add 0 at the right end (remember the last element always has only one possibility for its new position) to get back our original sequence {1, 2, 0, 1, 0}.

**Permuting a list using an index sequence**

You can use the below algorithm to permute a list according to a specific index sequence. It's an O(n^2) algorithm, unfortunately.

```
int n = 5;
int[] sequence = new int[] { 1, 2, 0, 1, 0 };
char[] list = new char[] { 'a', 'b', 'c', 'd', 'e' };
char[] permuted = new char[n];
bool[] set = new bool[n];

for (int i = 0; i < n; i++)
{
    int s = sequence[i];
    int remainingPosition = 0;
    int index;

    // Find the s'th position in the permuted list that has not been set yet.
    for (index = 0; index < n; index++)
    {
        if (!set[index])
        {
            if (remainingPosition == s)
                break;
            remainingPosition++;
        }
    }

    permuted[index] = list[i];
    set[index] = true;
}
```

**Common representation of permutations**

Normally you would not represent a permutation as unintuitively as we've done, but simply by the absolute position of each element after the permutation is applied. Our example {1, 2, 0, 1, 0} for abcde to caebd is normally represented by {1, 3, 0, 4, 2}. Each index from 0 to 4 (or in general, 0 to n-1) occurs exactly once in this representation.

Applying a permutation in this form is easy:

```
int[] permutation = new int[] { 1, 3, 0, 4, 2 };
char[] list = new char[] { 'a', 'b', 'c', 'd', 'e' };
char[] permuted = new char[n];

for (int i = 0; i < n; i++)
{
    permuted[permutation[i]] = list[i];
}
```

Inverting it, recovering the original list from the permuted one, is very similar:

```
for (int i = 0; i < n; i++)
{
    list[i] = permuted[permutation[i]];
}
```

**Converting from our representation to the common representation**

Note that if we take our algorithm to permute a list using our index sequence, and apply it to the identity permutation {0, 1, 2, ..., n-1}, we get the inverse permutation, represented in the common form ({2, 0, 4, 1, 3} in our example).

To get the non-inverted permutation, we apply the permutation algorithm I just showed:

```
int[] identity = new int[] { 0, 1, 2, 3, 4 };
int[] inverted = { 2, 0, 4, 1, 3 };
int[] normal = new int[n];

for (int i = 0; i < n; i++)
{
    normal[inverted[i]] = identity[i];
}
// normal is now { 1, 3, 0, 4, 2 }
```

Or you can just apply the permutation directly, by using the inverse permutation algorithm:

```
char[] list = new char[] { 'a', 'b', 'c', 'd', 'e' };
char[] permuted = new char[n];
int[] inverted = { 2, 0, 4, 1, 3 };

for (int i = 0; i < n; i++)
{
    permuted[i] = list[inverted[i]];
}
```

Note that all the algorithms for dealing with permutations in the common form are O(n), while applying a permutation in our form is O(n^2). If you need to apply a permutation several times, first convert it to the common representation.

In "Permuting a list using an index sequence", you mention a quadratic algorithm. This is certainly fine because n is probably going to be very small. This can "easily" be reduced to O(n log n), though, through an order statistics tree (pine.cs.yale.edu/pinewiki/OrderStatisticsTree), i.e. a red-black tree which will initially contain the values 0, 1, 2, ..., n-1, where each node contains the number of descendants below it. With this, one can find/remove the k'th element in O(log n) time.

This algorithm is awesome, but I just found several cases to be wrong. Take the string "123": the 4th permutation should be 231, but according to this algorithm it will be 312. For 1234, the 4th permutation should be 1342, but it will be mistaken for "1423". Correct me if I observed wrong. Thanks.

@IsaacLi, if I am correct, f(4) = {2, 0, 0} = 231. And f'(312) = {1, 1, 0} = 3. For 1234, f(4) = {0, 2, 0, 0} = 1342. And f'(1423) = {0, 1, 1, 0} = 3. This algorithm is really inspiring. I wonder whether it is the original work of the OP. I have studied and analysed it for a while, and I believe it is correct :)

How do you convert from "our representation" to "common representation", {1, 2, 0, 1, 0} --> {1, 3, 0, 4, 2}, and vice versa? Is it possible without converting between {1, 2, 0, 1, 0} <--> {C, A, E, B, D}, which needs O(n^2)? If "our style" and "common style" are not convertible, they are in fact two separate things, aren't they? Thanks x

## math - Fast permutation -> number -> permutation mapping algorithms - ...

## Analysing a random algorithm by computing the distribution

My personal approach about correctness of probability-using algorithms: if you know how to prove it's correct, then it's probably correct; if you don't, it's certainly wrong.

Said differently, it's generally hopeless to try to analyse every algorithm you could come up with: you have to keep looking for an algorithm until you find one that you can prove correct.

I know of one way to "automatically" analyse a shuffle (or more generally a random-using algorithm) that is stronger than the simple "throw lots of tests and check for uniformity". You can mechanically compute the distribution associated to each input of your algorithm.

The general idea is that a random-using algorithm explores a part of a world of possibilities. Each time your algorithm asks for a random element in a set ({true, false} when flipping a coin), there are two possible outcomes for your algorithm, and one of them is chosen. You can change your algorithm so that, instead of returning one of the possible outcomes, it explores all solutions in parallel and returns all possible outcomes with the associated distributions.

In general, that would require rewriting your algorithm in depth. If your language supports delimited continuations, you don't have to; you can implement the "exploration of all possible outcomes" inside the function asking for a random element (the idea is that the random generator, instead of returning a result, captures the continuation associated with your program and runs it with all the different results). For an example of this approach, see oleg's HANSEI.

An intermediate, and probably less arcane, solution is to represent this "world of possible outcomes" as a monad, and use a language such as Haskell with facilities for monadic programming. Here is an example implementation of a variant of your algorithm, in Haskell, using the probability monad of the probability package:

```haskell
import Numeric.Probability.Distribution

shuffleM :: (Num prob, Fractional prob) => [a] -> T prob [a]
shuffleM [] = return []
shuffleM [x] = return [x]
shuffleM (pivot:li) = do
        (left, right) <- partition li
        sleft <- shuffleM left
        sright <- shuffleM right
        return (sleft ++ [pivot] ++ sright)
  where partition [] = return ([], [])
        partition (x:xs) = do
                  (left, right) <- partition xs
                  uniform [(x:left, right), (left, x:right)]
```

You can run it for a given input and get the output distribution:

```
*Main> shuffleM [1,2]
fromFreqs [([1,2],0.5),([2,1],0.5)]
*Main> shuffleM [1,2,3]
fromFreqs [([2,1,3],0.25),([3,1,2],0.25),([1,2,3],0.125),
           ([1,3,2],0.125),([2,3,1],0.125),([3,2,1],0.125)]
```

You can see that this algorithm is uniform with inputs of size 2, but non-uniform on inputs of size 3.

The difference with the test-based approach is that we can gain absolute certainty in a finite number of steps: it can be quite big, as it amounts to an exhaustive exploration of the world of possibilities (but generally smaller than 2^N, as there are factorisations of similar outcomes), but if it returns a non-uniform distribution we know for sure that the algorithm is wrong. Of course, if it returns a uniform distribution for [1..N] and 1 <= N <= 100, you only know that your algorithm is uniform up to lists of size 100; it may still be wrong.

Note: this algorithm is a variant of your Erlang implementation, because of the specific pivot handling. If I use no pivot, as in your case, the input size doesn't decrease at each step anymore: the algorithm also considers the case where all inputs are in the left list (or right list), and gets lost in an infinite loop. This is a weakness of the probability monad implementation (if an algorithm has a probability 0 of non-termination, the distribution computation may still diverge) that I don't yet know how to fix.

Here is a simple algorithm that I feel confident I could prove correct:

- Pick a random key for each element in your collection.

- If the keys are not all distinct, restart from step 1.

- Sort the elements by their keys.

You can omit step 2 if you know the probability of a collision (two random numbers picked are equal) is sufficiently low, but without it the shuffle is not perfectly uniform.

If you pick your keys in [1..N] where N is the length of your collection, you'll have lots of collisions (Birthday problem). If you pick your key as a 32-bit integer, the probability of conflict is low in practice, but still subject to the birthday problem.

If you use infinite (lazily evaluated) bitstrings as keys, rather than finite-length keys, the probability of a collision becomes 0, and checking for distinctness is no longer necessary.

Here is a shuffle implementation in OCaml, using lazy real numbers as infinite bitstrings:

```ocaml
type 'a stream = Cons of 'a * 'a stream lazy_t

let rec real_number () =
  Cons (Random.bool (), lazy (real_number ()))

let rec compare_real a b = match a, b with
  | Cons (true, _), Cons (false, _) -> 1
  | Cons (false, _), Cons (true, _) -> -1
  | Cons (_, lazy a'), Cons (_, lazy b') -> compare_real a' b'

let shuffle list =
  List.map snd
    (List.sort (fun (ra, _) (rb, _) -> compare_real ra rb)
       (List.map (fun x -> real_number (), x) list))
```

Algorithmic considerations: the complexity of the previous algorithm depends on the probability that all keys are distinct. If you pick them as 32-bit integers, you have a one in ~4 billion probability that a particular key collides with another key. Sorting by these keys is O(n log n), assuming picking a random number is O(1).

If you use infinite bitstrings, you never have to restart picking, but the complexity is then related to "how many elements of the streams are evaluated on average". I conjecture it is O(log n) on average (hence still O(n log n) in total), but have no proof.

After more reflection, I think (like douplep) that your implementation is correct. Here is an informal explanation.

Each element in your list is tested by several random:uniform() < 0.5 tests. To an element, you can associate the list of outcomes of those tests, as a list of booleans or {0, 1}. At the beginning of the algorithm, you don't know the list associated with any of those elements. After the first partition call, you know the first element of each list, etc. When your algorithm returns, the lists of tests are completely known and the elements are sorted according to those lists (sorted in lexicographic order, or considered as binary representations of real numbers).

So, your algorithm is equivalent to sorting by infinite bitstring keys. The action of partitioning the list, reminiscent of quicksort's partition over a pivot element, is actually a way of separating, for a given position in the bitstring, the elements with valuation 0 from the elements with valuation 1.

The sort is uniform because the bitstrings are all different. Indeed, two elements with real numbers equal up to the n-th bit are on the same side of a partition occurring during a recursive shuffle call of depth n. The algorithm only terminates when all the lists resulting from partitions are empty or singletons: all elements have been separated by at least one test, and therefore have distinct binary expansions.

A subtle point about your algorithm (or my equivalent sort-based method) is that the termination condition is probabilistic. Fisher-Yates always terminates after a known number of steps (the number of elements in the array). With your algorithm, the termination depends on the output of the random number generator.

There are possible outputs that would make your algorithm diverge and not terminate. For example, if the random number generator always outputs 0, each partition call will return the input list unchanged, on which you recursively call the shuffle: you will loop indefinitely.

However, this is not an issue if you're confident that your random number generator is fair: it does not cheat and always returns independent uniformly distributed results. In that case, the probability that the test random:uniform() < 0.5 always returns true (or false) is exactly 0:

- the probability that the first N calls return true is 2^{-N}

- the probability that all calls return true is the probability of the infinite intersection, for all N, of the event that the first N calls return true; it is the infimum limit of the 2^{-N}, which is 0

More generally, the algorithm does not terminate if and only if some of the elements get associated with the same boolean stream. This means that at least two elements have the same boolean stream. But the probability that two random boolean streams are equal is again 0: the probability that the digits at position K are equal is 1/2, so the probability that the N first digits are equal is 2^{-N}, and the same analysis applies.

Therefore, you know that your algorithm terminates with probability 1. This is a slightly weaker guarantee than the Fisher-Yates algorithm, which always terminates. In particular, you're vulnerable to an attack by an evil adversary that controls your random number generator.

With more probability theory, you could also compute the distribution of running times of your algorithm for a given input length. This is beyond my technical abilities, but I assume it's good: I suppose that you only need to look at the O(log N) first digits on average to check that all N lazy streams are different, and that the probability of much higher running times decreases exponentially.

My real question here, though, is what empirical tests can I throw at the output of my shuffler to see if it is plausibly shuffled? For example, that "pair a random weight with each element" approach tested badly even with my limited ability to test this stuff. (I tested the sequence [1,2] repeatedly and found a huge imbalance.)

[min_int..max_int] is not enough to bring the probability of conflict close to 0, because of the birthday problem you mentioned: with 32-bit ints, you already reach a 0.5 chance of conflict with a list of only ~77,000 items.

Also, note that in general, making any sort-based shuffle perfectly uniform/correct is probably much harder than it seems at first: for some of the problems, see Oleg's writeup, and my answer's comments. If a perfect shuffle is important at all, it is certainly much easier and simpler to just use the Fisher-Yates algorithm.

I edited to mention your warning about [min_int..max_int]: you're right, and it doesn't scale to big sequences. I also included an implementation of the real-number based sort. I agree that Fisher-Yates is simpler, but I'm not sure Oleg's proposal is.

@AJMansfield: Actually, with 64-bit keys, you only need ~5 billion selections to expect a collision with 50% probability. After 10 billion selections, the probability of a collision increases to ~93%. This counter-intuitive result is the Birthday Problem.

## functional programming - What, if anything, is wrong with this shuffli...

We begin by assembling the algorithmic building blocks from the Standard Library:

```
#include <algorithm>    // min_element, iter_swap,
                        // upper_bound, rotate,
                        // partition,
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert
#include <functional>   // less
#include <iterator>     // distance, begin, end, next
```

- the iterator tools such as the non-member std::begin() / std::end() as well as std::next() are only available as of C++11 and beyond. For C++98, one needs to write them oneself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next().

- the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute.

- the std::is_heap algorithm is only available for C++11 and beyond.

C++14 provides transparent comparators of the form std::less<> that act polymorphically on their arguments. This avoids having to provide an iterator's type. This can be used in combination with C++11's default function template arguments to create a single overload for sorting algorithms that take < as comparison and those that have a user-defined comparison function object.

In C++11, one can define a reusable template alias to extract an iterator's value type which adds minor clutter to the sort algorithms' signatures:

```
template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
```

In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type syntax:

```
template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
{
    xxx_sort(first, last,
        std::less<typename std::iterator_traits<It>::value_type>());
}
```

- Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments).

- C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t.

- In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax.

- Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax.

- C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.

There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers's draft Effective Modern C++ and Herb Sutter's revamped GotW. I use the following style recommendations:

- Herb Sutter's "Almost Always Auto" and Scott Meyers's "Prefer auto to specific type declarations" recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.

- Scott Meyers's "Distinguish () and {} when creating objects" and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code).

- Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency.

- I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.

Selection sort does not adapt to the data in any way, so its runtime is always O(N^2). However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.

To implement it using the Standard Library, repeatedly use std::min_element to find the remaining minimum element, and iter_swap to swap it into place:

Note that selection_sort has the already processed range [first, it) sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort's random access iterators.


Although it is one of the elementary sorting algorithms with O(N^2) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

To implement insertion_sort with the Standard Library, repeatedly use std::upper_bound to find the location where the current element needs to go, and use std::rotate to shift the remaining elements upward in the input range:

Note that insertion_sort has the already processed range [first, it) sorted as its loop invariant. Insertion sort also works with forward iterators.

- insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate.

- for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library's std::find_if_not algorithm.

Four Live Examples (C++14, C++11, C++98 and Boost, C++98) for the fragment below:

```
using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first),
    [=](auto const& elem){ return cmp(*it, elem); }
).base();
```

- For random inputs this gives O(N^2) comparisons, but this improves to O(N) comparisons for almost sorted inputs. The binary search always uses O(N log N) comparisons.

- For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).

When carefully implemented, quick sort is robust and has O(N log N) expected complexity, but with O(N^2) worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last) as the pivot, then use two calls to std::partition (which are O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

template<class FwdIt, class Compare = std::less<>> void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{}) { auto const N = std::distance(first, last); if (N <= 1) return; auto const pivot = *std::next(first, N / 2); auto const middle1 = std::partition(first, last, [=](auto const& elem){ return cmp(elem, pivot); }); auto const middle2 = std::partition(middle1, last, [=](auto const& elem){ return !cmp(pivot, elem); }); quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp)); quick_sort(middle2, last, cmp); // assert(std::is_sorted(middle2, last, cmp)); }

However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production level code. In particular, for O(N log N) complexity, the pivot has to result into a balanced partition of the input data, which cannot be guaranteed in general for an O(1) pivot, but which can be guaranteed if one sets the pivot as the O(N) median of the input range.

- the above implementation is particularly vulnerable to special inputs, e.g. it has O(N^2) complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ... 3, 2, 1 (because the middle is always larger than all other elements).

- median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N^2).

- 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result.

O(N log N)

std::nth_element(first, middle, last)

quick_sort(first, middle, cmp)

quick_sort(middle, last, cmp)

- this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).

If using O(N) extra space is of no concern, then merge sort is an excellent choice: it is the only stable O(N log N) sorting algorithm.

It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last) and combine two recursively sorted segments with a std::inplace_merge:

template<class BiDirIt, class Compare = std::less<>> void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{}) { auto const N = std::distance(first, last); if (N <= 1) return; auto const middle = std::next(first, N / 2); merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp)); merge_sort(middle, last, cmp); // assert(std::is_sorted(middle, last, cmp)); std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp)); }

Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge. Note that when sorting linked lists, merge sort requires only O(log N) extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort in the Standard Library.

Heap sort is simple to implement, performs an O(N log N) in-place sort, but is not stable.

The first loop, O(N) "heapify" phase, puts the array into heap order. The second loop, the O(N log N) "sortdown" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:

template<class RandomIt, class Compare = std::less<>> void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{}) { lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp)); lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp)); }

In case you consider it "cheating" to use std::make_heap and std::sort_heap, you can go one level deeper and write those functions yourself in terms of std::push_heap and std::pop_heap, respectively:

namespace lib { // NOTE: is O(N log N), not O(N) as std::make_heap template<class RandomIt, class Compare = std::less<>> void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{}) { for (auto it = first; it != last;) { std::push_heap(first, ++it, cmp); assert(std::is_heap(first, it, cmp)); } } template<class RandomIt, class Compare = std::less<>> void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{}) { for (auto it = last; it != first;) { std::pop_heap(first, it--, cmp); assert(std::is_heap(first, it, cmp)); } } } // namespace lib

The Standard Library specifies both push_heap and pop_heap as complexity O(log N). Note however that the outer loop over the range [first, last) results in O(N log N) complexity for make_heap, whereas std::make_heap has only O(N) complexity. For the overall O(N log N) complexity of heap_sort it doesn't matter.

Details omitted: O(N) implementation of make_heap

Here are four Live Examples (C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).

While I disagree with your use of auto (and many people disagree with me), I enjoyed seeing the standard library algorithms being used well. I'd been wanting to see some examples of this kind of code after seeing Sean Parent's talk. Also, I had no idea std::iter_swap existed, although it seems strange to me that it's in <algorithm>.

@sbabbi The entire standard library is based on the principle that iterators are cheap to copy; it passes them by value, for example. If copying an iterator isn't cheap, then you're going to suffer performance problems everywhere.

@gnzlbg The asserts you can comment out, of course. The early test can be tag-dispatched per iterator category, with the current version for random access, and if (first == last || std::next(first) == last). I might update that later. Implementing the stuff in the "omitted details" sections is beyond the scope of the question, IMO, because they contain links to entire Q&As themselves. Implementing real-word sorting routines is hard!

Great post. Though, you've cheated with your quicksort by using nth_element in my opinion. nth_element does half a quicksort already (including the partitioning step and a recursion on the half that includes the n-th element you're interested in).

@DavidStone You are right of course. This Q&A is not meant to be the definite guide to write real life optimized sort routines, but rather to show how to combine basic building blocks. See e.g. the cpp-sort library on how much extra details require careful attention in real life :)

## How to implement classic sorting algorithms in modern C++? - Stack Ove...

We begin by assembling the algorithmic building blocks from the Standard Library:

```cpp
#include <algorithm>    // min_element, iter_swap,
                        // upper_bound, rotate,
                        // partition,
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert
#include <functional>   // less
#include <iterator>     // distance, begin, end, next
```

- the iterator tools such as the non-member std::begin() / std::end() as well as std::next() are only available as of C++11 and beyond. For C++98, one needs to write these oneself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next().

- the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute.

- the std::is_heap algorithm is only available for C++11 and beyond.

C++14 provides transparent comparators of the form std::less<> that act polymorphically on their arguments. This avoids having to provide an iterator's type. This can be used in combination with C++11's default function template arguments to create a single overload for sorting algorithms that take < as comparison and those that have a user-defined comparison function object.

In C++11, one can define a reusable template alias to extract an iterator's value type which adds minor clutter to the sort algorithms' signatures:

```cpp
template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
```

In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type syntax:

```cpp
template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
{
    xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}
```

- Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments).

- C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t.

- In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax.

- Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax.

- C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.

There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers's draft Effective Modern C++ and Herb Sutter's revamped GotW. I use the following style recommendations:

- Herb Sutter's "Almost Always Auto" and Scott Meyers's "Prefer auto to specific type declarations" recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.

- Scott Meyers's "Distinguish () and {} when creating objects" and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code).

- Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency.

- I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.

Selection sort does not adapt to the data in any way, so its runtime is always O(N^2). However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.

To implement it using the Standard Library, repeatedly use std::min_element to find the remaining minimum element, and iter_swap to swap it into place:
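The code block itself is missing here; a minimal sketch of such a selection_sort, following the description above (the assert documents the loop invariant):

```cpp
#include <algorithm>  // min_element, iter_swap, is_sorted
#include <cassert>    // assert
#include <functional> // less
#include <iterator>   // next

template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        // swap the minimum of the unsorted tail [it, last) into place
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it);
        assert(std::is_sorted(first, std::next(it), cmp)); // loop invariant
    }
}
```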

Note that selection_sort has the already processed range [first, it) sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort's random access iterators.


Although it is one of the elementary sorting algorithms with O(N^2) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

To implement insertion_sort with the Standard Library, repeatedly use std::upper_bound to find the location where the current element needs to go, and use std::rotate to shift the remaining elements upward in the input range:
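The code block itself is missing here; a minimal sketch of such an insertion_sort (std::upper_bound keeps it stable, and the assert documents the loop invariant):

```cpp
#include <algorithm>  // upper_bound, rotate, is_sorted
#include <cassert>    // assert
#include <functional> // less
#include <iterator>   // next

template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        // find where *it belongs in the sorted prefix [first, it) ...
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        // ... and rotate it into place, shifting [insertion, it) one up
        std::rotate(insertion, it, std::next(it));
        assert(std::is_sorted(first, std::next(it), cmp)); // loop invariant
    }
}
```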

Note that insertion_sort has the already processed range [first, it) sorted as its loop invariant. Insertion sort also works with forward iterators.

- insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate.

- for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library's std::find_if_not algorithm.

Four Live Examples (C++14, C++11, C++98 and Boost, C++98) for the fragment below:

```cpp
using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first),
    [=](auto const& elem){ return cmp(*it, elem); }
).base();
```

- For random inputs this gives O(N^2) comparisons, but this improves to O(N) comparisons for almost sorted inputs. The binary search always uses O(N log N) comparisons.

- For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).

When carefully implemented, quick sort is robust and has O(N log N) expected complexity, but with O(N^2) worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last) as the pivot, then uses two calls to std::partition (which are O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally, the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

```cpp
template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){
        return cmp(elem, pivot);
    });
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));
}
```

However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production level code. In particular, for O(N log N) complexity, the pivot has to result in a balanced partition of the input data, which cannot be guaranteed in general for an O(1) pivot, but which can be guaranteed if one sets the pivot as the O(N) median of the input range.

- the above implementation is particularly vulnerable to special inputs, e.g. it has O(N^2) complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ... 3, 2, 1 (because the middle is always larger than all other elements).

- median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N^2).

- 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result.

A guaranteed O(N log N) complexity can be obtained by selecting the median of the input range as the pivot with std::nth_element(first, middle, last), followed by the recursive calls quick_sort(first, middle, cmp) and quick_sort(middle, last, cmp).

- this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).
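That median-pivot variant might be sketched like this (quick_sort_nth is a hypothetical name; std::nth_element both places the median and partitions around it, so no separate std::partition call is needed):

```cpp
#include <algorithm>  // nth_element, is_sorted
#include <cassert>    // assert
#include <functional> // less
#include <iterator>   // distance, next

template<class RandomIt, class Compare = std::less<>>
void quick_sort_nth(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const middle = std::next(first, N / 2);
    std::nth_element(first, middle, last, cmp); // median pivot + partition, O(N) on average
    quick_sort_nth(first, middle, cmp);  // elements that compare <= *middle
    quick_sort_nth(middle, last, cmp);   // *middle and elements that compare >= it
}
```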

If using O(N) extra space is of no concern, then merge sort is an excellent choice: it is the only stable O(N log N) sorting algorithm.

It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last) and combine two recursively sorted segments with a std::inplace_merge:

```cpp
template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);  // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));
}
```

Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge. Note that when sorting linked lists, merge sort requires only O(log N) extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort in the Standard Library.

Heap sort is simple to implement, performs an O(N log N) in-place sort, but is not stable.

The first loop, the O(N) "heapify" phase, puts the array into heap order. The second loop, the O(N log N) "sortdown" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:

```cpp
template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));
}
```

In case you consider it "cheating" to use std::make_heap and std::sort_heap, you can go one level deeper and write those functions yourself in terms of std::push_heap and std::pop_heap, respectively:

```cpp
namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp);
        assert(std::is_heap(first, it, cmp));
    }
}

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));
    }
}

} // namespace lib
```

The Standard Library specifies both push_heap and pop_heap as complexity O(log N). Note however that the outer loop over the range [first, last) results in O(N log N) complexity for make_heap, whereas std::make_heap has only O(N) complexity. For the overall O(N log N) complexity of heap_sort it doesn't matter.

Details omitted: O(N) implementation of make_heap
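For the curious, a sketch of what such an O(N) make_heap could look like (Floyd's bottom-up heapify; make_heap_linear is a hypothetical name, written here with index arithmetic for random access iterators):

```cpp
#include <algorithm>  // is_heap (for checking)
#include <cassert>    // assert
#include <functional> // less
#include <utility>    // swap

template<class RandomIt, class Compare = std::less<>>
void make_heap_linear(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    auto const N = last - first;
    // sift down every internal node, from the last one up to the root
    for (auto hole = N / 2; hole-- > 0;) {
        auto top = hole;
        for (;;) {
            auto child = 2 * top + 1;            // left child
            if (child >= N) break;               // top is a leaf
            if (child + 1 < N && cmp(first[child], first[child + 1]))
                ++child;                         // pick the "larger" child under cmp
            if (!cmp(first[top], first[child])) break; // heap order holds
            std::swap(first[top], first[child]);
            top = child;
        }
    }
}
```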

Here are four Live Examples (C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in LOC: C++14/C++11 need around 130 LOC, C++98 with Boost around 190 (+50%), and plain C++98 more than 270 (+100%).

While I disagree with your use of auto (and many people disagree with me), I enjoyed seeing the standard library algorithms being used well. I'd been wanting to see some examples of this kind of code after seeing Sean Parent's talk. Also, I had no idea std::iter_swap existed, although it seems strange to me that it's in <algorithm>.

@sbabbi The entire standard library is based on the principle that iterators are cheap to copy; it passes them by value, for example. If copying an iterator isn't cheap, then you're going to suffer performance problems everywhere.

Great post. Regarding the cheating part of [std::]make_heap: if std::make_heap is considered cheating, so would std::push_heap be. I.e. cheating = not implementing the actual behaviour defined for a heap structure. I would find it instructive to have push_heap included as well.

@gnzlbg The asserts you can comment out, of course. The early test can be tag-dispatched per iterator category, with the current version for random access, and if (first == last || std::next(first) == last). I might update that later. Implementing the stuff in the "omitted details" sections is beyond the scope of the question, IMO, because they contain links to entire Q&As themselves. Implementing real-world sorting routines is hard!

Great post. Though, you've cheated with your quicksort by using nth_element in my opinion. nth_element does half a quicksort already (including the partitioning step and a recursion on the half that includes the n-th element you're interested in).

@DavidStone You are right of course. This Q&A is not meant to be the definitive guide to writing real-life optimized sort routines, but rather to show how to combine basic building blocks. See e.g. the cpp-sort library on how many extra details require careful attention in real life :)

## How to implement classic sorting algorithms in modern C++? - Stack Ove...

**Skewness and Kurtosis**

For the on-line algorithms for skewness and kurtosis (along the lines of the variance), see the parallel algorithms for higher-moment statistics on the same wiki page.
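For the second moment, such a one-pass update might be sketched as follows (the Welford-style update; RunningStats is a hypothetical name, and the wiki page referenced above has the higher-moment analogues):

```cpp
#include <cassert> // assert
#include <cmath>   // fabs

// Consumes one value at a time; no second pass over the data is needed.
struct RunningStats {
    long long n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean

    void push(double x) {
        ++n;
        double const delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);  // note: uses the already-updated mean
    }
    double variance() const {  // sample variance
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
};
```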

Median is tough without sorted data. If you know how many data points you have, in theory you only have to partially sort, e.g. by using a selection algorithm. However, that doesn't help much with billions of values. I would suggest using frequency counts, see the next section.

**Median and Mode with Frequency Counts**

If it is integers, I would count frequencies, probably cutting off the highest and lowest values beyond some value where I am sure that it is no longer relevant. For floats (or too many integers), I would probably create buckets / intervals, and then use the same approach as for integers. (Approximate) mode and median calculation then gets easy, based on the frequency table.
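A sketch of that frequency-count idea for small non-negative integers (FreqStats is a hypothetical name; for floats, the buckets would be intervals instead of single values):

```cpp
#include <cassert> // assert
#include <cstdint> // uint64_t
#include <vector>  // vector

struct FreqStats {
    std::vector<std::uint64_t> count;  // one bucket per integer value
    std::uint64_t n = 0;

    explicit FreqStats(int max_value) : count(max_value + 1) {}

    void push(int x) { ++count[x]; ++n; }

    int mode() const {  // most frequent value
        int best = 0;
        for (int v = 1; v < static_cast<int>(count.size()); ++v)
            if (count[v] > count[best]) best = v;
        return best;
    }
    int median() const {  // lower median, read off the cumulative frequencies
        std::uint64_t seen = 0;
        std::uint64_t const half = (n + 1) / 2;
        for (int v = 0; v < static_cast<int>(count.size()); ++v) {
            seen += count[v];
            if (seen >= half) return v;
        }
        return 0;  // empty input
    }
};
```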

If it is normally distributed, I would use the population sample mean, variance, skewness, and kurtosis as maximum likelihood estimators for a small subset. The (on-line) algorithms to calculate those, you already know. E.g. read in a couple of hundred thousand or million datapoints, until your estimation error gets small enough. Just make sure that you pick randomly from your set (e.g. that you don't introduce a bias by picking the first 100'000 values). The same approach can also be used for estimating mode and median for the normal case (for both, the sample mean is an estimator).

All the algorithms above can be run in parallel (including many sorting and selection algorithms, e.g. QuickSort and QuickSelect), if this helps.

I have always assumed (with the exception of the section on the normal distribution) that we talk about sample moments, median, and mode, not estimators for theoretical moments given a known distribution.

In general, sampling the data (i.e. only looking at a sub-set) should be pretty successful given the amount of data, as long as all observations are realizations of the same random variable (have the same distributions) and the moments, mode and median actually exist for this distribution. The last caveat is not innocuous. For example, the mean (and all higher moments) for the Cauchy Distribution do not exist. In this case, the sample mean of a "small" sub-set might be massively off from the sample mean of the whole sample.

## statistics - "On-line" (iterator) algorithms for estimating statistica...


Thanks. This package seems to be very strong in more theoretical algorithm discussion; I know it from many math books. But I won't emphasize this so much (prerequisites, if, else); I would like to have a formatting like the one above.

I was only talking about the algorithm environment, not algorithmic. algorithm is a floating container, which looks pretty nice. You can put whatever you'd like inside, even the listing mentioned elsethread.

## LaTeX source code listing like in professional books - Stack Overflow

What you're looking for are called String Metric algorithms. There are a significant number of them, many with similar characteristics. Among the more popular:

- Levenshtein Distance : The minimum number of single-character edits required to change one word into the other. Strings do not have to be the same length

- Hamming Distance : The number of characters that are different in two equal length strings.

- Smith-Waterman : A family of algorithms for computing variable sub-sequence similarities.

- Sørensen-Dice Coefficient : A similarity algorithm that computes difference coefficients of adjacent character pairs.

Have a look at these as well as others on the wiki page on the topic.
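As a concrete instance from the list above, the Levenshtein distance is the classic two-row dynamic program (a sketch; each cell takes the cheapest of deletion, insertion, or substitution):

```cpp
#include <algorithm> // min, swap
#include <cassert>   // assert
#include <string>    // string
#include <vector>    // vector

int levenshtein(std::string const& a, std::string const& b)
{
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (int j = 0; j <= static_cast<int>(b.size()); ++j)
        prev[j] = j;  // cost of building b's prefix from an empty string
    for (int i = 1; i <= static_cast<int>(a.size()); ++i) {
        cur[0] = i;   // cost of deleting a's prefix
        for (int j = 1; j <= static_cast<int>(b.size()); ++j) {
            int const subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1,      // delete from a
                               cur[j - 1] + 1,   // insert into a
                               subst});          // substitute (or match)
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```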

## language agnostic - What are some algorithms for comparing how similar...

Statistically informed algorithms solve this problem using fewer passes than deterministic approaches.

If very large integers are allowed then one can generate a number that is likely to be unique in O(1) time. A pseudo-random 128-bit integer like a GUID will only collide with one of the existing four billion integers in the set in less than one out of every 64 billion billion billion cases.

If integers are limited to 32 bits then one can generate a number that is likely to be unique in a single pass using much less than 10MB. The odds that a pseudo-random 32-bit integer will collide with one of the 4 billion existing integers is about 93% (4e9 / 2^32). The odds that 1000 pseudo-random integers will all collide is less than one in 12,000 billion billion billion (odds-of-one-collision ^ 1000). So if a program maintains a data structure containing 1000 pseudo-random candidates and iterates through the known integers, eliminating matches from the candidates, it is all but certain to find at least one integer that is not in the file.
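The candidate-elimination pass described above might be sketched like this (surviving_candidates is a hypothetical name; with only ~1000 candidates, even a linear scan per stream element is workable, though a hash set would be the obvious refinement):

```cpp
#include <algorithm> // remove
#include <cassert>   // assert
#include <cstdint>   // uint32_t
#include <vector>    // vector

// Stream once over the known integers, striking out any candidate we see;
// every candidate that survives is guaranteed absent from the stream.
template<class InputIt>
std::vector<std::uint32_t> surviving_candidates(InputIt first, InputIt last,
                                                std::vector<std::uint32_t> candidates)
{
    for (; first != last; ++first)
        candidates.erase(std::remove(candidates.begin(), candidates.end(),
                                     static_cast<std::uint32_t>(*first)),
                         candidates.end());
    return candidates;
}
```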

I'm pretty sure the integers are bounded. If they weren't, then even a beginner programmer would think of the algorithm "take one pass through the data to find the maximum number, and add 1 to it"

Literally guessing a random output probably won't get you many points on an interview

@Adrian, your solution seems obvious (and it was to me, I used it in my own answer) but it's not obvious to everybody. It's a good test to see if you can spot obvious solutions or if you're going to over-complicate everything you touch.

## algorithm - Find an integer not among four billion given ones - Stack ...


This has the correct and optimal solution. I wanted to expand it into a separate answer to help people stuck in the paywall but... the algorithm is pretty messy.

It's asymptotically optimal but not especially practical I expect. It's certainly messy enough not to fit in the confines of an interview.

@missingno: Not so optimal. I have a more efficient average running time solution to this problem at stackoverflow.com/questions/5940420/ and it isn't that messy of an algorithm.

-1 This answer is totally useless, the article is no longer available. The answerer has not been online since answering this question.

## Selection algorithms on sorted matrix - Stack Overflow

The author of Capo, a transcription program for the Mac, has a pretty in-depth blog. The entry "A Note on Auto Tabbing" has some good jumping off points:

I started researching different methods of automatic transcription in mid-2009, because I was curious about how far along this technology was, and if it could be integrated into a future version of Capo.

Each of these automatic transcription algorithms start out with some kind of intermediate representation of the audio data, and then they transfer that into a symbolic form (i.e. note onsets and durations).

This is where I encountered some computationally expensive spectral representations (The Continuous Wavelet Transform (CWT), Constant Q Transform (CQT), and others.) I implemented all of these spectral transforms so that I could also implement the algorithms presented by the papers I was reading. This would give me an idea of whether they would work in practice.

Capo has some impressive technology. The standout feature is that its main view is not a frequency spectrogram like most other audio programs. It presents the audio like a piano roll, with the notes visible to the naked eye.

(Note: The hard note bars were drawn by a user. The fuzzy spots underneath are what Capo displays.)

## audio - Chord detection algorithms? - Stack Overflow

What you have coded in your example is very similar to a depth first search. So, that's one answer.

A depth-first search algorithm without any special characteristics (like re-convergent paths that can be optimized out) should be O(n^n).

This is actually not a contrived example. Chess programs operate on the same algorithm. At each move there are n moves to consider (i.e. branches), and you search d moves deep. So that becomes O(n^d).

Ah, but OP asked about problems which cannot be solved in better than O(n^n). An O(n^n) algorithm for a problem doesn't demonstrate that the problem can't be solved by a more efficient algorithm.

@Ted: No more efficient algorithm has been found for solving chess so far. There are optimizations - like alpha-beta pruning - but that doesn't change the fundamental characteristic of chess solving algorithm being O(n^d).

I totally agree that there are O(n^n) problems. I was just making the point that proving that a problem is O(n^n) involves more than showing that there's an O(n^n) algorithm that solves it. Chess is a good example of this because it is a finite game (there are a finite number of board positions). Theoretically, then it has O(1) complexity. But all known (practical) algorithms are very inefficient.

@Ted: Ok, fair point. Though the OP did ask the question in two ways (your interpretation being the second). The first way, he/she was asking whether there is any O(n^n) algorithm that is not a gimmick.

Thank you, I guess your example with chess is the most realistic one, even though there are better algorithms to play chess. And DFS-based algorithms are AFAIK not uncommon.
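The branching argument above can be sketched with a toy node counter. Everything here (the function name `count_nodes`, the branching factor of 3) is hypothetical, not taken from any real chess engine:

```python
def count_nodes(branching, depth):
    """Total positions examined by a plain depth-limited search
    with a fixed branching factor (toy model, not a real engine)."""
    if depth == 0:
        return 1
    # the root plus `branching` subtrees, each searched one move shallower
    return 1 + branching * count_nodes(branching, depth - 1)

# with n branches searched d moves deep, the count grows as O(n^d)
print([count_nodes(3, d) for d in range(1, 5)])  # → [4, 13, 40, 121]
```

The counts are dominated by the last level, branching^depth, which is where the O(n^d) comes from.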

## complexity theory - Are there any real O(n^n) algorithms? - Stack Over...

You can do it in O(n) (where n is the number of digits) like this:

1. Starting from the right, find the first pair of digits such that the left digit is smaller than the right digit. Call the left digit "digit-x".
2. Find the smallest digit larger than digit-x to the right of digit-x, and place it immediately left of digit-x.
3. Sort the remaining digits in ascending order - since they were already in descending order, all you need to do is reverse them (save for digit-x, which can be placed in its correct position in O(n)).

Let's use capital letters to define digit-strings and lower-case for digits. The syntax AB means "the concatenation of strings A and B". < is lexicographical ordering, which is the same as integer ordering when the digit-strings are of equal length.

Our original number N is of the form AxB, where x is a single digit and B is sorted descending. The number found by our algorithm is AyC, where y is the smallest digit in B that is > x (it must exist due to the way x was chosen, see above), and C is sorted ascending.

Assume there is some number (using the same digits) N' such that AxB < N' < AyC. N' must begin with A or else it could not fall between them, so we can write it in the form AzD. Now our inequality is AxB < AzD < AyC, which is equivalent to xB < zD < yC where all three digit-strings contain the same digits.

In order for that to be true, we must have x <= z <= y. Since y is the smallest digit > x, z cannot be between them, so either z = x or z = y. Say z = x. Then our inequality is xB < xD < yC, which means B < D where both B and D have the same digits. However, B is sorted descending, so there is no string with those digits larger than it. Thus we cannot have B < D. Following the same steps, we see that if z = y, we cannot have D < C.

Therefore N' cannot exist, which means our algorithm correctly finds the next largest number.
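A sketch of the algorithm in Python, using the swap-then-reverse variant asked about in a comment (swapping digit-x with y and reversing the suffix, which is equivalent to moving y left of digit-x); the function name `next_higher` is mine:

```python
def next_higher(num):
    """Next larger integer using the same digits, or None if num's
    digits are already sorted descending (no larger arrangement)."""
    digits = list(str(num))
    # 1. scan from the right for the first digit smaller than its right neighbour
    i = len(digits) - 2
    while i >= 0 and digits[i] >= digits[i + 1]:
        i -= 1
    if i < 0:
        return None  # already the largest permutation
    # 2. rightmost digit in the suffix that is larger than digits[i]
    #    (the suffix is descending, so this is also the smallest such digit)
    j = len(digits) - 1
    while digits[j] <= digits[i]:
        j -= 1
    digits[i], digits[j] = digits[j], digits[i]
    # 3. the suffix is still descending; reverse it to make it ascending
    digits[i + 1:] = reversed(digits[i + 1:])
    return int("".join(digits))

print(next_higher(38276))  # → 38627
print(next_higher(98765))  # → None
```

Each of the three steps is a single scan, so the whole thing is O(n) in the number of digits.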

nice solution! have one question. say "the smallest digit larger than x" is y. can we just swap x and y, then reverse x.index+1 -> end?

@Sterex, it's not just 99999; any number whose digits are already fully sorted in descending order is the max (so 98765 also has no solution, for example). This is easy to detect programmatically because step 1 of the algorithm will fail (there is no pair of consecutive digits such that "the left-digit is smaller than the right-digit").

@TMN: 9 is larger than 8, so you'd move 9 to the left of 8: 9832, then sort everything to the right of 9: 9238

@Kent: for your solution to work you will have to change "find the smallest digit larger than 4 to the right" to "find the smallest digit larger than 4 from the right". Otherwise, for example, 1234567849876554321 will result in 1234567851234546789 (instead of 1234567851234456789). A nitpick :-)

## algorithm - Given a number, find the next higher number which has the ...

If the data you are sorting has a known distribution, I would use a Bucket Sort algorithm. You could add some extra logic to it so that you calculate the size and/or positions of the various buckets based upon properties of the distribution (e.g. for a Gaussian, you might have a bucket every sigma/k away from the mean, where sigma is the standard deviation of the distribution).

By having a known distribution and modifying the standard Bucket Sort algorithm in this way, you would probably get the Histogram Sort algorithm or something close to it. Of course, your algorithm would be computationally faster than the Histogram Sort algorithm, because there would probably be no need for the first pass (described in the link), since you already know the distribution.
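As a rough illustration of the bucket-placement idea above (a sketch, not the Histogram Sort from the link): mapping each value through the known CDF spreads the data evenly across buckets. `NormalDist` is from Python's standard library; the function name and bucket count `k` are arbitrary choices of mine:

```python
import random
from statistics import NormalDist

def gaussian_bucket_sort(data, mu, sigma, k=64):
    """Bucket sort that exploits a known Gaussian distribution:
    the CDF maps values to an approximately uniform [0, 1),
    so buckets receive roughly equal shares of the data."""
    dist = NormalDist(mu, sigma)
    buckets = [[] for _ in range(k)]
    for x in data:
        # CDF is monotonic, so bucket order matches value order
        idx = min(int(dist.cdf(x) * k), k - 1)
        buckets[idx].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))  # each bucket is small on average
    return out

data = [random.gauss(0.0, 1.0) for _ in range(1000)]
assert gaussian_bucket_sort(data, 0.0, 1.0) == sorted(data)
```

Because the CDF is monotonic, concatenating the sorted buckets in order yields a globally sorted list; the distribution knowledge only affects how evenly the work is split.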

Edit: given your new criteria of your question, (though my previous answer concerning Histogram Sort links to the respectable NIST and contains performance information), here is a peer review journal article from the International Conference on Parallel Processing:

The authors claim this algorithm has better performance (up to 30% better) than the popular Quick-Sort Algorithm.

Considering Quick-Sort as the reference sorting algorithm is quite skewed. IntroSort improves on it by special-casing the small arrays that occur in the recursion, and TimSort (and a few other variations) also improves on it by detecting patterns (ascending/descending runs) on the fly. Interesting paper still :)

## performance - Sorting algorithms for data of known statistical distrib...

You can do this in Go using interfaces. A function that takes an interface type is generic in the sense that it doesn't care about the data representation of the underlying concrete type. It does everything through method calls.

To make a generic version of your algorithm then, you have to identify all of the capabilities that the algorithm requires of the data objects and you have to define methods that abstract these capabilities. The abstract method signatures become method sets of interfaces.

To make a type compatible with this kind of generic algorithm, you define methods on the type to satisfy the interface of the algorithm parameter.

I'll take your example code and show one way to do this. Most of the required capabilities happen to be covered by sort.Interface so I chose to embed it. Only one other capability is needed, one to make a copy of the data.

```go
type algoContainer interface {
	sort.Interface
	Copy() algoContainer
}
```

Below is a complete working program made from your example code.

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	s1 := sortableString("abc")
	c1 := Algo(s1)
	fmt.Println(s1, <-c1)
	s2 := sortable3Ints([3]int{1, 2, 3})
	c2 := Algo(&s2)
	fmt.Println(s2, <-c2)
}

type algoContainer interface {
	sort.Interface
	Copy() algoContainer
}

type sortableString []byte

func (s sortableString) Len() int           { return len(s) }
func (s sortableString) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s sortableString) Less(i, j int) bool { return s[i] < s[j] }
func (s sortableString) Copy() algoContainer {
	return append(sortableString{}, s...)
}
func (s sortableString) String() string { return string(s) }

type sortable3Ints [3]int

func (sortable3Ints) Len() int { return 3 }
func (s *sortable3Ints) Swap(i, j int) {
	(*s)[i], (*s)[j] = (*s)[j], (*s)[i]
}
func (s sortable3Ints) Less(i, j int) bool { return s[i] < s[j] }
func (s sortable3Ints) Copy() algoContainer {
	c := s
	return &c
}

func Algo(list algoContainer) chan algoContainer {
	n := list.Len()
	out := make(chan algoContainer)
	go func() {
		for i := 0; i < n; i++ {
			result := list.Copy()
			// actually useful:
			if result.Less(n-1, 0) {
				result.Swap(n-1, 0)
			}
			out <- result
		}
		close(out)
	}()
	return out
}
```

## making generic algorithms in go - Stack Overflow

A detailed discussion of this problem appears in Jon Bentley, "Column 1. Cracking the Oyster", Programming Pearls, Addison-Wesley, pp. 3-10.

Bentley discusses several approaches, including external sort, merge sort using several external files, etc. But the best method Bentley suggests is a single-pass algorithm using bit fields, which he humorously calls "Wonder Sort" :) Coming to the problem, 4 billion numbers can be represented in:

4 billion bits = (4000000000 / 8) bytes = about 0.466 GB

The code to implement the bitset is simple (taken from the solutions page):

```c
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000

int a[1 + N/BITSPERWORD];

void set(int i) { a[i>>SHIFT] |=  (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i) { return a[i>>SHIFT] & (1<<(i & MASK)); }
```

Bentley's algorithm makes a single pass over the file, setting the appropriate bit in the array, and then examines this array using the test function above to find the missing number.

If the available memory is less than 0.466 GB, Bentley suggests a k-pass algorithm, which divides the input into ranges depending on available memory. To take a very simple example, if only 1 byte (i.e. memory to handle 8 numbers) were available and the range were from 0 to 31, we would divide it into the ranges 0-7, 8-15, 16-23 and 24-31, handling one range in each of 32/8 = 4 passes.
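The k-pass idea can be sketched like this (a toy version with a Python list of booleans standing in for the bit array; `find_missing` is a hypothetical name, and `numbers` must be re-iterable, like a file you can rewind):

```python
def find_missing(numbers, lo, hi, capacity):
    """Multi-pass search for a value in [lo, hi) absent from `numbers`,
    using at most `capacity` presence flags per pass (sketch of
    Bentley's k-pass idea; one full scan of the input per range)."""
    for start in range(lo, hi, capacity):
        end = min(start + capacity, hi)
        seen = [False] * (end - start)   # one "bit" per candidate value
        for x in numbers:                # one full pass over the input
            if start <= x < end:
                seen[x - start] = True
        for i, present in enumerate(seen):
            if not present:
                return start + i         # first gap found in this range
    return None  # every value in [lo, hi) is present

nums = [n for n in range(32) if n != 21]
print(find_missing(nums, 0, 32, 8))  # → 21
```

With capacity 8 and the range 0 to 31 this makes the same 4 passes as Bentley's example; each pass ignores values outside its current range.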

I don't know the book, but there is no reason to call it "Wonder Sort", as it is just a bucket sort with a 1-bit counter.

In the book, Jon Bentley is just being humorous...and calls it "Wonder Sort"

Although more portable, this code will be annihilated by code written to utilize hardware-supported vector instructions. I think gcc can automatically convert code to using vector operations in some cases though.

@brian I don't think Jon Bentley was allowing such things into his book on algorithms.

@BrianGordon, the time spent in RAM will be negligible compared to the time spent reading the file. Forget about optimizing it.

## algorithm - Find an integer not among four billion given ones - Stack ...
