Rectangle 27 0

c++ Fastest way to scan for bit pattern in a stream of bits?


R(m) = sum  _ k = 0 ^ n' x_{k+m} y_k
[ 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1]
[-1 -1  1 -1  1 -1]
[0 0 1 0 1 0]

So if you are trying to match for the bit pattern

You can implement cross-correlation, using the FFT in O(n log n ) time.

You can use the fast fourier transform for extremely large input (value of n) to find any bit pattern in O(n log n ) time. Compute the cross-correlation of a bit mask with the input. Cross -correlation of a sequence x and a mask y with a size n and n' respectively is defined by

in

the -1's in the mask guarantee that those places must be 0.

then occurences of your bit pattern that match the mask exactly where R(m) = Y where Y is the sum of one's in your bit mask.

then you must use the mask

what would you use fourier if this can be solved in O(n) time?

Note
Rectangle 27 0

c++ Fastest way to scan for bit pattern in a stream of bits?


void find_matches(unsigned char* sequence, int n_sequence, unsigned short pattern) {
    if (n_sequence < 2)
        return; // 0 and 1 byte bitstring can't match a short

    // Calculate a lookup table that shows for each byte at what bit offsets
    // the pattern could match.
    unsigned int match_offsets[256];
    for (unsigned int in_byte = 0; in_byte < 256; in_byte++) {
        match_offsets[in_byte] = 0xFF;
        for (int bit = 0; bit < 24; bit++) {
            match_offsets[in_byte] <<= 1;
            unsigned int mask = (0xFF0000 >> bit) & 0xFFFF;
            unsigned int match_location = (in_byte << 16) >> bit;
            match_offsets[in_byte] |= !((match_location ^ pattern) & mask);
        }
    }

    // Go through the input 2 bytes at a time, looking up where they match and
    // anding together the matches offsetted by one byte. Each bit offset then
    // shows if the input sequence is consistent with the pattern matching at
    // that position. This is anded together with the large offsets of the next
    // result to get a single match over 3 bytes.
    unsigned int curr, next;
    curr = 0;
    for (int pos = 0; pos < n_sequence-1; pos+=2) {
        next = ((match_offsets[sequence[pos]] << 8) | 0xFF) & match_offsets[sequence[pos+1]];
        unsigned short match = curr & (next >> 16);
        if (match)
            output_match(pos, match);
        curr = next;
    }
    // Handle the possible odd byte at the end
    if (n_sequence & 1) {
        next = (match_offsets[sequence[n_sequence-1]] << 8) | 0xFF;
        unsigned short match = curr & (next >> 16);
        if (match)
            output_match(n_sequence-1, match);
    }
}

void output_match(int pos, unsigned short match) {
    for (int bit = 15; bit >= 0; bit--) {
        if (match & 1) {
            printf("Bitstring match at byte %d bit %d\n", (pos-2) + bit/8, bit % 8);
        }
        match >>= 1;
    }
}

A fast way to find the matches in big bitstrings would be to calculate a lookup table that shows the bit offsets where a given input byte matches the pattern. Then combining three consecutive offset matches together you can get a bit vector that shows which offsets match the whole pattern. For example if byte x matches first 3 bits of the pattern, byte x+1 matches bits 3..11 and byte x+2 matches bits 11..16, then there is a match at byte x + 5 bits.

Here's some example code that does this, accumulating the results for two bytes at a time:

I'll represent bits in least to most significant order. match_offsets[0x00] = 4286594816. That is 00000000 11111100 00000001 11111111, the last 9 bits are not significant, the 6 set bits in the middle represent that the byte can match at pattern >> 1, pattern >> 2 .. pattern >> 6. (for reference least significant bit means that byte matches at pattern << 7 masking out 7 low bits, 8'th bit or bit7 means match at pattern << 0 or that the given byte equals the low byte of the pattern)

The main loop of this is 18 instructions long and processes 2 bytes per iteration. If the setup cost isn't an issue, this should be about as fast as it gets.

Trying do digest your code (not sure if I get it completely) I see one potential problem: if a byte is located at more than 1 place in the pattern then your lookuptable won't work since it can only store one place in the pattern. So given the byte: 00000000 and the pattern: 1000000000000011 it should give 5 locations, but it can only give 1, resulting in a possible miss of the pattern. Or am I missing something?

Note
Rectangle 27 0

c++ Fastest way to scan for bit pattern in a stream of bits?


for( RightShift = 0; RightShift < 8; RightShift++ )
    {
        LeftShift = 8 - RightShift;

        Starting = 128 >> RightShift;

        Byte = MSB >> RightShift;

        Count = 0xFF >> LeftShift;

        for( i = 0; i <= Count; i++ )
        {
            Index = ( i << LeftShift ) | Byte;

            Left[Index] |= Starting;
        }

        Byte = LSB << LeftShift;

        Count = 0xFF >> RightShift;

        for( i = 0; i <= Count; i++ )
        {
            Index = i | Byte;

            Right[Index] |= Starting;
        }

        Index = ( unsigned char )(( Pattern >> RightShift ) & 0xFF );

        Middle[Index] |= Starting;
    }
00001110
00101110
01001110
01101110
10001110
10101110
11001110
11101110
Count = Left[Byte0] & Middle[Byte1] & Right[Byte2];
Count = Left[Data[StreamLength - 2]] & Middle[Data[StreamLength - 1]] & 128;

if( Count )
{
    TotalCount += GetNumberOfOnes( Count );
}
for( int Index = 1; Index < ( StreamLength - 1); Index++ )
{
    Count = Left[Data[Index - 1]] & Middle[Data[Index]] & Right[Data[Index + 1]];

    if( Count )
    {
        TotalCount += GetNumberOfOnes( Count );
    }
}

Above loop cannot detect a Pattern if it is placed at the very end of stream buffer. Following code need to add after loop to overcome this limitation.

Data is stream buffer, Left is left lookup table, Middle is middle lookup table and Right is right lookup table.

Here checking for 1 to 8 will be taken care in first sample and 9 to 16 in next sample and so on. Now when we are searching for a Pattern, we will find all the 8 possible arrangements (as below) of this Pattern and will store in 3 lookup tables (Left, Middle and Right).

I can give some sample code which is tested.

I would like to suggest a solution using 3 lookup tables of size 256. This would be efficient for large bit streams. This solution takes 3 bytes in a sample for comparison. Following figure shows all possible arrangements of a 16 bit data in 3 bytes. Each byte region has shown in different color.

Lets take an example 0111011101110111 as a Pattern to find. Now consider 4th arrangement. Left part would be XXX01110. Fill all raws of Left lookup table pointing by left part (XXX01110) with 00010000. 1 indicates starting position of arrangement of input Pattern. Thus following 8 raws of Left look up table would be filled by 16 (00010000).

Middle part of arrangement would be 11101110. Raw pointing by this index (238) in Middle look up table will be filled by 16 (00010000).

Now Right part of arrangement would be 111XXXXX. All raws (32 raws) with index 111XXXXX will be filled by 16 (00010000).

Number of 1 in Count gives the number of matching Pattern in taken sample.

Take a sample of three bytes. Find Count as follows where Left is left lookup table, Middle is middle lookup table and Right is right lookup table.

This algorithm takes only N-1 logical steps to find a Pattern in an array of N bytes. Only overhead is to fill the lookup tables initially which is constant in all the cases. So this will be very effective for searching huge byte streams.

Thus raws with index XX011101 in Left lookup table and 11101110 in Middle lookup table and 111XXXXX in Right lookup table will be updated to 00100010 by 7th arrangement.

Two optimizations I can see. First would be to check middle before looking up left and right, skipping the left and right checks if middle doesn't look good. A second optimization would be to use one 256x32-bit table rather than three 256x8 tables. This would replace three one-byte dereferences with one longword deference (the matching formula would be (byte2_lookup >> 16) & (byte1_lookup >> 8) & (byte0_lookup)). Note that if a lookup value is zero, one may be able to skip a byte provided one is prepared to backtrack upon finding a non-zero lookup value.

We should not overwrite elements in look up table while filling. Instead do a bitwise OR operation to update an already filled raw. In above example, all raws written by 3rd arrangement would be updated by 7th arrangement as follows.

great that solutions still keep coming in. By the way, your name sounds awesome. Could be the name of a character in a star wars movie ;^)

Note
Rectangle 27 0

c++ Fastest way to scan for bit pattern in a stream of bits?


int numMatch(unsigned short curWord, unsigned short prevWord)
  {
       int numHits = 0;
       int combinedWords = (prevWord<<16) + curWord;

       int i=0;
       for(i=0; i<16; i++)
       {
             if((combinedWords & masks[i]) == preShifsts[i]) numHits++;
       }
       return numHits;
  }
unsigned short pattern = 1234;
 unsigned int preShifts[16];
 unsigned int masks[16];
 int i;
 for(i=0; i<16; i++)
 {
      preShifts[i] = (unsigned int)(pattern<<i);  //gets promoted to int
      masks[16] = (unsigned int) (0xffff<<i);
 }

Cool. If you use SIMD you could potentially search even faster by putting the pattern a few times 'next to eachother' in the huge registers. This way you can compare something like 4 unsigned int's at the same time.

I think precalc all shifted values of the word and put them in 16 ints so you got an array like this

I think the general idea is workable, but the details need some refinement. A pattern might span up to 3 bytes so the 'match' table would have to contain items that can hold 24 bit values (probably unsigned ints); you'd need to represent all of the possible values in the 'don't care' bits, so you'd need 16*256=4096 items your match table. A binary search might be in order. Finally you have to build the 3 byte value to look up on a byte-by-byte basis, not on a short-by-short basis. On the whole, I think that mouviciel's state machine approach would be simpler and possibly faster.

This is more along the lines of what I was thinking and quite possibly as good as it will get, especially if SIMD intrinsic will allow multiple comparisons across multiple word. The "flag" is non-zero and unique within the bit stream.

and then for every unsigned short you get out of the stream, make an int of that short and the previous short and compare that unsigned int to the 16 unsigned int's. If any of them match, you got one.

e.g. 32 bits of 0's and the pattern you want to detect is 16 0's, then it would mean the pattern is detected 16 times!

edit: do note that this could potentially mean multiple hits when the patterns is detected more than once on the same bits:

michael: I don't think you understand the logic completely. The 16bit pattern is shifted 16 times into 16 unsigned ints creating every possible variation of the 16bit word. By getting 16 bits a time and masking this and checking it against the 16 possible patterns you have done all the checks you need.

Note
Rectangle 27 0

c++ Fastest way to scan for bit pattern in a stream of bits?


Google "Adapting the KnuthMorrisPratt algorithm for pattern matching in Huffman encoded texts"

I don't understand your insistence that KMP is restricted to byte sized ops. I have in mind a bitwise KMP, which shifts right by one bit on mismatch.

Reinier, I'm concentrating on the string search algorithm here. You point our that the bitwise masking ops are not free. For now, I assume that they are comparably expensive for each algorithm. My main point, though, is that the specifics of the application may allow us to beat KMP.

Turns out the particulars might indicate a quicker approach than KMP. The KMP article links to

You can find further introductory discussion here.

for a particular case when the search pattern is 'AAAAAA'. For a multiple pattern search, the

looked good until I considered Luke and MSalter's requests for more information about the particulars.

might be most suitable.

still... these all relate to text searches and nothing on bitlevel. Turning the bitstream into a bytestream first might be very costly (memory/processortime) and impractical.

Note