
Probably the easiest way to make your data work with the CNN example code is to make a modified version of read_cifar10() and use it instead:

  • Write out a binary file containing the contents of your numpy array.

import numpy as np
images_and_labels_array = np.array([[...], ...],  # [[1,12,34,24,53,...,102],
                                                  #  [12,112,43,24,52,...,98],
                                                  #  ...]
                                   dtype=np.uint8)

images_and_labels_array.tofile("/tmp/images.bin")

This file is similar to the format used in CIFAR10 datafiles. You might want to generate multiple files in order to get read parallelism. Note that ndarray.tofile() writes binary data in row-major order with no other metadata; pickling the array will add Python-specific metadata that TensorFlow's parsing routines do not understand.

  • Write a modified version of read_cifar10() that handles your record format.

def read_my_data(filename_queue):

  class ImageRecord(object):
    pass
  result = ImageRecord()

  # Dimensions of the images in the dataset.
  label_bytes = 1
  # Set the following constants as appropriate.
  result.height = IMAGE_HEIGHT
  result.width = IMAGE_WIDTH
  result.depth = IMAGE_DEPTH
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

  assert record_bytes == 22501  # Based on your question.

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the binary, so we leave header_bytes
  # and footer_bytes at their default of 0.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                           [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result
  • Modify distorted_inputs() to use your new dataset:

def distorted_inputs(data_dir, batch_size):
  """[...]"""
  filenames = ["/tmp/images.bin"]  # Or a list of filenames if you
                                   # generated multiple files in step 1.
  for f in filenames:
    if not gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_my_data(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  # [...] (Maybe modify other parameters in here depending on your problem.)

This is intended to be a minimal set of steps, given your starting point. It may be more efficient to do the PNG decoding using TensorFlow ops, but that would be a larger change.
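As a sanity check on the record layout used in the steps above, the following sketch writes a few records with ndarray.tofile() and reads them back with np.fromfile(), without involving TensorFlow. The image size (3 x 2 x 2), the label values, and the temporary file location are made up for illustration; only the layout (one label byte followed by depth-major image bytes) follows the answer.

```python
import os
import tempfile

import numpy as np

# Hypothetical tiny dataset: 3 records, each a 1-byte label plus a
# depth x height x width image stored depth-major (illustrative sizes only).
DEPTH, HEIGHT, WIDTH = 3, 2, 2
LABEL_BYTES = 1
IMAGE_BYTES = DEPTH * HEIGHT * WIDTH
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES

labels = np.array([7, 0, 9], dtype=np.uint8)
images = np.arange(3 * IMAGE_BYTES, dtype=np.uint8).reshape(3, IMAGE_BYTES)

# One row per record: label first, then the flattened image.
records = np.hstack([labels[:, None], images])

path = os.path.join(tempfile.mkdtemp(), "images.bin")
records.tofile(path)  # raw row-major bytes, no header

# Read the file back and split every record the same way read_my_data() does.
raw = np.fromfile(path, dtype=np.uint8).reshape(-1, RECORD_BYTES)
read_labels = raw[:, 0].astype(np.int32)
depth_major = raw[:, LABEL_BYTES:].reshape(-1, DEPTH, HEIGHT, WIDTH)
read_images = depth_major.transpose(0, 2, 3, 1)  # -> [n, height, width, depth]

file_size = os.path.getsize(path)
```

If the file size is not an exact multiple of the record size, the reshape fails, which is a cheap way to catch a wrong dtype before training.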

Thanks for explaining the CIFAR10 format and pointing out the ndarray.tofile() method. This will clear up the problems with reading the byte data. I should have searched the documentation more. One small question: the label is 1 byte, as 10 classes can be accommodated easily. I have more than 300 classes. Should I also change the size of the label to 2 bytes? I didn't ask earlier as I was testing on 10 classes.

One more thing regarding my question about attaching queues to numpy arrays instead of files: is there any way to do that, or should I load the arrays directly and use feeding?

If you change the label size to 2 bytes, there's (currently) an extra decoding step required. This answer shows how to read the label as a tf.int16 by using a separate decoder.
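To illustrate what a 2-byte label changes in the record layout, here is a small sketch using only the standard library. The little-endian byte order, the helper names pack_record and unpack_record, and the sample values are assumptions for illustration; the TensorFlow decoding step itself is not shown.

```python
import struct

# Hypothetical record: a 2-byte label followed by the image bytes.
# Little-endian unsigned 16-bit ('<H') is an assumption; any fixed byte
# order works as long as the writer and the reader agree.
LABEL_BYTES = 2


def pack_record(label, image_bytes):
    """Return one fixed-length record: 2-byte label + raw image bytes."""
    return struct.pack("<H", label) + bytes(image_bytes)


def unpack_record(record):
    """Split a record back into (label, image bytes)."""
    (label,) = struct.unpack("<H", record[:LABEL_BYTES])
    return label, record[LABEL_BYTES:]


record = pack_record(299, [10, 20, 30, 40])
label, image = unpack_record(record)
```

A label above 255 no longer fits in one uint8, so the two label bytes have to be decoded together as a 16-bit value, while the image bytes are still read one byte at a time.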

OK, that clears up the confusion. It is an extra step, but it will get the work done.

python - Attach a queue to a numpy array in tensorflow for data fetch ...

python machine-learning tensorflow

You can use the struct module, especially struct.pack to convert Python data into a string of binary data that you can then write to a file.

Which way of accessing the data is most efficient depends on the particulars. If you use the same range of lambda values for all time values and the time intervals are always the same, then you know the length of the array of intensities for each t. In that case you can compute, e.g.:

offset = ((time - 0.001)/0.001 * amount_of_intensities + (lambda - 0.01)/0.01)

and then use that offset to create a pointer. This assumes that you've read the binary file into memory and have created a pointer of the right type to it.
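The offset formula can be tried out in Python before moving to C. This is only a sketch: the start values and step sizes come from the formula above, while the number of intensities per time step (100) and the function name are made up. Note that lambda is a reserved word in Python, so the parameter is called wavelength here.

```python
# Grid parameters taken from the formula above; the number of intensities
# per time step is a made-up value for illustration.
TIME_START, TIME_STEP = 0.001, 0.001
WAVELENGTH_START, WAVELENGTH_STEP = 0.01, 0.01
AMOUNT_OF_INTENSITIES = 100


def intensity_offset(time, wavelength):
    """Index of the intensity for (time, wavelength) in the flat table."""
    # round() guards against floating-point error such as 3.9999999999.
    time_index = round((time - TIME_START) / TIME_STEP)
    wavelength_index = round((wavelength - WAVELENGTH_START) / WAVELENGTH_STEP)
    return time_index * AMOUNT_OF_INTENSITIES + wavelength_index


first = intensity_offset(0.001, 0.01)
later = intensity_offset(0.003, 0.05)
```

The result is an element index, not a byte count; with 8-byte doubles the byte position in the file would be the index multiplied by 8.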

In [1]: import numpy as np

In [2]: data = np.random.random(20)

In [3]: data
array([ 0.40184104,  0.60411243,  0.52083848,  0.50300288,  0.14613242,
        0.39876911,  0.16157968,  0.70979254,  0.65662686,  0.14884378,
        0.65650842,  0.40906677,  0.3027295 ,  0.26070303,  0.82051509,
        0.96337179,  0.34622595,  0.08532211,  0.65079174,  0.68009011])

In [4]: import struct

In [5]: struct.pack('{}d'.format(len(data)), *data)
Out[5]: 'f\xf9\x80y\xc3\xb7\xd9?\xe2x\x92\x99\xe3T\xe3?0vCt\xb5\xaa\xe0?7\xfcJ|\x99\x18\xe0?X\xf5l\x8ew\xb4\xc2?b\x9c\xd1\xden\x85\xd9?\xc4\x0c\xad\x9d\xa4\xae\xc4?\xae\xc3\xbe\xd7\x9e\xb6\xe6?\xd5\xf3\xebV\x16\x03\xe5?\x14J\x9a$P\r\xc3?p\xd4t\xf3\x1d\x02\xe5?\xfe\tUg&.\xda?\xf4hV\x91\xeb_\xd3?@FL\xc0[\xaf\xd0?$\xbe\x08\xda\xa8A\xea?\xf3\x93\xcb\x11\xf1\xd3\xee?\xce\x9e\xd9\xe7\x90(\xd6?\x10\xd2\x12c\xab\xd7\xb5?f\xac\x124I\xd3\xe4?}\x95\x1cSL\xc3\xe5?'

I'm using the numpy module for convenience. It would work just as well with a list of floating point numbers.

To analyse the last line from the inside out, the format expression gives:

In [9]: '{}d'.format(len(data))
Out[9]: '20d'

This means we want to make a string of 20 d values. The d is the format character for an IEEE 754 double-precision floating-point number.

So what we really have is:

struct.pack('20d', *data)

The *-operator before the data means "unpack this list".

Note that binary numbers are generally not portable between different hardware platforms (e.g. Intel x86 and ARM).

Once you have this big array of binary numbers, you can just write that to a file.
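A minimal sketch of that last step, assuming Python 3 and a temporary file with a made-up name. The sample values stand in for the real intensities. The "<" prefix in the format string is an addition to the example above: it pins the byte order to little-endian so that the reader on the C side knows what to expect.

```python
import os
import struct
import tempfile

# Sample values standing in for the intensities (made up for illustration).
data = [0.5, 1.25, -3.0, 42.0]

# '<' fixes the byte order to little-endian so the file does not depend on
# the machine that wrote it; 'd' is an 8-byte IEEE 754 double.
fmt = "<{}d".format(len(data))
packed = struct.pack(fmt, *data)

path = os.path.join(tempfile.mkdtemp(), "table.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(packed)

# Read the bytes back and decode them with the same format string.
with open(path, "rb") as f:
    restored = list(struct.unpack(fmt, f.read()))
```

Opening the file in binary mode ("wb" and "rb") matters; text mode would try to encode or translate the bytes.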

In C, open the file and read the whole thing into a block of memory. Then make a pointer of the correct type to the beginning of that block of memory and you're good to go.

The time steps are logarithmically distributed, so each order of magnitude has an equal number of divisions. I suppose that changes things... I will update the question to specify that.

To clarify the above, would the array look something like this data = [t0, w00, i00, w01, i01, ... w0n, i0n, t1, w10, i10, w11, i11, ... tn, wn0, in0, wn1, in1, .. wnn, inn] where tx is time step, wxy is a corresponding wavelength and ixy is the corresponding intensity? Then one would use offset to jump ahead a certain number of bits in the binary file to find the correct data point?

@jasper If the spacing of the time steps and of the wavelengths is consistent over the whole range, you only have to store the intensities, because you can then use a formula to index the correct intensity. That is the essence of a look-up table: you can use a simple formula to find the correct number. If the number or spacing of the wavelengths is not constant for all time steps, the problem becomes a lot more complicated.
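For the logarithmically spaced time steps mentioned in the comments, a formula still works as long as the spacing is regular in log space. The following is a sketch under assumed parameters (a start time of 1e-3, 10 steps per decade, 100 intensities per time step); none of these numbers or function names come from the question.

```python
import math

# Assumed grid: time starts at 1e-3 with 10 log-spaced steps per decade,
# and every time step stores the same number of intensities.
TIME_START = 1e-3
STEPS_PER_DECADE = 10
AMOUNT_OF_INTENSITIES = 100


def log_time_index(time):
    """Index of a time value on the log-spaced grid."""
    decades = math.log10(time) - math.log10(TIME_START)
    return round(decades * STEPS_PER_DECADE)


def table_offset(time, wavelength_index):
    """Flat offset into a table that stores only the intensities."""
    return log_time_index(time) * AMOUNT_OF_INTENSITIES + wavelength_index


offset = table_offset(0.1, 4)
```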

Making lookup table in python, writing to a binary file that can be re...

python c binaryfiles lookup-tables

You could split each row into a sign and a value variable. Then, if the sign is negative, multiply the value by -1.

row = array[0]
sign, value = row[0], row[1:]
# Join the magnitude bits into a string and parse it as base 2.
magnitude = int(''.join(map(str, value)), 2)
result = magnitude if sign == 0 else -magnitude


numpy - Convert a binary string into signed integer - Python - Stack O...

python numpy integer type-conversion

First of all, this looks like a NumPy array rather than a NumPy matrix. There are a couple of options I can think of. A pretty straightforward way looks like this:

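def rowToSignedDec(arr, row):
    # Interpret the bits after the first column as an unsigned magnitude.
    res = int(''.join(str(x) for x in arr[row][1:].tolist()), 2)

    # A leading 1 marks a negative number.
    if arr[row][0] == 1:
        return -res

    return res

print(rowToSignedDec(arr, 0))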


That is clearly neither the most efficient nor the shortest version. As a one-liner:

int(''.join(str(x) for x in arr[0][1:].tolist()),2) - 2*int(arr[0][0])*int(''.join(str(x) for x in arr[0][1:].tolist()),2)

Where arr is the above-mentioned array.

numpy - Convert a binary string into signed integer - Python - Stack O...

python numpy integer type-conversion

Python doesn't know that the first bit is supposed to represent the sign (compare with bin(-59)), so you have to handle that yourself. For example, if A contains the array:

num = int(''.join(map(str, A[0,1:])), 2)
if A[0,0]:
    num *= -1

Here's a more NumPy-ish way to do it, for the whole array at once (it relies on each row holding exactly 8 bits):

num = np.packbits(A).astype(np.int8)
num[num<0] = -128 - num[num<0]
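For rows that are not exactly 8 bits wide, a weighted sum over the magnitude columns is one possible alternative. This is a sketch, not the answerer's method; the example array and variable names are made up, and the layout (sign bit first, then the magnitude with the most significant bit first) is assumed from the question.

```python
import numpy as np

# Example bit rows (made up): first column is the sign bit, the rest is the
# magnitude, most significant bit first.
A = np.array([[0, 0, 1, 1, 1, 0, 1, 1],   # +59
              [1, 0, 1, 1, 1, 0, 1, 1],   # -59
              [1, 0, 0, 0, 0, 0, 0, 1]])  # -1

magnitude_bits = A[:, 1:]
# Place values 2**(n-1), ..., 2, 1 for the magnitude columns.
weights = 2 ** np.arange(magnitude_bits.shape[1] - 1, -1, -1)
magnitudes = magnitude_bits @ weights
signs = 1 - 2 * A[:, 0]  # sign bit 0 -> +1, sign bit 1 -> -1
values = signs * magnitudes
```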

Finally, a code-golf version:


numpy - Convert a binary string into signed integer - Python - Stack O...

python numpy integer type-conversion

Python's standard library has some of what you require -- the array module in particular lets you easily read parts of binary files, swap endianness, etc; the struct module allows for finer-grained interpretation of binary strings. However, neither is quite as rich as you require: for example, to present the same data as bytes or halfwords, you need to copy it between two arrays (the numpy third-party add-on is much more powerful for interpreting the same area of memory in several different ways), and, for example, to display some bytes in hex there's nothing much "bundled" beyond a simple loop or list comprehension such as [hex(b) for b in thebytes[start:stop]]. I suspect there are reusable third-party modules to facilitate such tasks yet further, but I can't point you to one...
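A short sketch of what the array module offers, assuming Python 3. The sample bytes and the start and stop values are made up, and the "H" type code is assumed to be 2 bytes wide, which holds on common platforms but is only a guaranteed minimum.

```python
import array

# Four raw bytes, as they might come from a binary file (sample data).
raw = b"\x01\x02\x03\x04"

# View the data as unsigned 16-bit halfwords and swap their byte order.
# 'H' is assumed to be 2 bytes wide, which holds on common platforms.
halfwords = array.array("H")
halfwords.frombytes(raw)
halfwords.byteswap()
swapped = halfwords.tobytes()

# Presenting the same data as bytes needs a second array (a copy).
thebytes = array.array("B", raw)
start, stop = 1, 3
hex_view = [hex(b) for b in thebytes[start:stop]]
```

The second array is exactly the copy the answer refers to: the array module cannot reinterpret one buffer under two element types.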

scripting - What language is to binary, as Perl is to text? - Stack Ov...

scripting binary-data patching fileparsing

Or, if you want big-endian:

>>> np.fromstring(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype='>f4') # or dtype=np.dtype('>f4'), or np.float32  on a big-endian system
array([  4.60060299e-41,   8.96831017e-44,   2.30485571e-41,
         4.60074312e-41], dtype=float32)

The b isn't necessary prior to Python 3, of course.

In fact, if you actually are using a binary file to load the data from, you could even skip the using-a-string step and load the data directly from the file with numpy.fromfile().
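The two variants can be put side by side in a short sketch. It uses np.frombuffer, which newer NumPy versions recommend over np.fromstring for binary data, and a temporary file with a made-up name to show numpy.fromfile(). The byte string is the one from the example above.

```python
import os
import tempfile

import numpy as np

raw = b"\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@"

# np.frombuffer is used here in place of binary np.fromstring.
little = np.frombuffer(raw, dtype="<f4")  # explicit little-endian float32
big = np.frombuffer(raw, dtype=">f4")     # same bytes read as big-endian

# Loading straight from a file skips the intermediate bytes object.
path = os.path.join(tempfile.mkdtemp(), "floats.bin")  # hypothetical path
with open(path, "wb") as f:
    f.write(raw)
from_file = np.fromfile(path, dtype="<f4")
```

Spelling out "<f4" or ">f4" rather than np.float32 keeps the result independent of the byte order of the machine running the code.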

This is excellent. (thanks). One downside here that I can see is that there's no way to specify endianness. Any ideas on that one?

np.fromstring(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype=np.dtype('>f4'))
array([4.60060299e-41, 8.96831017e-44, 2.30485571e-41, 4.60074312e-41], dtype=float32)

perfect. just what I was looking for.

I think you should add the stuff about endianness conversion to your answer. I think that could also be potentially helpful to someone else with the same question I had so it makes sense to feature it more prominently than in a comment.

python - convert binary string to numpy array - Stack Overflow

python numpy binary-data