
Update 2015: Nowadays I always recommend Anaconda. It includes lots of Python packages for scientific computing, data science, web development, etc. It also provides a superior environment tool, conda, which lets you easily switch between environments, even between Python 2 and 3. It is also updated very quickly as soon as a new version of a package is released; you can just run conda update packagename to update it.

On Windows, the complicated part is compiling the math packages, so I think a manual install is a viable option only if you are interested in Python alone, without the other packages.

Anaconda has around 270 packages, including the most important ones for scientific applications and data analysis: NumPy, SciPy, Pandas, IPython, matplotlib and scikit-learn. So if this is enough for you, I would choose Anaconda.

If instead you are interested in other packages, and even more so if you use any of the Enthought packages (Chaco, for example, is very useful for real-time data visualization), then EPD/Canopy is probably a better choice. The Academic version has a larger number of packages in the base install, and many more in the repository. Anaconda also includes Chaco.

I'm looking at this same question myself right now. You state Canopy includes more packages; does that mean it is not possible to install those other packages in Anaconda? It seems silly to limit myself without knowing whether, two years down the road, I'll need a certain package.

Hopefully in two years you'll have updated your OS or Python installation anyway... but yes, you can install any additional Python package in whatever distribution you choose. For pure-Python packages this is very simple. For packages that embed C or C++ extensions (as scientific packages usually do) it is more difficult, especially under Windows, so it's better to think ahead.

FWIW, Anaconda also includes Chaco, and much more than just 20 packages: docs.continuum.io/anaconda/pkgs.html (Even more are available in the repo and not bundled with the installer.)

Also FWIW, Anaconda now has nice conda-meta/pkg* info on all 100-odd packages: requires, version ... (conda-requires summarizes all the requires.)

I've been trying to get python set up for data mining on my Mac. I still haven't cracked this nut, but the most disappointing part so far has been installing Enthought Canopy Express and then learning they charge $199 for access to scikit-learn and nltk.

Anaconda vs. EPD Enthought vs. manual installation of Python - Stack O...

python epd-python anaconda

In [1]: import numpy as np

In [2]: a = np.array([[1, 2, 3], [4, 5, 6]])

In [3]: b = np.array([[9, 8, 7], [6, 5, 4]])

In [4]: np.concatenate((a, b))
Out[4]: 
array([[1, 2, 3],
       [4, 5, 6],
       [9, 8, 7],
       [6, 5, 4]])
In [1]: a = np.array([1, 2, 3])

In [2]: b = np.array([4, 5, 6])

In [3]: np.vstack((a, b))
Out[3]: 
array([[1, 2, 3],
       [4, 5, 6]])

Hi, when I run np.concatenate((a, b), axis=1) I get array([1, 2, 3, 2, 3, 4]) as output, but what I'm looking for is a 2-D numpy array.

@Fraz: I've added Sven's vstack() idea. You know you can create the array with array([[1,2,3],[2,3,4]]), right?

numpy.vstack can accept more than 2 arrays in the sequence argument. Thus, if you need to combine more than 2 arrays, vstack is handier.

@oneleggedmule concatenate can also take multiple arrays
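
To illustrate both of the last two comments, a minimal sketch with three small arrays (a, b, c here are just illustrative):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

np.vstack((a, b, c))
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])

np.concatenate((a[None, :], b[None, :], c[None, :]))  # same result; concatenate needs 2-D inputs here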

python - Append a NumPy array to a NumPy array - Stack Overflow

python numpy

Pure Python (2 & 3), a snippet without 3rd party dependencies.

RGBA
def write_png(buf, width, height):
    """ buf: must be bytes or a bytearray in Python3.x,
        a regular string in Python2.x.
    """
    import zlib, struct

    # reverse the vertical line order and add null bytes at the start
    width_byte_4 = width * 4
    raw_data = b''.join(b'\x00' + buf[span:span + width_byte_4]
                        for span in range((height - 1) * width_byte_4, -1, - width_byte_4))

    def png_pack(png_tag, data):
        # each PNG chunk is: length, tag, data, CRC of (tag + data)
        chunk_head = png_tag + data
        return (struct.pack("!I", len(data)) +
                chunk_head +
                struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))

    return b''.join([
        b'\x89PNG\r\n\x1a\n',  # PNG signature
        png_pack(b'IHDR', struct.pack("!2I5B", width, height, 8, 6, 0, 0, 0)),  # 8-bit depth, color type 6 (RGBA)
        png_pack(b'IDAT', zlib.compress(raw_data, 9)),  # zlib-compressed pixel data
        png_pack(b'IEND', b'')])

... The data should be written directly to a file opened as binary, as in:

data = write_png(buf, 64, 64)
with open("my_image.png", 'wb') as fd:
    fd.write(data)

This seems to be exactly what I'm looking for, but could you add some comments? I don't see how this writes to a file. Do you have to write the output in a previously opened file? Thanks!

Can someone specify what format the image (buf) is supposed to be in? It does not seem to be a numpy array...

@christianmbrodbeck, a bytearray (RGBARGBA...)
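
To make that concrete, here is a minimal sketch of building such a buffer from a NumPy array (the array name and shape are just illustrative; write_png emits rows bottom-up, so the array is flipped first to preserve the usual top-down orientation):

import numpy as np

arr = np.zeros((64, 64, 4), dtype=np.uint8)  # hypothetical 64x64 RGBA image
arr[..., 0] = 255                            # red channel
arr[..., 3] = 255                            # alpha: fully opaque

buf = np.flipud(arr).tobytes()               # RGBARGBA... bytes, bottom row first
data = write_png(buf, 64, 64)
with open("red_square.png", 'wb') as fd:
    fd.write(data)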

python - Saving a Numpy array as an image - Stack Overflow

python image numpy

Say we have a 3-dimensional array of dimensions 2 x 10 x 10:

import numpy

r = numpy.random.rand(2, 10, 10)
numpy.reshape(r, (5, 5, 8))

will do it. Once you fix the first dim = 5 and the second dim = 5, you don't need to determine the third dimension. To assist your laziness, numpy gives you -1:

numpy.reshape(r, (5, 5, -1))
numpy.reshape(r, (50, -1))

The second call will give you an array of shape (50, 4).

python - What does -1 mean in numpy reshape? - Stack Overflow

python numpy

As Sven mentioned, x[[[0],[2]],[1,3]] will give back rows 0 and 2 matched with columns 1 and 3, while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.

There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.
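
A minimal sketch contrasting the two indexing styles (x here is just an illustrative 4x4 array):

import numpy as np

x = np.arange(16).reshape(4, 4)

x[[0, 2], [1, 3]]          # pairwise: array([ 1, 11]), i.e. x[0,1] and x[2,3]

x[[[0], [2]], [1, 3]]      # rows 0 and 2 crossed with columns 1 and 3
# array([[ 1,  3],
#        [ 9, 11]])

x[np.ix_([0, 2], [1, 3])]  # the same 2x2 submatrix, without the extra brackets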

python - Slicing of a NumPy 2d array, or how do I extract an mxm subma...

python numpy slice

>>> import itertools as it
>>> a = [3, 2, 5, 1]
>>> [y - x for x, y in it.combinations(a, 2)]
[-1, 2, -2, 3, -1, -4]

Cheers for that. Yeah, something like that. But I was wondering whether there is some numpy or scipy function that does it efficiently? I need to calculate the above for several tens of thousands of rows of data, so ideally it should be efficient.

@Astrid - The vectorized numpy version will calculate each value twice, but in some cases it might still be faster. Basically, you want the lower triangle of np.subtract.outer(a, a). wim's answer is likely to be faster in a lot of cases, though.

Didn't notice the numpy tag at first, and you used list data in your question. Joe's suggestion is good, and then you can collect the values from the lower triangle using np.tril_indices
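
Putting those two suggestions together, a minimal sketch (using the same a as in the answer above; note the values come out in a different order than the itertools version):

import numpy as np

a = np.array([3, 2, 5, 1])
diffs = np.subtract.outer(a, a)       # full matrix of a[i] - a[j]
i, j = np.tril_indices(len(a), k=-1)  # indices strictly below the diagonal
diffs[i, j]                           # array([-1,  2,  3, -2, -1, -4])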

It would be nice if numpy ufuncs had a .pairwise method...

numpy - Difference between ALL 1D points in array with python diff()? ...

python numpy scipy difference

There is a syntax error in that file. I guess you're using the development sources? That rb shouldn't be there before the regular expression (it should be r in Python 2.x, maybe b in Python 3.x).

UPDATE: Yep. Here's the faulty commit:

I guess I must be using dev sources; however I did install from the Master branch.

Master seems to be broken. Either download the tarball, or checkout 1.2.1rc1 or 1.2.0.

I took the easy way out and just edited the file directly, removing the 'b', and it's working now :).


python - Issues with importing pylab on Mac OS X 10.8.3 - Stack Overfl...

python numpy matplotlib osx-mountain-lion

OpenCV has support for getting data from a webcam, and it comes with Python wrappers by default; you also need to install numpy for the OpenCV Python extension (called cv2) to work. At the time of writing (January 2015) there is no Python 3 support yet, so you need to use Python 2.

import cv2

cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)

if vc.isOpened(): # try to get the first frame
    rval, frame = vc.read()
else:
    rval = False

while rval:
    cv2.imshow("preview", frame)
    rval, frame = vc.read()
    key = cv2.waitKey(20)
    if key == 27: # exit on ESC
        break
cv2.destroyWindow("preview")

There is Python 3 support if you install from wheel. I used this tutorial successfully: solarianprogrammer.com/2016/09/17/

How do I access my webcam in Python? - Stack Overflow

python webcam

>>> import numpy as np
>>> a=np.arange(1,7).reshape((2,3))
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a.flatten()
array([1, 2, 3, 4, 5, 6])

and

>>> import numpy as np
>>> b=np.arange(1,13).reshape((2,2,3))
>>> b
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
>>> b.reshape((2,6))
array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

+1 for flatten() -- it can also do Fortran/column-major flattening. reshape(-1) will also flatten.
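
To illustrate that comment, continuing with the 2x3 array a from above:

>>> a.flatten(order='F')   # Fortran/column-major flattening
array([1, 4, 2, 5, 3, 6])
>>> a.reshape(-1)          # also flattens (row-major)
array([1, 2, 3, 4, 5, 6])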

correct and efficient way to flatten array in numpy in python? - Stack...

python numpy scipy

It is of rank one, as you need one index to index it. That one axis has length 3, since the index can take three different values: v[i], i = 0..2.
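
For example, with v = np.array([1, 2, 3]):

>>> import numpy as np
>>> v = np.array([1, 2, 3])
>>> v.ndim    # rank: one axis
1
>>> v.shape   # that single axis has length 3
(3,)
>>> v[2]
3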

In Python NumPy what is a dimension and axis? - Stack Overflow

python numpy

In [26]: df
Out[26]:
    0   1   2   3
0   1 NaN NaN   2
1 NaN   1 NaN   2
2 NaN NaN NaN NaN

Then I transposed it and turned it into a Series; I think this is similar to np.hstack:

In [28]: s = df.T.unstack(); s
Out[28]:
0  0     1
   1   NaN
   2   NaN
   3     2
1  0   NaN
   1     1
   2   NaN
   3     2
2  0   NaN
   1   NaN
   2   NaN
   3   NaN

Taking a cumulative count of the non-null values then gives a key that stays constant across each run of NaNs:

In [29]: s.notnull().astype(int).cumsum()
Out[29]:
0  0    1
   1    1
   2    1
   3    2
1  0    2
   1    3
   2    3
   3    4
2  0    4
   1    4
   2    4
   3    4

This expression creates a Series where every NaN is a 1 and everything else is a 0:

In [31]: s.isnull().astype(int)
Out[31]:
0  0    0
   1    1
   2    1
   3    0
1  0    1
   1    0
   2    1
   3    0
2  0    1
   1    1
   2    1
   3    1

We can combine the two in the following manner to achieve the counts you need:

In [32]: s.isnull().astype(int).groupby(s.notnull().astype(int).cumsum()).sum()
Out[32]:
1    2
2    1
3    1
4    4
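
For reference, the whole chain as one runnable sketch (the DataFrame construction is just my assumption, matching the display above):

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, np.nan, 2],
                   [np.nan, 1, np.nan, 2],
                   [np.nan, np.nan, np.nan, np.nan]])

s = df.T.unstack()  # row-major flattening into a Series
print(s.isnull().astype(int).groupby(s.notnull().astype(int).cumsum()).sum())
# NaN run lengths: 2, 1, 1, 4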

Wow, that's the kind of pandas magic I'm always impressed by! However, your implementation considers consecutive NaNs that sit in different columns/rows as belonging to the same 'block'. I have created a small IPython notebook (nbviewer.ipython.org/url/www.guillaumeallain.info/) to play with that shows the problem. Performance-wise, the numpy implementation is also approximately 3 times faster.

Finding start and stops of consecutive values block in Python/Numpy/Pa...

python numpy pandas

The problem is that you do not mask the sender part of the assignment (the b*2 on the right-hand side):

c[(a > 3) & (b > 8)]+=b*2
# ^ 1x1 matrix        ^3x4 matrix

The dimensions are not the same. Given you want to perform element-wise addition (based on your example), you can simply add the slicing to the right part as well:

c[(a > 3) & (b > 8)] += b[(a > 3) & (b > 8)]*2

Or, computing the mask only once:

mask = (a > 3) & (b > 8)
c[mask] += b[mask]*2
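
A minimal runnable sketch of the masked version, with illustrative arrays of matching shape:

import numpy as np

a = np.arange(12).reshape(3, 4)   # illustrative 3x4 arrays
b = np.arange(12).reshape(3, 4)
c = np.zeros((3, 4))

mask = (a > 3) & (b > 8)
c[mask] += b[mask] * 2            # both sides now select the same elements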

python - Conditional operations on numpy arrays - Stack Overflow

python arrays numpy conditional

In [2]: a
Out[2]:
array([[4, 1, 1, 2, 0, 4],
       [3, 4, 3, 1, 4, 4],
       [1, 4, 3, 1, 0, 0],
       [0, 4, 4, 0, 4, 3],
       [0, 0, 0, 0, 0, 0]])

In [3]: a[~(a==0).all(1)]
Out[3]:
array([[4, 1, 1, 2, 0, 4],
       [3, 4, 3, 1, 4, 4],
       [1, 4, 3, 1, 0, 0],
       [0, 4, 4, 0, 4, 3]])

python - Remove all-zero rows in a 2D matrix - Stack Overflow

python numpy scipy

How much RAM do you have? You'll need quite a bit more than 2GB of RAM to store a 2-gig image. I don't know how efficient Image is at storing images, but a list of bytes uses four bytes of space for each element in the list, so you'll burn more than 8GB of (virtual) memory... and a lot of patience. Edit: Since you only have 4 (or 3) GB to play with, this is almost certainly your problem.

But why are you trying to convert it to a numeric array? Use the methods of the im object returned by Image.open, as in the PIL Tutorial.

I don't know what you're doing with the image, but perhaps you can do it without reading the entire image into memory, or at least without converting the entire object into a numpy array. Read it bit by bit if possible to avoid blowing up your machine: read up on Python generators, and see the Image.getdata() method, which returns your image one pixel value at a time.
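
As a rough illustration of that last suggestion, a sketch that walks the pixels with getdata() instead of building a huge Python list (the filename is hypothetical; the image still gets decoded, but the per-element list overhead described above is avoided):

from PIL import Image

im = Image.open("big_image.tif")   # hypothetical file
total = 0
for pixel in im.getdata():         # one pixel value (or RGB tuple) at a time
    total += pixel if isinstance(pixel, int) else sum(pixel)
print(total)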

read big image file as an array in python - Stack Overflow

python image numpy python-imaging-library

In [1]: def foo(V):
   ...:     return V[0]+V[1]
   ...: 
In [2]: foo(np.array([1,3]))
Out[2]: 4
In [3]: foo(np.array([[[1,2],[3,4]], [[5,6],[7,8]]]))
Out[3]: 
array([[ 6,  8],
       [10, 12]])
In [4]: np.array([[[1,2],[3,4]], [[5,6],[7,8]]])[0]
Out[4]: 
array([[1, 2],
       [3, 4]])
In [5]: np.array([[[1,2],[3,4]], [[5,6],[7,8]]])[1]
Out[5]: 
array([[5, 6],
       [7, 8]])

If you expected something else, you'll have to show us.

As for your second question:

In [6]: t1=np.array([[1,2,3], [4,5,6]])
   ...: t2=np.array([1,2,3])
   ...: t3=np.array([[1,2,3], [4,5,6],5])
   ...: 
In [7]: t1.shape
Out[7]: (2, 3)
In [8]: t2.shape
Out[8]: (3,)
In [9]: t3.shape
Out[9]: (3,)
In [11]: (3)
Out[11]: 3
In [12]: (3,)
Out[12]: (3,)

There have been several recent questions about (3,) vs. (3,1) shape arrays, and np.array([[1,2,3]]) vs. np.array([1,2,3]).

t3 is an object dtype array, with 3 elements. The 3 inputs have different lengths, so it can't create a 2d array. Stay away from this type of array for now. Focus on the simpler arrays.

In [10]: t3
Out[10]: array([[1, 2, 3], [4, 5, 6], 5], dtype=object)
In [13]: t3[0]
Out[13]: [1, 2, 3]
In [14]: t3[2]
Out[14]: 5
One way to apply nGauss over the last axis of a bigger array is apply_along_axis:
In [53]: mu=np.array([0,0])
In [54]: cov=np.eye(2)
In [55]: xx=np.array([[[1,2], [5,6]], [[7,8],[9,0]]])
In [56]: np.apply_along_axis(nGauss, -1, xx, mu, cov)
Out[56]: 
array([[ -1.30642333e-02,  -9.03313360e-15],
       [ -4.61510838e-26,  -4.10103631e-19]])

apply_along_axis iterates over the first two dimensions, passing each xx[i,j,:] to nGauss. It's not fast, but it is relatively easy to apply.

k = X.shape[0];  # I assume you want
k = X.shape[-1]  # the last dimension
dev = X-mu     # works as long as mu has k terms
p1 = np.power( np.power(np.pi * 2, k) , -0.5);
p2 = np.power( np.linalg.det(cov)  , -0.5)
p3 = np.exp( -0.5 * np.dot( np.dot(dev.transpose(), np.linalg.inv(cov)), dev));

In the simple (2,) x case, dev is 1d, and dev.transpose() does nothing.

It's easier to generalize einsum than dot; I think the equivalent is:

p3 = np.einsum('j,j', np.einsum('i,ij', dev, np.linalg.inv(cov)), dev)
p3 = np.exp( -0.5 * p3)

The two einsum calls can be merged into one:

p3 = np.einsum('i,ij,j', dev, np.linalg.inv(cov), dev)

and then generalized to handle extra leading dimensions of dev:

p3 = np.einsum('...i,ij,...j', dev, np.linalg.inv(cov), dev)

Putting it together:
def nGaussA(X, mu, cov):
    # multivariate negative gaussian.    
    # mu is a vector and cov is a covariance matrix.

    k = X.shape[-1];
    dev = X-mu
    p1 = np.power( np.power(np.pi * 2, k) , -0.5);
    p2 = np.power( np.linalg.det(cov)  , -0.5)
    p3 = np.einsum('...i,ij,...j', dev, np.linalg.inv(cov), dev)
    p3 = np.exp( -0.5 * p3)
    return -1.0 * p1 * p2 * p3;
In [85]: nGaussA(x,mu,cov)
Out[85]: -0.013064233284684921
In [86]: nGaussA(xx,mu,cov)
Out[86]: 
array([[ -1.30642333e-02,  -9.03313360e-15],
       [ -4.61510838e-26,  -4.10103631e-19]])

So the way to generalize the function is to check each step. If it produces a scalar, keep it. If it operates elementwise on x, keep it. But if it requires coordinating dimensions with other arrays, use a numpy operation that does that. Often that involves broadcasting. Sometimes it helps to study other numpy functions to see how they generalize (e.g. apply_along_axis, apply_over_axes, cross, etc.).

An interactive numpy session is essential; it lets me try ideas with small sample arrays.

Thanks for the answers. It really helps me understand these mysterious arrays. Meanwhile, I now realize that I should have posted my original function, which I just updated. I wrote a negative Gaussian pdf function which takes one np.array of shape (n,). Thus, it can only take arguments like np.array([1,2]), but cannot take arguments like np.array([[[1,2], [5,6]], [[7,8],[9,0]]]). My question here was how to make my Gaussian function take arguments of arbitrary shape and return the pdf value of each point while maintaining the same structure.

I've worked out a generalization of your nGauss, using einsum to generalize your two dot products.

python - making a function that can take arguments in various shapes -...

python numpy

df.stack().values
array(['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'], dtype=object)

(Edit: Incidentally, the DF in the Q uses the first row as labels, which is why they're not in the output here.)
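
A self-contained sketch of that (the DataFrame here is a hypothetical reconstruction; the column labels are made up, since the question's first row supplies them):

import pandas as pd

df = pd.DataFrame([['1/2/2014', 'a', '3', 'z1'],
                   ['1/3/2014', 'c', '1', 'x3']],
                  columns=['date', 'col1', 'col2', 'col3'])

df.stack().values
# array(['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'], dtype=object)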

python pandas flatten a dataframe to a list - Stack Overflow

python list numpy pandas dataframe