
If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).
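
For example, a minimal sketch of the numpy.save route ('test.npy' is a placeholder name):

import numpy as np

x = np.arange(200).reshape((4, 5, 10))
np.save('test.npy', x)   # binary .npy file; shape and dtype are preserved
y = np.load('test.npy')
assert (x == y).all()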

numpy.savetxt

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

For example:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

This raises:

TypeError: float argument required, not numpy.ndarray

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))

    # Iterating through an n-dimensional array produces slices along
    # the first axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

The resulting file looks like this:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (you can do this in one line; I'm just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print new_data.shape

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))

# Just to check that they're the same...
assert np.all(new_data == data)
numpy.loadtxt

Well, having it readable as text is very useful too. If you can format your answer with a little code example, I'll accept your answer :-)

I've got to catch the bus, but I'll add a code example as soon as I get in... Thanks!

python - How to write a multidimensional array to a text file? - Stack Overflow

python file-io numpy

The problem with using genfromtxt() is that it attempts to load the whole file into memory, i.e. into a numpy array. This is great for small files but BAD for 3GB inputs like yours. Since you are just calculating column medians, there's no need to read the whole file. A simple, but not the most efficient way to do it would be to read the whole file line-by-line multiple times and iterate over the columns.
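
A rough sketch of that idea (assuming a headerless, all-numeric, comma-delimited file; the name 'data.csv' is a placeholder): one pass per column keeps only a single column's values in memory at a time.

import numpy as np

# Count the columns from the first line, then make one pass per column.
with open('data.csv') as f:
    num_columns = len(f.readline().split(','))

medians = []
for col in range(num_columns):
    values = []
    with open('data.csv') as f:
        for line in f:
            values.append(float(line.split(',')[col]))
    medians.append(np.median(values))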

Well, okay. But is there a more sustainable solution to this? In a Java program, for example, you can choose to start it up with, say, 5GB of memory. Is there an equivalent for Python? I mean, next time I might just have a CSV file with a single 4GB line..

Python doesn't limit how much memory you can allocate. If you get MemoryError in 64-bit Python, you really are out of memory.

Unfortunately, not all of the Python modules support 64-bit architecture.

Python out of memory on large CSV file (numpy) - Stack Overflow

python memory csv numpy scipy

The basic problem is that NumPy doesn't understand the concept of stripping quotes (whereas the csv module does). When you say delimiter='","', you're telling NumPy that the column delimiter is literally a quoted comma, i.e. the quotes are around the comma, not the value, so the extra quotes you get on the first and last columns are expected.

Looking at the function docs, I think you'll need to set the converters parameter to strip quotes for you (the default does not):

import re
import numpy as np

fieldFilter = re.compile(r'^"?([^"]*)"?$')
def filterTheField(s):
    m = fieldFilter.match(s.strip())
    if m:
        return float(m.group(1))
    else:
        return 0.0 # or whatever default

#...

# Yes, sorry, you have to know the number of columns, since the NumPy docs
# don't say you can specify a default converter for all columns.
convs = dict((col, filterTheField) for col in range(numColumns))
data = np.genfromtxt(csvfile, dtype=None, delimiter=',', names=True, 
    converters=convs)

Or abandon np.genfromtxt() and let csv.reader give you the file's contents a row at a time, as lists of strings; then you just iterate through the elements and build the matrix:

EDIT: Okay, so it looks like your file isn't all floats. In that case, you can set convs as needed in the genfromtxt case, or create a vector of conversion functions in the csv.reader case:

import csv

reader = csv.reader(csvfile)
result = np.array([[magic(col) for col in row] for row in reader])

... where magic() is just a name I got off the top of my head for a function:

from datetime import datetime

def magic(s):
    if '/' in s:
        # Placeholder: adjust the format string to match your timestamps
        return datetime.strptime(s, '%m/%d/%Y')
    elif '.' in s:
        return float(s)
    else:
        return int(s)

Maybe NumPy has a function that takes a string and returns a single element with the right type. numpy.fromstring() looks close, but it might interpret the space in your timestamps as a column separator.

P.S. One downside I see with csv.reader is that it doesn't discard comments; then again, real CSV files don't have comments.

The str.replace('"', '') method should perform noticeably faster than the regular expression if the input file is large (many MBs or GBs), and will be correct if you can assume the " character will not appear in the middle of a field, only at the ends.
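
A sketch of that suggestion, as a drop-in replacement for the regex-based converter above:

def filterTheField(s):
    s = s.strip().replace('"', '')
    return float(s) if s else 0.0  # same default as the regex version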

Thanks Mike and gotgenes, but I should've also mentioned that the CSV file has a variable number of columns. I could probably use the approach you've described by adding an initial step to read in the 1st record of the file to determine the number of columns, then using that as input for later steps, but it seems pretty clunky. Is there a better way?
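
That initial step might look something like this (a sketch; it assumes the first line has the same number of fields as the rest and that no field contains an embedded comma, and 'myfile.csv' is a placeholder):

with open('myfile.csv') as csvfile:
    numColumns = len(csvfile.readline().split(','))
    csvfile.seek(0)  # rewind before handing the file to np.genfromtxt
    convs = dict((col, filterTheField) for col in range(numColumns))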

Tiny note: you don't need to use re.compile() because just using re.match() directly will cache the compiled regular expression anyway.

python - Reading CSV files in numpy where delimiter is "," - Stack Overflow

python csv numpy delimiter

A solution for a similar question was given here some time after the posting of this question. Basically, it suggests reading the file in chunks by doing the following:

chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

You should specify the chunksize parameter according to your machine's capabilities (that is, make sure it can process each chunk).
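
For instance, a sketch of per-chunk processing that accumulates column sums and row counts, so nothing larger than one chunk is ever held in memory (the file name is a placeholder and the columns are assumed numeric):

import pandas as pd

total = None
rows = 0
for chunk in pd.read_csv('bigfile.csv', chunksize=10 ** 6):
    total = chunk.sum() if total is None else total + chunk.sum()
    rows += len(chunk)
column_means = total / rows  # per-column means without loading the whole file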

python - Reading large text files with Pandas - Stack Overflow

python csv pandas ipython large-files

import csv
reader = csv.reader(open("myfile.csv", "rb"), 
                    delimiter='\t', quoting=csv.QUOTE_NONE)

header = []
records = []
fields = 16

if thereIsAHeader: header = next(reader)

for row, record in enumerate(reader):
    if len(record) != fields:
        print "Skipping malformed record %i, contains %i fields (%i expected)" % \
            (row, len(record), fields)
    else:
        records.append(record)

# do numpy stuff.

this does not make a numpy array out of the result, unfortunately

You can do whatever you like with the data in the loop body; there it's a list broken up by the delimiter. You could check that it's as long as you expect (as in the edited example), or do validation on each field to make sure you're not passing garbage into your numpy array.
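
For example, once the loop has filtered out the malformed rows, the surviving records can be converted in one step (assuming every field is numeric):

import numpy as np

data = np.array(records, dtype=float)  # rows x 16 array of floats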

reading csv files in scipy/numpy in Python - Stack Overflow

python csv numpy matplotlib scipy

Depending on your needs this solution might be overkill, but when working with large sets of data files from external sources (especially Excel, but also binary, CSV, TSV, or others) I found the pandas module to be a very convenient and efficient way to read and process data.

Given a data file test-data.txt having the following content

1, 2,
2, 3,
4, 5,

you can read the file by using

import pandas as pd
data = pd.read_csv("test-data.txt", names = ("col1", "col2"), usecols=(0,1))
In[25]: data
Out[25]: 
   col1  col2
0     1     2
1     2     3
2     4     5
In[26]: data.col1
Out[26]: 
0    1
1    2
2    4

The result is a DataFrame object with indexed rows and column labels that can be used for data access. If your data file contains a header, it is used directly for labeling the columns. Otherwise you can specify the label for each column with the names argument. The usecols argument lets you skip the 3rd column, which would otherwise be read as a column of nan values.

Python - numpy.loadtxt how to ignore end commas? - Stack Overflow

python numpy

Files are simply streams of bytes. Lines do not exist as separate entities; they are an artifact of treating certain bytes as newline characters. As such, you must read from the beginning of the file to identify lines in order.

If the file doesn't change (often) and this is an operation you need to perform often (say, with different values of n), you can store the byte offsets of the newline characters in a second file. You can use this much-smaller file and the seek command to quickly jump to a given line in the first file and read from there.

(Some operating systems provide record-oriented files that have more complex internal structure than the common flat file. The above does not apply to them.)
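
A minimal sketch of that index-file idea (the file names and helper names are hypothetical):

def build_index(data_path, index_path):
    # Record the byte offset of the start of every line, one per line.
    with open(data_path, 'rb') as data, open(index_path, 'w') as index:
        offset = 0
        for line in data:
            index.write('%d\n' % offset)
            offset += len(line)

def last_n_lines(data_path, index_path, n):
    with open(index_path) as index:
        offsets = [int(line) for line in index]
    with open(data_path, 'rb') as data:
        data.seek(offsets[-n])   # jump straight to the nth-from-last line
        return data.read().splitlines()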

python - Efficiently Read last 'n' rows of CSV into DataFrame - Stack Overflow

python csv numpy pandas

Here's a one-liner:

arrays = [np.array(line.split(), dtype=int) for line in open('scienceVertices.txt')]

arrays is a list of numpy arrays.

python - Reading non-uniform data from file into array with NumPy - Stack Overflow

python file-io numpy

I haven't tried it with such epic XML files, but the last time I had to deal with large (and relatively simple) XML files, I used a SAX parser.

It basically gives you callbacks for each "event" and leaves it to you to store the data you need. You can give an open file so you don't have to read it in all at once.

ElementTree's iterparse() is built on top of a SAX-style parser.
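
For example, a sketch using ElementTree's iterparse, clearing each element after it has been handled so memory use stays flat (the tag name 'record' and the process function are placeholders):

import xml.etree.ElementTree as ET

for event, elem in ET.iterparse('huge.xml', events=('end',)):
    if elem.tag == 'record':
        process(elem)   # your per-record handling
        elem.clear()    # free the element's children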

Efficient reading of 800 GB XML file in Python 2.7 - Stack Overflow

python file text python-2.7 io

If you don't want the first n rows, try (if there is no missing data):

data = numpy.loadtxt(yourFileName,skiprows=n)

or (if there are missing data):

data = numpy.genfromtxt(yourFileName,skip_header=n)

If you then want to parse the header information, you can go back, open the file, and parse the header, for example:

fh = open(yourFileName,'r')
for i,line in enumerate(fh):
    if i == n: break
    do_other_stuff_to_header(line)
fh.close()

I think I got the idea. Will I need to use csv.DictReader to read in the header?

What I have above will loop over the lines until you hit line n and then it will stop. When it loops over them, you can do whatever you want to parse them.

how do I not import the last n lines?

python - Reading data into numpy array from text file - Stack Overflow

python arrays file-io numpy genfromtxt

import numpy as np

for line in textfile:
    a = np.array([int(v) for v in line.split()])
    # Work on your array

python - Reading non-uniform data from file into array with NumPy - Stack Overflow

python file-io numpy

The code below iterates over the files line by line, grabbing the lines for each station from each file in turn and appending them to a list for further processing.

The heart of this code is a generator file_buff that yields the lines of a file but which allows us to push a line back for later reading. When we read a line for the next station we can send it back to file_buff so that we can re-read it when it's time to process the lines for that station.

To test this code, I created some simple fake station data using create_data.

0 data100
1 data110
1 data111
2 data120
3 data130
3 data131
4 data140
4 data141

0 data200
1 data210
2 data220
2 data221
3 data230
3 data231
3 data232
4 data240
4 data241
4 data242

0 data300
0 data301
1 data310
1 data311
2 data320
3 data330
4 data340

Output:

Station 0
['data100', 'data200', 'data300', 'data301']
Station 1
['data110', 'data111', 'data210', 'data310', 'data311']
Station 2
['data120', 'data220', 'data221', 'data320']
Station 3
['data130', 'data131', 'data230', 'data231', 'data232', 'data330']
Station 4
['data140', 'data141', 'data240', 'data241', 'data242', 'data340']

This code can cope if station data is missing for a particular station from one or two of the files, but not if it's missing from all three, since the main processing loop breaks when the station_lines list is empty. That shouldn't be a problem for your data, though.

For details on generators and the generator.send method, please see 6.2.9. Yield expressions in the docs.

This code was developed using Python 3, but it will also run on Python 2.6+ (you just need to include from __future__ import print_function at the top of the script).

If there may be station ids missing from all 3 files we can easily handle that. Just use a simple range loop instead of the infinite str_count generator.

from random import seed, randrange

seed(123)

station_hi = 7
def create_data():
    ''' Fill 3 files with fake station data '''
    fbase = 'datafile_'
    for fnum in range(1, 4):
        with open(fbase + str(fnum), 'w') as f:
            for snum in range(station_hi):
                for i in range(randrange(0, 2)):
                    s = '{1} data{0}{1}{2}'.format(fnum, snum, i)
                    print(s)
                    f.write(s + '\n')
        print()

create_data()

# A file buffer that you can push lines back to
def file_buff(fh):
    prev = None
    while True:
        while prev:
            yield prev
            prev = yield prev
        prev = yield next(fh)

station_start = 0
station_stop = station_hi

# Extract station data from all 3 files
with open('datafile_1') as f1, open('datafile_2') as f2, open('datafile_3') as f3:
    fb1, fb2, fb3 = file_buff(f1), file_buff(f2), file_buff(f3)

    for i in range(station_start, station_stop):
        snum_str = str(i)
        station_lines = []
        for fb in (fb1, fb2, fb3):
            for line in fb:
                #Extract station number string & station data
                sid, sdata = line.split()
                if sid != snum_str:
                    # This line contains data for the next station,
                    # so push it back to the buffer
                    rc = fb.send(line)
                    # and go to the next file
                    break
                # Otherwise, append this data
                station_lines.append(sdata)

        if not station_lines:
            continue
        print('Station', snum_str)
        print(station_lines)

Sample files and output for this version:

1 data110
3 data130
4 data140

0 data200
1 data210
2 data220
6 data260

0 data300
4 data340
6 data360

Station 0
['data200', 'data300']
Station 1
['data110', 'data210']
Station 2
['data220']
Station 3
['data130']
Station 4
['data140', 'data340']
Station 6
['data260', 'data360']

Thanks a lot @PM_2Ring. This code looks excellent and smart, but I wonder why you converted the station number to a string in the str_count generator? And what if I want to iterate over the original number of stations (100797), as there are already some station numbers missing from all three files? (There are more files that include other data for the missing stations, but I only want to process these three files of temperature data.)

@MohammadElNesr I converted the station number to a string in the str_count generator because we need to test the station number string for every line we read, and it's more efficient to compare those number strings to a string than to convert each one to an integer to do the comparison. And I thought it was better to do that conversion in the generator than to clutter the main loop with a station number integer and a station number string.

@MohammadElNesr Before I started to write this code I asked if "Each of these 3 files contains data for every station number in range(100798)" and you replied that that was correct. I need to change the logic a little if that's not correct. But it's getting late in my time zone, and I probably won't have time to make that change until tomorrow.

No need, @PM_2Ring, to do anything further. I have changed what was needed and the code is running like a charm now. Many, many thanks for your great effort.

@MohammadElNesr I've added a new version that copes with missing stations, you just need to specify the station number range.

performance - Reading large CSV files from nth line in Python (not from the beginning) - Stack Overflow

python performance csv bigdata

I've used it very effectively with numpy/scipy. I would share my code but unfortunately it's owned by my employer, but it should be very straightforward to write your own.

reading csv files in scipy/numpy in Python - Stack Overflow

python csv numpy matplotlib scipy

#The first thing to do is to import the relevant packages
# that I will need for my script, 
#these include the Numpy (for maths and arrays)
#and csv for reading and writing csv files
#If i want to use something from this I need to call 
#csv.[function] or np.[function] first

import csv as csv 
import numpy as np

#Open up the csv file in to a Python object
csv_file_object = csv.reader(open('../csv/train.csv', 'rb')) 
header = csv_file_object.next()  #The next() command just skips the 
                                 #first line which is a header
data=[]                          #Create a variable called 'data'
for row in csv_file_object:      #Run through each row in the csv file
    data.append(row)             #adding each row to the data variable
data = np.array(data)            #Then convert from a list to an array
                                 #Be aware that each item is currently
                                 #a string in this format

Python is indentation-sensitive. That is, the indentation level will determine the body of the for loop, and according to the comment by thegrinner:

There is a HUGE difference in whether your data = np.array(data) line is in the loop or outside it.

That being said the following should demonstrate the difference:

>>> import numpy as np
>>> data = []
>>> for i in range(5):
...     data.append(i)
... 
>>> data = np.array(data) # re-assign data after the loop
>>> print data
array([0, 1, 2, 3, 4])
>>> data = []
>>> for i in range(5):
...     data.append(i)
...     data = np.array(data) # re-assign data within the loop
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'append'

As a side note, I doubt that the tutorial you are apparently following is appropriate for absolute Python beginners. I think this more basic (official) tutorial is more appropriate for a quick first overview of the language: http://docs.python.org/2/tutorial/

Python code not running - Stack Overflow

python python-2.7 numpy

import openpyxl as px
import numpy as np

W = px.load_workbook('filename.xlsx', use_iterators=True)
p = W.get_sheet_by_name(name='Sheet1')

a = []

for row in p.iter_rows():
    for k in row:
        a.append(k.internal_value)

# convert list a to a matrix (for example 5*6)
aa = np.resize(a, [5, 6])

# save matrix aa as an xlsx file
WW = px.Workbook()
pp = WW.get_active_sheet()
pp.title = 'NEW_DATA'

f = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5}

# insert values into the six columns
for (i, j) in f.items():
    for k in np.arange(1, len(aa) + 1):
        pp.cell('%s%d' % (i, k)).value = aa[k-1][j]

WW.save('newfilename.xlsx')

This example was close but didn't quite work for me. This did -- openpyxl.readthedocs.org/en/latest/

xls - Reading xlsx files using Python - Stack Overflow

python xls xlsx xlrd openpyxl