
As said in the question, the format used depends mainly on which project or domain the data comes from. Most of the time, though, the data is useless without its metadata, and once a dataset gets big you need a way to extract that metadata automatically.

In astronomy, where most data has been open for decades, the International Virtual Observatory Alliance (IVOA) created the specification for a format that is something like a mix of HTML tables and XML. It is called VOTable, and it records where the data comes from, the names of the columns, their units, and other descriptors (based on a set of standards).
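
To make that kind of metadata concrete, here is a rough sketch that pulls the column descriptors out of a small VOTable using only Python's standard library. The sample document, its table name, and the helper column_metadata are invented for illustration; only the element and attribute names (VOTABLE, RESOURCE, TABLE, FIELD, name, unit, datatype, ucd) follow the VOTable format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written VOTable; real files carry many more descriptors.
SAMPLE_VOTABLE = """\
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE name="example_catalog">
      <DESCRIPTION>Made-up catalogue for illustration</DESCRIPTION>
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="vmag" datatype="float" unit="mag" ucd="phot.mag;em.opt.V"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>3.4</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# VOTable elements live in an XML namespace, so lookups must be qualified.
NS = {"vo": "http://www.ivoa.net/xml/VOTable/v1.3"}


def column_metadata(votable_xml):
    """Return one dict per column with its name, unit, datatype and UCD."""
    root = ET.fromstring(votable_xml)
    return [
        {key: field.get(key) for key in ("name", "unit", "datatype", "ucd")}
        for field in root.iterfind(".//vo:FIELD", NS)
    ]


for column in column_metadata(SAMPLE_VOTABLE):
    print(column)
```

The point is that the units and descriptors travel inside the file itself, so a script can recover them without any side documentation.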

Besides being compatible with many of the tools used in astronomy, this file format can also be read and written in Python using the astropy package. A simple VOTable can be read with just:

from astropy.io.votable import parse
votable = parse("votable.xml")

Great post. May I have your permission to include it in the initial post?

Oh, sure, you mention VOTable, but no love for FITS? PyFITS was merged into Astropy, so you can use astropy.io.fits. I believe that SunPy (a Python library for solar physics) uses it, too.

@joe, you are right! FITS files are an option too, and they are indeed accessible from SunPy as well.

best-practice data-format programming

Well, I got a lot of inspiration from your comments and came up with a solution that compresses the HTML content using zlib and POSTs it to the API server. On the Flask API server side, I extract the data and push it to MongoDB for storage.

Here is the part that might save some future headaches.

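# Client: serialize the result, compress it, and POST the bytes.
import json
import urllib.request
import zlib

myinput = "http://www.exmaple.com/001"
myoutput = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ... /html>'
result = {'myinput': myinput, 'myoutput': myoutput}
data = zlib.compress(json.dumps(result).encode('utf-8'))
opener = urllib.request.build_opener()
opener.open("http://www.host.com/contribute", data)

# Server: `app` is the Flask application, `db` the pymongo database handle.
import json
import sys
import zlib
from flask import request

@app.route('/contribute', methods=['POST'])
def contribute():
    try:
        data = request.stream.read()  # raw request body as bytes
        # Parse as JSON; never eval() data received over the network.
        result = json.loads(zlib.decompress(data))
        db.result.insert_one(result)
    except Exception:
        print(sys.exc_info())
    return 'OK'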

Result in mongodb:

{ 
"_id" : ObjectId("534e0d346a1b7a0e48ff9076"), 
"myoutput" : "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" ... /html>",  
"myinput" : "http://www.exmaple.com/001" 
}

(Note: As you may have noticed, the mongo shell output shows a backslash in front of special characters such as double quotes. That escaping is only part of how the shell prints the document; the stored string itself is unchanged.)
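
To see that this kind of escaping belongs to the printed form rather than to the data, here is a small sketch with Python's json module. It is an illustration using the standard library, not the mongo shell itself, and the shortened html string is a stand-in for the real page.

```python
import json

# Shortened stand-in for the stored HTML.
html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

printed = json.dumps({"myoutput": html})    # JSON text, the form a shell displays
restored = json.loads(printed)["myoutput"]  # the value behind that text

print(printed)           # double quotes show up escaped in the JSON text
print(restored == html)  # the value read back is identical to the original
```

Reading the document back through a driver gives the plain string, so there is nothing to "change back".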

There have been some discussions about retrieving binary data in Flask. If you read from request.stream directly, you don't have to mess with the headers.
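
The compress-and-send step and its inverse can be checked without a server. The following is a minimal sketch, assuming the payload is serialized as JSON (rather than with str() and eval()) before compression; the helper names pack and unpack and the repetitive sample page are made up for the example.

```python
import json
import zlib


def pack(record):
    """Serialize a dict to JSON and compress it into a bytes payload."""
    return zlib.compress(json.dumps(record).encode("utf-8"))


def unpack(payload):
    """Reverse pack(): decompress the bytes and parse the JSON."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


record = {
    "myinput": "http://www.exmaple.com/001",
    # Repetitive stand-in for a real page, so the compression is visible.
    "myoutput": "<html>" + "<p>hello</p>" * 500 + "</html>",
}

payload = pack(record)
print(len(json.dumps(record)), "bytes before,", len(payload), "bytes after")
assert unpack(payload) == record  # the round trip preserves the data
```

On the server, the bytes returned by request.stream.read() would play the role of payload here.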

post - Python send data over http - Stack Overflow

python post flask pickle zlib