How can I read an ASCII-formatted table like this into pandas?
EDIT: So I found partial answers here: https://stackoverflow.com/a/26551913/2230844 https://stackoverflow.com/a/15026839/2230844
I also noticed this answer: "Reading from file a hierarchical ascii table using Pandas".
asked May 6, 2015 at 14:12
denfromufa
Assuming that your ASCII data is in a string,
A few options available in pd.read_csv can get you to this dataframe:
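As a rough illustration of the idea, here is a minimal sketch. The table below is hypothetical (the table from the question is not reproduced here), and the pre-filtering of border lines is an assumption about its layout, not the answer's actual code. It strips the `+---+` borders and outer pipes, then lets `pd.read_csv` split on the remaining pipes.

```python
import io
import pandas as pd

# Hypothetical table: the question's actual table is not reproduced here.
ascii_table = """\
+------+-------+
| name | value |
+------+-------+
| a    |   1.5 |
| b    |   2.0 |
+------+-------+
"""

# Drop the '+---+' border lines and the outer '|' of each remaining row.
rows = [
    line.strip().strip("|").strip()
    for line in ascii_table.splitlines()
    if line.strip() and not line.startswith("+")
]

df = pd.read_csv(
    io.StringIO("\n".join(rows)),
    sep=r"\s*\|\s*",   # a pipe with optional surrounding whitespace
    engine="python",   # regex separators need the Python parsing engine
)
print(df)
```

Filtering the border lines in plain Python first avoids relying on `comment="+"`, which would also truncate any data value that contains a plus sign.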
answered May 6, 2015 at 14:38
Reading

We have already talked about Python Built-in Types and Operations, but there are more types that we did not speak about. One of these is the file object.

Let’s start off by downloading the data file, data.txt. Do the usual imports of numpy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
```

If you have trouble downloading the file, then from within IPython enter:

```python
from urllib import request

url = 'http://python4astronomers.github.com/_downloads/data.txt'
open('data.txt', 'wb').write(request.urlopen(url).read())
ls  # IPython shorthand: list the directory to check that data.txt is there
```

Now let’s try and get the contents of the file into IPython. We start off by creating a file object:

```python
f = open('data.txt', 'r')
```

Call f.read() and you will see something like this:

```python
>>> f.read()
'RAJ DEJ Jmag e_Jmag\n2000 (deg) 2000 (deg) 2MASS (mag) (mag) \n---------- ---------- ----------------- ------ ------\n010.684737 +41.269035 00424433+4116085 9.453 0.052\n010.683469 +41.268585 00424403+4116069 9.321 0.022\n010.685657 +41.269550 00424455+4116103 10.773 0.069\n010.686026 +41.269226 00424464+4116092 9.299 0.063\n010.683465 +41.269676 00424403+4116108 11.507 0.056\n010.686015 +41.269630 00424464+4116106 9.399 0.045\n010.685270 +41.267124 00424446+4116016 12.070 0.035\n'
```

The data file has been read in as a single string. Let’s try that again: a second call to f.read() returns an empty string. What’s happened? We read the file, and the file ‘pointer’ is now sitting at the end of the file, and there is nothing left to read.

Let’s now try and do something more useful, and capture the contents of the file in a string:

```python
f = open('data.txt', 'r')  # We need to re-open the file
data = f.read()
f.close()
```

Now data contains a string:

```python
>>> type(data)
<class 'str'>
```

Closing files

Usually, you should close a file when you are done with it to free up resources (memory). If you only have a couple of files in an interactive session, that is not dramatic. On the other hand, if you write scripts which deal with dozens of files, then you should start worrying about these things.
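To make the file pointer and the effect of closing concrete, here is a minimal sketch. It writes a small hypothetical file (a stand-in for data.txt, so that it runs anywhere), reads it twice, rewinds with `seek(0)`, and uses `try`/`finally` so the file is closed even if a read fails.

```python
import os
import tempfile

# Hypothetical stand-in for data.txt, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
out = open(path, "w")
out.write("line one\nline two\n")
out.close()

f = open(path, "r")
try:
    first = f.read()    # reads everything; the pointer is now at the end
    second = f.read()   # nothing left to read, so this is an empty string
    f.seek(0)           # move the pointer back to the start of the file
    third = f.read()    # the full contents again
finally:
    f.close()           # release the file even if one of the reads failed

print(repr(second))     # ''
print(first == third)   # True
print(f.closed)         # True
```

Rewinding with `seek(0)` is an alternative to re-opening the file when you only need to read it again.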
Often you will see things like this:

```python
with open('data.txt', 'r') as f:
    # do things with your file
    data = f.read()

type(data)
```

Notice the indent under the with statement. The file is closed automatically when the indented block ends.

But what we’d really like to do is read the file line by line. There are several ways to do this, the simplest of which is to use a for loop:

```python
f = open('data.txt', 'r')
for line in f:
    print(repr(line))
```

Notice the indent before print. The output looks like this:

```python
>>> for line in f:
...     print(repr(line))
...
'RAJ DEJ Jmag e_Jmag\n'
'2000 (deg) 2000 (deg) 2MASS (mag) (mag) \n'
'---------- ---------- ----------------- ------ ------\n'
'010.684737 +41.269035 00424433+4116085 9.453 0.052\n'
'010.683469 +41.268585 00424403+4116069 9.321 0.022\n'
'010.685657 +41.269550 00424455+4116103 10.773 0.069\n'
'010.686026 +41.269226 00424464+4116092 9.299 0.063\n'
'010.683465 +41.269676 00424403+4116108 11.507 0.056\n'
'010.686015 +41.269630 00424464+4116106 9.399 0.045\n'
'010.685270 +41.267124 00424446+4116016 12.070 0.035\n'
```

Each line is being returned as a string. Notice the \n at the end of each line.

Note

You may also come across the following way to read files line by line:

```python
for line in f.readlines():
    ...
```

f.readlines() reads the whole file into a list of lines first. Looping over f directly instead is more memory efficient because it only reads one line at a time.

Now we’re reading in a file line by line, what would be really nice would be to get some values out of it. Let’s examine the last line in detail. If we just type line we see:

```python
>>> line
'010.685270 +41.267124 00424446+4116016 12.070 0.035\n'
```

We can first get rid of the \n character with:

```python
>>> line = line.strip()
>>> line
'010.685270 +41.267124 00424446+4116016 12.070 0.035'
```

Next, we can use what we learned about strings and lists to do:

```python
>>> columns = line.split()
>>> columns
['010.685270', '+41.267124', '00424446+4116016', '12.070', '0.035']
```

Finally, let’s say we care about the source name, and the J band magnitude. We can extract these with:

```python
>>> name = columns[2]
>>> j = columns[3]
>>> name
'00424446+4116016'
>>> j
'12.070'
```

Note that j is still a string. To get a floating-point number, convert it:

```python
>>> j = float(columns[3])
```

One last piece of information we need about files is how we can read a single line. This is done using f.readline().

We can put all this together to write a little script to read the data from the file and display the columns we care about to the screen!
Here it is:

```python
# Open file
f = open('data.txt', 'r')

# Read and ignore header lines
header1 = f.readline()
header2 = f.readline()
header3 = f.readline()

# Loop over lines and extract variables of interest
for line in f:
    line = line.strip()
    columns = line.split()
    name = columns[2]
    j = float(columns[3])
    print(name, j)

f.close()
```

The output should look like this:

```
00424433+4116085 9.453
00424403+4116069 9.321
00424455+4116103 10.773
00424464+4116092 9.299
00424403+4116108 11.507
00424464+4116106 9.399
00424446+4116016 12.07
```

Exercise

Try and see if you can understand what the following script is doing:

```python
f = open('data.txt', 'r')
header1 = f.readline()
header2 = f.readline()
header3 = f.readline()
data = []
for line in f:
    line = line.strip()
    columns = line.split()
    source = {}
    source['name'] = columns[2]
    source['j'] = float(columns[3])
    data.append(source)
```

After this script is run, how would you access the name and J-band magnitude of the third source?

Solution

The line data = [] creates an empty list to contain all the data. For each line, we are then creating an empty dictionary and populating it with variables we care about:

```python
source = {}
source['name'] = columns[2]
source['j'] = float(columns[3])
```

Finally, we append this source to the data list. Therefore, data is a list of dictionaries:

```python
>>> data
[{'name': '00424433+4116085', 'j': 9.453},
 {'name': '00424403+4116069', 'j': 9.321},
 {'name': '00424455+4116103', 'j': 10.773},
 {'name': '00424464+4116092', 'j': 9.299},
 {'name': '00424403+4116108', 'j': 11.507},
 {'name': '00424464+4116106', 'j': 9.399},
 {'name': '00424446+4116016', 'j': 12.07}]
```

You can access the dictionary for the third source with:

```python
>>> data[2]
{'name': '00424455+4116103', 'j': 10.773}
```

To get the name of this source, you can therefore do:

```python
>>> data[2]['name']
'00424455+4116103'
```

Writing

To open a file for writing, use:

```python
f = open('data_new.txt', 'w')
```

Then simply use f.write() to write content to the file:

```python
f.write("Hello, World!\n")
```

If you want to write multiple lines, you can either give a list of strings to the writelines() method:

```python
f.writelines(['spam\n', 'egg\n', 'spam\n'])
```

or you can write them as a single string:

```python
f.write('spam\negg\nspam')
```

To close a file, simply use f.close() (this also applies to reading files).

Exercise

Let’s try combining reading and writing. Using at most seven lines, write a script which will read in data.txt, replace every space with a period, and write the result out to data_new.txt. Can you do it in a single line? (you can ignore closing the file)

Solution

Here is a possible solution:

```python
f1 = open('data.txt', 'r')
content = f1.read()
f1.close()
content = content.replace(' ', '.')
f2 = open('data_new.txt', 'w')
f2.write(content)
f2.close()
```

And a possible one-liner:
```python
open('data_new.txt', 'w').write(open('data.txt', 'r').read().replace(' ', '.'))
```

How do I read an ASCII file?
Open Microsoft Excel and browse for your ASCII file. Select the ASCII file and click Open. Whenever you open an ASCII file in Microsoft Excel, you will need to complete the 3 steps of the Text Import Wizard.
How do I read a TSV file in pandas?
TSV stands for tab-separated values: a text file in which each field is separated by a tab (\t). In pandas, you can read a TSV file into a DataFrame by using the read_table() function.
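A minimal sketch of that call, assuming a small made-up TSV. The column names and the use of `io.StringIO` in place of a file path are illustrative choices, not part of the original text.

```python
import io
import pandas as pd

# Hypothetical tab-separated content; io.StringIO stands in for a file path.
tsv_text = "name\tjmag\n00424433+4116085\t9.453\n00424403+4116069\t9.321\n"

# read_table uses a tab as its default separator.
df = pd.read_table(io.StringIO(tsv_text))
print(df)
```

The same result can be obtained with `pd.read_csv(..., sep="\t")`.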
How do I read a text file in pandas?
You can read a text file (.txt) with the pandas read_fwf() function. fwf stands for fixed-width formatted lines, and the function is meant for files whose columns sit at fixed character positions. Alternatively, you can read a delimited .txt file with the pandas read_csv() function.
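As a sketch of the fixed-width case, the rows below reuse three data lines from the tutorial file. The exact column spacing, the `colspecs` positions, and the column names are assumptions made for this example, since the original alignment of data.txt is not preserved here.

```python
import io
import pandas as pd

# Three rows in a fixed-width layout (spacing is assumed, not the original).
text = (
    "010.684737 +41.269035 00424433+4116085  9.453  0.052\n"
    "010.683469 +41.268585 00424403+4116069  9.321  0.022\n"
    "010.685657 +41.269550 00424455+4116103 10.773  0.069\n"
)

df = pd.read_fwf(
    io.StringIO(text),
    # Half-open (start, end) character ranges for each column.
    colspecs=[(0, 10), (11, 21), (22, 38), (39, 45), (46, 52)],
    names=["ra", "dec", "name", "jmag", "e_jmag"],  # hypothetical names
    header=None,
    dtype={"name": str},  # keep the source designation as text
)
print(df)
```

Giving `colspecs` explicitly avoids depending on pandas' automatic column inference, which can guess wrong when values are not cleanly separated.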