----- Editing answer to account for inconsistent spacing:
Not sure what the pythonic approach should be, but here's a method that could work.
Using OP's data sample as an example:
```
0 date=2015-09-17 time=21:05:35 duration=0
1 date=2015-09-17 time=21:05:36 duration=0
2 date=2015-09-17 time=21:05:37 duration=0
3 date=2015-09-17 time=21:05:38 duration=0
4 date=2015-09-17 time=21:05:39 duration=0
5 date=2015-09-17 time=21:05:40 duration=0
```
I loop through each line and split at the equals sign, then grab the desired text:
```
import pandas as pd

log_data = open('log_sample.txt', 'r')
split_list = []
for line in log_data:
    thing1 = line.split('=')
    # print(thing1)
    date = thing1[1][:10]
    time = thing1[2][:8]
    dur = thing1[3].strip()  # strip the trailing newline
    split_list.append([date, time, dur])

df = pd.DataFrame(split_list, columns=['date', 'time', 'duration'])
df
```
----- First Answer:
As @jezrael mentions in the comments, you can leverage the "sep" argument within read_csv.
```
pd.read_csv('test.txt', sep=r'\t', engine='python')  # [1]
```
See:
- [1] python pandas read_csv not recognizing \t in tab delimited file
- //pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Guide to easier data handling in Python
Many times we have to import log file data from machines, and this data has to be heavily prepared before it is digestible (a process also called data munging). This story provides you with a detailed explanation (including data and code) of readline, loops, split, pop, and regular expressions. Hopefully this will help you process your data more quickly in the future.
Background:
Our robot’s machine log file looks like this:
It is not important for us what the log file means in detail. But we need to understand the basic concept so that we can more easily adapt the code to our needs:
As we can see in the screenshot above, the robot receives Production Order (PO) 1212201 and moves from position 10 to position 1. The robot then does whatever it has to do and finally returns to position 1. This is one complete, successful step. If the robot did not return to position 1, that would mean there was a failure in between. For our training example we only want to calculate the throughput times for those datasets that belong to a successful process step.
Solution:
In the interest of efficient and convenient analysis, we will first split this text file into a nice table format. Let’s see how we can achieve this.
Walk the line using Readline
We import pandas for the later steps and open the log file with Python's built-in open function. Then, in a loop, we readline each line from our log file and append it to a list. In case of an error we output the line where the error occurred (try and except).
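A minimal sketch of that loop; the file name machine_log.txt and the exact error handling are assumptions, not the original notebook:

```
import pandas as pd

lines = []
with open('machine_log.txt', 'r') as log_file:  # assumed file name
    while True:
        try:
            line = log_file.readline()
            if not line:               # an empty string means end of file
                break
            lines.append(line.strip())
        except Exception as err:       # report the problem line and keep going
            print('Could not read line', len(lines) + 1, ':', err)
```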
The list now looks like this:
Split list into dataframe
We convert this " : "-separated list into a DataFrame using the DataFrame constructor and the str.split function. We will call these column headers DateAndTime and Description:
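Assuming each entry in the list has the shape "date and time : description", this step could look roughly like this (the column names come from the story, everything else is an assumption):

```
# One-column DataFrame from the list of raw lines, then split once on ' : '
df = pd.DataFrame(lines, columns=['raw'])
df[['DateAndTime', 'Description']] = df['raw'].str.split(' : ', n=1, expand=True)
df = df.drop(columns='raw')
```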
To split the first column DateAndTime into two new columns Date and Time, we first string-split this column using a space (" ") as the separator. Setting expand to True results in a DataFrame (otherwise the output would have been a pandas Series object). n defines the maximum number of splits to make in a single string, which we set to 1 (the default is -1, which means all).
Now we can simply add the Date (position 0 of the split result, here called DT) into a new column, and similarly the Time from position 1.
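A sketch of the split described above, with DT as the intermediate result of the expand=True split:

```
# Split DateAndTime once at the first space: position 0 is the date,
# position 1 is the time. expand=True gives a DataFrame instead of a Series.
DT = df['DateAndTime'].str.split(' ', n=1, expand=True)
df['Date'] = DT[0]
df['Time'] = DT[1]
```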
Extracting strings
Pandas' str.extract function searches for everything between < and >. This extract call may look a little puzzling, so we will look at another option later on, which you might find more intuitive.
Then we split off everything in the Description column that comes after the >:
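A sketch of both steps; the example Description text "<1212201> from 10 to 1" and the column names PO and Rest are assumptions for illustration:

```
# Everything between < and >, e.g. the production order number
df['PO'] = df['Description'].str.extract(r'\<(.*)\>', expand=False)
# Everything that comes after the closing >
df['Rest'] = df['Description'].str.split('>', n=1, expand=True)[1]
```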
For extracting Year, Month, Day etc. out of Datestamp, Pandas’ to_datetime function works like a charm:
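For example (assuming the DateAndTime strings parse cleanly):

```
# Convert to a real datetime first, then pull out the parts we need
df['Datestamp'] = pd.to_datetime(df['DateAndTime'])
df['Year'] = df['Datestamp'].dt.year
df['Month'] = df['Datestamp'].dt.month
df['Day'] = df['Datestamp'].dt.day
```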
Regular expression operations
The re module provides regular expression matching operations similar to those found in Perl and is very powerful for string operations (see Python.org for more details). Remember that we used str.extract('\<(.*)\>') in the code above to get the string between < and >? With re we can take the next step and search and split very specifically.
Firstly, we will set up empty buckets (empty lists to collect the results):
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. We are using findall, split, and pop for this:
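A rough sketch of these operations; the Description format "<1212201> from 10 to 1" and the new column names are assumptions, not the original code:

```
import re

# Empty buckets for the pieces we want to collect
po_list, from_list, to_list = [], [], []

# Compiled pattern: the production order number between < and >
po_pattern = re.compile(r'<(\d+)>')

for desc in df['Description']:
    po_list.append(po_pattern.findall(desc)[0])         # findall returns a list of matches
    tokens = desc.split()                               # split the text on whitespace
    to_list.append(tokens.pop())                        # pop takes the last token: the "to" position
    from_list.append(tokens[tokens.index('from') + 1])  # the token right after 'from'

df['From'] = from_list
df['To'] = to_list
```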
Here comes the interesting part: we want to calculate the throughput time only for each consecutive completed process. In tabular words: when “to” position returns back to “from” position and vice versa:
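One way to express that pairing, building on the assumed From/To columns and Datestamp from the sketches above (not the original notebook code):

```
# A step counts as completed when the next row swaps From and To back,
# e.g. 10 -> 1 directly followed by 1 -> 10. For those pairs the throughput
# time is the time difference between the two rows.
completed = (df['From'] == df['To'].shift(-1)) & (df['To'] == df['From'].shift(-1))
lead_time = df['Datestamp'].shift(-1) - df['Datestamp']
df['LeadTime'] = lead_time.where(completed)  # NaT where the step was not completed
```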
This finally returns what we are striving for.
```
df.head()
```
Summary:
Congratulations, we have successfully imported a log file and prepared it for lead time calculation with some special adjustments. Hopefully the explanations make it easy for you to customize it to your needs. You can find the Jupyter Notebook and log file in my GitHub repository. Many thanks for reading, I hope this was helpful! Feel free to connect with me on LinkedIn, Twitter or Workrooms.