Read a big CSV file in PHP

Learn how to efficiently read a huge CSV file and parse its data in PHP.

If you work for a company that offers development services to other companies in the same industry, you may have had the "wonderful" task of importing a huge "database" from a client into your company's preferred database engine. For example, at our company we work with MySQL, and a client came to us with a CSV file of approximately 25GB and ~7.5M rows.


Obviously, because of the logic the project required, we couldn't simply import the file into the database through a tool like phpMyAdmin, as every row in the CSV had to be modified to fit our new database design.

In this article, we will explain our approach for efficiently reading a huge CSV file in PHP.

1. Split your file into smaller chunks

To get started: when we talk about huge files, we are not talking about files with 50K or 70K rows, but about millions of rows, as in this example with a 25GB CSV file. The correct approach for such cases is therefore not to work with the file directly, but with smaller files.

The smaller the file, the easier it is to get optimal performance and keep control over your script, not only from a performance perspective but from a logic perspective as well. We previously wrote an article about how to split huge CSV datasets into smaller chunks using CSV Splitter, a tool for Windows 10. You can of course do the same with another approach, but the idea remains the same: split the file into smaller chunks that your scripts can easily process later. If you prefer to stay within PHP, see the sketch below.
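The same chunking idea can be sketched in a few lines of PHP instead of using an external tool. This is a minimal sketch, assuming an input file named my_huge_csv_file.csv and an arbitrary chunk size of 100,000 rows:

// Split a huge CSV file into smaller files of $chunkSize rows each
$handle = fopen("my_huge_csv_file.csv", "r");

if ($handle === false) {
    die("Unable to open the source file");
}

$chunkSize = 100000;
$chunkIndex = 0;
$lineNumber = 0;
$chunk = null;

while (($line = fgets($handle)) !== false) {
    // Start a new chunk file every $chunkSize rows
    // (if the first line is a header, you may want to copy it into every chunk)
    if ($lineNumber % $chunkSize === 0) {
        if ($chunk !== null) {
            fclose($chunk);
        }
        $chunk = fopen("chunk_" . $chunkIndex . ".csv", "w");
        $chunkIndex++;
    }

    fwrite($chunk, $line);
    $lineNumber++;
}

if ($chunk !== null) {
    fclose($chunk);
}
fclose($handle);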

2. Implementing the read and iteration script

To read the file, we will use PHP's fopen function. This built-in function opens a file from a local path or a URL and binds a resource to a stream. It expects as its second argument the mode in which we'll operate; in this case, just reading, with the r identifier. The function returns a file pointer resource as long as the file can be opened, otherwise it returns false on failure.

We will open the file using this function and store the pointer in the $handle variable. Create as well a variable that stores the current line number, as we'll iterate over the rows with an entry-controlled loop (while). With the while loop, we will iterate over every single row of the file, verifying the condition that fgets still returns content.

PHP's fgets function returns a line from a file opened with fopen, and it returns false when there's nothing left to read. Then, inside the while loop, you can parse the raw CSV string with the str_getcsv function. With this basic structure implemented, you will be ready to modify the script to do whatever you need to do with the best possible performance in PHP:

// Open the CSV file for reading
$handle = fopen("my_huge_csv_file.csv", "r");

if ($handle === false) {
    die("Unable to open the file");
}

// Optionally, you can keep track of the line number that
// the loop is currently iterating over
$lineNumber = 1;

// Iterate over every line of the file
while (($raw_string = fgets($handle)) !== false) {
    // Parse the raw csv string: "1, a, b, c"
    $row = str_getcsv($raw_string);

    // into an array: ['1', 'a', 'b', 'c']
    // And do what you need to do with every line
    var_dump($row);
    
    // Increase the current line number
    $lineNumber++;
}

fclose($handle);

The advantages of this approach are:

  • Unlike file_get_contents, you are not reading the entire file into memory, so the maximum amount of memory needed to run the script depends on the longest line in the input data.
  • Quite easy to read and understand.
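As a variation, PHP also provides the built-in fgetcsv function, which combines the fgets and str_getcsv steps into a single call. A minimal sketch of the same loop using it (same assumed file name as above):

$handle = fopen("my_huge_csv_file.csv", "r");

if ($handle === false) {
    die("Unable to open the file");
}

// fgetcsv reads the next line and parses it as CSV in a single step
while (($row = fgetcsv($handle)) !== false) {
    // $row is already an array: ['1', 'a', 'b', 'c']
    var_dump($row);
}

fclose($handle);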

Happy coding ❤️!

3. A community benchmark: LOAD DATA INFILE

After struggling a lot, I finally found a good solution; maybe it will help others too. When I tried a 2,367KB CSV file containing 18,226 rows, the scripts I timed were: (1) the CsvImporter from the php.net fgetcsv documentation, and (2) file_get_contents, which on a big enough file simply dies with PHP Fatal error: Allowed memory exhausted.

(1) took 0.92574405670166 seconds; (2) took 0.12543702125549 seconds (string form) and 0.52903485298157 seconds (split into an array). Note: these timings do not include inserting into MySQL.

The best solution I found takes 3.0644409656525 seconds in total, including inserting into the database and some conditional checks as well. It took 11 seconds to process an 8MB file. The solution is:

$csvInfo = analyse_file($file, 5);
$lineSeparator = $csvInfo['line_ending']['value'];
$fieldSeparator = $csvInfo['delimiter']['value'];
$columns = getColumns($file);

echo '<br>========Details========<br>';
echo "Line Sep: \t " . $lineSeparator;
echo "<br>Field Sep:\t " . $fieldSeparator;
echo '<br>Columns: '; print_r($columns);
echo '<br>========Details========<br>';

$ext = pathinfo($file, PATHINFO_EXTENSION);
$table = str_replace(' ', '_', basename($file, "." . $ext));
$rslt = table_insert($table, $columns);

if ($rslt) {
    // IGNORE 1 LINES skips the header row, which was already used for the column names
    $query = "LOAD DATA LOCAL INFILE '" . $file . "' INTO TABLE $table " .
             "FIELDS TERMINATED BY '$fieldSeparator' " .
             "LINES TERMINATED BY '$lineSeparator' " .
             "IGNORE 1 LINES";

    var_dump(addToDb($query, false));
}


function addToDb($query, $getRec = true) {
    // The original answer used the mysql_* API, which was removed in PHP 7;
    // mysqli is used here instead. MYSQLI_OPT_LOCAL_INFILE must be enabled
    // for the LOAD DATA LOCAL INFILE query to be allowed.
    $con = mysqli_init();
    mysqli_options($con, MYSQLI_OPT_LOCAL_INFILE, true);
    if (!mysqli_real_connect($con, 'localhost', 'root', '', 'rtest')) {
        var_dump(mysqli_connect_error());
        return false;
    }
    $result = mysqli_query($con, $query);
    if ($result) {
        if ($getRec) {
            $data = array();
            while ($row = mysqli_fetch_assoc($result)) {
                $data[] = $row;
            }
            return $data;
        }
        return true;
    }
    var_dump(mysqli_error($con));
    return false;
}


function table_insert($table_name, $table_columns) {
    $queryString = "CREATE TABLE " . $table_name . " (";
    $values = '';

    // One VARCHAR column per CSV header field
    foreach ($table_columns as $column) {
        $values .= strtolower(str_replace(' ', '_', $column)) . " VARCHAR(2048), ";
    }
    // Strip the trailing ", "
    $values = substr($values, 0, strlen($values) - 2);

    $queryString .= $values . ") ";

    return addToDb($queryString, false);
}


function getColumns($file) {
    $cols = array();
    if (($handle = fopen($file, 'r')) !== false) {
        // Read rows until the first non-empty one, which holds the column names
        while (($row = fgetcsv($handle)) !== false) {
            $cols = $row;
            if (count($cols) > 0) {
                break;
            }
        }
        fclose($handle);
        return $cols;
    }
    return false;
}

function analyse_file($file, $capture_limit_in_kb = 10) {
// capture starting memory usage
$output['peak_mem']['start']    = memory_get_peak_usage(true);

// log the limit how much of the file was sampled (in Kb)
$output['read_kb']                 = $capture_limit_in_kb;

// read in a sample of the file (the first N KB only)
$fh = fopen($file, 'r');
$contents = fread($fh, $capture_limit_in_kb * 1024);
fclose($fh);

// specify allowed field delimiters
$delimiters = array(
    'comma'     => ',',
    'semicolon' => ';',
    'tab'       => "\t",
    'pipe'      => '|',
    'colon'     => ':'
);

// specify allowed line endings
$line_endings = array(
    'rn' => "\r\n",
    'n'  => "\n",
    'r'  => "\r",
    'nr' => "\n\r"
);

// loop and count each line ending instance
foreach ($line_endings as $key => $value) {
    $line_result[$key] = substr_count($contents, $value);
}

// sort by largest array value
asort($line_result);

// log to output array
$output['line_ending']['results']     = $line_result;
$output['line_ending']['count']     = end($line_result);
$output['line_ending']['key']         = key($line_result);
$output['line_ending']['value']     = $line_endings[$output['line_ending']['key']];
$lines = explode($output['line_ending']['value'], $contents);

// remove the last line of the array, as it may be incomplete
array_pop($lines);

// create a string from the legal lines
$complete_lines = implode(' ', $lines);

// log statistics to output array
$output['lines']['count']     = count($lines);
$output['lines']['length']     = strlen($complete_lines);

// loop and count each delimiter instance
foreach ($delimiters as $delimiter_key => $delimiter) {
    $delimiter_result[$delimiter_key] = substr_count($complete_lines, $delimiter);
}

// sort by largest array value
asort($delimiter_result);

// log statistics to output array with largest counts as the value
$output['delimiter']['results']     = $delimiter_result;
$output['delimiter']['count']         = end($delimiter_result);
$output['delimiter']['key']         = key($delimiter_result);
$output['delimiter']['value']         = $delimiters[$output['delimiter']['key']];

// capture ending memory usage
$output['peak_mem']['end'] = memory_get_peak_usage(true);
return $output;
}
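For reference, a minimal usage sketch of the analyse_file helper above (the file name is an assumption for illustration):

$info = analyse_file('my_huge_csv_file.csv', 5);

echo 'Detected delimiter: ' . $info['delimiter']['value'] . "\n";
echo 'Detected line ending: ' . $info['line_ending']['key'] . "\n";
echo 'Lines sampled: ' . $info['lines']['count'] . "\n";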


Can PHP read CSV files?

CSV is a convenient format to store simple data, and PHP has two built-in functions to read it: fgetcsv(), which reads and parses a line of CSV using a file resource, and str_getcsv(), which parses a CSV string that is already in memory.

How do I read a CSV file column-wise in PHP?

You can open the file using fopen() as usual, get each line by using fgets(), and then simply explode it on each comma, like this:

<?php
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        $columns = explode(",", $buffer);
        // work with the individual columns here
    }
    fclose($handle);
}

What is the fgetcsv() function in PHP?

The fgetcsv() function parses a line from an open file, checking for CSV fields.
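For example, a minimal fgetcsv() loop (assuming a small file named example.csv exists):

$handle = fopen("example.csv", "r");

if ($handle) {
    while (($fields = fgetcsv($handle)) !== false) {
        // Each entry of $fields is one column of the current line
        print_r($fields);
    }
    fclose($handle);
}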