As I mentioned in yesterday’s post, I was recently working on a quick way to parse CSVs into an array of arrays and alternatively into an array of dictionaries keyed on the values in the first row. I ended up landing on the following definition:

As you can see, the function is annotated for anyone who may be interested in using it. Let’s look at an example of how it could be used. Here is an example string that can be parsed:

ID,First Name,Last Name,Address,Last Purchase Date,Purchase Amount,Comment,Return Customer
1,Don,Knots,"123 Main St.,
Duggietown, ET 12342",10/23/2013,23.43,"""Doesn't like cheese"" according to his mom.",Y
2,Cher,Vega,"92 Victor Ln.
Rutrow, DA 39252",01/12/2013,588.1,,N
3,Tina,Ray,"1111 Yomdip Circle
Bribloop, EV 92341",02/03/2013,234.2,,Y
4,Charlie,Bucket,"745 Caca Pl.
Hastiville, JS 92293",05/06/2013,345.4,,N

Below is an example of processing the above CSV first as an array of arrays, then as an array of dictionaries (objects), and lastly as an array of dictionaries with typed values:
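To make the three output shapes concrete, here is a minimal sketch of that kind of processing. The names `parseCSVSketch`, `toObjects`, and `coerce` are hypothetical helpers written for this illustration; they are not the function defined in this post, and the sketch assumes that only plain decimal numbers should be converted when typing values.

```javascript
// Hypothetical helper (not the function from this post): a small
// character-by-character CSV reader that returns an array of arrays.
function parseCSVSketch(text, delimiter) {
  delimiter = delimiter || ',';
  var rows = [[]];
  var field = '';
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote inside quotes is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // delimiters and line breaks are kept as data
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      rows[rows.length - 1].push(field);
      field = '';
    } else if (ch === '\r' || ch === '\n') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // CRLF is one line break
      rows[rows.length - 1].push(field);
      field = '';
      rows.push([]);
    } else {
      field += ch;
    }
  }
  rows[rows.length - 1].push(field); // flush the final field
  var last = rows[rows.length - 1];
  if (last.length === 1 && last[0] === '') rows.pop(); // trailing line break
  return rows;
}

// Assumption: only plain decimal numbers are typed; dates stay strings.
function coerce(value) {
  return /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
}

// Turns rows into dictionaries keyed on the values in the first row.
function toObjects(rows, typed) {
  var header = rows[0];
  return rows.slice(1).map(function (row) {
    var obj = {};
    header.forEach(function (key, i) {
      obj[key] = typed ? coerce(row[i]) : row[i];
    });
    return obj;
  });
}

// The first two records of the example CSV above.
var csv = [
  'ID,First Name,Last Name,Address,Last Purchase Date,Purchase Amount,Comment,Return Customer',
  '1,Don,Knots,"123 Main St.,',
  'Duggietown, ET 12342",10/23/2013,23.43,"""Doesn\'t like cheese"" according to his mom.",Y',
  '2,Cher,Vega,"92 Victor Ln.',
  'Rutrow, DA 39252",01/12/2013,588.1,,N'
].join('\n');

var rows = parseCSVSketch(csv);        // array of arrays
var objects = toObjects(rows, false);  // array of dictionaries (strings)
var typed = toObjects(rows, true);     // array of dictionaries (typed)

console.log(rows[1][3]);                 // the two-line address
console.log(objects[0]['First Name']);   // "Don"
console.log(typed[0]['Purchase Amount']); // 23.43 as a number
```

Quoted fields may contain the delimiter, line breaks, and doubled quotes, which is why the address and comment columns of the example survive intact.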

As you can see from the jPaq Proof above, this parser works well with the majority of the CSVs that you would need to process. Still, in the case that you need a fully-fledged CSV parser, Papa Parse seems to be a pretty good solution. Have fun! 😎

1 Comment

Evzen · October 7, 2015 at 5:58 AM

Just came across your function while searching for a solution for CSV parsing…

There is a hardcoded comma in the “second part” of the regex, so the function actually does not work with other delimiters. To fix it, I had to change the line to:
var pattern = '([^"' + opt_delimiter + '\r\n]*|"((?:[^"]+|"")*)")(' + opt_delimiter + '|\r|\r?\n)';

The second problem is that it doesn’t work correctly with CRLF line separators – it parses an extra empty row between data rows. It’s apparently confused by the CR, because when I change the CRLF separators in my CSV data to LF only, the function works fine.
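One plausible explanation for the extra rows, assuming the pattern is applied repeatedly as a global regex, is that the separator group lists `\r` before `\r?\n`: on a CRLF the lone `\r` matches first, and the leftover `\n` is then read as an empty field followed by another line break. The sketch below tries `\r\n` first and also escapes the delimiter so characters such as `|` are safe. `escapeForRegex`, `buildPatternSketch`, and `splitRowsSketch` are hypothetical names for this illustration, not part of the original function.

```javascript
// Hypothetical helper: escapes regex metacharacters (and "-", which
// matters inside a character class) in a delimiter such as "|".
function escapeForRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\-]/g, '\\$&');
}

// Builds a field-plus-separator pattern. The CRLF alternative comes
// before the lone CR so a CRLF is consumed as a single line break.
function buildPatternSketch(delimiter) {
  var d = escapeForRegex(delimiter);
  return new RegExp(
    '([^"' + d + '\\r\\n]*|"((?:[^"]|"")*)")(' + d + '|\\r\\n|\\r|\\n)',
    'g'
  );
}

// Splits text into rows of fields using the pattern above.
function splitRowsSketch(text, delimiter) {
  // Make sure the last field is followed by a separator.
  if (!/[\r\n]$/.test(text)) text += '\n';
  var pattern = buildPatternSketch(delimiter);
  var rows = [[]];
  var match;
  while (pattern.lastIndex < text.length && (match = pattern.exec(text))) {
    // Group 2 is set only for quoted fields; undo the doubled quotes.
    var value = match[2] !== undefined ? match[2].replace(/""/g, '"') : match[1];
    rows[rows.length - 1].push(value);
    if (match[3] !== delimiter) rows.push([]); // a line break ends the row
  }
  if (rows[rows.length - 1].length === 0) rows.pop(); // drop the empty tail
  return rows;
}

var crlfRows = splitRowsSketch('a;b\r\nc;"d;""e"""\r\n', ';');
var pipeRows = splitRowsSketch('x|y\nz|w', '|');
console.log(crlfRows.length); // 2, with no empty row between the data rows
console.log(pipeRows[1][1]);  // "w"
```

Every match consumes at least the separator, so the loop always advances; input with unbalanced quotes is not handled specially in this sketch.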
