The CSV Format
- Each record is one line – The line separator may be LF (0x0A) or CRLF (0x0D0A). A line separator may also be embedded in the data, making a record span more than one line, which is still acceptable.
- Fields are separated with commas. – Duh.
- Leading and trailing whitespace is ignored – Unless the field is delimited with double-quotes, in which case the whitespace is preserved.
- Embedded commas – Field must be delimited with double-quotes.
- Embedded double-quotes – Embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
- Embedded line-breaks – Field must be surrounded by double-quotes.
- Always Delimiting – Fields may always be delimited with double-quotes; the delimiters will be parsed and discarded by the reading application.
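As a rough illustration of the quoting rules above, the following sketch uses Python's standard csv module to write one record containing an embedded comma, embedded double-quotes, and an embedded line break, and then reads it back. The module and the sample values are choices made here for illustration; nothing in the format requires them. Python's reader does not trim whitespace around unquoted fields by default, so the whitespace rule is not covered by this sketch.

```python
import csv
import io

# Made-up sample record: embedded comma, embedded quotes, embedded line break, plain text.
rows = [["Edoceo, Inc.", 'say "hi"', "line one\nline two", "plain"]]

# QUOTE_MINIMAL delimits only the fields that need it and doubles embedded quotes.
buf_minimal = io.StringIO()
csv.writer(buf_minimal, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n").writerows(rows)
minimal_text = buf_minimal.getvalue()

# QUOTE_ALL delimits every field ("Always Delimiting").
buf_all = io.StringIO()
csv.writer(buf_all, quoting=csv.QUOTE_ALL, lineterminator="\r\n").writerows(rows)
all_text = buf_all.getvalue()

# newline="" leaves line endings untouched so the reader sees embedded line breaks as written.
parsed_minimal = list(csv.reader(io.StringIO(minimal_text, newline="")))
parsed_all = list(csv.reader(io.StringIO(all_text, newline="")))

print(repr(minimal_text))
print(parsed_minimal == rows, parsed_all == rows)
```

Both forms should read back to the same four fields, since the delimiting quotes are discarded by the reader.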
CSV Files and Leading Zeros on Numeric Fields
Sometimes leading zeros are required in a data set, and while the leading zeros are present in the data they are not displayed. In some software it’s possible to force strict interpretation of the CSV field value with a leading = (equals) symbol.
A plain value may have its leading zero chopped by some software, even if it is quoted.
Prefixing the quoted value with = may convince that software to keep the leading zero.
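The sketch below shows, in Python, what the leading = form might look like when written through a CSV writer. The helper name `force_text` and the sample value are hypothetical, and whether a given spreadsheet honors the result depends on that software. The code only demonstrates what ends up in the file and what a CSV reader gets back.

```python
import csv
import io

def force_text(value: str) -> str:
    """Hypothetical helper: wrap a value as ="value" so it may be treated as text."""
    return '="' + value + '"'

code = "08075"  # made-up sample value with a leading zero

# Converting to a number and back loses the leading zero.
as_number = str(int(code))

# The wrapped value contains double-quotes, so the writer delimits the field
# and doubles the embedded quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([force_text(code), "Seattle"])
line = buf.getvalue()

# A CSV reader recovers the wrapped value as an ordinary string.
read_back = next(csv.reader(io.StringIO(line)))

print(as_number)
print(line, end="")
print(read_back)
```

Note that the on-disk form doubles the quotes around the value, which follows from the embedded double-quote rule.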
Acceptable CSV Mime Types
Sadly there is no definitive standard for this; here is a collection of types we’ve seen in use.
- text/comma-separated-values – this is best
Here are some examples that demonstrate the rules above. Each sample describes the data and how the reading application should interpret it.
This shows three fields, each with simple data.
Edoceo, Seattle, WA
The first field should be interpreted by reading applications as [space]Edoceo[comma][space]Inc.[space]. The preserved whitespace could also include line breaks.
" Edoceo, Inc. ",Seattle,WA
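To check the two samples above against a real reader, here is a minimal sketch using Python's csv module. The `parse_line` helper is a name invented for this example. The `skipinitialspace=True` option is used because Python's reader otherwise keeps the space after each comma; it only drops leading spaces, so it approximates the whitespace rule rather than implementing it fully (trailing whitespace on unquoted fields would be kept).

```python
import csv
import io

# The two sample records from the text above.
unquoted = "Edoceo, Seattle, WA\n"
quoted = '" Edoceo, Inc. ",Seattle,WA\n'

def parse_line(text):
    """Hypothetical helper: parse a single CSV record and return its fields."""
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    return next(reader)

simple_fields = parse_line(unquoted)
quoted_fields = parse_line(quoted)

print(simple_fields)   # ['Edoceo', 'Seattle', 'WA']
print(quoted_fields)   # [' Edoceo, Inc. ', 'Seattle', 'WA']
```

The quoted first field keeps its leading and trailing spaces and its comma, while the unquoted fields lose the space that follows each separator.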
The first field should be interpreted by reading applications as Edoceo[comma][space]Inc.