How can I read and manipulate CSV file data in C++?

Accepted answer
Score: 59

More information would be useful.

But the simplest form:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream  data("plop.csv");

    std::string line;
    while(std::getline(data,line))
    {
        std::stringstream  lineStream(line);
        std::string        cell;
        while(std::getline(lineStream,cell,','))
        {
            // You have a cell!!!!
        }
    }
}

Also see this question: CSV parser in C++

Score: 21

You can try the Boost Tokenizer library, in particular the Escaped List Separator.

Score: 9

If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:

struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
    struct {
        std::string street;
        std::string unit;
    } address;
    char state[2];
    int zip;
};

Thus, when you're working with a collection of data, it makes sense to have std::vector<Customer> or std::set<Customer>.

With that in mind, think of your CSV handling as two operations:

// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
    CSVReader(const std::string &inputFile);
    bool hasNextLine();
    void readNextLine(std::vector<std::string> &fields);
private:
    /* secrets */
};
class CSVWriter {
public:
    CSVWriter(const std::string &outputFile);
    void writeNextLine(const std::vector<std::string> &fields);
private:
    /* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);

Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:

  1. Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
  2. You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
  3. Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
  4. CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.
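
The reader half of this design could be filled in roughly as follows. This is only a sketch under stated assumptions: Customer is cut down to three fields, the column order (id, first name, last name) is invented for illustration, the split is the naive comma split from the accepted answer (no quoting support), and the second constructor taking a std::istream is a hypothetical addition so the class can be exercised without a file on disk.

```cpp
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Cut-down record for illustration; the column order (id, first, last)
// is an assumption, not something the answer specifies.
struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
};

class CSVReader {
public:
    // Constructor from the answer's declaration: read from a named file.
    explicit CSVReader(const std::string& inputFile)
        : file_(inputFile.c_str()), in_(file_) {}
    // Hypothetical extra constructor: read from any already-open stream.
    explicit CSVReader(std::istream& input) : in_(input) {}

    // True if at least one more character can be read.
    bool hasNextLine() {
        return in_.peek() != std::istream::traits_type::eof();
    }

    // Replaces `fields` with the cells of the next line (naive comma split).
    void readNextLine(std::vector<std::string>& fields) {
        fields.clear();
        std::string line;
        if (!std::getline(in_, line)) return;
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, ',')) fields.push_back(cell);
    }

private:
    std::ifstream file_;  // unused when constructed from a stream
    std::istream& in_;    // declared after file_ so it can bind to it
};

// Converts rows to Customer records one at a time; only the current row
// is held as raw strings.
void readCustomers(CSVReader& reader, std::vector<Customer>& customers) {
    std::vector<std::string> fields;
    while (reader.hasNextLine()) {
        reader.readNextLine(fields);
        if (fields.size() < 3) continue;  // skip blank or short rows
        Customer c;
        c.id = static_cast<int>(std::strtol(fields[0].c_str(), nullptr, 10));
        c.first_name = fields[1];
        c.last_name = fields[2];
        customers.push_back(c);
    }
}
```

Typical use would be to construct CSVReader reader("customers.csv") and pass it to readCustomers together with an empty std::vector<Customer>. A CSVWriter can mirror this shape with a std::ostream and a writeNextLine that joins the fields with commas.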
Score: 8

I've worked with a lot of CSV files in my time. I'd like to add some advice:

1 - Depending on the source (Excel, etc.), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".

2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.

3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes), as in "George ""Babe"" Ruth".

4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
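
The four points above can be sketched as a small quote-aware parser. This is a minimal illustration, not a complete CSV implementation: the function names readCsvRecord and splitCsvRecord are made up for this example, the separator is assumed to be a comma, and line endings are assumed to be whatever std::getline splits on (a lone CR as a line terminator is not handled).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one logical record. Physical lines are joined with '\n' while a
// quoted field is still open, which covers embedded line breaks (point 4).
// A doubled quote toggles the flag twice, so counting quotes is enough.
bool readCsvRecord(std::istream& in, std::string& record)
{
    record.clear();
    std::string line;
    bool quoteOpen = false;
    bool gotAny = false;
    while (std::getline(in, line)) {
        if (gotAny) record += '\n';
        gotAny = true;
        record += line;
        for (char c : line)
            if (c == '"') quoteOpen = !quoteOpen;
        if (!quoteOpen) break;
    }
    return gotAny;
}

// Splits one record into fields. Commas inside quotes are data (point 1),
// quoting is optional per field (point 2), and "" inside a quoted field
// means one literal quote (point 3).
std::vector<std::string> splitCsvRecord(const std::string& record)
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < record.size() && record[i + 1] == '"') {
                    current += '"';    // doubled quote -> literal quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;          // commas and newlines kept as data
            }
        } else if (c == '"') {
            inQuotes = true;           // opening quote
        } else if (c == ',') {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);         // last field, possibly empty
    return fields;
}
```

A reading loop would call readCsvRecord until it returns false and pass each record to splitCsvRecord. Files with Windows line endings read on other platforms will leave a trailing '\r' on the last field, which the caller would need to trim.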

Score: 7

This is a good exercise for yourself to work on :)

You should break your library into three parts

  • Loading the CSV file
  • Representing the file in memory so that you can modify it and read it
  • Saving the CSV file back to disk

So you are looking at writing a CSVDocument class that contains:

  • Load(const char* file);
  • Save(const char* file);
  • GetBody

So that you may use your library like this:

CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();

CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}

for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}

// Change the first field of row 10 (assumes the file has at least 11 rows)
body->GetRow(10)->GetField(0)->SetText("hello world");

CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");

doc.Save("file.csv");

Which gives us the following interfaces:

class CSVDocument
{
public:
    void Load(const char* file);
    void Save(const char* file);

    CSVDocumentBody* GetBody();
};

class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};

class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField();
};

class CSVDocumentField
{
public:
    const char* GetText();
    void SetText(const char* text);
};

Now you just have to fill in the blanks from here :)
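
One of those blanks is what Save writes for each field. A possible sketch follows, assuming the common convention that a field is quoted only when it contains a comma, a double quote, or a line break, and that embedded quotes are doubled. The helper names escapeCsvField and joinCsvRow are hypothetical and are not part of the interfaces above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the field as it should appear in the file. Plain fields are
// written unchanged; fields containing a comma, quote, CR or LF are
// wrapped in double quotes with any embedded quotes doubled.
std::string escapeCsvField(const std::string& field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos)
        return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // double the embedded quote
        out += c;
    }
    out += '"';
    return out;
}

// Builds one output line (without the trailing newline) from a row.
std::string joinCsvRow(const std::vector<std::string>& fields)
{
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) line += ',';
        line += escapeCsvField(fields[i]);
    }
    return line;
}
```

Save could then loop over the rows and write joinCsvRow(row) followed by a newline to a std::ofstream.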

Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.

:)

EDIT

I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.

Score: 6

Here is some code you can use. The data from the CSV is stored inside an array of rows. Each row is an array of strings. Hope this helps.

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;
void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);
int main(){
  std::fstream file("file.csv", std::ios::in);
  if(!file.is_open()){
    std::cout << "File not found!\n";
    return 1;
  }
  CSVDatabase db;
  readCSV(file, db);
  display(db);
}
void readCSV(std::istream &input, CSVDatabase &db){
  String csvLine;
  // read every line from the stream
  while( std::getline(input, csvLine) ){
    std::istringstream csvStream(csvLine);
    CSVRow csvRow;
    String csvCol;
    // read every element from the line that is separated by commas
    // and put it into the vector of strings
    while( std::getline(csvStream, csvCol, ',') )
      csvRow.push_back(csvCol);
    db.push_back(csvRow);
  }
}
void display(const CSVRow& row){
  if(!row.size())
    return;
  CSVRowCI i=row.begin();
  std::cout<<*(i++);
  for(;i != row.end();++i)
    std::cout<<','<<*i;
}
void display(const CSVDatabase& db){
  if(!db.size())
    return;
  CSVDatabaseCI i=db.begin();
  for(; i != db.end(); ++i){
    display(*i);
    std::cout<<std::endl;
  }
}
Score: 2

Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.

(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)

Score: 2

Using boost tokenizer to parse records, see here for more details.

ifstream in(data.c_str());
if (!in.is_open()) return 1;

typedef tokenizer< escaped_list_separator<char> > Tokenizer;

vector< string > vec;
string line;

while (getline(in,line))
{
    Tokenizer tok(line);
    vec.assign(tok.begin(),tok.end());

    /// do something with the record
    if (vec.size() < 3) continue;

    copy(vec.begin(), vec.end(),
         ostream_iterator<string>(cout, "|"));

    cout << "\n----------------------" << endl;
}

Score: 0

I found this interesting approach:

CSV to C structure utility

Quote: CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.

Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.
