OpenOffice Calc 3.3.0 corrupts CSV

Joseph_Mendonca · June 7, 2011, 6:55pm

The new OO Calc corrupts the CSV when saving: it removes the space in row 3, column 1. I was able to upload after restoring the quotes to match the original uploaded CSV.

,"fep1",6,7,"TRUE",
should be "","fep1",6,7,"TRUE",

Any input would be appreciated.

Joseph_Mendonca · June 8, 2011, 3:54pm

Tried with another application using version 3.2, and it still fails to load, so the assumptions above need to be re-examined.

Colby.Clegg · June 8, 2011, 11:00pm

It looks like OpenOffice also quotes things that weren’t originally quoted, such as the line with the CSV version. The missing quotes shouldn’t really be a problem, but in your example the line also seems to be missing an additional comma…
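The claim that the missing quotes shouldn’t matter can be checked with any RFC 4180-style parser. As a minimal sketch, using Python’s standard `csv` module (not the tool’s own importer, which isn’t shown in this thread), the two versions of the row from the first post parse to the same fields:

```python
import csv
import io

def parse_row(line: str) -> list[str]:
    """Parse a single CSV line with the csv module's default dialect."""
    return next(csv.reader(io.StringIO(line, newline="")))

saved_by_calc = ',"fep1",6,7,"TRUE",'
original = '"","fep1",6,7,"TRUE",'

# An empty unquoted field and a quoted empty string ("") both become ''.
print(parse_row(saved_by_calc))  # ['', 'fep1', '6', '7', 'TRUE', '']
print(parse_row(original))       # ['', 'fep1', '6', '7', 'TRUE', '']
print(parse_row(saved_by_calc) == parse_row(original))  # True
```

If a strict parser reads both rows identically, the quoting difference alone is unlikely to be what breaks the upload.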

Another problem is that OpenOffice doesn’t cope well with the UTF-8 byte order mark (BOM) that we write at the start of the file. I’m not sure that plays into what you’re seeing here, but we’ve modified the CSV code for the next version not to write it, since it’s technically optional.
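The UTF-8 byte order mark is the three bytes `EF BB BF` at the very start of a file. UTF-8 has no byte order to signal, so the mark only says "this is UTF-8" and can be dropped without changing the text. A minimal sketch of detecting and stripping it (the function names and the sample bytes are hypothetical):

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

with_bom = b"\xef\xbb\xbfPath\r\n"   # hypothetical first line of an export
print(has_utf8_bom(with_bom))        # True
print(strip_utf8_bom(with_bom))      # b'Path\r\n'
```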

So, I’m not exactly sure what’s going on here. If it’s repeatable on a particular file, perhaps attach it and I can try to replicate it here.

Regards,

Joseph_Mendonca · June 10, 2011, 4:40am

Attached are two small files: one that was modified and saved using OO Calc, the other opened with OO Calc but not altered or saved. I looked at both files with Gedit but didn’t find anything different except for the quotes in different places. The bad CSV does not see the Test folder definition, so it flags a “duplicate tag” error on line 5. Not sure if Gedit is the right tool, or whether something like Notepad++ would be better.
bad.csv (5.4 KB)
good.csv (6.36 KB)
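A byte order mark is invisible in most editors, which is why two files can look identical apart from quoting. A minimal sketch for spotting it: print the `repr()` of the first line, which shows a BOM as `\ufeff` (or as `ï»¿` if it was misread as single-byte text and saved again; that second form is an assumption about what Calc may write). The byte strings below are made-up stand-ins, not the contents of the attached files:

```python
def first_line_repr(data: bytes) -> str:
    """Decode as plain UTF-8 (keeping any BOM) and return repr() of line 1."""
    text = data.decode("utf-8")
    first_line = text.splitlines()[0] if text else ""
    return repr(first_line)

# Hypothetical stand-ins for the start of good.csv and bad.csv.
good = b"\xef\xbb\xbfPath,Name\r\n"
bad = b'"\xef\xbb\xbfPath","Name"\r\n'
print(first_line_repr(good))  # '\ufeffPath,Name'
print(first_line_repr(bad))   # '"\ufeffPath","Name"'
```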

Colby.Clegg · June 10, 2011, 3:59pm

Right, as I thought… the only real problem there is that OpenOffice reads the UTF-8 byte order mark as ASCII characters and then adds quotes to the fields, which bundles the mark up with “Path” so that it’s no longer the first byte. Basically, it’s changing the “Path” column name, which throws off the import.
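A minimal sketch of why moving the mark inside the quotes breaks the import, assuming (as described above) that the importer locates columns by the header name `Path`, and assuming it decodes with a BOM-aware UTF-8 codec (Python’s `utf-8-sig` stands in here). That codec only removes the mark when it is the very first thing in the data, so once a quote precedes it, the mark survives into the first column name. The second column name and the byte strings are hypothetical:

```python
import csv
import io

def header(data: bytes) -> list[str]:
    """Decode with utf-8-sig (drops a leading BOM) and return the header row."""
    text = data.decode("utf-8-sig")
    return next(csv.reader(io.StringIO(text, newline="")))

good = b"\xef\xbb\xbfPath,Name\r\n"      # BOM is the first byte
bad = b'"\xef\xbb\xbfPath","Name"\r\n'   # BOM wrapped inside the quotes

print(header(good))           # ['Path', 'Name']
print(header(bad))            # ['\ufeffPath', 'Name']
print("Path" in header(bad))  # False: the "Path" column is no longer found
```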

Strangely, when you open the csv, if you specify “Unicode/UTF-8” as the encoding, it will read the marker correctly. HOWEVER, when you save it back to disk, it still messes it up and puts it in the quotes.

Ultimately, if you edit with OpenOffice, you’ll have to open the file in Notepad or a similar editor afterwards and make sure the first entry is simply “Path”.
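For anyone editing many files, the manual Notepad step could be scripted. A minimal sketch, with a hypothetical helper name, that only handles a first column name containing no commas or quotes (such as `Path`): it removes a mark found at the start of the first field, whether as a real `U+FEFF` or as the `ï»¿` characters it becomes when misread as single-byte text (the latter is an assumption about what Calc may write), drops the quotes around that first field, and writes the data back with the mark as the very first bytes, matching the layout of the file the tool originally produced:

```python
import re

# A BOM (real U+FEFF, or its mojibake form "ï»¿") at the start of the first
# field, optionally wrapped in quotes together with the column name.
_BOM_HEADER = re.compile(r'^"?(?:\ufeff|\u00ef\u00bb\u00bf)+([^",\r\n]*)"?')

def repair_header(data: bytes) -> bytes:
    """Move a misplaced BOM out of the first CSV field (hypothetical helper)."""
    text = data.decode("utf-8-sig")                # drop a BOM already at byte 0
    text = _BOM_HEADER.sub(r"\1", text, count=1)   # '"<BOM>Path"' -> 'Path'
    return text.encode("utf-8-sig")                # put the BOM back at byte 0

bad = b'"\xef\xbb\xbfPath","Name"\r\n"a","b"\r\n'
print(repair_header(bad))  # b'\xef\xbb\xbfPath,"Name"\r\n"a","b"\r\n'
```

For a file on disk, pass it the bytes from `pathlib.Path.read_bytes()` and write the result back with `write_bytes()`. A file that is already correct passes through unchanged.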

As I mentioned, we’re going to stop writing the marker and start assuming that the file is UTF-8 on import, which should help with this.
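Assuming UTF-8 on import while still accepting older files that carry the marker is what Python’s `utf-8-sig` codec does: it skips a leading BOM if there is one and otherwise decodes plain UTF-8. A minimal sketch of such an import step (the function name and sample columns are hypothetical, and this is not the tool’s actual importer):

```python
import csv
import io

def load_rows(data: bytes) -> list[dict[str, str]]:
    """Decode as UTF-8, ignoring an optional leading BOM, and parse by header."""
    text = data.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))

with_bom = b"\xef\xbb\xbfPath,Name\r\nTest,a\r\n"
without_bom = b"Path,Name\r\nTest,a\r\n"

# Both inputs produce the same rows, keyed by a clean "Path" column.
print(load_rows(with_bom) == load_rows(without_bom))  # True
print(load_rows(without_bom)[0]["Path"])              # Test
```

This only covers a mark at the very start of the file; a mark that Calc has moved inside the quotes still ends up in the column name.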

Regards,

Joseph_Mendonca · June 10, 2011, 7:52pm

Gedit indicated that the bad file was UTF-8Y and the good file was UTF-8. GVim detected “Path” in column 1, row 1. Is it possible to expand Find & Replace to Find & Replace/Move/Add, so you don’t have to dump the CSV to edit it as often? That would help when you have a lot of folders.