Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CSV files with UTF16 encoding #12

Closed
dkalinai opened this issue Dec 14, 2016 · 12 comments
Closed

Support CSV files with UTF16 encoding #12

dkalinai opened this issue Dec 14, 2016 · 12 comments

Comments

@dkalinai
Copy link

Is there any way to import a utf16 file? Somehow doesnt seem to work with this iteration of the framework.

@Jeehut
Copy link
Member

Jeehut commented Dec 15, 2016

Would you mind forking CSVImporter, adding an example CSV file to the tests and write a test that is failing so I can have a look and implement the feature properly?

I'm guessing we just have to change the .characters calls to .utf16 to fix this (you can try it out yourself after writing the test if you want), but we'll need a test to ensure this keeps working in future changes anyway. Thanks in advance!

@Jeehut Jeehut changed the title utf16 Support CSV files with UTF16 encoding Dec 15, 2016
@dkalinai
Copy link
Author

dkalinai commented Dec 15, 2016 via email

@dkalinai
Copy link
Author

I will admit I am very bad at unit testing. I haven't covered that bit of the development world just yet. I have forked a version of the repo and added a .csv file there.

Where it fails is when i try to do this:

`let importer = CSVImporter<[String: String]>(path: utf16FileUrl.path)
importer.startImportingRecords(structure: { (headerValues) -> Void in

// Headers

print(headerValues) // Actually prints the whole utf file as single string

}) { $0 }.onFinish { importedRecords in

print(importedRecords) // Actually is an empty array
}`

So jut to reiterate i used your great framework with utf8 files and no issues, meaning I understand how it works but with UTF16 doesnt work simply. the init method where you have TextFile(path... in FileKit you can have path + encoding, i think that needs to be done too. But then the utf8 stuff fails.

Please tell me you can work with this little information...

@Jeehut
Copy link
Member

Jeehut commented Dec 15, 2016

Sure, I'll have a look, thank you for the CSV file and for trying. It's no problem you don't know how to write unit tests yet, everybody has to pass through that stage at some point and most people are not actively taking part in the open source community as you do at that time, so great you did what you can! ;)

I'll keep you updated once I have some results, stay tuned!

@Jeehut
Copy link
Member

Jeehut commented Dec 15, 2016

By the way, here are two link tips to help you getting into the world of unit testing:
Video (Part 1) on RayWenderlich.com
Detailed Introduction with WWDC Video Links from Apple

Feel free to also read this post on good commit messages – your commit message was too long for example. See the post for reasoning.

@Jeehut
Copy link
Member

Jeehut commented Dec 16, 2016

I have just done some work to get away from the FileKit dependency for several reasons. We were only using a single class really, so I included it into this project with some changes. This will also help when we need to change anything like the encoding. See the updated stable branch.

@Jeehut
Copy link
Member

Jeehut commented Dec 16, 2016

@dkalinai You forgot to copy or commit your utf16 example file in your fork. It is only referenced in the Xcode project navigator, but not actually included in git. Could you fix that so I can test if my UTF 16 solution (which I have not uploaded yet) works for you, too? Thanks!

@dkalinai
Copy link
Author

I know why this happened. Apparently there was a problem between the chair and my mac. I have now recommitted the file. And great to hear about the removed dependancy. If you want to go the extra mile you can add either automatic detection of utf16 or allow to specify that somehow in the let importer bit, with utf8 being the default value of course.

And thank you again, you have no idea how much time your work has saved me already.

As for the tests I will have a read and look, i was slightly confused with Nimble and Quick, as I am unfamiliar with them, maybe it makes life easier but not for the beginner in the unit testing world.

Thanks again for everything!

@Jeehut
Copy link
Member

Jeehut commented Dec 17, 2016

@dkalinai I just released version 1.4.0 of CSVImporter with support for specifying the files string encoding. Note that your file was .utf16LittleEndian encoded (see also the spec I added here).

At first even after allowing to specify the encoding the spec was failing – the reason was, that CSVImporter relies on the structure defined by RFC 4180. In 2.4 the standard specifies the following:

Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma.

After having a look at your file I found that it didn't comply to the standard, so I removed the commas at the end of each line, then the specs were passing. I hope you can get around this restriction somehow, I don't know where your file came from, but it was not a valid CSV file as it seems.

As for Quick and Nimble: True, my fault, I forgot that I used those in this project. I'm a fan of the DRY principle which Quick helps a lot with and Nimble is so much more readable then XCTAsserts – so I think those two will be the future best practices. The testing stuff is already done for this feature but if you're interested in a better testing framework you might find this post helpful.

@Jeehut Jeehut closed this as completed Dec 17, 2016
@dkalinai
Copy link
Author

Works perfectly, much obliged! I may have inadvertently sent you some copy of the file which i manually edited which resulted in this. In any case you will be happy to know that it works with my utf-16 file out of the box when imported with no workarounds needed. I did use the utf-16le which i was sure my file was not, and everything is super peachy now! thank you man, you saved a lot of nerves today, be proud!

@dkalinai
Copy link
Author

Oh and thanks for all of the helpful advice given, it is very useful.

@Jeehut
Copy link
Member

Jeehut commented Dec 20, 2016

Glad I could help. 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants