CSV Conversion #100
This can be done in a single line of Python! Run this script as follows (
I will give it a go! Thank you
The conversion works but there is obviously something wrong with the formatting or schema in my CSV because when I try to import to SMS I/E I'm getting 'error parsing JSON'.
Please post some of the (redacted) converted JSON, as per the instructions here.
Here it is:
Hm. Is your CSV file ASCII, or something else (e.g., UTF-encoded Unicode)? If the latter, try saving it as ASCII and see whether the script then produces proper JSON. If it doesn't contain sensitive information, you can post it here and I can take a look at it.
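One way to answer the encoding question above is to inspect the file's raw bytes. The following is only a rough sketch, not part of the conversion script: the function name `describe_encoding` is hypothetical, and the check is a heuristic that recognizes only ASCII, UTF-8 and BOM-marked UTF-16.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Heuristically classify raw file bytes (hypothetical helper)."""
    # A byte order mark at the start is the clearest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Pure ASCII decodes without error under the strict ascii codec.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Anything else that is valid UTF-8 is reported as such.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

For a file on disk, the bytes could be read with `open(path, "rb").read()` and passed to the function.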
OK, I tried saving in the different format and it seemed promising with no parsing errors. It looks like this:
At this point, I really need to see your original CSV file. Sensitive data can be redacted, if desired, but I need to see the more or less original version of the file.
This is the test file I used:
Well, there's your problem. In your initial report, you said that your file had lines like these:
This is standard CSV, and the conversion script runs correctly on it. The file you just posted, however, has lines like these, full of extra double quote marks:
So if the conversion process is adding those extra quotation marks, I need to remove them from the original CSV file? (Apologies for my slowness)
You didn't mention how you converted the file to UTF-8 from the original before feeding it to my CSV-JSON conversion script, but a correct conversion shouldn't be adding extra quotation marks. See here for various conversion tools and methods. I use Linux, so I suppose I'd use To be clear:
Hi @tmo1,
I'm glad you got it working! For the diacritics, see if running the conversion script as follows solves the problem:
That doesn't seem to work, unfortunately. It says no such file or directory. There is definitely a file in the same directory as when I ran the previous script with the correct name, encoded in UTF-8 as before. But, honestly, I don't expect you to go on answering me forever. It's just me... I'll see if I can find some solution or live with it |
Hello,
% cat diacritics.csv
% python3 csv-convert.py < diacritics.csv
I have tried running Python as "env PYTHONIOENCODING=utf-8 python3"
By default, 'json.dumps' escapes non-ASCII characters, which is undesirable for CSV data containing Unicode characters. This commit causes all characters to be output as-is. Addresses: #100
I've tweaked the script to correctly handle non-ASCII characters (including the diacritics at issue here) out of the box, with no explicit UTF-8 directives required.
Thank you, this way the script for converting to old-style JSON works correctly and preserves diacritics.
A clarification: the original version of the conversion script actually worked perfectly correctly as well - the generated JSON does contain the diacritic characters, albeit in escaped form rather than as raw Unicode characters. That JSON should be correctly handled by any properly implemented JSON tools, and in particular, SMS I/E should correctly import it, and the diacritic characters should appear correctly following the import. The only advantage to the updated form of the script is the improved human-readability of the raw JSON.
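The difference between the two output forms can be illustrated with a short sketch. The record below is made up for illustration and is not taken from the script or from anyone's data; only the standard `json.dumps` parameter `ensure_ascii` is assumed.

```python
import json

# Hypothetical record; the body text is invented for this example.
record = {"body": "Žluťoučký kůň"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
raw = json.dumps(record, ensure_ascii=False)

print(escaped)
print(raw)

# Both strings decode to exactly the same data.
assert json.loads(escaped) == json.loads(raw) == record
```

Since both strings parse to the same value, the choice only affects how readable the raw JSON is to a person.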
Hi. If at all possible, I am looking for help, please. I have a CSV file of old (10-15 years ago, pre-smart phone) SMS messages that I am trying to import to my current SMS app (QKSMS). Each row is as follows (all punctuation as in original):
Row 1 of CSV-
"type","address","body","date"
For the avoidance of doubt, in my file type is 1 or 2 to show whether SMS was received or sent, address is a phone number beginning with either 0 or a +, body is the content of the SMS text, date is Unix time.
So, an example from row 2 downwards is-
2,"00447779877777","No, but I am leaving soon!","1120755000000"
How do I get this into the correct format (for example, do the quotation marks need removing, or adding to the type?) and convert it to a JSON file that the app can read and export back out to my current SMS app? All my attempts have ended in failure. Or should I be using sms-db or something else?
Thanks very much!
Originally posted by @davebeep in #55 (comment)
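The layout described above (a quoted header row, then rows in which some fields are quoted and some are not) is standard CSV, so Python's `csv` module can read it without any quotation marks being removed or added. What follows is a minimal sketch of one way to turn it into JSON. It is not the maintainer's conversion script, the function name `csv_to_json` is hypothetical, and whether SMS I/E accepts these exact keys and string-typed values is an assumption not confirmed in this thread.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as keys for every following row.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # ensure_ascii=False keeps diacritics readable in the output.
    return json.dumps(list(reader), ensure_ascii=False, indent=2)


# Header and example row copied from the post above.
sample = (
    '"type","address","body","date"\n'
    '2,"00447779877777","No, but I am leaving soon!","1120755000000"\n'
)

print(csv_to_json(sample))
```

Every value comes out as a string, including `type` and `date`, because the `csv` module does not infer types. The comma inside the quoted body is kept as part of the text rather than treated as a field separator.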