Add a preference entry to let user change the delimiter ( ; or tab or | etc...) #1
Comments
In short: there is currently no API in VSCode to do this. A request to add it was created 2 years ago and is still open: microsoft/vscode#1800. Rainbow highlighting is implemented as a "language" and requires a syntax file for each delimiter. It is not hard to generate many syntax files (2 for each possible delimiter, because we want both quoted and non-quoted variants), but they would pollute the language selection menu, and selecting the appropriate delimiter would be pretty inconvenient. The optimal way, I think, is to let the user select a delimiter in the file with the mouse cursor and then choose an option from the VSCode context menu to use it as a delimiter (quoted or unquoted). |
In fact it's pretty much an Excel issue because someone at MS decided to localize csv files so that Comma Separated actually means Semicolon Separated in German. Anyway, we Germans have to live with that decision and this issue describes a real every day work issue. |
@Lercher Interesting, I hadn't thought much about this problem before. BTW the Vim version of rainbow csv doesn't rely on the file extension; instead there's a content-based detection algorithm which checks two separators by default, comma and TAB. Since you are saying that semicolon is so popular in Europe, I will add it to that list. And again, once microsoft/vscode#1800 is resolved, the content-based auto-detection approach could be used in this extension too. For now I will just add a semicolon syntax grammar with the .scsv extension, which no one uses. At least this would allow manual semicolon selection. |
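For readers who want a concrete picture of what "content-based detection" could look like, here is a minimal sketch. It is an assumption-laden illustration, not the code used by the Vim plugin or this extension: the function name `detectSeparator`, the candidate order, and the "every line has the same field count" rule are all hypothetical, and quoted fields are ignored.

```typescript
// Hypothetical sketch, NOT the extension's actual detection code.
// A candidate separator is accepted only if every non-empty sampled line
// splits into the same number of fields and that number is at least two.
function detectSeparator(
  lines: string[],
  candidates: string[] = [",", "\t", ";"]
): string | null {
  const sample = lines.filter((line) => line.length > 0);
  if (sample.length === 0) {
    return null;
  }
  for (const sep of candidates) {
    const counts = sample.map((line) => line.split(sep).length);
    const first = counts[0] ?? 0;
    if (first >= 2 && counts.every((n) => n === first)) {
      return sep; // first candidate with a consistent column count wins
    }
  }
  return null; // nothing looked like a table
}
```

Because candidates are tried in order, a file such as `1,1;2,2` repeated on every line would be reported as comma-separated here; a real implementation would likely need a better tie-breaking rule.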
Just published a new version with semicolon separator, which has to be manually selected from the list of languages. Waiting for the linked VSCode ticket to add all possible ascii separators and content-based autodetection. |
Cool. Works on my machine. Thanks! |
Fiddled around with adding a new language but missing something. How about pipe separated? I would have thought copying the scsv language and updating the extension.js file would have done it but alas I've been defeated. |
@boeningc Did you modify the new pipe.tmLanguage.json file? You need to replace
Also if you don't expect your pipe-separated files to contain double-quoted pipes, it would be better to modify tsv.tmLanguage.json instead. |
I did create a new file and change the regex to use 2 What I'm not seeing is the option in the languages selection. Sorry I wasn't clear about that earlier. |
|
@boeningc what about package.json ? Did you modify it? And you probably don't need the backslash in dialect_map. |
DOH! I did not. Didn't even look at it. :( |
Success! Thank you so much for the quick responses and pointers. :) |
@boeningc You are welcome! |
Hi all, I couldn't follow this thread exactly. I have a text file where columns are separated by one or more spaces (not tabs). Is it possible to use this type of file with rainbow_csv? |
@robertlugg No, it is not possible with current version. But you can substitute whitespaces with tabs in your file globally: |
Very keen to see pipe-delimiting for .dat files soon. |
OK, I think it would make sense to add more grammars; there is no point in waiting for microsoft/vscode#1800. The first candidate is obviously "pipe-separated" files. I won't be able to associate it with any filetype, but it will still be available with manual selection. The only question is whether anyone needs the "quoted" pipe-separated syntax, where fields containing pipe characters can be enclosed in double quotes to escape them? Another two separators that I think could be relevant are colon and double-quote. Also I will probably implement csv and csv-semicolon grammars which don't allow quoted fields; this will make it possible to change the original csv and csv (semicolon) grammars and highlight lines with unbalanced double quotes as "errors". The mentioned multi-space separated files, which many *nix utilities produce as output, are definitely very relevant, but there is a technical issue that will complicate the implementation, so it will take time to do this. Single-space-separated files could be useful, but people could incorrectly assume that this grammar is for multi-space separated files. So the plan is not to add all possible separator and escape rule combinations, but only those that are practical. |
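To illustrate the difference between the "quoted" and "non-quoted" variants mentioned above, here is a small sketch of a field splitter. It is only an illustration: `splitFields` is a hypothetical helper, not part of the extension (which uses syntax grammars rather than code like this), and it does not report unbalanced quotes.

```typescript
// Illustrative only. Splits one line on `sep`. In quoted mode, a field may be
// wrapped in double quotes; inside quotes `sep` is literal and `""` is an
// escaped quote. In non-quoted mode, every `sep` is a field boundary.
function splitFields(line: string, sep: string, quoted: boolean): string[] {
  if (sep.length === 0) {
    throw new Error("separator must not be empty");
  }
  if (!quoted) {
    return line.split(sep);
  }
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  let i = 0;
  while (i < line.length) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i += 2;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
        i += 1;
      } else {
        current += ch;
        i += 1;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
      i += 1;
    } else if (line.slice(i, i + sep.length) === sep) {
      fields.push(current);
      current = "";
      i += sep.length;
    } else {
      current += ch;
      i += 1;
    }
  }
  fields.push(current);
  return fields;
}
```

For the line `a|"b|c"|d`, quoted mode yields three fields and non-quoted mode yields four, which is why the two variants need separate grammars.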
In my experience, the most common pipe-delimited files are the .DAT files you get when uploading & downloading batch payment files to banks and payment gateways. They are never quoted, and generally come with a fairly irrelevant 1-2 line header (no column names) and a single-line footer that contains the number of rows and total of the dollar amounts in the file. Often the header and footer do not contain pipes, only the actual data rows have pipe delimiters. |
@harvest316 Thanks, this is interesting! |
Just stumbled into https://code.visualstudio.com/docs/extensionAPI/extension-points#_contributeslanguages and this leads me to a comfort enhancement request: What about reading the firstLine property mentioned in the article, counting the number of commas and the number of semicolons there, and choosing CSV or CSV (semicolon delimited) as the language of the file according to whichever count is bigger? This can go wrong, for sure, but if it saves x% of language switching, it's worth the price. One detailed use case: no header line and only floats with comma as the decimal point, i.e. 1,1;2,2;3,3;... This has an equal number of commas and semicolons, or even one comma more. My personal preference is to choose ;-delimited in this case. Thanks |
@Lercher I didn't know about this feature, but I think it will give too many false positives: a lot of non-csv files can contain commas or semicolons in the first line. Also I think it is not right to measure the worth of this feature by the percentage of switching saved: switching back could be more emotionally expensive, since incorrect filetype detection would be very annoying. |
If you say so. However, I guess, if one of the counts is zero and the other one positive, then the method won't produce any false positives. IMHO this reduces switching business to non-existent for all files containing headers with names that are derived from identifiers of programming languages or DBMSs. |
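The two first-line heuristics proposed in this exchange can be sketched as follows. This is a hedged illustration of the idea only: the function names and the returned labels are hypothetical, and nothing here is taken from the extension's code.

```typescript
// Hypothetical labels for the two dialects discussed above.
type Guess = "csv" | "csv (semicolon)" | null;

// Counts non-overlapping occurrences of a single character.
function countChar(text: string, ch: string): number {
  return text.split(ch).length - 1;
}

// Permissive variant: the larger count wins; a tie goes to semicolon.
function guessByMajority(firstLine: string): Guess {
  const commas = countChar(firstLine, ",");
  const semicolons = countChar(firstLine, ";");
  if (commas === 0 && semicolons === 0) {
    return null;
  }
  return commas > semicolons ? "csv" : "csv (semicolon)";
}

// Strict variant: decide only when exactly one of the two counts is zero.
function guessStrict(firstLine: string): Guess {
  const commas = countChar(firstLine, ",");
  const semicolons = countChar(firstLine, ";");
  if (commas > 0 && semicolons === 0) {
    return "csv";
  }
  if (semicolons > 0 && commas === 0) {
    return "csv (semicolon)";
  }
  return null; // ambiguous or no delimiter at all
}
```

Note that for the decimal-comma line `1,1;2,2;3,3` the permissive variant picks "csv" (three commas against two semicolons), while the strict variant declines to guess. Neither variant addresses the objection that ordinary non-csv text often has commas or semicolons in its first line.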
I've published updated version, the only change is that now Rainbow CSV supports pipe |
Hello, |
Hello, @GrisPetitDragon , |
Good news: microsoft/vscode#1800 is complete. I even took part in writing the API implementation 😎 So this makes it possible to add auto-detection functionality and possibly more CSV dialects, since selecting them would be much more convenient. |
Thank you!!! :) |
I've just published version 0.7.0 which has content-based separator autodetection logic. The new functionality will work only with VSCode 1.28, for older VSCode versions there should be no change in behavior. |
Tooltip: from `Col# 1` to `Col #1`
Thank you so much! |
@GrisPetitDragon you are welcome! Actually there is an issue with the current implementation: separator autodetection will only work for "plaintext" files with an unassigned language, i.e. if a table file has a '.txt' or some unknown extension (e.g. '.unknown'), autodetection will work and switch it to "csv" or "csv (semicolon)" depending on its content. But it won't switch a ".csv" file to the semicolon language even if it is really a semicolon-separated file. I plan to fix this soon. |
Oh I'm facing this issue. I get this now. Thanks and hope it's coming soon :) |
Any chance to use this with tilde (~) as the delimiter? |
I've just published version 1.0.0 with 7 new separators: |
Thanks! |
In version 1.1.1 there is a new special whitespace-separated dialect that @robertlugg was suggesting. Multiple consecutive whitespaces are treated as a single one. |
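As a rough illustration of "multiple consecutive whitespaces are treated as a single one", the splitting rule can be sketched like this. The helper names are hypothetical, and ignoring leading and trailing whitespace is an assumption of this sketch rather than documented behavior of the dialect.

```typescript
// Hypothetical helper: splits a line on runs of whitespace, so "a   b  c"
// yields three fields. Leading/trailing whitespace is dropped (an assumption).
function splitOnWhitespace(line: string): string[] {
  const trimmed = line.trim();
  return trimmed.length === 0 ? [] : trimmed.split(/\s+/);
}

// Converts a whitespace-aligned line into a tab-separated one, similar in
// spirit to the whitespace-to-tab substitution workaround suggested earlier.
function toTabSeparated(line: string): string {
  return splitOnWhitespace(line).join("\t");
}
```

This is lossy for fields that themselves contain spaces, which is one reason such files are harder to handle than single-character delimiters.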
Thanks for your work. I am surprised that I did not find
@Mingun What do you mean? |
Oh, I see what you mean. Tab-separated csv is usually called "TSV"; I thought this was a universally known fact. So maybe I should add a language alias: "TSV" -> "CSV(tab)", I will think about this. So, @Mingun , you should just select "TSV" from the list. BTW another option to enable the dialect is to select the delimiter -> right click -> set as rainbow separator from the context menu. |
Thank you. I already checked the documentation (who reads it :)) and also saw that there is a separate
Starting from version 3.0.0 all possible characters and even multicharacter strings can be used as a separator. To set an arbitrary separator - select it in the editor with the cursor and run |
All on the title :)