Add a preference entry to let user change the delimiter ( ; or tab or | etc...) #1

Closed
pieplu opened this issue Jan 11, 2018 · 41 comments

Comments

@pieplu

pieplu commented Jan 11, 2018

It's all in the title :)

@mechatroner
Owner

mechatroner commented Jan 12, 2018

In short: there is currently no API in VSCode to do this. A request to add one was created 2 years ago and it is still open: microsoft/vscode#1800
I will mention this problem in the VSCode issue.

Rainbow highlighting is implemented as a "language" and requires a syntax file for each delimiter. It is not hard to generate many syntax files (2 for each possible delimiter, because we want both quoted and non-quoted variants), but they would pollute the language selection menu, and selecting the appropriate delimiter would be pretty inconvenient. The optimal way, I think, is to let the user select a delimiter in the file with the mouse cursor and then choose an option from the VSCode context menu to use it as a delimiter (quoted or unquoted).

@Lercher

Lercher commented Feb 9, 2018

In fact it's pretty much an Excel issue because someone at MS decided to localize csv files so that Comma Separated actually means Semicolon Separated in German.

Anyway, we Germans have to live with that decision and this issue describes a real every day work issue.

@mechatroner
Owner

@Lercher Interesting, I didn't think much about this problem before. BTW the Vim version of rainbow csv doesn't rely on the file extension; instead there's a content-based detection algorithm which checks two separators by default: comma and TAB. Since you are saying that semicolon is so popular in Europe, I will add it to that list. And again, once microsoft/vscode#1800 is resolved, the content-based auto-detection approach could be used in this extension too. For now I will just add a semicolon syntax grammar with the .scsv extension, which no one uses. At least this would allow manual semicolon selection.

@mechatroner
Owner

Just published a new version with the semicolon separator, which has to be manually selected from the list of languages. I am waiting for the linked VSCode ticket before adding all possible ASCII separators and content-based autodetection.

@Lercher

Lercher commented Feb 11, 2018

Cool. Works on my machine. Thanks!

@boeningc

boeningc commented Apr 2, 2018

Fiddled around with adding a new language but missing something. How about pipe separated? I would have thought copying the scsv language and updating the extension.js file would have done it but alas I've been defeated.

@mechatroner
Owner

@boeningc Did you modify the new pipe.tmLanguage.json file? You need to replace ; with | and prepend it with two backslashes (\\): one for the regexp, and another one for the enclosing JSON. The result will look like this:

    "patterns": [
        { "match": "((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?((?:\"(?:[^\"]*\"\")*[^\"]*\"(?:\\||$))|(?:[^\\|]*(?:\\||$)))?",

Also if you don't expect your pipe-separated files to contain double-quoted pipes, it would be better to modify tsv.tmLanguage.json instead.
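
For anyone adapting the grammar to another delimiter, the double escaping can be produced programmatically rather than by hand. The sketch below is only an illustration, not the script used to generate the extension's syntax files; `escapeForRegex`, `quotedColumnGroup`, and `buildMatchPattern` are hypothetical names. It builds the same per-column group as the pattern above and lets `JSON.stringify` add the second layer of backslashes.

```javascript
// Escape a delimiter so the regex engine treats it literally (first backslash layer).
function escapeForRegex(delim) {
    return delim.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One capture group per column: either a double-quoted field or a plain field,
// each followed by the delimiter or the end of the line.
function quotedColumnGroup(delim) {
    const d = escapeForRegex(delim);
    const quotedField = '(?:"(?:[^"]*"")*[^"]*"(?:' + d + '|$))';
    const plainField = '(?:[^' + d + ']*(?:' + d + '|$))';
    return '(' + quotedField + '|' + plainField + ')?';
}

// Repeat the group once per rainbow color (10 groups in the pattern above).
function buildMatchPattern(delim, groupCount = 10) {
    return quotedColumnGroup(delim).repeat(groupCount);
}

// JSON.stringify adds the second backslash layer needed inside the .json file.
console.log(JSON.stringify({ match: buildMatchPattern('|') }));
```

With `';'` as the delimiter no backslashes are emitted at all, because a semicolon has no special meaning in a regular expression.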

@boeningc

boeningc commented Apr 3, 2018

I did create a new file and changed the regex to use 2 \\. I took the TSV pattern and changed \\t to \\|

What I'm not seeing is the option in the languages selection. Sorry I wasn't clear about that earlier.

@boeningc

boeningc commented Apr 3, 2018

    {
        "name": "pipe syntax",
        "scopeName": "text.pipe",
        "fileTypes": ["pipe"],
        "patterns": [
            { "match": "([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)([^\\|]*\\|?)",

    var dialect_map = {
        'csv': [',', 'quoted'],
        'tsv': ['\t', 'simple'],
        'csv (semicolon)': [';', 'quoted'],
        'csv (pipe)': ['\|', 'simple']
    };

    var pipe_provider = vscode.languages.registerHoverProvider('csv (pipe)', {
        provideHover(document, position, token) {
            return make_hover(document, position, 'csv (pipe)', token);
        }
    });

@mechatroner
Owner

mechatroner commented Apr 4, 2018

@boeningc What about package.json? Did you modify it? Also, you probably don't need the backslash in dialect_map.
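
As a rough sketch of what the package.json piece looks like: a language only shows up in the language selection menu once it is declared under `contributes.languages`, and its grammar file is attached under `contributes.grammars`. The helper name `makeLanguageContribution` and the `./syntaxes/` path are assumptions made for illustration; the `text.pipe` scope and the `pipe` file type are taken from the snippet above.

```javascript
// Hypothetical helper that builds the package.json fragment for one new dialect.
function makeLanguageContribution(id, extensions, scopeName, grammarPath) {
    return {
        contributes: {
            // Declares the language so it appears in the language selection menu.
            languages: [{ id: id, aliases: [id], extensions: extensions }],
            // Binds the TextMate grammar file to that language id.
            grammars: [{ language: id, scopeName: scopeName, path: grammarPath }],
        },
    };
}

const pipeContribution = makeLanguageContribution(
    'csv (pipe)',
    ['.pipe'],
    'text.pipe',
    './syntaxes/pipe.tmLanguage.json' // assumed location of the grammar file
);

// Print the fragment so it can be merged into package.json by hand.
console.log(JSON.stringify(pipeContribution, null, 4));
```

The `id` here has to match the string passed to `registerHoverProvider` and the key used in `dialect_map`, otherwise the hover tooltips will not attach to the new language.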

@boeningc

boeningc commented Apr 4, 2018

DOH! I did not. Didn't even look at it. :(

@boeningc

boeningc commented Apr 4, 2018

Success! Thank you so much for the quick responses and pointers. :)

@mechatroner
Owner

@boeningc You are welcome!

@robertlugg

Hi all, I couldn't follow this thread exactly. I have a text file where columns are separated by one or more spaces (not tabs). Is it possible to use this type of file with rainbow_csv?

@mechatroner
Owner

@robertlugg No, it is not possible with the current version. But you can globally substitute runs of spaces with tabs in your file: s/  */\t/g (two spaces before the asterisk, so that only runs of one or more spaces match) and then use the TSV syntax. If you want a permanent solution, you can also modify the TSV syntax file (replace every \t with a pattern that matches one or more spaces), combine it with a modified package.json file, and you will get your own mini-extension from just these two files. I don't want to include new grammars in "Rainbow CSV" until the linked VSCode issue is resolved. That is because each variation creates a new language that pollutes the language selection menu, and I think all such use cases are pretty rare compared to CSV/TSV/CSV (semicolon). Also, some users need just simple whitespace-separated files, while others may want a different grammar where whitespace can be escaped with backslashes or double quotes.
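
The same substitution can be sketched in JavaScript for files that are preprocessed by a script instead of an editor. `spacesToTabs` is a hypothetical helper, not part of the extension. The pattern has to require at least one space: a pattern that can also match the empty string would insert a tab at every character position.

```javascript
// Collapse every run of one or more spaces into a single tab character.
// Newlines are left untouched, so the row structure is preserved.
function spacesToTabs(text) {
    return text.replace(/ +/g, '\t');
}

const sample = 'name   size  owner\nfoo    12    root';
console.log(spacesToTabs(sample));
```

Leading spaces on a line would turn into a leading tab (an empty first column), so input with indentation may need trimming first.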

@harvest316

Very keen to see pipe-delimiting for .dat files soon.

@mechatroner
Owner

OK, I think it would make sense to add more grammars; there is no point in waiting for microsoft/vscode#1800

First candidate is obviously "pipe-separated" files. I won't be able to associate it with any filetype, but it will still be available with manual selection. The only question is whether anyone needs "quoted" pipe separated syntax, where fields containing pipe characters can be enclosed in double-quotes to escape them?

Another two separators that I think could be relevant are colon and double-quote.

Also I will probably implement csv and csv-semicolon grammars which don't allow quoted fields. This will make it possible to change the original csv and csv (semicolon) grammars so that they highlight lines with unbalanced double quotes as "errors".

The mentioned multi-space separated files, which many *nix utilities produce as output, are definitely very relevant, but there is a technical issue that will complicate the implementation, so it will take time to do this.

Single space-separated files could be useful, but people can incorrectly assume that this grammar is for multi-space separated files.

So the plan is not to add all possible separators and escape rule combinations, but only those that are practical.

@harvest316

In my experience, the most common pipe-delimited files are the .DAT files you get when uploading & downloading batch payment files to banks and payment gateways. They are never quoted, and generally come with a fairly irrelevant 1-2 line header (no column names) and a single-line footer that contains the number of rows and total of the dollar amounts in the file. Often the header and footer do not contain pipes, only the actual data rows have pipe delimiters.

@mechatroner
Owner

@harvest316 Thanks, this is interesting!
I don't want to add a .DAT -> 'pipe' association at the extension level, but it turns out there is a way to add this mapping manually through the VSCode config:
https://stackoverflow.com/a/36789145/2898283
So I will just include this instruction in README.md
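
For reference, the mapping mentioned above goes through VSCode's `files.associations` setting. The sketch below builds the fragment as a JavaScript object and prints it as JSON for pasting into settings.json. The language name `csv (pipe)` is an assumption based on the identifier used earlier in this thread; check the language selection menu for the exact name. Each glob maps to exactly one language.

```javascript
// Settings fragment associating *.dat files with the pipe-separated dialect.
// The 'csv (pipe)' language id is assumed, not confirmed by this thread.
const settingsFragment = {
    'files.associations': {
        '*.dat': 'csv (pipe)',
    },
};

// Print as JSON so it can be merged into the user's settings.json.
console.log(JSON.stringify(settingsFragment, null, 4));
```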

@Lercher

Lercher commented May 26, 2018

Just stumbled into https://code.visualstudio.com/docs/extensionAPI/extension-points#_contributeslanguages and this leads me to a comfort enhancement request:

What about reading the firstLine property mentioned in the article, counting the number of commas and the number of semicolons there, and choosing CSV or CSV (semicolon delimited) as the language of the file according to whichever figure is bigger? This can go wrong, for sure, but if it saves x% of language switching, it's worth the price.

One detail use case: no header line and only floats with comma as the decimal point, i.e. 1,1;2,2;3,3;... It has an equal number of commas and semicolons, or even one comma more. My personal preference is to choose ;-delimited in this case.
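
A minimal sketch of this counting rule, under one possible reading of the tie-break: semicolon wins unless commas outnumber semicolons by more than one. `guessDialectFromFirstLine` and `countChar` are hypothetical names, and the returned language names are assumed to match the ones registered by the extension.

```javascript
// Count occurrences of a single character in a string.
function countChar(text, ch) {
    return text.split(ch).length - 1;
}

// Guess the dialect from the first line only; returns null when neither
// separator is present.
function guessDialectFromFirstLine(firstLine) {
    const commas = countChar(firstLine, ',');
    const semicolons = countChar(firstLine, ';');
    if (commas === 0 && semicolons === 0) {
        return null;
    }
    if (semicolons === 0) {
        return 'csv';
    }
    if (commas === 0) {
        return 'csv (semicolon)';
    }
    // Both present: decimal commas can add up to one comma more than there
    // are semicolons (e.g. 1,1;2,2;3,3), so semicolon wins in that range.
    return commas > semicolons + 1 ? 'csv' : 'csv (semicolon)';
}

console.log(guessDialectFromFirstLine('1,1;2,2;3,3'));
```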

Thanks

@mechatroner
Owner

@Lercher I didn't know about this feature, but I think it will give too many false positives: a lot of non-csv files can contain commas or semicolons in the first line. Also, I don't think it is right to measure the worth of this feature by the percentage of switches saved: switching back could be more emotionally expensive, since incorrect filetype detection would be very annoying.
The right way to do content-based autodetection is to analyze the first 10 lines of a file; I can't imagine a situation where this would fail. I am sure that sooner or later VSCode will support this, but for now we will just have to use the manual selection mechanism.
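
To make the idea concrete, here is a minimal sketch of such a check. It is not the algorithm used by the Vim plugin or by this extension: `autodetectSeparator` is a hypothetical function that accepts a separator only if it splits every sampled line into the same number of fields, with more than one field per line. Quoted fields are deliberately ignored to keep the sketch short.

```javascript
// Returns the first candidate separator that splits each of the first
// `maxLines` non-empty lines into the same number (> 1) of fields, else null.
function autodetectSeparator(text, candidates = [',', '\t', ';'], maxLines = 10) {
    const lines = text
        .split(/\r?\n/)
        .filter((line) => line.length > 0)
        .slice(0, maxLines);
    if (lines.length === 0) {
        return null;
    }
    for (const sep of candidates) {
        // Naive split: a quoted field containing the separator is miscounted.
        const counts = lines.map((line) => line.split(sep).length);
        const consistent = counts.every((n) => n === counts[0]);
        if (consistent && counts[0] > 1) {
            return sep;
        }
    }
    return null;
}

console.log(JSON.stringify(autodetectSeparator('a;b;c\n1,1;2,2;3,3\n4,5;6,7;8,9')));
```

Looking at several lines is what handles the decimal-comma case: the comma count differs between the header and the data rows, while the semicolon count stays constant.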

@Lercher

Lercher commented May 27, 2018

If you say so.

However, I guess, if one of the counts is zero and the other one positive, then the method won't produce any false positives. IMHO this reduces switching business to non-existent for all files containing headers with names that are derived from identifiers of programming languages or DBMSs.

@mechatroner
Owner

I've published an updated version; the only change is that Rainbow CSV now supports the pipe | separator. I probably should have done it long ago, but better late than never, I suppose. The Readme doc file was also updated with a table of supported separators and instructions on how to create an extension -> separator association, which could be useful in some cases.

@GrisPetitDragon

Hello,
I use Rainbow CSV and I really enjoy it ;)
I have a question though: I often work simultaneously with various csv files, and they don't all use the same separator: some of them are semicolon separated, while others use pipes as separators. I've tried to modify VSCode's Rainbow CSV parameters, but it only seems to take into account one separator at a time. For instance, setting
"*.csv": "CSV (semicolon,pipe)"
did not work.
Is there any way I can get those lovely colours on both types of csv file at a time?

@mechatroner
Owner

Hello, @GrisPetitDragon ,
Thanks for feedback!
It will be possible once content-based auto-detection is implemented. It is trivial to implement, but to switch the language ID I need a VSCode API call which is currently missing. See the linked VS Code ticket.

@mechatroner
Owner

Good news: microsoft/vscode#1800 is complete. I even took part in writing the API implementation 😎 This makes it possible to add auto-detection functionality and possibly more CSV dialects, since selecting them would be much more convenient.

@harvest316

Thank you!!! :)

@mechatroner
Owner

I've just published version 0.7.0, which has content-based separator autodetection logic. The new functionality will work only with VSCode 1.28; for older VSCode versions there should be no change in behavior.

mechatroner pushed a commit that referenced this issue Oct 17, 2018
Tooltip: from `Col# 1` to `Col #1`
@GrisPetitDragon

Thank you so much!

@mechatroner
Owner

@GrisPetitDragon you are welcome! Actually, there is an issue with the current implementation: separator autodetection only works for "plaintext" files with no assigned language, i.e. if a table file has a '.txt' or some unknown extension (e.g. '.unknown'), autodetection will work and switch it to "csv" or "csv (semicolon)" depending on its content. But it won't switch a ".csv" file to the semicolon language even if it really is a semicolon-separated file. I plan to fix this soon.
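
The behavior described here can be sketched roughly as follows. This is an illustration, not the extension's source. It assumes the API added for microsoft/vscode#1800 is `vscode.languages.setTextDocumentLanguage(document, languageId)`. The hypothetical helper `switchPlaintextToDialect` takes the `vscode` module as a parameter so that it can be tried out with a stub outside the editor.

```javascript
// `vscodeApi` is what require('vscode') returns inside an extension.
// Documents that already have a language other than 'plaintext' are left
// alone, which mirrors the limitation described above.
function switchPlaintextToDialect(vscodeApi, document, dialectId) {
    if (document.languageId !== 'plaintext') {
        return Promise.resolve(document);
    }
    return vscodeApi.languages.setTextDocumentLanguage(document, dialectId);
}

// Stub standing in for the real module, for demonstration only.
const requestedLanguages = [];
const stubVscode = {
    languages: {
        setTextDocumentLanguage(document, languageId) {
            requestedLanguages.push(languageId);
            return Promise.resolve(Object.assign({}, document, { languageId: languageId }));
        },
    },
};

switchPlaintextToDialect(stubVscode, { languageId: 'plaintext' }, 'csv (semicolon)');
switchPlaintextToDialect(stubVscode, { languageId: 'csv' }, 'csv (semicolon)');
console.log(requestedLanguages);
```

Only the first call reaches the API; the second document keeps its "csv" language, which is exactly the case of a semicolon-separated ".csv" file that is not switched.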

@C-Bam

C-Bam commented Nov 22, 2018

@mechatroner

Oh I'm facing this issue.

I get this now. Thanks and hope it's coming soon :)

@arzoo1

arzoo1 commented Jan 29, 2019

Any chance to use this with tilde (~) as the delimiter?

@mechatroner
Owner

I've just published version 1.0.0 with 7 new separators:
^ - by @pantyushkin request
~ - by @arzoo1 request
and 5 others: : " = . -
I am also planning to add a whitespace separator in the next version, since it requires a totally different grammar and backend support.
Also, if microsoft/vscode#53885 is finished, this would theoretically allow us to support any possible separator or sequence of separators.

@arzoo1

arzoo1 commented Feb 4, 2019

> I've just published version 1.0.0 with 7 new separators:
> ~ - by @arzoo1 request

Thanks!

@mechatroner
Owner

mechatroner commented May 25, 2019

In version 1.1.1 there is a new special whitespace-separated dialect that @robertlugg was suggesting. Multiple consecutive whitespaces are treated as a single one.

@Mingun

Mingun commented Jun 18, 2019

Thanks for your work. I am surprised that I did not find tab in the delimiters list.

@mechatroner
Owner

@Mingun What do you mean? tab is supported since the very first version.

@Mingun

Mingun commented Jul 3, 2019

I do not see a way to select tab in the list. Plain CSV does not color anything.
Example (tab delimiters):
Tab

Example (; delimiters):
Semicolon

@mechatroner
Owner

Oh, I see what you mean. Tab-separated csv is usually called "TSV"; I thought this was a universally known fact. So maybe I should add a language alias: "TSV" -> "CSV(tab)", I will think about this. So, @Mingun, you should just select "TSV" from the list. BTW another option to enable the dialect is to select the delimiter -> right click -> set as rainbow separator from the context menu.

@Mingun

Mingun commented Jul 3, 2019

Thank you. I have now checked the documentation (who reads it :)) and saw that there is a separate TSV language. I admit I had never come across this abbreviation, so an alias will be very useful (besides, it will allow collecting all settings in one group).

@mechatroner
Owner

Starting from version 3.0.0, any character and even multicharacter strings can be used as a separator. To set an arbitrary separator, select it in the editor with the cursor and run the "Rainbow CSV: Set separator - Basic" command. The separator character or string can also be added to the list of autodetected characters.
