Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipe handles newlines improperly #3669

Closed
stevengubler opened this issue Aug 19, 2020 · 15 comments
Closed

Pipe handles newlines improperly #3669

stevengubler opened this issue Aug 19, 2020 · 15 comments
Labels

Comments

@stevengubler
Copy link

Steps

Write some string into the buffer. Select the entire string, but do not include the newline character at the end of it. Use | in normal mode to pipe the selection into an external program; base64 works well here.

Issue 1: Copy that entire string into some separate base64 decoder like CyberChef and decode it.

Issue 2: Back in Kakoune, select your string again, (make sure to not include a new line). Pipe your selection through base64 --decode.

Outcome

With Issue 1, it is clear to see that there is an extra new line included. This is not expected.

With Issue 2, while it is clear that the encoded string contains a newline (as per Issue 1), no new line is included in the decoded string.

Expected

With Issue 1, I would expect that no newline character would be appended to my selection string, and it should string should be exactly passed to the shell command's input stream.

With Issue 2, I would expect that any newline characters included at the end of the output stream to be accounted for
and recorded in the buffer.


To add some color to this, I was using Kakoune to edit some k8s resources, some of which were secrets. In k8s, the string data of a secret is encoded with base64. For example:

apiVersion: v1
kind: Secret
metadata:
  name: MySecret
data:
  THE_SECRET: bXkgc25lYWt5IHNlY3JldA==

Withing Kakoune, I decoded a string to edit it, and then reencoded it; this resulted in a sneaky and unexpected newline character being included with my secret's value. I spent a good deal of time running around trying to figure out why the secret was not what it was showing up to be in my editor, and eventually found this issue.

@Delapouite
Copy link
Contributor

Related: #1515

@Screwtapello
Copy link
Contributor

Are there many apps that misbehave if stdin reaches EOF without a trailing newline?

@stevengubler
Copy link
Author

@Screwtapello Commands like base64, wc, gpg, diff, comm, and tail are definitely affected, and commands like grep, head, sed, and awk are potentially affected, to name a small bunch.

I definitely understand why some whitespace manipulation happens for general usability's sake, but this breaks my primary usecase for the piping feature. An option or method for disabling that manipulation behavior would be greatly appreciated.

@mawww
Copy link
Owner

mawww commented Aug 20, 2020

I agree this needs to be fixed somehow, the general issue is that many commands, like base64, append an EOL to their output, but we dont want that newline to get inserted when we pipe some text that does not end-up with a newline. The current logic does the following: if input text does not end with EOL, append an EOL then remove it from the pipe output text.

Maybe the solution is to simplify the existing logic not-to add a newline when its missing, but still remove a newline from the output if the original text did not end with one ?

I wonder what we would break if we stopped doing that, here is a diff that does that:

diff --git a/src/normal.cc b/src/normal.cc
index beb880f9..33256c0b 100644
--- a/src/normal.cc
+++ b/src/normal.cc
@@ -579,9 +579,7 @@ void pipe(Context& context, NormalParams)
                     const auto end = changes_tracker.get_new_coord_tolerant(sel.max());
 
                     String in = buffer.string(beg, buffer.char_next(end));
-                    const bool insert_eol = in.back() != '\n';
-                    if (insert_eol)
-                        in += '\n';
+                    const bool ends_with_eol = in.back() == '\n';
 
                     // Needed in case we read selections inside the cmdline
                     context.selections_write_only().set({keep_direction(Selection{beg, end}, sel)}, 0);
@@ -590,12 +588,9 @@ void pipe(Context& context, NormalParams)
                         cmdline, context, in,
                         ShellManager::Flags::WaitForStdout).first;
 
-                    if (insert_eol)
-                    {
-                        in.resize(in.length()-1, 0);
-                        if (not out.empty() and out.back() == '\n')
-                            out.resize(out.length()-1, 0);
-                    }
+                    if (not ends_with_eol and not out.empty() and out.back() == '\n')
+                        out.resize(out.length()-1, 0);
+
                     auto new_end = apply_diff(buffer, beg, in, out);
                     if (new_end != beg)
                         new_sels.push_back(keep_direction({beg, buffer.char_prev(new_end), std::move(sel.captures())}, sel));

I'll run with that for a while see if I can spot any issue.

@mawww
Copy link
Owner

mawww commented Sep 1, 2020

So, one issue I got so far with that patch is with bc:

 $ echo -n '1+1' | bc
(standard_in) 1: syntax error

For some reason it really wants an end-of-line, however seems busybox bc is happy with the missing newline, so maybe this can be considered a bug in GNU bc.

@lenormf
Copy link
Contributor

lenormf commented Sep 2, 2020

According to my GNU/bc manpage:

In bc statements are executed "as soon as possible." Execution happens when a newline in encountered and there is one or more complete statements. Due to this immediate execution, newlines are very important in bc. In fact, both a semicolon and a newline are used as statement separators. An improperly placed newline will cause a syntax error.

I guess users who know about that could pipe 1+1;.

But to play the advocate of the conservative side here, why does the editor have to pick one way or the other?

I understand we're trying to make the default behaviour as convenient as possible to everybody, but since there are cases where a trailing newline is important to the shell program, why make the editor take the decision whether to include/strip it arbitrarily?

Could we not let the | primitive pipe the text to a tool, as-is, and document a snippet/script in the wiki that handles fancy side-effects that modify the input? After all, users who don't want the input to be modified will have to do that to correct the editor's behaviour.

@mawww
Copy link
Owner

mawww commented Sep 2, 2020

Thats what I'd like to get to here, stop inserting that extra newline, reading the bc grammar at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html however makes it quite clear that the newline is required, so at least for this use case, it would mean that writing a mathematical expression, selecting it and piping to bc would now fail due to the missing newline.

I think we still want to strip the output trailing newline if the input did not have one, because almost every filter will output a final newline.

@stevengubler
Copy link
Author

Hmmm...

On one hand, I like @lenormf's idea that | should be a simple tool. Using that as a primitive, more complex behavior could be implemented on top. But while that's a logically satisfying solution, I don't think it's great for usability. Having the more useful (generally) command only available through some other composition is not great for an efficient workflow.

So what about just making it more clear to the user what is happening with trailing newlines and offer them some kind of alternative? Maybe something like | pipes to a command like normal, and <a-|> does an exact-piping of sorts?

If we can make it sufficiently clear to the user which command has which behavior, I think that would work.

@mawww
Copy link
Owner

mawww commented Sep 6, 2020

The issue is that we already have <a-|> for pipe-to (pipe without replacing buffer text with the output), so we would end-up with combinatorial explosion if we were to add alternate behaviour.

I am leaning towards not adding the trailing newline and break GNU bc, this can be addressed by piping into something else, say { cat; echo; } | bc. Its still not very satisfying, but it seems the most "correct" solution, its unfortunate POSIX bc does not mandate it to handle that case.

@stevengubler
Copy link
Author

we would end-up with combinatorial explosion if we were to add alternate behaviour.

Fair point. It would soon get to the point where a user would rather use the most versatile implementation and manually manage the alternate behavior externally, (similar to your example with { cat; echo; } | bc), rather than remember all of the specific usecases for the different implementations within the editor.


On another note, I'm not as worried about stripping newlines when output is inserted into the editor, since that behavior is more easily visible to the user. However we do run the risk of losing new lines mysteriously if the newline is an important part of the output, but I don't think that's particularly common. Again, base64 is an example of where that could be an issue, especially since recognizing that a newline has been lost is difficult without external verification.

But as I said though, this issue doesn't worry me as much. Thoughts, @mawww?

@AvdN
Copy link

AvdN commented Jan 21, 2022

Would it be possible to the no-add-newline behaviour as default, and have a configuration setting that allows certain command (i.e. first word after | ) that allows adding and/or stripping of the newline, with some explicit setting for doubling or not if the original buffer ends in a newline? That way you could configure using bc to add the extra newline.

I was testing out |testfmt dumping the input handed to testfmt to a log file and was surprised to get the same input for selecting text with and without newline, and not being able to distinguish between them

@Screwtapello
Copy link
Contributor

It occurs to me that if Kakoune always piped data exactly as-is, it would be easy to write wrapper scripts to automatically add/strip a trailing newline where needed. However, since Kakoune does the fix-ups itself, it's not possible for wrapper scripts to restore the original content.

@stevengubler
Copy link
Author

I wonder if it is possible to leverage the hook system to enable both use cases nicely... Here's how I figure:

Usecase 1: bc

  1. Kakoune recieves pipe command on the selected text
  2. Kakoune runs a pre-pipe hook on the selected text, setting the state that tracks whether or not the text had an EOL at the end.
  3. Kakoune sends the text to the pipe and receives it back.
  4. Kakoune runs a post-pipe hook on the returned text, removing new line if one was returned and the previous text had no EOL at the end.

Usecase 2: base64

  1. User disables Kakoune from running hooks by first inputing \
  2. Kakoune recieves pipe command on the selected text
  3. Kakoune sends the text to the pipe and receives it back.

This seems to me like the best of both worlds: The default behavior modifies output for the better in most situations, while also using already-built functionality to enable finer control over the output. This also allows the user to configure it as they see fit for whatever use case they have.

The question I have is whether the hooks system is able to tap into the piping system. I'm not familiar enough with the code to answer that yet, but someone here might be able to speak to that.

@mawww mawww closed this as completed in 6f135c0 Jan 28, 2022
@alexherbo2
Copy link
Contributor

Kakoune still eats the newline character after piping, e.g. |printf 'foo'<ret> and |printf 'foo\n'<ret> give the same result.

@krobelus
Copy link
Contributor

krobelus commented Feb 8, 2022 via email

TeddyDD pushed a commit to TeddyDD/kakoune that referenced this issue Aug 19, 2022
This will unfortunately break some use case which will require
using wrapper scripts to add the necessary newline. It is however
harder to do the contrary, and it makes a lot of other use case
possible, such as checksuming.

Fixes mawww#3669
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants