Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coworkers report that lint --fix will sometimes delete / empty / truncate a file. #6061

Closed
Raynos opened this issue Sep 25, 2024 · 23 comments · Fixed by #6096 or #6747
Closed

Coworkers report that lint --fix will sometimes delete / empty / truncate a file. #6061

Raynos opened this issue Sep 25, 2024 · 23 comments · Fixed by #6096 or #6747
Assignees
Labels
A-linter Area - Linter C-bug Category - Bug

Comments

@Raynos
Copy link

Raynos commented Sep 25, 2024

One of the typescript files gets replaced with the empty file and all its content get deleted. This is confusing and it's unclear which oxlint rule is doing this.

@Raynos Raynos added the C-bug Category - Bug label Sep 25, 2024
@DonIsaac
Copy link
Contributor

Hi Raynos, thanks for the report. Do you think you could provide me some more details to reproduce?

  1. What rules do you have enabled?
  2. What exact command are you running?
  3. Can you provide a minimal reproduction of the file?

Thanks!

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

oxlint -c=./.oxlintrc.json --tsconfig=./tsconfig.json . -D correctness -D perf -D suspicious --promise-plugin --import-plugin --fix
{
  "settings": {},
  "rules": {
    "no-unused-vars": "allow",
    "no-new-array": "allow",
    "no-empty-file": "allow",
    "no-document-cookie": "allow",
    "no-this-alias": [
      "deny",
      {
        "allow_names": ["self"]
      }
    ],
    "no-await-in-loop": "allow",
    "react-in-jsx-scope": "allow",
    "consistent-function-scoping": "allow",
    "no-async-endpoint-handlers": "allow",
    "no-new": "allow",
    "no-extraneous-class": "allow"
  }
}

I have not being able to reproduce this locally but I've asked my coworkers for a reproduction uasecase.

@DonIsaac DonIsaac added the A-linter Area - Linter label Sep 26, 2024
@DonIsaac
Copy link
Contributor

@Raynos thanks for the details. I will start debugging this once I have a file I can reproduce on.

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

one file that disappeared is workspaces/metrics-exporter/src/metadataQueue.ts when I hadn't touched anything in that workspace

coworker mentioned a file was truncated in a directory he has not touched on his branch.

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

and it's flaky hard to reproduce.

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

Happened to me too, it randomly deleted

workspaces/pipeline/src/memo/ram-memo.ts
```

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

I'm running it in a loop and it deleted 5/6 random files :(

@Raynos
Copy link
Author

Raynos commented Sep 26, 2024

https://github.com/PabloSzx/graphql-ez/tree/jake/reproduction

And

$ while true ; do npm run lint:fix ; done

Will reproduce it ; you should run pnpm install in this repo as its pnpm; not npm.

@DonIsaac
Copy link
Contributor

@Raynos what OS are you and your colleagues using?

@DonIsaac
Copy link
Contributor

...and is this how you guys run oxlint??

@DonIsaac
Copy link
Contributor

Tl;Dr

This was weird. I think I found a way to make this stop happening, but I could not figure out or solve the underlying issue. Either way, I'll have a PR out after I finish writing this out.

Longer Version

So this bug was weird. My initial thought was "oh, maybe a deletion fix is getting applied to the whole file, or there's a bug in the fix merging or something". So I added an assertion after the new source file has been built.

// oxc_linter/src/service.rs line 268
let fix_result = Fixer::new(source_text, messages).fix();
assert(!fix_result.fixed_code.is_empty();

The assertion never panicked, and files kept getting deleted. What gives?

My next thought is that maybe something isn't getting flushed somewhere. So I replaced the line that writes the fixed source

// before
fs::write(path, fix_result.fixed_code.as_bytes()).unwrap()

With something that forces a blocking write-through to disk, using man 2 fsync

let mut file = File::create(path).unwrap();
file.write_all(fix_result.fixed_code.as_bytes()).unwrap();
file.flush().unwrap(); // for good measure
file.sync_all().unwrap();

When I tried that, oxlint went from 20m to 200ms, but still files kept getting deleted. And here is where I got confused.

This lead me down a rabbit hole of the weirdness and brokenness of close, especially in Rust, where close will silently fail (for example, if a system interrupt is sent while you're trying to close).

So what if we closed it ourselves?

unsafe {
  let exit = libc::close(file.into_raw_fd());
  assert_eq!(exit, 0);
}

And you know what? THAT DIDN'T WORK EITHER! can you tell this is driving me a little mad?

Finally, thankfully, I found a workable solution. It does not address any of the above-mentioned issues or weirdness, but it should work. Basically, the lint Runtime is always writing fixed content, _even if no fixes have been applied. This basically means that if you have --fix set, all files will have their content re-written whether or not any changes were actually made. This seems to solve the problem.

Sorry for the long reply, that was 2 hours of my life and I had to get it off my chest.

@Raynos
Copy link
Author

Raynos commented Sep 27, 2024

@Raynos what OS are you and your colleagues using?

Mostly macintosh / macbooks, maybe one ubuntu linux user.

We don't run oxlint in a loop but yes the npm run lint:fix command is what we are running.

@Raynos
Copy link
Author

Raynos commented Sep 27, 2024

I used the loop example to eventually reproduce the flaky behavior of file truncation.

@Raynos
Copy link
Author

Raynos commented Sep 27, 2024

So i'm curious what happens if you do apply the fix to all files, but at the end of the program, instead of exiting the cli, you sleep for 10milliseconds.

Whether that will stop the behavior of empty buffer being written to the files.

Like is the problem here that writing the same content to the file plus flaky close behavior plus an efficient binary that at the end of the program writes to all the files, writes to stdout and exits the process cleanly leaves some file descriptors in a nonflushed / bad state.

Do you think your fix of not writing to the file if no fixes were applied will cause this bug frequency to go from "1 in 10 runs" to "1 in 10,000 runs" ?

I've never seen file writing operations get corrupted like this before, it seems like such a trustworthy battle hardened piece of the OS...

@Raynos
Copy link
Author

Raynos commented Sep 27, 2024

Have you looked whether the issue is the non-atomic nature of fs::write ? rust-lang/rust#82590

Could it be that multiple rules are trying to concurrently fix a file and write the original content back to the file and that causes the truncation ?

Would replacing the fs::write with fs::write to temp file and then an atomic file rename cause the problem to go away ? |

Edit: oh ffs, file renames are only atomic on linux, not on macos,

DonIsaac added a commit that referenced this issue Sep 27, 2024
@DonIsaac
Copy link
Contributor

DonIsaac commented Sep 27, 2024

So i'm curious what happens if you do apply the fix to all files, but at the end of the program, instead of exiting the cli, you sleep for 10milliseconds.

Whether that will stop the behavior of empty buffer being written to the files.

Like is the problem here that writing the same content to the file plus flaky close behavior plus an efficient binary that at the end of the program writes to all the files, writes to stdout and exits the process cleanly leaves some file descriptors in a nonflushed / bad state.

Do you think your fix of not writing to the file if no fixes were applied will cause this bug frequency to go from "1 in 10 runs" to "1 in 10,000 runs" ?

I've never seen file writing operations get corrupted like this before, it seems like such a trustworthy battle hardened piece of the OS...

I added a sleep 1 command inside the loop and saw this:

  1. With sync_all, no corruption occurred.
  2. Without sync_all some corruption still occurred, but less

Unfortunately the cost of syncing, at around 10x the runtime performance, is simply too high. Note that syncing without a sleep still causes corruption.

@Boshen Boshen added the P-high Priority - High label Sep 27, 2024
@Boshen
Copy link
Member

Boshen commented Sep 27, 2024

Despite being fixed, we need an integration test to catch future regressions.

@DonIsaac
Copy link
Contributor

I have no idea how to reproduce it consistently in an integration test

@Raynos
Copy link
Author

Raynos commented Sep 30, 2024

if you want a slow integration test you can spawn oxlint as a child process in a loop with a timeout of 5s and assert no files were truncated.

You can try to revert the fix locally and see if 5s is enough looping to reproduce it ±50% of the time.

@Raynos
Copy link
Author

Raynos commented Sep 30, 2024

For flaky behavior its basically impossible to write a consistent integration test, your just writing a test that will hopefully catch regressions on every second commit on average.

@DonIsaac
Copy link
Contributor

DonIsaac commented Oct 3, 2024

@Boshen, do you think writing a 5s loop regression test is worth it?

@Boshen Boshen removed the P-high Priority - High label Oct 4, 2024
@Boshen
Copy link
Member

Boshen commented Oct 4, 2024

@Boshen, do you think writing a 5s loop regression test is worth it?

Probably, leave this to me, I'll think about this issue.

Boshen added a commit that referenced this issue Oct 21, 2024
Boshen added a commit that referenced this issue Oct 21, 2024
Boshen added a commit that referenced this issue Oct 21, 2024
@Boshen Boshen closed this as completed Oct 21, 2024
@Boshen
Copy link
Member

Boshen commented Oct 21, 2024

Added a test in #6747, the test checks the modified timestamp to ensure file is not written when no fix is applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linter Area - Linter C-bug Category - Bug
Projects
None yet
3 participants