Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(Dir/Difference): See Difference Between Dir Files & Content #20

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

toksdotdev
Copy link

@toksdotdev toksdotdev commented Aug 7, 2018

Print out the difference between any two directories to stdout. This includes:

  • Difference between lines in files
  • Existence of a file

A sample output is:

╔════════════════════════════════════════════════════════════════════════╗
║                               Differences                              ║
╠════════════════════════╦═══════════════════════╦═══════════════════════╣
║ Filename               ║ tests/easy/bad/dir1   ║ tests/easy/bad/dir2   ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ /sample.txt            ║ FILE EXISTS           ║ DOESN'T EXIST         ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ "/test.txt":1          ║ testing testing       ║ oh no!                ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ "/test.txt":2          ║ skdjjd                ║                       ║
╚════════════════════════╩═══════════════════════╩═══════════════════════╝
  • Tests created for any new feature or regression tests for bugfixes.
  • cargo test succeeds
  • cargo +nightly clippy succeeds

Copy link
Collaborator

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for contributing!

A question I have though is if a view (e.g. table) belongs in dir-diff or only a model (e.g. iterator of results).

This is related to issue #11 which I experimented with in #17. I'm mixed on the results there and haven't had time to further experiment on it.

Additionally, I've been curious about integrating dir diffing into predicates so it can work with a testing library like assert_fs. See assert-rs/predicates-rs#33

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@epage
Copy link
Collaborator

epage commented Aug 7, 2018

Also, could you share what your use case is? It'd be helpful to better understand the larger picture of what you are trying to accomplish.

Copy link
Collaborator

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for responding to my first round of feedback. In this round, I cover some more design questions and trade offs. You are welcome to go ahead and work on them and we can continue to iterate on each round until we come to a design that works.

I will note that I've known people who tend to get burnt out with that though and I don't want that to happen with you. Something that can help is first discussing the requirements in #11 and assert-rs/predicates-rs#33 and proposing a solution in #11. We can then iterate in the comments of #11 on the concepts before taking the time to write the code.

@@ -15,3 +15,4 @@ travis-ci = { repository = "steveklabnik/dir-diff" }

[dependencies]
walkdir = "2.0.1"
matches = "0.1.7"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is only used for tests. This should be under dev-dependencies so that clients don't get this added to their builds.

For an example, see https://github.com/assert-rs/assert_cmd/blob/master/Cargo.toml

.collect::<Vec<_>>();

if files_a.len() != files_b.len() {
return Err(Error::MissingFiles);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting trade-off between performance and information.

For people who want fast results, this is great. For people who want to know what all is different, this doesn't work out too well.

Two things about this.

First, if we're trying to optimize, I'm unsure which half of this is slower, the part above here (walk two directory tries, put all content into Vec) or the part below (diffing content).

Second, I think it might be better to offer the client the choice.

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

This is how cobalt's diffing works in its tests

let full_path_a = &a_base.as_ref().join(&a);
let full_path_b = &b_base.as_ref().join(&b);

if full_path_a.is_dir() || full_path_b.is_dir() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if one is a directory and the other isn't? Won't this silently ignore that?


for (a, b) in files_a.into_iter().zip(files_b.into_iter()).into_iter() {
if a != b {
return Err(Error::FileNameMismatch(a, b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is convenient for writing, I don't think users will understand what a file name mismatch means. Instead this is about one of the two files is missing in one of the trees. Ideally, we tell them that and tell them which one is missing.

let mut a_lines = content_of_a.lines().collect::<Vec<&str>>();
let mut b_lines = content_of_b.lines().collect::<Vec<&str>>();

if a_lines.len() != b_lines.len() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than writing our own file diffing, we should probably use the difference crate .

This is a re-phrasing of a previous posting that is now collapsed:

Depending on how we solve binary files, should this instead use the difference crate rather than re-implementing file diffing ourselves?


#[test]
fn missing_file() {
assert_matches!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat. I've never heard of this crate before. I'll need to see where it'd simplify my tests.

#[test]
fn missing_file() {
assert_matches!(
dir_diff::see_difference("tests/missing_file/dir1", "tests/missing_file/dir2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For every test case, we should probably ensure see_difference returns the same result as is_different.

@@ -56,13 +73,104 @@ pub fn is_different<A: AsRef<Path>, B: AsRef<Path>>(a_base: A, b_base: B) -> Res
Ok(!a_walker.next().is_none() || !b_walker.next().is_none())
}

/// Identify the differences between two directories.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context: This is more for design input then to implemented right now

We are only offering directory diffing. Subset checks would also be very valuable, possibly more valuable at least when writing tests to keep the "golden" case simple to minimize expanding a test to cover more than is intended.

Later I mention

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

That makes it really easy for us to also offer a subset check, we only do the first iteration. One more benefit for the iteration approach.

@@ -9,15 +9,15 @@
//! extern crate dir_diff;
//!
//! assert!(dir_diff::is_different("dir/a", "dir/b").unwrap());
//! ```
//!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the closing of the example code fence removed?

@@ -26,6 +26,22 @@ pub enum Error {
Io(std::io::Error),
StripPrefix(std::path::StripPrefixError),
WalkDir(walkdir::Error),
/// One directory has more or less files than the other.
MissingFiles,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this for now to minimize churn as this PR evolves but I suspect we'll want to split out the diffing error information.

@toksdotdev
Copy link
Author

Sorry, I haven't had time to work on this, as I've been busy for the past months. I'll try to address this ASAP.

Sorry for the delay.

@epage
Copy link
Collaborator

epage commented Dec 8, 2018

Understandable; we all have those times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants