Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to specify files inode #196

Open
dedmen opened this issue Jul 21, 2018 · 21 comments
Open

Unable to specify files inode #196

dedmen opened this issue Jul 21, 2018 · 21 comments

Comments

@dedmen
Copy link

dedmen commented Jul 21, 2018

This code
https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet/DokanOperationProxy.cs#L507
Causes the inode for two completly different files to be the same if they just have the same filename.

I'm running Python scripts that copy files around on my drive.
And python's os.path.samefile checks if the file's inode and deviceID are different. Which they aren't if the files happen to have the same name.
In my case I am trying to copy a file to a different directory. So it indeed has the same name and thus the same inode. But they are still seperate files.

My proposed solution would be to provide the ability to set the inode in the FileInformation struct but keep it optional.

What could be done is have JUST GetFileInformation return a different FileInformation struct than everywhere else which contains the full path instead of just the filename.
In my case I'm storing the FileInformation struct directly in my file node because I don't need to generate a new one everytime.

Instead of

public NtStatus GetFileInformation(string filename, out FileInformation fileInfo, DokanFileInfo info)
{
    var node = GetNode(filename, info);
    if (node == null)
    {
        fileInfo = new FileInformation();
        return DokanResult.FileNotFound;
    }

    fileInfo = node.fileInfo;
    return DokanResult.Success;
}

I would need to

public NtStatus GetFileInformation(string filename, out FileInformation fileInfo, DokanFileInfo info)
{
    var node = GetNode(filename, info);
    if (node == null)
    {
        fileInfo = new FileInformation();
        return DokanResult.FileNotFound;
    }

    fileInfo = new FileInformation
    {
        FileName = filename,
        Attributes = node.fileInfo.Attributes,
        CreationTime = node.fileInfo.CreationTime,
        LastAccessTime = node.fileInfo.LastAccessTime,
        LastWriteTime = node.fileInfo.LastWriteTime,
        Length = node.fileInfo.Length
    };
    return DokanResult.Success;
}

But that would make GetFileInformation from now on always return a wrong FileName.

Which is btw what the Mirror sample is already doing.

@dedmen
Copy link
Author

dedmen commented Jul 21, 2018

Just looked over it and found that FileName in FileInformation at that place isn't needed at all. It's just used to generate the hash and nothing else.
I couldn't find a mention of that in the documentation. That's a bit unintuitive.

The same entry in the same struct stands for FileName everywhere else. And is also called FileName.
Just in this case it has to be full path although it should really just be a number.

@Liryna
Copy link
Member

Liryna commented Jul 21, 2018

Hi @dedmen ,

GetFileInformation is using FileInformation in C# but in reality the GetFileInformation wrap GetFileInformationByHandle win32 https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-getfileinformationbyhandle
that has no information of the filename https://docs.microsoft.com/en-gb/windows/desktop/api/fileapi/ns-fileapi-_by_handle_file_information

Which description do you think you would have like here ?

@dedmen
Copy link
Author

dedmen commented Jul 21, 2018

I'd say use a new struct that has the actual INODE handle as a number instead of FileName.
That would also allow to expose nFileIndexHigh and dwNumberOfLinks to the user if they need it.
It definetly should be a seperate type as the one used in FindFiles though. Developers first assume same type == same thing. Which is not the case here.

That change would be a API breaking change.

@Liryna
Copy link
Member

Liryna commented Aug 6, 2018

Hi @dedmen ,

I agree with your proposition. Do you think you could provide a pull request with the changes ? This would be greatly appreciated 👍

@dedmen
Copy link
Author

dedmen commented Aug 6, 2018

Yeah sure. Probably next weekend

@Liryna
Copy link
Member

Liryna commented Oct 9, 2019

@dedmen I imagine next weekend is far enough now 😄 just wanted to know if it is really out of your time ?

@dedmen
Copy link
Author

dedmen commented Oct 10, 2019

Sorry, I spent most of my time on other projects as the dokan thing was working well enough at that point.
Its currently using my dirty workaround
https://github.com/arma3/DokanPbo/blob/master/DokanPbo.Core/PboFS.cs#L368

This is still on my todo list, but not a high priority sadly. Too much other stuff going on.

@Liryna
Copy link
Member

Liryna commented Oct 10, 2019

No worry 👍 Thank you for the update and the workaround that might help others !

@dedmen
Copy link
Author

dedmen commented Oct 10, 2019

For reference (as I already forgot most of the stuff I was looking at back then)

Hash is calculated and used here:

rawHandleFileInformation.nFileIndexLow = (uint)(fi.FileName?.GetHashCode() ?? 0);

This

var result = operations.GetFileInformation(rawFileName, out var fi, rawFileInfo);

should really just return a struct mirroring the BY_HANDLE_FILE_INFORMATION contents in a C# format, we can basically keep most things but just add 64bit(?) number for fileIndex (inode) and dwNumberOfLinks

We need to communicate a way to the users to still let them generate hash from filename, to make sure new users won't be confused?
Might aswell just mention a //(uint)(fi.FileName?.GetHashCode() ?? 0) in comments in the new struct.

@Liryna
Copy link
Member

Liryna commented Oct 10, 2019

Exactly, we should use another struct having FileIndex that user can set OR let equal to 0 and dokan-dotnet handle it by keeping a hashmap updated with FileIndex of each file and cleanup when all handles are closed.

There is no real rules for FileIndex even if the best would be to duplicate NTFS
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information#remarks

@dedmen just to come back at what you said. When you copied the file to another folder you said os.path.samefile return true as they have the same name and therefore the same HashCode but if FileName is the full path \folder1\file and \folder2\file the HashCode should be different ? I might have missed something here.

@dedmen
Copy link
Author

dedmen commented Oct 11, 2019

but if FileName is the full path

correct. Full path would work. As thats my workaround.
Though in other places where the same FileInformation struct is used, its just name, not full path.

@Liryna
Copy link
Member

Liryna commented Oct 14, 2019

I might have missed something. For me GetFileInformation provide the full path so it OK.
What are the other places where FileInformation is used only with the file name ?

@dedmen
Copy link
Author

dedmen commented Oct 15, 2019

public NtStatus FindFiles(string filename, out IList files, DokanFileInfo info)
public NtStatus FindStreams(string fileName, out IList streams, DokanFileInfo info)
public NtStatus FindFilesWithPattern(string fileName, string searchPattern, out IList files, DokanFileInfo info)

Atleast I think thats the case. FindFiles in C API uses PWIN32_FIND_DATAW
https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw
Which contains filename according to MSDN, MSDN usually says "path" when they mean a path.

cFileName = fi.FileName

On the other hand all the dokan methods take a "filename" as argument which is really a fullpath.

FindStreams returns
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-win32_find_stream_data
which has a streamname, not a filename, and also a streamsize. I think that API should also be changed to a seperate struct. It only uses "Filename" which is supposed to be a "SteamName" and "Length" which is supposed to be the length of the stream and ignores all the other entries in the FileInformation struct.

So I guess I also kinda missinterpreted the API due to the mix of filename vs fullpath.
Or atleast didn't interpret the API in the way it was supposed to be interpreted.

https://dokan-dev.github.io/dokan-dotnet-doc/html/struct_dokan_net_1_1_file_information.html
This also says

the name of the file or directory.

@Liryna
Copy link
Member

Liryna commented Oct 15, 2019

I agree, the documentation should be improved or the API that is misleading (win32 API also use FileName whatever file or path dir is used). @dedmen 3b70a57 does this would be enough ?

Regarding the main reason of this issue, we agree that only GetFileInformation is using the FileIndex so we have no bug over the usage of (uint)(fi.FileName?.GetHashCode() ?? 0).

Can we agree to focus on changing the API ? We could just do struct inheritance as they look the same and change the API to use, as you proposed, the correct member names.

@dedmen
Copy link
Author

dedmen commented Oct 16, 2019

does this would be enough ?

Technically yes.. but it's more a hacky workaround than a real solution. And it requires the user to read the documentation instead of just relying on the filename. And then we have still the issue that the user used filename for the other variants and it worked fine, so he assumes he has to do the same in GetFileInformation too.

so we have no bug over the usage of (uint)(fi.FileName?.GetHashCode() ?? 0).

We only let the user access 32bit, of the 64bit value. User might want access the rest too.

Can we agree to focus on changing the API ?

As I said, I would just split the "FileInformation" struct, into 3 (FindFiles, FindStreams, GetFileInformation) seperate structs, that expose exactly the value that are actually needed.

/// - <see cref="IDokanOperations.GetFileInformation"/>
/// - <see cref="IDokanOperations.FindFiles"/>
/// - <see cref="IDokanOperations.FindStreams"/>
/// - <see cref="IDokanOperations.FindFilesWithPattern"/>.

FindFiles (both variants) needs to contain everything that FileInformation had. And we can keep "FileName" as that matches Win32 API.

FindStreams needs

var data = new WIN32_FIND_STREAM_DATA
{
StreamSize = fi.Length,
cStreamName = fi.FileName
};

string SteamName;
long StreamSize;

GetFileInformation needs

public NtStatus GetFileInformationProxy(

string FilePath; //Only used to create FileIndex if FileIndex wasn't set (needs to be documented)
FileAttributes Attributes;
DateTime? CreationTime;
DateTime? LastAccessTime;
DateTime? LastWriteTime;
long FileSize;
long? FileIndex; //If this is set, filepath is ignored (needs to be documented)
int NumberOfLinks = 1; //Not sure if we want to expose that? Don't know what its used for

We could do inheritance and just define properly named get/set and pass the same base-struct in the backend if that makes things easier, but I wouldn't know why also I'd prefer not to allocate/copy large structs, where 80% of the content is unused.

@Liryna
Copy link
Member

Liryna commented Oct 16, 2019

@dedmen I fully agree on changing the API to use the struct inheritance. Do you think you would be able to provide a PR for it ?

@dedmen
Copy link
Author

dedmen commented Oct 16, 2019

Able of course :D
Can't give you a timespan. Do you have a deadline it must be ready by?

@Liryna
Copy link
Member

Liryna commented Oct 16, 2019

@dedmen Great ! :) No deadline at all, of course (this is opensource :D) ! SI'm sorry if I sounded like I was pushing ^^

@kyanha
Copy link
Contributor

kyanha commented Dec 3, 2019

...this is an entire thing, just because Python's os.path.samefile has a bug?

Windows simply doesn't have the concept of an inode, nor a native concept of two directory entries that directly point to the same file (clusters). As such, on Windows NTFS, there is no possible way that two named regular files (i.e., files that are not reparse points) can ever possibly be the same underlying file object. The only instances where they could appear to be are under the old Single Instance Storage driver (which was a thing on Windows 2000 Server's Remote Installation Service), which relied on reparse points... and symlinks or hardlinks, which are also implemented as reparse points. (Notably, neither the SIS nor symlink/hardlink functionality ever require the names to match across all locations, so Python's os.path.samefile would give false negatives as well as false positives if it's implemented as described by @dedmen.)

It doesn't make sense to try to address that bug here. It's a bug in Python that Python needs to deal with.

Also, my thinking is that the most appropriate mapping of the inode concept to Windows would be through the "object ID" mechanism (ntifs.h:NtCreateFile's CreateOptions parameter containing the bit FILE_OPEN_BY_FILE_ID, which is exposed to usermode by winbase.h:OpenFileById). Using it would require several fcntls that Dokany doesn't support, and which would require some kind of additional mapping to user mode functions to manage.

@dedmen
Copy link
Author

dedmen commented Dec 3, 2019

This is not only a thing because of that python bug. I'd like to specify the id by myself too, and sending a string who'se whole purpose is to generate a ID is counterintuitive.

@kyanha
Copy link
Contributor

kyanha commented Jan 23, 2020

I have my own reasons for wanting to specify the ID, for FILE_OPEN_BY_FILE_ID handling. I honestly think it'd be most appropriate to address this bug by implementing that.

That said... all filenames that come to the functions pointed to in DOKAN_OPERATIONS from dokan1.sys are (currently) passed the full pathname within the filesystem. There's no "look for current working directory and append the filename to it" that the filesystem implementation needs to do, it's all handled by the driver. Which is good, because the driver doesn't pass current working directory information to the implementation. I don't think the driver can actually even get the current working directory information of the process that called it.

To understand BY_HANDLE_FILE_INFORMATION and its use of nFileIndexLow and nFileIndexHigh, it's important to realize that those came directly from the definition in the NT header files. Those fields are expressly defined and populated to support FILE_OPEN_BY_FILE_ID. The fact that you want to specify them is awesome, and I wholeheartedly salute it -- I just want it done correctly.

If FILE_OPEN_BY_FILE_ID is handled, the file ID is passed to the kernel in raw form (64 raw bits, with no base64 encoding, no padding, no NUL-termination) as FileName. Trying to interpret it as a string will either lead to incorrect behavior (since the first four bytes of a 32 bit value stuffed into a 64-bit space matches the UTF-32 representation of U+0000, thus leading to a string that's empty) or a bluescreen (since a 64-bit number with no 0x00 octets would have no guarantee that it would be followed by U+0000 in any reasonable space, and the kernel driver would need to interpret it as a string to determine what filename to pass up to user mode) -- see https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntcreatefile for more information. But, since the kernel driver would also receive the FILE_OPEN_BY_FILE_ID flag, it would be easy enough for the driver to break the value out into the nFileIndexLow and nFileIndexHigh fields, and pass an empty string in FileName. (This would be the differentiation that would allow a user to know that it's being requested to open a file by ID instead of by name -- if the string is @"" and not even @"\", it's not referring to a normal name.) [note: the kernel doesn't allow FILE_OPEN_BY_FILE_ID unless the filesystem is mounted with the FILE_SUPPORTS_OPEN_BY_FILE_ID deviceCharacteristic. This would need to be added to the DokanOptions enum, and its underlying handlers, to let the implementation tell dokany that it won't panic if it receives String.Empty in CreateFile.]

As an implementation note on your patch, long? FileIndex actually won't work to make it nullable, because long is a value type. Because of the underlying semantics of how memory addresses are interpreted, If 'null' were passed there, it would simply be passed the value of 0L, with no way to differentiate whether it's intended to be null or a literal 0L.

I'd also suggest making that parameter ulong instead of long, since it's not actually interpreted as an numeric value, and the underlying nFileIndexLow and nFileIndexHigh are both defined as uint (which would have the potential of making fully half of the nFileIndexHigh values appear to be negative numbers -- as well as potentially opening up sign-extension issues).

I understand that you probably derived the name FileIndex as a name from BY_HANDLE_FILE_INFORMATION.cs, but I'd suggest renaming it to FileID", since FILE_ID is what the MSDN documentation calls it. (The names nFileIndexLow and nFileIndexHigh are artifacts of how NTFS implements it -- it's the index of the file into the Master File Table. However, in dokany implementations, they're not always going to be true "indexes". I'm working on an implementation that would be able to directly expose underlying object IDs for file streams stored in PostgreSQL, which doesn't relate to indexing into an MFT. [As a side note, Postgres object IDs are all 32 bits long, which would also lead to me being impacted by a 0x00000000 nFileIndexHigh if it were interpreted as a string, which is why I'm concerned about correctly handling it.])

Also, the name that NtCreateFile receives when asked to open by FILE_ID is of either of the forms [8RawBytesOfFileID] or \??\C:\[8RawBytesOfFileID]. I don't have a means of testing right now, but I would guess that if the Dokan filesystem is mounted on a path, it would be the path to the mountpoint prepended with \??\. (I don't have any idea what it would be in a UNC environment, since the IDs provided by \Device\HardDiskVolume1\[16RawBytesOfObjectID] name syntax are 16-byte GUIDs, and UNC paths are accessed via \Device\MUP\hostname\sharename\.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants