[RFC] Ambiguity in the File namespace API #14245
Comments
I was curious after reading this, so I took a quick look at what other non-Ruby languages do for these kinds of file APIs. Interestingly enough, it seems like Python, Go and C++ lack a high-level file permissions API like the one Crystal currently has with […]

Looking over the documentation for a few languages made me realize that the Crystal docs tend not to clarify the difference between file system objects and individual file types very well.

[Table: documentation of File.exists? (Ruby, Python, Crystal) and File.file? (Ruby, Crystal)]
|
Yeah, I think that's quite reasonable considering that valid use cases for these methods are quite rare.
That's probably a good choice in many use cases. Even if you do every possible check beforehand, |
When doing things with actual files and directories, I like being able to build them up with Path.
I wish that, like Ruby's Pathname, we had file? and directory? methods on Path, and maybe even mkdir on directory Paths.
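Here is a minimal sketch of how the wished-for API could look. It assumes the methods are simply added to Path by reopening the struct in user code. Path#file?, Path#directory? and Path#mkdir are hypothetical and are not part of Crystal's stdlib. They just delegate to File.file?, File.directory? and Dir.mkdir_p, so they inherit exactly the narrow semantics discussed in this issue.

```crystal
require "file_utils"

# Hypothetical convenience methods on Path, loosely modeled on Ruby's
# Pathname#file?, #directory? and #mkdir. Not part of Crystal's stdlib.
struct Path
  # True only for an ordinary file (same semantics as File.file?).
  def file? : Bool
    File.file?(self)
  end

  # True if the path points to a directory (same as File.directory?).
  def directory? : Bool
    File.directory?(self)
  end

  # Creates this directory and any missing parents, returning self
  # so that calls can be chained.
  def mkdir : Path
    Dir.mkdir_p(self)
    self
  end
end

base = Path[Dir.tempdir] / "path_sketch_#{Process.pid}"
dir = (base / "data").mkdir
file = dir / "notes.txt"
File.write(file, "hello")

results = {dir.directory?, file.file?, dir.file?}
p results # => {true, true, false}

# Clean up the temporary tree.
FileUtils.rm_rf(base)
```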
|
@dsisnero I think enhancements to |
@straight-shoota I think extending For example |
This is certainly related but IMO it should still be a separate discussion. The outcome affects parts of this issue, of course. But it's not fundamental, it just offers one potential solution for one of the aspects (and it's the least important one). The more critical issues of this topic do not depend on the |
From the above analysis:
|
There's no reliable way to know for sure. But there are simple ways to tell that you can't. And I think this is actually quite valuable for a number of use cases (I list some examples in the detailed investigation report). It's better to do a basic sanity check before the read/write/exec than to just perform it and then have to deal with basic errors. Compare the following examples for optionally reading from a file if it exists.

```crystal
begin
  File.read(path)
rescue File::NotFoundError
  # ignore error if file doesn't exist
end
```

```crystal
if File.readable?(path) && !File.directory?(path)
  File.read(path)
end
```

Note: The examples don't do exactly the same thing, and for brevity they leave more error cases uncovered. This also shows that it's a complex topic that needs attention to get the semantics right. I think the latter form is more ergonomic and efficient. However, the condition […]

```crystal
# File.can_read? is a proposed method; it does not exist in the stdlib.
if File.can_read?(path)
  File.read(path)
end
```

UPDATE: Note that this form introduces a race condition. The file could be deleted between checking the condition and executing the actual read. So it might still be necessary to implement proper error handling. |
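The sketch below combines the check-then-read form from the comment above with the error handling that the UPDATE calls for. The helper name read_if_present is hypothetical. The pre-check cheaply filters out missing, unreadable and directory paths. The rescue covers the case where the file disappears between the check and the read.

```crystal
# Hypothetical helper: returns the contents, or nil if the path is not
# something we can read (missing, unreadable, or a directory).
def read_if_present(path : String | Path) : String?
  # Cheap sanity check up front.
  return nil unless File.readable?(path) && !File.directory?(path)
  File.read(path)
rescue File::NotFoundError
  # Deleted after the check passed (time-of-check/time-of-use race).
  nil
end

path = Path[Dir.tempdir] / "read_if_present_#{Process.pid}.txt"
p read_if_present(path)        # => nil
File.write(path, "content")
p read_if_present(path)        # => "content"
p read_if_present(Dir.tempdir) # => nil
File.delete(path)
```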
I'd suggest |
Hm. I see how that could be useful. But I think it's not as easy to use and could be confusing (there's already |
An alternative or supplement to query methods such as […] For example, […] It would be the same concept as […] An issue with this approach is that it's not possible to easily distinguish between a path not existing at all and one being of the wrong kind (like a directory). But I'm not sure if that's actually all that relevant. I presume that if you want to read from a specific file name, it's okay if it does not exist, but if it's something entirely different, that's probably an odd situation that needs attention. New methods related to reading and writing: […] |
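This sketches the nil-returning alternative to query methods. It assumes that only a missing entry should map to nil. read_or_nil is a hypothetical name, not an existing stdlib method. Any other failure still raises, such as the path being a directory. That matches the idea that a wrong kind of entry is an odd situation that needs attention. On typical Linux and macOS systems, reading a directory fails with an IO::Error (File::Error is a subclass of it).

```crystal
# Hypothetical nil-returning read: nil only when nothing exists at the path.
def read_or_nil(path : String | Path) : String?
  File.read(path)
rescue File::NotFoundError
  nil
end

p read_or_nil(Path[Dir.tempdir] / "read_or_nil_missing_#{Process.pid}") # => nil

begin
  read_or_nil(Dir.tempdir)
rescue ex : IO::Error
  # A directory is not "missing", so the error is not swallowed.
  puts "directory still raises #{ex.class}"
end
```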
This issue has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/add-file-info-methods-to-path/6781/1 |
Motivated by #13849, I started looking at uses of File.file? and similar methods. That issue was about an inadequate use of that method in the compiler. Sadly, things don't look great at other call sites either. The problem is systematic and calls for improvements at the API level.

Introduction
File.file? seems like a reasonable choice for checking if a path is a file. But that depends on the understanding of "file". This method considers an (ordinary) file in contrast to other file types like directories, special device files and named pipes. This detail is often not terribly relevant; most use cases shouldn't worry too much about the exact type. But often doesn't mean always. Some of these types are rarely used, so it's easy to miss shortcomings regarding them.

A typical intention is to verify that File.read or similar would work on the path. But that's not what File.file? does. It's focused on entry types in the file system. Named pipes and character devices work with File.read as well, and excluding them is often unreasonable and unintended.

File.exists? is confusing because File.exists?(path) suggests it's checking if a file exists. But this method checks if anything exists at the path. It doesn't need to be a file; it can be a directory.

It gets even more confusing. The following example seems legit and safe. But in reality, it doesn't make any sense:

[…]

Despite the similar name suggesting a relation, File.readable? has nothing to do with File.read. File.readable? is typically true for a directory[1], but File.read does not work on directories.

I've mentioned these shortcomings in my talk at CrystalConf 2023. This issue, however, is a more complete and organized collection on this topic.
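The mismatches described above can be reproduced with existing stdlib calls. This short sketch assumes a Unix-like system where /dev/null is a character device and Dir.tempdir is an ordinary, listable directory.

```crystal
# Assumes a Unix-like system, where /dev/null is a character device.
dev_null = "/dev/null"
p File.file?(dev_null)        # => false (not an *ordinary* file)
puts File.info(dev_null).type # => CharacterDevice
p File.read(dev_null)         # => "" (reading works regardless)

dir = Dir.tempdir
p File.exists?(dir)           # => true (exists? accepts directories)
p File.readable?(dir)         # usually true: the read bit allows listing
begin
  File.read(dir)
rescue ex : IO::Error
  # File.read does not work on a directory, whatever readable? says.
  puts "File.read(dir) failed: #{ex.message}"
end
```

File.file? rejects the character device even though File.read handles it fine. The directory passes File.exists? and (typically) File.readable?, but it cannot be read.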
Investigation
I looked through stdlib and compiler sources for call sites of some related methods in the File namespace:

- File.file?, File.directory? and File.symlink? check whether the file is of the respective type (see File::Type).
- File.readable?, File.writable? and File.executable? check if the process has the rights to read/write/execute a file.
- File.exists? checks the existence of a file entry, but not its type or access rights.

Some of the findings from combing through stdlib and compiler sources:
The most severe faults that I noticed have already been fixed:

- crystal docs check File.exists? for shard.yml #13937
- FileUtils.ln_sf to override special file types #13896

Some examples of lesser issues, where there was less need to fix them directly:
- crystal/src/compiler/crystal/command.cr, line 127 at 6b9ad16
- crystal/src/compiler/crystal/command.cr, line 545 at 6b9ad16

These are similar to #13849 (same location) and should actually check whether the path points to something that we can practically read from.
There are many more similar instances in the compiler that I'm skipping for brevity. They're usually related to loading source code or cached data from files. In some cases it might be fine to use File.file?, but in others the same limitations as above apply.

- crystal/src/crystal/system/unix/time.cr, line 124 at 6b9ad16

This goes the extra step of also checking File.readable?. But it's still limited to ordinary files.

- crystal/src/time/location/loader.cr, line 95 at 6b9ad16

This goes yet another step further, which is over the top: File.exists? is redundant next to File.file?, which already implies existence.

There are plenty of uses of these methods that are dramatically wrong, and others not so much. For some of these methods, their semantics rarely match the apparent intention at the call site. I barely found any instance where I'm confident that the method corresponds exactly to the intended purpose.
It seems very clear that there are many issues with utilizing these API methods in the standard library and compiler. It's not hard to extrapolate that this would apply to other code bases as well. We can't even get it right in the stdlib!
An easy solution would be if people just read the documentation. It clearly says what these methods do!
But the sheer scale of inappropriate usage clearly points to intrinsic flaws. It's hardly possible to use these methods correctly for the most common use cases.
It's not that they're completely useless, they have their purpose. But those are specific niche use cases.
You won't need them unless your Crystal program explicitly deals with file system details, for example.
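For call sites like the ones above, the question is usually "could File.read plausibly work on this path?" rather than "is this an ordinary file?". Below is a minimal sketch of such a check, written as a hypothetical top-level can_read? method (the issue discusses File.can_read? as a possible name). It accepts the entry types the issue names as readable by File.read (ordinary files, named pipes, character devices) and treats everything else as unreadable; that cut-off is an assumption of this sketch. Like any pre-check, it cannot rule out the entry changing before the actual read.

```crystal
# Hypothetical check: could File.read plausibly succeed on this path?
def can_read?(path : String | Path) : Bool
  info = File.info?(path) # follows symlinks; nil if nothing is there
  return false unless info

  case info.type
  when .file?, .pipe?, .character_device?
    # Types File.read can consume; still subject to access rights.
    File.readable?(path)
  else
    # Directories, sockets, block devices etc. are rejected outright.
    false
  end
end

p can_read?("/dev/null")               # => true (on Unix-like systems)
p can_read?(Dir.tempdir)               # => false (a directory)
p can_read?("/nonexistent/some/path")  # => false
```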
Analysis
I don't think any of these individual methods is wrong per se.
The underlying problem is an ambiguous mixture of different concepts in the same namespace.
Some methods are tightly coupled to the tiny details of the file system and do not match the scope of higher-level, usage-oriented APIs.
Different understandings of the term "file" in methods from the File namespace:

- File.file?: only an ordinary file (excluding named pipes and character devices)
- File.read: ordinary files, named pipes and character devices
- File.exists?: any kind of entry in the file system, including directories

Having all of that crowded together in the prominent top-level File namespace seems to be the cause of most misunderstandings.

Discussion
I believe we should separate methods with different scopes to express their differences more clearly.
Lower-level methods using the narrower definitions (e.g. .readable?, .writable? and .executable?) could fit into File::Info or File::Type. Their use cases are rare and there's not much reason to have them in a high-level namespace.

The File type is focused on the concept of a stream that can read and write. So the File class namespace should be reserved for the core methods following that definition.

There could be a set of methods to check whether a read or write operation would work in principle on a given path. They could be named File.can_read? and File.can_write? (.readable? and .writable? already have a different meaning, and even if we removed them, we couldn't immediately replace them with a different method).

I'm not sure what to do about the wider scope referring to any entry in the file system. There's no apparent target for that. Maybe we should consider a new FileSystem namespace? Path could be an option, but it has so far been understood in an abstract sense, so we might just be heading for another mixture of concerns.

I think most methods of this scope are not even that bad to mix into the File namespace. Except for .exists?, I've found little cause for confusion. This might need some more investigation, but perhaps we could leave them in the File namespace and just rename File.exists? to a less ambiguous name.

Footnotes
[1] The readable bit is typically set on directories because it enables listing their entries.