
DataDeps.jl Documentation

What is DataDeps?

DataDeps is a package for simplifying the management of data in your Julia application. In particular, it is designed to simplify the process of getting static files from some server onto the local machine, and of letting programs know where that data is.

For a few examples of its usefulness, see this blog post.

Usage in Brief:

I want to use some data I have in my project. What do?

The short version is:

  1. Stick your data anywhere with an open HTTP link. (Skip this if it is already online.)
  2. Write a DataDep registration block (see the sketch below).
  3. Refer to the data using datadep"Dataname/file.csv" etc. as if it were a file path, and DataDeps.jl will sort out getting it onto your system.
  4. For CI purposes, set the DATADEPS_ALWAYS_ACCEPT environment variable.

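As a concrete sketch of steps 2 and 3 (the name, URL, and checksum here are hypothetical placeholders, not a real registration):

    using DataDeps

    function __init__()
        register(DataDep(
            "MyDataset",    # hypothetical name; also the folder it is stored under
            "Message shown at the download prompt: original source, citation, license.",
            "https://example.com/mydataset.csv",    # hypothetical URL
            "0000000000000000000000000000000000000000000000000000000000000000",    # SHA256 of the file, as a hex string
        ))
    end

    # Then refer to it like a file path:
    path = datadep"MyDataset/mydataset.csv"
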
Where can I store my data online?

Wherever you want, so long as it gives an open HTTP(S) link to download it. **

  • I use an OwnCloud instance hosted by our national research infrastructure.
  • Research data hosts like FigShare are a good idea.
  • You can just stick it on your website hosting if you are operating a website.
  • I'd like to hear if anyone has tested GoogleDrive or DropBox etc.

** (Other protocols and auth can be supported by using a different fetch_method.)

Why not store the data in Git?

Git is good for files that meet 3 requirements:

  • Plain text (not binary)
  • Smallish (GitHub, for example, warns about files over 50MB and rejects files over 100MB)
  • Dynamic (Git is version control, it is good at knowing about changes)

There is certainly some room around the edges for this: storing a few images in the repository is OK, but storing all of ImageNet is a no-go. For those edge cases ManualDataDeps are good (see below).

DataDeps.jl is good for:

  • Any file format
  • Any size
  • Static (that is to say it doesn't change)

The main use case is downloading large datasets for machine learning, and corpora for NLP. In this case the data is normally not even yours to begin with. It lives on some website somewhere. You don't want to copy and redistribute it; and depending on licensing you may not even be allowed to.

But my data is dynamic

Well, how dynamic? If you are willing to tag a new release of your package each time the data changes, then maybe this is no worry, but maybe it is.

But the real question is: is DataDeps.jl really suitable for managing your data properly in the first place? DataDeps.jl does not provide for versioning of data – you can't force users to download new copies of your data using DataDeps. There are workarounds, such as using DataDeps.jl + deps/build.jl to rm(datadep"MyData"; recursive=true, force=true) on every package update. Or considering each version of the data as a different datadep with a different name. DataDeps.jl may form part of your overall solution or it may not. That is a discussion to have on Slack or Discourse (feel free to tag me, I am @oxinabox on both). See also the list of related packages at the bottom.

The other option is that your data might be a good fit for git. If it is in the overlapping area of plain text & small (or close enough to those things), then you could add it as a ManualDataDep and include it in the git repo in the deps/data/ folder of your package. The ManualDataDep will not need manual installation if it is being installed via git.

Other similar packages:

DataDeps.jl isn't the answer to everyone's download needs. It is focused squarely on static data. It is opinionated about providing user readable metadata at a prompt that must be accepted. It doesn't try to understand what the data means at all. It might not be good for your use case.

Alternatives that I am aware of are:

  • RemoteFiles.jl: keeps local files up to date with remotes. In some ways it is the opposite of DataDeps.jl (which means it is actually very similar in many ways).
  • BinaryProvider.jl downloads binaries intended as part of a build chain. I'm pretty sure you can trick it into downloading data.
  • Base.download: if your situation is really simple, just sticking a download into the deps/build.jl file might do you just fine.

Outside of Julia's ecosystem are:

  • Python: Quilt. Quilt uses a centralised data store, and allows the user to download the data as Python packages containing it in serialised form. It might be possible to use PyCall.jl to use this from Julia.
  • R: suppdata features extra functionality relating to published datasets (see also DataDepsGenerators.jl); it might be possible to use RCall.jl to use this from Julia.
  • Node/command line: the Dat project. I'm not too familiar with this; it is a bit of an ecosystem of its own. I think using it from the command line might satisfy many people's needs, or automating it with shell calls in build.jl.

For End Users

On Windows, the default load path includes locations such as:

    C:\Users\oxinabox\AppData\Roaming\datadeps
    C:\Users\oxinabox\AppData\Local\datadeps
    C:\ProgramData\datadeps
    C:\Users\Public\datadeps

      Having multiple copies of the same DataDir

You probably don't want to have multiple copies of a DataDir with the same name. DataDeps.jl will try to handle it as gracefully as it can, but having different DataDeps under the same name is probably going to lead to packages loading the wrong one, except if they are (both) located in their packages' deps/data folders.

      By moving a package's data dependency into its package directory under deps/data, it becomes invisible except to that package. For example ~/.julia/v0.6/EXAMPLEPKG/deps/data/EXAMPLEDATADEP/, for the package EXAMPLEPKG, and the datadep EXAMPLEDATADEP.

      Ideally though you should probably raise an issue with the package maintainers and see if one (or both) of them want to change the DataDep name.

Note also that when it comes to file-level loading, e.g. datadep"Name/subfolder/file.txt", DataDeps.jl does not check all folders with that Name (if you have multiples). If the file is not in the first folder it finds, you will be presented with the recovery dialog, from which the easiest option is to select to delete the folder and retry, since that will result in it checking the second folder (as the first one no longer exists).

      Removing data

      Sometimes you don't need the data anymore. You can remove files from within Julia using the rm command. If you had registered a DataDep called MyDataName, then you can remove it with

      rm(datadep"MyDataName"; recursive=true)

      Configuration

Currently, configuration is done via environment variables. It is likely to stay that way, as they are also easy to set up in CI tools. You can set these in the startup.jl file using the ENV dictionary if you don't want to mess up your .profile. However, most people shouldn't need to: DataDeps.jl tries to have very sensible defaults.

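For example, to pre-accept all download prompts (as is handy on CI):

    # In ~/.julia/config/startup.jl, or set in the CI environment:
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
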
For Package Developers

The tail end of a preupload check on an archive looks like:

    Files: 3
    Size: 5075686
    Compressed: 579043
    true

      Notice that it has issued a warning that the checksum was not provided, and has output the hash that needs to be added to the registration block. But it has not issued any warnings about the unpack. The fetch_method is never invoked.

      It is good to use preupload checking before you upload files. It can make debugging easier.

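A minimal sketch of running such a check (the registration name and local file path here are hypothetical):

    using DataDeps
    # Checks the checksum and dry-runs the post_fetch_method; nothing is downloaded.
    DataDeps.preupload_check("MyDataset", "path/to/local/mydataset.tar.gz") ||
        error("fix the registration block before uploading")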

      Extending DataDeps.jl for Contributors

      Feel free (encouraged even) to open issues and make PRs.

      Internal Docstrings

As well as the usual docstrings on all the publicly facing methods, most of the internal methods have them too. You can view them in the source, or via the Julia REPL etc. Hopefully the internal docstrings make it clear how each method is used.

      Creating custom AbstractDataDep types

The primary point of extension for DataDeps.jl is developers defining their own DataDep types. 99% of developers won't need to do this; a ManualDataDep or a normal (automatic) DataDep covers most use cases. However, if for example you want a DataDep that, after the download is complete and after the post_fetch_method is run, does an additional validation, or some data-synthesis step that requires working with multiple of the files simultaneously (which post_fetch_method cannot do), or a SemiManualDataDep where the user does some things and then other things happen automatically, then this can be done by creating your own AbstractDataDep type.

The code for ManualDataDep is a good place to start looking to see how that is done. You can also encapsulate a DataDep as one of the elements of your new type.

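A very rough sketch of that encapsulation idea (the type and field names are hypothetical; the actual methods to extend are best learned from the ManualDataDep source):

    using DataDeps

    # Hypothetical: wraps a normal DataDep and carries an extra validation
    # function to run after resolution. The resolution machinery would need
    # methods defined for this type; see how ManualDataDep hooks in for the
    # real extension points.
    struct ValidatedDataDep{T<:DataDeps.AbstractDataDep} <: DataDeps.AbstractDataDep
        inner::T
        validate::Function
    end
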
      If you do this you might like to contribute the type back up to this repository, so others can use it also. Particularly, if it is something that generalises beyond your specific use case.

API Reference

DataDeps.DataDepType
DataDep(
     name::String,
     message::String,
     remote_path::Union{String,Vector{String}...},
     [checksum::Union{String,Vector{String}...},]; # Optional, if not provided will generate
     # keyword args (Optional):
     fetch_method=fetch_default # (remote_filepath, local_directory_path)->local_filepath
     post_fetch_method=identity # (local_filepath)->Any
)

      Required Fields

      • name: the name used to refer to this datadep
    • Corresponds to a folder name where the datadep will be stored.
    • It can have spaces or any other character that is allowed in a Windows filename (which is a strict subset of what is allowed in unix filenames).
  • message: a message displayed to the user when they are asked if they want to download it
        • This is normally used to give a link to the original source of the data, a paper to be cited etc.
      • remote_path: where to fetch the data from
        • This is usually a string, or a vector of strings (or a vector of vectors... see Recursive Structure in the documentation for developers).

      Optional Fields

      • hash: used to check whether the files downloaded correctly
        • By far the most common use is to just provide a SHA256 sum as a hex-string for the files.
        • If not provided, then a warning message with the SHA256 sum is displayed. This is to help package devs work out the sum for their files, without using an external tool. You can also calculate it using Preupload Checking in the documentation for developers.
        • If you want to use a different hashing algorithm, then you can provide a tuple (hashfun, targethex). hashfun should be a function which takes an IOStream, and returns a Vector{UInt8}. Such as any of the functions from SHA.jl, eg sha3_384, sha1_512 or md5 from MD5.jl
        • If you want to use a different hashing algorithm, but don't know the sum, you can provide just the hashfun and a warning message will be displayed, giving the correct tuple of (hashfun, targethex) that should be added to the registration block.
    • If you don't want to provide a checksum because your data can change, pass in the type Any, which will suppress the warning messages. (But see the above warnings about "what if my data is dynamic".)
        • Can take a vector of checksums, being one for each file, or a single checksum in which case the per file hashes are xored to get the target hash. (See Recursive Structure in the documentation for developers).
      • fetch_method=fetch_default: a function to run to download the files
    • Function should take 2 parameters (remote_filepath, local_directorypath), and must return the local filepath to the file downloaded (a sketch follows below this entry).
        • Default (fetch_default) can correctly handle strings containing HTTP[S] URLs, or any remote_path type which overloads Base.basename and Base.download, e.g. AWSS3.S3Path.
        • Can take a vector of methods, being one for each file, or a single method, in which case that method is used to download all of them. (See Recursive Structure in the documentation for developers).
    • Overloading this lets you change things about how the download is done – e.g. the transport protocol.
        • The default is suitable for HTTP[/S], without auth. Modifying it can add authentication or an entirely different protocol (e.g. git, google drive etc).
        • This function is also responsible to work out what the local file should be called (as this is protocol dependent).
      • post_fetch_method: a function to run after the files have been downloaded
        • Should take the local filepath as its first and only argument. Can return anything.
        • Default is to do nothing.
        • Can do what it wants from there, but most likely wants to extract the file into the data directory.
    • Towards this end DataDeps.jl includes a function, unpack, which will extract a compressed folder, deleting the original.
        • It should be noted that post_fetch_method runs from within the data directory.
      • This means operations that just write to the current working directory (like rm or mv or run(`SOMECMD`)) just work.
      • You can call pwd() to get the data directory for your own functions (or dirname(local_filepath)).
        • Can take a vector of methods, being one for each file, or a single method, in which case that same method is applied to all of the files. (See Recursive Structure in the documentation for developers).
        • You can check this as part of Preupload Checking in the documentation for developers.
      source
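
A sketch combining these options (all names, URLs, and hex strings here are hypothetical placeholders):

    using DataDeps, MD5   # md5 comes from MD5.jl

    # A custom fetch_method must take (remote_filepath, local_directory_path)
    # and return the path of the file it downloaded.
    my_fetch(remote, localdir) = download(remote, joinpath(localdir, basename(remote)))

    register(DataDep(
        "MyBinData",                                # hypothetical name
        "Message shown at the download prompt.",
        "https://example.com/data.bin",             # hypothetical URL
        (md5, "00000000000000000000000000000000");  # (hashfun, targethex) form
        fetch_method=my_fetch,
    ))
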
      DataDeps.ManualDataDepType
      ManualDataDep(name, message)

A DataDep for when the installation needs to be handled manually. This can be done via Pkg/git if you put the dependency into the package repo's deps/data directory. More generally, message should give instructions on how to set up the data.

      source
      DataDeps.registerFunction
      register(datadep::AbstractDataDep)

Registers the given datadep to be globally available to the program. This makes datadep"Name" work. register should be run within the __init__ of your module.

      source
      DataDeps.@datadep_strMacro
      `datadep"Name"` or `datadep"Name/file"`

      Use this just like you would a file path, except that you can refer by name to the datadep. The name alone will resolve to the corresponding folder. Even if that means it has to be downloaded first. Adding a path within it functions as expected.

      source
      Base.downloadFunction
      Base.download(
           datadep::DataDep,
           localdir;
           remotepath=datadep.remotepath,
           skip_checksum=false,
     i_accept_the_terms_of_use=nothing)

      A method to download a datadep. Normally, you do not have to download a data dependency manually. If you simply cause the string macro datadep"DepName", to be executed it will be downloaded if not already present.

Invoking this download method manually is normally for purposes of debugging. As such, it includes a number of parameters that most people will not want to use.

      • localdir: this is the local directory to save to.
      • remotepath: the remote path to fetch the data from, use this e.g. if you can't access the normal path where the data should be, but have an alternative.
      • skip_checksum: setting this to true causes the checksum to not be checked. Use this if the data has changed since the checksum was set in the registry, or for some reason you want to download different data.
      • i_accept_the_terms_of_use: use this to bypass the I agree to terms screen. Useful if you are scripting the whole process, or using another system to get confirmation of acceptance.
    • For automation purposes you can set the environment variable DATADEPS_ALWAYS_ACCEPT
    • If this keyword is not set, and DATADEPS_ALWAYS_ACCEPT is not set, then the user will be prompted.
        • Strictly speaking these are not always terms of use, it just refers to the message and permission to download.

If you need more control than this, then your best bet is to construct a new DataDep object based on the original, and then invoke download on that (as sketched below).

      source
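
A hypothetical debugging session using those parameters (the DataDep here is a stand-in, not a real dataset):

    using DataDeps
    dd = DataDep("MyDataset", "example message", "https://example.com/f.csv")
    # Fetch into a directory of our choosing, skipping both checksum and prompt:
    download(dd, "/tmp/datadeps-debug"; skip_checksum=true, i_accept_the_terms_of_use=true)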

      Helpers

      DataDeps.unpackFunction
      unpack(f; keep_originals=false)

Extracts the content of an archive in the current directory, deleting the original archive unless the keep_originals flag is set.

      source
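
For instance, a post_fetch_method might call it and then tidy up (the folder names here are hypothetical):

    # post_fetch_method runs inside the data directory, so relative paths just work.
    function my_post_fetch(local_path)
        unpack(local_path)      # extract the archive, deleting it afterwards
        mv("data-v1", "data")   # hypothetical rename of the extracted folder
    end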

      Internal

      DataDeps.DataDepType
      DataDep(
           name::String,
           message::String,
           remote_path::Union{String,Vector{String}...},
     [checksum::Union{String,Vector{String}...},]; # Optional, if not provided will generate
           # keyword args (Optional):
           fetch_method=fetch_default # (remote_filepath, local_directory_path)->local_filepath
           post_fetch_method=identity # (local_filepath)->Any
)

      Required Fields

      • name: the name used to refer to this datadep
    • Corresponds to a folder name where the datadep will be stored.
    • It can have spaces or any other character that is allowed in a Windows filename (which is a strict subset of what is allowed in unix filenames).
  • message: a message displayed to the user when they are asked if they want to download it
        • This is normally used to give a link to the original source of the data, a paper to be cited etc.
      • remote_path: where to fetch the data from
        • This is usually a string, or a vector of strings (or a vector of vectors... see Recursive Structure in the documentation for developers).

      Optional Fields

      • hash: used to check whether the files downloaded correctly
        • By far the most common use is to just provide a SHA256 sum as a hex-string for the files.
        • If not provided, then a warning message with the SHA256 sum is displayed. This is to help package devs work out the sum for their files, without using an external tool. You can also calculate it using Preupload Checking in the documentation for developers.
        • If you want to use a different hashing algorithm, then you can provide a tuple (hashfun, targethex). hashfun should be a function which takes an IOStream, and returns a Vector{UInt8}. Such as any of the functions from SHA.jl, eg sha3_384, sha1_512 or md5 from MD5.jl
        • If you want to use a different hashing algorithm, but don't know the sum, you can provide just the hashfun and a warning message will be displayed, giving the correct tuple of (hashfun, targethex) that should be added to the registration block.
    • If you don't want to provide a checksum because your data can change, pass in the type Any, which will suppress the warning messages. (But see the above warnings about "what if my data is dynamic".)
        • Can take a vector of checksums, being one for each file, or a single checksum in which case the per file hashes are xored to get the target hash. (See Recursive Structure in the documentation for developers).
      • fetch_method=fetch_default: a function to run to download the files
    • Function should take 2 parameters (remote_filepath, local_directorypath), and must return the local filepath to the file downloaded.
        • Default (fetch_default) can correctly handle strings containing HTTP[S] URLs, or any remote_path type which overloads Base.basename and Base.download, e.g. AWSS3.S3Path.
        • Can take a vector of methods, being one for each file, or a single method, in which case that method is used to download all of them. (See Recursive Structure in the documentation for developers).
    • Overloading this lets you change things about how the download is done – e.g. the transport protocol.
        • The default is suitable for HTTP[/S], without auth. Modifying it can add authentication or an entirely different protocol (e.g. git, google drive etc).
        • This function is also responsible to work out what the local file should be called (as this is protocol dependent).
      • post_fetch_method: a function to run after the files have been downloaded
        • Should take the local filepath as its first and only argument. Can return anything.
        • Default is to do nothing.
        • Can do what it wants from there, but most likely wants to extract the file into the data directory.
    • Towards this end DataDeps.jl includes a function, unpack, which will extract a compressed folder, deleting the original.
        • It should be noted that post_fetch_method runs from within the data directory.
      • This means operations that just write to the current working directory (like rm or mv or run(`SOMECMD`)) just work.
      • You can call pwd() to get the data directory for your own functions (or dirname(local_filepath)).
        • Can take a vector of methods, being one for each file, or a single method, in which case that same method is applied to all of the files. (See Recursive Structure in the documentation for developers).
        • You can check this as part of Preupload Checking in the documentation for developers.
      source
      DataDeps.preupload_checkMethod
preupload_check(datadep, local_filepath[s])::Bool

Performs preupload checks on the local files without having to download them. This is a tool for creating or updating DataDeps, allowing the author to check the files before they are uploaded (or if downloaded directly). This checking includes checking the checksum, and making sure the post_fetch_method runs without errors. It basically performs datadep resolution, but bypasses the step of downloading the files. The results of performing the post_fetch_method are not kept. As normal, if the DataDep being checked does not have a checksum, or if the checksum does not match, then a warning message will be displayed. Similarly, if the post_fetch_method throws an exception, a warning will be displayed.

Returns: true or false, depending on whether the checks were all good or not.

      Arguments:

  • datadep: Either an instance of a DataDep type, or the name of a registered DataDep as an AbstractString
      • local_filepath: a filepath or (recursive) list of filepaths. This is what would be returned by fetch in normal datadep use.
      source
      DataDeps.registerMethod
      register(datadep::AbstractDataDep)

Registers the given datadep to be globally available to the program. This makes datadep"Name" work. register should be run within the __init__ of your module.

      source
      DataDeps.resolveMethod
      resolve("name/path", @__FILE__)

Is the function that lives directly behind the datadep"name/path" macro. If you are working with the names of the datadeps programmatically, and don't want to download them by mistake, it can be easier to work with this function.

      Note though that you must include @__FILE__ as the second argument, as DataDeps.jl uses this to allow reading the package specific deps/data directory. Advanced usage could specify a different file or nothing, but at that point you are on your own.

      source
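
For example (dataset name hypothetical):

    using DataDeps
    # Same result as datadep"MyDataset/data.csv", but callable programmatically:
    path = DataDeps.resolve("MyDataset/data.csv", @__FILE__)
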
      DataDeps.resolveMethod
      resolve(datadep, inner_filepath, calling_filepath)

      Returns a path to the folder containing the datadep. Even if that means downloading the dependency and putting it in there.

 - `inner_filepath` is the path to the file within the data dir
 - `calling_filepath` is a path to the file where this is being invoked from

This is basically the function that lives behind the string macro datadep"DepName/inner_filepath".

      source
      DataDeps.unpackMethod
      unpack(f; keep_originals=false)

Extracts the content of an archive in the current directory, deleting the original archive unless the keep_originals flag is set.

      source
      DataDeps.@datadep_strMacro
      `datadep"Name"` or `datadep"Name/file"`

      Use this just like you would a file path, except that you can refer by name to the datadep. The name alone will resolve to the corresponding folder. Even if that means it has to be downloaded first. Adding a path within it functions as expected.

      source
      DataDeps.DisabledErrorType

DisabledError: for when functionality that is disabled is attempted to be used.

      source
      DataDeps.NoValidPathErrorType

      For when there is no valid location available to save to.

      source
      DataDeps.UserAbortErrorType

For when a user has selected to abort

      source
      Base.downloadMethod
      Base.download(
           datadep::DataDep,
           localdir;
           remotepath=datadep.remotepath,
           skip_checksum=false,
     i_accept_the_terms_of_use=nothing)

      A method to download a datadep. Normally, you do not have to download a data dependency manually. If you simply cause the string macro datadep"DepName", to be executed it will be downloaded if not already present.

Invoking this download method manually is normally for purposes of debugging. As such, it includes a number of parameters that most people will not want to use.

      • localdir: this is the local directory to save to.
      • remotepath: the remote path to fetch the data from, use this e.g. if you can't access the normal path where the data should be, but have an alternative.
      • skip_checksum: setting this to true causes the checksum to not be checked. Use this if the data has changed since the checksum was set in the registry, or for some reason you want to download different data.
      • i_accept_the_terms_of_use: use this to bypass the I agree to terms screen. Useful if you are scripting the whole process, or using another system to get confirmation of acceptance.
    • For automation purposes you can set the environment variable DATADEPS_ALWAYS_ACCEPT
    • If this keyword is not set, and DATADEPS_ALWAYS_ACCEPT is not set, then the user will be prompted.
        • Strictly speaking these are not always terms of use, it just refers to the message and permission to download.

      If you need more control than this, then your best bet is to construct a new DataDep object, based on the original, and then invoke download on that.

      source
      DataDeps._resolveMethod

      The core of the resolve function without any user friendly file stuff, returns the directory

      source
      DataDeps.accept_termsMethod
      accept_terms(datadep, localpath, remotepath, i_accept_the_terms_of_use)

      Ensures the user accepts the terms of use; otherwise errors out.

      source
      DataDeps.better_readlineFunction
      better_readline(stream = stdin)

      A version of readline that does not immediately return an empty string if the stream is closed. It will attempt to reopen the stream and if that fails then throw an error.

      source
      DataDeps.checksumMethod
      checksum(hasher=sha2_256, filename[/s])

Executes the hasher on the file/files, and returns a UInt8 array of the hash, xored if there are multiple files.

      source
      DataDeps.checksum_passMethod
      checksum_pass(hash, fetched_path)

Ensures the checksum passes, and handles the dialog with the user when it fails.

      source
      DataDeps.determine_save_pathFunction
      determine_save_path(name)

      Determines the location to save a datadep with the given name to.

      source
      DataDeps.ensure_download_permittedMethod
      ensure_download_permitted()

      This function will throw an error if download functionality has been disabled. Otherwise will do nothing.

      source
      DataDeps.env_boolFunction
      env_bool(key)

      Checks for an environment variable and fuzzy converts it to a bool

      source
      DataDeps.env_listFunction
      env_list(key)

Checks for an environment variable and converts it to a list of strings, separated with a colon

      source
      DataDeps.fetch_baseMethod

fetch_base(remote_path, local_dir)

      Download from remote_path to local_dir, via Base mechanisms. The download is performed using Base.download and Base.basename(remote_path) is used to determine the filename. This is very limited in the case of HTTP as the filename is not always encoded in the URL. But it does work for simple paths like "http://myserver/files/data.csv". In general for those cases prefer http_download.

The more important feature is that this works for anything that has overloaded Base.basename and Base.download, e.g. AWSS3.S3Path. While this doesn't work for all transport mechanisms (so some datadeps will still need a custom fetch_method), it works for many.

      source
      DataDeps.fetch_defaultMethod
      fetch_default(remote_path, local_path)

      The default fetch method. It tries to be a little bit smart to work with things other than just HTTP. See also fetch_base and fetch_http.

      source
      DataDeps.fetch_httpMethod
      fetch_http(remotepath, localdir; update_period=5)

      Pass in a HTTP[/S] URL and a directory to save it to, and it downloads that file, returning the local path. This is using the HTTP protocol's method of defining filenames in headers, if that information is present. Returns the localpath that it was downloaded to.

update_period controls how often to print the download progress to the log. It is expressed in seconds. It is printed at @info level in the log. By default it is once per second, though this depends on configuration.

      source
      DataDeps.handle_missingMethod
      handle_missing(datadep::DataDep, calling_filepath)::String

      This function is called when the datadep is missing.

      source
      DataDeps.input_boolFunction
      bool_input

Prompts the user for a yes or no.

      source
      DataDeps.input_choiceMethod
      input_choice

Prompts the user for one of a list of options

      source
      DataDeps.input_choiceMethod
      input_choice

      Prompts the user for one of a list of options. Takes a vararg of tuples of Letter, Prompt, Action (0 argument function)

      Example:

      input_choice(
           ('A', "Abort -- errors out", ()->error("aborted")),
           ('X', "eXit -- exits normally", ()->exit()),
           ('C', "Continue -- continues running", ()->nothing)),
       )
      source
      DataDeps.is_valid_nameMethod
      is_valid_name(name)

This checks if a datadep name is valid. This basically means it must be a valid folder name on Windows.

      source
      DataDeps.list_local_pathsMethod
      list_local_paths( name|datadep, [calling_filepath|module|nothing])

Lists all the local paths to a given datadep. This may be an empty list.

      source
      DataDeps.postfetch_checkMethod
      postfetch_check(post_fetch_method, local_path)

Executes the post_fetch_method on the given local path, in a temporary directory. Returns true if there are no exceptions. Performs in (async) parallel if multiple paths are given

      source
      DataDeps.preferred_pathsFunction
      preferred_paths(calling_filepath; use_package_dir=true)

Returns the datadeps load path; plus, if calling_filepath is provided and use_package_dir=true and we are currently inside a package directory, it also includes the path to the datadeps in that folder.

      source
      DataDeps.progress_update_periodMethod
      progress_update_period()

      Returns the period between progress updates being logged. This is used by the default fetch_method, and it is generally a good idea to use it in any custom fetch method as well, if possible.

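      A sketch of a custom fetch method that honours the configured period (my_fetch and its internals are assumptions; only progress_update_period is from the API above):

      using DataDeps
      function my_fetch(remote_path, local_dir)
          period = DataDeps.progress_update_period()  # seconds between progress log lines
          # hand `period` to your downloader's progress callback here
          Base.download(remote_path, joinpath(local_dir, basename(remote_path)))
      end
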
      source
      DataDeps.run_checksumMethod

      Providing only a hash string defaults the hashing method to sha2_256, with that string as the target.

      source
      DataDeps.run_checksumMethod

      If a vector of paths is provided together with a vector of hashing methods (of any form), then they are all required to match.

      source
      DataDeps.run_checksumMethod

      If only a function is provided, then it is assumed the user is a developer who wants to know what hash-line to add to the registration block.

      source
      DataDeps.run_checksumMethod

      If nothing is provided, then it is assumed the user is a developer who wants to know what sha2_256 hash-line to add to the registration block.

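      For example, a hypothetical registration that omits the checksum entirely, so DataDeps reports the sha2_256 hash-line to add (the name, message, and URL are placeholders):

      using DataDeps
      register(DataDep(
          "MyData",
          "A hypothetical example dataset",
          "http://myserver/files/data.csv",
      ))
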
      source
      DataDeps.run_checksumMethod
      run_checksum(checksum, path)

      This runs the checksum on the file(s) at the fetched path and returns true or false based on whether the checksum matches (always true if no target sum is given). It is flexible and accepts several kinds of input to give different kinds of results.

      If path (the second parameter) is a Vector, then unless checksum is also a Vector, the result is the xor of all the file checksums.

      source
      DataDeps.run_checksumMethod

      Use Any to indicate that the hash is not checked. Use this for data that can change.

      source
      DataDeps.run_fetchMethod
      run_fetch(fetch_method, remote_path, local_dir)

      Executes the fetch_method on the given remote_path, downloading into the local directory. Performs in (async) parallel if multiple paths are given.

      source
      DataDeps.run_post_fetchMethod
      run_post_fetch(post_fetch_method, fetched_path)

      Executes the post_fetch_method on the given fetched path. Performs in (async) parallel if multiple paths are given.

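      In practice the post_fetch_method comes from the registration block; a sketch using DataDeps' unpack helper (the name and URL are placeholders):

      using DataDeps
      register(DataDep(
          "MyArchive",
          "A hypothetical archived dataset",
          "http://myserver/files/data.tar.gz",
          post_fetch_method = unpack,  # run_post_fetch calls this on the fetched file
      ))
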
      source
      DataDeps.splitpathMethod
      splitpath(path)

      The opposite of joinpath; splits a path into each of its directory names, with the filename as the last element.

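      For example, one would expect:

      DataDeps.splitpath(joinpath("foo", "bar", "baz.txt"))  # ["foo", "bar", "baz.txt"]
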
      source
      DataDeps.try_determine_load_pathMethod
      try_determine_load_path(name)

      Tries to find a local path to the datadep with the given name. If it fails then it returns nothing.

      source
      DataDeps.try_determine_package_datadeps_dirMethod
      try_determine_package_datadeps_dir(filepath)

      Takes a path to a file. If that path is in a package's folder, then this returns the path to the deps/data dir for that package (as a Nullable), which may or may not exist. If the path is not in a package, returns null.

      source
      DataDeps.uv_accessMethod
      uv_access(path, mode)

      Check access to a path. Returns two results: first an error code (0 means all good), and second an error message. See https://stackoverflow.com/a/47126837/179081

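      A sketch, assuming POSIX-style mode bits (4 = read) and destructuring the two results as a tuple:

      using DataDeps
      code, msg = DataDeps.uv_access(homedir(), 4)
      code == 0 || @warn "No read access" msg
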
      source