Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modify git-bug to use go-billy filesystem and not touch OS filesystem directly #2

Open
happybeing opened this issue Nov 8, 2020 · 25 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@happybeing
Copy link
Owner

git-bug currently uses operating system filesystem calls directly (e.g. https://golang.org/pkg/os/ in git-bug/cache) which prevents it being compiled for Web Assembly.

This task is modify git-bug to use the go-billy filesystem interface, also used by go-git which git-bug relies on for git functionality. Then to add support for wasm builds of git-bug, by building with this fork of go-billy (branch support-build-wasm) which has been modified to build successfully with GOARCH=wasm.

@MichaelMure
Copy link

In short, your goal is going to be to remove entirely (of maybe keep it just for testing purpose?) the GetPath() function in the repositories interfaces.

I see two different kind of usages: the Bleve full text search index, and writing/reading cache files.

Disclaimer: the following is very raw thoughts, the reality is likely going to be different once confronted with the code.

Bleve index:
In https://github.com/MichaelMure/git-bug/blob/master/cache/repo_cache_bug.go#L109-L127, GetPath is used to instruct Bleve where to write it's index. The goal here is going to add a RepoBleve interface (similar to RepoKeyring) to give access to a Bleve index. Once you have that, you can change the actual implementation: the GoGit repo will use an actual on disk Index as currently, while https://github.com/MichaelMure/git-bug/blob/master/repository/mock_repo.go will use the in-memory version of that index (provided by Bleve). I'm not exactly sure what kind of repo you need for your project, but having this interface will allow you to swap actual implementation and tune it the way you want.

cache files:
A bunch of files are read/written on disk by the cache, for example https://github.com/MichaelMure/git-bug/blob/master/cache/repo_cache_bug.go#L68-L71. The goal here is going to change the way the disk is accessed, again by having an additional interface in Repo. Instead of using GetPath() to get the path on disk and use the normal file access from the standard library, the goal is going to be to provide a go-billy filesystem for some places (at least .git/git-bug) and read/write files with that. As before, once you have that you can swap out the actual implementation to suit your need and work only in memory.

@happybeing
Copy link
Owner Author

Thanks Michael that's very helpful and sounds exactly what I need. For the p2p features I'm going to use a filesystem like interface (so go-git and git-bug would behave as for the CLI reading and writing ./.git but via a go-billy fs).

I'll also need to try making a git-bug library, so might try that first.

@happybeing
Copy link
Owner Author

happybeing commented Nov 29, 2020

@MichaelMure I've been familiarising myself with the parts of the code at commit (2 Nov, #25b0c71), so just repo cache and not bleve, which accesses the disk. The following is a summary of what is making use of repo.GetPath() separated into needed now and optional/could be done later:

Places which must be modified for use with either os or go-billy filesystem:

cache/repo_cache_bug.go:
	loadBugCache() 	# uses bugCacheFilePath()	
	writeBugCache()	# uses bugCacheFilePath()
cache/repo_cache_common.go:
	RepoCache.GetPath() # Just reflects repo.GetPath()
cache/repo_cache_identity.go:
	loadIdentityCache()	# uses identityCacheFilePath()
	writeIdentityCache()	# uses identityCacheFilePath()
cache/repo_cache.go:
	repoIsAvailable()	# uses repoLockFilePath()

Optional / later - places which can continue calling Repository.GetPath() because they can assume os filesystem for now at least:

bug/bug_test.go:
	TestBugRemove()	# Only used with 'file:///' path (for remotes)
cache/repo_cache_test.go:
	TestRemove()		# Only used with 'file:///' path (for remotes)
input/input.go:
	# Invoking an external editor can assume os filesystem is ok
	launchEditorWithTemplate()
	launchEditor()
repository/gogit_test.go:
	TestNewGoGitRepo()	# Just compares path strings
repository/repo_gesting.go:
	CleanupTestRepos()	# calls os.RemoveAll()
repository/git_testing.go:
	SetupReposAndRemote()
# Not sure - can we assume selection will only apply for CLI operations (where os storage can be assumed)?
commands/select.go:
	selectFilePath()  # Saves bug id that is selected (active) for operations

And here are the things that will or may need to be implemented by a new interface, when using go-billy.Filesystem:
os methods used:

Create()
Open()
OpenFile() used by select.go: Select()
Write()
WriteAll()

# Just for file locking (see cache/repo_cache.go):
Getpid()
IsNotExists()
MkdirAll
Remove()

File methods used:

Write()
WriteString()	# file locks only (write PID)
Close()

see also:

gob.NewDecoder(File)   # in loadBugCache(), loadIdentityCache() for serialisation
io.LimitReader(File, int)  # in cache/repo_cache.go: repoIsAvailable() which handles locks

As I understand your suggestion, this can be done by adding type RepoStorage interface { Storage() GitBugStorage } to git-bug Repo, where GitBugStorage is an interface which specifies the needed os functions.

Then GoGitRepo will provide an implementation GitBugStorage which uses a gobilly.Filesystem, which can be either go-billy osfs or memfs, where operating system storage is used for the git-bug CLI, or in memory (or p2p) storage will be used for an in-browser application such as the p2p git portal.

Is this what you had in mind, sound sensible?

If the above is good, what's the recommended way to implement this in ./repository, should I create a file gogit_storage.go containing a struct GogitStorage holding the go-billy.Filesystem object with an implementation of the GitBugStorage interface?

@MichaelMure
Copy link

Pretty much. I looked in more details how GetPath is used:

what is the local git remote URL ?

repoLockFilePath --> lock file, in .git/
bugCacheFilePath --> bug index file, in .git/
searchCacheDirPath --> bleve path, in .git/
identityCacheFilePath --> identity index file, in .git/
selectFilePath --> select file, in .git/

launchEditor --> any temp file
launchEditorWithTemplate --> any temp file

os.RemoveAll --> delete repo

So it seems that everything that is needed is an access to $RepoPath/.git/git-bug with a billy.Filesystem and some functions only used for testing.

I believe the changes needed to the Repo interfaces is as follow:

diff --git a/repository/repo.go b/repository/repo.go
index 4b45a1c5..d34995f9 100644
--- a/repository/repo.go
+++ b/repository/repo.go
@@ -4,6 +4,8 @@ package repository
 import (
        "errors"
 
+       "github.com/go-git/go-billy/v5"
+
        "github.com/MichaelMure/git-bug/util/lamport"
 )
 
@@ -20,6 +22,7 @@ type Repo interface {
        RepoKeyring
        RepoCommon
        RepoData
+       RepoStorage
 }
 
 // ClockedRepo is a Repo that also has Lamport clocks
@@ -48,9 +51,6 @@ type RepoKeyring interface {
 
 // RepoCommon represent the common function the we want all the repo to implement
 type RepoCommon interface {
-       // GetPath returns the path to the repo.
-       GetPath() string
-
        // GetUserName returns the name the the user has used to configure git
        GetUserName() (string, error)
 
@@ -64,6 +64,11 @@ type RepoCommon interface {
        GetRemotes() (map[string]string, error)
 }
 
+type RepoStorage interface {
+       // LocalStorage return a billy.Filesystem giving access to $RepoPath/.git/git-bug
+       LocalStorage() billy.Filesystem
+}
+
 // RepoData give access to the git data storage
 type RepoData interface {
        // FetchRefs fetch git refs from a remote
@@ -145,4 +150,10 @@ type TestedRepo interface {
 type repoTest interface {
        // AddRemote add a new remote to the repository
        AddRemote(name string, url string) error
+
+       // GetLocalRemote return the URL to use to add this repo as a local remote
+       GetLocalRemote() string
+
+       // EraseFromDisk delete this repository entirely from the disk
+       EraseFromDisk() error
 }

@MichaelMure
Copy link

Naming the function LocalStorage() leave some door open to add a GlobalStorage() or else later if a common storage become needed. defaultKeyring() use os.UserConfigDir() which would bypass but that doesn't seem to be a problem for your project (yet?).

@happybeing
Copy link
Owner Author

happybeing commented Nov 29, 2020

Thanks Michael, just skimming as I'm done for today. I'd shy away from using LocalStorage() being too similar to the localStorage browser API, see https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage

Will look more tomorrow.

@MichaelMure
Copy link

The browser LocalStorage could be a legit implementation of this interface, depending on your usage ;)

It's the same name but a very different context so I think it's fine.

@happybeing
Copy link
Owner Author

window.localStorage is a key/value rather than filesystem API so non trivial to implement a fs. We have go-billy and other options in both contexts anyway. BTW I've done something equivalent and it's unpleasant to say the least! I don't see the point in using a name that can confuse unless we have no other decent options.

@MichaelMure
Copy link

type RepoStorage interface {
       // LocalStorage return a billy.Filesystem giving access to $RepoPath/.git/git-bug
       LocalStorage() billy.Filesystem

       // ....
       GlobalStorage() billy.Filesystem

       // ...
       TempStorage() billy.Filesystem
}

Don't you think that's reasonable?

@MichaelMure
Copy link

The same concept is already used for LocalConfig() and GlobalConfig().

@happybeing
Copy link
Owner Author

What are the three categories used for?

@MichaelMure
Copy link

Only LocalStorage() would be there for now, but the logical continuation (if we need it) would be to add the next two.

GlobalStorage() would provide an access to a storage space shared by add the repo (say, the folder given by os.UserConfigDir())
TempStorage() would provide an access to a temporary storage space (say, /tmp/git-bug).

@MichaelMure
Copy link

My point is, LocalStorage() does remind of window.localstorage and could be misleading but I'd say that the interface above make sense in the context of git-bug, which is really what matter.

@happybeing
Copy link
Owner Author

happybeing commented Nov 29, 2020

The choice is yours obviously, but I may make suggestions when I understand better 😄.

So your distinction is that LS is a store per repo, whereas GS is more like a PC filesystem holding several repos?

@MichaelMure
Copy link

GS is more like a PC filesystem holding several repos

It doesn't matter if this filesystem also hold the actual repos. What matter is that this function return the same shared filesystem when called from any repo.

@happybeing
Copy link
Owner Author

happybeing commented Nov 29, 2020

Ok. I'm not sure, but for my use case GS may be what's needed but I'm still familiarising myself with go-git, which to confuse matters can have both Storage (for git objects) and go-billy.Filesystem (for a working tree) as I expect you know. Storage's everywhere!

I'll probably go to code next and figure out these details that way. You've been very helpful, thanks. So far it looks straightforward except perhaps for the file locking, although I think that should be ok too.

EDIT: no worries for now, but I'm not getting the need for GS, LS and TS. If you need the repos to be on the same storage, you can just give them all the same storage object, if separate you give each one a separate storage object. I don't see why this would be the choice of the code accessing the repo.

@MichaelMure
Copy link

no worries for now, but I'm not getting the need for GS, LS and TS.

There is no need for GS and TS ... for now. But I could see usage for them in the future.

An example would be the Keyring. The Keyring is a shared (=global in the git terminology) storage space for credentials. When a bridge is configured in a repo, the credentials become available in every other repo for when you configure your next bridge.

At the moment the Keyring is already swapped out by a memory-only implementation in the mock repo so it's not a problem but if we were to re-implement that, we could use this GS and read/write files there.

TS is just another example, I don't have usage in mind just yet, but a temporary file is useful sometimes.

@happybeing
Copy link
Owner Author

I think I'll start with Storage() as I'm not sure yet if it will be local or global and we/you can rename it appropriately later, or when we need you have multiple categories.

@happybeing
Copy link
Owner Author

@MichaelMure I see there's a 'select' mechanism which writes to disk, presumably so the CLI can have a current issue which is being addressed with multiple commands, so is git-bug writing cache and bleve index to disk for similar CLI convenience, or are there good reasons for wanting them to persist for longer periods (from one day to the next, ideally forever etc)?

This is not an immediate issue, but something for me to think about.

@MichaelMure
Copy link

Persist over long period not necessarily but across multiple CLI call, yes.

All those files (cache index, bleve, select) could be deleted without loosing too much. The downside is that everything will be re-indexed with the next call, which can be expensive.

@happybeing
Copy link
Owner Author

happybeing commented Dec 1, 2020

@MichaelMure I'm a month behind master, so maybe some things are in flux but I'm wondering about when git-bug uses git to do operations, and when it uses go-git. Is the code which spawns git commands obsolete or being refactored, or is it still in use and likely to remain so?

A couple of other points:

  • I can modify NewPersistedClock() to use the billy.Filesystem and so persist on the device. Just noting this.
  • I see places such as buildCache() where you use os.Stderr with fmt.Fprintln() and fmt.Fprintf() which I'll need to modify. I can change them to println() for now which works with wasm, or can provide a new FprintError() which will compile to use fmt.Fprintln(), or when built for wasm to println(). Let me know if you prefer something else.

@MichaelMure
Copy link

The old git CLI implementation is still around but it's not used anymore in the master branch. I think I'll keep it around until I'm sure things are really stable.

Which implementation is used is defined in those functions: https://github.com/MichaelMure/git-bug/blob/master/commands/env.go#L49-L137
Each command use one of those functions as a pre-run: https://github.com/MichaelMure/git-bug/blob/master/commands/label_add.go#L15

@MichaelMure
Copy link

So in your case we'll have to figure out a way to inject the proper Repo implementation. There is plenty of way to do that, we'll just have to find the less ugly one.

@happybeing
Copy link
Owner Author

happybeing commented Dec 1, 2020

Thanks, that's what I was hoping 😄. I've been looking at the code you reference just now.

To call NewGoGitRepo() I see env.go has functions like loadBackendEnsureUser() which aren't exported (no leading cap) but are obviously called by the commands handler presumably via a struct. The lack of leading capital made it harder to understand the code as I'm using that to signify local functions but I see why that is now, just noting it.

If we decide I should import them elsewhere I'd need to capitalise them, although my first thought is that I wouldn't use your loadRepo() etc, but do this myself. Not sure yet. Again just noting in case you want to comment.

I'm not sure there's an issue regarding 'injecting' the repos as you're already using gogit.Repository. I am not planning to change that, but to create the gogit.Repository to use a billy.Filesystem, and so would use the existing loadRepo() function for existing code using g-billy osfs, and provide a new one which supplies its own billy.Filesystem for the p2p git portal. I can then modify any code which is currently accessing the os filesystem calls, and change it to use the repo's billy.Filesystem to access .git/git-bug. Or am I missing something? Quite possibly! 😄

@happybeing
Copy link
Owner Author

Status update

The branch gobilly-test-initfs at commit 25b0c71 is working with the git portal proof-of-concept, branch main at happybeing/p2p-git-portal-poc@9c6f2ef).

The above allows work to continue in parallel on both the p2p git portal and to integrate these changes with the latest git-bug (including bleve indexing), and provide an improved API to allow use of git-bug with an external billy.Filesystem.

Remaining git-bug tasks

@happybeing:

  • fork git-bug main and make a new branch to provide an API for use with an external billy.Filesystem
  • merge commits which add ability to use an externally provided filesystem (i.e. add InitFs..(), OpenFs..()andGetGitRepo()` from 25b0c71)
  • test with p2p-git-portal-poc happybeing/p2p-git-portal-poc@9c6f2ef

@happybeing + @MichaelMure:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants