FUSE mount is very slow #2166

Open
Kubuxu opened this issue Jan 6, 2016 · 9 comments
Labels
kind/bug A bug in existing code (including security flaws) topic/fuse Topic fuse topic/perf Performance

Comments

@Kubuxu
Member

Kubuxu commented Jan 6, 2016

and uses a lot of CPU resources. This 600KiB/s was lucky, most of the time it was around 200 to 300KiB/s.

In comparison ipfs cat reached 250MiB/s.

@whyrusleeping
Member

What version of ipfs?

On Wed, Jan 6, 2016, 11:15 Jakub Sztandera [email protected] wrote:

and uses a lot of CPU resources. This 600KiB/s was lucky, most of the time
it was around 200 to 300KiB/s.

https://camo.githubusercontent.com/d43002376ae600886ba0d4a68a478ad77a7440e0/68747470733a2f2f697066732e706963732f697066732f516d655969446e4645554a4c72396479585739453443374a65446d45526544516f463253346369534e6d77594642

In comparison ipfs cat reached 250MiB/s.



@Kubuxu
Member Author

Kubuxu commented Jan 6, 2016

0.3.11-dev. I will check whether anything has changed in 0.4.

@Kubuxu
Member Author

Kubuxu commented Jan 6, 2016

It is faster, but still about 60 times slower than ipfs cat for a file containing only zeros:

# cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[4.29MiB/s]
^C
# ~/go/bin/ipfs cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[ 247MiB/s]
1073741824
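
For anyone who wants to repeat this comparison without `pv`, here is a minimal sketch of the same measurement in Python. The function name `measure_throughput` and the 1 MiB chunk size are assumptions made for illustration, not part of ipfs or of the commands above. Taken at face value, the two figures above give 247 / 4.29 ≈ 57.6, which is where the "about 60 times" comes from.

```python
import time


def measure_throughput(stream, chunk_size=1024 * 1024):
    """Read a binary stream to EOF and return (total_bytes, MiB per second).

    Hypothetical helper: it only times sequential reads, roughly like
    `cat file | pv -a | wc -c` does.
    """
    total = 0
    started = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        total += len(chunk)
    elapsed = time.perf_counter() - started
    # Guard against a zero interval on very small inputs.
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

To time the FUSE mount, open the mounted path in binary mode and pass the file object in, for example `measure_throughput(open("/ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm", "rb"))`. To time `ipfs cat`, the `stdout` pipe of a `subprocess.Popen` running that command can be passed in the same way.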

@randoms

randoms commented Jan 7, 2016

I encountered this issue today. It seems the file contents are recalculated on every read call. Cat is faster because it does not need to recalculate the file content. Adding a read cache increases the read speed. I achieved 50 MB/s by adding a 40 MB cache.

@Kubuxu
Member Author

Kubuxu commented Jan 7, 2016

What do you mean by read cache?

@randoms

randoms commented Jan 8, 2016

Rewrite the read function in FUSE. When the read function is called, check the file cache first. If the requested content is in the cache, return it from there. If it is not, start a new thread to read the content from ipfs. This is the key point: read more data than the read call asked for, because reading data is fast but finding the data to read is slow. The thread writes the data to the file cache, and the main thread keeps checking the cache until the requested data appears. Here is part of my code, written in Python. I use the ipfs web API to read data from ipfs.

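import threading
import time

import requests

# set up in the filesystem class's __init__ (rest of the class not shown)
self.fileCache = {}
self.fileCacheLock = threading.Lock()

# the fuse read function
def read(self, path, length, offset, fh):
    # fileHash is the ipfs hash for `path`; how it is resolved is not part of this excerpt
    end = offset + length
    data = self.get_cache(fileHash, offset, end)
    return data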

def get_cache(self, hash, start, end):
    # look up (or create) the cache entry for this file
    with self.fileCacheLock:
        cache = self.fileCache.get(hash)
        if cache is None:
            cache = {
                "start": start,
                "end": start,
                "data": b"",
                "lock": threading.Lock(),
                "download": None,
                "hash": hash,
            }
            self.fileCache[hash] = cache
    with cache["lock"]:
        if start >= cache["start"] and end <= cache["end"]:
            # cache hit: return the requested slice
            return cache["data"][(start - cache["start"]):(end - cache["start"])]
        # take a local reference so the thread cannot become None between stop() and join()
        download = cache["download"]
    # cache miss: stop the running download thread, if any
    if download is not None:
        download.stop()
        download.join()
    # start new download thread at the requested offset
    downloadThread(cache, start).start()
    # wait for data (polls every millisecond until the range is cached)
    while True:
        time.sleep(0.001)
        with cache["lock"]:
            if start >= cache["start"] and end <= cache["end"]:
                return cache["data"][(start - cache["start"]):(end - cache["start"])]

class downloadThread(threading.Thread):

    def __init__(self, cache, start):
        super(downloadThread, self).__init__()
        # named _stop_event to avoid shadowing threading.Thread's internal _stop on Python 3
        self._stop_event = threading.Event()
        self.cache = cache
        self.startIndex = start

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

    def run(self):
        # add thread record
        with self.cache["lock"]:
            print("download thread start " + self.cache["hash"] + " " + str(self.startIndex))
            if self.cache["download"] is not None:
                print("Error download thread error")
            self.cache["download"] = self

        chunkIndex = self.startIndex
        # ask the local gateway for everything from startIndex onwards
        r = requests.get("http://127.0.0.1:8080/ipfs/" + self.cache["hash"],
                         headers={"range": "bytes=" + str(self.startIndex) + "-"},
                         stream=True, timeout=200)
        for chunk in r.iter_content(chunk_size=1024*1024*2): # this value affects performance greatly
            if chunk: # filter out keep-alive chunks
                with self.cache["lock"]:
                    if chunkIndex == self.startIndex:
                        # first chunk replaces whatever range was cached before
                        self.cache["data"] = chunk
                        self.cache["start"] = self.startIndex
                    else:
                        self.cache["data"] += chunk
                    chunkIndex += len(chunk)
                    self.cache["end"] = chunkIndex
            if self.stopped():
                break
            if chunkIndex - self.startIndex > 40*1024*1024:
                # max cache size 40M
                print("max cache size")
                break
        r.close()
        with self.cache["lock"]:
            if self.cache["download"] == self:
                # download completed, remove thread record
                self.cache["download"] = None
            print("download thread end " + self.cache["hash"])
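
Stripped of threading and HTTP, the read-ahead idea in the code above can be sketched as follows. This is an illustration under stated assumptions, not the code randoms actually runs: `ReadAheadCache`, its `fetch` callback and the `window` parameter are hypothetical names, and the fetch is synchronous, whereas the original downloads in a background thread. In a real mount, `fetch` would issue a range request to the gateway as the code above does.

```python
class ReadAheadCache:
    """Single-window read-ahead cache for one file (hypothetical helper)."""

    def __init__(self, fetch, window=2 * 1024 * 1024):
        # fetch(offset, length) -> bytes; may return fewer bytes at end of file.
        self.fetch = fetch
        self.window = window
        self.start = 0
        self.data = b""

    def read(self, offset, length):
        end = offset + length
        cached_end = self.start + len(self.data)
        if not (self.start <= offset and end <= cached_end):
            # Miss: fetch a whole window starting at the requested offset,
            # so the sequential reads that follow are served from memory.
            self.data = self.fetch(offset, max(length, self.window))
            self.start = offset
        lo = offset - self.start
        return self.data[lo:lo + length]


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # 1 MiB of sample data
    calls = []

    def fake_fetch(offset, length):
        # Stand-in for the slow backend; records how often it is hit.
        calls.append((offset, length))
        return blob[offset:offset + length]

    cache = ReadAheadCache(fake_fetch, window=256 * 1024)
    out = b"".join(cache.read(off, 4096) for off in range(0, len(blob), 4096))
    assert out == blob
    print(len(calls))  # prints 4: 256 small reads, 4 backend fetches
```

The trade-off is the same as in the threaded version: a larger window means fewer slow lookups for sequential reads, at the cost of fetching data that a random-access reader may never use.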

@SupraSummus

Hi,

I wrote a simple mounting utility that works the way @randoms described (or at least similarly). It's written in Python and uses fusepy for mounting.

Repo is at https://github.com/SupraSummus/ipfs-api-mount

There are many things to improve in this utility. I plan to work on it. (I need "fast" IPFS mountpoints for my other project.)

@Stebalien
Member

Nice! Actually, I wonder if it's useful to consider moving away from a built-in fuse interface? We'd probably want a faster API (unix domain sockets and a real RPC protocol) first but, from a security standpoint, it would be really nice (much easier to sandbox IPFS). Also, reducing the number of features built into IPFS directly would be kind of nice...

Thoughts @whyrusleeping?

@piedar

piedar commented May 13, 2018

I published another utility ipfs-mount in an attempt to bring these features together in nodejs land. It supports /ipfs and /mfs (todo: /ipns) and has respectable performance with no added caching layer. The http gateway is still the fastest option...

8 participants