This repository has been archived by the owner on Apr 29, 2020. It is now read-only.

"ipfs add" and "ipfs files write" commands return different hashes #45

Closed
lockedshadow opened this issue Dec 13, 2016 · 4 comments

Comments

@lockedshadow

Hello! First of all, I apologize for my poor English; I hope you can understand it.

I'm trying to add some files to mfs that were previously added via ipfs add, but the ipfs files write command produces different hashes than ipfs add.

For example:

$ echo "IPFS Files API is awesome!" > ipfs-files-api-test.txt
$ ipfs add ipfs-files-api-test.txt
added QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup ipfs-files-api-test.txt

Next, let's try to write this file to mfs:

$ ipfs files write --create /ipfs-files-test ipfs-files-api-test.txt
$ ipfs files stat /ipfs-files-test
QmYJnHQ8yMSursnCvJa2nKEaQKXXFbbLm5MLXqbuHKZdfe
Size: 27
CumulativeSize: 137
ChildBlocks: 2
Type: file

As we can see, the hashes are different. It seems that one string is now known as two different objects. If that is true, deduplication is not performed in this case.

But the object returned by ipfs files stat has two child blocks. Maybe one of these blocks is the same object that was produced by the earlier ipfs add command?

$ ipfs object links QmYJnHQ8yMSursnCvJa2nKEaQKXXFbbLm5MLXqbuHKZdfe
QmejyB5JSYNMcJeQbXuPj4W1DM23DxsWrU42JQFqy3Z7Xe 8
QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR 35

No, neither of these is QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup, which was produced by the earlier ipfs add. But one of them definitely should contain the source string:

$ ipfs get QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup -o result-of-add.txt
$ ipfs get QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR -o result-of-files-write.txt
$ diff result-of-add.txt result-of-files-write.txt --report-identical-files
Files result-of-add.txt and result-of-files-write.txt are identical

Indeed, it's the same string. But why are the hashes different? This is not exactly what I would like to get.

But OK, we can directly add some previously added hashes to mfs. For example:

$ ipfs add ipfs-files-api-test.txt
added QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup ipfs-files-api-test.txt
$ ipfs files cp /ipfs/QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup /ipfs-files-test-2
$ ipfs files stat /ipfs-files-test-2
QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup #Finally, the same hash!
Size: 27
CumulativeSize: 35
ChildBlocks: 0
Type: file

(BTW, it's not obvious that we can write any existing hash to mfs using ipfs files cp. I figured it out only after reading this: ipfs/kubo#2610 (comment))
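
The ipfs files cp route above can be wrapped in a small helper. This is only a minimal sketch, not an official workflow: the function name add_to_mfs is made up for illustration, and it assumes an ipfs build that supports `ipfs add -Q` (print only the final hash) and `ipfs files stat --hash` (print only the hash).

```shell
#!/bin/sh
# Hypothetical helper (not part of ipfs): add a local file with `ipfs add`
# and link the resulting hash into mfs, so both report the same root hash.
add_to_mfs() {
    src_file=$1    # local file to add
    mfs_path=$2    # destination path inside mfs, e.g. /ipfs-files-test-2

    # -Q (--quieter) prints only the final hash.
    hash=$(ipfs add -Q "$src_file") || return 1

    # Copy by hash: this links the existing object instead of rewriting it.
    ipfs files cp "/ipfs/$hash" "$mfs_path" || return 1

    # Assumption: `ipfs files stat --hash` is available in this ipfs version.
    mfs_hash=$(ipfs files stat --hash "$mfs_path") || return 1

    if [ "$hash" = "$mfs_hash" ]; then
        echo "same hash: $hash"
    else
        echo "hash mismatch: add=$hash mfs=$mfs_hash" >&2
        return 1
    fi
}
```

It could be called as `add_to_mfs ipfs-files-api-test.txt /ipfs-files-test-2`; like plain ipfs files cp, it fails if the destination already exists.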

But what if I now want to overwrite a file that already exists in mfs?

$ echo "IPFS Files API is really awesome!" | ipfs add
added QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa
$ ipfs files cp /ipfs/QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa /ipfs-files-test-2
Error: directory already has entry by that name

Actually, I cannot do that. If I want to overwrite a file, I have to execute ipfs files rm first; I cannot overwrite it directly, as ipfs files write does. But I don't want to use ipfs files write, because for now it produces different hashes than ipfs add and doesn't allow deduplication.
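
The rm-then-cp workaround described above could look roughly like this. It is a sketch under assumptions, not a built-in command: mfs_replace is a hypothetical name, and it relies on ipfs files stat failing when the path does not exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of ipfs): replace an mfs entry with an
# object that already exists under /ipfs/<hash>.
# NOTE: rm followed by cp is not atomic; the entry is briefly missing.
mfs_replace() {
    hash=$1        # hash printed by `ipfs add`
    mfs_path=$2    # mfs path to overwrite

    # `ipfs files stat` fails for a missing path, so it doubles as an
    # existence check before the old entry is removed.
    if ipfs files stat "$mfs_path" >/dev/null 2>&1; then
        ipfs files rm "$mfs_path" || return 1
    fi

    ipfs files cp "/ipfs/$hash" "$mfs_path"
}
```

For the example above this would be `mfs_replace QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa /ipfs-files-test-2`. It only handles single files; removing a directory with ipfs files rm additionally needs -r.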

Tl;dr:

  1. ipfs add and ipfs files write probably should produce the same hashes, but they don't.

  2. The documentation should explain more clearly that ipfs files cp can copy existing hashes into mfs, not only files already written to mfs.

  3. ipfs files cp probably should have an option to overwrite existing files, but it doesn't.

@rddaz2013

rddaz2013 commented Dec 27, 2016

Actually, I cannot do that. If I want to overwrite a file, I have to execute ipfs files rm first; I cannot overwrite it directly, as ipfs files write does. But I don't want to use ipfs files write, because for now it produces different hashes than ipfs add and doesn't allow deduplication.

Hmm... perhaps you get two hashes because of the underlying deduplication of files with the same content but a different filename? Could it be that the second hash is only a link?

The underlying concept of ipfs makes it difficult to carry over earlier concepts for storing data.

@Kubuxu

Kubuxu commented Dec 27, 2016

re. 1

The hashes are different because ipfs files write uses a different linking structure than ipfs add. The linking structure of ipfs files write is optimized for random seeking and writing after initial creation, while the ipfs add structure is optimized to reduce the link count.

The underlying data is still deduplicated, as they use the same chunking, AFAIK.
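
To make the linking-structure point concrete, here is a toy model. It uses plain SHA-256 over text via sha256sum (assumed to be installed), not the real ipfs block format, and the "link:" wrapper is invented for the illustration. It shows only the general idea that a root hash depends on how the links are arranged; it says nothing about how either command actually encodes its blocks.

```shell
#!/bin/sh
# Toy model only: plain SHA-256 over text, NOT the real ipfs block format.
# It shows why two layouts over the same chunk get different root hashes.

hash_of() {
    # Print the hex SHA-256 of stdin (first field of sha256sum's output).
    sha256sum | cut -d ' ' -f 1
}

chunk='IPFS Files API is awesome!'

# Layout A: the root *is* the single data chunk.
leaf_a=$(printf '%s\n' "$chunk" | hash_of)
root_a=$leaf_a

# Layout B: the root is a wrapper node that links to the data chunk.
leaf_b=$(printf '%s\n' "$chunk" | hash_of)
root_b=$(printf 'link:%s\n' "$leaf_b" | hash_of)

[ "$leaf_a" = "$leaf_b" ] && echo "leaf hashes match"
[ "$root_a" != "$root_b" ] && echo "root hashes differ"
```

In this model the data chunk is hashed identically in both layouts, while the root changes as soon as a wrapper node sits above it.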

re. 2

I will try to improve that.

re. 3

ipfs/kubo#2074

@rddaz2013

re. 3

ipfs/kubo#2074

That would be a nice step...

