This repository has been archived by the owner on Mar 14, 2019. It is now read-only.

[1.0.0] Rework attachData #239

Closed
4 tasks
raix opened this issue Mar 31, 2014 · 12 comments

Comments

@raix

raix commented Mar 31, 2014

@aldeed The data package depends on fs on the server-side, but I don't think we have to. Some ideas:

  • attachData should simply set the data on the FS.File object
  • When inserting into a collection, the data should be handled by dataHandlers that stream directly into FS.TempStore.createWriteStream(fileObj)
  • the dataHandler should be able to help fill out name, size and type
  • If the file is already mounted, attachData should trigger the streaming to TempStore, and at some point on the client it should trigger the upload (maybe Meteor.createWriteStream('fs.upload', fsObj))

dataHandlers simply have createReadStream, size, type and name properties, e.g.:
ReadStream
var data = new FS.Data.ReadStream(readStream, { length: ..., type: ..., name: ... });

  • data.createReadStream();
  • data.size()
  • data.type()
  • data.name()

These simply utilize the FS.Data.ReadStream API:
RemoteUrl
new FS.Data.RemoteUrl(url, { name: ... }) // name is optional

Buffer
new FS.Data.Buffer(buffer, { name: ..., type: ... })

FileSystem
new FS.Data.FileSystem(path, { name: ... }) // name is optional

At some point, when we have client-side streams, we can do basically the same on the client too.

@aldeed
Contributor

aldeed commented Mar 31, 2014

The data package depends on fs on the server-side

How does it depend on fs?

the attachData should simply set data in the FS.File object

That is basically exactly what it does if I remember correctly.

What problem does your API change solve? It seems more confusing to me, but maybe I just haven't understood the brilliance yet.

@raix
Author

raix commented Mar 31, 2014

Ok, I take it as a compliment that you assume there's some brilliance hiding in there - I can't promise you that :) I'll just add some notes:

How does it depend on fs?

fsData on the server - but that's kinda ok, since that's what loads data from the filesystem. Me not thinking..

I would like it to use streaming instead of buffers (it seems to already - I've just skimmed the code roughly)

What problem does your API change solve? It seems more confusing to me, but maybe I just haven't understood the brilliance yet.

Yes, well, I think the point is basically to use streaming whenever possible instead of buffers.

The Http.request response is a read stream, the filesystem has createReadStream, and we can also pass in a read stream directly - the only one missing is when we pass in a buffer, but it's easy to create a read stream for that. The read stream can be passed directly into TempStore.createWriteStream(fileObj);

That's the primary reason to factor out into smaller functions with basically the same API, and use streaming. The API can be used to store data on insert or attachData (with a mounted file).

Then it's more about taking the concept of dataSources with their separate API, e.g. FS.Data.prototype.size is complex, which is why I'm thinking it should perhaps be refactored into methods specific to the dataSources - this way we can isolate the code better, create tests per dataSource, etc.

So attachData should just detect which dataSource type it's going to handle, then set fileObj.dataSource = new FS.Data.RemoteUrl(url); if it has detected a URL.

The insert and attachData itself could use the dataSource API to get size / type etc. for updating the fileRecord, then finally fileObj.dataSource.createReadStream().pipe(FS.TempStore.createWriteStream(fileObj)); to store the data?
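The detection step described here could be sketched standalone like this. The dispatch rules and the function name are assumptions for illustration; the handler names mirror the proposal above:

```javascript
// Hypothetical detection of which dataSource type attachData should use.
function detectDataSourceType(data) {
  if (typeof data === 'string' && /^https?:\/\//.test(data)) return 'RemoteUrl';
  if (typeof data === 'string') return 'FileSystem'; // treat other strings as paths
  if (Buffer.isBuffer(data)) return 'Buffer';
  if (data && typeof data.pipe === 'function') return 'ReadStream';
  return null;
}

// attachData would then do something like:
//   fileObj.dataSource = new FS.Data[detectDataSourceType(data)](data);
//   fileObj.dataSource.createReadStream()
//     .pipe(FS.TempStore.createWriteStream(fileObj));
```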

I have only skimmed the code - it's just some ideas I got after you made the attachData + FS.Data code.

@aldeed
Contributor

aldeed commented Mar 31, 2014

I might need to re-read a few times, but I think what you're saying is exactly how it already does work. When I wrote it, my intention was to avoid buffers internally, but allow you to get one if you need it. Example:

I attach a filepath on the server. When I do this, FS.Data:

  • Sets the filepath in the filepath property.
  • Sets the type in the type property, figuring it out automatically from the extension if you don't supply it.

So no buffers involved. Then at some later point we call myData.createReadStream(), which actually just passes through to fs.createReadStream(self.filepath). In other words, we are streaming right from the filesystem.

On the other hand, if I later want a Buffer, I call myData.getBuffer() and only then do we read the file into a Buffer because the user requested it. Similarly, if I call myData.getDataUri(), then we read the file into a Buffer, convert it to base64, and return a data URI.

But in a pure streaming example, like all of the CFS internal code, there are no Buffers involved.

It's the same when you pass in a URL. We simply store the URL, type, size, and then later stream directly from the remote URL when you call myData.createReadStream(). No Buffers involved unless you request one.

FS.Data could be moved to a cfs-data package. It doesn't have any dependencies on FS.File as far as I remember, so you could potentially use it to attach arbitrary data to anything and then get that data back in a variety of formats.

@aldeed
Contributor

aldeed commented Mar 31, 2014

Oh, and regarding separating out the handling of each type: I remember considering that, but I thought there would be too much duplicate code. Although maybe you're right that it would make testing easier.

@aldeed
Contributor

aldeed commented Mar 31, 2014

Really my only problem with your initial proposal at the top of this issue is this: it would put some of the data handling back in FS.File code. I'd prefer for FS.File to be unaware of the attached data as much as possible. That is what's nice about having a separate FS.Data package. It can provide a simple API for "data in" and "data out" in a variety of formats. And FS.Data, too, would be unaware of what an FS.File is.

But I do like the idea of separating the type handling the more I think about it. So we should do that, but only internally within the FS.Data code. The external API for FS.Data can remain the same. In other words, we'll still call new FS.Data(anything) and the FS.Data constructor will actually return a reference to new FS.Data.ReadStream() if anything is detected to be a stream.

@aldeed
Contributor

aldeed commented Apr 1, 2014

OK, @raix, I factored out FS.Data into a separate package:

https://github.com/CollectionFS/Meteor-data-man

I took it off the FS namespace so that there is no CFS dependency at all. Could be used to do data manipulation by itself.

I also refactored the server code as you suggested, separating into DataMan.URL, etc., but that is all essentially an internal API. Everything is still handled through a single new DataMan(anything), which passes through to the specific methods.

Tests are all passing still. Let me know if you think this is a good way to do it. I can remove the FS.Data code from cfs-file and have it depend on the data-man package instead. (Not sure about the DataMan name -- best I could come up with at the moment. It's not published to atmosphere yet so we could still change it.)
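The pass-through constructor pattern described here relies on the fact that returning an object from a JavaScript constructor overrides `this`. A minimal sketch, with stand-in internal classes (the detection logic is assumed, not copied from data-man):

```javascript
// Hypothetical dispatching constructor: new DataMan(anything) hands back
// an instance of the matching internal class.
function DataMan(data, type) {
  if (typeof data === 'string' && /^data:/.test(data)) {
    return new DataMan.DataURI(data);
  }
  if (typeof data === 'string' && /^https?:/.test(data)) {
    return new DataMan.URL(data, type);
  }
  if (Buffer.isBuffer(data)) {
    return new DataMan.Buffer(data, type);
  }
  throw new Error('DataMan: unsupported data type');
}

// Minimal stand-in internal classes for the sketch:
DataMan.URL = function (url, type) { this.url = url; this.type = type; };
DataMan.Buffer = function (buffer, type) { this.buffer = buffer; this.type = type; };
DataMan.DataURI = function (uri) { this.uri = uri; };
```

Callers never need to know which internal class they got; they just use the common API, which keeps the type-specific code isolated and testable.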

@raix
Author

raix commented Apr 1, 2014

Ok, sorry, the factored-out code looks good. Regarding naming I was thinking the superDataMan, maybe just dataSources or dataman :) Good idea about adding the package; we could map dataman onto FS.Data.
I'll dig deeper into the code. Ideas:

  1. The data URI should not be stored on self, only the buffer in dataman.buffer (memory usage)
  2. The dataUri dataSource should convert to a buffer and extract the type, then return a new dataman buffer:
var regex = /^data:.+\/(.+);base64,(.*)$/;

var matches = string.match(regex);
var ext = matches[1];    // e.g. "png" from "data:image/png;base64,..."
var data = matches[2];   // the base64 payload
var buffer = new Buffer(data, 'base64');
  3. The dataurl can use a HEAD request to get type and size, falling back to loading the buffer if that fails
  4. Maybe have a util function converting a readStream to a buffer
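The util function in point 4 is a standard Node pattern; a minimal sketch (name and callback shape assumed):

```javascript
// Collect a readable stream into a single Buffer (hypothetical util).
function streamToBuffer(readStream, callback) {
  var chunks = [];
  readStream.on('data', function (chunk) { chunks.push(chunk); });
  readStream.on('error', callback);
  readStream.on('end', function () {
    callback(null, Buffer.concat(chunks));
  });
}
```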

Btw. FS.Utility maps some of underscore, e.g. each etc., so it's easier to switch to lodash etc. and packages don't have to use the underscore package directly.

@aldeed
Contributor

aldeed commented Apr 1, 2014

Good ideas. I cached the dataUri etc. on self thinking it would make repeated calls more efficient, but yes, that is at the expense of extra data in memory, so we could remove that.

Using a regex is nice, too.

The underscore dependency is there for now just because I used _.bind. We can replace it with native bind code.

@raix
Author

raix commented Apr 1, 2014

Cool. I seem to remember that the node Buffer is handled specially in node (like Blob on the client) + using size on dataUrl would cause the data to be in both self.buffer and self.dataUrl + we can reuse all the buffer code.

Ok, I see bind is the only dependency - yep, good idea, it's not too hard to do a native bind.

@aldeed: I'm decoupling a bit this week, got some heavy deadlines. I'll be back in a week's time.

@aldeed
Contributor

aldeed commented Apr 5, 2014

FYI, I've pushed some improvements to data-man. All of your suggestions from above plus limited support for managing a readstream, plus a couple other improvements.

@aldeed
Contributor

aldeed commented Apr 7, 2014

I think this issue could be closed now except for one of your suggestions not yet done:

If file is already mounted the attachData should trigger the streaming to TempStore and at some point on the client it should trigger upload (maybe Meteor.createWriteStream('fs.upload', fsObj))

I'd have to look at the code more to see if that is a good idea.

@aldeed
Contributor

aldeed commented May 30, 2014

Closing. Will move last point to a new issue.
