Support for persisting tuples of Persisters does not share computations #155

Closed
blever opened this issue Oct 11, 2012 · 4 comments

@blever
Contributor

blever commented Oct 11, 2012

A call to persist can now accept many different flavours of inputs:

  • A tuple of DListPersisters and DObjects, e.g.:
val xs: DList[T] = ...
val y: DObject[U] = ...
val z: DObject[V] = ...
val (_, yv: U, zv: V) = persist(toTextFile(xs, "hdfs://..."), y, z)
  • A sequence of DListPersisters (of the same type), e.g.:
val ass: Seq[DList[T]] = ...
persist(ass.zipWithIndex map { case (as, ix) => toTextFile(as, "hdfs://...." + ix) })
  • A sequence of DObjects (of the same type), e.g.:
val bs: Seq[DObject[T]] = ...
val cs: Seq[T] = persist(bs)
  • Tuple combinations of the above 3, e.g.:
val xs: DList[T] = ...
val y: DObject[U] = ...
val z: DObject[V] = ...
val ass: Seq[DList[T]] = ...
val bs: Seq[DObject[T]] = ...

val ((_, yv: U, zv: V), _, cs: Seq[T]) =
  persist(
    (toTextFile(xs, "hdfs://..."), y, z),
    ass.zipWithIndex map { case (as, ix) => toTextFile(as, "hdfs://...." + ix) },
    bs)

There is an issue with the last flavour ("tuple combinations of the above"): computations are not shared between the tuple elements. Within each tuple element (i.e. each of the "above 3"), computations are shared.

This should be fixed so that a call to persist always shares computations across the single graph constructed from all of the specified outputs.
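
To make the cost of the missing sharing concrete, here is a minimal toy sketch. It is not Scoobi's planner: `Node`, `SharingSketch`, `reachable` and the two counting functions are hypothetical names invented for this illustration. It assumes that each tuple element is planned on its own, and compares that with planning all outputs as one graph.

```scala
// Toy model only: Node and SharingSketch are hypothetical illustration names,
// not Scoobi types. A Node is one step of a computation graph.
final case class Node(name: String, inputs: List[Node] = Nil)

object SharingSketch {
  // Collect every distinct node reachable from the given outputs.
  def reachable(outputs: Seq[Node]): Set[Node] = {
    def go(n: Node, seen: Set[Node]): Set[Node] =
      if (seen(n)) seen
      else n.inputs.foldLeft(seen + n)((acc, in) => go(in, acc))
    outputs.foldLeft(Set.empty[Node])((acc, o) => go(o, acc))
  }

  // Nodes evaluated when each group of outputs (one tuple element) is planned
  // separately: anything shared between groups is counted once per group.
  def evaluationsPerGroup(groups: Seq[Seq[Node]]): Int =
    groups.map(g => reachable(g).size).sum

  // Nodes evaluated when all outputs are planned as a single graph.
  def evaluationsSingleGraph(groups: Seq[Seq[Node]]): Int =
    reachable(groups.flatten).size

  def main(args: Array[String]): Unit = {
    val source    = Node("source")
    val expensive = Node("expensive", List(source))
    val a         = Node("a", List(expensive))
    val b         = Node("b", List(expensive))
    val groups    = Seq(Seq(a), Seq(b)) // two tuple elements sharing "expensive"

    println(evaluationsPerGroup(groups))    // 6
    println(evaluationsSingleGraph(groups)) // 4
  }
}
```

In this toy model the shared `source` and `expensive` steps are evaluated twice when the two groups are planned independently, and once when they are planned together, which is the behaviour being asked for here.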

ghost assigned etorreborre Oct 11, 2012
@etorreborre
Collaborator

I think that the case can be reduced to the fact that:

persist((y, z)) creates 1 Hadoop job
persist(Seq(y, z)) creates 2 Hadoop jobs

My implementation of Persister[Seq[T]] might compile OK but not do the right thing.
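
The actual cause is not stated in this thread. Purely as an illustration, the sketch below shows one way a sequence flavour could type-check and return correct values while still starting one job per element. Everything in it (`PersistSketch`, `Obj`, `runJob`, `jobsRun` and the three `persist...` functions) is a hypothetical stand-in, not Scoobi's API.

```scala
// Toy model only: all names here are hypothetical, not Scoobi's real API.
object PersistSketch {
  // Stand-in for a distributed object whose value must be computed by a job.
  final case class Obj[+A](value: A)

  // Counts how many "jobs" have been started.
  var jobsRun = 0

  // One job materialises every object handed to it in a single pass.
  private def runJob(objs: Seq[Obj[Any]]): Seq[Any] = {
    jobsRun += 1
    objs.map(_.value)
  }

  // Tuple flavour: both elements are handed to the same job.
  def persistTuple[A, B](t: (Obj[A], Obj[B])): (A, B) = {
    val results = runJob(Seq(t._1, t._2))
    (results(0).asInstanceOf[A], results(1).asInstanceOf[B])
  }

  // Naive sequence flavour: delegating element by element starts one job each.
  def persistSeqNaive[A](os: Seq[Obj[A]]): Seq[A] =
    os.map(o => runJob(Seq(o)).head.asInstanceOf[A])

  // Sequence flavour that hands the whole sequence to a single job.
  def persistSeqShared[A](os: Seq[Obj[A]]): Seq[A] =
    runJob(os).map(_.asInstanceOf[A])

  def main(args: Array[String]): Unit = {
    val y = Obj(1)
    val z = Obj(2)

    jobsRun = 0; persistTuple((y, z));         println(jobsRun) // 1
    jobsRun = 0; persistSeqNaive(Seq(y, z));   println(jobsRun) // 2
    jobsRun = 0; persistSeqShared(Seq(y, z));  println(jobsRun) // 1
  }
}
```

Both sequence variants return the same values, so only the job count reveals the difference, which matches the "compiles OK but does not do the right thing" symptom.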

@blever
Contributor Author

blever commented Oct 14, 2012

That's right. I'd hold off doing anything about it right now as I'm refactoring that code a little bit as part of the fast local mode integration.

@etorreborre
Collaborator

This issue will be solved as part of a larger refactoring of the Persisting API.

@etorreborre
Collaborator

see #172
