
Use optional third argument as edge attribute. #901

Closed · wants to merge 8 commits

Conversation

@npanj commented May 28, 2014

This is probably the easiest way to include an edge attribute. Unless you guys have been thinking about some other way? I also wanted to add one more argument for a default edge-attribute value; I'll probably do that in another patch.

@AmplabJenkins

Can one of the admins verify this patch?

@rxin (Contributor) commented May 28, 2014

This is definitely a good change.

@rxin (Contributor) commented May 28, 2014

Jenkins, test this please.

@rxin (Contributor) commented May 28, 2014

Upon further look, it might be better to let users define how to parse the third argument, so we can support arbitrary data types. What do you think?

@AmplabJenkins

Merged build triggered, started, and finished. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15258/

@npanj (Author) commented May 28, 2014

Allowing arbitrary data types sounds like a good idea. I actually tried something like this:


```scala
def edgeListFile[@specialized(Long, Int, Double) ED: ClassTag](
    sc: SparkContext,
    path: String,
    canonicalOrientation: Boolean = false,
    minEdgePartitions: Int = 1): Graph[Int, ED] =
```

and

```scala
if (lineArray.length >= 3) lineArray(2).asInstanceOf[ED]
else 1.asInstanceOf[ED]
```


But I am running into this error:


```
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
```


I guess I will have to handle primitive types separately? (I am quite new to Scala.) Are there any known patterns for handling this kind of casting? If you can point me to a code snippet within the Spark codebase (or elsewhere), that would be great.

Also, do you follow any Scala code style guide? If so, I'd like to follow it for future patches.

@ankurdave (Contributor)

Unlike Python (but like Java), Scala doesn't use `asInstanceOf` for arbitrary type conversions. In this case, `"123".asInstanceOf[Int]` won't work, because `asInstanceOf` is a runtime cast and Scala doesn't know how to do the conversion. There are two ways to resolve this:

  1. Have edgeListFile take a function that does the conversion:

```scala
def edgeListFile[ED: ClassTag](..., parser: String => ED) = {
  ... parser(lineArray(2)) ...
}
// Can be called like this:
GraphLoader.edgeListFile[Int](..., _.toInt)
GraphLoader.edgeListFile[String](..., identity)
```

    This is simple, but it doesn't facilitate reuse of parsing functions: you always have to specify how to parse an Int, even though it's usually the same everywhere. (A runnable sketch of this approach appears after this list.)

  2. Define a library of standard parsers and use Scala implicits to select the right one automatically if it exists:

```scala
// `Parseable[T]` is a type class [1]. If it is defined for Foo (meaning
// there is an `implicit val` of type Parseable[Foo] in scope), it means you
// can parse any String into a Foo.
trait Parseable[T] { def parse(s: String): T }

// Here are some parsers. The user can define their own by creating
// additional implicit vals.
implicit val IntParser = new Parseable[Int] { def parse(s: String) = s.toInt }
implicit val StringParser = new Parseable[String] { def parse(s: String) = s }

// Instead of an explicit function parameter, we use a context bound, which
// lets us access the parser for ED using `implicitly[Parseable[ED]]`.
def edgeListFile[ED: ClassTag : Parseable](...) = {
  ... implicitly[Parseable[ED]].parse(lineArray(2)) ...
}
// Can be called without passing anything extra:
GraphLoader.edgeListFile[Int](...)
GraphLoader.edgeListFile[String](...)
```

[1] http://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html#context_bounds
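
To make these concrete, here is a minimal, self-contained sketch of approach 1 outside of Spark. The helper name `parseEdgeLine`, the whitespace-separated "src dst [attr]" line format, and the explicit default-value parameter are assumptions for illustration, not part of the GraphLoader API.

```scala
// Hypothetical sketch of approach 1: the caller supplies the parsing function.
// Assumes whitespace-separated "src dst [attr]" lines and a caller-supplied
// default for lines with no third column.
def parseEdgeLine[ED](line: String, parser: String => ED, default: ED): (Long, Long, ED) = {
  val fields = line.split("\\s+")
  val attr = if (fields.length >= 3) parser(fields(2)) else default
  (fields(0).toLong, fields(1).toLong, attr)
}

parseEdgeLine("1 2 7", _.toInt, 1)                // (1L, 2L, 7)
parseEdgeLine("3 4", _.toInt, 1)                  // (3L, 4L, 1), default used
parseEdgeLine("1 2 likes", identity[String], "")  // (1L, 2L, "likes")
```

And for approach 2, the extension point mentioned in the comments (users defining their own parsers) would look something like this hypothetical `Parseable[Double]` instance, assuming the trait above is in scope:

```scala
// Hypothetical user-defined parser: with this implicit val in scope, the
// Parseable[ED] context bound resolves automatically, so a call like
// GraphLoader.edgeListFile[Double](...) would pick it up.
implicit val DoubleParser: Parseable[Double] = new Parseable[Double] {
  def parse(s: String): Double = s.toDouble
}
```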

@ankurdave (Contributor)

ok to test

@AmplabJenkins

Merged build triggered, started, and finished. Refer to these links for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15320/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15329/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15337/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15340/

@npanj (Author) commented Jun 2, 2014

@ankurdave

  • Thanks for the explanation of Parseable and the use of implicits. (I will try to implement it.)
  • About the storage level of the vertex attributes: yes, I was trying to see if it could be changed to handle memory pressure, but that didn't work. From your patch it seems this is more complicated than I initially thought. Is there a reason that patch was not submitted? I also believe some parts of it would apply to SPARK-1991.
    (Actually, I didn't mean to include the serialization changes in this pull request, but I guess everything from the forked branch gets pulled into the pull request? I haven't submitted patches on GitHub before, so I wasn't familiar with that.)

@pwendell (Contributor) commented Sep 2, 2014

@ankurdave @rxin could you guys come to a decision on this, one way or the other? Also, @npanj, mind adding [GRAPHX] to the title here? Right now this is getting sorted with the Spark core patches, which might be why people haven't seen it.

@ankurdave (Contributor)

@pwendell @npanj @rxin This isn't mergeable at the moment because it doesn't parse the edge attributes correctly. Since it's difficult to do this parsing in general, there are two choices for now:

  1. Modify the PR to accept only Int edge attributes (i.e., remove ED). This would help users who just want Int edge attributes, but it introduces a new special-case public API that we would need to support in the future. (A sketch of what this might look like appears below.)
  2. Close the PR for now, and for users who want edge attributes, encourage them to implement their own custom version of GraphLoader.

I'd suggest closing this for now.
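
For reference, here is a minimal sketch of what choice 1 might parse per line, assuming the same whitespace-separated format GraphLoader reads today; the helper name `parseIntEdge` and the default of 1 are illustrative assumptions, not code from this PR.

```scala
// Hypothetical Int-only variant (choice 1): parse an optional Int third
// column, defaulting to 1 when it is absent.
def parseIntEdge(line: String): (Long, Long, Int) = {
  val fields = line.split("\\s+")
  val attr = if (fields.length >= 3) fields(2).toInt else 1
  (fields(0).toLong, fields(1).toLong, attr)
}
```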

@SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@ankurdave (Contributor)

@npanj Mind closing this?

@npanj closed this Oct 9, 2014

@npanj (Author) commented Oct 9, 2014

Sorry, I was out of the loop for a while... closing it.

wangyum added a commit that referenced this pull request May 26, 2023
* [CARMEL-5873] Upgrade Parquet to 1.12.2 (#896)

* [SPARK-36726] Upgrade Parquet to 1.12.1

### What changes were proposed in this pull request?

Upgrade Apache Parquet to 1.12.1

### Why are the changes needed?

Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same parquet version

In particular, PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests + a new test for the issue in SPARK-36696

Closes #33969 from sunchao/upgrade-parquet-12.1.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: DB Tsai <[email protected]>

(cherry picked from commit a927b08)

* [SPARK-34542][BUILD] Upgrade Parquet to 1.12.0

### What changes were proposed in this pull request?

Parquet 1.12.0 New Feature
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption

Parquet 1.12.0 release notes:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/CHANGES.md

### Why are the changes needed?

- Bloom filters to improve filter performance
- ZSTD enhancement

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test.

Closes #31649 from wangyum/SPARK-34542.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit cbffc12)

Co-authored-by: Chao Sun <[email protected]>

* [HADP-44647] Parquet file based kms client for encryption keys (#897)

* [HADP-44647] Parquet file based kms client for encryption keys (#82)

Create/write a Parquet-encrypted table:
```
set spark.sql.parquet.encryption.key.file=/path/to/key/file;

create table parquet_encryption(a int, b int, c int)
using parquet
options (
'parquet.encryption.column.keys' 'columnKey1: a, b; columnKey2: c',
'parquet.encryption.footer.key' 'footerKey');
```

Read a Parquet-encrypted table:

```
set spark.sql.parquet.encryption.key.file=/path/to/key/file;

select ... from parquet_encryption ...
```

Will raise another PR for the default footerKey.

* [HADP-44647][FOLLOWUP] Reuse the kms instance for same key file (#84)

* Fix

Co-authored-by: fwang12 <[email protected]>

Co-authored-by: Chao Sun <[email protected]>
Co-authored-by: fwang12 <[email protected]>