Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve test coverage for VectorsCombiner and make vector aggregator efficient #168

Merged
merged 6 commits into from
Nov 1, 2018

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Oct 27, 2018

Related issues

  1. UnionVector monoid aggregator is not efficient for sparse vectors.
  2. VectorsCombiner stage has incomplete functional test coverage.

Describe the proposed solution

  1. Moved VectorsCombiner.combine method to RichVector object
  2. Added OpEstimatorSpec base class to VectorsCombinerTest to improve the functional test coverage
  3. Added +, -, dot,combine methods to OPVector + test OPVectorTest
  4. Added combine method to RichVector + tests
  5. Renamed UnionVector monoid aggregator to CombineVector and made use of RichVector.combine at reduve
  6. Bonus: added SumVector monoid aggregator

Describe alternatives you've considered
I could relocate VectorsCombiner.combine as-is to utils subproject, but it felt valuable enough to add +, -,dot and combine to OPVector as well.

@codecov
Copy link

codecov bot commented Oct 27, 2018

Codecov Report

Merging #168 into master will increase coverage by 5.39%.
The diff coverage is 96.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   80.84%   86.24%   +5.39%     
==========================================
  Files         304      304              
  Lines        9920     9931      +11     
  Branches      334      549     +215     
==========================================
+ Hits         8020     8565     +545     
+ Misses       1900     1366     -534
Impacted Files Coverage Δ
...s/impl/feature/OPCollectionHashingVectorizer.scala 96.37% <100%> (ø) ⬆️
...e/op/stages/impl/feature/SmartTextVectorizer.scala 97.4% <100%> (ø) ⬆️
...scala/com/salesforce/op/aggregators/OPVector.scala 100% <100%> (ø) ⬆️
...p/stages/impl/feature/SmartTextMapVectorizer.scala 100% <100%> (ø) ⬆️
...la/com/salesforce/op/features/types/OPVector.scala 100% <100%> (ø) ⬆️
...orce/op/aggregators/MonoidAggregatorDefaults.scala 100% <100%> (ø) ⬆️
...ala/com/salesforce/op/utils/spark/RichVector.scala 100% <100%> (ø) ⬆️
...mpl/feature/DecisionTreeNumericMapBucketizer.scala 93.87% <100%> (ø) ⬆️
...force/op/stages/impl/feature/VectorsCombiner.scala 92.3% <66.66%> (-2.94%) ⬇️
...com/salesforce/op/utils/stages/FitStagesUtil.scala 94.73% <0%> (+1.31%) ⬆️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ad3c28...750a376. Read the comment docs.

* @throws IllegalArgumentException if the vectors have different sizes
* @return vector subtraction
*/
def -(that: OPVector): OPVector = (value - that.value).toOPVector
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as you are adding addition and subtraction why not dot product and cross product?

* @return result vector
*/
def combine(that: Vector, other: Vector*): Vector =
com.salesforce.op.utils.spark.RichVector.combine(v +: that +: other)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need the full import here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to distinguish the object from the implicit class.

for {(v1, v2) <- vectors.zip(vectors)} {
(v1 - v2) shouldBe (v1.value - v2.value).toOPVector
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe tests for the errors when the vectors have different lengths?

@tovbinm
Copy link
Collaborator Author

tovbinm commented Oct 31, 2018

@leahmcguire done

Copy link
Collaborator

@leahmcguire leahmcguire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tovbinm tovbinm merged commit ab521f8 into master Nov 1, 2018
@tovbinm tovbinm deleted the mt/vectors-combiner branch November 1, 2018 00:20
ericwayman pushed a commit that referenced this pull request Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants