0.4.0
New features and bug fixes:
- Allow specifying the formula used to compute the text feature bin size for `RawFeatureFilter` (see the `RawFeatureFilter.textBinsFormula` argument) #99
- Fixed metadata on `Geolocation` and `GeolocationMap` so that they keep the name of the column in `descriptorValue` #100
- Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models without a Spark context, locally, using the Aardpfark (PFA for Spark) and Hadrian libraries instead. This makes scoring orders of magnitude faster than with Spark. #41
- Add distributions calculated in `RawFeatureFilter` to `ModelInsights` #103
- Added binary sequence transformer & estimator: `BinarySequenceTransformer` and `BinarySequenceEstimator`, plus the associated base traits #84
- Added `StringIndexerHandleInvalid.Keep` option to `OpStringIndexer` (same as in the underlying Spark estimator) #93
- Allow numbers and underscores in feature names #92
- Stable key order for map vectorizers #88
- Keep raw feature distributions calculated in raw feature filter #76
- Transmogrify now uses the smart text vectorizer for text types: `Text`, `TextArea`, `TextMap` and `TextAreaMap` #63
- Transmogrify now produces circular date representations for date feature types: `Date`, `DateTime`, `DateMap` and `DateTimeMap` #100
- Improved test coverage for utils and other modules #50, #53, #67, #69, #70, #71, #72, #73
- Match feature type map hierarchy with regular feature types #49
- Removed redundant and deadlock-prone end listener #52
- OS-neutral filesystem path creation #51
- Make the `Feature` class public, but hide its constructor #45
- Specify categorical variables in metadata #120
- Fixed fill values in the geolocation vectorizer #132
- Added feature importance for new model types #128
- Added binary classification bin score evaluator #119
- Apply `DateToUnitCircleTransformer` logic in raw feature filter transformations #130
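The circular date representations (#100, #130) rest on a simple idea: a periodic component of a date, such as the hour of day, is mapped to a point on the unit circle so that values at the two ends of the period stay close together. The following is a minimal standalone sketch of that general technique, not the `DateToUnitCircleTransformer` implementation; the names `UnitCircleSketch`, `toUnitCircle` and `distance` are hypothetical, and the (cos, sin) ordering is an assumption made here for illustration.

```scala
// Hypothetical sketch of unit-circle encoding for a periodic time component.
// This is NOT the TransmogrifAI DateToUnitCircleTransformer code.
object UnitCircleSketch {

  /** Maps `value` in [0, period) to the (cos, sin) coordinates of its angle. */
  def toUnitCircle(value: Double, period: Double): (Double, Double) = {
    val angle = 2.0 * math.Pi * value / period
    (math.cos(angle), math.sin(angle))
  }

  /** Euclidean distance between two encoded points. */
  def distance(a: (Double, Double), b: (Double, Double)): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  def main(args: Array[String]): Unit = {
    val h23 = toUnitCircle(23, 24)
    val h0 = toUnitCircle(0, 24)
    val h12 = toUnitCircle(12, 24)
    // 23:00 and 00:00 land next to each other; 00:00 and 12:00 are opposite points.
    println(f"d(23h, 0h) = ${distance(h23, h0)}%.3f")
    println(f"d(0h, 12h) = ${distance(h0, h12)}%.3f")
  }
}
```

With raw hour values, 23:00 and 00:00 are 23 units apart; on the circle they are about 0.26 apart (2·sin(π/24)), while 00:00 and 12:00 sit at the maximum distance of 2.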
Breaking changes:
- Added a case class to deal with model selector metadata #39
- Made `FileOutputCommitter` the default and got rid of `DirectMapreduceOutputCommitter` and `DirectOutputCommitter` #86
- Refactored `OpVectorColumnMetadata` to allow numeric column descriptors #89
- Renamed `JaccardDistance` to `JaccardSimilarity` #80
- New model selector interface #55. The breaking changes concern the return type and the way parameters are passed into model selectors. Starting with this version, model selectors return a single result feature of type `Prediction` (instead of a variable number of features `(pred, raw, prob)`). Example:
```scala
import com.salesforce.op.stages.impl.classification.MultiClassificationModelSelector

val (pred, raw, prob) = MultiClassificationModelSelector() // won't compile anymore
val prediction = MultiClassificationModelSelector() // ok!
```
Another change is the way parameters are passed into model selectors. Example:
```scala
BinaryClassificationModelSelector
  .withCrossValidation()
  .setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore
```
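Instead, one should do:
```scala
import com.salesforce.op.stages.impl.classification.{BinaryClassificationModelSelector, OpLogisticRegression}
import org.apache.spark.ml.tuning.ParamGridBuilder

// Build a parameter grid per model type and pass the (model, grid) pairs explicitly
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build())
BinaryClassificationModelSelector
  .withCrossValidation(modelsAndParameters = models)
```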
For more examples of how to use the new model selectors, please refer to our documentation and helloworld examples.
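The `JaccardDistance` to `JaccardSimilarity` rename (#80) reflects the difference between the two quantities: the Jaccard similarity of sets A and B is |A ∩ B| / |A ∪ B|, and the Jaccard distance is 1 minus that value. The sketch below illustrates the relationship with plain Scala sets; it is not the TransmogrifAI stage, the object and method names are hypothetical, and returning 1.0 for two empty sets is an assumed convention.

```scala
// Hypothetical sketch of Jaccard similarity vs. distance on plain Scala sets.
// This is NOT the TransmogrifAI JaccardSimilarity stage.
object JaccardSketch {

  /** |A intersect B| / |A union B|; two empty sets are treated as identical (assumption). */
  def similarity[T](a: Set[T], b: Set[T]): Double = {
    val union = a union b
    if (union.isEmpty) 1.0
    else (a intersect b).size.toDouble / union.size
  }

  /** Jaccard distance is the complement of the similarity. */
  def distance[T](a: Set[T], b: Set[T]): Double = 1.0 - similarity(a, b)

  def main(args: Array[String]): Unit = {
    val a = Set("red", "green", "blue")
    val b = Set("green", "blue", "yellow")
    println(similarity(a, b)) // 2 shared of 4 distinct elements: 0.5
    println(distance(a, b))   // 0.5
  }
}
```

Identical sets have similarity 1.0 (distance 0.0), so code that previously treated larger `JaccardDistance` outputs as "more different" needs to invert that reading after the rename.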
Dependency upgrades & misc:
- CI/CD runtime improvements for CircleCI and TravisCI
- Updated Gradle to 4.10
- Updated `scala-graph` to 1.12.5
- Updated `scalafmt` to 1.5.1
- New `transmogrifai-local` subproject #41 introduces `aardpfark` and `hadrian` dependencies.