Releases: elastacloud/spark-excel
v0.1.13
Welcome to the latest release of the Spark Excel reader. This release brings a few changes and bug fixes, including:
- New parser option to disable formula evaluation. When disabled, the formula itself is extracted from the sheet rather than being evaluated.
- Fix for empty files. When a file containing only the header record is parsed, a single null record is now returned instead of an error being raised.
- Support for the latest Spark 3.4.x and 3.5.x releases
- Updates to the Apache POI library and other dependencies
Thanks again to @josecsotomorales for his help with this release.
v0.1.12
It's been a little longer this time, but we're back with the 0.1.12 release of the Excel data source for Apache Spark. A big thanks to @josecsotomorales, who has contributed to this release.
This release introduces the following changes:
- Update Apache POI to 5.2.3, bringing in support for new functions and features
- Add the ability for users to specify values which should be treated as null (e.g. "N/A")
- Handle Log4J conflicts across Spark versions
- Spark 3.4.1 support
- Spark 3.5.0 support
We're always looking for people to help contribute, from code changes to feature suggestions, so if you feel you can contribute, feel free to join in.
v0.1.11
This release brings in support for Spark 3.4.0 (the latest version of the 3.4.x series at the time of release), along with a new option that adds a per-row flag indicating whether the row matches the provided or inferred schema.
- Support for Spark 3.4
- New schemaMatchColumnName option for indicating whether each row matches the schema
- Update of Spark versions to match the releases available from Apache Spark and the versions used by currently supported Databricks Runtimes and Azure Synapse Analytics
N.B. When using Databricks, please check the Spark version used by the runtime to ensure the correct version of the package is used.
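The schemaMatchColumnName option makes it possible to separate rows that match the schema from those that don't. The following is a minimal sketch, assuming the data source is registered under the name com.elastacloud.spark.excel (check the project README) and that the flag column holds a boolean that is true when the row matches. The column name _schemaMatches and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SchemaMatchExample {
  // Hypothetical flag column name; pick any name not already used in the sheet
  val flagColumn = "_schemaMatches"

  // Assumes the data source name "com.elastacloud.spark.excel"
  def readWithSchemaFlag(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.elastacloud.spark.excel")
      .option("schemaMatchColumnName", flagColumn)
      .load(path)

  // Assumes the flag is boolean: true when the row matches the schema
  def matchingRows(df: DataFrame): DataFrame = df.filter(col(flagColumn))

  // Rows where the flag is false, e.g. for routing to a quarantine table
  def mismatchedRows(df: DataFrame): DataFrame = df.filter(!col(flagColumn))
}
```

Keeping the filters separate from the read means they work on any DataFrame that carries the flag column, so they can be checked without an Excel file.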
v0.1.10
v0.1.9
Minor release with a couple of significant updates.
- Update scalatest to 3.2.11
- Update Apache POI to version 5.2.2
- Update Apache Commons IO to 2.11.0
- Introduce new option maxBytesForTempFiles
  - Resolves an issue loading larger files by setting the maximum number of bytes before temp files are created
  - Defaults to 100_000_000 bytes
  - Can be overridden through the reader options
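Overriding maxBytesForTempFiles works like any other Spark reader option. The following is a minimal sketch, assuming the data source name com.elastacloud.spark.excel. The override value of 200000000 bytes is purely illustrative, not a recommendation, and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TempFileThresholdExample {
  // Default stated in the v0.1.9 release notes
  val defaultMaxBytesForTempFiles: Long = 100000000L

  // Illustrative override only: twice the default
  val excelOptions: Map[String, String] = Map(
    "maxBytesForTempFiles" -> (defaultMaxBytesForTempFiles * 2).toString
  )

  // Assumes the data source name "com.elastacloud.spark.excel"
  def readLargeWorkbook(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.elastacloud.spark.excel")
      .options(excelOptions)
      .load(path)
}
```

Collecting the options in a Map keeps the threshold in one place when several workbooks are read with the same settings.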
v0.1.8
Upgrades to version 5.2.0 of the Apache POI library, bringing general improvements and support for additional formula functions such as CONCAT.
Brings in support for additional types in user-defined schemas, reducing the need to cast data once it's been read.
Adds Spark versions 3.1.3 and 3.2.1 to the build profiles.
The JAR is a little larger as additional Apache libraries need to be included to support the updated POI library. In addition, Azure Synapse uses an older version of commons-io which doesn't include features required by the updated POI library.
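With broader type support in user-defined schemas, a schema can be passed straight to the reader instead of casting columns afterwards. This is a sketch only. The column names are hypothetical and the data source name com.elastacloud.spark.excel is assumed. These notes also don't say exactly which types were added, so check the project README before relying on DateType, DecimalType, or BooleanType.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object UserSchemaExample {
  // Hypothetical columns; support for each type in this reader is an assumption
  val schema: StructType = StructType(Seq(
    StructField("OrderId", IntegerType, nullable = true),
    StructField("OrderDate", DateType, nullable = true),
    StructField("Amount", DecimalType(18, 2), nullable = true),
    StructField("Shipped", BooleanType, nullable = true)
  ))

  // Assumes the data source name "com.elastacloud.spark.excel"
  def readWithSchema(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.elastacloud.spark.excel")
      .schema(schema)
      .load(path)
}
```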
v0.1.7
v0.1.6
With no new ideas turning up in the last couple of weeks, I've promoted the 0.1.6-SNAPSHOT release to final. This release includes the following changes:
- Add Spark 3.0.3
- Fix for cells which contain data that does not match the target schema
- Better error reporting for parser options, suggesting which option the user might have meant if it has been misspelt
- Some code cleanup
v0.1.6-SNAPSHOT
Snapshot release for 0.1.6. This includes:
- Fix for cells which contain data that does not match the target schema
- Better error reporting for parser options, suggesting which option the user might have meant if it has been misspelt
- Some code cleanup