This release modifies the handling of transport parameters for the S3 back-end in a backwards-incompatible way. See the migration docs for details.
- Refactor S3, replace high-level resource/session API with low-level client API (PR #583, @mpenkov)
- Fix potential infinite loop when reading from webhdfs (PR #597, @traboukos)
- Add timeout parameter for http/https (PR #594, @dustymugs)
- Remove `tests` directory from package (PR #589, @e-nalepa)
- Support tell() for text mode write on s3/gcs/azure (PR #582, @markopy)
- Implement option to use a custom buffer during S3 writes (PR #547, @mpenkov)
- Correctly pass boto3 resource to writers (PR #576, @jackluo923)
- Improve robustness of S3 reading (PR #552, @mpenkov)
- Replace codecs with TextIOWrapper to fix newline issues when reading text files (PR #578, @markopy)
- Refactor `s3` submodule to minimize resource usage (PR #569, @mpenkov)
- Change `download_as_string` to `download_as_bytes` in `gcs` submodule (PR #571, @alexandreyc)
- Exclude `requests` from `install_requires` dependency list. If you need it, use `pip install smart_open[http]` or `pip install smart_open[webhdfs]`.
- Fix reading empty file or seeking past end of file for s3 backend (PR #549, @jcushman)
- Fix handling of rt/wt mode when working with gzip compression (PR #559, @mpenkov)
- Bump minimum Python version to 3.6 (PR #562, @mpenkov)
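Two entries above (PR #578 and PR #559) concern text mode layered on top of a binary, possibly compressed, stream. The stdlib-only sketch below shows that general layering: `io.TextIOWrapper` handles encoding and newline translation, and `gzip.GzipFile` handles compression. It is not smart_open's implementation, and the helper names `write_gzip_text` and `read_gzip_text` are hypothetical.

```python
import gzip
import io
from typing import List


def write_gzip_text(text: str) -> bytes:
    """Hypothetical helper: encode then compress, roughly what a "wt" + gzip mode does."""
    raw = io.BytesIO()
    # TextIOWrapper encodes str -> UTF-8 bytes; GzipFile compresses them into `raw`.
    # newline="" disables newline translation on write, so "\r\n" is stored as-is.
    with io.TextIOWrapper(gzip.GzipFile(fileobj=raw, mode="wb"), encoding="utf-8", newline="") as writer:
        writer.write(text)
    # Closing the wrapper closes the GzipFile (writing the gzip trailer) but leaves `raw` open.
    return raw.getvalue()


def read_gzip_text(data: bytes) -> List[str]:
    """Hypothetical helper: decompress then decode, roughly what an "rt" + gzip mode does."""
    # newline=None (the default) translates "\r\n", "\r" and "\n" to "\n" on read.
    with io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb"), encoding="utf-8") as reader:
        return reader.readlines()


blob = write_gzip_text("first\r\nsecond\rthird\n")
print(read_gzip_text(blob))  # ['first\n', 'second\n', 'third\n']
```

The standard library's own `gzip.open` returns an `io.TextIOWrapper` around a `GzipFile` for `"t"` modes, so this layering is a conventional design rather than a smart_open-specific trick.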
This release modifies the behavior of setup.py with respect to dependencies.
Previously, `boto3` and other AWS-related packages were installed by default.
Now, in order to install them, you need to run either `pip install smart_open[s3]` to install the AWS dependencies only, or `pip install smart_open[all]` to install all dependencies, including AWS, GCS, etc.
- Include S3 dependencies by default, because removing them in the 2.2.0 minor release was a mistake.
This release modifies the behavior of setup.py with respect to dependencies.
Previously, `boto3` and other AWS-related packages were installed by default.
Now, in order to install them, you need to run either `pip install smart_open[s3]` to install the AWS dependencies only, or `pip install smart_open[all]` to install all dependencies, including AWS, GCS, etc.
Summary of changes:
- Correctly pass `newline` parameter to built-in `open` function (PR #478, @burkovae)
- Remove boto as a dependency (PR #523, @isobit)
- Performance improvement: avoid redundant GetObject API queries in s3.Reader (PR #495, @jcushman)
- Support installing smart_open without AWS dependencies (PR #534, @justindujardin)
- Take object version into account in `to_boto3` method (PR #539, @interpolatio)
Functionality in the left column will be removed in future releases. Use the functions in the right column instead.

| Deprecated | Replacement |
|---|---|
| `smart_open.s3_iter_bucket` | `smart_open.s3.iter_bucket` |

- Bypass unnecessary GCS storage.buckets.get permission (PR #516, @gelioz)
- Allow SFTP connection with SSH key (PR #522, @rostskadat)
- Azure storage blob support (@nclsmitchell and @petedannemann)
- Correctly pass `newline` parameter to built-in `open` function (PR #478, @burkovae)
- Ensure GCS objects always have a .name attribute (PR #506, @todor-markov)
- Use exception chaining to convey the original cause of the exception (PR #508, @cool-RR)
- This version supports Python 3 only (3.5+).
- If you still need Python 2, install the smart_open==1.10.1 legacy release instead.
- Prevent smart_open from writing to logs on import (PR #476, @mpenkov)
- Modify setup.py to explicitly support only Py3.5 and above (PR #471, @Amertz08)
- Include all the test_data in setup.py (PR #473, @sikuan)
- This is the last version to support Python 2.7. Versions 1.11 and above will support Python 3 only.
- Use only if you need Python 2.
- Add missing boto dependency (Issue #468)
- Fix GCS multiple writes (PR #421, @petedannemann)
- Implemented efficient readline for ByteBuffer (PR #426, @mpenkov)
- Fix WebHDFS read method (PR #433, @mpenkov)
- Make S3 uploads more robust (PR #434, @mpenkov)
- Add pathlib monkeypatch with replacement of `pathlib.Path.open` (PR #436, @menshikh-iv)
- Fix error when calling `str()` or `repr()` on GCS SeekableBufferedInputBase (PR #442, @robcowie)
- Move optional dependencies to extras (PR #454, @Amertz08)
- Correctly handle GCS paths that contain '?' char (PR #460, @chakruperitus)
- Make our doctools submodule more robust (PR #467, @mpenkov)
Starting with this release, you will have to run `pip install smart_open[gcs]` to use the GCS transport.
In the future, all extra dependencies will be optional. If you want to continue installing all of them, use `pip install smart_open[all]`.
See the README.rst for details.
- Various webhdfs improvements (PR #383, @mrk-its)
- Fixes "the connection was closed by the remote peer" error (PR #389, @Gapex)
- allow use of S3 single part uploads (PR #400, @adrpar)
- Add test data in package via MANIFEST.in (PR #401, @jayvdb)
- Google Cloud Storage (GCS) (PR #404, @petedannemann)
- Implement to_boto3 function for S3 I/O. (PR #405, @mpenkov)
- enable smart_open to operate without docstrings (PR #406, @mpenkov)
- Implement object_kwargs parameter (PR #411, @mpenkov)
- Remove dependency on old boto library (PR #413, @mpenkov)
- implemented efficient readline for ByteBuffer (PR #426, @mpenkov)
- improve buffering efficiency (PR #427, @mpenkov)
- fix WebHDFS read method (PR #433, @mpenkov)
- Make S3 uploads more robust (PR #434, @mpenkov)
- Add version_id transport parameter for fetching a specific S3 object version (PR #325, @interpolatio)
- Document passthrough use case (PR #333, @mpenkov)
- Support seeking over HTTP and HTTPS (PR #339, @interpolatio)
- Add support for rt, rt+, wt, wt+, at, at+ methods (PR #342, @interpolatio)
- Change VERSION to version.py (PR #349, @mpenkov)
- Adding howto guides (PR #355, @mpenkov)
- smart_open/s3: Initial implementations of str and repr (PR #359, @ZlatSic)
- Support writing any bytes-like object to S3. (PR #361, @gilbsgilbs)
- Don't use s3 bucket_head to check for bucket existence (PR #315, @caboteria)
- Don't list buckets in s3 tests (PR #318, @caboteria)
- Use warnings.warn instead of logger.warning (PR #321, @mpenkov)
- Optimize reading from S3 (PR #322, @mpenkov)
- Improve S3 read performance by not copying buffer (PR #284, @aperiodic)
- accept bytearray and memoryview as input to write in s3 submodule (PR #293, @bmizhen-exos)
- Fix two S3 bugs (PR #307, @mpenkov)
- Minor fixes: bz2file dependency, paramiko warning handling (PR #309, @mpenkov)
- improve unit tests (PR #310, @mpenkov)
- Removed dependency on lzma (PR #262, @tdhopper)
- backward compatibility fixes (PR #294, @mpenkov)
- Minor fixes (PR #291, @mpenkov)
- Fix #289: the smart_open package now correctly exposes a `__version__` attribute
- Fix #285: handle edge case with question marks in an S3 URL
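Several entries above (PR #460 for GCS, Fix #285 for S3) concern object names that contain `?`, and an older entry (Fix #92) concerns `#`. A generic URL parser treats those characters as query and fragment delimiters, which silently truncates the key. The sketch below shows one way to avoid that, under stated assumptions: `parse_s3_uri` is a hypothetical helper, only `s3` and `s3a` are treated as S3 schemes, and this is not the library's actual parser.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes are treated as S3.
S3_SCHEMES = {"s3", "s3a"}


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split an S3 URI into (bucket, key), keeping '?' and '#' in the key."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in S3_SCHEMES:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything before the first '/' is the bucket; everything after it is the key, verbatim.
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, key


uri = "s3://my-bucket/reports/q1?draft#2.csv"
# urlsplit treats '?' and '#' as delimiters, so the path loses part of the key.
print(urlsplit(uri).path)  # /reports/q1
print(parse_s3_uri(uri))   # ('my-bucket', 'reports/q1?draft#2.csv')
```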
This release rolls back support for transparently decompressing .xz files, previously introduced in 1.8.1. This is a useful feature, but it requires a tricky dependency. It's still possible to handle .xz files with relatively little effort. Please see the README.rst file for details.
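As we understand the README, the recommended approach is to write a small handler that wraps an already-open binary file object in `lzma.LZMAFile`, then register it for the `.xz` extension with smart_open's `register_compressor(extension, callback)`. Check the README for your version before relying on this. The sketch below shows only the handler (the name `handle_xz` is hypothetical) and exercises it on an in-memory buffer, so it runs without smart_open installed. The registration call is described here, not executed.

```python
import io
import lzma


def handle_xz(file_obj, mode):
    """Hypothetical handler: route reads/writes on `file_obj` through XZ (de)compression."""
    # LZMAFile accepts an existing file object; closing the LZMAFile leaves file_obj open.
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


# Round trip through an in-memory buffer standing in for a file that smart_open would open.
buf = io.BytesIO()
with handle_xz(buf, "wb") as fout:
    fout.write(b"hello xz")
buf.seek(0)
with handle_xz(buf, "rb") as fin:
    print(fin.read())  # b'hello xz'
```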
- Added support for .xz / lzma (PR #262, @vmarkovtsev)
- Added streaming HTTP support (PR #236, @handsomezebra)
- Fix handling of "+" mode, refactor tests (PR #263, @vmarkovtsev)
- Added support for SSH/SCP/SFTP (PR #58, @val314159 & @mpenkov)
- Added new feature: compressor registry (PR #266, @mpenkov)
- Implemented new `smart_open.open` function (PR #268, @mpenkov)
This new function replaces `smart_open.smart_open`, which is now deprecated.
Main differences:
- `ignore_extension` → `ignore_ext`
- new `transport_params` dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc.).
Main advantages of the new function:
- Simpler interface for the user, fewer parameters
- Greater API flexibility: adding additional keyword arguments will no longer require updating the top-level interface
- Better documentation for keyword parameters (previously, they were documented via examples only)
The old `smart_open.smart_open` function is deprecated, but continues to work as before.
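The design point behind `transport_params` is that a single dict is forwarded to the transport layer, so new transport options do not change the top-level signature. The toy dispatcher below illustrates that idea. Everything in it (`toy_open`, `TRANSPORTS`, the `mem://` scheme and its `initial` option) is hypothetical and is not smart_open's code. The real call has the shape `smart_open.open(uri, mode, transport_params={...})`.

```python
import io
from typing import Any, Callable, Dict, Optional


def _open_memory(path: str, mode: str, initial: bytes = b"") -> io.BytesIO:
    # Hypothetical in-memory transport; `mode` is ignored and the only option is `initial`.
    return io.BytesIO(initial)


def _open_local(path: str, mode: str, buffering: int = -1):
    # Local files: the option is forwarded to the built-in open().
    return open(path, mode, buffering=buffering)


# Hypothetical scheme -> opener registry; each opener declares its own keyword options.
TRANSPORTS: Dict[str, Callable[..., Any]] = {"mem": _open_memory, "file": _open_local}


def toy_open(uri: str, mode: str = "rb", transport_params: Optional[Dict[str, Any]] = None):
    scheme, sep, path = uri.partition("://")
    if not sep:
        scheme, path = "file", uri  # no scheme: treat as a local path
    opener = TRANSPORTS[scheme]
    # Transport-specific options travel in one dict, so adding an option to one
    # transport never changes toy_open's own signature.
    return opener(path, mode, **(transport_params or {}))


with toy_open("mem://scratch", transport_params={"initial": b"abc"}) as f:
    print(f.read())  # b'abc'
```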
- Add `python3.7` support (PR #240, @menshikh-iv)
- Add `http/https` schema correctly (PR #242, @gliv)
- Fix URL parsing for `S3` (PR #235, @rileypeterson)
- Clean up `_parse_uri_s3x`, resolve edge cases (PR #237, @mpenkov)
- Handle leading slash in local path edge case (PR #238, @mpenkov)
- Roll back README changes (PR #239, @mpenkov)
- Add example how to work with Digital Ocean spaces and boto profile (PR #248, @navado & @mpenkov)
- Fix boto failing to load GCE plugin (PR #255, @menshikh-iv)
- Drop deprecated `sudo` from travis config (PR #256, @cclauss)
- Raise `ValueError` if s3 key does not exist (PR #245, @adrpar)
- Ensure `_list_bucket` uses continuation token for subsequent pages (PR #246, @tcsavage)
- Unpin boto/botocore for regular installation. Fix #227 (PR #232, @menshikh-iv)
- Drop support for `python3.3` and `python3.4` & workaround for broken `moto` (PR #225, @menshikh-iv)
- Add `s3a://` support for `S3`. Fix #210 (PR #229, @mpenkov)
- Allow use of `@` in object (key) names for `S3`. Fix #94 (PRs #204 & #224, @dkasyanov & @mpenkov)
- Make `close` idempotent & add dummy `flush` for `S3` (PR #212, @mpenkov)
- Use built-in `open` whenever possible. Fix #207 (PR #208, @mpenkov)
- Fix undefined name `uri` in `smart_open_lib.py`. Fix #213 (PR #214, @cclauss)
- Fix new unittests from #212 (PR #219, @mpenkov)
- Reorganize README & make examples py2/py3 compatible (PR #211, @piskvorky)
- Migrate to `boto3`. Fix #43 (PR #164, @mpenkov)
- Refactoring smart_open to share compression and encoding functionality (PR #185, @mpenkov)
- Drop `python2.6` compatibility. Fix #156 (PR #192, @mpenkov)
- Accept a custom `boto3.Session` instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
- Accept `multipart_upload` parameters (supports ServerSideEncryption) for `S3`. Fix (PR #202, @eschwartz)
- Add support for `pathlib.Path`. Fix #170 (PR #175, @clintval)
- Fix performance regression using local file-system. Fix #184 (PR #190, @mpenkov)
- Replace `ParsedUri` class with functions, cleanup internal argument parsing (PR #191, @mpenkov)
- Handle edge case (read 0 bytes) in read function. Fix #171 (PR #193, @mpenkov)
- Fix bug with changing `f._current_pos` when calling `f.readline()` (PR #182, @inksink)
- Close the old body explicitly after `seek` for `S3`. Fix #187 (PR #188, @inksink)
- Fix author/maintainer fields in `setup.py`, avoid bug from `setuptools==39.0.0` and add workaround for `botocore` and `python==3.3`. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)
- Improve S3 read performance. Fix #152 (PR #157, @mpenkov)
- Add integration testing + benchmark with real S3. Partial fix #151, #156 (PR #158, @menshikh-iv & @mpenkov)
- Disable integration testing if secure vars aren't defined (PR #157, @menshikh-iv)
- Add native .gz support for HDFS (PR #128, @yupbank)
- Drop python2.6 support + fix style (PR #137, @menshikh-iv)
- Create separate compression-specific layer. Fix #91 (PR #131, @mpenkov)
- Fix ResourceWarnings + replace deprecated assertEquals (PR #140, @horpto)
- Add encoding parameter to smart_open. Fix #142 (PR #143, @mpenkov)
- Add encoding tests for readers. Fix #145, partial fix #146 (PR #147, @mpenkov)
- Fix file mode for updating case (PR #150, @menshikh-iv)
- Remove GET parameters from url. Fix #120 (PR #121, @mcrowson)
- Enable compressed formats over http. Avoid filehandle leak. Fix #109 and #110. (PR #112, @robottwo)
- Make it possible to change the number of retries (PR #102, @shaform)
- Bugfix for compressed formats (PR #110, @tmylk)
- HTTP/HTTPS read support w/ Kerberos (PR #107, @robottwo)
- HdfsOpenWrite implementation similar to read (PR #106, @skibaa)
- Support custom S3 server host, port, ssl. (PR #101, @robottwo)
- Add retry around `s3_iter_bucket_process_key` to address S3 Read Timeout errors. (PR #96, @bbbco)
- Include test data in sdist + install them. (PR #105, @cournape)
- Fix #92. Allow hash in filename (PR #93, @tmylk)
- Relative path support (PR #73, @yupbank)
- Move gzipstream module to smart_open package (PR #81, @mpenkov)
- Ensure reader objects never return None (PR #81, @mpenkov)
- Ensure read functions never return more bytes than asked for (PR #84, @mpenkov)
- Add support for reading gzipped objects until EOF, e.g. read() (PR #81, @mpenkov)
- Add missing parameter to read_from_buffer call (PR #84, @mpenkov)
- Add unit tests for gzipstream (PR #84, @mpenkov)
- Bundle gzipstream to enable streaming of gzipped content from S3 (PR #73, @mpenkov)
- Update gzipstream to avoid deep recursion (PR #73, @mpenkov)
- Implemented readline for S3 (PR #73, @mpenkov)
- Added pip requirements.txt (PR #73, @mpenkov)
- Invert NO_MULTIPROCESSING flag (PR #79, @Janrain-Colin)
- Add ability to add query to webhdfs uri. (PR #78, @ellimilial)
- Accept an instance of boto.s3.key.Key to smart_open (PR #38, @asieira)
- Allow passing `encrypt_key` and other parameters to `initiate_multipart_upload` (PR #63, @asieira)
- Allow passing boto `host` and `profile_name` to smart_open (PR #71 #68, @robcowie)
- Write an empty key to S3 even if nothing is written to S3OpenWrite (PR #61, @petedmarsh)
- Support `LC_ALL=C` environment variable setup (PR #40, @nikicc)
- Python 3.5 support
- Bug fix release to enable 'wb+' file mode (PR #50)
- Disable multiprocessing if unavailable. Allows running on Google Compute Engine. (PR #41, @nikicc)
- Httpretty updated to allow LC_ALL=C locale config. (PR #39, @jsphpl)
- Accept an instance of boto.s3.key.Key (PR #38, @asieira)
- WebHDFS read/write (PR #29, @ziky90)
- re-upload last S3 chunk in failed upload (PR #20, @andreycizov)
- return the entire key in s3_iter_bucket instead of only the key name (PR #22, @salilb)
- pass optional keywords on S3 write (PR #30, @val314159)
- Make smart_open a no-op if passed a file-like object with a read attribute (PR #32, @gojomo)
- various improvements to testing (PR #30, @val314159)
- support for multistream bzip files (PR #9, @pombredanne)
- introduce this CHANGELOG
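Several entries above (PR #81, PR #84, and Fix #171) define a read contract. Readers never return `None`, never return more bytes than requested, and handle `read(0)`. This matches the convention of `io.BufferedIOBase.read`, which returns an empty bytes object at EOF. The sketch below is a minimal reader that honours that contract over an iterable of byte chunks. The `ChunkedReader` class is hypothetical and is not smart_open's gzipstream or S3 reader.

```python
from typing import Iterable


class ChunkedReader:
    """Hypothetical reader over a stream of byte chunks (e.g. successive response bodies).

    read(size) returns at most `size` bytes, read(0) returns b"", read() drains
    everything, and EOF is signalled by b"", never None.
    """

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = iter(chunks)
        self._buffer = bytearray()
        self._eof = False

    def _fill(self, size: int) -> None:
        # Pull chunks until the buffer holds `size` bytes (or everything, if size < 0).
        while not self._eof and (size < 0 or len(self._buffer) < size):
            chunk = next(self._chunks, None)
            if chunk is None:
                self._eof = True
            else:
                self._buffer.extend(chunk)

    def read(self, size: int = -1) -> bytes:
        if size == 0:
            return b""
        self._fill(size)
        if size < 0:
            size = len(self._buffer)
        # Hand back at most `size` bytes; any surplus from the last chunk stays buffered.
        result = bytes(self._buffer[:size])
        del self._buffer[:size]
        return result


reader = ChunkedReader([b"hello ", b"wor", b"ld"])
print(reader.read(4), reader.read(5), reader.read(), reader.read(10))
```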