v0.3.0 Rusty Lances and Friendly Neighbors
Sayonara C++, bonjour Rust
What started out as a holiday hack has become a full-blown Rust rewrite.
As we say farewell to our much beloved C++ implementation, we welcome a major new feature to Lance: the vector index.
- Lance's vector index is fast and has a small memory footprint. Querying from disk, we measured average latencies of 1 ms on vanilla MacBook Airs for 1M vectors.
- Your data, vectors, and index can live in harmony under one roof so you don't need to manage a separate index or service.
- You can store and retrieve additional features alongside the vectors with very little performance impact.
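To make the last two points concrete, here is a minimal sketch of a flat (brute-force) nearest-neighbor search using L2 distance, where ordinary columns are stored next to each vector and returned together with the matches. It is plain Python for illustration only, not Lance's API or implementation; `rows`, `l2_distance`, and `flat_search` are hypothetical names, and the sample data is made up.

```python
import heapq
import math

# Hypothetical in-memory rows: each row keeps its vector next to ordinary columns.
rows = [
    {"id": 0, "title": "intro", "vector": [0.0, 0.0]},
    {"id": 1, "title": "verse", "vector": [1.0, 0.0]},
    {"id": 2, "title": "chorus", "vector": [0.0, 2.0]},
    {"id": 3, "title": "outro", "vector": [3.0, 3.0]},
]


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(rows, query, k, columns=("id",)):
    """Brute-force k-nearest-neighbor search that also returns the requested columns."""
    # Score every row against the query; a real index would avoid the full scan.
    scored = ((l2_distance(row["vector"], query), row) for row in rows)
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    return [
        {**{c: row[c] for c in columns}, "distance": dist}
        for dist, row in nearest
    ]


results = flat_search(rows, query=[0.9, 0.1], k=2, columns=("id", "title"))
for hit in results:
    print(hit)
```

Because the extra columns sit in the same rows as the vectors, the search result already carries them and no second lookup against a separate store is needed. An approximate index such as IVF_PQ would replace the full scan in `flat_search`, while the shape of the result stays the same.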
What's Changed
- Only increase cursor if the file write succeeds by @eddyxu in #435
- GHA to add python 3.11 (and upgrade to duckdb 0.6.1) by @changhiskhan in #434
- ScannerStream accepts early stop by @eddyxu in #437
- upgrade arrow-rs to 31.0 by @eddyxu in #438
- L2 distance by @eddyxu in #439
- Create DataFragment and DataFile during Dataset write process by @eddyxu in #440
- Rust Dataset Write API by @eddyxu in #441
- [Rust] Read Partially from a plain encoded batch by @eddyxu in #443
- Get range in var-binary encoding by @eddyxu in #444
- Productionize Flat Index by @eddyxu in #442
- Make Scan an ExecNode by @eddyxu in #445
- Take record by Row ID by @eddyxu in #446
- Implement Take for dictionary decoder. by @eddyxu in #447
- Merge two RecordBatch by @eddyxu in #449
- Integrate flat index by @eddyxu in #448
- Support limit offset as ExecNode by @changhiskhan in #450
- Read IVF_PQ index by @eddyxu in #451
- Cli to operate on dataset indices by @eddyxu in #452
- [RUST] python (re)integration v1 by @changhiskhan in #436
- Support writing dictionary values (at the dataset level). by @eddyxu in #454
- Replace ObjectReader as a pub trait. by @eddyxu in #459
- [Rust] Implement LocalObjectReader that holds an open file to improve performance. by @eddyxu in #460
- inherit from pyarrow Dataset/Scanner by @changhiskhan in #462
- [RUST] Flat index benchmark by @eddyxu in #461
- Generate spotify dataset with embeddings. by @eddyxu in #453
- Fix pylance typo and float32 array conversion. by @eddyxu in #463
- Write index metadata with a new version by @eddyxu in #466
- [rust] fix projection in Dataset:take_rows by @changhiskhan in #464
- blas feature flag by @changhiskhan in #467
- Sift dataset generation by @eddyxu in #472
- Improve scan perf by re-enabling prefetching in ScanNode by @eddyxu in #473
- Changhiskhan/new docs by @changhiskhan in #474
- Fix AVX and NEON L2 distance computation. by @eddyxu in #476
- add recall metric computation by @changhiskhan in #475
- Fix reader assertion on manifest buffer size by @eddyxu in #478
- [Rust] Minimal dataset append support by @eddyxu in #482
- Pass nprobes parameter from python by @changhiskhan in #480
- add a test_dataset function to compute the recall for lance by @changhiskhan in #479
- Split sparse index read into chunks based on optimal I/O size for the media by @eddyxu in #483
- Fix codespace prebuild by @eddyxu in #485
- Make ObjectReader prefetch size configurable by @eddyxu in #486
- Add a refine stage for vector search by @eddyxu in #488
- add nprobes as parameter to benchmark by @changhiskhan in #484
- refine factor by @changhiskhan in #489
- Use ordered buffer in plain decoder by @eddyxu in #493
- New rust+pyo3 based pylance by @eddyxu in #494
- Fast count rows by @eddyxu in #490
- Count rows in python dataset, and setup GHA again by @eddyxu in #495
- Sayonara C++ by @eddyxu in #497
- [Rust] Dataset Overwrite, and Version Checkout by @eddyxu in #496
- Load S3 credentials using default credentials chain by @eddyxu in #498
- Fix doc build by @eddyxu in #499
- File format spec by @eddyxu in #500
- Doc build fix by @eddyxu in #501
- Schema evolution document by @eddyxu in #503
- update the python readme for pypi by @changhiskhan in #504
- Handle null strings for both cases where nullability is set or not. by @eddyxu in #509
- update main github readme by @changhiskhan in #508
- [python] write_dataset returns new dataset by @changhiskhan in #517
- Changhiskhan/list versions by @changhiskhan in #516
- Refine Factor is None by default by @eddyxu in #518
Full Changelog: v0.2.9...v0.3.0