Skip to content

Commit

Permalink
Add migration from pgvecto.rs to VectorChord
Browse files Browse the repository at this point in the history
Signed-off-by: cutecutecat <[email protected]>
  • Loading branch information
cutecutecat committed Dec 23, 2024
1 parent 5f83fc3 commit 5766fb5
Show file tree
Hide file tree
Showing 2 changed files with 202 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .vitepress/config.mts
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,13 @@ export default defineConfig({
{ text: 'Indexing', link: '/vectorchord/usage/indexing' },
]
},
{
text: 'Administration',
collapsed: false,
items: [
{ text: 'Migration', link: '/vectorchord/admin/migration' },
]
},
],
'/cloud/':
[
Expand Down
195 changes: 195 additions & 0 deletions src/vectorchord/admin/migration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
---
outline: [2,3]
---
# VectorChord Migration Overview

## From PGVecto.rs

Most of VectorChord's vector types are supported by `pgvector`.
Therefore, migrating from PGVecto.rs to VectorChord is a simple inversion of [migrating from pgvector to PGVecto.rs](../../admin/migration).

### Prerequisites

Let's suppose that you have a table with a column of vector type, and you have `PGVecto.rs` installed. A simple example would be:

```sql
CREATE TABLE table_vector (id bigserial PRIMARY KEY, vec_f32 vectors.vector(3));
CREATE TABLE table_float16 (id bigserial PRIMARY KEY, vec_f16 vectors.vecf16(3));
CREATE TABLE table_binary (id bigserial PRIMARY KEY, vec_bits vectors.bvector(3));

```

In addition, some rows will be inserted into the table, as follows:

```sql
INSERT INTO table_vector (vec_f32) VALUES ('[0.1, 0.1, 0.1]'), ('[0.2, 0.2, 0.2]');
INSERT INTO table_float16 (vec_f16) VALUES ('[0.1, 0.1, 0.1]'), ('[-0.2, 0.2, 0.2]');
INSERT INTO table_binary (vec_bits) VALUES ('[0, 0, 0]'), ('[0, 0, 1]');
```

In most cases, there will also be one or more vector indexes:

```sql
CREATE INDEX index_vector ON table_vector USING vectors (vec_f32 vectors.vector_l2_ops) WITH (options = "[indexing.hnsw]");
-- CREATE INDEX index_float16 ON table_float16 USING vectors (vec_f16 vectors.vecf16_cos_ops) WITH (options = "[indexing.hnsw]");
-- CREATE INDEX index_binary ON table_binary USING vectors (vec_bits vectors.bvector_hamming_ops) WITH (options = "[indexing.hnsw]");
```

This is a typical scenario when `PGVecto.rs` is used. If the following requirements are met, the migration can be initiated.

- Please note that your `pgvector` and `VectorChord` extension is not installed in the `vectors` schema. This means that you can install `pgvector` and `PGVecto.rs` at the same time.

### Steps to Migration

#### 1. Install VectorChord

Please follow the [instructions](../getting-started/installation) provided to install `VectorChord` and ensure that the extension is loaded by PostgreSQL.

```sql
-- It will install vectorchord and pgvector at public schame
CREATE EXTENSION vchord CASCADE;
```

The two extensions `VectorChord` and `PGVecto.rs` are now installed on PostgreSQL, but inside different schemas.

To validate the install, please run `\dx` in psql. You will see that there are 3 extensions:
- `vectors` for `PGVecto.rs` at schema vectors
- `vector` for `pgvector` at schema public
- `vchord` for `VectorChord` at schema public

```shell
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+----------------------------------------------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vchord | 0.1.0 | public | vchord: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
vector | 0.8.0 | public | vector data type and ivfflat and hnsw access methods
vectors | 0.4.0 | vectors | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
```
#### 2. Check exist indexes
For indexes created at vector columns, please retrieve the definition and record it in a convenient location.
This information is necessary for the recovery of the indexes at a later stage.
```sql
postgres=# \di
List of relations
Schema | Name | Type | Owner | Table
--------+--------------------+-------+----------+---------------
public | index_float16 | index | postgres | table_float16
public | index_vector | index | postgres | table_vector
public | table_binary_pkey | index | postgres | table_binary
public | table_float16_pkey | index | postgres | table_float16
public | table_vector_pkey | index | postgres | table_vector
```
#### 3. Migrate vector columns
To migrate from a `PGVecto.rs` vector column, the first step is to cast column datatypes between extensions.
Here provides the conversion path for each vector type of `VectorChord`:
- `vector(PGVecto.rs) -> real[] -> vector(pgvector)`
- `vecf16 -> vector(PGVecto.rs) -> real[] -> halfvec`
- `bvector -> vector(PGVecto.rs) -> text[] -> real[] -> bit`
- `svector -> text -> index++ -> text -> sparsevec`
For sparse vector type `svector`, we need additional helper function to finish the casting, see the [appendix](#appendix-about-cast-of-sparse-vector).
The migration will be completed in one command and no additional memory is required.
However, you need to delete the index on the vector column in advance.
::: warning
This step will cause some downtime, due to:
- a long `ACCESS EXCLUSIVE` lock by `ALTER TABLE ALTER COLUMN`
:::
```sql
-- The name of index can be found in step 2
DROP INDEX IF EXISTS index_vector;
-- From vector type
ALTER TABLE table_vector ALTER COLUMN vec_f32 TYPE public.vector USING vec_f32::real[]::public.vector;
-- From halfvec type
ALTER TABLE table_float16 ALTER COLUMN vec_f16 TYPE public.halfvec(3) USING vec_f16::vectors.vector::real[]::public.halfvec;
-- For bit type
ALTER TABLE table_binary ALTER COLUMN vec_bits TYPE bit(3) USING array_to_string(vec_bits::vectors.vector::real[], '')::bit(3);
```
#### 4. Recreate index
The next step is to recreate index on `vector` columns for `PGVecto.rs`.
```sql
CREATE INDEX ON index_vector USING vchordrq (vec_f32 vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);
```
For more information of create index at `VectorChord`, see [indexing](../usage/indexing).
#### 5. Clean the environment
You may now safely remove the `PGVecto.rs` extension if it is no longer required.
```
DROP extension vectors;
```
### Appendix: cast of sparse vector
We have discussed most types, but not the sparse vector. It is a little more complicated than other types. Suppose we have a table with a sparse vector column:
```sql
CREATE TABLE table_sparse (id bigserial PRIMARY KEY, vec_sparse vectors.svector(3));
INSERT INTO table_sparse (vec_sparse) VALUES ('{0:-0.1, 1:0.1, 2:0.1}/3'), ('{0:-0.4, 1:-0.4, 2:-0.4}/3');
```
Sparse vectors have a different text representation between `PGVecto.rs` and `pgvector`: The index starts from 0 in `PGVecto.rs` and from 1 in `pgvector`.
For example, a typical sparse vector `[-1, 0, 1, 0, 2]` has a text representation of:
- `{0:-1, 2:1, 4:2}/5` in `PGVecto.rs`
- `{1:-1, 3:1, 5:2}/5` in `pgvector`
So we need a helper function to handle this change by processing the string:
```sql
CREATE OR REPLACE FUNCTION migrate_sparse(input_text text) RETURNS text AS $$
DECLARE
output_text text := '';
cache_index_number text := '';
number_started boolean := false;
BEGIN
FOR i IN 1..length(input_text) LOOP
IF substring(input_text from i for 1) = ':' THEN
output_text := output_text || (cache_index_number::int + 1)::text || ':';
number_started := false;
ELSIF substring(input_text from i for 1) ~ '[,{]' THEN
cache_index_number := '';
number_started := true;
output_text := output_text || substring(input_text from i for 1);
ELSIF substring(input_text from i for 1) ~ '[0-9]' THEN
IF number_started THEN
cache_index_number := cache_index_number || substring(input_text from i for 1);
ELSE
output_text := output_text || substring(input_text from i for 1);
END IF;
ELSE
output_text := output_text || substring(input_text from i for 1);
END IF;
END LOOP;
RETURN output_text;
END;
$$ LANGUAGE plpgsql;
```
After that, the migration can continue smoothly:
```sql
ALTER TABLE table_sparse ALTER COLUMN vec_sparse TYPE public.sparsevec USING migrate_sparse(vec_sparse::text)::public.sparsevec;
```

0 comments on commit 5766fb5

Please sign in to comment.