Update pgvector loading method to use binary format (#488)
Particularly on large vector types, the pgvector module was spending
significant time on converting floating point values to ASCII before
being transmitted to the PostgreSQL server. This change keeps the
format in binary, reducing overhead. One test demonstrated a 63%
reduction in load time, which would have an impact on the overall
"build" time as reported by this benchmark.
jkatz authored Feb 29, 2024
1 parent 77113e0 commit c091271
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion ann_benchmarks/algorithms/pgvector/module.py
@@ -30,7 +30,8 @@ def fit(self, X):
 cur.execute("CREATE TABLE items (id int, embedding vector(%d))" % X.shape[1])
 cur.execute("ALTER TABLE items ALTER COLUMN embedding SET STORAGE PLAIN")
 print("copying data...")
-with cur.copy("COPY items (id, embedding) FROM STDIN") as copy:
+with cur.copy("COPY items (id, embedding) FROM STDIN WITH (FORMAT BINARY)") as copy:
+    copy.set_types(["int4", "vector"])
     for i, embedding in enumerate(X):
         copy.write_row((i, embedding))
 print("creating index...")
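
As a rough illustration of what the commit message describes, the sketch below contrasts rendering a vector as ASCII text with packing it directly into bytes. It is not the code path psycopg or pgvector actually runs: the function names are hypothetical, and the binary layout (two unsigned 16-bit header fields for the dimension and a reserved value, followed by big-endian float32 values) is an assumption about pgvector's wire format. It needs no database connection.

```python
import struct


def encode_text(values):
    """Text form: every float is rendered as ASCII, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def encode_binary(values):
    """Assumed binary form: uint16 dimension, uint16 reserved (0),
    then each element as a big-endian float32. No string formatting occurs."""
    header = struct.pack(">HH", len(values), 0)
    body = struct.pack(f">{len(values)}f", *values)
    return header + body


def decode_binary(payload):
    """Inverse of encode_binary, used here only to check the round trip."""
    dim, _reserved = struct.unpack_from(">HH", payload, 0)
    return list(struct.unpack_from(f">{dim}f", payload, 4))


vec = [0.1234567, -2.5, 3.0]
text = encode_text(vec)
binary = encode_binary(vec)
print("text bytes:", len(text.encode("ascii")), "binary bytes:", len(binary))
```

The binary payload has a fixed size of 4 + 4 × dimension bytes, and the float-to-string step disappears entirely, which is the overhead the commit message attributes the load-time reduction to.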