feat(api): adds migration for tracking vector store indexing status #830

Merged
merged 116 commits into from
Sep 5, 2024
Changes from 72 commits
1fead30
Adds migration for tracking vector store indexing status
CollectiveUnicorn Jul 24, 2024
8504585
Removes enforcement of status
CollectiveUnicorn Jul 24, 2024
7c87604
Simplifies sql query to focus on vector_store_file
CollectiveUnicorn Jul 24, 2024
0f4125a
Pulls supabase-realtime image from official instead of bitnami
CollectiveUnicorn Jul 25, 2024
53e44a5
Reverts supabase realtime image replacement
CollectiveUnicorn Jul 25, 2024
f652a16
Adds separate supabase realtime config for websocket/api
CollectiveUnicorn Aug 2, 2024
4b3fd69
Updates the default config with necessary vars
CollectiveUnicorn Aug 2, 2024
f6d4166
Adds init container placeholder for seeding tables via curl
CollectiveUnicorn Aug 2, 2024
775339b
Moves configmap changes to extra configs
CollectiveUnicorn Aug 2, 2024
e6d5394
Renames variable for compatibility with supabase version
CollectiveUnicorn Aug 2, 2024
427ffce
Ensures configmap is mounted in container
CollectiveUnicorn Aug 2, 2024
7ba8566
Specify the variables manually
CollectiveUnicorn Aug 2, 2024
62a2793
Fixes env var typo
CollectiveUnicorn Aug 2, 2024
86f0b4b
Adds script to seed the realtime instance
CollectiveUnicorn Aug 2, 2024
4d9bbd9
Moves script somewhere with permissions
CollectiveUnicorn Aug 2, 2024
39cc4d5
Moves script to values file
CollectiveUnicorn Aug 2, 2024
dca8c65
Updates url and var for realtime init container
CollectiveUnicorn Aug 2, 2024
c254ee3
Replace initContainer with sidecars
CollectiveUnicorn Aug 2, 2024
811cefb
Adds wait for realtime container
CollectiveUnicorn Aug 2, 2024
1ec9ad8
Replaces os-shell with busybox
CollectiveUnicorn Aug 2, 2024
2829b41
Replaces busybox container with a curl container
CollectiveUnicorn Aug 2, 2024
4a30a4b
Replaces nc wait with curl wait
CollectiveUnicorn Aug 2, 2024
ffa7ffa
Try to replace how the realtime seed is happening
CollectiveUnicorn Aug 5, 2024
a8fe333
Merge branch 'main' into fix-supabase-realtime
CollectiveUnicorn Aug 5, 2024
3677383
Replaces job with inline curl call
CollectiveUnicorn Aug 5, 2024
395454c
Replaces wget commands with curl
CollectiveUnicorn Aug 5, 2024
f99eda0
Replaces invalid health endpoint on realtime
CollectiveUnicorn Aug 5, 2024
0327095
Moves the curl command to one line
CollectiveUnicorn Aug 5, 2024
18fccac
Updates versions of supabase-realtime
CollectiveUnicorn Aug 5, 2024
5776532
Sets the ENC_KEY to a default value of 16 chars
CollectiveUnicorn Aug 5, 2024
294ec62
Makes the sidecar sleep after a successful run to prevent crashbackloops
CollectiveUnicorn Aug 5, 2024
b7cb54c
Adds notify private alpha
CollectiveUnicorn Aug 7, 2024
27b96e3
Replaces DB_ENC_KEY with JWT_SECRET
CollectiveUnicorn Aug 7, 2024
a2932a4
Updates the JWT_SECRET to be correct
CollectiveUnicorn Aug 7, 2024
cc85394
Removes unnecessary quotes that interfere with SSL configs
CollectiveUnicorn Aug 7, 2024
a33a3ba
Replaces sidecar based realtime tenant initialization with init args
CollectiveUnicorn Aug 8, 2024
3882e15
Adds back necessary quotes
CollectiveUnicorn Aug 8, 2024
5296c66
Adds migration to update the freshly seeded db
CollectiveUnicorn Aug 8, 2024
673c07a
Randomly generates enc_secret instead of hard coding it
CollectiveUnicorn Aug 8, 2024
537239e
Replaces secret name to prevent deployment errors
CollectiveUnicorn Aug 8, 2024
92c314f
Creates _realtime schema and sets up deployment to use it
CollectiveUnicorn Aug 8, 2024
5b055e5
Set the search path back to public
CollectiveUnicorn Aug 8, 2024
c04adf6
Initialize _realtime table in after connect query
CollectiveUnicorn Aug 8, 2024
87e8a52
Replaces multiple queries with a single query
CollectiveUnicorn Aug 8, 2024
a5cbb4b
Updates the migration to represent the new location
CollectiveUnicorn Aug 8, 2024
bcc000d
Swaps out postgres image for one with wal2json
CollectiveUnicorn Aug 8, 2024
3cc1bc1
Moves back to older postgres image
CollectiveUnicorn Aug 8, 2024
c57facf
Switch to official supabase postgres image
CollectiveUnicorn Aug 8, 2024
1c083e8
Returns to using a base postgres image
CollectiveUnicorn Aug 8, 2024
11888cb
Updates the config to set the wal_level to logical
CollectiveUnicorn Aug 8, 2024
ce97caf
Merge branch 'main' into fix-supabase-realtime
CollectiveUnicorn Aug 8, 2024
30b8ba2
Merge branch 'fix-supabase-realtime' into 788-featapi-vector-store-in…
CollectiveUnicorn Aug 9, 2024
79e09a2
Removes explicit install of requests dep and adds new realtime dep
CollectiveUnicorn Aug 13, 2024
f682926
Reverts removal of requests dependency in favor of doing it in a sepa…
CollectiveUnicorn Aug 13, 2024
6a713ad
Installs specific realtime version and switches how the tests listens…
CollectiveUnicorn Aug 13, 2024
351488a
Adds complete test to check whether realtime is working
CollectiveUnicorn Aug 14, 2024
d084c72
Ruff linting
CollectiveUnicorn Aug 14, 2024
3d57f6d
Adds fastapi to dev deps list
CollectiveUnicorn Aug 14, 2024
8532be5
Fixes linting issue
CollectiveUnicorn Aug 14, 2024
cfba262
Updates comments
CollectiveUnicorn Aug 14, 2024
33ef8fa
Adds missing dependency
CollectiveUnicorn Aug 14, 2024
49cf899
Merge branch 'main' into 788-featapi-vector-store-indexing-status
CollectiveUnicorn Aug 21, 2024
b1955cd
Pin supabase version at root to 2.6.0
CollectiveUnicorn Aug 22, 2024
568b4c9
Add background task for processing vectors
CollectiveUnicorn Aug 22, 2024
2d6c566
Create usable placeholder and refactor creation logic into indexing file
CollectiveUnicorn Aug 23, 2024
58e8193
Ruff linting
CollectiveUnicorn Aug 23, 2024
b1d91c4
Updates modify vector store to use background task when files are pro…
CollectiveUnicorn Aug 23, 2024
20dca4a
Ruff linting
CollectiveUnicorn Aug 23, 2024
e0c472a
Ruff linting
CollectiveUnicorn Aug 23, 2024
00e6e61
Removes hard dependency on FastAPI for indexing, cleans up comments
CollectiveUnicorn Aug 26, 2024
d8b949c
Adds initial test
CollectiveUnicorn Aug 26, 2024
bfb71ed
Removes unnecessary import
CollectiveUnicorn Aug 26, 2024
fd2a2c3
Ruff linting
CollectiveUnicorn Aug 26, 2024
32fa43e
Moves deletion test to end of test_vector_stores
CollectiveUnicorn Aug 26, 2024
8e6e403
Updates content payload
CollectiveUnicorn Aug 26, 2024
a0d8044
Add messages client to test
CollectiveUnicorn Aug 26, 2024
ae554b2
Replaces more incorrect clients
CollectiveUnicorn Aug 26, 2024
f38f2cf
Switch from llama-cpp-python to test-chat
CollectiveUnicorn Aug 26, 2024
605a06c
Moves vector stores e2e test into test_api.py
CollectiveUnicorn Aug 26, 2024
8011c22
Reverts test_vector_stores.py to match main
CollectiveUnicorn Aug 26, 2024
c2b1c05
Makes test handle concurrency better, allow for re-using creds
CollectiveUnicorn Aug 27, 2024
fc38a8e
Updates text-embeddings to run on GPU
CollectiveUnicorn Aug 27, 2024
c5c580f
Updates text-embeddings to run on GPU
CollectiveUnicorn Aug 27, 2024
d949f74
Replaces gpu with cuda
CollectiveUnicorn Aug 27, 2024
03186f1
Removes code specifying the gpu device
CollectiveUnicorn Aug 27, 2024
44f5578
Bumps wait time up to 10 min
CollectiveUnicorn Aug 27, 2024
e7cbc2d
Reduces wait time down to 3 min and swaps out pdf
CollectiveUnicorn Aug 27, 2024
143933c
Adds fastapi to project root for test
CollectiveUnicorn Aug 27, 2024
42561ff
Merge branch 'main' into background-tasks
CollectiveUnicorn Aug 27, 2024
dbeadc6
Moves test into new e2e that has multiple backends
CollectiveUnicorn Aug 27, 2024
0324f02
Bumps migration name to latest
CollectiveUnicorn Aug 27, 2024
8bb0166
Fixes typo
CollectiveUnicorn Aug 27, 2024
d952c6a
Adds comment for service key
CollectiveUnicorn Aug 27, 2024
d6439bf
Merge branch 'main' into 788-featapi-vector-store-indexing-status
CollectiveUnicorn Aug 27, 2024
18dc5e7
Changes updated_at precision to ms
CollectiveUnicorn Aug 27, 2024
68a29c2
Bumps version to 0.11.1
CollectiveUnicorn Aug 28, 2024
517bb75
Resolves incorrect usage_bytes comment
CollectiveUnicorn Aug 28, 2024
17f1a3a
Reverts test back to match main
CollectiveUnicorn Aug 28, 2024
50fc526
Change how the timestamp is generated
CollectiveUnicorn Aug 28, 2024
1a60c9a
Lets the db set the created_at time for the test
CollectiveUnicorn Aug 28, 2024
5ad9563
Returns manual uuid creation
CollectiveUnicorn Aug 28, 2024
187d653
Merge branch 'main' into background-tasks
CollectiveUnicorn Aug 29, 2024
360fab1
Merge branch 'main' into 788-featapi-vector-store-indexing-status
CollectiveUnicorn Aug 29, 2024
a1adcfd
Merge branch 'background-tasks' into 788-featapi-vector-store-indexin…
CollectiveUnicorn Aug 29, 2024
059ab1a
Updates ids to be blank and allow db to generate
CollectiveUnicorn Aug 31, 2024
9df884d
Merge branch 'main' into background-tasks
CollectiveUnicorn Aug 31, 2024
9f6ece2
Switch to test fixture for openai client
CollectiveUnicorn Aug 31, 2024
928e0f0
Merge branch 'background-tasks' into 788-featapi-vector-store-indexin…
CollectiveUnicorn Sep 3, 2024
f08bd38
Merge branch 'main' into background-tasks
CollectiveUnicorn Sep 3, 2024
ebb2ed2
Adds logging and cleanup operations for failed indexing
CollectiveUnicorn Sep 3, 2024
53d3b2f
Merge branch 'background-tasks' into 788-featapi-vector-store-indexin…
CollectiveUnicorn Sep 3, 2024
7eea9d8
Fixes typo
CollectiveUnicorn Sep 3, 2024
4f1dbcd
Merge branch 'main' into 788-featapi-vector-store-indexing-status
CollectiveUnicorn Sep 3, 2024
8dee0ad
Merge branch 'main' into 788-featapi-vector-store-indexing-status
CollectiveUnicorn Sep 4, 2024
ddc6cce
Merge branch 'main' into 788-featapi-vector-store-indexing-status
gphorvath Sep 4, 2024
b30efd8
Trigger Build
CollectiveUnicorn Sep 4, 2024
1 change: 1 addition & 0 deletions .github/actions/lfai-core/action.yaml
@@ -18,6 +18,7 @@ runs:
       id: set-env-var
       run: |
         echo "ANON_KEY=$(uds zarf tools kubectl get secret supabase-bootstrap-jwt -n leapfrogai -o jsonpath='{.data.anon-key}' | base64 -d)" >> "$GITHUB_ENV"
+        echo "SERVICE_KEY=$(uds zarf tools kubectl get secret supabase-bootstrap-jwt -n leapfrogai -o jsonpath='{.data.service-key}' | base64 -d)" >> "$GITHUB_ENV"

     - name: Deploy LFAI-API
       shell: bash
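
The step above reads one field of the `supabase-bootstrap-jwt` secret and pipes it through `base64 -d`, because Kubernetes stores Secret values base64-encoded. As a rough illustration of that decoding step only, here is a minimal Python sketch. The helper name `decode_secret_field` and the sample payload are hypothetical; the values are placeholders, not real keys.

```python
import base64
import json


def decode_secret_field(secret_json: str, field: str) -> str:
    """Return the decoded value of one key in a Kubernetes Secret's `data` map."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][field]).decode("utf-8")


# Hypothetical Secret payload; the values are made-up placeholders, not real keys.
sample_secret = json.dumps(
    {
        "data": {
            "anon-key": base64.b64encode(b"example-anon-key").decode("ascii"),
            "service-key": base64.b64encode(b"example-service-key").decode("ascii"),
        }
    }
)

service_key = decode_secret_field(sample_secret, "service-key")
```

In the workflow itself the decoded value is appended to `$GITHUB_ENV`, which is how `SERVICE_KEY` becomes visible to the e2e tests.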
@@ -0,0 +1,24 @@
+-- Update the vector_store_file table to add an updated_at column
+ALTER TABLE vector_store_file ADD COLUMN updated_at timestamp DEFAULT timezone('utc', now()) NOT NULL;
+
+-- Add an index on user_id for faster queries
+CREATE INDEX idx_vector_store_file_user_id ON vector_store_file(user_id);
+
+-- Create a function to update the updated_at column
+CREATE OR REPLACE FUNCTION update_modified_column()
+RETURNS TRIGGER AS $$
+BEGIN
+    NEW.updated_at = timezone('utc', now());
+    RETURN NEW;
+END;
+$$ language 'plpgsql';
+
+-- Create a trigger to automatically update the updated_at column
+CREATE TRIGGER update_vector_store_file_modtime
+BEFORE UPDATE ON vector_store_file
+FOR EACH ROW
+EXECUTE FUNCTION update_modified_column();
+
+-- Enable Supabase realtime for the vector_store_file table
+alter publication supabase_realtime
+add table vector_store_file;
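
The migration keeps `updated_at` current with a `BEFORE UPDATE` trigger. To see that behavior without a Postgres instance, the sketch below reproduces the idea in SQLite through Python's standard `sqlite3` module. This is a stand-in, not the migration itself: SQLite cannot assign to `NEW`, so the trigger here is an `AFTER UPDATE OF status` trigger that rewrites the row, and the reduced table (only `id`, `status`, `updated_at`) is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vector_store_file (
        id TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now'))
    );

    -- SQLite cannot assign to NEW in a BEFORE trigger, so this stand-in
    -- rewrites the row after the update instead.
    CREATE TRIGGER update_vector_store_file_modtime
    AFTER UPDATE OF status ON vector_store_file
    FOR EACH ROW
    BEGIN
        UPDATE vector_store_file
        SET updated_at = strftime('%Y-%m-%d %H:%M:%f', 'now')
        WHERE id = NEW.id;
    END;
    """
)

# Seed a row with a deliberately stale timestamp so the change is visible.
stale = "2000-01-01 00:00:00.000"
conn.execute(
    "INSERT INTO vector_store_file (id, status, updated_at) VALUES (?, ?, ?)",
    ("file-1", "in_progress", stale),
)
conn.execute("UPDATE vector_store_file SET status = 'completed' WHERE id = 'file-1'")

status, updated_at = conn.execute(
    "SELECT status, updated_at FROM vector_store_file WHERE id = 'file-1'"
).fetchone()
```

After the status change, `updated_at` no longer holds the stale value. The Postgres trigger in the migration fires on every update to the row rather than only on `status` changes.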
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -27,6 +27,8 @@ dev = [
     "requests-toolbelt",
     "pytest",
     "huggingface_hub[cli,hf_transfer]",
+    "fastapi",
+    "supabase == 2.6.0"
 ]

 dev-whisper = ["ctranslate2 == 4.1.0", "transformers[torch] == 4.39.3"]
168 changes: 164 additions & 4 deletions tests/e2e/test_supabase.py
@@ -1,6 +1,24 @@
+import asyncio
+import io
+import threading
+import uuid
+from fastapi import UploadFile
 import requests
+from openai.types.beta.vector_stores import VectorStoreFile
+from openai.types.beta import VectorStore
+from openai.types.beta.vector_store import FileCounts
+import _thread

-from .utils import ANON_KEY
+from supabase import AClient as AsyncClient, acreate_client
+from realtime import Socket
+from leapfrogai_api.data.crud_file_bucket import CRUDFileBucket
+from leapfrogai_api.data.crud_file_object import CRUDFileObject
+from leapfrogai_api.data.crud_vector_store import CRUDVectorStore
+
+from leapfrogai_api.data.crud_vector_store_file import CRUDVectorStoreFile
+
+from .utils import ANON_KEY, create_test_user, SERVICE_KEY
+from openai.types import FileObject

 health_urls = {
     "auth_health_url": "http://supabase-kong.uds.dev/auth/v1/health",
@@ -12,9 +30,151 @@
 def test_studio():
     try:
         for url_name in health_urls:
-            response = requests.get(health_urls[url_name], headers={"apikey": ANON_KEY})
-            response.raise_for_status()
+            resp = requests.get(health_urls[url_name], headers={"apikey": ANON_KEY})
+            resp.raise_for_status()
     except requests.exceptions.RequestException as e:
-        print(f"Error: Request failed with status code {response.status_code}")
+        print(f"Error: Request failed with status code {resp.status_code}")
         print(e)
         exit(1)
+
+
+def test_supabase_realtime_vector_store_indexing():
+    class TestCompleteException(Exception):
+        pass
+
+    def timeout_handler():
+        print("Test timed out after 10 seconds")
+        # This is necessary to stop the thread from hanging forever
+        _thread.interrupt_main()
+
+    async def postgres_db_changes():
+        """
+        This function is responsible for creating a vector store and uploading a file to it.
+        """
+        client: AsyncClient = await acreate_client(
+            supabase_key=ANON_KEY,
+            supabase_url="https://supabase-kong.uds.dev",
+        )
+        await client.auth.set_session(access_token=access_token, refresh_token="dummy")
+
+        upload_file_id = await upload_file(client)
+        assert upload_file_id is not None, "Failed to upload file"
+
+        vector_store = VectorStore(
+            id=str(uuid.uuid4()),
+            created_at=0,
+            file_counts=FileCounts(
+                cancelled=0,
+                completed=0,
+                failed=0,
+                in_progress=0,
+                total=0,
+            ),
+            name="test_vector_store",
+            object="vector_store",
+            status="completed",
+            usage_bytes=0,
+        )
+
+        await CRUDVectorStore(client).create(vector_store)
+
+        vector_store_file = VectorStoreFile(
+            id=upload_file_id,
+            vector_store_id=vector_store.id,
+            created_at=0,
+            object="vector_store.file",
+            status="completed",
+            usage_bytes=0,
+        )
+
+        await CRUDVectorStoreFile(client).create(vector_store_file)
+
+    def postgres_changes_callback(payload):
+        """
+        This function is responsible for listening for changes to the vector store file and signaling success if the file triggers realtime successfully.
+        """
+        expected_record = {
+            "object": "vector_store.file",
+            "status": "completed",
+            "usage_bytes": 0,
+        }
+
+        all_records_match = all(
+            payload.get("record", {}).get(key) == value
+            for key, value in expected_record.items()
+        )
+        event_information_match = (
+            payload.get("table") == "vector_store_file"
+            and payload.get("type") == "INSERT"
+        )
+
+        if event_information_match and all_records_match:
+            raise TestCompleteException("Test completed successfully")
+
+    async def upload_file(client: AsyncClient) -> str:
+        """
+        This function is responsible for uploading a file to the file bucket.
+        """
+        id_ = str(uuid.uuid4())
+
+        empty_file_object = FileObject(
+            id=id_,
+            bytes=0,
+            created_at=0,
+            filename="",
+            object="file",
+            purpose="assistants",
+            status="uploaded",
+            status_details=None,
+        )
+
+        crud_file_object = CRUDFileObject(client)
+
+        file_object = await crud_file_object.create(object_=empty_file_object)
+        assert file_object is not None, "Failed to create file object"
+
+        crud_file_bucket = CRUDFileBucket(db=client, model=UploadFile)
+        await crud_file_bucket.upload(
+            file=UploadFile(filename="", file=io.BytesIO(b"")), id_=file_object.id
+        )
+        return id_
+
+    def run_postgres_db_changes():
+        """
+        This function is responsible for running the postgres_db_changes function.
+        """
+        asyncio.run(postgres_db_changes())
+
+    timeout_timer = None
+    try:
+        random_name = str(uuid.uuid4())
+        access_token = create_test_user(email=f"{random_name}@fake.com")
+
+        # Schedule postgres_db_changes to run after 5 seconds
+        threading.Timer(5.0, run_postgres_db_changes).start()
+
+        # Set a timeout of 10 seconds
+        timeout_timer = threading.Timer(10.0, timeout_handler)
+        timeout_timer.start()
+
+        # Listening socket
+        # The service key is needed for proper permission to listen to realtime events
+        # At the time of writing this, the Supabase realtime library does not support RLS
+        URL = f"wss://supabase-kong.uds.dev/realtime/v1/websocket?apikey={SERVICE_KEY}&vsn=1.0.0"
+        s = Socket(URL)
+        s.connect()
+
+        # Set channel to listen for changes to the vector_store_file table
+        channel_1 = s.set_channel("realtime:public:vector_store_file")
+        # Listen for all events on the channel ex: INSERT, UPDATE, DELETE
+        channel_1.join().on("*", postgres_changes_callback)
+
+        # Start listening
+        s.listen()
+    except TestCompleteException:
+        if timeout_timer is not None:
+            timeout_timer.cancel()  # Cancel the timeout timer if test completes successfully
+
+        assert True
+    except Exception:
+        assert False
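
The callback in this test signals success only when the event is an `INSERT` on `vector_store_file` and the record carries the expected field values. The same check can be tried in isolation with the sketch below. The function name `is_expected_insert` and the sample payloads are hypothetical, and the payload shape (`table`, `type`, `record` keys) is assumed only from what the test reads, not from the Supabase realtime documentation.

```python
from typing import Any

# Field values the test expects on the inserted vector_store_file row.
EXPECTED_RECORD = {
    "object": "vector_store.file",
    "status": "completed",
    "usage_bytes": 0,
}


def is_expected_insert(payload: dict[str, Any]) -> bool:
    """Return True when the payload is an INSERT on vector_store_file with matching fields."""
    record = payload.get("record", {})
    fields_match = all(record.get(key) == value for key, value in EXPECTED_RECORD.items())
    event_matches = (
        payload.get("table") == "vector_store_file" and payload.get("type") == "INSERT"
    )
    return fields_match and event_matches


# Hypothetical payloads shaped like the keys the test inspects.
sample_insert = {
    "table": "vector_store_file",
    "type": "INSERT",
    "record": {"object": "vector_store.file", "status": "completed", "usage_bytes": 0},
}
sample_update = {**sample_insert, "type": "UPDATE"}
```

Returning a boolean instead of raising keeps the predicate easy to unit test. The test in this PR raises `TestCompleteException` from the callback because `s.listen()` blocks and needs an exception to break out of it.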
1 change: 1 addition & 0 deletions tests/e2e/utils.py
@@ -6,6 +6,7 @@

 # This is the anon_key for supabase, it provides access to the endpoints that would otherwise be inaccessible
 ANON_KEY = os.environ["ANON_KEY"]
+SERVICE_KEY = os.environ["SERVICE_KEY"]

 DEFAULT_TEST_EMAIL = "[email protected]"
 DEFAULT_TEST_PASSWORD = "password"