diff --git a/docs/reference/online-stores/overview.md b/docs/reference/online-stores/overview.md
index 04d24447058..b54329ad613 100644
--- a/docs/reference/online-stores/overview.md
+++ b/docs/reference/online-stores/overview.md
@@ -34,21 +34,21 @@ Details for each specific online store, such as how to configure it in a `featur
Below is a matrix indicating which online stores support what functionality.
-| | Sqlite | Redis | DynamoDB | Snowflake | Datastore | Postgres | Hbase | [[Cassandra](https://cassandra.apache.org/_/index.html) / [Astra DB](https://www.datastax.com/products/datastax-astra?utm_source=feast)] | [IKV](https://inlined.io) |
-| :-------------------------------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
-| write feature values to the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| read feature values from the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| update infrastructure (e.g. tables) in the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| teardown infrastructure (e.g. tables) in the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| generate a plan of infrastructure changes | yes | no | no | no | no | no | no | yes | no |
-| support for on-demand transforms | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| readable by Python SDK | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| readable by Java | no | yes | no | no | no | no | no | no | no |
-| readable by Go | yes | yes | no | no | no | no | no | no | no |
-| support for entityless feature views | yes | yes | yes | yes | yes | yes | yes | yes | yes |
-| support for concurrent writing to the same key | no | yes | no | no | no | no | no | no | yes |
-| support for ttl (time to live) at retrieval | no | yes | no | no | no | no | no | no | no |
-| support for deleting expired data | no | yes | no | no | no | no | no | no | no |
-| collocated by feature view | yes | no | yes | yes | yes | yes | yes | yes | no |
-| collocated by feature service | no | no | no | no | no | no | no | no | no |
-| collocated by entity key | no | yes | no | no | no | no | no | no | yes |
+| | Sqlite | Redis | DynamoDB | Snowflake | Datastore | Postgres | Hbase | [[Cassandra](https://cassandra.apache.org/_/index.html) / [Astra DB](https://www.datastax.com/products/datastax-astra?utm_source=feast)] | [IKV](https://inlined.io) | Milvus |
+| :-------------------------------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |:-------|
+| write feature values to the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| read feature values from the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| update infrastructure (e.g. tables) in the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| teardown infrastructure (e.g. tables) in the online store | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| generate a plan of infrastructure changes | yes | no | no | no | no | no | no | yes | no | no |
+| support for on-demand transforms | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| readable by Python SDK | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
+| readable by Java | no | yes | no | no | no | no | no | no | no | no |
+| readable by Go | yes | yes | no | no | no | no | no | no | no | no |
+| support for entityless feature views | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
+| support for concurrent writing to the same key | no | yes | no | no | no | no | no | no | yes | no |
+| support for ttl (time to live) at retrieval | no | yes | no | no | no | no | no | no | no | no |
+| support for deleting expired data | no | yes | no | no | no | no | no | no | no | no |
+| collocated by feature view | yes | no | yes | yes | yes | yes | yes | yes | no | no |
+| collocated by feature service | no | no | no | no | no | no | no | no | no | no |
+| collocated by entity key | no | yes | no | no | no | no | no | no | yes | no |
diff --git a/examples/rag/README.md b/examples/rag/README.md
new file mode 100644
index 00000000000..e49b00eef72
--- /dev/null
+++ b/examples/rag/README.md
@@ -0,0 +1,88 @@
+# 🚀 Quickstart: Retrieval-Augmented Generation (RAG) using Feast and Large Language Models (LLMs)
+
+This project demonstrates how to use **Feast** to power a **Retrieval-Augmented Generation (RAG)** application.
+The RAG architecture combines document retrieval (using vector search) with in-context learning (ICL) through a
+**Large Language Model (LLM)** to answer user questions accurately using structured and unstructured data.
+
+## 💡 Why Use Feast for RAG?
+
+- **Online retrieval of features:** Ensure real-time access to precomputed document embeddings and other structured data.
+- **Declarative feature definitions:** Define feature views and entities in a Python file and empower data scientists to easily ship scalable RAG applications with all of the existing benefits of Feast.
+- **Vector search:** Leverage Feastβs integration with vector databases like **Milvus** to find relevant documents based on a similarity metric (e.g., cosine).
+- **Structured and unstructured context:** Retrieve both embeddings and traditional features, injecting richer context into LLM prompts.
+- **Versioning and reusability:** Collaborate across teams with discoverable, versioned data pipelines.
+
+---
+
+## 📂 Project Structure
+
+- **`data/`**: Contains the demo data, including Wikipedia summaries of cities with sentence embeddings stored in a Parquet file.
+- **`example_repo.py`**: Defines the feature views and entity configurations for Feast.
+- **`feature_store.yaml`**: Configures the offline and online stores (using local files and Milvus Lite in this demo).
+- **`test_workflow.py`**: Demonstrates key Feast commands to define, retrieve, and push features.
+
+---
+
+## 🛠️ Setup
+
+1. **Install the necessary packages**:
+ ```bash
+ pip install feast torch transformers openai
+ ```
+2. Register the feature definitions and deploy the feature store:
+
+ ```bash
+ feast apply
+ ```
+
+3. Materialize features into the online store:
+
+ ```bash
+   python -c "from datetime import datetime; from feast import FeatureStore; FeatureStore(repo_path='.').materialize_incremental(datetime.utcnow())"
+ ```
+4. Run a query:
+
+- Prepare your question:
+`question = "Which city has the largest population in New York?"`
+- Embed the question using `sentence-transformers/all-MiniLM-L6-v2` (see the sketch below).
+- Retrieve the top K most relevant documents using Milvus vector search.
+- Pass the retrieved context to the OpenAI model for conversational output.
+
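+A minimal sketch of this embed-and-query step, mirroring the helpers in `test_workflow.py` (the model name and mean-pooling logic come from that file):
+
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, AutoModel
+
+MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModel.from_pretrained(MODEL_ID)
+
+def embed(text: str) -> list:
+    # Tokenize, run the model, mean-pool the token embeddings, then L2-normalize
+    encoded = tokenizer([text], padding=True, truncation=True, return_tensors="pt")
+    with torch.no_grad():
+        output = model(**encoded)
+    mask = encoded["attention_mask"].unsqueeze(-1).expand(output[0].size()).float()
+    pooled = (output[0] * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
+    return F.normalize(pooled, p=2, dim=1)[0].tolist()
+
+query = embed("Which city has the largest population in New York?")
+```
+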
+## 🛠️ Key Commands for Data Scientists
+- Apply feature definitions:
+
+```bash
+feast apply
+```
+
+- Materialize features to the online store:
+```python
+store.write_to_online_store(feature_view_name='city_embeddings', df=df)
+```
+
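+Here, `df` is the demo dataframe loaded from the bundled parquet file, as in `test_workflow.py`:
+
+```python
+import pandas as pd
+
+df = pd.read_parquet("./data/city_wikipedia_summaries_with_embeddings.parquet")
+```
+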
+- Inspect retrieved features using Python:
+```python
+context_data = store.retrieve_online_documents_v2(
+ features=[
+ "city_embeddings:vector",
+ "city_embeddings:item_id",
+ "city_embeddings:state",
+ "city_embeddings:sentence_chunks",
+ "city_embeddings:wiki_summary",
+ ],
+ query=query,
+ top_k=3,
+ distance_metric='COSINE',
+).to_df()
+display(context_data)
+```
+
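+To generate the final answer, pass the retrieved context to an LLM. Below is a sketch using the OpenAI client (it assumes `OPENAI_API_KEY` is set and reuses `question` and `context_data` from above; the prompt style and `gpt-4o-mini` model follow the quickstart notebook):
+
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
+
+# Inject the retrieved documents into the system prompt
+context = "\n".join(context_data["wiki_summary"].drop_duplicates())
+system_prompt = (
+    "You are an assistant for answering questions about cities. "
+    "Use only the provided documents. If you don't know the answer, say so.\n\n"
+    f"Documents:\n{context}"
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": question},
+    ],
+)
+print(response.choices[0].message.content)
+```
+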
+## 📊 Example Output
+
+When querying: *"Which city has the largest population in New York?"*
+
+The model provides:
+
+```
+The largest city in New York is New York City, often referred to as NYC. It is the most populous city in the United States, with an estimated population of 8,335,897 in 2022.
+```
\ No newline at end of file
diff --git a/examples/rag/__init__.py b/examples/rag/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/examples/rag/feature_repo/__init__.py b/examples/rag/feature_repo/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/examples/rag/feature_repo/data/city_wikipedia_summaries_with_embeddings.parquet b/examples/rag/feature_repo/data/city_wikipedia_summaries_with_embeddings.parquet
new file mode 100644
index 00000000000..63270802fdf
Binary files /dev/null and b/examples/rag/feature_repo/data/city_wikipedia_summaries_with_embeddings.parquet differ
diff --git a/examples/rag/feature_repo/example_repo.py b/examples/rag/feature_repo/example_repo.py
new file mode 100644
index 00000000000..e0a9be21452
--- /dev/null
+++ b/examples/rag/feature_repo/example_repo.py
@@ -0,0 +1,42 @@
+from datetime import timedelta
+
+from feast import (
+ FeatureView,
+ Field,
+ FileSource,
+)
+from feast.data_format import ParquetFormat
+from feast.types import Float32, Array, String, ValueType
+from feast import Entity
+
+item = Entity(
+ name="item_id",
+ description="Item ID",
+ value_type=ValueType.INT64,
+)
+
+parquet_file_path = "./data/city_wikipedia_summaries_with_embeddings.parquet"
+
+source = FileSource(
+ file_format=ParquetFormat(),
+ path=parquet_file_path,
+ timestamp_field="event_timestamp",
+)
+
+city_embeddings_feature_view = FeatureView(
+ name="city_embeddings",
+ entities=[item],
+ schema=[
+ Field(
+ name="vector",
+ dtype=Array(Float32),
+ vector_index=True,
+ vector_search_metric="COSINE",
+ ),
+ Field(name="state", dtype=String),
+ Field(name="sentence_chunks", dtype=String),
+ Field(name="wiki_summary", dtype=String),
+ ],
+ source=source,
+ ttl=timedelta(hours=2),
+)
\ No newline at end of file
diff --git a/examples/rag/feature_repo/feature_store.yaml b/examples/rag/feature_repo/feature_store.yaml
new file mode 100644
index 00000000000..223be052093
--- /dev/null
+++ b/examples/rag/feature_repo/feature_store.yaml
@@ -0,0 +1,17 @@
+project: rag
+provider: local
+registry: data/registry.db
+online_store:
+ type: milvus
+ path: data/online_store.db
+ vector_enabled: true
+ embedding_dim: 384
+ index_type: "IVF_FLAT"
+
+
+offline_store:
+ type: file
+entity_key_serialization_version: 3
+# By default, no_auth is used for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
+auth:
+ type: no_auth
diff --git a/examples/rag/feature_repo/test_workflow.py b/examples/rag/feature_repo/test_workflow.py
new file mode 100644
index 00000000000..05cd554d823
--- /dev/null
+++ b/examples/rag/feature_repo/test_workflow.py
@@ -0,0 +1,74 @@
+import pandas as pd
+import torch
+import torch.nn.functional as F
+from feast import FeatureStore
+from transformers import AutoTokenizer, AutoModel
+from example_repo import city_embeddings_feature_view, item
+
+TOKENIZER = "sentence-transformers/all-MiniLM-L6-v2"
+MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+
+
+def mean_pooling(model_output, attention_mask):
+ token_embeddings = model_output[
+ 0
+ ] # First element of model_output contains all token embeddings
+ input_mask_expanded = (
+ attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+ )
+ return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
+ input_mask_expanded.sum(1), min=1e-9
+ )
+
+
+def run_model(sentences, tokenizer, model):
+ encoded_input = tokenizer(
+ sentences, padding=True, truncation=True, return_tensors="pt"
+ )
+ # Compute token embeddings
+ with torch.no_grad():
+ model_output = model(**encoded_input)
+
+ sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
+ sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+ return sentence_embeddings
+
+def run_demo():
+ store = FeatureStore(repo_path=".")
+ df = pd.read_parquet("./data/city_wikipedia_summaries_with_embeddings.parquet")
+ embedding_length = len(df['vector'][0])
+ print(f'embedding length = {embedding_length}')
+
+ store.apply([city_embeddings_feature_view, item])
+ fields = [
+ f.name for f in city_embeddings_feature_view.features
+ ] + city_embeddings_feature_view.entities + [city_embeddings_feature_view.batch_source.timestamp_field]
+ print('\ndata=')
+ print(df[fields].head().T)
+ store.write_to_online_store("city_embeddings", df[fields][0:3])
+
+ question = "the most populous city in the state of New York is New York"
+ tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
+ model = AutoModel.from_pretrained(MODEL)
+ query_embedding = run_model(question, tokenizer, model)
+ query = query_embedding.detach().cpu().numpy().tolist()[0]
+
+ # Retrieve top k documents
+ features = store.retrieve_online_documents_v2(
+ features=[
+ "city_embeddings:vector",
+ "city_embeddings:item_id",
+ "city_embeddings:state",
+ "city_embeddings:sentence_chunks",
+ "city_embeddings:wiki_summary",
+ ],
+ query=query,
+ top_k=3,
+ )
+ print("features =")
+ print(features.to_df())
+ store.teardown()
+
+if __name__ == "__main__":
+ run_demo()
diff --git a/examples/rag/milvus-quickstart.ipynb b/examples/rag/milvus-quickstart.ipynb
new file mode 100644
index 00000000000..f22b8ed4a7b
--- /dev/null
+++ b/examples/rag/milvus-quickstart.ipynb
@@ -0,0 +1,1023 @@
+{
+ "cells": [
+ {
+ "cell_type": "raw",
+ "id": "f33a2f4a-48b5-4218-8b3f-fc884070145e",
+ "metadata": {},
+ "source": [
+ "!pip install torch\n",
+ "!pip install transformers\n",
+ "!pip install openai"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b19cb54f-e63f-4d9b-b7ff-d18a30635cd2",
+ "metadata": {},
+ "source": [
+ "# Overview\n",
+ "\n",
+ "In this tutorial, we'll use Feast to inject documents and structured data (i.e., features) into the context of an LLM (Large Language Model) to power a RAG Application (Retrieval Augmented Generation).\n",
+ "\n",
+ "Feast solves several common issues in this flow:\n",
+ "1. **Online retrieval:** At inference time, LLMs often need access to data that isn't readily \n",
+ " available and needs to be precomputed from other data sources.\n",
+ " * Feast manages deployment to a variety of online stores (e.g. Milvus, DynamoDB, Redis, Google Cloud Datastore) and \n",
+ " ensures necessary features are consistently _available_ and _freshly computed_ at inference time.\n",
+    "2. **Vector Search:** Feast has built-in support for vector similarity search that is easily configured declaratively, so users can focus on their application.\n",
+ "3. **Richer structured data:** Along with vector search, users can query standard structured fields to inject into the LLM context for better user experiences.\n",
+ "4. **Feature/Context and versioning:** Different teams within an organization are often unable to reuse \n",
+ " data across projects and services, resulting in duplicate application logic. Models have data dependencies that need \n",
+ " to be versioned, for example when running A/B tests on model/prompt versions.\n",
+ " * Feast enables discovery of and collaboration on previously used documents, features, and enables versioning of sets of \n",
+ " data.\n",
+ "\n",
+ "We will:\n",
+    "1. Deploy a local feature store with a **Parquet file offline store** and **Milvus Lite online store**.\n",
+    "2. Write/materialize the data (i.e., feature values and embeddings) from the offline store (a parquet file) into the online store (Milvus).\n",
+ "3. Serve the features using the Feast SDK\n",
+ "4. Inject the document into the LLM's context to answer questions"
+ ]
+ },
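+  {
+   "cell_type": "markdown",
+   "id": "step1-install-feast",
+   "metadata": {},
+   "source": [
+    "## Step 1: Install Feast\n",
+    "\n",
+    "Install Feast with pip; the cell below also reminds you to restart the runtime so the new dependencies are picked up."
+   ]
+  },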
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "425cf2f7-70b5-423c-a4f2-f470d8638135",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%sh\n",
+ "pip install feast -U -q\n",
+ "echo \"Please restart your runtime now (Runtime -> Restart runtime). This ensures that the correct dependencies are loaded.\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db162bb9-e262-4958-990d-fd8f3f1f1249",
+ "metadata": {},
+ "source": [
+ "**Reminder**: Please restart your runtime after installing Feast (Runtime -> Restart runtime). This ensures that the correct dependencies are loaded."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a25cf84f-c255-4bb3-a3d7-e5512c1ba10d",
+ "metadata": {},
+ "source": [
+ "## Step 2: Create a feature repository\n",
+ "\n",
+ "A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See [Feature Repository](https://docs.feast.dev/reference/feature-repository) for a detailed explanation of feature repositories.\n",
+ "\n",
+    "The easiest way to create a new feature repository is to use the `feast init` command. For this demo, you **do not** need to initialize a feast repo.\n",
+ "\n",
+ "\n",
+ "### Demo data scenario \n",
+    "- We have data from Wikipedia about cities, embedded into sentence embeddings to be used for vector retrieval in a RAG application.\n",
+    "- We want to retrieve the most relevant city summaries for a user's question and inject them into the LLM's context to generate a grounded answer."
+ ]
+ },
+ {
+ "cell_type": "raw",
+ "id": "61dfdc9d8732d5a6",
+ "metadata": {},
+ "source": [
+ "!feast init feature_repo"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c969b62f-4f58-49ed-ae23-ace1916de0c0",
+ "metadata": {},
+ "source": [
+ "### Step 2a: Inspecting the feature repository\n",
+ "\n",
+ "Let's take a look at the demo repo itself. It breaks down into\n",
+ "\n",
+ "\n",
+ "* `data/` contains raw demo parquet data\n",
+ "* `example_repo.py` contains demo feature definitions\n",
+ "* `feature_store.yaml` contains a demo setup configuring where data sources are\n",
+ "* `test_workflow.py` showcases how to run all key Feast commands, including defining, retrieving, and pushing features.\n",
+ " * You can run this with `python test_workflow.py`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "5d531836-5981-4a34-9367-51b09af18a8a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/Users/farceo/dev/feast/examples/rag/feature_repo\n",
+ "__init__.py \u001b[1m\u001b[36mdata\u001b[m\u001b[m feature_store.yaml test_milvus.py\n",
+ "\u001b[1m\u001b[36m__pycache__\u001b[m\u001b[m example_repo.py milvus_demo.db test_workflow.py\n",
+ "\n",
+ "./__pycache__:\n",
+ "example_repo.cpython-311.pyc\n",
+ "\n",
+ "./data:\n",
+ "city_wikipedia_summaries_with_embeddings.parquet\n",
+ "online_store.db\n",
+ "registry.db\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd feature_repo/\n",
+ "!ls -R"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d14a8073-5030-4d35-9c96-f5360aeaf39f",
+ "metadata": {},
+ "source": [
+ "### Step 2b: Inspecting the project configuration\n",
+ "Let's inspect the setup of the project in `feature_store.yaml`. \n",
+ "\n",
+ "The key line defining the overall architecture of the feature store is the **provider**. \n",
+ "\n",
+ "The provider value sets default offline and online stores. \n",
+ "* The offline store provides the compute layer to process historical data (for generating training data & feature \n",
+ " values for serving). \n",
+ "* The online store is a low latency store of the latest feature values (for powering real-time inference).\n",
+ "\n",
+ "Valid values for `provider` in `feature_store.yaml` are:\n",
+ "\n",
+ "* local: use file source with Milvus Lite\n",
+ "* gcp: use BigQuery/Snowflake with Google Cloud Datastore/Redis\n",
+ "* aws: use Redshift/Snowflake with DynamoDB/Redis\n",
+ "\n",
+ "Note that there are many other offline / online stores Feast works with, including Azure, Hive, Trino, and PostgreSQL via community plugins. See https://docs.feast.dev/roadmap for all supported connectors.\n",
+ "\n",
+ "A custom setup can also be made by following [Customizing Feast](https://docs.feast.dev/v/master/how-to-guides/customizing-feast)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "14c830ef-f5a4-4867-ad5c-87e709df7057",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[94mproject\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mrag\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94mprovider\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mlocal\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94mregistry\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mdata/registry.db\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94monline_store\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mtype\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mmilvus\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mpath\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mdata/online_store.db\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mvector_enabled\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mtrue\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94membedding_dim\u001b[39;49;00m:\u001b[37m \u001b[39;49;00m384\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mindex_type\u001b[39;49;00m:\u001b[37m \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mIVF_FLAT\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94moffline_store\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mtype\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mfile\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94mentity_key_serialization_version\u001b[39;49;00m:\u001b[37m \u001b[39;49;00m3\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m# By default, no_auth for authentication and authorization, other possible values kubernetes and oidc. Refer the documentation for more details.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[94mauth\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n",
+ "\u001b[37m \u001b[39;49;00m\u001b[94mtype\u001b[39;49;00m:\u001b[37m \u001b[39;49;00mno_auth\u001b[37m\u001b[39;49;00m\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pygmentize feature_store.yaml"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5ce80d1a-05d3-434d-bd1e-1ade8abd1f9f",
+ "metadata": {},
+ "source": [
+ "### Inspecting the raw data\n",
+ "\n",
+    "The raw feature data we have in this demo is stored in a local parquet file. The dataset contains Wikipedia summaries of different cities."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "788a27ff-16a4-4b23-8c1c-ba27fd918aa5",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "embedding length = 384\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd \n",
+ "\n",
+ "df = pd.read_parquet(\"./data/city_wikipedia_summaries_with_embeddings.parquet\")\n",
+ "df['vector'] = df['vector'].apply(lambda x: x.tolist())\n",
+ "embedding_length = len(df['vector'][0])\n",
+ "print(f'embedding length = {embedding_length}')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "e433178c-51e8-49a7-884c-c9573082ad6d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " id item_id event_timestamp state \\\n",
+ "0 0 0 2025-01-09 13:36:59.280589 New York, New York \n",
+ "1 1 1 2025-01-09 13:36:59.280589 New York, New York \n",
+ "2 2 2 2025-01-09 13:36:59.280589 New York, New York \n",
+ "3 3 3 2025-01-09 13:36:59.280589 New York, New York \n",
+ "4 4 4 2025-01-09 13:36:59.280589 New York, New York \n",
+ "\n",
+ " wiki_summary \\\n",
+ "0 New York, often called New York City or simply... \n",
+ "1 New York, often called New York City or simply... \n",
+ "2 New York, often called New York City or simply... \n",
+ "3 New York, often called New York City or simply... \n",
+ "4 New York, often called New York City or simply... \n",
+ "\n",
+ " sentence_chunks \\\n",
+ "0 New York, often called New York City or simply... \n",
+ "1 The city comprises five boroughs, each of whic... \n",
+ "2 New York is a global center of finance and com... \n",
+ "3 New York City is the epicenter of the world's ... \n",
+ "4 With an estimated population in 2022 of 8,335,... \n",
+ "\n",
+ " vector \n",
+ "0 [0.1465730518102646, -0.07317650318145752, 0.0... \n",
+ "1 [0.05218901485204697, -0.08449874818325043, 0.... \n",
+ "2 [0.06769222766160965, -0.07371102273464203, -0... \n",
+ "3 [0.12095861881971359, -0.04279915615916252, 0.... \n",
+ "4 [0.17943550646305084, -0.09458263963460922, 0.... "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import display\n",
+ "\n",
+ "display(df.head())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec07d38d-d0ff-4dc3-b041-3bf24de9e7e3",
+ "metadata": {},
+ "source": [
+ "## Step 3: Register feature definitions and deploy your feature store\n",
+ "\n",
+ "`feast apply` scans python files in the current directory for feature/entity definitions and deploys infrastructure according to `feature_store.yaml`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79409ca9-7552-4aa5-b95b-29f836a0d3a5",
+ "metadata": {},
+ "source": [
+ "### Step 3a: Inspecting feature definitions\n",
+ "Let's inspect what `example_repo.py` looks like:\n",
+ "\n",
+ "```python\n",
+ "from datetime import timedelta\n",
+ "\n",
+ "from feast import (\n",
+ " FeatureView,\n",
+ " Field,\n",
+ " FileSource,\n",
+ ")\n",
+ "from feast.data_format import ParquetFormat\n",
+ "from feast.types import Float32, Array, String, ValueType\n",
+ "from feast import Entity\n",
+ "\n",
+ "item = Entity(\n",
+ " name=\"item_id\",\n",
+ " description=\"Item ID\",\n",
+ " value_type=ValueType.INT64,\n",
+ ")\n",
+ "\n",
+ "parquet_file_path = \"./data/city_wikipedia_summaries_with_embeddings.parquet\"\n",
+ "\n",
+ "source = FileSource(\n",
+ " file_format=ParquetFormat(),\n",
+ " path=parquet_file_path,\n",
+ " timestamp_field=\"event_timestamp\",\n",
+ ")\n",
+ "\n",
+ "city_embeddings_feature_view = FeatureView(\n",
+ " name=\"city_embeddings\",\n",
+ " entities=[item],\n",
+ " schema=[\n",
+ " Field(\n",
+ " name=\"vector\",\n",
+ " dtype=Array(Float32),\n",
+ " vector_index=True,\n",
+ " vector_search_metric=\"COSINE\",\n",
+ " ),\n",
+ " Field(name=\"state\", dtype=String),\n",
+ " Field(name=\"sentence_chunks\", dtype=String),\n",
+ " Field(name=\"wiki_summary\", dtype=String),\n",
+ " ],\n",
+ " source=source,\n",
+ " ttl=timedelta(hours=2),\n",
+ ")\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76634929-c84a-4301-93d3-88292335bde0",
+ "metadata": {},
+ "source": [
+ "### Step 3b: Applying feature definitions\n",
+    "Now we run `feast apply` to register the feature views and entities defined in `example_repo.py`, and set up the Milvus online store collection. Note that `feature_store.yaml` configures Milvus (Lite) as the online store in this demo."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "837e1530-e863-4e5f-b206-b6b4b3ca2aa2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/Users/farceo/dev/feast/sdk/python/feast/feature_view.py:48: DeprecationWarning: Entity value_type will be mandatory in the next release. Please specify a value_type for entity '__dummy'.\n",
+ " DUMMY_ENTITY = Entity(\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pymilvus/client/__init__.py:6: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html\n",
+ " from pkg_resources import DistributionNotFound, get_distribution\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pkg_resources/__init__.py:3142: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.\n",
+ "Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages\n",
+ " declare_namespace(pkg)\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/environs/__init__.py:58: DeprecationWarning: The '__version_info__' attribute is deprecated and will be removed in in a future version. Use feature detection or 'packaging.Version(importlib.metadata.version(\"marshmallow\")).release' instead.\n",
+ " _SUPPORTS_LOAD_DEFAULT = ma.__version_info__ >= (3, 13)\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pydantic/_internal/_fields.py:192: UserWarning: Field name \"vector_enabled\" in \"MilvusOnlineStoreConfig\" shadows an attribute in parent \"VectorStoreConfig\"\n",
+ " warnings.warn(\n",
+ "No project found in the repository. Using project name rag defined in feature_store.yaml\n",
+ "Applying changes for project rag\n",
+ "/Users/farceo/dev/feast/sdk/python/feast/entity.py:173: DeprecationWarning: Entity value_type will be mandatory in the next release. Please specify a value_type for entity 'item_id'.\n",
+ " entity = cls(\n",
+ "/Users/farceo/dev/feast/sdk/python/feast/entity.py:173: DeprecationWarning: Entity value_type will be mandatory in the next release. Please specify a value_type for entity '__dummy'.\n",
+ " entity = cls(\n",
+ "Connecting to Milvus in local mode using /Users/farceo/dev/feast/examples/rag/feature_repo/data/online_store.db\n",
+ "01/29/2025 05:11:55 PM pymilvus.milvus_client.milvus_client DEBUG: Created new connection using: 9fe4c5dfbe434f1babbf9f2a0970fb87\n",
+ "Deploying infrastructure for \u001b[1m\u001b[32mcity_embeddings\u001b[0m\n"
+ ]
+ }
+ ],
+ "source": [
+ "! feast apply "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ad7654cc-865c-4bb4-8c0f-d3086c5d9f7e",
+ "metadata": {},
+ "source": [
+    "## Step 4: Load features into your online store"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "34ded931-3de0-4951-aead-1e8ca1679cbe",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/farceo/dev/feast/sdk/python/feast/feature_view.py:48: DeprecationWarning: Entity value_type will be mandatory in the next release. Please specify a value_type for entity '__dummy'.\n",
+ " DUMMY_ENTITY = Entity(\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pymilvus/client/__init__.py:6: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html\n",
+ " from pkg_resources import DistributionNotFound, get_distribution\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pkg_resources/__init__.py:3142: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.\n",
+ "Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages\n",
+ " declare_namespace(pkg)\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/environs/__init__.py:58: DeprecationWarning: The '__version_info__' attribute is deprecated and will be removed in in a future version. Use feature detection or 'packaging.Version(importlib.metadata.version(\"marshmallow\")).release' instead.\n",
+ " _SUPPORTS_LOAD_DEFAULT = ma.__version_info__ >= (3, 13)\n",
+ "/Users/farceo/dev/feast/.venv/lib/python3.11/site-packages/pydantic/_internal/_fields.py:192: UserWarning: Field name \"vector_enabled\" in \"MilvusOnlineStoreConfig\" shadows an attribute in parent \"VectorStoreConfig\"\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "from datetime import datetime\n",
+ "from feast import FeatureStore\n",
+ "\n",
+ "store = FeatureStore(repo_path=\".\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4c784d77-e96c-455c-9f1f-9183bab58d72",
+ "metadata": {},
+ "source": [
+    "### Step 4a: Using `materialize_incremental`\n",
+ "\n",
+    "We now serialize the latest values of features since the beginning of time to prepare for serving. Note, `materialize_incremental` serializes all new features since the last `materialize` call, or since the time provided minus the `ttl` timedelta. In this case, this will be `CURRENT_TIME - 2 hours` (`ttl` was set on the `FeatureView` instance in [feature_repo/example_repo.py](feature_repo/example_repo.py)). \n",
+ "\n",
+ "```bash\n",
+ "CURRENT_TIME=$(date -u +\"%Y-%m-%dT%H:%M:%S\")\n",
+ "feast materialize-incremental $CURRENT_TIME\n",
+ "```\n",
+ "\n",
+    "An alternative to the CLI command is to write the dataframe directly to the online store from Python:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "a2655725-5cc4-4f07-ade4-dc5e705eed05",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Connecting to Milvus in local mode using data/online_store.db\n"
+ ]
+ }
+ ],
+ "source": [
+ "store.write_to_online_store(feature_view_name='city_embeddings', df=df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b836e5b1-1fe2-4e9d-8c9a-bdc91da8254e",
+ "metadata": {},
+ "source": [
+    "### Step 4b: Inspect materialized features\n",
+ "\n",
+ "Note that now there are `online_store.db` and `registry.db`, which store the materialized features and schema information, respectively."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "1307b1aa-fecf-4adf-aafc-f65d89ca735c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " item_id_pk created_ts \\\n",
+ "0 0100000002000000070000006974656d5f696404000000... 0 \n",
+ "1 0100000002000000070000006974656d5f696404000000... 0 \n",
+ "2 0100000002000000070000006974656d5f696404000000... 0 \n",
+ "3 0100000002000000070000006974656d5f696404000000... 0 \n",
+ "4 0100000002000000070000006974656d5f696404000000... 0 \n",
+ "\n",
+ " event_ts item_id \\\n",
+ "0 1736447819280589 0 \n",
+ "1 1736447819280589 0 \n",
+ "2 1736447819280589 0 \n",
+ "3 1736447819280589 0 \n",
+ "4 1736447819280589 0 \n",
+ "\n",
+ " sentence_chunks state \\\n",
+ "0 New York, often called New York City or simply... New York, New York \n",
+ "1 New York, often called New York City or simply... New York, New York \n",
+ "2 New York, often called New York City or simply... New York, New York \n",
+ "3 New York, often called New York City or simply... New York, New York \n",
+ "4 New York, often called New York City or simply... New York, New York \n",
+ "\n",
+ " vector wiki_summary \n",
+ "0 0.146573 New York, often called New York City or simply... \n",
+ "1 -0.073177 New York, often called New York City or simply... \n",
+ "2 0.052114 New York, often called New York City or simply... \n",
+ "3 0.033187 New York, often called New York City or simply... \n",
+ "4 0.012013 New York, often called New York City or simply... "
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pymilvus_client = store._provider._online_store._connect(store.config)\n",
+ "COLLECTION_NAME = pymilvus_client.list_collections()[0]\n",
+ "\n",
+ "milvus_query_result = pymilvus_client.query(\n",
+ " collection_name=COLLECTION_NAME,\n",
+ " filter=\"item_id == '0'\",\n",
+ ")\n",
+ "pd.DataFrame(milvus_query_result[0]).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5fbf3921-e775-46b7-9915-d18c6592586f",
+ "metadata": {},
+ "source": [
+ "### Quick note on entity keys\n",
+ "Note from the above command that the online store indexes by `entity_key`. \n",
+ "\n",
+    "[Entity keys](https://docs.feast.dev/getting-started/concepts/entity#entity-key) include a list of all entities needed (e.g. all relevant primary keys) to generate the feature vector. In this case, this is a serialized version of the `item_id`. We use this later to fetch all features for a given item at inference time. The code cell below sketches how this serialization can be reproduced."
+ ]
+ },
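+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "entity-key-serialization-sketch",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A sketch of the serialization mentioned above (assumption: serialize_entity_key in\n",
+    "# feast.infra.key_encoding_utils is the helper Feast uses to build these keys).\n",
+    "from feast.infra.key_encoding_utils import serialize_entity_key\n",
+    "from feast.protos.feast.types.EntityKey_pb2 import EntityKey\n",
+    "from feast.protos.feast.types.Value_pb2 import Value\n",
+    "\n",
+    "key = EntityKey(join_keys=[\"item_id\"], entity_values=[Value(int64_val=0)])\n",
+    "# entity_key_serialization_version matches the value in feature_store.yaml\n",
+    "serialize_entity_key(key, entity_key_serialization_version=3).hex()"
+   ]
+  },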
+ {
+ "cell_type": "markdown",
+ "id": "516f6e4a-2d37-4428-8dba-81620a65c2ad",
+ "metadata": {},
+ "source": [
+    "## Step 5: Embedding a query using PyTorch and Sentence Transformers"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "66b4e67d-6f94-4532-b107-abc4c0f002f1",
+ "metadata": {},
+ "source": [
+    "During inference (e.g., when a user submits a chat message) we need to embed the input text. This can be thought of as a feature transformation of the input data. In this example, we'll do this with a small Sentence Transformer from Hugging Face."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "62da57be-316d-46ee-b8a7-bac54a7faf55",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn.functional as F\n",
+ "from feast import FeatureStore\n",
+ "from pymilvus import MilvusClient, DataType, FieldSchema\n",
+ "from transformers import AutoTokenizer, AutoModel\n",
+ "from example_repo import city_embeddings_feature_view, item\n",
+ "\n",
+ "TOKENIZER = \"sentence-transformers/all-MiniLM-L6-v2\"\n",
+ "MODEL = \"sentence-transformers/all-MiniLM-L6-v2\"\n",
+ "\n",
+ "def mean_pooling(model_output, attention_mask):\n",
+ " token_embeddings = model_output[\n",
+ " 0\n",
+ " ] # First element of model_output contains all token embeddings\n",
+ " input_mask_expanded = (\n",
+ " attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()\n",
+ " )\n",
+ " return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(\n",
+ " input_mask_expanded.sum(1), min=1e-9\n",
+ " )\n",
+ "\n",
+ "def run_model(sentences, tokenizer, model):\n",
+ " encoded_input = tokenizer(\n",
+ " sentences, padding=True, truncation=True, return_tensors=\"pt\"\n",
+ " )\n",
+ " # Compute token embeddings\n",
+ " with torch.no_grad():\n",
+ " model_output = model(**encoded_input)\n",
+ "\n",
+ " sentence_embeddings = mean_pooling(model_output, encoded_input[\"attention_mask\"])\n",
+ " sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)\n",
+ " return sentence_embeddings"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67868cdf-04e9-4086-bed8-050e4902ed71",
+ "metadata": {},
+ "source": [
+    "## Step 6: Fetching real-time vectors and data for online inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "29b9ae94-7daa-4d56-8bca-9339d09cd1ed",
+ "metadata": {},
+ "source": [
+    "At inference time, we run a vector similarity search over the document embeddings in the online feature store using `retrieve_online_documents_v2()`, passing the embedded query. The retrieved feature vectors and fields can then be fed into the context of the LLM."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "0c76a526-35dc-4af5-bd46-d181e3a8c23a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "question = \"Which city has the largest population in New York?\"\n",
+ "\n",
+ "tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)\n",
+ "model = AutoModel.from_pretrained(MODEL)\n",
+ "query_embedding = run_model(question, tokenizer, model)\n",
+ "query = query_embedding.detach().cpu().numpy().tolist()[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "d3099708-409b-4d9e-b1d6-8ad86de6fde2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " vector item_id \\\n",
+ "0 [0.15548758208751678, -0.08017724752426147, -0... 0 \n",
+ "\n",
+ " state sentence_chunks \\\n",
+ "0 New York, New York New York, often called New York City or simply... \n",
+ "\n",
+ " wiki_summary distance \n",
+ "0 New York, often called New York City or simply... 0.743023 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import display\n",
+ "\n",
+ "# Retrieve top k documents\n",
+ "context_data = store.retrieve_online_documents_v2(\n",
+ " features=[\n",
+ " \"city_embeddings:vector\",\n",
+ " \"city_embeddings:item_id\",\n",
+ " \"city_embeddings:state\",\n",
+ " \"city_embeddings:sentence_chunks\",\n",
+ " \"city_embeddings:wiki_summary\",\n",
+ " ],\n",
+ " query=query,\n",
+ " top_k=3,\n",
+ " distance_metric='COSINE',\n",
+ ").to_df()\n",
+ "display(context_data)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "0d56cf77-b09c-4ed7-b26e-3950d351953e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def format_documents(context_df):\n",
+ " output_context = \"\"\n",
+ " unique_documents = context_df.drop_duplicates().apply(\n",
+ " lambda x: \"City & State = {\" + x['state'] +\"\\nSummary = {\" + x['wiki_summary'].strip(),\n",
+ " axis=1,\n",
+ " )\n",
+ " for i, document_text in enumerate(unique_documents):\n",
+ " output_context+= f\"****START DOCUMENT {i}****\\n{document_text.strip()}\\n****END DOCUMENT {i}****\"\n",
+ " return output_context"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "595adf60-54bd-4ec7-966e-5ac08f643f25",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "RAG_CONTEXT = format_documents(context_data[['state', 'wiki_summary']])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "3978561a-79a0-48bb-86ca-d81293a0e618",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "****START DOCUMENT 0****\n",
+ "City & State = {New York, New York\n",
+ "Summary = {New York, often called New York City or simply NYC, is the most populous city in the United States, located at the southern tip of New York State on one of the world's largest natural harbors. The city comprises five boroughs, each of which is coextensive with a respective county. New York is a global center of finance and commerce, culture and technology, entertainment and media, academics and scientific output, and the arts and fashion, and, as home to the headquarters of the United Nations, is an important center for international diplomacy. New York City is the epicenter of the world's principal metropolitan economy.\n",
+ "With an estimated population in 2022 of 8,335,897 distributed over 300.46 square miles (778.2 km2), the city is the most densely populated major city in the United States. New York has more than double the population of Los Angeles, the nation's second-most populous city. New York is the geographical and demographic center of both the Northeast megalopolis and the New York metropolitan area, the largest metropolitan area in the U.S. by both population and urban area. With more than 20.1 million people in its metropolitan statistical area and 23.5 million in its combined statistical area as of 2020, New York City is one of the world's most populous megacities. The city and its metropolitan area are the premier gateway for legal immigration to the United States. As many as 800 languages are spoken in New York, making it the most linguistically diverse city in the world. In 2021, the city was home to nearly 3.1 million residents born outside the U.S., the largest foreign-born population of any city in the world.\n",
+ "New York City traces its origins to Fort Amsterdam and a trading post founded on the southern tip of Manhattan Island by Dutch colonists in approximately 1624. The settlement was named New Amsterdam (Dutch: Nieuw Amsterdam) in 1626 and was chartered as a city in 1653. The city came under English control in 1664 and was temporarily renamed New York after King Charles II granted the lands to his brother, the Duke of York. before being permanently renamed New York in November 1674. New York City was the capital of the United States from 1785 until 1790. The modern city was formed by the 1898 consolidation of its five boroughs: Manhattan, Brooklyn, Queens, The Bronx, and Staten Island, and has been the largest U.S. city ever since.\n",
+ "Anchored by Wall Street in the Financial District of Lower Manhattan, New York City has been called both the world's premier financial and fintech center and the most economically powerful city in the world. As of 2022, the New York metropolitan area is the largest metropolitan economy in the world with a gross metropolitan product of over US$2.16 trillion. If the New York metropolitan area were its own country, it would have the tenth-largest economy in the world. The city is home to the world's two largest stock exchanges by market capitalization of their listed companies: the New York Stock Exchange and Nasdaq. New York City is an established safe haven for global investors. As of 2023, New York City is the most expensive city in the world for expatriates to live. New York City is home to the highest number of billionaires, individuals of ultra-high net worth (greater than US$30 million), and millionaires of any city in the world.\n",
+ "****END DOCUMENT 0****\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(RAG_CONTEXT)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "09cad16f-4078-42de-80ee-2672dae5608a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "FULL_PROMPT = f\"\"\"\n",
+    "You are an assistant for answering questions about cities. You will be provided documentation from Wikipedia. Provide a conversational answer.\n",
+ "If you don't know the answer, just say \"I do not know.\" Don't make up an answer.\n",
+ "\n",
+    "Here are the document(s) you should use when answering the user's question:\n",
+ "{RAG_CONTEXT}\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "7bb4a000-8ef3-4006-9c61-7d76fa865d28",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from openai import OpenAI\n",
+ "\n",
+ "client = OpenAI(\n",
+ " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "da814147-9c78-4906-a84a-78fc88c2fc49",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = client.chat.completions.create(\n",
+ " model=\"gpt-4o-mini\",\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": FULL_PROMPT},\n",
+ " {\"role\": \"user\", \"content\": question}\n",
+ " ],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "68cbd8df-af73-4dbe-97a9-f3cd89f36f3d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The largest city in New York is New York City, often referred to as NYC. It is the most populous city in the United States, with an estimated population of 8,335,897 in 2022.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('\\n'.join([c.message.content for c in response.choices]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d4f01627-533b-49b0-9814-292360d064c6",
+ "metadata": {},
+ "source": [
+ "# End"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py b/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
index e39db6d3a34..642027b5630 100644
--- a/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
+++ b/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
@@ -138,8 +138,11 @@ def _connect(self, config: RepoConfig) -> MilvusClient:
)
return self.client
- def _get_collection(self, config: RepoConfig, table: FeatureView) -> Dict[str, Any]:
+ def _get_or_create_collection(
+ self, config: RepoConfig, table: FeatureView
+ ) -> Dict[str, Any]:
self.client = self._connect(config)
+ vector_field_dict = {k.name: k for k in table.schema if k.vector_index}
collection_name = _table_id(config.project, table)
if collection_name not in self._collections:
# Create a composite key by combining entity fields
@@ -200,10 +203,13 @@ def _get_collection(self, config: RepoConfig, table: FeatureView) -> Dict[str, A
DataType.FLOAT_VECTOR,
DataType.BINARY_VECTOR,
]:
+ metric = vector_field_dict[
+ vector_field.name
+ ].vector_search_metric
index_params.add_index(
collection_name=collection_name,
field_name=vector_field.name,
- metric_type=config.online_store.metric_type,
+ metric_type=metric or config.online_store.metric_type,
index_type=config.online_store.index_type,
index_name=f"vector_index_{vector_field.name}",
params={"nlist": config.online_store.nlist},
@@ -234,7 +240,7 @@ def online_write_batch(
progress: Optional[Callable[[int], Any]],
) -> None:
self.client = self._connect(config)
- collection = self._get_collection(config, table)
+ collection = self._get_or_create_collection(config, table)
vector_cols = [f.name for f in table.features if f.vector_index]
entity_batch_to_insert = []
for entity_key, values_dict, timestamp, created_ts in data:
@@ -301,7 +307,7 @@ def update(
):
self.client = self._connect(config)
for table in tables_to_keep:
- self._collections = self._get_collection(config, table)
+ self._collections = self._get_or_create_collection(config, table)
for table in tables_to_delete:
collection_name = _table_id(config.project, table)
@@ -347,7 +353,7 @@ def retrieve_online_documents_v2(
}
self.client = self._connect(config)
collection_name = _table_id(config.project, table)
- collection = self._get_collection(config, table)
+ collection = self._get_or_create_collection(config, table)
if not config.online_store.vector_enabled:
raise ValueError("Vector search is not enabled in the online store config")
@@ -408,11 +414,10 @@ def retrieve_online_documents_v2(
)
for field in output_fields:
val = ValueProto()
+ field_value = hit.get("entity", {}).get(field, None)
# entity_key_proto = None
if field in ["created_ts", "event_ts"]:
- res_ts = datetime.fromtimestamp(
- hit.get("entity", {}).get(field) / 1e6
- )
+ res_ts = datetime.fromtimestamp(field_value / 1e6)
elif field == ann_search_field:
serialized_embedding = _serialize_vector_to_float_list(
embedding
@@ -426,15 +431,14 @@ def retrieve_online_documents_v2(
PrimitiveFeastType.INT32,
PrimitiveFeastType.BYTES,
]:
- res[field] = ValueProto(
- string_val=hit.get("entity", {}).get(field, "")
- )
+ res[field] = ValueProto(string_val=field_value)
elif field == composite_key_name:
pass
+ elif isinstance(field_value, bytes):
+ val.ParseFromString(field_value)
+ res[field] = val
else:
- val.ParseFromString(
- bytes(hit.get("entity", {}).get(field, b"").encode())
- )
+ val.string_val = field_value
res[field] = val
distance = hit.get("distance", None)
res["distance"] = (
@@ -471,7 +475,7 @@ def _extract_proto_values_to_dict(
else:
vector_values = getattr(feature_values, proto_val_type).val
else:
- if serialize_to_string:
+ if serialize_to_string and proto_val_type != "string_val":
vector_values = feature_values.SerializeToString().decode()
else:
vector_values = getattr(feature_values, proto_val_type)