From Natural Language to SQL: Building and Tracking a Multi-Lingual Query Engine #132
Conversation
Preview for 3ac4775
Thank you for this blog! Great project. I appreciate the detailed code examples and I liked seeing all of the LangGraph nodes reflected in the MLflow traces.
A few general comments:
- It took me a little while to figure out exactly what we would be building; I'd like to see a clearer statement of this very early on. "We will build a system that takes user inputs in any(?) language, translates them to SQL, validates, executes, etc..." Ideally with at least one concrete example. Given a database with data about X, a user can ask <natural language query> in natural language and see <output>.
- The role of MLflow in the story of the post was a little unclear until the end. It talked about lifecycle management a few times but we don't really see it used for that purpose much. I would propose emphasizing tracing a bit more, and really highlighting the correspondence between the nodes and the recorded tracing spans, emphasizing that tracing gives a lot of visibility of what happens at each step.
- I wonder if it's possible to make the node descriptions section a little more concise. Perhaps instead of having the process/key considerations/examples lists in each section, those could be consolidated into a single table at the beginning/end of the node descriptions section, or condensed into a few sentences per section. It might be even more effective to follow one concrete example through each step instead. E.g. a user starts with "Quantos pedidos foram realizados em Novembro?" — I would be really interested to see what happens with this at each stage (maybe with tracing screenshots at each stage 🙂)
# Multilingual Query Engine using LangGraph

The Multilingual Query Engine leverages LangGraph’s advanced features to create a stateful, multi-agent, and cyclical graph architecture.
I suggest briefly explaining what LangGraph is and why it is the right tool for this purpose (this section gets into features, but I think readers would benefit from a higher-level intro that identifies it as an AI orchestration tool, explains why it was used in this case, etc.)
## AI Workflow Overview

The Multilingual Query Engine’s advanced AI workflow is composed of interconnected nodes and edges, each representing a crucial stage:

1. **Translation Node**: Converts the user’s input into English.
2. **Safety Checks**: Ensures user input is free from toxic or inappropriate content and does not contain harmful SQL commands (e.g., DELETE, DROP).
3. **Database Schema Extraction**: Retrieves the schema of the target database to understand its structure and available data.
4. **Relevancy Validation**: Validates the user’s input against the database schema to ensure alignment with the database’s capabilities.
5. **SQL Query Generation**: Generates an SQL query based on the user’s input and the current database schema.
6. **SQL Query Validation**: Executes the SQL query in a rollback-safe environment to ensure its validity before running it.
7. **Dynamic State Evaluation**: Determines the next steps based on the current state. If the SQL query validation fails, it loops back to Stage 5 to regenerate the query.
8. **Query Execution and Result Retrieval**: Executes the SQL query and returns the results if it’s a SELECT statement.

The retry mechanism is introduced in Stage 7, where the system dynamically evaluates the current graph state. Specifically, when the SQL query validation node (Stage 6) detects an issue, the state triggers a loop back to the SQL Generation node (Stage 5) for a new SQL generation attempt (within a maximum of 3 attempts).
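For readers who want to see how such a retry loop is typically expressed, here is a minimal LangGraph sketch (node names and state keys are illustrative, not necessarily those used later in the post):

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph


class GraphState(TypedDict):
    # Illustrative subset of the workflow state
    error: str
    iterations: int


def generate_sql(state: GraphState) -> GraphState:
    # Stage 5 (placeholder): ask the LLM for a candidate SQL query
    state["iterations"] += 1
    return state


def validate_sql(state: GraphState) -> GraphState:
    # Stage 6 (placeholder): run the query in a rollback-safe transaction
    # and set state["error"] to "yes"/"no" depending on the outcome
    return state


def decide_next(state: GraphState) -> str:
    # Stage 7: loop back to generation on failure, up to 3 attempts
    if state["error"] == "yes" and state["iterations"] < 3:
        return "generate_sql"
    return END


workflow = StateGraph(GraphState)
workflow.add_node("generate_sql", generate_sql)
workflow.add_node("validate_sql", validate_sql)
workflow.set_entry_point("generate_sql")
workflow.add_edge("generate_sql", "validate_sql")
workflow.add_conditional_edges(
    "validate_sql", decide_next, {"generate_sql": "generate_sql", END: END}
)
app = workflow.compile()
```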
The intro—first paragraph or two—could use a very brief summary of this. "We will build a system that takes natural language input, such as X, from the user, validates it for safety, and generates correct SQL, informed by context about the database schema." A clear description of the task and an example of the final workflow the project will enable will help readers get their bearings right from the beginning.
**Examples:**

- Input: _"Quantos pedidos foram realizados em Novembro?"_
- Translated: _"How many orders were made in November?"_

- Input: _"Combien de ventes avons-nous enregistrées en France ?"_
- Translated: _"How many sales did we record in France?"_
Note early on that multilingual refers to taking natural-language inputs in multiple languages, not e.g. supporting multiple SQL dialects or something like that. I wasn't 100% sure until I got to here!
```python
def main():
    # Load environment variables from .env file
    load_dotenv()

    # Access secrets using os.getenv
    os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

    # Setup database and vector store
    conn = setup_database()
    cursor = conn.cursor()
    vector_store = setup_vector_store()

    # Load the model
    model_uri = f"models:/{REGISTERED_MODEL_NAME}@{MODEL_ALIAS}"
    model = mlflow.pyfunc.load_model(model_uri)
    model_input = {"conn": conn, "cursor": cursor, "vector_store": vector_store}
    app = model.predict(model_input)

    # save image
    app.get_graph().draw_mermaid_png(
        output_file_path="sql_agent_with_safety_checks.png"
    )

    # Example user interaction
    print("Welcome to the SQL Assistant!")
    while True:
        question = input("\nEnter your SQL question (or type 'exit' to quit): ")
        if question.lower() == "exit":
            break

        # Initialize the state with all required keys
        initial_state = {
            "messages": [("user", question)],
            "iterations": 0,
            "error": "",
            "results": None,
            "generation": None,
            "no_records_found": False,
            "translated_input": "",  # Initialize translated_input
        }

        solution = app.invoke(initial_state)

        # Check if an error was set during the safety check
        if solution["error"] == "yes":
            print("\nAssistant Message:\n")
            print(solution["messages"][-1][1])  # Display the assistant's message
            continue  # Skip to the next iteration

        # Extract the generated SQL query from solution["generation"]
        sql_query = solution["generation"].sql_code
        print("\nGenerated SQL Query:\n")
        print(sql_query)

        # Extract and display the query results
        if solution.get("no_records_found"):
            print("\nNo records found matching your query.")
        elif "results" in solution and solution["results"] is not None:
            print("\nQuery Results:\n")
            for row in solution["results"]:
                print(row)
        else:
            print("\nNo results returned or query did not execute successfully.")

    print("Goodbye!")


if __name__ == "__main__":
    main()
```
Can you show an example invocation of this (not all the code, just one quick example) toward the beginning?
However, a number of challenges remain when building an NL2SQL system, such as semantic ambiguity, schema mapping, error handling, and user feedback. It is therefore important to put guardrails in place rather than relying entirely on the LLM.

In this blog post, we’ll walk you through the process of building and managing the lifecycle of a Multilingual Query Engine, encompassing both Natural Language to SQL generation and query execution.
I don't see much about lifecycle management in the post (it mentions lifecycle management a few times, but doesn't show much about how to use MLflow for that purpose). You might de-emphasize that but show a little more about how MLflow tracing gives visibility into the many different components of a setup like this.
I think it's really cool, looking at the gif at the end, how we can see the different nodes laid out in the article reflected in the trace. It might be interesting to show screenshots of that for each section, or at least call it out more clearly in that section—specifically, that the final graph can be a bit of a black box and challenging to debug, but tracing gives really clear visibility into what is happening at each step with one line of code.
# Logging the Model in MLflow

Now that we have built a Multi-Lingual Query Engine using LangGraph, we are ready to log the model using MLflow’s [Models from Code](https://mlflow.org/blog/models_from_code) feature. This approach, where we log the code that represents the model, contrasts with object-based logging, where a model object is created, serialized, and logged as a pickle or JSON object.
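To make this concrete, here is a rough sketch of the Models-from-Code pattern (the file name, class name, and registered model name below are placeholders, not necessarily the ones used in the post):

```python
# sql_model.py -- the "model as code" file that gets logged
import mlflow


class SQLGenerator(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Build and return the compiled LangGraph app from the resources
        # (connection, cursor, vector store) supplied at inference time.
        return build_workflow(model_input)  # hypothetical helper defined elsewhere


mlflow.models.set_model(SQLGenerator())

# In a separate logging script, the file itself (not a pickled object) is logged:
#
#   with mlflow.start_run():
#       mlflow.pyfunc.log_model(
#           artifact_path="sql_agent",
#           python_model="sql_model.py",               # path to this code file
#           registered_model_name="sql_agent_model",   # assumed name
#       )
```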
Motivate this step (logging the model) a little more? e.g. can mention versioning, sharing, packaging for deployment, etc.
```python
def translate_input(state: GraphState):
    print("---TRANSLATING INPUT---")
    messages = state["messages"]
    user_input = messages[-1][1]  # Get the latest user input

    # Translation prompt for the model
    translation_prompt = f"""
    Translate the following text to English. If the text is already in English, repeat it exactly without any additional explanation.

    Text:
    {user_input}
    """
    # Call the OpenAI LLM to translate the text
    translated_response = llm.invoke(translation_prompt)
    translated_text = translated_response.content.strip()  # Access the 'content' attribute and strip any extra spaces
    state["translated_input"] = translated_text  # Save the translated input
    print(f"Translated Input: {translated_text}")

    return state
```
Does translating to english before translating to SQL improve performance? Or would it work just as well to translate from whatever the input language is to SQL? Worth motivating this step.
```python
def safety_check(state: GraphState):
    print("---PERFORMING SAFETY CHECK---")
    translated_input = state["translated_input"]
    messages = state["messages"]
    error = "no"

    # List of disallowed SQL operations (e.g., DELETE, DROP)
    disallowed_operations = ['CREATE', 'DELETE', 'DROP', 'INSERT', 'UPDATE', 'ALTER', 'TRUNCATE', 'EXEC', 'EXECUTE']
    pattern = re.compile(r'\b(' + '|'.join(disallowed_operations) + r')\b', re.IGNORECASE)

    # Check if the input contains disallowed SQL operations
    if pattern.search(translated_input):
        print("Input contains disallowed SQL operations. Halting the workflow.")
        error = "yes"
        messages += [("assistant", "Your query contains disallowed SQL operations and cannot be processed.")]
    else:
        # Check if the input contains inappropriate content
        safety_prompt = f"""
```
curious about using pattern search on the (translated) natural language input—wouldn't checking for disallowed SQL operations make more sense after the SQL is generated?
i.e. what would happen if the natural language input is "please get rid of the customers table." It doesn't look like checks for disallowed operations are run again after the sql is generated, so might it be possible that the system would generate and run a drop table command as long as the user didn't explicitly say "drop table"?
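One way to close this gap (a sketch of the reviewer's suggestion, not code from the post) would be to re-run the same keyword check on the generated SQL right before execution, e.g. inside the `sql_check` node:

```python
import re

# Same disallowed operations as in safety_check
DISALLOWED = ['CREATE', 'DELETE', 'DROP', 'INSERT', 'UPDATE', 'ALTER', 'TRUNCATE', 'EXEC', 'EXECUTE']
DISALLOWED_PATTERN = re.compile(r'\b(' + '|'.join(DISALLOWED) + r')\b', re.IGNORECASE)


def is_generated_sql_safe(sql_code: str) -> bool:
    """Return False if the generated SQL contains a disallowed operation."""
    return DISALLOWED_PATTERN.search(sql_code) is None


# Catches destructive SQL even when the natural-language input never used the keyword
print(is_generated_sql_safe("SELECT name FROM customers;"))  # True
print(is_generated_sql_safe("DROP TABLE customers;"))        # False
```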
The `sql_check` node validates the generated SQL query for safety and integrity before execution.

**Purpose:** Ensure the SQL query adheres to safety and syntactical standards.

**Process:**

- Executes the query within a transactional savepoint to test its validity.
- Rolls back any changes after validation.
- Flags errors and updates the state if validation fails.

**Key Considerations:**

- Detects potentially destructive operations.
- Provides detailed feedback on validation errors.

**Examples:**

- Input SQL: _"SELECT name FROM customers WHERE city = 'New York';"_
- Validation: Query is valid.

- Input SQL: _"SELECT MONTH(date) AS month, SUM(total) AS total_sales FROM orders GROUP BY MONTH(date);"_
- Response: _"Your SQL query failed to execute: no such function: MONTH."_

**Code:**

```python
def sql_check(state: GraphState):
    print("---VALIDATING SQL QUERY---")
    messages = state["messages"]
    sql_solution = state["generation"]
    error = "no"

    sql_code = sql_solution.sql_code.strip()

    try:
        # Start a savepoint for the transaction
        conn.execute('SAVEPOINT sql_check;')
        # Attempt to execute the SQL query
        cursor.execute(sql_code)
        # Roll back to the savepoint to undo any changes
        conn.execute('ROLLBACK TO sql_check;')
        print("---SQL QUERY VALIDATION: SUCCESS---")
    except Exception as e:
        # Roll back in case of error
        conn.execute('ROLLBACK TO sql_check;')
        print("---SQL QUERY VALIDATION: FAILED---")
        print(f"Error: {e}")
        messages += [("user", f"Your SQL query failed to execute: {e}")]
        error = "yes"

    state["error"] = error

    return state
```
This seems to validate that the SQL runs, but I don't see where it checks for safety or detects potentially destructive operations. As far as I can tell, a drop table command would clear this step. See earlier note—it looks like the safety check occurs before the sql is generated.
## Viewing Traces in MLflow

Traces can be easily accessed by navigating to the MLflow experiment of interest and clicking on the "Tracing" tab. Once inside, selecting a specific trace provides detailed execution information.

Each trace includes:

1. **Execution Graphs**: Visualizations of the workflow steps.
2. **Inputs and Outputs**: Detailed logs of data processed at each step.

This granular visibility enables developers to debug and optimize their workflows effectively.

By leveraging MLflow tracing, we ensure that our Multi-Lingual Query Engine remains transparent, auditable, and scalable.

![mlflow_tracing_gif](mlflow_trace.gif)
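For reference, the "one line" that enables this tracing for a LangChain/LangGraph workflow is MLflow's autologging; a minimal sketch (the experiment name here is an assumption):

```python
import mlflow

mlflow.set_experiment("multilingual-query-engine")  # assumed experiment name
mlflow.langchain.autolog()  # every node and LLM call is captured as a trace span

# Any subsequent app.invoke(...) call is traced automatically and appears
# under the experiment's "Tracing" tab in the MLflow UI.
```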
I really like this, would love some more emphasis on how tracing lets you visualize the whole graph execution.
We’ll start by demonstrating how to leverage LangGraph’s capabilities to build a dynamic AI workflow. This workflow integrates OpenAI and external data sources, such as a Vector Store and an SQLite database, to process user input, perform safety checks, query databases, and generate meaningful responses.

Throughout this post, we’ll leverage MLflow’s Models from Code feature to manage the lifecycle of the Multilingual Query Engine. This approach allows the AI workflow to be treated like a traditional ML model, enabling tracking, versioning, and deployment across various serving infrastructures.
Might want to add a doc link directly to this page for the Models from Code reference :) https://mlflow.org/docs/latest/model/models-from-code.html
2. **Multi-Agent Design**: The AI workflow involves multiple interactions with OpenAI and other external tools throughout its execution.

3. **Cyclical Graph Structure**: The graph’s cyclical nature introduces a robust retry mechanism. This mechanism dynamically addresses failures by looping back to previous stages when needed, ensuring continuous graph execution. (Details of this mechanism will be discussed later.)

## AI Workflow Overview
Linting - heading sections need blank lines on either side.
#### Step 1: Load SQL Documentation

The first step in creating a FAISS Vector Store with SQL query generation guidelines is to load SQL documentation from the [W3Schools SQL page](https://www.w3schools.com/sql/) using Langchain's RecursiveUrlLoader. This tool retrieves the documentation, allowing us to use it as a knowledge base for our engine.

#### Step 2: Split the Text into Manageable Chunks

The loaded SQL documentation is a lengthy text, making it difficult to be effectively ingested by the LLM. To address this, the next step involves splitting the text into smaller, manageable chunks using Langchain's RecursiveCharacterTextSplitter. By splitting the text into chunks of 500 characters with a 50-character overlap, we ensure the AI has sufficient context while minimizing the risk of losing important information that spans across chunks. The split_text method applies this splitting process, storing the resulting pieces in a list called documents.
Suggested change:

The loaded SQL documentation is a lengthy text, making it difficult to be effectively ingested by the LLM. To address this, the next step involves splitting the text into smaller, manageable chunks using Langchain's RecursiveCharacterTextSplitter. By splitting the text into chunks of 500 characters with a 50-character overlap, we ensure the language model has sufficient context while minimizing the risk of losing important information that spans across chunks. The split_text method applies this splitting process, storing the resulting pieces in a list called 'documents'.
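For illustration, a rough sketch of these two steps (import paths may vary across LangChain versions, and the HTML extractor choice is an assumption):

```python
from bs4 import BeautifulSoup
from langchain_community.document_loaders import RecursiveUrlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Step 1: load the SQL documentation from W3Schools
loader = RecursiveUrlLoader(
    "https://www.w3schools.com/sql/",
    extractor=lambda html: BeautifulSoup(html, "html.parser").text,  # strip HTML tags
)
docs = loader.load()

# Step 2: split the concatenated text into 500-character chunks with 50-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = splitter.split_text(" ".join(doc.page_content for doc in docs))
```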
### FAISS Vector Store

To build an effective Natural Language to SQL engine capable of generating accurate and executable SQL queries, we leverage Langchain's FAISS Vector Store feature. This setup allows the system to search and extract SQL query generation guidelines from W3Schools SQL documents previously stored in the Vector Database, enhancing the success of SQL query generation.

#### Step 1: Load SQL Documentation
Let's make sure to leave a blank line on either side of any heading section.
Details on the OpenAI implementation will be provided later on, in the Node implementation section.

### FAISS Vector Store
Might want to mention alternatives to an in-memory Vector Store for persistent, more scalable embeddings storage that can be shared across projects.
```python
# Save the vector store to disk
vector_store.save_local(vector_store_dir)
print("Vector store created and saved to disk.")
```
Could we convert the print statements to `_logger.info()` statements instead, to show curious readers how to avoid common linting issues in their code?
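For example, a standard module-level logger along the lines of that suggestion (a sketch, not code from the post):

```python
import logging

logging.basicConfig(level=logging.INFO)
_logger = logging.getLogger(__name__)

# Instead of: print("Vector store created and saved to disk.")
_logger.info("Vector store created and saved to disk.")
```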
### SQLite Database

The SQLite database is a key component of the Multilingual Query Engine, serving as the structured data repository that supports efficient SQL query generation, validation, and execution by enabling:
It might be worth mentioning why this is chosen for this example. Personally, I'm a huge fan of the portability and performance of SQLite, and it definitely has many uses far beyond just demonstrations of concepts. As a self-contained data storage layer for an application, it's phenomenal at what it does and can greatly simplify life for developers who would otherwise assume that they need to spin up a MySQL / Postgres DB for something that a local on-disk DB would handle much better.
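For illustration, the kind of self-contained, serverless setup described above can be as small as this (a sketch; the table and file names are made up, not the ones used in the post):

```python
import sqlite3


def setup_database(db_path: str = "sales.db") -> sqlite3.Connection:
    """Create (or open) a local, file-based SQLite database -- no server required."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS orders (
               id INTEGER PRIMARY KEY,
               customer TEXT,
               total REAL,
               date TEXT
           )"""
    )
    conn.commit()
    return conn
```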
- The corresponding **SQL code** ready for execution.

- **Adaptable and Reliable**: Uses GPT-4 for robust, consistent query generation, minimizing manual effort and errors.
Should we update this to use `gpt-4o-mini`? The tool-calling functionality with that model is far superior to base `gpt-4`.
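For example (a sketch, assuming the post instantiates the LLM via langchain_openai's ChatOpenAI wrapper):

```python
from langchain_openai import ChatOpenAI

# Swap the base model for gpt-4o-mini
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
```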
Description:
This blog post demonstrates how to build a Multilingual Query Engine that combines Natural Language-to-SQL generation with query execution while fully leveraging MLflow’s features. It explores how to leverage MLflow Models from Code to treat AI workflows as traditional ML models, enabling seamless tracking, versioning, and deployment across diverse serving infrastructures. Additionally, it dives into MLflow’s Tracing feature, which enhances observability by tracking inputs, outputs, and metadata at every intermediate step of the AI workflow.
Related to: #115