
From Natural Language to SQL: Building and Tracking a Multi-Lingual Query Engine #132

Open · wants to merge 2 commits into base: main
Conversation

joanacmesquitaf

Description:

This blog post demonstrates how to build a Multilingual Query Engine that combines Natural Language-to-SQL generation with query execution while fully leveraging MLflow’s features. It explores how to leverage MLflow Models from Code to treat AI workflows as traditional ML models, enabling seamless tracking, versioning, and deployment across diverse serving infrastructures. Additionally, it dives into MLflow’s Tracing feature, which enhances observability by tracking inputs, outputs, and metadata at every intermediate step of the AI workflow.

Additions:

  1. Created a folder for the blog content containing:
     • index.md with the content of the blog.
     • Relevant illustrations.
  2. Added all authors' information and thumbnails to the correct locations.

Additional Information:

Related to: #115


github-actions bot commented Dec 5, 2024

Preview for 3ac4775

  • For faster build, the doc pages are not included in the preview.
  • Redirects are disabled in the preview.
Open in StackBlitz


@djliden djliden left a comment


Thank you for this blog! Great project. I appreciate the detailed code examples and I liked seeing all of the LangGraph nodes reflected in the MLflow traces.

A few general comments:

  1. It took me a little while to figure out exactly what we would be building; I'd like to see a clearer statement of this very early on. "We will build a system that takes user inputs in any(?) language, translates them to SQL, validates, executes, etc..." Ideally with at least one concrete example. Given a database with data about X, a user can ask <natural language query> in natural language and see <output>.
  2. The role of MLflow in the story of the post was a little unclear until the end. It mentions lifecycle management a few times, but we don't really see MLflow used for that purpose much. I would propose emphasizing tracing a bit more, and really highlighting the correspondence between the nodes and the recorded tracing spans, emphasizing that tracing gives a lot of visibility into what happens at each step.
  3. I wonder if it's possible to make the node descriptions section a little more concise. Perhaps instead of having the process/key considerations/examples lists in each section, those could be consolidated into a single table at the beginning/end of the node descriptions section, or into a few sentences per section. It might be even more effective to follow one concrete example through each step instead. E.g. a user starts with "Quantos pedidos foram realizados em Novembro?" — I would be really interested to see what happens with this at each stage (maybe with tracing screenshots at each stage 🙂)


# Multilingual Query Engine using LangGraph

The Multilingual Query Engine leverages LangGraph’s advanced features to create a stateful, multi-agent, and cyclical graph architecture.

I suggest briefly explaining what LangGraph is and why it is the right tool for this purpose (this section gets into features, but I think readers would benefit from a higher-level intro that identifies it as an AI orchestration tool, explains why it was used in this case, etc.)

Comment on lines +68 to +88
## AI Workflow Overview

The Multilingual Query Engine’s advanced AI workflow is composed of interconnected nodes and edges, each representing a crucial stage:

1. **Translation Node**: Converts the user’s input into English.

2. **Safety Checks**: Ensures user input is free from toxic or inappropriate content and does not contain harmful SQL commands (e.g., DELETE, DROP).

3. **Database Schema Extraction**: Retrieves the schema of the target database to understand its structure and available data.

4. **Relevancy Validation**: Validates the user’s input against the database schema to ensure alignment with the database’s capabilities.

5. **SQL Query Generation**: Generates an SQL query based on the user’s input and the current database schema.

6. **SQL Query Validation**: Executes the SQL query in a rollback-safe environment to ensure its validity before running it.

7. **Dynamic State Evaluation**: Determines the next steps based on the current state. If the SQL query validation fails, it loops back to Stage 5 to regenerate the query.

8. **Query Execution and Result Retrieval**: Executes the SQL query and returns the results if it’s a SELECT statement.

The retry mechanism is introduced in Stage 7, where the system dynamically evaluates the current graph state. Specifically, when the SQL query validation node (Stage 6) detects an issue, the state triggers a loop back to the SQL Generation node (Stage 5) for a new SQL generation attempt (up to a maximum of 3 attempts).
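
To make the structure more concrete, here is a rough sketch of how stages like these could be wired together with LangGraph's `StateGraph`, including the conditional retry edge described above. The node functions and state fields below are illustrative placeholders, not the exact implementations shown later in the post:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END


class GraphState(TypedDict):
    # Illustrative subset of the workflow state
    translated_input: str
    error: str
    iterations: int


# Placeholder node functions; each receives and returns the shared state
def translate_input(state): return state
def safety_check(state): return state
def generate_sql(state): return state
def sql_check(state): return state
def run_query(state): return state


def decide_next_step(state):
    # Dynamic state evaluation: retry SQL generation on failure, up to 3 attempts
    if state["error"] == "yes" and state["iterations"] < 3:
        return "generate_sql"
    return "run_query"


workflow = StateGraph(GraphState)
for name, fn in [
    ("translate_input", translate_input),
    ("safety_check", safety_check),
    ("generate_sql", generate_sql),
    ("sql_check", sql_check),
    ("run_query", run_query),
]:
    workflow.add_node(name, fn)

workflow.set_entry_point("translate_input")
workflow.add_edge("translate_input", "safety_check")
workflow.add_edge("safety_check", "generate_sql")
workflow.add_edge("generate_sql", "sql_check")
workflow.add_conditional_edges(
    "sql_check",
    decide_next_step,
    {"generate_sql": "generate_sql", "run_query": "run_query"},
)
workflow.add_edge("run_query", END)

app = workflow.compile()
```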

The intro—first paragraph or two—could use a very brief summary of this. "We will build a system that takes natural language input, such as X, from the user, validates it for safety, and generates correct SQL, informed by context about the database schema." A clear description of the task and an example of the final workflow the project will enable will help readers get their bearings right from the beginning.

Comment on lines +429 to +437
**Examples:**

- Input: _"Quantos pedidos foram realizados em Novembro?"_

- Translated: _"How many orders were made in November?"_

- Input: _"Combien de ventes avons-nous enregistrées en France ?"_

- Translated: _"How many sales did we record in France?"_

Note early on that multilingual refers to taking natural-language inputs in multiple languages, not e.g. supporting multiple SQL dialects or something like that. I wasn't 100% sure until I got to here!

Comment on lines +966 to +1034
```python
def main():
    # Load environment variables from .env file
    load_dotenv()

    # Access secrets using os.getenv
    os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

    # Setup database and vector store
    conn = setup_database()
    cursor = conn.cursor()
    vector_store = setup_vector_store()

    # Load the model
    model_uri = f"models:/{REGISTERED_MODEL_NAME}@{MODEL_ALIAS}"
    model = mlflow.pyfunc.load_model(model_uri)
    model_input = {"conn": conn, "cursor": cursor, "vector_store": vector_store}
    app = model.predict(model_input)

    # save image
    app.get_graph().draw_mermaid_png(
        output_file_path="sql_agent_with_safety_checks.png"
    )

    # Example user interaction
    print("Welcome to the SQL Assistant!")
    while True:
        question = input("\nEnter your SQL question (or type 'exit' to quit): ")
        if question.lower() == "exit":
            break

        # Initialize the state with all required keys
        initial_state = {
            "messages": [("user", question)],
            "iterations": 0,
            "error": "",
            "results": None,
            "generation": None,
            "no_records_found": False,
            "translated_input": "",  # Initialize translated_input
        }

        solution = app.invoke(initial_state)

        # Check if an error was set during the safety check
        if solution["error"] == "yes":
            print("\nAssistant Message:\n")
            print(solution["messages"][-1][1])  # Display the assistant's message
            continue  # Skip to the next iteration

        # Extract the generated SQL query from solution["generation"]
        sql_query = solution["generation"].sql_code
        print("\nGenerated SQL Query:\n")
        print(sql_query)

        # Extract and display the query results
        if solution.get("no_records_found"):
            print("\nNo records found matching your query.")
        elif "results" in solution and solution["results"] is not None:
            print("\nQuery Results:\n")
            for row in solution["results"]:
                print(row)
        else:
            print("\nNo results returned or query did not execute successfully.")

    print("Goodbye!")


if __name__ == "__main__":
    main()
```

Can you show an example invocation of this (not all the code, just one quick example) toward the beginning?


However, a number of challenges remain when building an NL2SQL system, such as semantic ambiguity, schema mapping, error handling, and user feedback. It is therefore important to put guardrails in place when building such systems rather than relying entirely on the LLM.

In this blog post, we’ll walk you through the process of building and managing the lifecycle of a Multilingual Query Engine, encompassing both Natural Language to SQL generation and query execution.

I don't see much about lifecycle management in the post (it mentions lifecycle management a few times, but doesn't show much about how to use MLflow for that purpose). You might de-emphasize that but show a little more about how MLflow tracing gives visibility into the many different components of a setup like this.

I think it's really cool, looking at the gif at the end, how we can see the different nodes laid out in the article reflected in the trace. It might be interesting to show screenshots of that for each section, or at least call that out more clearly in that section—specifically, that the final graph can be a bit of a black box, might be challenging to debug, to figure out what is happening at each step, but tracing gives really clear visibility into that with the one line of code.

Comment on lines +872 to +874
# Logging the Model in MLflow

Now that we have built a Multi-Lingual Query Engine using LangGraph, we are ready to log the model using MLflow's [Models from Code](https://mlflow.org/blog/models_from_code) feature. With this approach, we log the code that represents the model, in contrast with object-based logging, where a model object is created, serialized, and logged as a pickle or JSON object.
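
As a rough illustration of the difference (not the post's exact logging code), code-based logging passes a path to a Python file rather than a serialized object; the file name, wrapper class, and registered model name below are illustrative:

```python
import mlflow

# Inside the model file (e.g. sql_model.py), the workflow object is declared
# as "the model" for code-based logging:
#
#   from mlflow.models import set_model
#   ...build the LangGraph workflow / pyfunc wrapper...
#   set_model(SQLGenerator())
#
# The driver script then logs the file itself instead of a pickled object:
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        artifact_path="sql_agent",
        python_model="sql_model.py",        # path to the model-as-code file
        registered_model_name="sql_agent",  # illustrative registry name
    )
```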

Motivate this step (logging the model) a little more? e.g. can mention versioning, sharing, packaging for deployment, etc.

Comment on lines +442 to +461
```python
def translate_input(state: GraphState):
    print("---TRANSLATING INPUT---")
    messages = state["messages"]
    user_input = messages[-1][1]  # Get the latest user input

    # Translation prompt for the model
    translation_prompt = f"""
    Translate the following text to English. If the text is already in English, repeat it exactly without any additional explanation.

    Text:
    {user_input}
    """
    # Call the OpenAI LLM to translate the text
    translated_response = llm.invoke(translation_prompt)
    translated_text = translated_response.content.strip()  # Access the 'content' attribute and strip any extra spaces
    state["translated_input"] = translated_text  # Save the translated input
    print(f"Translated Input: {translated_text}")

    return state
```

Does translating to English before translating to SQL improve performance? Or would it work just as well to translate from whatever the input language is to SQL? Worth motivating this step.

Comment on lines +510 to +527
def safety_check(state: GraphState):
    print("---PERFORMING SAFETY CHECK---")
    translated_input = state["translated_input"]
    messages = state["messages"]
    error = "no"

    # List of disallowed SQL operations (e.g., DELETE, DROP)
    disallowed_operations = ['CREATE', 'DELETE', 'DROP', 'INSERT', 'UPDATE', 'ALTER', 'TRUNCATE', 'EXEC', 'EXECUTE']
    pattern = re.compile(r'\b(' + '|'.join(disallowed_operations) + r')\b', re.IGNORECASE)

    # Check if the input contains disallowed SQL operations
    if pattern.search(translated_input):
        print("Input contains disallowed SQL operations. Halting the workflow.")
        error = "yes"
        messages += [("assistant", "Your query contains disallowed SQL operations and cannot be processed.")]
    else:
        # Check if the input contains inappropriate content
        safety_prompt = f"""

@djliden djliden Dec 12, 2024


curious about using pattern search on the (translated) natural language input—wouldn't checking for disallowed SQL operations make more sense after the SQL is generated?

i.e. what would happen if the natural language input is "please get rid of the customers table." It doesn't look like checks for disallowed operations are run again after the sql is generated, so might it be possible that the system would generate and run a drop table command as long as the user didn't explicitly say "drop table"?
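
(For illustration, one way to close that gap would be to re-run the same disallowed-operations pattern against the generated SQL before execution; a sketch only, reusing the pattern from the excerpt above:)

```python
import re

disallowed_operations = ['CREATE', 'DELETE', 'DROP', 'INSERT', 'UPDATE', 'ALTER', 'TRUNCATE', 'EXEC', 'EXECUTE']
pattern = re.compile(r'\b(' + '|'.join(disallowed_operations) + r')\b', re.IGNORECASE)

def generated_sql_is_safe(sql_code: str) -> bool:
    # Reject any generated statement containing a write/DDL keyword,
    # regardless of how the natural-language request was phrased
    return not pattern.search(sql_code)
```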

Comment on lines +745 to +803
The `sql_check` node validates the generated SQL query for safety and integrity before execution.

**Purpose:** Ensure the SQL query adheres to safety and syntactical standards.

**Process:**

- Executes the query within a transactional savepoint to test its validity.

- Rolls back any changes after validation.

- Flags errors and updates the state if validation fails.

**Key Considerations:**

- Detects potentially destructive operations.

- Provides detailed feedback on validation errors.

**Examples:**

- Input SQL: _"SELECT name FROM customers WHERE city = 'New York';"_

- Validation: Query is valid.

- Input SQL: _"SELECT MONTH(date) AS month, SUM(total) AS total_sales FROM orders GROUP BY MONTH(date);"_

- Response: _"Your SQL query failed to execute: no such function: MONTH."_

**Code:**

```python
def sql_check(state: GraphState):
    print("---VALIDATING SQL QUERY---")
    messages = state["messages"]
    sql_solution = state["generation"]
    error = "no"

    sql_code = sql_solution.sql_code.strip()

    try:
        # Start a savepoint for the transaction
        conn.execute('SAVEPOINT sql_check;')
        # Attempt to execute the SQL query
        cursor.execute(sql_code)
        # Roll back to the savepoint to undo any changes
        conn.execute('ROLLBACK TO sql_check;')
        print("---SQL QUERY VALIDATION: SUCCESS---")
    except Exception as e:
        # Roll back in case of error
        conn.execute('ROLLBACK TO sql_check;')
        print("---SQL QUERY VALIDATION: FAILED---")
        print(f"Error: {e}")
        messages += [("user", f"Your SQL query failed to execute: {e}")]
        error = "yes"

    state["error"] = error

    return state
```

This seems to validate that the SQL runs, but I don't see where it checks for safety or detects potentially destructive operations. As far as I can tell, a drop table command would clear this step. See earlier note—it looks like the safety check occurs before the sql is generated.

Comment on lines +1050 to +1064
## Viewing Traces in MLflow

Traces can be easily accessed by navigating to the MLflow experiment of interest and clicking on the "Tracing" tab. Once inside, selecting a specific trace provides detailed execution information.

Each trace includes:

1. **Execution Graphs**: Visualizations of the workflow steps.
2. **Inputs and Outputs**: Detailed logs of data processed at each step.

This granular visibility enables developers to debug and optimize their workflows effectively.

By leveraging MLflow tracing, we ensure that our Multi-Lingual Query Engine remains transparent, auditable, and scalable.

![mlflow_tracing_gif](mlflow_trace.gif)
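
For context, capturing these traces typically takes very little code. A minimal sketch of how tracing could be switched on before invoking the workflow (the experiment name is illustrative; MLflow's LangChain autologging integration also records LangGraph runs):

```python
import mlflow

mlflow.set_experiment("multilingual-query-engine")  # illustrative experiment name
mlflow.langchain.autolog()  # records a trace span for each node and LLM call

# ...then invoke the compiled LangGraph workflow as usual, e.g.:
# solution = app.invoke(initial_state)
```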


I really like this, would love some more emphasis on how tracing lets you visualize the whole graph execution.


We’ll start by demonstrating how to leverage LangGraph’s capabilities to build a dynamic AI workflow. This workflow integrates OpenAI and external data sources, such as a Vector Store and an SQLite database, to process user input, perform safety checks, query databases, and generate meaningful responses.

Throughout this post, we’ll leverage MLflow’s Models from Code feature to manage the lifecycle of the Multilingual Query Engine. This approach allows the AI workflow to be treated like a traditional ML model, enabling tracking, versioning, and deployment across various serving infrastructures.

Might want to do a doc-links directly to this page for the Models from Code reference :) https://mlflow.org/docs/latest/model/models-from-code.html

2. **Multi-Agent Design**: The AI Workflow includes multiple interactions with OpenAI and other external tools throughout the workflow.

3. **Cyclical Graph Structure**: The graph’s cyclical nature introduces a robust retry mechanism. This mechanism dynamically addresses failures by looping back to previous stages when needed, ensuring continuous graph execution. (Details of this mechanism will be discussed later.)
## AI Workflow Overview

Suggested change
## AI Workflow Overview
## AI Workflow Overview

Linting - header sections need spaces on either side

#### Step 1: Load SQL Documentation
The first step in creating a FAISS Vector Store with SQL query generation guidelines is to load SQL documentation from the [W3Schools SQL page](https://www.w3schools.com/sql/) using Langchain's RecursiveUrlLoader. This tool retrieves the documentation, allowing us to use it as a knowledge base for our engine.
#### Step 2: Split the Text into Manageable Chunks
The loaded SQL documentation is a lengthy text, making it difficult to be effectively ingested by the LLM. To address this, the next step involves splitting the text into smaller, manageable chunks using Langchain's RecursiveCharacterTextSplitter. By splitting the text into chunks of 500 characters with a 50-character overlap, we ensure the AI has sufficient context while minimizing the risk of losing important information that spans across chunks. The split_text method applies this splitting process, storing the resulting pieces in a list called documents.
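
A rough sketch of these two steps, assuming Langchain's RecursiveUrlLoader and RecursiveCharacterTextSplitter as described above (the crawl depth and variable names are illustrative):

```python
from langchain_community.document_loaders import RecursiveUrlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Step 1: pull the SQL documentation pages
loader = RecursiveUrlLoader("https://www.w3schools.com/sql/", max_depth=2)
docs = loader.load()
full_text = "\n".join(doc.page_content for doc in docs)

# Step 2: split into 500-character chunks with a 50-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = splitter.split_text(full_text)
```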

Suggested change
The loaded SQL documentation is a lengthy text, making it difficult to be effectively ingested by the LLM. To address this, the next step involves splitting the text into smaller, manageable chunks using Langchain's RecursiveCharacterTextSplitter. By splitting the text into chunks of 500 characters with a 50-character overlap, we ensure the AI has sufficient context while minimizing the risk of losing important information that spans across chunks. The split_text method applies this splitting process, storing the resulting pieces in a list called documents.
The loaded SQL documentation is a lengthy text, making it difficult to be effectively ingested by the LLM. To address this, the next step involves splitting the text into smaller, manageable chunks using Langchain's RecursiveCharacterTextSplitter. By splitting the text into chunks of 500 characters with a 50-character overlap, we ensure the language model has sufficient context while minimizing the risk of losing important information that spans across chunks. The split_text method applies this splitting process, storing the resulting pieces in a list called 'documents'.

### FAISS Vector Store

To build an effective Natural Language to SQL engine capable of generating accurate and executable SQL queries, we leverage Langchain's FAISS Vector Store feature. This setup allows the system to search and extract SQL query generation guidelines from W3Schools SQL documents previously stored in the Vector Database, enhancing the success of SQL query generation.
#### Step 1: Load SQL Documentation

Let's make sure to leave a blank new line on either side of any heading section


Details on OpenAI implementation will be provided later on in the Node implementation section.
### FAISS Vector Store


Might want to mention alternatives to an in-memory Vector Store for permanent / more scalable shared resource embeddings storage that can be shared across projects.


# Save the vector store to disk
vector_store.save_local(vector_store_dir)
print("Vector store created and saved to disk.")

Could we convert the print statements to _logger.info() statements instead to show curious readers how to avoid common linting issues in their code?
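
For reference, a minimal version of that change might look like the following (using Python's standard logging module and reusing the `vector_store` and `vector_store_dir` names from the excerpt above):

```python
import logging

_logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Save the vector store to disk
vector_store.save_local(vector_store_dir)
_logger.info("Vector store created and saved to disk.")
```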


### SQLite Database

The SQLite database is a key component of the Multilingual Query Engine, serving as the structured data repository that supports efficient SQL query generation, validation, and execution by enabling:

It might be worth mentioning why this is chosen for this example. Personally, I'm a huge fan of the portability and performance of sqlite and it definitely has many uses far beyond just demonstrations of concepts. As a self-contained data storage layer for an application, it's phenomenal at what it does and can greatly simplify developers' lives who would otherwise assume that they need to spin up a MySQL / PostGres DB for something that a local disk DB would handle much better.


- The corresponding **SQL code** ready for execution.

- **Adaptable and Reliable**: Uses GPT-4 for robust, consistent query generation, minimizing manual effort and errors.
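
As a rough sketch of what such a structured generation call can look like (assuming `langchain_openai`'s `ChatOpenAI` and a hypothetical `SQLSolution` schema; the model name mirrors the text above and could be swapped per the comment below):

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class SQLSolution(BaseModel):
    # Hypothetical structured output: an explanation plus executable SQL
    description: str = Field(description="Explanation of the query")
    sql_code: str = Field(description="The SQL statement to execute")


llm = ChatOpenAI(model="gpt-4", temperature=0)
sql_generator = llm.with_structured_output(SQLSolution)

solution = sql_generator.invoke(
    "Given the schema of the 'orders' table, write a SQL query answering: "
    "How many orders were made in November?"
)
print(solution.sql_code)
```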

Should we update this to use gpt-4o-mini? The tool calling functionality with that LLM build is far superior to base gpt-4
