Jupyter Notebook

Jupyter is an open-source platform for interactive computing. This page covers its installation, configuration, implementation, usage, and troubleshooting.

1. Installation

Further installation instructions can be found in the official Jupyter Notebook documentation.

Prerequisites

Python 3.7 or later
pip (Python package manager)

Installation Steps

Install Jupyter Notebook via pip

pip install notebook

Verify the installation

jupyter --version

You should see the Jupyter version along with additional tool versions.
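The same check can be made from Python, which helps inside a container that holds several environments. This is a minimal sketch using only the standard library (Python 3.8 or later for importlib.metadata); the helper name installed_version is an assumption made for illustration and is not part of Jupyter:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this guide installs
for name in ("notebook", "jupyterlab", "ipykernel"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A package reported as "not installed" usually means the active environment is not the one where pip install was run.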

Launch Jupyter Notebook

jupyter notebook

This will open a browser window where you can create and manage notebooks.

You can also access the notebook at this URL: http://localhost:6001/team1/jupyter

Alternative Installation Methods:

Using Anaconda: Install Jupyter as part of the Anaconda distribution

conda install -c conda-forge notebook

Docker: Run Jupyter Notebooks inside a Docker container

docker run -p 8888:8888 jupyter/base-notebook

2. Configuration

Default Configuration

By default, Jupyter saves files in the directory from which it was launched.
You can configure Jupyter by editing the jupyter_notebook_config.py file. If the file does not exist yet, generate it with:

jupyter notebook --generate-config

Set a password

jupyter notebook password

Custom Themes & Extensions

Change the default notebook directory
Edit jupyter_notebook_config.py to set the desired folder:

c.NotebookApp.notebook_dir = '/path/to/project-folder'
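The same file can also hold the server options that section 3 passes on the command line. The fragment below is a sketch, not a complete configuration: it assumes the classic NotebookApp option names used above, and newer releases built on Jupyter Server may expect the c.ServerApp equivalents (for example root_dir instead of notebook_dir).

```python
# jupyter_notebook_config.py (fragment)
# get_config() is provided by Jupyter when it loads this file;
# the fragment is not meant to be run as a standalone script.
c = get_config()  # noqa

c.NotebookApp.notebook_dir = '/path/to/project-folder'  # placeholder path, as above
c.NotebookApp.ip = '0.0.0.0'        # listen on all interfaces (same as --ip=0.0.0.0)
c.NotebookApp.port = 6001           # same as --port=6001
c.NotebookApp.open_browser = False  # same as --no-browser
```

Options given on the command line take precedence over values in the file, so the file works well for defaults.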

Install Jupyter themes for a customized interface:

pip install jupyterthemes

jt -t <theme-name>

Use JupyterLab, the newer Jupyter interface, for an enhanced experience:

pip install jupyterlab

3. Implementation

Installing Jupyter and IPython kernel

To include Jupyter and the IPython kernel, add the following lines to your Dockerfile:

RUN /bin/bash -c "source ~/.bashrc && mamba install -c conda-forge jupyter ipykernel"

Installing kernel for the environment

RUN /root/miniconda3/envs/team1_env/bin/python -m ipykernel install --name team1_env --display-name "Python (team1_env)"

Specify the required ports in the Dockerfile

EXPOSE 6001

Launching Jupyter

jupyter notebook

Configure the default command to launch Jupyter

CMD ["jupyter", "notebook", "--port=6001", "--no-browser", "--ip=0.0.0.0"]

Creating a Notebook

Open Jupyter in a browser.
Click on "New" → "Python 3" to create a new Python notebook.
Each notebook consists of cells where you can enter Python code or markdown for the documentation.
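On disk, a notebook is a JSON file (.ipynb) whose "cells" list holds one entry per cell. The sketch below builds a minimal two-cell notebook with the standard json module to show that structure; the cell contents are illustrative, and in practice Jupyter writes this file for you:

```python
import json

# Minimal notebook in the version 4 JSON format
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Team 1 notes"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("Hello world-Team 1!!!")'],
        },
    ],
}

# Round-trip through JSON, as happens when a notebook is saved and reopened
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # ['markdown', 'code']
```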

Example Code

# Sample Python code in a Jupyter cell
print("Hello world-Team 1!!!")

Saving and Exporting Notebooks

Save your work by clicking the Save button or pressing Ctrl+S.
Export notebooks as PDFs, HTML, or LaTeX via File → Download as.

4. Usage

Function to check the Vector Store (Milvus database)

Creates the Milvus directory if it doesn't already exist, then connects to the database file. Returns a boolean indicating whether the IT_support collection exists in the database.

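import os

from pymilvus import connections, utility


def vector_store_check(uri):
    """
    Report whether the vector store collection exists.

    Args:
        uri (str): Path to the Milvus database file.

    Returns:
        bool: True if the "IT_support" collection exists, False otherwise.
    """
    # Create the parent directory if it does not exist
    # (skipped when the URI has no directory component)
    parent_dir = os.path.dirname(uri)
    if parent_dir:
        os.makedirs(parent_dir, exist_ok=True)

    # Connect to the Milvus database
    connections.connect("default", uri=uri)

    # Return True if the collection exists, False otherwise
    return utility.has_collection("IT_support")

print("Function `vector_store_check` defined.")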
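The directory step can be tried on its own, without a Milvus installation. The sketch below isolates it in a hypothetical helper, ensure_parent_dir (not part of the project code), and uses a temporary directory so nothing is left behind. It also covers a bare file name such as demo.db, where there is no parent directory to create:

```python
import os
import tempfile


def ensure_parent_dir(uri):
    """Create the parent directory of uri if it has one; return that directory."""
    parent_dir = os.path.dirname(uri)
    if parent_dir:  # os.makedirs("") raises FileNotFoundError, so skip it
        os.makedirs(parent_dir, exist_ok=True)
    return parent_dir


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "milvus", "demo.db")  # hypothetical path
    parent = ensure_parent_dir(db_path)
    print(os.path.isdir(parent))  # the milvus directory now exists
```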

Function to clean the text

This function strips leading and trailing whitespace from each line and removes blank lines from the given input, returning a more readable, compact version of the text.

def clean_text(text):
    """Further clean the text by removing extra whitespace and new lines."""
    lines = (line.strip() for line in text.splitlines())
    cleaned_lines = [line for line in lines if line]
    return '\n'.join(cleaned_lines)

print("Function `clean_text` defined.")
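A short usage example may make the behavior concrete. The function is repeated here so the snippet runs on its own, and the sample string is made up for illustration:

```python
def clean_text(text):
    """Same function as above, repeated so the example runs on its own."""
    lines = (line.strip() for line in text.splitlines())
    cleaned_lines = [line for line in lines if line]
    return '\n'.join(cleaned_lines)


sample = "  Jupyter   Notebook  \n\n\n   Team 1 docs\t\n"
print(repr(clean_text(sample)))
# prints 'Jupyter   Notebook\nTeam 1 docs'
```

Note that runs of spaces inside a line are preserved; only the edges of each line and the empty lines are removed.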

Function to Clean and Extract Text from HTML Content

This function parses HTML content, removes unnecessary elements (scripts, styles, headers, footers, and navigation elements), and extracts the main text. If a <main> element is present, the function prioritizes its content. The cleaned content is returned as plain text, free from HTML tags and unnecessary whitespace.

from bs4 import BeautifulSoup


def clean_text_from_html(html_content):
    """Clean HTML content to extract main text."""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove elements that do not carry main content
    for element in soup(['script', 'style', 'header', 'footer', 'nav']):
        element.decompose()

    # Prefer the <main> element when present; otherwise use the whole page
    main_content = soup.find('main')
    if main_content:
        content = main_content.get_text(separator='\n')
    else:
        content = soup.get_text(separator='\n')

    return clean_text(content)

print("Function `clean_text_from_html` defined.")
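To see the filtering logic without installing BeautifulSoup, the sketch below uses the standard library's html.parser as a stand-in. It is not the project's implementation: the names MainTextExtractor and extract_main_text are hypothetical, and edge cases may be handled differently than in the BeautifulSoup version. It follows the same rules, though: text inside script, style, header, footer, and nav is dropped, and the <main> element wins when present.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "header", "footer", "nav"}


class MainTextExtractor(HTMLParser):
    """Collect page text, ignoring skipped tags and tracking <main>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.main_depth = 0   # > 0 while inside <main>
        self.saw_main = False
        self.all_text = []
        self.main_text = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "main":
            self.main_depth += 1
            self.saw_main = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "main" and self.main_depth > 0:
            self.main_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        self.all_text.append(data)
        if self.main_depth:
            self.main_text.append(data)


def extract_main_text(html):
    """Return text from <main> if present, else from the whole page."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    chunks = parser.main_text if parser.saw_main else parser.all_text
    lines = (line.strip() for chunk in chunks for line in chunk.splitlines())
    return "\n".join(line for line in lines if line)


page = (
    "<html><head><style>p{}</style></head><body><nav>Menu</nav>"
    "<main><h1>Title</h1><p> Body text </p></main>"
    "<footer>Contact</footer></body></html>"
)
print(extract_main_text(page))
```

For this sample page the result is the two lines "Title" and "Body text"; the menu, the stylesheet, and the footer are left out.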

Function for loading documents from the web

Recursively loads documents from the web starting at CORPUS_SOURCE, ensuring that only pages under the base_url of CORPUS_SOURCE are retrieved. Each page is cleaned with clean_text_from_html, and the function returns the cleaned documents.

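# Import paths may differ depending on the installed LangChain version
from langchain_community.document_loaders import RecursiveUrlLoader
from langchain_core.documents import Document


def load_documents_from_web():
    """
    Load the documents from the web and store the cleaned page contents

    Returns:
        list: The documents loaded from the web
    """
    # CORPUS_SOURCE (the root URL to crawl) is expected to be defined elsewhere
    loader = RecursiveUrlLoader(
        url=CORPUS_SOURCE,
        prevent_outside=True,
        base_url=CORPUS_SOURCE
    )
    raw_documents = loader.load()

    # Clean each page while keeping its original metadata
    cleaned_documents = []
    for doc in raw_documents:
        cleaned_text = clean_text_from_html(doc.page_content)
        cleaned_documents.append(Document(page_content=cleaned_text, metadata=doc.metadata))

    return cleaned_documents

print("Function `load_documents_from_web` defined.")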
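The idea behind restricting the crawl to base_url can be illustrated with a small standard-library check. This is a conceptual sketch only, not the loader's actual implementation; the helper is_within_base and the example.com URLs are assumptions made for illustration:

```python
from urllib.parse import urlparse


def is_within_base(url, base_url):
    """Return True if url is on the same host as base_url and under its path."""
    target, base = urlparse(url), urlparse(base_url)
    base_path = base.path.rstrip("/")
    same_host = target.netloc == base.netloc
    under_path = target.path == base_path or target.path.startswith(base_path + "/")
    return same_host and under_path


base = "https://example.com/docs/"  # hypothetical stand-in for CORPUS_SOURCE
print(is_within_base("https://example.com/docs/install", base))        # True
print(is_within_base("https://example.com/blog/post", base))           # False
print(is_within_base("https://other.example.org/docs/install", base))  # False
```

Links that fail a check of this kind are the ones a crawl limited to its base URL is meant to skip.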

5. Troubleshooting

Common Issues

Unable to access Jupyter in browser:
Verify port configuration: Ensure that port 6001 is exposed and correctly mapped in your docker run command.
Check firewall settings: Make sure no firewall rules are restricting access to port 6001.
Ensure the IP address is correct: Confirm you are using the appropriate IP address, or localhost if the service is running locally.
Kernel errors: Restart the kernel by clicking Kernel → Restart.
Notebook missing: Verify the directory from which you launched Jupyter Notebook.
Rebuild the container: If changes to Jupyter are not applied after updating the Dockerfile, rebuild the Docker image:

docker build -t team1-app .
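For the port and IP checks listed above, a quick connectivity test from the host can narrow things down before inspecting Docker or firewall settings. This is a minimal sketch using the standard socket module; the helper name is_port_open is hypothetical, and a successful connection only shows that something is listening, not that it is Jupyter:

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Port 6001 is the port exposed in the Dockerfile in section 3
print("Port 6001 reachable:", is_port_open("localhost", 6001))
```

If this prints False while the container is running, the port mapping in the docker run command is the first thing to check.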

Debugging Tips

View detailed error logs in the terminal where Jupyter is running.
Use the %debug magic command after an exception to inspect the error interactively within the notebook.
