Jupyter Notebook

ritvik180 edited this page Nov 21, 2024 · 66 revisions

Jupyter is a powerful open-source platform for interactive computing. This page gives an overview of Jupyter Notebook, covering installation, configuration, implementation, usage, and troubleshooting.

Contents

  1. Installation
  2. Configuration
  3. Implementation
  4. Usage
  5. Troubleshooting

1. Installation

Further installation instructions can be found in the official Jupyter Notebook documentation.

Prerequisites

  • Python 3.7 or later
  • pip (Python package manager)

Installation Steps

  1. Install Jupyter Notebook via pip
pip install notebook

  2. Verify the installation
jupyter --version

You should see the Jupyter version along with additional tool versions.

  3. Launch Jupyter Notebook
jupyter notebook

This will open a browser window where you can create and manage notebooks.
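
The verification step can also be done from Python, for example inside a notebook cell. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical, and it assumes the package was installed into the same environment that runs the code.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "notebook" is the package installed by `pip install notebook`
print("notebook:", installed_version("notebook"))
```

A result of `None` suggests that the package is missing from the active environment, which is a common cause of a `jupyter` command that cannot be found.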

Alternative Installation Methods:

  • Using Anaconda: Install Jupyter as part of the Anaconda distribution
conda install -c conda-forge notebook
  • Docker: Run Jupyter Notebooks inside a Docker container
docker run -p 8888:8888 jupyter/base-notebook

2. Configuration

Default Configuration

  • By default, Jupyter saves files in the directory from which it was launched.

  • You can configure Jupyter by editing the jupyter_notebook_config.py file. Generate it first with the command below; by default it is written to the ~/.jupyter directory:

jupyter notebook --generate-config

  • Set a password
jupyter notebook password

Custom Themes & Extensions

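  • Change the default notebook directory by editing jupyter_notebook_config.py to set the desired folder (on Notebook 7 and later, which runs on Jupyter Server, the equivalent setting is c.ServerApp.root_dir in jupyter_server_config.py):
c.NotebookApp.notebook_dir = '/path/to/project-folder'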
 
  • Install Jupyter themes for a customized interface:
pip install jupyterthemes

jt -t <theme-name>

  • Install JupyterLab, the next-generation Jupyter interface, for an enhanced experience:
pip install jupyterlab
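
The settings discussed in this section can be kept together in jupyter_notebook_config.py, which is itself a Python file. The fragment below is a sketch rather than a complete configuration: the port and IP values are taken from the Docker setup later on this page, the folder path is a placeholder, and the `NotebookApp` trait names apply to the classic Notebook (version 6 and earlier).

```python
# jupyter_notebook_config.py (fragment)
# `get_config` is provided by Jupyter when it loads this file;
# the fragment is not meant to be run as a standalone script.
c = get_config()  # noqa

# Folder shown when the notebook server starts (placeholder path)
c.NotebookApp.notebook_dir = '/path/to/project-folder'

# Listen on all interfaces on port 6001 and do not open a local browser,
# mirroring the flags --ip=0.0.0.0 --port=6001 --no-browser
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = 6001
c.NotebookApp.open_browser = False
```

Values set in the file act as defaults; flags passed on the command line take precedence over them.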

3. Implementation

Installing Jupyter and IPython kernel

  • To include Jupyter and the IPython kernel, add the following line to your Dockerfile:
RUN /bin/bash -c "source ~/.bashrc && mamba install -c conda-forge jupyter ipykernel"

Installing the kernel for the environment

RUN /root/miniconda3/envs/team1_env/bin/python -m ipykernel install --name team1_env --display-name "Python (team1_env)"

Specify the required ports in the Dockerfile

EXPOSE 6001

Launching Jupyter

jupyter notebook
  • Configure the default command to launch Jupyter
CMD ["jupyter", "notebook", "--port=6001", "--no-browser", "--ip=0.0.0.0"]

Creating a Notebook

  1. Open Jupyter in a browser.
  2. Click on "New" → "Python 3" to create a new Python notebook.
  3. Each notebook consists of cells where you can enter Python code or Markdown for documentation.
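
A notebook file (.ipynb) is a JSON document whose `cells` list holds these code and Markdown cells. The sketch below builds a minimal version-4 notebook structure with the standard library only; the helper name `make_notebook` is hypothetical, and real notebooks usually carry extra metadata such as kernel information.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    notebook_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally record outputs and an execution counter
            cell["execution_count"] = None
            cell["outputs"] = []
        notebook_cells.append(cell)
    return {
        "cells": notebook_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook([
    ("markdown", "# Demo notebook"),
    ("code", 'print("Hello world-Team 1!!!")'),
])
print(json.dumps(notebook, indent=2))
```

Saving the printed JSON to a file with the .ipynb extension should give a notebook that opens in Jupyter with one Markdown cell and one code cell.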

Example Code

# Sample Python code in a Jupyter cell
print("Hello world-Team 1!!!")

Saving and Exporting Notebooks

  • Save your work by clicking the Save button or pressing Ctrl+S.
  • Export notebooks as PDF, HTML, or LaTeX via File → Download as.

4. Usage

Function to create Vector Store (Milvus database)

  • Creates the milvus directory if it doesn’t already exist, then connects to the database file. Returns a boolean indicating whether the "IT_support" collection was found in the database.
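import os

from pymilvus import connections, utility


def vector_store_check(uri):
    """
    Check whether the vector store already exists.

    Args:
        uri (str): Path to the Milvus database file.

    Returns:
        bool: True if the "IT_support" collection exists, False otherwise.
    """
    # Create the parent directory if it does not exist
    # (skipped when uri is a bare file name with no directory part)
    directory = os.path.dirname(uri)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # Connect to the Milvus database
    connections.connect("default", uri=uri)

    # Return True if the collection exists, False otherwise
    return utility.has_collection("IT_support")

print("Function `vector_store_check` defined.")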
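
The directory-creation step of this function can be tried on its own, without a Milvus installation. This is a minimal sketch that assumes the URI is a local file path; the helper name `ensure_parent_dir` and the demo path are hypothetical. It skips directory creation for a bare file name, because `os.makedirs("")` would raise an error.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return it."""
    directory = os.path.dirname(path)
    if directory:  # empty for a bare file name such as "milvus.db"
        os.makedirs(directory, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "milvus", "demo.db")  # hypothetical path
    created = ensure_parent_dir(db_path)
    print(os.path.isdir(created))
```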

Function to clean the text

  • This function removes extra whitespace and blank lines from the given input, returning a more readable, compact version of the text.
def clean_text(text):
    """Further clean the text by removing extra whitespace and new lines."""
    lines = (line.strip() for line in text.splitlines())
    cleaned_lines = [line for line in lines if line]
    return '\n'.join(cleaned_lines)

print("Function `clean_text` defined.")
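
A short usage example shows the effect on a small input. The function is repeated here so the snippet runs on its own; the sample string is made up for illustration.

```python
def clean_text(text):
    """Further clean the text by removing extra whitespace and new lines."""
    lines = (line.strip() for line in text.splitlines())
    cleaned_lines = [line for line in lines if line]
    return '\n'.join(cleaned_lines)


sample = "  First line  \n\n\n   Second line\n\t\n"
print(repr(clean_text(sample)))  # 'First line\nSecond line'
```

Leading and trailing whitespace is stripped from each line, and lines that end up empty are dropped.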

Function to Clean and Extract Text from HTML Content

  • This function parses HTML content, removes unnecessary elements (scripts, styles, headers, footers, and navigation elements), and extracts the main text. If a <main> element is present, the function prioritizes its content. The cleaned content is returned as plain text, free from HTML tags and unnecessary whitespace.
from bs4 import BeautifulSoup


def clean_text_from_html(html_content):
    """Clean HTML content to extract main text."""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Remove elements that do not carry main content
    for element in soup(['script', 'style', 'header', 'footer', 'nav']):
        element.decompose()

    # Prefer the <main> element when present; otherwise use the whole page
    main_content = soup.find('main')
    if main_content:
        content = main_content.get_text(separator='\n')
    else:
        content = soup.get_text(separator='\n')

    return clean_text(content)

print("Function `clean_text_from_html` defined.")
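
To experiment with the same idea without installing BeautifulSoup, the tag-skipping logic can be approximated with the standard library's `html.parser`. This is a simplified sketch, not the function above: the names `TextExtractor` and `extract_text` are hypothetical, and it does not prioritize the main element.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "header", "footer", "nav"}


class TextExtractor(HTMLParser):
    """Collect text that is not nested inside any of the skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(html_content):
    """Return visible text, one non-empty stripped line per text chunk."""
    parser = TextExtractor()
    parser.feed(html_content)
    parser.close()
    lines = (line.strip() for line in "\n".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


html_page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<nav>Menu</nav><main><p>Hello</p><p>World</p></main>"
    "<footer>Contact</footer></body></html>"
)
print(extract_text(html_page))
```

For this sample, only the two paragraph texts survive; the style rule, navigation, and footer text are dropped.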

Function for loading documents from the web

  • Recursively loads documents from the web starting at CORPUS_SOURCE, ensuring that only pages within the base_url of CORPUS_SOURCE are retrieved. Each page is cleaned with clean_text_from_html, and the function returns the cleaned documents.
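# Import paths assume a recent LangChain release; older versions use different paths
from langchain_community.document_loaders import RecursiveUrlLoader
from langchain_core.documents import Document

# CORPUS_SOURCE (the root URL of the corpus) is assumed to be defined earlier in the notebook


def load_documents_from_web():
    """
    Load the documents from the web and store the page contents

    Returns:
        list: The documents loaded from the web
    """
    loader = RecursiveUrlLoader(
        url=CORPUS_SOURCE,
        prevent_outside=True,
        base_url=CORPUS_SOURCE
    )
    raw_documents = loader.load()

    # Ensure documents are cleaned
    cleaned_documents = []
    for doc in raw_documents:
        cleaned_text = clean_text_from_html(doc.page_content)
        cleaned_documents.append(Document(page_content=cleaned_text, metadata=doc.metadata))

    return cleaned_documents

print("Function `load_documents_from_web` defined.")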
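
The restriction to pages within the base URL can be pictured as a check on each discovered link. The sketch below shows one simple way to express such a check with the standard library; it is an illustration of the idea, not necessarily how the loader implements it, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_within_base(url, base_url):
    """Return True if `url` is on the same site and under the path of `base_url`."""
    target = urlparse(url)
    base = urlparse(base_url)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return False
    # Compare paths with a trailing slash so "/docs-old" does not match "/docs"
    base_path = base.path.rstrip("/") + "/"
    target_path = target.path.rstrip("/") + "/"
    return target_path.startswith(base_path)


base = "https://example.com/docs/"
print(is_within_base("https://example.com/docs/setup", base))
print(is_within_base("https://example.com/blog/post", base))
```

Under this rule, links to other hosts or to sibling paths would be skipped during the crawl.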

5. Troubleshooting

Common Issues

  1. Unable to access Jupyter in browser:
       • Verify port configuration: Ensure that port 6001 is properly exposed and correctly mapped in your Docker run command.
       • Check firewall settings: Make sure no firewall rules are restricting access to port 6001.
       • Ensure the IP address is correct: Confirm you are using the appropriate IP address, or localhost if the service is running locally.

  2. Kernel Errors: Restart the kernel by clicking Kernel → Restart.

  3. Notebook Missing: Verify the directory you used to launch the Jupyter Notebook.

  4. Rebuild the container: If changes to Jupyter are not applied after updating the Dockerfile, rebuild the Docker image:

docker build -t team1-app .
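
For the browser-access issue above, reachability of the Jupyter port can be checked from Python before looking at firewall rules. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and the host and port assume the container is mapped to port 6001 on the local machine.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


print(port_is_open("localhost", 6001))
```

A result of `False` suggests that the container is not running, the port is not mapped, or a firewall is blocking the connection.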

Debugging Tips

  • View detailed error logs in the terminal where Jupyter is running.
  • Use the %debug magic command after an exception to open an interactive debugger at the point of failure.