Add a model cache to avoid running out of storage #201

Merged Jan 23, 2024 · 7 commits. Viewing changes from 2 commits.
75 changes: 75 additions & 0 deletions sync.sh
@@ -12,6 +12,11 @@ OBJECT_ID="${MODEL_ID//\//--}"
S3_BASE_DIRECTORY="models--$OBJECT_ID"
S3_PATH="s3://${HF_CACHE_BUCKET}/${S3_BASE_DIRECTORY}/"
LOCAL_MODEL_DIR="${HUGGINGFACE_HUB_CACHE}/${S3_BASE_DIRECTORY}"
LOCKFILE="${HUGGINGFACE_HUB_CACHE}/cache.lock"
CACHE_FILE="${HUGGINGFACE_HUB_CACHE}/cache.txt"
# Read cache size from environment variable, default to 4
DEFAULT_CACHE_SIZE=4
CACHE_SIZE=${CACHE_SIZE:-$DEFAULT_CACHE_SIZE}

# Function to check if lorax-launcher is running
is_launcher_running() {
@@ -20,6 +25,75 @@ is_launcher_running() {
kill -0 "$launcher_pid" >/dev/null 2>&1
}

# Pre-flight check: if the lock file exists, is non-empty, and we cannot create it
# ourselves (set -C makes the redirection fail when the file already exists),
# another process is downloading the weights; wait for it to release the lock.
if [ -f "${HUGGINGFACE_HUB_CACHE}/.lock" ] && [ -s "${HUGGINGFACE_HUB_CACHE}/.lock" ] && ! { set -C; 2>/dev/null >"${HUGGINGFACE_HUB_CACHE}/.lock"; }; then
    echo "Another process is downloading the weights. Waiting for it to finish."
    while [ -f "${HUGGINGFACE_HUB_CACHE}/.lock" ] && [ -s "${HUGGINGFACE_HUB_CACHE}/.lock" ]; do
        sleep 1
    done
    echo "The other process has finished downloading the weights. Continuing."
fi
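The check above relies on the noclobber trick: under `set -C`, a `>` redirection fails if the target file already exists, so the redirection doubles as an atomic "create only if absent" test. A minimal standalone sketch of the same idiom (the lock path here is an illustrative assumption, not the script's real path):

```shell
#!/usr/bin/env bash
# Sketch of the noclobber lock check: with `set -C`, `> file` fails
# when the file already exists, so creation is an atomic acquire test.
LOCK="$(mktemp -d)/download.lock"

# First attempt: the file does not exist yet, so creation succeeds.
if ( set -C; echo $$ > "$LOCK" ) 2>/dev/null; then
    echo "acquired"
else
    echo "held by another process"
fi

# Second attempt: the file now exists, so the redirection fails.
if ( set -C; echo $$ > "$LOCK" ) 2>/dev/null; then
    echo "acquired"
else
    echo "held by another process"
fi
```

Running the sketch prints `acquired` followed by `held by another process`.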

clean_up_cache() {
    local temp_file
    temp_file=$(mktemp)
    local removed_lines=""
    local key=$1
    local file=$2

    # Remove the key if it exists
    grep -v "^$key\$" "$file" > "$temp_file"

    # Add the key to the bottom of the file
    echo "$key" >> "$temp_file"

    # Count total lines in temp file
    local total_lines
    total_lines=$(wc -l < "$temp_file")

    # Calculate number of lines to be removed, if any
    local lines_to_remove=$((total_lines - CACHE_SIZE))

    if [ "$lines_to_remove" -gt 0 ]; then
        # Store removed lines in a variable
        removed_lines=$(head -n "$lines_to_remove" "$temp_file")
        echo "Deleting $removed_lines from cache"
    fi

    # Ensure only the last 5 items are retained
> Contributor review comment on the line above: Nit, "5" should be N, given it's based on an env var.
    tail -n "$CACHE_SIZE" "$temp_file" > "$file"

    # Clean up the temporary file
    rm "$temp_file"

    # Delete the evicted models from the local cache directory
    for line in $removed_lines; do
        model_to_remove="${HUGGINGFACE_HUB_CACHE}/${line}"
        echo "Removing $model_to_remove"
        rm -rf "$model_to_remove"
    done
}
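`clean_up_cache` maintains a simple LRU list in a text file: touching a key moves it to the bottom (most recently used), and anything beyond `CACHE_SIZE` entries is trimmed from the top. A standalone sketch of that list maintenance, without the deletion side effects (the model names and `CACHE_SIZE=3` are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Standalone sketch of the text-file LRU list used by clean_up_cache.
CACHE_SIZE=3
CACHE_FILE=$(mktemp)

touch_key() {
    local key=$1
    local tmp
    tmp=$(mktemp)
    # Move the key to the bottom of the list (most recently used)...
    grep -v "^${key}\$" "$CACHE_FILE" > "$tmp"
    echo "$key" >> "$tmp"
    # ...and retain only the newest CACHE_SIZE entries.
    tail -n "$CACHE_SIZE" "$tmp" > "$CACHE_FILE"
    rm "$tmp"
}

touch_key models--org--a
touch_key models--org--b
touch_key models--org--c
touch_key models--org--a   # "a" becomes most recently used
touch_key models--org--d   # evicts "b", now the least recently used

cat "$CACHE_FILE"          # c, a, d — in LRU-to-MRU order
```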

(
    # Wait for an exclusive lock on $LOCKFILE (fd 200)
    flock -x 200

    # The following code is executed only after acquiring the lock.
    echo "Lock acquired."
    mkdir -p "$LOCAL_MODEL_DIR"

    # Report whether the model is already tracked in the cache file
    if [ -f "$CACHE_FILE" ]; then
        echo "Cache file exists."
        while read -r line; do
            echo "Line read: $line"
            if [ "$line" = "$S3_BASE_DIRECTORY" ]; then
                echo "Model found in cache."
            fi
        done < "$CACHE_FILE"
    else
        echo "Cache file does not exist."
    fi
    clean_up_cache "$S3_BASE_DIRECTORY" "$CACHE_FILE"
) 200>"$LOCKFILE"
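The `( ... ) 200>$LOCKFILE` construct opens the lock file on file descriptor 200 for the lifetime of the subshell; `flock -x 200` then blocks until an exclusive lock is granted, and the lock is released automatically when the subshell exits and the descriptor closes. A minimal sketch of the idiom in isolation (the lock path is illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the flock-on-a-subshell-fd pattern: the lock is held
# exactly as long as fd 200 stays open, i.e. for the subshell body.
LOCKFILE=$(mktemp)

(
    flock -x 200    # block until an exclusive lock is granted on fd 200
    echo "critical section: only one process runs this at a time"
) 200>"$LOCKFILE"

echo "lock released"
```

Serializing on a descriptor rather than a lock file's existence means a crashed holder releases the lock automatically: the kernel drops it when the process's descriptors close, so no stale lock file blocks later runs.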

sudo mkdir -p "$LOCAL_MODEL_DIR"

if [ -n "$(ls -A "$LOCAL_MODEL_DIR")" ]; then
@@ -56,6 +130,7 @@ else
echo "Downloading weights from ${S3_PATH}"
fi


echo "Files found for model ${MODEL_ID}"
aws s3 ls "${S3_PATH}" --recursive | awk '{print $4}'
