Bug fix for illegal memory access error when running a Medusa LoRA and plain LoRAs in parallel #525

Merged

merged 2 commits into main from parallel-medusa-bug on Jun 26, 2024

Conversation

ajtejankar (Contributor)
The error occurs when forwarding through the BGMV kernel during the decode stage with multiple LoRA adapters in the batch. The root cause is a discrepancy between the list of LoRA weight pointers and the indices into that list. This pull request fixes the mismatch. I used the following script to verify that everything works properly.

import fire
from multiprocessing import Pool
from lorax import Client
import numpy as np
import time

# (adapter_id, adapter_source) pairs; (None, None) targets the base model
adapters = [
    ("/usr/src/lorax/medusa-23", "local"),
    ("vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k", "hub"),
    ("predibase/magicoder", "hub"),
    ("predibase/gsm8k", "hub"),
    (None, None),
]

max_new_tokens = 30

def send_request(idx):
    adapter_id, adapter_source = adapters[idx]
    client = Client("http://127.0.0.1:8080", timeout=(60 * 100))
    prompt = "[INST] How much is 2 + 2? [/INST]"
    print(f'==> Send request: {adapter_id}::{adapter_source}')
    time.sleep(np.random.permutation(10)[0])  # stagger requests by 0-9 seconds
    if adapter_id is not None:
        result = client.generate(
            prompt,
            adapter_id=adapter_id,
            adapter_source=adapter_source,
            max_new_tokens=max_new_tokens,
        ).generated_text
    else:
        result = client.generate(prompt, max_new_tokens=max_new_tokens).generated_text
    print(f'==> Rcvd request: {adapter_id}::{adapter_source}')
    return result, adapter_id, adapter_source


def main(num_workers: int, num_requests: int, seed: int = 123):
    np.random.seed(seed)
    # Balanced, shuffled sequence of adapter indices (supports up to 100 requests)
    adapter_indices = [i % len(adapters) for i in np.random.permutation(100)][:num_requests]
    with Pool(num_workers) as p:
        for result, adapter_id, adapter_source in p.map(send_request, adapter_indices):
            print(f'============== {adapter_id} //// {adapter_source} ==============')
            print(result)
            print('================================================================')


if __name__ == '__main__':
    fire.Fire(main)

Run it with the arguments --num-workers 10 --num-requests 100 --seed 123.
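
To make the failure mode concrete, here is a minimal sketch in plain Python (the names and shapes are hypothetical; this is not the kernel code). The kernel receives a compacted list of weight pointers covering only the adapters present in the batch, so the per-token indices must be positions in that compacted list, not global adapter ids:

# Hypothetical illustration of the index/pointer mismatch (not the kernel code).
# Five adapters are registered, but only adapters 1 and 3 are in this batch,
# so the kernel sees a compacted pointer list of length 2.
weight_ptrs = ["ptr_adapter_1", "ptr_adapter_3"]

# Bug: per-token indices computed against global adapter ids (1 and 3)
# instead of positions in the compacted list (0 and 1).
token_adapter_ids = [1, 3, 3, 1]

for tok, idx in enumerate(token_adapter_ids):
    if idx >= len(weight_ptrs):
        # On the GPU this is an out-of-bounds read, i.e. an illegal memory access.
        print(f"token {tok}: index {idx} out of range for {len(weight_ptrs)} pointers")
    else:
        print(f"token {tok}: dereference {weight_ptrs[idx]}")

# Fix: remap global adapter ids to slots in the compacted pointer list.
slot = {adapter_id: i for i, adapter_id in enumerate([1, 3])}
token_slots = [slot[i] for i in token_adapter_ids]  # [0, 1, 1, 0]

Unlike the Python bounds check above, CUDA does not catch the bad dereference; it surfaces later as an illegal memory access error.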

The error is caused by incorrect indices passed to the BGMV kernel
@arnavgarg1 arnavgarg1 requested a review from tgaddair June 25, 2024 18:16
@tgaddair (Contributor) left a comment


Nice! Is there a small unit test we can add to validate this as well?

@ajtejankar (Contributor, Author)

> Nice! Is there a small unit test we can add to validate this as well?

Yup, added a small unit test and refactored the code a bit so that it's clearer.
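
For reference, a minimal sketch of what such a unit test could look like; the helper remap_adapter_indices and the ids used here are hypothetical, not the code actually added in this PR:

import numpy as np

def remap_adapter_indices(token_adapter_ids, batch_adapter_ids):
    # Map global adapter ids to positions in the compacted pointer list.
    slot = {a: i for i, a in enumerate(batch_adapter_ids)}
    return np.array([slot[t] for t in token_adapter_ids])

def test_indices_stay_in_bounds():
    batch_adapter_ids = [1, 3]        # adapters present in this batch
    token_adapter_ids = [1, 3, 3, 1]  # per-token global adapter ids
    idx = remap_adapter_indices(token_adapter_ids, batch_adapter_ids)
    assert idx.max() < len(batch_adapter_ids)  # every index is dereferenceable
    assert idx.tolist() == [0, 1, 1, 0]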

@ajtejankar ajtejankar merged commit f3a67bb into main Jun 26, 2024
1 check passed
@ajtejankar ajtejankar deleted the parallel-medusa-bug branch June 26, 2024 07:45