Conversation
I ported the code from C++. I'm sure the reading part is correct. Maybe the tensor setup has changed? Or ggml has changed? The model runs, but it's producing garbage.
I'd take the same model in GGMF and GGJT format and compare the loaded tensors. I'm guessing it's probably a misalignment or something similar.
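(For illustration, one way to do that comparison, assuming you can dump each format's tensors into a name-to-bytes map; the helper below is hypothetical, not part of this PR:)

```rust
use std::collections::HashMap;

/// Compare two {tensor name -> raw bytes} dumps and report mismatches.
fn compare_dumps(a: &HashMap<String, Vec<u8>>, b: &HashMap<String, Vec<u8>>) {
    for (name, bytes_a) in a {
        match b.get(name) {
            None => println!("{name}: missing in second dump"),
            Some(bytes_b) if bytes_a != bytes_b => {
                // The first differing byte usually points straight at a
                // misalignment in the reader.
                let first_diff = bytes_a.iter().zip(bytes_b.iter()).position(|(x, y)| x != y);
                println!("{name}: differs, first mismatch at byte {first_diff:?}");
            }
            _ => {}
        }
    }
}
```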
Is it normal to warn for bad tokens? The bad tokens are single bytes with values > 128. That's invalid UTF-8, but the tokenizer doesn't care? I'm probably done with this for a while.
Yes, we're discussing that over at #11. The short explanation is that the invalid tokens are not valid UTF-8 by themselves, but are composed to form valid UTF-8. We're figuring out what the actual solution should be.
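(A quick illustration of that point: each byte ≥ 128 is invalid UTF-8 on its own, but two such single-byte tokens can combine into a valid character:)

```rust
fn main() {
    let a = [0xC3u8]; // a single token byte; invalid UTF-8 on its own
    let b = [0xA9u8]; // also invalid on its own
    assert!(std::str::from_utf8(&a).is_err());
    assert!(std::str::from_utf8(&b).is_err());
    // Concatenated, the two bytes form "é" (U+00E9), which is valid UTF-8.
    let joined = [a[0], b[0]];
    assert_eq!(std::str::from_utf8(&joined).unwrap(), "é");
}
```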
Does it work with the old formats and only produce garbage when using a GGJT model, or is it always garbage?
I haven't changed the behavior of the old formats; only GGJT produces garbage. I don't have an older model. Can you try this branch on GGMF models?
Now it can load the model, but it's not working.
Was that last force push just rebasing on the current version, or does it involve changes to the loading that may fix stuff that previously didn't work?
Just a rebase. Can you share a simple GGMF model that I can use for testing?
I think this one is GGMF: https://huggingface.co/Sosaka/Alpaca-native-4bit-ggml/ (Don't know if it's a problem to post something like that here; if so, let me know and I'll edit it out after iacore sees it.)
That one is GGML, aka unversioned, not GGMF.
Sorry, my mistake. Unfortunately, I don't really have a reasonable way to share huge files. I've been trying to figure out what the issue with GGJT is. One thing I can say is I don't think your logic for finding the tensors/lengths/offsets has a problem. I added some debug output to compare, and there's absolutely no difference in output between the C and Rust versions.
No difference in the math or tensor loading either. Still no clue.
Okay, so I got it actually running inference on a GGJT model. However, what I had to do makes the mmap part pointless. I believe the problem has something to do with how you can set the context parameter `no_alloc`, but we're still doing the old context size calculation. I tried making the […] Anyway, I changed

```rust
tensor.set_data(ptr as *mut std::ffi::c_void);
```

to

```rust
ptr.copy_to_nonoverlapping(tensor.data() as *mut u8, tensor.nbytes());
```

With that, it runs just fine on the GGJT model. Loading speed seems normal compared to the current version. Obviously it's silly to mmap the file and then copy the tensors out of the mapping; later on it will be possible to add proper mmap support. By the way, you probably need to run clippy on your changes. It's very unhappy right now!
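(A rough sketch of the copy-based loading described above, assuming a ggml-style tensor handle with `data()`/`nbytes()` accessors and a known byte offset per tensor; the names here are illustrative, not the crate's actual API:)

```rust
use memmap2::Mmap;

// Hypothetical tensor handle mirroring ggml's accessors.
trait GgmlTensor {
    fn data(&self) -> *mut std::ffi::c_void;
    fn nbytes(&self) -> usize;
}

/// Copy one tensor's bytes out of the mapping into the memory ggml
/// already allocated, instead of calling tensor.set_data(ptr).
fn load_tensor(mmap: &Mmap, offset: usize, tensor: &impl GgmlTensor) {
    unsafe {
        let ptr = mmap.as_ptr().add(offset);
        ptr.copy_to_nonoverlapping(tensor.data() as *mut u8, tensor.nbytes());
    }
}
```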
Yeah, I'd be happy with not supporting mmap.
The approach I'd go for, if I were writing it, would be to have a general type that can describe a tensor: its type, where it lives, dimensions, etc. Something similar to this: https://github.com/KerfuffleV2/smolrsrwkv/blob/182cd3205b7a7c95571a09bcfbb954b0041e4f90/smolrwkv/src/loader.rs#L15

Then the format-specific code can just scan through the file for metadata and build a list of those structures. Then there can be generic loader code that loads tensors based on that list: it could use file reads, mmap, whatever. edit: It could also convert from something like SafeTensors, PyTorch, whatever. And if some data actually needs to be converted, that could be described in the structure too, so the conversion process wouldn't have to care about low-level details like GGJT vs GGML, just "I have a tensor of type X, but I need Y". I think that approach would make dealing with different file formats much easier. See the sketch below.
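(A minimal sketch of the kind of format-agnostic descriptor being suggested; field names are guesses for illustration, not the linked smolrwkv code:)

```rust
/// Element type of the stored tensor data.
#[allow(non_camel_case_types)]
enum TensorDtype {
    F32,
    F16,
    Q4_0,
    Q4_1,
}

/// Everything a generic loader needs to know about one tensor.
struct TensorDescriptor {
    name: String,
    dtype: TensorDtype,
    /// Dimensions, e.g. [n_embd, n_vocab].
    shape: Vec<usize>,
    /// Byte offset of the data within the source file.
    offset: u64,
    /// Length of the data in bytes.
    len: usize,
}

/// Format-specific code (GGML, GGJT, SafeTensors, ...) only has to
/// produce this list; one generic loader consumes it, whether it
/// reads, mmaps, or converts between types.
fn load_all(_descriptors: &[TensorDescriptor]) { /* ... */ }
```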
Got Vicuna working here. The loading speed is unfortunate. I can use io_uring to make this faster, but that's more code. Maybe copying from mmap-ed memory is faster? Ship it 🚀
@iacore I was experimenting with trying to clean it up also. It does seem like plain reading is way slower than copying from the mmap. My change looks like this:

```rust
// GGJT aligns tensor data to 32 bytes, so round the read position
// up to the next 32-byte boundary before reading.
let offset_curr = reader.stream_position()?;
let offset_aligned: u64 = (offset_curr + 31) & !31;
reader.seek_relative((offset_aligned - offset_curr) as i64)?;

// Read directly into the buffer ggml already allocated for the tensor.
let td =
    unsafe { std::slice::from_raw_parts_mut(tensor.data() as *mut u8, tensor.nbytes()) };
reader.read_exact(td)?;
total_loaded_bytes += tensor.nbytes();
```

So, uhh... I guess maybe just keep mmap?

edit: Also, we probably don't want OS-specific optimizations. I also experimented with making the […]
For me, sequential read is faster than mmap. I've made a branch.
Are you saying you tried the approach I posted?
Yes.
Ahh, why can't life be simple for once? What OS are you using, out of curiosity?
Linux. I think this branch is done. I won't touch it again.
I'm on Linux as well. Sorry for the confusion. I've been trying both versions and I don't get consistent results. Not sure what's going on, but I don't think there's a problem with the current approach. How can you say the branch is done when Clippy is still sad, though? You can probably just nuke the mmap code.
It'll probably be useful in the future?
We'd have to figure out how to actually use it in the future, though. I don't think it currently can work at all, since you aren't even able to turn off the context's allocation. You'd think it would still be possible to set the tensor to point at a different chunk of memory, but that didn't actually work: otherwise your first approach would have had no issues. So my line of thinking is, it's only a couple of lines of code to wrap a file in an `Mmap` again later, so removing it now doesn't cost much. Just to be clear, this is just the opinion of some other random person on the internet, so take that for what it's worth; I have no authority.
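(For reference, the "couple of lines" to wrap the file in a map again later might look like this, using the memmap2 crate:)

```rust
use memmap2::Mmap;
use std::fs::File;

fn map_model(path: &str) -> std::io::Result<Mmap> {
    let file = File::open(path)?;
    // SAFETY: the caller must ensure the file isn't modified while mapped.
    unsafe { Mmap::map(&file) }
}
```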
Force-pushed from bf847dd to 32925e7.
I also got this working on the new ggjt file format.
Btw, from https://justine.lol/mmap/: the speedup is meant to be for subsequent loads. Did you guys check that, or just the initial load? Also, you may need to configure some settings like read_advise in the mmap2 library to get better prefetching; the default mmap way of reading is to trigger page faults on demand.
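(A sketch of the prefetch hinting being described, assuming the memmap2 crate's `advise` API and a placeholder model path:)

```rust
use memmap2::{Advice, Mmap};
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::open("model.bin")?;
    // SAFETY: the file must not be truncated while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // Tell the kernel we'll read sequentially so it can read ahead
    // instead of servicing one page fault at a time.
    mmap.advise(Advice::Sequential)?;
    // Ask it to start faulting pages in ahead of first use.
    mmap.advise(Advice::WillNeed)?;
    println!("mapped {} bytes", mmap.len());
    Ok(())
}
```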
Another thing to check: multiple processes utilizing the same mmapped file.
```rust
)?
}
ModelVersion::GGJT => {
    let mmap = unsafe { Mmap::map(&file)? };
```
Succeeded by #125
Related to #62, #93
single-file model format magic=ggjt
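(For context, a minimal sketch of telling the three formats apart by their leading magic, using the constants as defined by llama.cpp; the path is a placeholder:)

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut file = File::open("model.bin")?;
    let mut buf = [0u8; 4];
    file.read_exact(&mut buf)?;
    // The magic is written as a little-endian u32 at the start of the file.
    match u32::from_le_bytes(buf) {
        0x6767_6a74 => println!("ggjt: single-file, versioned, mmap-friendly"),
        0x6767_6d66 => println!("ggmf: versioned"),
        0x6767_6d6c => println!("ggml: unversioned"),
        other => println!("unknown magic: {other:#x}"),
    }
    Ok(())
}
```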