Microllm: just the bare basics to run inference on local hardware.

Currently working:
- read_gguf.py (a recent refactor made it faster and more compact)

TODO:
- fix token generation so it produces sensible tokens
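As an illustration of what read_gguf.py has to handle, here is a minimal sketch of parsing the fixed-size GGUF header (magic bytes, version, tensor count, metadata key/value count), based on the public GGUF spec. The function name and dict layout are assumptions for this example, not the actual API of read_gguf.py.

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header (v2/v3 layout, little-endian):
    4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count.
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Synthetic header for demonstration: version 3, 2 tensors, 5 metadata KV pairs.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(sample))
```

After the header come the metadata key/value pairs and then the tensor info records; a full reader walks those sequentially before reaching the tensor data itself.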