Hi there,

I have noticed that JoyCaption sometimes skips image features I want the captions to talk about. I know I can fine-tune it, but I am curious where the training data came from; reviewing the training data might help explain why the model behaves this way.
Will you publish the training data set or will that always remain hidden?
All of the training data will be published (with images as hashes and URLs where possible). I'm working on getting the dataset organized (it's a complete mess at the moment), so expect it to be uploaded closer to the version 1.0 release.
Yeah, the model will miss features for a variety of reasons. Fine-tuning will always be the best way to improve that, but I'm working to get better instruction following into JoyCaption so that it can be guided by a prompt to focus on whatever you specifically want it to describe.
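As an aside on the hashes-and-URLs distribution format mentioned above: here is a minimal Python sketch of how a consumer could re-verify downloaded images against published hashes. The `url` and `sha256` field names are assumptions for illustration, not the actual dataset schema.

```python
import hashlib
import urllib.request

def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_entry(entry: dict) -> bool:
    """Download an image and check it against its published hash.

    Assumes each dataset record has hypothetical 'url' and
    'sha256' fields; the real schema may differ.
    """
    with urllib.request.urlopen(entry["url"]) as resp:
        data = resp.read()
    return sha256_of_bytes(data) == entry["sha256"]

# Example with a placeholder record:
# entry = {"url": "https://example.com/image.jpg", "sha256": "<published digest>"}
# print(verify_entry(entry))
```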
Thank you for responding; I'm looking forward to seeing the dataset. From a user perspective, a torrent would be most convenient since it has hash checking built in, but I understand if the dataset is too large for that to be practical.