How many values of an embedding do you need to keep for the dot products to stay roughly the same? Can I keep every third value and have my sentence embeddings perform nearly identically? Related research was done with Matryoshka models in Feb 2024
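A minimal sketch of the measurement this question implies, assuming sentence-transformers is installed; the model name and sentences are arbitrary illustrative choices, and note that Matryoshka training truncates the first k dimensions rather than taking every third one as here:

```python
# Compare pairwise cosine similarity using full embeddings vs. keeping
# every third dimension of each embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The cat sat on the mat.",
    "A feline rested on a rug.",
    "Quarterly revenue grew by twelve percent.",
]
emb = model.encode(sentences)        # shape (n, 384)
sub = emb[:, ::3]                    # keep every third value

def cos_matrix(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

# If this stays small, the strided embeddings rank sentences the same way.
print("max abs cosine drift:", np.abs(cos_matrix(emb) - cos_matrix(sub)).max())
```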
Can you approximate the attention mechanism with a small model, as in BiLD / speculative sampling, and throw out some number of irrelevant tokens? Specifically on chat-based tasks where topics can change.
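One hedged way to prototype this: use a small model's own attention as a cheap relevance score and prune the context before the big model sees it. This is a sketch, not BiLD itself; gpt2 as the scorer, the last-layer/last-position heuristic, and the 50% keep ratio are all assumptions.

```python
# Score context tokens with a small model's attention, keep the top half.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
small = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Earlier we discussed travel plans. Anyway, what is the capital of France?"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    out = small(ids, output_attentions=True)

# Average attention from the final position across heads of the last layer.
scores = out.attentions[-1][0, :, -1, :].mean(dim=0)   # (seq_len,)
k = max(1, ids.shape[1] // 2)                          # keep_ratio = 0.5 (a guess)
keep = scores.topk(k).indices.sort().values            # preserve token order

print(tok.decode(ids[0, keep]))                        # pruned context for the big model
```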
July 13th, 2023
Can you take a pretrained model and continue training with QAT and wind up with a good model? I know int8 works, and I've even done int4, but it would be really cool to push this down to int2.
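A minimal sketch of the core mechanism, assuming PyTorch: symmetric fake quantization with a straight-through estimator at a configurable bit width. The toy layer is illustrative; a real run would wrap every Linear in the pretrained model and keep training on the original data.

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1                    # int2 -> qmax = 1
        scale = w.abs().max() / qmax
        return torch.clamp((w / scale).round(), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                         # straight-through estimator

class QATLinear(nn.Linear):
    def __init__(self, *args, bits=2, **kwargs):
        super().__init__(*args, **kwargs)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, self.bits) # quantize in the forward pass
        return nn.functional.linear(x, w_q, self.bias)

layer = QATLinear(16, 16, bits=2)
layer(torch.randn(4, 16)).sum().backward()            # grads reach full-precision weights
print(layer.weight.grad is not None)
```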
Can you take a dense model, SVD the weights and slightly lower the rank, continue training, and repeat until the inner rank is significantly lower than the baseline dense model's while maintaining benchmarks?
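A sketch of one step of that loop in PyTorch: truncate the SVD, reconstruct, then (in a real run) train for a while before the next cut. The 90% rank-keep ratio and the random matrix are illustrative assumptions.

```python
import torch

def reduce_rank(weight: torch.Tensor, rank: int) -> torch.Tensor:
    # Drop the smallest singular values, then reconstruct at the same shape.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

W = torch.randn(512, 512)
rank = 512
for step in range(5):
    rank = max(1, int(rank * 0.9))                # slightly lower the inner rank
    W = reduce_rank(W, rank)
    # ... continue training W for some steps here before the next cut ...

print(torch.linalg.matrix_rank(W).item())         # rank after repeated reduction
```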
July 31st, 2023
Can you do some basic retrieval, add cross-attention layers into a GPT/decoder-only model, train that for some steps, and get a better-grounded, more factually correct model given retrieval output? (See the sketch after the next idea.)
Can LLaVA be done with cross-attention instead of the current self-attention over projected image tokens? LLaMA 3 now does this.
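Both ideas share the same block: decoder hidden states attend to external features, whether those are retrieved-passage embeddings or vision-encoder patches. A sketch in PyTorch; the dimensions and the zero-initialized gate (a Flamingo-style choice, so the pretrained model is unchanged at the start of continued training) are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))   # starts as identity

    def forward(self, hidden, memory):
        # hidden: (batch, seq, d_model) decoder states
        # memory: (batch, mem_len, d_model) retrieved text or image features
        attended, _ = self.attn(self.norm(hidden), memory, memory)
        return hidden + self.gate.tanh() * attended

block = CrossAttentionAdapter()
hidden = torch.randn(2, 32, 768)                  # decoder activations
memory = torch.randn(2, 64, 768)                  # projected retrieval/image features
print(block(hidden, memory).shape)                # torch.Size([2, 32, 768])
```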
August 27th, 2023
Take a large model, throw out every other transformer block, and continue training. How does performance compare? I've done this and seen loss go below 2, so I think it's possible with full fine-tuning instead of LoRAs. Done here in March 2024
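A sketch of the surgery itself, assuming transformers is installed; gpt2 stands in for the "large model", and the stride-2 keep pattern matches the idea above. Full fine-tuning would follow from here.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
blocks = model.transformer.h                       # 12 blocks for gpt2

# Keep every other transformer block and update the config to match.
model.transformer.h = nn.ModuleList(blocks[::2])
model.config.n_layer = len(model.transformer.h)

print(f"kept {model.config.n_layer} of {len(blocks)} blocks")
# ... continue full fine-tuning here; loss should recover with training ...
```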