Hi, I'm Carson. I'm the cofounder of Forefront.ai
Feel free to email me at carson@forefront.ai
Here's a list of research ideas to steal from
July 12th, 2023
- ReLoRA - seen first here in November 2022
- Sparse Upcycling - seen first here also in November 2022
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints - seen first here in March 2023
- How many values of an embedding do you need to keep for the dot products to stay roughly the same? Can I keep every third value and have my sentence embeddings perform close to identically? (quick sketch after this list)
- Can you diff a fine-tuned model against its base, SVD the diff, and turn the largest K singular values into LoRA weights with close to the same performance? That would enable merging a whole lot more LoRAs of alpacas/orcas/etc. (sketch after this list)
- Can you approximate the attention mechanism with a small model, like BiLD / Speculative Sampling, and throw out some number of irrelevant tokens? Specifically on chat-based tasks where topics can change. (sketch after this list)
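
A minimal sketch of the embedding-truncation question above. Random unit vectors stand in for real sentence embeddings, and the dimension, the keep-every-third pattern, and the top-10 comparison are all assumptions; swap in embeddings from an actual encoder to test the idea for real.

```python
# Compare full dot products against dot products computed on every third dimension.
import numpy as np

rng = np.random.default_rng(0)
d, n = 768, 1000
emb = rng.standard_normal((n, d))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # stand-in "sentence embeddings"

query = emb[0]
full_scores = emb[1:] @ query                        # full dot products

keep = np.arange(0, d, 3)                            # keep every third dimension
small_scores = emb[1:, keep] @ query[keep]           # dot products on 1/3 of the dims

# If the retained dims carry ~1/3 of the "energy", rescaling by 3 roughly recovers
# the magnitude; for retrieval what matters is whether the ranking agrees.
corr = np.corrcoef(full_scores, small_scores)[0, 1]
top_full = set(np.argsort(full_scores)[-10:])
top_small = set(np.argsort(small_scores)[-10:])
print(f"score correlation: {corr:.3f}, top-10 overlap: {len(top_full & top_small)}/10")
```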
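For the diff-and-SVD idea, here's a hedged sketch of turning a fine-tune delta into LoRA factors. `base_w` and `tuned_w` are synthetic stand-ins for one matching weight matrix (e.g. a q_proj) from the base and fine-tuned checkpoints, and the rank is an assumption.

```python
# Turn (tuned - base) into low-rank LoRA factors via truncated SVD.
import torch

def diff_to_lora(base_w: torch.Tensor, tuned_w: torch.Tensor, r: int = 16):
    """Return LoRA factors (A, B) such that B @ A approximates tuned_w - base_w."""
    delta = (tuned_w - base_w).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]            # (out_features, r), singular values folded in
    A = Vh[:r, :]                    # (r, in_features)
    return A, B

base_w = torch.randn(4096, 4096)
# Synthetic fine-tune whose update happens to be low rank.
tuned_w = base_w + (torch.randn(4096, 64) @ torch.randn(64, 4096)) * 0.01

A, B = diff_to_lora(base_w, tuned_w, r=64)
approx = base_w + B @ A
print("relative error:", (torch.norm(tuned_w - approx) / torch.norm(tuned_w)).item())
```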
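And a rough sketch of the token-pruning idea: let a small model's attention decide which context tokens to drop before the large model sees them. The model choice (gpt2), the keep ratio, and the "attention received from the final position" heuristic are all assumptions, not a tested recipe.

```python
# Use a small model's attention maps to score and prune context tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
small = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Long multi-topic chat history goes here ..."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    out = small(ids, output_attentions=True)

# Average, over layers and heads, how much attention the final position pays to
# each earlier token, then keep only the top fraction of the context.
attn = torch.stack(out.attentions)               # (layers, batch, heads, seq, seq)
scores = attn[:, 0, :, -1, :].mean(dim=(0, 1))   # (seq,)
keep_ratio = 0.5
k = max(1, int(scores.numel() * keep_ratio))
keep_idx = torch.topk(scores, k).indices.sort().values

pruned_ids = ids[:, keep_idx]                     # shortened context for the big model
print(tok.decode(pruned_ids[0]))
```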
July 13th, 2023
- Can you take a pretrained model, continue training with QAT, and wind up with a good model? I know int8 works and I've even done int4, but it would be really cool if you could do this down to int2. (sketch below)
- Can you take a dense model, SVD the weights and slightly lower the rank, continue training, and repeat this until the inner rank is significantly lower than the baseline dense model while maintaining benchmarks? (sketch below)
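
A minimal sketch of the int2 QAT idea: fake-quantize Linear weights to 2-bit symmetric values in the forward pass with a straight-through estimator, then keep training the fp weights as usual. The per-row scaling and clipping scheme are assumptions, and a real run would also need to handle activations.

```python
# 2-bit weight fake-quantization with a straight-through estimator.
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=2):
        qmax = 2 ** (bits - 1) - 1                       # int2 -> levels in {-2, -1, 0, 1}
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                            # straight-through estimator

class QATLinear(nn.Linear):
    def forward(self, x):
        return nn.functional.linear(x, FakeQuant.apply(self.weight, 2), self.bias)

# Swap QATLinear in for the model's Linear layers, then continue training.
layer = QATLinear(512, 512)
x = torch.randn(4, 512)
loss = layer(x).pow(2).mean()
loss.backward()                                          # gradients flow to the fp weights
print(layer.weight.grad.shape)
```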
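And a sketch of the iterative rank-lowering loop: SVD each Linear weight, drop a small fraction of the rank, reconstruct, continue training, repeat. The 10% shrink per round is an assumption.

```python
# Lower the inner rank of every Linear layer a little, then hand back for more training.
import torch
import torch.nn as nn

@torch.no_grad()
def shrink_rank(model: nn.Module, keep_frac: float = 0.9) -> nn.Module:
    for module in model.modules():
        if isinstance(module, nn.Linear):
            W = module.weight.data
            U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
            r = max(1, int(S.numel() * keep_frac))
            module.weight.data = ((U[:, :r] * S[:r]) @ Vh[:r, :]).to(W.dtype)
    return model

# Usage: alternate shrinking with training steps until benchmarks start to drop.
# model = shrink_rank(model, keep_frac=0.9)
# ... continue training ...
# model = shrink_rank(model, keep_frac=0.9)
```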
July 31st, 2023
- Can you do some basic retrieval, add some cross-attention layers into a GPT/decoder-only model, train that for some steps, and get a better-grounded/more factually correct model given a retrieval output? (sketch after this list)
- Can LLaVA be done with cross attention instead of the current self attention?
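
A hedged sketch covering both items above: a gated cross-attention block you could interleave into a decoder-only model so hidden states attend over retrieved-passage embeddings (or, for the LLaVA variant, image embeddings). Dimensions and the zero-init gate are assumptions; the gate just keeps the pretrained model unchanged at step 0, Flamingo-style.

```python
# Gated cross-attention block over retrieval (or image) embeddings.
import torch
import torch.nn as nn

class RetrievalCrossAttention(nn.Module):
    def __init__(self, d_model: int = 4096, n_heads: int = 32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Zero-init gate so the base model's behavior is untouched at the start.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden, retrieved):
        # hidden:    (batch, seq, d_model) decoder hidden states
        # retrieved: (batch, n_chunks, d_model) encoded retrieval chunks / image features
        attended, _ = self.attn(self.norm(hidden), retrieved, retrieved)
        return hidden + torch.tanh(self.gate) * attended

block = RetrievalCrossAttention(d_model=512, n_heads=8)
hidden = torch.randn(2, 16, 512)
retrieved = torch.randn(2, 4, 512)
print(block(hidden, retrieved).shape)  # (2, 16, 512)
```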
August 27th, 2023
- Take a large model, throw out every other transformer block, and continue training. How does performance compare? I've done this and seen loss go <2, so I think it's possible with full fine-tuning instead of LoRAs. (sketch below)
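
A minimal sketch of the block-dropping experiment, written against a Llama-style Hugging Face checkpoint; the model name and keeping the even-indexed blocks are assumptions.

```python
# Drop every other decoder block, then continue full fine-tuning.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Keep every other decoder block and patch the config to match.
kept = [layer for i, layer in enumerate(model.model.layers) if i % 2 == 0]
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# From here, continue training on pretraining-style data and watch the loss.
print(f"blocks remaining: {len(model.model.layers)}")
```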
September 13th, 2023
- MoE with LoRA (Sep 2023) - first seen here in May of 2023 and here in July of 2023. I called it MoLE. (sketch below)
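
A hedged sketch of one way the MoE-with-LoRA idea could look: several LoRA experts attached to one frozen Linear, mixed by a learned router. The rank, expert count, and soft (rather than top-k) routing are assumptions.

```python
# Multiple LoRA experts on a frozen Linear, combined by a per-token router.
import torch
import torch.nn as nn

class MoLELinear(nn.Module):
    def __init__(self, base: nn.Linear, n_experts: int = 4, r: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                 # freeze the base weight
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, out_f, r))  # zero-init like LoRA
        self.router = nn.Linear(in_f, n_experts)

    def forward(self, x):
        # x: (batch, seq, in_f)
        gates = torch.softmax(self.router(x), dim=-1)               # (b, s, e)
        lora_in = torch.einsum("bsi,eri->bser", x, self.A)          # (b, s, e, r)
        lora_out = torch.einsum("bser,eor->bseo", lora_in, self.B)  # (b, s, e, out)
        mixed = torch.einsum("bse,bseo->bso", gates, lora_out)
        return self.base(x) + mixed

layer = MoLELinear(nn.Linear(512, 512), n_experts=4, r=8)
x = torch.randn(2, 16, 512)
print(layer(x).shape)  # (2, 16, 512)
```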