Focused Transformer: Contrastive Training for Context Scaling - 256k context length AI

InternetPirate@lemmy.fmhy.ml · 1 year ago

InternetPirate@lemmy.fmhy.ml · 1 year ago

The paper actually demonstrates a 16-million-token context window with 92% accuracy. Most models can be retrained to a 100k context window with over 92% accuracy, though accuracy drops to 74% at 256k. The code has already been released on GitHub as well. I'm excited to see 100k models built with this method soon!

Martineski@lemmy.fmhy.ml · 1 year ago

Sorry for the late reaction; I was ill for the past few days and didn't have the energy to moderate this sub. Please include the date of the paper in the title. Thank you.