• keepthepace@slrpnk.net · 6 days ago

    It is llama3-8B, so it is not out of the question, but I am not sure how much memory you would really need to reach a 1M context window. They use ring attention to achieve the large context window; I am not familiar with it, but it seems to greatly lower the memory requirements.
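
    For a rough sense of scale, here is a back-of-envelope sketch (my own numbers, assuming the published llama3-8B config of 32 layers, 8 KV heads via GQA, and head dim 128, cached in fp16) of how big the KV cache gets at 1M tokens, and how sharding the sequence across devices, which is roughly what ring attention does, brings the per-device requirement down:

    ```python
    # Rough KV-cache estimate for llama3-8B at a 1M-token context.
    # Assumed config: 32 layers, 8 KV heads (GQA), head dim 128, fp16 cache.
    # Illustration only, not a measurement.

    N_LAYERS = 32       # transformer layers in llama3-8B
    N_KV_HEADS = 8      # grouped-query attention KV heads
    HEAD_DIM = 128      # per-head dimension
    BYTES_FP16 = 2      # bytes per cached value in fp16

    def kv_cache_bytes(context_len: int) -> int:
        """Total KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes, per token."""
        per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16
        return per_token * context_len

    def per_device_bytes(context_len: int, n_devices: int) -> int:
        """Ring attention shards the sequence (and its KV blocks) across devices;
        each device holds roughly 1/n_devices of the cache and passes KV blocks
        around the ring while computing attention."""
        return kv_cache_bytes(context_len) // n_devices

    if __name__ == "__main__":
        ctx = 1_000_000
        print(f"full KV cache @ {ctx:,} tokens: {kv_cache_bytes(ctx) / 2**30:.0f} GiB")  # ~122 GiB
        for n in (1, 8, 32):
            print(f"  sharded over {n} device(s): {per_device_bytes(ctx, n) / 2**30:.1f} GiB each")
    ```

    So a full 1M-token cache would be on the order of 120 GiB on a single device, which is why spreading it over a ring of devices makes the whole thing tractable.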