Ok, so, tokenization of the words is why I get that I have seen tech nerds get so excited about a system that allows for being able to come up with synonyms for words that were auto-generated that have a basic ability to sometimes be correct by looking at the words before and after it…
But it’s such a shitty way to look up synonyms! Using the words on either side doesn’t mean you found a synonym just that you found another word that might work and it still has to use the full horsepower of ridiculously overpowered system.
Or you could have a lookup table that just reads the frickin word and has alternate synonyms predefined and it was able to run in word 97.
It’s ridiculous that we think this is better in any meaningful way instead of just wasteful development.
Sure if you want to reduce it down to it’s most basic elements. Anything sounds useless like that. Video games are just dots changing color on a screen and use a lot of electricity. Banking software is just tracking changes to a number and also is extremely inefficient and resource intensive.
Well I can personally say it has cut down the amount of busy work at my job down by a lot. Boilerplate code is easy and near instantaneous. I am also working on a project that can bridge the communication gap between highly technical text and non technical readers. It works extremely well at this task even with a non fine tuned model, which is my current step in development.
In all reality this tokenization and llm language processing is useful for all sorts of things which can be mathematically somewhat modeled similarly to language. Using them for shitty web searches is not ideal.
Right. My point was that it doesn’t view any of these as words without a shit ton of additional effort and processing power. That it doesn’t view tokens as words and therefore can’t just search a direct synonym. The whole thing is a bloated mess of over design to take half a step forward and then 6 steps to the left.
Ok, so, tokenization of the words is why I get that I have seen tech nerds get so excited about a system that allows for being able to come up with synonyms for words that were auto-generated that have a basic ability to sometimes be correct by looking at the words before and after it…
But it’s such a shitty way to look up synonyms! Using the words on either side doesn’t mean you found a synonym just that you found another word that might work and it still has to use the full horsepower of ridiculously overpowered system.
Or you could have a lookup table that just reads the frickin word and has alternate synonyms predefined and it was able to run in word 97.
It’s ridiculous that we think this is better in any meaningful way instead of just wasteful development.
Because that’s not the point of an LLM lmao
Sure but what is their purpose other than to create convincing sounding sentences? And use a lot of computer resources to do so?
Sure if you want to reduce it down to it’s most basic elements. Anything sounds useless like that. Video games are just dots changing color on a screen and use a lot of electricity. Banking software is just tracking changes to a number and also is extremely inefficient and resource intensive.
This is not a convincing argument to say that other things can be reduced down arbitrarily, than explaining the usefulness of it.
Well I can personally say it has cut down the amount of busy work at my job down by a lot. Boilerplate code is easy and near instantaneous. I am also working on a project that can bridge the communication gap between highly technical text and non technical readers. It works extremely well at this task even with a non fine tuned model, which is my current step in development.
In all reality this tokenization and llm language processing is useful for all sorts of things which can be mathematically somewhat modeled similarly to language. Using them for shitty web searches is not ideal.
Huh? Tokens is not how you seem similar words, or concepts. vector similarity search is.
Right. My point was that it doesn’t view any of these as words without a shit ton of additional effort and processing power. That it doesn’t view tokens as words and therefore can’t just search a direct synonym. The whole thing is a bloated mess of over design to take half a step forward and then 6 steps to the left.