theterrasque, 1 year ago

> The issue here is that you are describing the goal of LLMs, not how they actually work.

No, I am describing how they actually work.

> it cannot achieve this via rudimentary statistics alone because the model simply does not have enough parameters to memorize which token is more likely to go next in all cases.

True, hence the limitations. That would require infinite storage and infinite compute capability.

> Also, going “one token at a time” is only a “limitation” because LLMs are not accurate enough.

No, it’s done because going one letter at a time is too slow. Tokens are a “happy medium” trade-off.
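
To put a rough number on that trade-off, here is a minimal sketch that counts sequential generation steps for the same text. It assumes whitespace-separated words as a crude stand-in for tokens (real tokenizers use subword units, so actual counts differ), and `generation_steps` is a made-up helper name.

```python
def generation_steps(text):
    """Count sequential decoding steps under two crude tokenization schemes."""
    return {
        # One model call per character, spaces included.
        "characters": len(text),
        # One model call per whitespace-separated word (a stand-in for tokens).
        "words": len(text.split()),
    }


print(generation_steps("Tokens are a happy medium"))
```

Each step is a full pass through the model, so fewer and larger units mean fewer passes for the same text.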

> The space token effectively just makes it reflect on the conversation.

It creates a “break” in the block, which lets the model start a new answer instead of continuing the previous one. How it reacts to that depends on the fine-tune and on the filters applied before the data hits the LLM.

> To be clear, I do not believe LLMs are the future.

I have just said that the LLMs we have today can’t fix the problems with false data and hallucinations, because those come from a core principle of how they operate. Fixing that will require a new approach.

You could add a rocket engine and wings to a pogo stick, but then it’s no longer a pogo stick; it’s an airplane with weird landing gear. Today’s LLMs could give us hints about how to make a better AI, but that would be a different thing from today’s LLMs. From what has been leaked from OpenAI, GPT-4 has scaling issues, so they use a mixture of experts. Just throwing hardware at it is already showing diminishing returns. And we’re learning fascinating new ways of training them, but the inherent problem is the same.

For example, if you ask an LLM whether it can give an answer to a question, it has two paths to go down, positive and negative. Note that at the point where it chooses, it doesn’t know how it will finish the answer; it doesn’t look ahead. But if it sees, for example, that 80% of the answers in the texts it’s been trained on start with a positive, then it will most likely start with “yes”. Once it does that, it will continue to generate an answer, often a very convincing and plausibly real-looking one, because it has already committed to that path.
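
The commitment described above can be sketched with a toy lookup table in place of a real model. Everything here is an assumption for illustration: the probabilities are invented, `NEXT_TOKEN_PROBS` and `greedy_decode` are hypothetical names, and real systems usually sample rather than always taking the top token.

```python
# Hypothetical next-token probabilities, keyed by the tokens emitted so far.
# The numbers are invented for illustration, not taken from a real model.
NEXT_TOKEN_PROBS = {
    (): {"Yes": 0.8, "No": 0.2},
    ("Yes",): {"it": 0.9, "but": 0.1},
    ("Yes", "it"): {"works": 0.7, "fails": 0.3},
    ("No",): {"idea": 1.0},
}


def greedy_decode(table, max_tokens=10):
    """Emit the most likely next token, one step at a time, with no lookahead."""
    context = ()
    while context in table and len(context) < max_tokens:
        probs = table[context]
        # The choice depends only on the prefix so far, never on what follows.
        next_token = max(probs, key=probs.get)
        context = context + (next_token,)
    return list(context)


print(greedy_decode(NEXT_TOKEN_PROBS))
```

The first step picks “Yes” purely because it is the likelier opener, and every later step is conditioned on that choice, whether or not a correct answer exists down that path.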

And as for the link about teaching it a backspace token, the comments there already point out the issue:

> It’s interesting that in the examples (Table 3 on page 21), the model uses the backspace token to erase the randomly-added token from the prompt, but it does not seem to ever use the token to correct its own output. I’m curious how frequently the model actually uses this backspace token in practice - and if the answer is “vanishingly rarely”, what is the source of the improved Mauve score and sample diversity they show? Is it just that the different training procedure gives an improvement?

> For it to use the backspace, wouldn’t it have to predict the wrong token with greater confidence than the corrected token? I would think this would require more examples of a wrong token + correction than the correct token, which seems a bit odd.

Almost none of the text it’s trained on has a backspace token, and fine-tuning it in is tricky since it’s a completely new concept. And remember it’s still going token by token, so it would have to write a token and then, right after, find that a backspace token is more likely than any continuation. It’s interesting, and LLMs can pick up on some crazy patterns, but I’m skeptical.
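
A minimal sketch of that condition, assuming greedy decoding and a made-up `<bksp>` token string (the linked paper’s actual token and training setup may differ). The helper names and probabilities are hypothetical.

```python
BACKSPACE = "<bksp>"  # hypothetical token string, chosen for this sketch


def apply_backspaces(tokens):
    """Render a token stream, letting each backspace delete the previous token."""
    output = []
    for token in tokens:
        if token == BACKSPACE:
            if output:
                output.pop()
        else:
            output.append(token)
    return output


def would_backspace(next_probs):
    """True only if backspace is the most likely next token (greedy decoding)."""
    return max(next_probs, key=next_probs.get) == BACKSPACE


# Invented distribution right after a wrong token: continuing still wins.
after_wrong_token = {".": 0.6, BACKSPACE: 0.3, ",": 0.1}
print(would_backspace(after_wrong_token))
print(apply_backspaces(["Paris", "is", "in", "Germany", BACKSPACE, "France"]))
```

So a correction only happens if the model first emits the wrong token and then, at the very next step, rates the backspace above every ordinary continuation.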
