At the simplest, it takes in a vector of floating-point numbers, multiplies it with other similar vectors (the “weights”), sums each product, applies a RELU* to the result, and then uses those values as the input vector for another layer with its own weights (or gives them as output). The magic is in the weights.
This operation is a simple matrix-by-vector product followed by elementwise RELU, if you know what that means.
Where modelWeights is [[[Float]]], and so layer has type [Float] -> [[Float]] -> [Float].
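For anyone following along without the code: a minimal sketch that matches those types could look something like this. The names `relu` and `runModel` are made up here for illustration, and the fold is a guess at the shape of the original line, not a copy of it.

```haskell
-- RELU as described in the footnote: negative values become 0.
relu :: Float -> Float
relu x = if x > 0 then x else 0

-- One layer: take the dot product of the input with each weight row
-- (the matrix-by-vector product), then apply RELU to each sum.
layer :: [Float] -> [[Float]] -> [Float]
layer input weights = [relu (sum (zipWith (*) input row)) | row <- weights]

-- Hypothetical driver: feed the input through every layer in order.
-- modelWeights is [[[Float]]], one weight matrix per layer.
runModel :: [[[Float]]] -> [Float] -> [Float]
runModel modelWeights input = foldl layer input modelWeights
```

With two inputs, a two-row first layer and a one-row second layer, `runModel [[[1, 1], [1, -1]], [[1, 2]]] [1, 2]` goes through `[3, 0]` and ends at `[3]`.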
RELU: if i>0 then i else 0. It could also be another nonlinear function, but RELU is obviously fast and works about as well as anything else. There’s interesting theoretical work on certain really weird functions, though.
Less simple, it might have a set pattern of zero weights which can be ignored, allowing fast implementation with a bunch of smaller vectors, or have pairwise multiplication steps, like in the Transformer. Aaand that’s about it, all the rest is stuff that was figured out by trial and error, like encoding, and the math behind how to train the weights. Now you know.
Assuming you use hex values for 32-bit weights, you could write a line with 4 no problem:
That’s cool, though honestly I haven’t fully understood it, but that’s probably because I don’t know Haskell, so that line looked like complete gibberish to me lol. At least I think I got the gist of things on a high level. I’m always curious to understand but never dare to dive deep (holds self back from making a deep learning joke). Much appreciated btw!
Yeah, maybe somebody can translate for you. I considered using something else, but it was already long and I didn’t feel like writing out multiple loops.
No worries. It’s neat how much such a comparatively simple concept can do, with enough data to work from. Circa-2010 I thought it would never work, lol.
Only 1288 lines? Can I raise you a 6000+ line stored procedure that calls multiple different SQL functions that each implement a slightly different variation of the same logic?
There was a research paper that took a variety of weaker LLMs and randomly asked each one to generate the next word, and it actually turned out really well.
I don’t personally have it, but I am using the WebStorm 2024.1 beta, which has line generation. It’s simply Tab to accept the generated line, or Escape to dismiss it and go back to IntelliSense.
I won’t lie, the line gen is crap. I’d rather use my self-hosted RefactAI Docker setup, but the plugin isn’t compatible with 2024.1 yet.
The conversational part is really good though. I love that it has access to my code without having to paste it so I can just say “on line 274” or something. It’s apparently not good at generating code but if you were using it for that you should learn how to code. But it’s really good at fixing errors and issues.
I don’t use chat, as it has never really been more than a digital rubber ducky for me.
And it’s not really generating lots of code. Most of the time it’s just generating constructors/factory functions, or something easy like summing a vector of integers.
My philosophy is that my brain comes first: if the AI did what I was thinking of, then I press Tab. I ain’t debugging an AI-made function for two hours when I can write it myself in an hour.
It’s the same way for me. I don’t know if my work is this trivial or I’m just “good enough” at it, but it takes me much longer to prompt the chat to get what I want than it takes me to just write it myself.
I honestly kinda feel like I’m using this AI stuff wrong, but outside of generating some basic unit tests and a slightly better autocomplete it feels kinda useless in my day-to-day work.
Unrelated, but the other day I read that the main computer for core calculation in Fukushima’s nuclear plant used to run a very old CPU with 4 cores. Every calculation was done on each core, and the results had to be exactly the same. If one of them was different, they knew there was a bit flip, and could discard that core’s result for that one calculation.
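The general idea (run the same calculation several times and throw out whichever result disagrees) can be sketched in a few lines. To be clear, this is just an illustration of majority voting, not how the plant's system actually worked, and `majorityVote` is a hypothetical name.

```haskell
import Data.List (group, sort, sortOn)

-- Hypothetical helper: given one result per core, return the value that a
-- strict majority agrees on, or Nothing if there is no such value.
majorityVote :: Ord a => [a] -> Maybe a
majorityVote results =
  -- Group equal results together, largest group first.
  case sortOn (negate . length) (group (sort results)) of
    (winner@(value : _) : _)
      | 2 * length winner > length results -> Just value
    _ -> Nothing
```

So `majorityVote [42, 42, 17, 42]` gives `Just 42` (the odd one out is discarded), while a 2-2 split like `majorityVote [1, 1, 2, 2]` gives `Nothing`, since nobody can tell which pair flipped.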
Interesting. I wonder why they didn’t just move it to somewhere with less radiation? And clearly, they have another more trustworthy machine doing the checking somehow. A self-correcting OS would have to parity check its parity checks somehow, which I’m sure is possible, but would be kind of novel.
In a really ugly environment, you might have to abandon semiconductors entirely, and go back to vacuum as the magical medium, since it’s radiation proof (false vacuum apocalypse aside). You could make a nuvistor integrated “chip” which could do the same stuff; the biggest challenge would be maintaining enough emissions from the tiny and quickly-cooling cathodes.
I often do this, but I always hit Ctrl-S before running it again. Shamefully, this probably works about 10% of the time. Does that technically count as changing nothing?
Yeah. And I can send a quick email to update the team after I get home from my 45 minute commute, then log off and go to the cottage in that cell signal dead spot by the lake.