
CanadaPlus , 6 months ago (edited 6 months ago)

At the simplest, it takes in a vector of floating-point numbers, multiplies it elementwise with each of several similar vectors (the “weights”), sums each product, applies a RELU* to the result, and then uses those values as the input vector for another layer with its own weights (or gives them as output). The magic is in the weights.

This operation is a simple matrix-by-vector product followed by elementwise RELU, if you know what that means.

In Haskell, something like:

layer layerInput layerWeights = map relu $ map sum $ map (zipWith (*) layerInput) layerWeights

foldl layer modelInput modelWeights

Where modelWeights has type [[[Float]]] and modelInput has type [Float], and so layer has type [Float] -> [[Float]] -> [Float].
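For anyone who wants to load that into GHCi, here is a minimal self-contained sketch of the same thing with type signatures. The names model, exampleWeights and exampleOutput and the toy weights are made up for illustration. Like the one-liner above, it has no bias terms, applies the RELU to the last layer too, and zipWith silently truncates if a weight row and the input differ in length.

```haskell
-- RELU as described in the comment: negative values become 0.
relu :: Float -> Float
relu i = if i > 0 then i else 0

-- One layer: each inner list of layerWeights is one row of the weight matrix.
layer :: [Float] -> [[Float]] -> [Float]
layer layerInput layerWeights =
  map (relu . sum . zipWith (*) layerInput) layerWeights

-- Whole model: the output of each layer is the input of the next.
model :: [[[Float]]] -> [Float] -> [Float]
model modelWeights modelInput = foldl layer modelInput modelWeights

-- Hypothetical toy weights: 2 inputs -> 2 hidden values -> 1 output.
exampleWeights :: [[[Float]]]
exampleWeights =
  [ [[1, -1], [0.5, 0.5]]
  , [[1, 2]]
  ]

-- Layer 1 gives [1.0, 1.5]; layer 2 gives [1*1 + 1.5*2] = [4.0].
exampleOutput :: [Float]
exampleOutput = model exampleWeights [2, 1]
```

Evaluating exampleOutput in GHCi should print [4.0].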

*RELU: if i>0 then i else 0. It could also be another nonlinear function, but RELU is obviously fast and works about as well as anything else. There’s interesting theoretical work on certain really weird functions, though.
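To show what swapping the nonlinearity looks like, here is a small sketch where the activation is just a parameter. layerWith is a hypothetical name, and tanh is only there as an example of "another nonlinear function", not a recommendation.

```haskell
-- RELU exactly as written above.
relu :: Float -> Float
relu i = if i > 0 then i else 0

-- Same layer as before, but the activation function is passed in.
layerWith :: (Float -> Float) -> [Float] -> [[Float]] -> [Float]
layerWith activation layerInput layerWeights =
  map (activation . sum . zipWith (*) layerInput) layerWeights

-- The weighted sums here are [-1, 1] before any activation.
reluOut, linearOut, tanhOut :: [Float]
reluOut   = layerWith relu [1, -2] [[1, 1], [1, 0]]  -- RELU zeroes the -1
linearOut = layerWith id   [1, -2] [[1, 1], [1, 0]]  -- no nonlinearity at all
tanhOut   = layerWith tanh [1, -2] [[1, 1], [1, 0]]  -- squashes into (-1, 1)
```

With id as the "activation", stacked layers collapse into one big matrix product, which is why some nonlinearity is needed.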

Less simple, it might have a set pattern of zero weights which can be ignored, allowing fast implementation with a bunch of smaller vectors, or have pairwise multiplication steps, like in the Transformer. Aaand that’s about it. All the rest is stuff that was figured out by trial and error, like encoding, plus the math behind how to train the weights. Now you know.
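As one illustration of the "set pattern of zero weights" idea, here is a sketch of a block-diagonal layer: the input is cut into chunks and each chunk gets its own small weight matrix, which gives the same result as one big matrix that is zero everywhere outside the blocks. This is just one possible zero pattern, picked as an assumption for the example, and dense, chunksOf and blockLayer are hypothetical names.

```haskell
relu :: Float -> Float
relu i = if i > 0 then i else 0

-- Ordinary dense layer, same computation as the one-liner above.
dense :: [[Float]] -> [Float] -> [Float]
dense ws xs = map (relu . sum . zipWith (*) xs) ws

-- Split a list into consecutive chunks of n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Block-diagonal layer: block k only ever sees chunk k of the input.
blockLayer :: Int -> [[[Float]]] -> [Float] -> [Float]
blockLayer n blocks xs = concat (zipWith dense blocks (chunksOf n xs))

-- Two 1x2 blocks versus the equivalent 2x4 matrix padded with zeros.
viaBlocks, viaDense :: [Float]
viaBlocks = blockLayer 2 [[[1, 2]], [[3, 4]]] [1, 1, 1, 1]
viaDense  = dense [[1, 2, 0, 0], [0, 0, 3, 4]] [1, 1, 1, 1]
```

Both viaBlocks and viaDense come out as [3.0, 7.0], but the block version does half the multiplications and stores half the weights.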

Assuming you use hex values for 32-bit weights, you could fit four on a line no problem:

wgt35 = [0x1234FCAB, 0x1234FCAB, 0x1234FCAB, 0x1234FCAB];
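If the hex values are meant as raw IEEE 754 bit patterns (an assumption, the line above doesn't say), turning them back into floats in Haskell could look roughly like this. castWord32ToFloat comes from GHC.Float in base; wgtBits and wgt are made-up names, and the bit patterns are picked so the decoded values are easy to check by hand.

```haskell
import Data.Word (Word32)
import GHC.Float (castWord32ToFloat)

-- Hypothetical weights stored as 32-bit IEEE 754 bit patterns.
wgtBits :: [Word32]
wgtBits = [0x3F800000, 0xBF000000, 0x40000000, 0x00000000]

-- Reinterpret the bits as single-precision floats (no numeric conversion).
wgt :: [Float]
wgt = map castWord32ToFloat wgtBits
```

Here wgt evaluates to [1.0, -0.5, 2.0, 0.0].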

And you can sometimes get away with half-precision floats, which need only four hex digits each.
