In a hidden layer, the activation function shapes what the neural network computes. Is it possible for an AI to generate an activation function for itself, so it can improve upon itself?

  • @model_tar_gz@lemmy.world

    The ‘swish’ activation function is f(x) = x · sigmoid(B·x).

    B is typically set to 1, but it doesn’t have to be; you can make it a parameter for the model to learn if you want. I’ve played with this and not seen any significant benefit, though; I’ve found that allowing the learning rate and/or batch size to vary is more impactful than a learned activation function. Also, you can end up with vanishing or exploding gradients if you don’t constrain B, and even then B might saturate depending on what happens during training.
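
    As a rough sketch of what a learnable B looks like in practice (assuming PyTorch; the module name, the init value, and the clamp bounds are all illustrative, not a standard API):

    ```python
    import torch
    import torch.nn as nn

    class LearnableSwish(nn.Module):
        """Swish f(x) = x * sigmoid(B * x) where B is a trained parameter."""

        def __init__(self, b_init: float = 1.0):
            super().__init__()
            # B starts at the usual fixed value and is updated by the optimizer
            self.b = nn.Parameter(torch.tensor(b_init))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Clamping is one simple way to constrain B so gradients don't
            # vanish or explode; the bounds here are arbitrary choices.
            b = self.b.clamp(0.1, 10.0)
            return x * torch.sigmoid(b * x)

    act = LearnableSwish()
    y = act(torch.randn(4, 8))  # output has the same shape as the input
    ```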

    The choice of activation function itself is more impactful than allowing it to be dynamic/learned.
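
    Comparing fixed choices is just a matter of swapping the module in and training each variant identically; a sketch (PyTorch again, with made-up layer sizes and candidate list):

    ```python
    import torch.nn as nn

    def make_mlp(activation: nn.Module) -> nn.Sequential:
        # Identical architecture; only the activation differs
        return nn.Sequential(nn.Linear(16, 32), activation, nn.Linear(32, 1))

    # nn.SiLU is PyTorch's built-in Swish with B fixed at 1
    candidates = {"relu": nn.ReLU(), "gelu": nn.GELU(), "swish": nn.SiLU()}
    models = {name: make_mlp(act) for name, act in candidates.items()}
    # Train each model the same way and compare validation metrics.
    ```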

    Happy learning!