In the hidden layers, the activation function decides what the neural network computes. Is it possible for an AI to generate activation functions for itself, so that it can improve upon itself?

  • @80085@lemmy.world

    I just looked it up, and apparently someone implemented dynamic activation functions in a CNN: https://www.nature.com/articles/s41598-022-19020-y . I’ve never seen anything like this elsewhere. I have included various activation functions in hyperparameter searches before full training to find the “best” one for a dataset, but I haven’t really seen much of a difference in validation performance between them.

    Found another paper using dynamic activation functions with transformers: https://arxiv.org/pdf/2208.14111.pdf
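    For anyone curious what that kind of search looks like in practice, here’s a minimal PyTorch sketch. The toy data, tiny MLP, and short pilot runs are placeholders, and in practice you’d compare validation metrics rather than training loss:

    ```python
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(512, 16)
    y = (X.sum(dim=1, keepdim=True) > 0).float()  # toy binary target

    def build_mlp(act: nn.Module) -> nn.Module:
        return nn.Sequential(nn.Linear(16, 32), act, nn.Linear(32, 1))

    results = {}
    for name, act in [("relu", nn.ReLU()), ("tanh", nn.Tanh()), ("gelu", nn.GELU())]:
        model = build_mlp(act)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(200):  # short pilot run, not full training
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        results[name] = loss.item()

    print(results)  # then commit to full training with the best-looking activation
    ```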

  • @theobromus@vlemmy.net

    There are activation functions with some learnable parameters (search Google for learnable activation functions and you’ll find some papers). But it’s not particularly common to use them. Instead, the weights of the layer are learned, and that can (together with the activation function) represent very complicated functions.

    I will note that it is quite common to use AutoML techniques, which try a variety of architectures (including different activation functions) to see which works best.
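    As a rough sketch of what an AutoML-style search over architectures can look like, here is plain random search on a toy task; the widths, activations, and trial count are illustrative, not taken from any particular AutoML library:

    ```python
    import random
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    random.seed(0)
    X = torch.randn(512, 16)
    y = (X[:, :1] * X[:, 1:2] > 0).float()  # toy binary target

    ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "silu": nn.SiLU}

    def sample_and_score():
        # Sample one candidate architecture, train it briefly, return its loss.
        cfg = {"width": random.choice([16, 32, 64]),
               "act": random.choice(list(ACTIVATIONS))}
        model = nn.Sequential(nn.Linear(16, cfg["width"]),
                              ACTIVATIONS[cfg["act"]](),
                              nn.Linear(cfg["width"], 1))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(100):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        return cfg, loss.item()

    trials = [sample_and_score() for _ in range(5)]
    best_cfg, best_loss = min(trials, key=lambda t: t[1])
    print(best_cfg, best_loss)
    ```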

  • @model_tar_gz@lemmy.world

    The ‘swish’ activation function is f(x) = x · sigmoid(B·x).

    B is typically set to 1, but it doesn’t have to be. You can treat it as a parameter for the model to learn if you want. I’ve played with it and not really seen any significant benefit; I’ve found that allowing the learning rate and/or batch size to vary is more impactful than a learned activation function. Also, you can end up with vanishing or exploding gradients if you don’t constrain B, and even then B might saturate depending on what happens during training.
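    If you want to try it, here’s a minimal sketch of swish with a learnable B as a PyTorch module; the init value and the example model are just illustrative:

    ```python
    import torch
    import torch.nn as nn

    class LearnableSwish(nn.Module):
        def __init__(self, beta_init: float = 1.0):
            super().__init__()
            # B is a single trainable scalar; consider constraining it
            # (e.g. via softplus) if gradients misbehave during training.
            self.beta = nn.Parameter(torch.tensor(beta_init))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * torch.sigmoid(self.beta * x)

    # Drop it in wherever you'd otherwise use nn.ReLU() etc.
    model = nn.Sequential(nn.Linear(16, 32), LearnableSwish(), nn.Linear(32, 1))
    print(model(torch.randn(4, 16)).shape)  # torch.Size([4, 1])
    ```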

    The choice of activation function itself is more impactful than allowing it to be dynamic/learned.

    Happy learning!

  • HoppsM

    Based on my research, there is an emerging interest in the field of meta-learning, or “learning to learn.” Some researchers are exploring the concept of allowing neural networks to learn their own hyperparameters, which could include parameters of activation functions. However, it’s my understanding that this approach could lead to more complex training processes and risks such as unstable gradients, and it might not always result in significantly better performance.

    While activation functions with learnable parameters aren’t commonly used, there is ongoing research that explores them. One such example is the Parametric ReLU (PReLU) function - a variant of the ReLU activation function that allows the negative slope to be learned during training, as opposed to being a predetermined hyperparameter.
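    PyTorch ships this as nn.PReLU. Here’s a minimal sketch showing the built-in next to an equivalent hand-rolled version (0.25 is the library’s default initial slope):

    ```python
    import torch
    import torch.nn as nn

    builtin = nn.PReLU()  # one learnable negative slope, shared across channels by default

    class MyPReLU(nn.Module):
        def __init__(self, init: float = 0.25):
            super().__init__()
            self.a = nn.Parameter(torch.tensor(init))  # learned negative slope

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return torch.where(x >= 0, x, self.a * x)

    x = torch.randn(4)
    print(builtin(x))
    print(MyPReLU()(x))  # behaves like ReLU with a learned "leak" for x < 0
    ```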

    In my opinion, if you’re new to this field, it’s essential to grasp the basics of neural networks, including understanding how common activation functions like ReLU, sigmoid, tanh, etc., operate. These advanced concepts are undoubtedly fascinating and might offer incremental improvements, but even most of today’s state-of-the-art models primarily use these “standard” activation functions. So, starting with a solid foundation is key.
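    If it helps, here is what those “standard” activations do, written out directly (torch has built-ins for all of them):

    ```python
    import torch

    x = torch.linspace(-3, 3, 7)
    relu = torch.clamp(x, min=0)       # max(0, x): zero for negatives, identity for positives
    sigmoid = 1 / (1 + torch.exp(-x))  # squashes to (0, 1)
    tanh = torch.tanh(x)               # squashes to (-1, 1)
    print(relu, sigmoid, tanh, sep="\n")
    ```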