Keras: Layers and Activation Functions

Keras layers are the fundamental building blocks of neural networks. They encapsulate weights, biases, and the logic to perform a transformation on input data. Activation functions are typically applied after a layer's linear transformation to introduce non-linearity into the model, allowing it to learn complex patterns.

1. Core Layers

Keras provides a wide variety of layers for different network architectures.

a. `Dense` (Fully Connected) Layer

The most basic layer: each unit is connected to every output of the previous layer. It computes output = activation(dot(input, kernel) + bias), where kernel has shape (input_dim, units) and bias has shape (units,).

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

model = keras.Sequential([
    layers.InputLayer(input_shape=(10,)), # Input feature size 10
    layers.Dense(units=32, activation='relu'), # Output 32 neurons
    layers.Dense(units=1, activation='sigmoid') # Output 1 neuron (e.g., for binary classification)
])
model.summary()

# Get the weights and biases of the first Dense layer
print("\nWeights of first Dense layer (kernel):", model.layers[0].get_weights()[0].shape)
print("Biases of first Dense layer:", model.layers[0].get_weights()[1].shape)
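The shapes printed above follow directly from the Dense formula. As a rough illustration, here is the same computation written in plain NumPy for a batch of 4 samples. The batch size, the random values, and the local `relu` helper are assumptions made for this sketch, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.normal(size=(4, 10))     # 4 samples, 10 features each (assumed batch size)
kernel = rng.normal(size=(10, 32))   # (input_dim, units)
bias = np.zeros(32)                  # one bias per unit

def relu(x):
    # Illustrative helper, not the Keras implementation
    return np.maximum(0.0, x)

# output = activation(dot(input, kernel) + bias)
output = relu(batch @ kernel + bias)

print(output.shape)              # (4, 32)
print(kernel.size + bias.size)   # 352 trainable parameters: 10 * 32 + 32
```

The parameter count matches what `model.summary()` reports for a 10-to-32 Dense layer.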

b. Convolutional Layers (`Conv1D`, `Conv2D`, `Conv3D`)

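Used for feature extraction from data with spatial or temporal structure: `Conv1D` for sequences, `Conv2D` for images, and `Conv3D` for volumes or video. Each layer slides a set of learnable filters over the input and produces one feature map per filter.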

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Example: Conv2D for image data (e.g., 28x28 grayscale image)
model_conv = keras.Sequential([
    layers.InputLayer(input_shape=(28, 28, 1)), # H, W, Channels
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='valid'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(), # Flatten to 1D for Dense layers
    layers.Dense(10, activation='softmax')
])
model_conv.summary()
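To make the filter operation concrete, the sketch below slides a single 3x3 filter over a single-channel 28x28 input with no padding, which is what `padding='valid'` means. It is a simplified illustration (one channel, one filter, stride 1, a hand-picked averaging kernel), not how Keras implements the layer; `conv2d_valid` is a hypothetical helper name.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the filter with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.ones((3, 3)) / 9.0   # simple averaging filter (assumed for the demo)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)   # (26, 26)

# Parameters of a Conv2D layer with 32 such filters on 1 input channel:
n_params = 32 * (3 * 3 * 1) + 32   # weights plus one bias per filter
print(n_params)            # 320
```

Without padding, each spatial dimension shrinks by `kernel_size - 1`, which is why a 28x28 input yields a 26x26 feature map.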

c. Pooling Layers (`MaxPooling2D`, `AveragePooling2D`)

Reduce the spatial dimensions (height, width) of the input volume, helping to reduce computational cost and control overfitting.

# See example above in Conv2D. MaxPooling2D is used after Conv2D.
# Other pooling layers include GlobalAveragePooling2D, GlobalMaxPooling2D, etc.
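A minimal NumPy sketch of 2x2 pooling with stride 2 on a single 2D feature map may help here. The helper names are hypothetical, and the reshape trick assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 8, 1, 0],
              [7, 6, 3, 2]])

pooled_max = max_pool_2x2(x)   # [[4, 8], [9, 3]]
pooled_avg = avg_pool_2x2(x)   # [[2.5, 6.5], [7.5, 1.5]]
print(pooled_max.shape)        # (2, 2)
```

Each 2x2 window is reduced to one value, so both spatial dimensions are halved.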

d. Recurrent Layers (`LSTM`, `GRU`, `SimpleRNN`)

Used for sequential data like text or time series, allowing information to persist across sequence elements.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Example: LSTM for sequence data (variable number of time steps, 10 features per step)
model_rnn = keras.Sequential([
    layers.InputLayer(input_shape=(None, 10)), # (timesteps, features) - None for variable timesteps
    layers.LSTM(units=64, return_sequences=True), # return_sequences=True to stack another RNN layer
    layers.LSTM(units=32),
    layers.Dense(1, activation='sigmoid')
])
model_rnn.summary()
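How information "persists" is easiest to see in the simplest recurrent cell. The sketch below implements the standard simple-RNN update, h_t = tanh(x_t W + h_{t-1} U + b), in NumPy. The sizes and random weights are assumptions for illustration. An LSTM adds gates and a cell state on top of this idea, so this is not the LSTM computation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, features, units = 50, 10, 4   # assumed sizes for the demo

x = rng.normal(size=(timesteps, features))     # one input sequence
W = rng.normal(size=(features, units)) * 0.1   # input-to-hidden weights
U = rng.normal(size=(units, units)) * 0.1      # hidden-to-hidden weights
b = np.zeros(units)

h = np.zeros(units)   # initial hidden state
states = []
for x_t in x:
    # The new state depends on the current input and on the previous state
    h = np.tanh(x_t @ W + h @ U + b)
    states.append(h)

all_states = np.stack(states)   # analogous to return_sequences=True
last_state = all_states[-1]     # analogous to return_sequences=False
print(all_states.shape, last_state.shape)   # (50, 4) (4,)
```

A stacked recurrent layer needs one input per time step, which is why every recurrent layer except the last one returns the full sequence of states.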

e. Embedding Layer (`Embedding`)

Used for converting non-negative integer indices into dense vectors of fixed size, typically for processing categorical inputs or text. Each index must be smaller than `input_dim`.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Example: Embedding for text data (e.g., vocabulary size 1000, embedding dimension 64)
# Input is a sequence of integers (word indices)
model_embedding = keras.Sequential([
    layers.InputLayer(input_shape=(20,)), # Fixed-length sequence of 20 word indices; Flatten + Dense need a known length
    layers.Embedding(input_dim=1000, output_dim=64), # Vocabulary size 1000, 64-dim embeddings
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid')
])
model_embedding.summary()
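Conceptually, an embedding is a row lookup in a trainable table of shape (vocabulary size, embedding dimension). The NumPy sketch below shows that lookup. The table is filled with random values rather than learned ones, and the token ids are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 64

# Stand-in for the trainable embedding matrix (random, not learned)
table = rng.normal(size=(vocab_size, embed_dim))

# Two sequences of four word indices each (hypothetical ids)
token_ids = np.array([[4, 27, 999, 0],
                      [15, 15, 2, 0]])

# Integer indexing replaces every id with its row of the table
vectors = table[token_ids]
print(vectors.shape)   # (2, 4, 64)
```

Identical indices always map to the same vector, so the two occurrences of id 15 share one embedding.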

f. Other Important Layers:

Flatten: Flattens the input, e.g., from (batch, H, W, C) to (batch, H*W*C).
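Dropout: Randomly sets input units to 0 with probability `rate` during training and scales the remaining units by 1 / (1 - rate) so the expected sum is unchanged. It is inactive at inference time. Helps prevent overfitting.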
BatchNormalization: Normalizes the activations of the previous layer, stabilizing and accelerating training.
Reshape: Reshapes outputs to a target shape.
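The sketch below illustrates three of these layers in NumPy: flattening, dropout, and batch normalization. It covers training-mode behavior only (at inference, dropout is a no-op and batch normalization uses moving averages instead of batch statistics). The helper names and the input sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

images = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))   # (batch, H, W, C)
flat = images.reshape(images.shape[0], -1)                   # Flatten: (8, 4*4*3)

dropped = dropout_train(flat, rate=0.5, rng=rng)
normed = batch_norm_train(flat)

print(flat.shape)                                          # (8, 48)
print(np.allclose(normed.mean(axis=0), 0.0, atol=1e-6))    # True
```

With `rate=0.5`, surviving units are doubled, which keeps the expected activation the same as without dropout.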

2. Activation Functions

Activation functions introduce non-linearity. Without them, a stack of layers would collapse into a single linear transformation no matter how deep the network is. They are typically applied after the linear transformation of a layer.

Common Activation Functions:

ReLU (Rectified Linear Unit): f(x) = max(0, x). Most common for hidden layers.
Sigmoid: f(x) = 1 / (1 + exp(-x)). Squashes output to the open interval (0, 1). Often used in binary classification output layers.
Tanh (Hyperbolic Tangent): f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). Squashes output to the open interval (-1, 1).
Softmax: f(x_i) = exp(x_i) / sum(exp(x_j)). Converts a vector of numbers into a probability distribution. Often used in multi-class classification output layers.
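The formulas above translate almost directly into NumPy. This is a minimal sketch for checking the definitions by hand, not the Keras implementations. The softmax subtracts the maximum before exponentiating, a common trick to avoid overflow that does not change the result.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Shifting by the max leaves the result unchanged but avoids overflow
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 3.0])

print(relu(x))            # [0. 0. 3.]
print(sigmoid(0.0))       # 0.5
print(np.tanh(0.0))       # 0.0
print(softmax(x).sum())   # sums to 1 (up to floating-point rounding)
```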

Using Activation Functions:

As a parameter to a layer:

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,)),
    layers.Dense(1, activation='sigmoid')
])

As a separate Activation layer:

model = keras.Sequential([
    layers.Dense(64, input_shape=(10,)),
    layers.Activation('relu'), # Explicit activation layer
    layers.Dense(1),
    layers.Activation('sigmoid')
])

The separate form is less common for simple activations, but it is useful for custom activations or when another layer, such as BatchNormalization, must sit between the linear transformation and the activation.

3. Custom Layers

For specialized operations not available in built-in layers, you can create custom layers by subclassing tf.keras.layers.Layer.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

class MyCustomLayer(layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        # Create a trainable weight variable once the input shape is known.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.output_dim),
                                      initializer='random_uniform',
                                      trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # Define the layer's logic: a linear projection without bias
        return tf.matmul(inputs, self.kernel)

    def get_config(self):
        # Include constructor arguments so the layer can be re-created when a model is saved and loaded
        config = super().get_config()
        config.update({"output_dim": self.output_dim})
        return config

# Example using the custom layer
model_custom = keras.Sequential([
    layers.InputLayer(input_shape=(10,)),
    MyCustomLayer(output_dim=5),
    layers.Activation('relu'),
    layers.Dense(1, activation='sigmoid')
])
model_custom.summary()
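The split between `__init__`, `build`, and `call` exists so that weights can be created lazily, once the input shape is known. The framework-free sketch below mimics that lifecycle with a hypothetical `LazyLinear` class. It is an illustration of the pattern only, not Keras code, and the uniform range of ±0.05 is an assumption chosen for the demo.

```python
import numpy as np

class LazyLinear:
    """Hypothetical stand-in for a layer; illustrates deferred weight creation."""

    def __init__(self, output_dim):
        # Only configuration is stored here; the input size is not known yet
        self.output_dim = output_dim
        self.kernel = None

    def build(self, input_shape):
        # Weights are created once the last input dimension is known
        rng = np.random.default_rng(0)
        self.kernel = rng.uniform(-0.05, 0.05,
                                  size=(input_shape[-1], self.output_dim))

    def __call__(self, inputs):
        if self.kernel is None:
            self.build(inputs.shape)   # first call triggers build
        return inputs @ self.kernel

    def get_config(self):
        # Enough information to re-create the layer (without its weights)
        return {"output_dim": self.output_dim}

layer = LazyLinear(output_dim=5)
outputs = layer(np.ones((3, 10)))
clone = LazyLinear(**layer.get_config())

print(layer.kernel.shape, outputs.shape)   # (10, 5) (3, 5)
```

The same layer object can therefore be declared before the input size is fixed, and `get_config` carries only the constructor arguments needed to rebuild it.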

Further Topics:

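Regularizers (L1, L2, L1L2), passed to a layer through arguments such as kernel_regularizer, and the ActivityRegularization layer
Weight constraints (MaxNorm, UnitNorm), passed to a layer through arguments such as kernel_constraint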
Advanced activation functions (LeakyReLU, ELU, PReLU)
Merge layers (Add, Subtract, Multiply) in the Functional API
Pre-trained layers for transfer learning.
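As a small taste of the advanced activations listed above, here are the usual definitions of LeakyReLU and ELU in NumPy. The `alpha` values are arbitrary choices for this sketch and are not claimed to match the Keras defaults. PReLU has the same form as LeakyReLU but learns `alpha` during training.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Small negative slope instead of a hard zero for x <= 0
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smoothly saturates towards -alpha for large negative x
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))
print(elu(x))
```

Both keep a non-zero gradient for negative inputs, which is the main motivation for using them over plain ReLU.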

Understanding the various Keras layers and activation functions is essential for designing and implementing effective neural network architectures for different machine learning tasks.