MuhammadLab
CNN training intuition

CNN Operations Lab - Channels, Padding and Normalization

Upload an image and follow the CNN pipeline from channels to padding, convolution, activation, pooling, and normalization. The goal is not to hide the math but to make every operation visible to students.

Pipeline: image to feature map

Channels: 3 (RGB) + grayscale
Grid: 10x10
Kernel: 3x3

Files are processed in the browser. Uploaded images are not sent to a server.

CNN Channels

RGB image = height x width x 3

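A color image has red, green, and blue channels. In a convolutional layer, each filter has a separate weight slice for every input channel; the per-channel responses are summed (plus a bias) into one feature map, and a layer with many filters produces many feature maps.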
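A minimal sketch with made-up numbers: a hypothetical 2x2 RGB image and one hypothetical 2x2 filter (the bias term is omitted). Each channel is multiplied element-wise by its own weight slice, and the three partial sums are added into a single feature-map value:

```python
# Hypothetical 2x2 RGB image stored as three channel matrices (made-up values).
channels = {
    "R": [[10, 20], [30, 40]],
    "G": [[1, 2], [3, 4]],
    "B": [[0, 5], [5, 0]],
}

# One hypothetical filter: a separate 2x2 weight slice for each input channel.
filter_slices = {
    "R": [[1, 0], [0, 1]],
    "G": [[1, 1], [1, 1]],
    "B": [[0, 2], [2, 0]],
}


def channel_response(matrix, weights):
    """Sum of element-wise products between one channel and its weight slice."""
    return sum(
        matrix[i][j] * weights[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


# Filter each channel separately, then add the partial sums into one output value.
partials = {name: channel_response(channels[name], filter_slices[name]) for name in channels}
feature_value = sum(partials.values())
print(partials)       # {'R': 50, 'G': 10, 'B': 20}
print(feature_value)  # 80
```

With several filters, each one produces its own feature map, so the layer's output has as many channels as it has filters.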

CNN Padding

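'same' padding: add border pixels so output size = input size

Padding lets the kernel be centered on edge and corner pixels, so border information still contributes to the output. Without padding, every convolution shrinks the feature map: a 3x3 kernel with stride 1 turns an n x n input into an (n - 2) x (n - 2) output.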
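A small sketch of the usual output-size rule, out = floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding, and s the stride (symbol and function names are introduced here for illustration). With this lab's 10x10 grid and 3x3 kernel, one pixel of padding keeps the size at 10, while no padding drops it to 8:

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(10, 3, padding=1))  # 10  ('same' padding)
print(conv_output_size(10, 3, padding=0))  # 8   (no padding)
print(conv_output_size(10, 2, stride=2))   # 5   (2x2 pooling, stride 2)
```

The same rule explains the 5x5 size after 2x2 max pooling later in the pipeline.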

CNN Normalization

z = (x - mean) / std

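Normalization rescales values to a consistent range, which tends to make training of later layers more stable and keeps gradients better behaved. The formula above is z-score standardization; the Normalization step at the end of this pipeline uses min-max scaling instead.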
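A minimal sketch of z = (x - mean) / std applied to the first five grayscale pixels of the lab's input. It assumes the population variance and adds a small eps inside the square root, a guard that normalization layers such as batch normalization also use; the function name is illustrative:

```python
import math
import statistics


def standardize(values, eps=1e-5):
    """z-score each value: (x - mean) / sqrt(variance + eps).

    Uses the population variance; eps avoids division by zero
    when all values are equal.
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in values]


# First five grayscale pixels from the lab's input grid.
z = standardize([98, 103, 60, 64, 68])
print([round(v, 2) for v in z])
```

After standardization the values have mean 0 and a standard deviation of (almost exactly) 1, whatever their original scale.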

Channels

Image channels become matrices

Red channel

10x10
 35  53  71  89 107 125 143 161 179 197
 35  53  71  89 107 125 143 161 179 197
 35  53  71  89 107 125 143 161 179 197
 35  53  71 209 227 245 255 161 179 197
 35  53  71 209 227 245 255 161 179 197
 35  53  71 209 227 245 255 161 179 197
 35  53  71 209 227 245 255 161 179 197
 35  53  71  89 107 125 143 161 179 197
 35  53  71  89 107 125 143 161 179 197
 35  53  71  89 107 125 143 161 179 197

Green channel

10x10
115 115  35  35  35  35  35  35  35  35
133 133 133  53  53  53  53  53  53  53
 71 151 151 151  71  71  71  71  71  71
 89  89 169 169 169  89  89  89  89  89
107 107 107 187 187 187 107 107 107 107
125 125 125 125 205 205 205 125 125 125
143 143 143 143 143 223 223 223 143 143
161 161 161 161 161 161 241 241 241 161
179 179 179 179 179 179 179 255 255 255
197 197 197 197 197 197 197 197 255 255

Blue channel

10x10
180 169 158 147 136 125 114 103  92  81
200 189 178 167 156 145 134 123 112 101
180 169 158 147 136 125 114 103  92  81
200 189 178 167 156 145 134 123 112 101
180 169 158 147 136 125 114 103  92  81
200 189 178 167 156 145 134 123 112 101
180 169 158 147 136 125 114 103  92  81
200 189 178 167 156 145 134 123 112 101
180 169 158 147 136 125 114 103  92  81
200 189 178 167 156 145 134 123 112 101

Grayscale input

10x10
 98 103  60  64  68  72  76  80  85  89
111 115 120  77  81  85  89  93  97 102
 73 124 128 132  89  93  97 102 106 110
 86  90 141 181 185 142 144 114 119 123
 94  98 102 189 193 197 152 123 127 131
107 111 115 155 206 210 212 136 140 144
115 119 123 163 167 218 220 191 148 152
128 132 136 140 144 148 200 204 208 165
136 140 144 148 153 157 161 210 214 218
149 153 157 161 165 170 174 178 216 220

Grayscale pixel = 0.299R + 0.587G + 0.114B, rounded to the nearest integer. For the first pixel: 0.299(35) + 0.587(115) + 0.114(180) = 98.49, which rounds to 98.
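A short sketch that applies the weighted sum to the first row of each channel matrix. The rounding rule is an assumption (the lab does not state it), but rounding to the nearest integer reproduces the first grayscale row:

```python
# First row of the red, green and blue channel matrices from the lab.
red = [35, 53, 71, 89, 107, 125, 143, 161, 179, 197]
green = [115, 115, 35, 35, 35, 35, 35, 35, 35, 35]
blue = [180, 169, 158, 147, 136, 125, 114, 103, 92, 81]


def to_gray(r, g, b):
    """Weighted sum of R, G, B rounded to the nearest integer.

    Python's round() rounds exact .5 ties to even; none of these
    pixels lands exactly on .5, so that detail does not matter here.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


gray_row = [to_gray(r, g, b) for r, g, b in zip(red, green, blue)]
print(gray_row)  # [98, 103, 60, 64, 68, 72, 76, 80, 85, 89]
```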

Padding and convolution

Slide a kernel over the padded image

zero-padded input (one pixel of zeros on every side)

12x12
  0   0   0   0   0   0   0   0   0   0   0   0
  0  98 103  60  64  68  72  76  80  85  89   0
  0 111 115 120  77  81  85  89  93  97 102   0
  0  73 124 128 132  89  93  97 102 106 110   0
  0  86  90 141 181 185 142 144 114 119 123   0
  0  94  98 102 189 193 197 152 123 127 131   0
  0 107 111 115 155 206 210 212 136 140 144   0
  0 115 119 123 163 167 218 220 191 148 152   0
  0 128 132 136 140 144 148 200 204 208 165   0
  0 136 140 144 148 153 157 161 210 214 218   0
  0 149 153 157 161 165 170 174 178 216 220   0
  0   0   0   0   0   0   0   0   0   0   0   0
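A minimal sketch of zero padding on a 2-D list (the helper name zero_pad is hypothetical). Padding a 10x10 grid with p = 1 gives the 12x12 grid above; a tiny 2x2 corner of the input keeps the printout short:

```python
def zero_pad(matrix, p=1):
    """Surround a 2-D list with p rows and columns of zeros on every side."""
    width = len(matrix[0]) + 2 * p
    zero_row = [0] * width
    padded = [list(zero_row) for _ in range(p)]           # top border
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)      # left/right borders
    padded.extend(list(zero_row) for _ in range(p))       # bottom border
    return padded


tiny = [[98, 103], [111, 115]]  # top-left corner of the grayscale input
for row in zero_pad(tiny):
    print(row)
# [0, 0, 0, 0]
# [0, 98, 103, 0]
# [0, 111, 115, 0]
# [0, 0, 0, 0]
```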

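Edge detector kernel (3x3)

-1 -1 -1
-1  8 -1
-1 -1 -1

The weights sum to 0, so a region of constant intensity produces 0; strong positive or negative outputs mark pixels that differ from their neighbors.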

Convolution feature map

10x10
 455  320    1  106  165  177  189  200  219  428
 375  103  157 -126  -32   15   14   12    9  329
  58  128   44   54 -264 -168  -86  -43  -12  333
 209 -126   84  289  264  -14  132  -58   16  391
 260  -62 -264  234   79  132  -62 -160  -14  395
 319   15 -140  -18  156  115  249 -225  -32  454
 323  -15  -87  118  -48  237  241   60 -156  411
 382   15  -21  -58 -142 -236   91   80  162  380
 386  -15  -15  -16   -9  -59 -153  125   93  721
 763  498  510  521  531  550  516  449  688 1112
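
Output cell = sum(pixel x kernel weight) over the 3x3 window centered on that cell. Example: the window centered on grayscale pixel 193 (row 5, column 5, counting from 1) gives a raw sum of 79:
181 x -1 = -181
185 x -1 = -185
142 x -1 = -142
189 x -1 = -189
193 x 8 = 1544
197 x -1 = -197
155 x -1 = -155
206 x -1 = -206
210 x -1 = -210
Sum = 1544 - 1465 = 79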
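A pure-Python sketch of the same computation, assuming stride 1 and one pixel of zero padding (the name conv2d_same is introduced here for illustration). Like most deep-learning libraries it computes cross-correlation, meaning the kernel is not flipped; because this edge kernel is symmetric, flipping would not change the result:

```python
GRAY = [
    [98, 103, 60, 64, 68, 72, 76, 80, 85, 89],
    [111, 115, 120, 77, 81, 85, 89, 93, 97, 102],
    [73, 124, 128, 132, 89, 93, 97, 102, 106, 110],
    [86, 90, 141, 181, 185, 142, 144, 114, 119, 123],
    [94, 98, 102, 189, 193, 197, 152, 123, 127, 131],
    [107, 111, 115, 155, 206, 210, 212, 136, 140, 144],
    [115, 119, 123, 163, 167, 218, 220, 191, 148, 152],
    [128, 132, 136, 140, 144, 148, 200, 204, 208, 165],
    [136, 140, 144, 148, 153, 157, 161, 210, 214, 218],
    [149, 153, 157, 161, 165, 170, 174, 178, 216, 220],
]
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def conv2d_same(image, kernel):
    """Slide an odd-sized k x k kernel over the image with stride 1.

    Positions outside the image count as 0 (zero padding of k // 2),
    so the output has the same height and width as the input.
    """
    k = len(kernel)
    p = k // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0
            for i in range(k):
                for j in range(k):
                    y, x = r + i - p, c + j - p
                    if 0 <= y < h and 0 <= x < w:  # skip padded zeros
                        total += image[y][x] * kernel[i][j]
            out[r][c] = total
    return out


fmap = conv2d_same(GRAY, EDGE)
print(fmap[4][4])  # 79 (window centered on pixel 193)
print(fmap[0])     # first row of the feature map
```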

Activation

ReLU sets negative responses to zero

ReLU(x) = max(0, x)

After ReLU

10x10
 455  320    1  106  165  177  189  200  219  428
 375  103  157    0    0   15   14   12    9  329
  58  128   44   54    0    0    0    0    0  333
 209    0   84  289  264    0  132    0   16  391
 260    0    0  234   79  132    0    0    0  395
 319   15    0    0  156  115  249    0    0  454
 323    0    0  118    0  237  241   60    0  411
 382   15    0    0    0    0   91   80  162  380
 386    0    0    0    0    0    0  125   93  721
 763  498  510  521  531  550  516  449  688 1112
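ReLU(x) = max(0, x) is applied to every cell independently. A one-line sketch (the function name relu is illustrative), checked on the second row of the convolution feature map:

```python
def relu(matrix):
    """Replace every negative value with 0; keep everything else unchanged."""
    return [[max(0, v) for v in row] for row in matrix]


row = [375, 103, 157, -126, -32, 15, 14, 12, 9, 329]  # second row of the feature map
print(relu([row]))  # [[375, 103, 157, 0, 0, 15, 14, 12, 9, 329]]
```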

Pooling

Pooling compresses local regions

2x2 max pooling, stride 2: output = largest value in each non-overlapping 2x2 block

max pooled

5x5
 455  157  177  200  428
 209  289  264  132  391
 319  234  156  249  454
 382  118  237  241  411
 763  521  550  516 1112
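A sketch of 2x2 max pooling with stride 2 on the ReLU map (the helper max_pool is hypothetical, not a library function). Each output keeps the largest value in its block, which reproduces the 5x5 pooled grid:

```python
RELU_MAP = [
    [455, 320, 1, 106, 165, 177, 189, 200, 219, 428],
    [375, 103, 157, 0, 0, 15, 14, 12, 9, 329],
    [58, 128, 44, 54, 0, 0, 0, 0, 0, 333],
    [209, 0, 84, 289, 264, 0, 132, 0, 16, 391],
    [260, 0, 0, 234, 79, 132, 0, 0, 0, 395],
    [319, 15, 0, 0, 156, 115, 249, 0, 0, 454],
    [323, 0, 0, 118, 0, 237, 241, 60, 0, 411],
    [382, 15, 0, 0, 0, 0, 91, 80, 162, 380],
    [386, 0, 0, 0, 0, 0, 0, 125, 93, 721],
    [763, 498, 510, 521, 531, 550, 516, 449, 688, 1112],
]


def max_pool(matrix, size=2, stride=2):
    """Keep the largest value in each size x size window, moving by `stride`."""
    h, w = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, w - size + 1, stride)
        ]
        for r in range(0, h - size + 1, stride)
    ]


pooled = max_pool(RELU_MAP)
for row in pooled:
    print(row)
```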

Normalization

Normalize the activation scale

x_norm = (x - min) / (max - min)

For the pooled map, min = 118 and max = 1112, so 455 becomes (455 - 118) / 994 ≈ 0.34.

min-max normalized

5x5
0.34 0.04 0.06 0.08 0.31
0.09 0.17 0.15 0.01 0.27
0.20 0.12 0.04 0.13 0.34
0.27 0.00 0.12 0.12 0.29
0.65 0.41 0.43 0.40 1.00
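A minimal sketch of min-max scaling over the pooled map (the function name min_max is illustrative). Rounding to two decimals reproduces the normalized grid; the guard for a constant map is an added assumption, since the formula divides by max - min:

```python
POOLED = [
    [455, 157, 177, 200, 428],
    [209, 289, 264, 132, 391],
    [319, 234, 156, 249, 454],
    [382, 118, 237, 241, 411],
    [763, 521, 550, 516, 1112],
]


def min_max(matrix):
    """Rescale every value to [0, 1] using the matrix's own min and max."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant input: avoid dividing by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


normed = min_max(POOLED)
rounded = [[round(v, 2) for v in row] for row in normed]
for row in rounded:
    print(row)
```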

Training connection

In training, the CNN learns the kernel weights.

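This lab uses a hand-picked filter so students can see each operation. In real CNN training, the kernel weights usually start out random; backpropagation computes the gradient of the loss with respect to every weight, and an optimizer such as gradient descent adjusts many kernels at once so the network learns features that reduce the loss for the task.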
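A toy sketch of what "learning the kernel weights" means, reduced to a single output cell: starting from the lab's edge kernel, plain gradient descent on a squared-error loss nudges all nine weights toward a target response. The target, the learning rate, and the /255 input scaling are hypothetical choices for illustration; real backpropagation chains these gradients through every layer and usually averages them over a batch:

```python
# 3x3 window from the worked convolution example, scaled to [0, 1].
window = [[181, 185, 142], [189, 193, 197], [155, 206, 210]]
x = [[v / 255 for v in row] for row in window]

# Start from the lab's hand-picked edge kernel.
weights = [[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]
target = 1.0  # hypothetical desired response for this window
lr = 0.05     # hypothetical learning rate


def forward(w):
    """One output cell: sum of input x weight over the window."""
    return sum(w[i][j] * x[i][j] for i in range(3) for j in range(3))


losses = []
for _ in range(20):
    y = forward(weights)
    loss = (y - target) ** 2  # squared-error loss
    losses.append(loss)
    # Gradient for each weight: dL/dw_ij = 2 * (y - target) * x_ij
    g = 2 * (y - target)
    weights = [[weights[i][j] - lr * g * x[i][j] for j in range(3)] for i in range(3)]

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.2e}")
```

Every weight moves in proportion to the pixel it multiplies, which is why brighter pixels in the window change their weights the most.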

What this app is showing

The uploaded image is sampled into a small matrix so the math remains readable. Real CNNs use larger tensors, many channels, many learned filters, batches of images, nonlinear activations, normalization layers, and repeated blocks.
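The lab does not say how the upload is reduced to a 10x10 grid. One simple possibility, shown here only as an assumption, is nearest-neighbor sampling: split the image into a 10x10 set of regions and keep the pixel at the center of each. The helper sample_grid is hypothetical:

```python
def sample_grid(pixels, out_h=10, out_w=10):
    """Hypothetical nearest-neighbor sampler for a 2-D list of values.

    For each of the out_h x out_w output cells, keep the source pixel
    closest to the center of that cell's region.
    """
    h, w = len(pixels), len(pixels[0])
    return [
        [pixels[(2 * r + 1) * h // (2 * out_h)][(2 * c + 1) * w // (2 * out_w)]
         for c in range(out_w)]
        for r in range(out_h)
    ]


# A made-up 100x100 "image" whose value encodes its position: row * 100 + col.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
small = sample_grid(image)
print(len(small), len(small[0]))  # 10 10
print(small[0][0], small[9][9])   # 505 9595
```

Averaging each region instead of picking its center pixel is an equally plausible choice; it smooths noise at the cost of blurring edges.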