CNN training intuition

CNN Operations Lab - Channels, Padding and Normalization

Upload an image and follow the CNN pipeline from channels to padding, convolution, activation, pooling, and normalization. The goal is not to hide the math but to make every operation visible to students.


Pipeline: image to feature map
Channels: 3 (RGB) + grayscale
Grid: 10x10
Kernel: 3x3

Files are processed in the browser. Uploaded images are not sent to a server.


CNN Channels

RGB image = height x width x 3

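A color image has red, green, and blue channels. In a CNN, each filter has one kernel slice per input channel. It applies each slice to its own channel and sums the results into a single feature map, so different filters learn different combinations of the channels.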
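Below is a minimal plain-Python sketch of both ideas. The image is split into one matrix per channel, and one filter adds up its per-channel responses into a single number. It assumes the image is stored as rows of (R, G, B) tuples. The helper names `split_channels` and `multichannel_response` are hypothetical, and so are all the numbers, which are chosen only for illustration.

```python
def split_channels(image):
    """Turn an H x W x 3 image (rows of (R, G, B) tuples) into three H x W matrices."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]


def multichannel_response(patches, kernels):
    """Response of one filter at one location.

    patches[c] and kernels[c] are k x k matrices for channel c. The filter applies
    one kernel slice per channel and sums everything into a single number.
    """
    total = 0
    for patch, kernel in zip(patches, kernels):
        for patch_row, kernel_row in zip(patch, kernel):
            for pixel, weight in zip(patch_row, kernel_row):
                total += pixel * weight
    return total


# Hypothetical 2x2 RGB image (illustrative values only).
image = [
    [(10, 20, 30), (40, 50, 60)],
    [(70, 80, 90), (100, 110, 120)],
]
red, green, blue = split_channels(image)
print(red)  # [[10, 40], [70, 100]]

# Hypothetical 2x2 filter: one weight slice per channel.
kernels = [
    [[1, 0], [0, 0]],   # red slice: keeps the top-left red value
    [[0, 1], [0, 0]],   # green slice: keeps the top-right green value
    [[0, 0], [0, -1]],  # blue slice: subtracts the bottom-right blue value
]
print(multichannel_response([red, green, blue], kernels))  # 10 + 50 - 120 = -60
```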


CNN Padding

same output size: add border pixels

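Padding lets the kernel center on edge pixels, so edge information stays available to it. Without padding, a k x k kernel with stride 1 shrinks each spatial dimension by k - 1. A 3x3 kernel turns a 10x10 input into an 8x8 feature map, and the map keeps shrinking with every convolution.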
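The shrinking follows the usual output-size rule: output = floor((n + 2p - k) / s) + 1, for input size n, kernel size k, padding p, and stride s. Below is a small sketch of zero padding and that rule. The helper names `zero_pad` and `conv_output_size` are hypothetical, and the all-ones input grid is illustrative.

```python
def zero_pad(matrix, p=1):
    """Surround an H x W matrix with a border of p zeros on every side."""
    width = len(matrix[0]) + 2 * p
    blank = [0] * width
    padded = [list(blank) for _ in range(p)]          # top border rows
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend(list(blank) for _ in range(p))      # bottom border rows
    return padded


def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis for input n, kernel k, padding p, stride s."""
    return (n + 2 * p - k) // s + 1


grid = [[1] * 10 for _ in range(10)]  # hypothetical 10x10 input
padded = zero_pad(grid, p=1)
print(len(padded), len(padded[0]))       # 12 12
print(conv_output_size(10, 3, p=0))      # 8  -> no padding: the map shrinks
print(conv_output_size(10, 3, p=1))      # 10 -> a 1-pixel border keeps the size
print(conv_output_size(10, 2, p=0, s=2)) # 5  -> same rule for 2x2 pooling, stride 2
```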


CNN Normalization

z = (x - mean) / std

Normalization keeps values in a consistent range. This tends to make training of later layers more stable and helps keep gradients from becoming very large or very small.
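Here is a short sketch of the z-score formula using Python's `statistics` module. `z_score` is a hypothetical helper and the input list is illustrative. Using the population standard deviation (`pstdev`) is an assumption about which variant is meant.

```python
import statistics


def z_score(values):
    """Standardize values: z = (x - mean) / std, with the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


z = z_score([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # mean 5, std 2 -> [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Normalization layers such as batch normalization go further. They apply this per channel across a batch, add a small epsilon inside the square root for numerical safety, and follow it with a learned scale and shift. Those details are left out here.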

Channels

Image channels become matrices

Red channel: 10x10 matrix of pixel intensities (0-255)

Green channel: 10x10 matrix of pixel intensities (0-255)

Blue channel: 10x10 matrix of pixel intensities (0-255)

Grayscale input: 10x10 matrix computed from the three channels

Grayscale pixel = 0.299R + 0.587G + 0.114B (the ITU-R BT.601 luma weights). For the first pixel: 0.299(35) + 0.587(115) + 0.114(180) = 98.49, which rounds to 98.
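The same weighted sum can be checked in a few lines of Python. `to_gray` and `gray_matrix` are hypothetical helper names. Rounding with Python's `round()`, which sends exact halves to the even neighbor, is an assumption; the app may round differently.

```python
def to_gray(r, g, b):
    """Weighted sum of the three channels, using the weights quoted in this lab."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_matrix(red, green, blue):
    """Apply to_gray pixel by pixel to three H x W channel matrices, rounding to integers."""
    return [
        [round(to_gray(r, g, b)) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


first = to_gray(35, 115, 180)
print(round(first, 2))                        # 98.49
print(gray_matrix([[35]], [[115]], [[180]]))  # [[98]]
```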

Padding and convolution

Slide a kernel over the padded image

Zero-padded input: 12x12 (the 10x10 grayscale grid with a one-pixel border of zeros)

Edge detector kernel (3x3)

-1  -1  -1
-1   8  -1
-1  -1  -1

Convolution feature map: 10x10 matrix of raw sums (values can be negative)

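Output cell = sum(pixel x kernel weight) over the 3x3 window. Worked example at one location, the window centered on the pixel 193:

181 x -1 = -181
185 x -1 = -185
142 x -1 = -142
189 x -1 = -189
193 x 8 = 1544
197 x -1 = -197
155 x -1 = -155
206 x -1 = -206
210 x -1 = -210

Raw sum: 1544 - (181 + 185 + 142 + 189 + 197 + 155 + 206 + 210) = 1544 - 1465 = 79.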
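The worked example can be reproduced with a plain-Python sliding window. `window_sum` and `convolve2d` are hypothetical helper names. Like most CNN libraries, the sketch slides the kernel without flipping it (strictly a cross-correlation), which makes no difference for this symmetric edge detector. The 3x3 patch is the one listed above. To get the lab's same-size 10x10 map, the input would first be zero-padded by one pixel.

```python
KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def window_sum(patch, kernel):
    """Sum of pixel x weight over one k x k window."""
    return sum(p * w for p_row, k_row in zip(patch, kernel) for p, w in zip(p_row, k_row))


def convolve2d(image, kernel):
    """'Valid' sliding-window sum (no padding, stride 1, kernel not flipped)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [window_sum([row[j:j + k] for row in image[i:i + k]], kernel) for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]


patch = [
    [181, 185, 142],
    [189, 193, 197],
    [155, 206, 210],
]
print(window_sum(patch, KERNEL))                      # 79
print(convolve2d([[5] * 4 for _ in range(4)], KERNEL))  # [[0, 0], [0, 0]]
```

Because the weights sum to zero (8 - 8), a perfectly flat patch gives 0. Only changes in intensity produce a response, which is why this kernel acts as an edge detector.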

Activation

ReLU sets negative responses to zero

ReLU(x) = max(0, x)

After ReLU: 10x10 matrix (every negative value in the convolution map becomes 0)
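ReLU is a single `max` per cell. Below is a minimal sketch with a hypothetical `relu` helper applied to an illustrative 2x3 block of raw convolution sums.

```python
def relu(feature_map):
    """ReLU(x) = max(0, x), applied to every cell of a 2D feature map."""
    return [[max(0, x) for x in row] for row in feature_map]


# Hypothetical 2x3 block of raw convolution sums (illustrative values).
raw = [
    [455, -126, 79],
    [-32, 329, -264],
]
print(relu(raw))  # [[455, 0, 79], [0, 329, 0]]
```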

Pooling

Pooling compresses local regions

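2x2 max pooling: each output cell keeps the largest value in a non-overlapping 2x2 block (stride 2), so the 10x10 map becomes 5x5.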

Max pooled (5x5)

455  157  177  200   428
209  289  264  132   391
319  234  156  249   454
382  118  237  241   411
763  521  550  516  1112
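Here is a minimal sketch of 2x2 max pooling with stride 2 in plain Python. It assumes a feature map with even height and width, which holds for 10x10. `max_pool2x2` and the 4x4 input are hypothetical.

```python
def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]), 2)]
        for i in range(0, len(feature_map), 2)
    ]


example = [
    [1, 3, 0, 2],
    [4, 2, 1, 0],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
print(max_pool2x2(example))  # [[4, 2], [2, 8]]
```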

Normalization

Normalize the activation scale

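x_norm = (x - min) / (max - min)

This maps the smallest pooled value to 0 and the largest to 1. Here min = 118 and max = 1112, so 455 becomes (455 - 118) / 994 ≈ 0.34. The CNN Normalization overview uses the z-score form (x - mean) / std; this step uses min-max scaling instead, which keeps every value between 0 and 1.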

Min-max normalized (5x5)

0.34  0.04  0.06  0.08  0.31
0.09  0.17  0.15  0.01  0.27
0.20  0.12  0.04  0.13  0.34
0.27  0.00  0.12  0.12  0.29
0.65  0.41  0.43  0.40  1.00
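The rounded normalized values can be recomputed from the pooled values shown in this lab. `min_max` is a hypothetical helper. The guard for a zero range is an added assumption for constant maps, a case the formula itself does not cover.

```python
def min_max(matrix):
    """x_norm = (x - min) / (max - min), using the min and max of the whole matrix."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant map: avoid dividing by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / span for x in row] for row in matrix]


pooled = [
    [455, 157, 177, 200, 428],
    [209, 289, 264, 132, 391],
    [319, 234, 156, 249, 454],
    [382, 118, 237, 241, 411],
    [763, 521, 550, 516, 1112],
]
normalized = min_max(pooled)
for row in normalized:
    print([round(x, 2) for x in row])  # first row: [0.34, 0.04, 0.06, 0.08, 0.31]
```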

Training connection

In training, the CNN learns the kernel weights.

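This lab uses hand-picked filters so students can see each operation. In real CNN training, backpropagation computes how the loss changes with respect to every kernel weight. An optimizer such as gradient descent then updates many kernels at once, so the network learns features that reduce the loss for the task.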
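Here is a toy sketch of "learning the kernel weights". It is not how real frameworks are implemented. It starts from an all-zero 3x3 kernel and runs plain gradient descent on mean squared error until the kernel reproduces the lab's edge-detector outputs on a random, hypothetical 10x10 input. Because a convolution output is linear in the weights, the gradient can be written by hand here; frameworks such as PyTorch or TensorFlow obtain it automatically through backpropagation. The helper names, the learning rate of 0.5, and the 1000 steps are illustrative choices.

```python
import random


def conv_valid(image, kernel):
    """'Valid' sliding-window sum of a k x k kernel over a square image."""
    k, n = len(kernel), len(image)
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
         for j in range(n - k + 1)]
        for i in range(n - k + 1)
    ]


def mse(pred, target):
    """Mean squared error between two equally sized matrices."""
    diffs = [(p - t) ** 2 for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def mse_gradient(image, kernel, target):
    """dLoss/dw[a][b] = (2 / N) * sum over outputs of error * image[i + a][j + b]."""
    k = len(kernel)
    out = conv_valid(image, kernel)
    m = len(out)
    n_cells = m * m
    grad = [[0.0] * k for _ in range(k)]
    for i in range(m):
        for j in range(m):
            err = out[i][j] - target[i][j]
            for a in range(k):
                for b in range(k):
                    grad[a][b] += 2 * err * image[i + a][j + b] / n_cells
    return grad


random.seed(0)
# Hypothetical input: random values in [-1, 1] keep this toy problem well conditioned.
image = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(10)]
true_kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
target = conv_valid(image, true_kernel)  # outputs the learned kernel should match

kernel = [[0.0] * 3 for _ in range(3)]   # untrained kernel: all zeros
learning_rate = 0.5
initial_loss = mse(conv_valid(image, kernel), target)
for _ in range(1000):
    grad = mse_gradient(image, kernel, target)
    kernel = [[w - learning_rate * g for w, g in zip(w_row, g_row)]
              for w_row, g_row in zip(kernel, grad)]
final_loss = mse(conv_valid(image, kernel), target)

print(f"loss: {initial_loss:.2f} -> {final_loss:.2e}")
print([[round(w, 2) for w in row] for row in kernel])  # should approach true_kernel
```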


What this app is showing

The uploaded image is sampled into a small 10x10 grid so the math stays readable. Real CNNs use larger tensors, many channels, many learned filters, batches of images, nonlinear activations, normalization layers, and repeated blocks.