Convolutional neural networks

Python notebook: https://github.com/daviskregers/data-science-recap/blob/main/34-keras-convolutional-networks.ipynb

CNNs: what are they for?

  • When you have data that doesn't neatly align into columns
    • Images that you want to find features within
    • Machine translation
    • Sentence classification
    • Sentiment analysis
  • They can find features that aren't in a specific spot
    • Like a stop sign in a picture
    • Or words within a sentence
  • They are "feature-location invariant"

How do they work?

  • Inspired by the biology of the visual cortex
    • Local receptive fields are groups of neurons that only respond to a part of what your eyes see (subsampling)
    • They overlap each other to cover the entire field (convolutions)
    • They feed into higher layers that identify increasingly complex images
      • Some receptive fields identify horizontal lines, lines at different angles, etc. (filters; see the sketch after this list)
      • These would feed into a layer that identifies shapes
      • Which might feed into a layer that identifies objects
      • For color images, extra layers for red, green, and blue
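
To make the filter idea concrete, here is a minimal sketch (not from the linked notebook) of what a single convolution filter does: a small hand-built horizontal-edge kernel slides over a tiny synthetic image, and the response is strongest where the local pattern matches. The image, kernel values, and use of SciPy are illustrative assumptions.

    import numpy as np
    from scipy.signal import convolve2d

    # Tiny grayscale "image": dark top half, bright bottom half.
    image = np.zeros((6, 6))
    image[3:, :] = 1.0

    # Hand-built Sobel-style kernel that responds to horizontal edges.
    horizontal_edge_kernel = np.array([
        [-1, -2, -1],
        [ 0,  0,  0],
        [ 1,  2,  1],
    ])

    # Slide the kernel over the image; large-magnitude values appear
    # along the boundary where the dark rows meet the bright rows.
    response = convolve2d(image, horizontal_edge_kernel, mode="valid")
    print(response)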

How do we know that's a stop sign?

  • Individual local receptive fields scan the image looking for edges, and pick up the edges of the stop sign in a layer
  • Those edges in turn get picked up by a higher level convolution that identifies the stop sign's shape (and letters too)
  • This shape then gets matched against your pattern of what a stop sign looks like, also using the strong red signal coming from your red layers
  • That information keeps getting processed upward until your foot hits the brake
  • A CNN works the same way

CNNs with Keras

  • Source data must be of appropriate dimensions
    • i.e. width x length x color channels
  • Conv2D layer type does the actual convolution on a 2D image
    • Conv1D and Conv3D are also available - it doesn't have to be image data.
  • MaxPooling2D layers can be used to reduce a 2D layer down by taking the maximum value in a given block
  • Flatten layers will convert the 2D layer to a 1D layer for passing into a flat hidden layer of neurons
  • Typical usage: Conv2D -> MaxPooling2D -> Dropout -> Flatten -> Dense -> Dropout -> Softmax (see the sketch below)
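
A minimal sketch of that typical stack in Keras follows, assuming MNIST-style 28x28 grayscale inputs and 10 classes; the filter counts, dropout rates, and optimizer are illustrative choices rather than settings from the linked notebook.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    # Source data must be shaped (samples, width, length, color channels), e.g.
    # x_train = x_train.reshape(-1, 28, 28, 1) / 255.0 for grayscale MNIST.

    model = Sequential([
        Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D(pool_size=(2, 2)),   # downsample by keeping the max of each 2x2 block
        Dropout(0.25),
        Flatten(),                        # 2D feature maps -> 1D vector
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax'),  # class probabilities
    ])

    # Assumes one-hot labels; use sparse_categorical_crossentropy for integer labels.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()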

CNNs are hard

  • Very resource-intensive (CPU, GPU and RAM)
  • Lots of hyperparameters
    • Kernel sizes, many layers with different numbers of units, amount of pooling... in addition to the usual stuff like number of layers, choice of optimizer
  • Getting the training data is often the hardest part, as is storing and accessing it.

Specialized CNN architectures

  • Each defines a specific arrangement of layers, padding, and hyperparameters
  • LeNet-5
    • Good for handwriting recognition
  • AlexNet
    • Image classification, deeper than LeNet
  • GoogLeNet
    • Even deeper, but with better performance
    • Introduces inception modules (groups of convolution layers)
  • ResNet (Residual Network)
    • Even deeper - maintains performance via skip connections (see the sketch below).
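
As a rough illustration of what a skip connection looks like (the input shape and filter counts below are made-up assumptions), a residual block in the Keras functional API adds its input back onto the output of its convolutions, so gradients can flow around the layers and very deep stacks remain trainable.

    from tensorflow.keras import layers, Model, Input

    inputs = Input(shape=(32, 32, 64))                                    # illustrative shape
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
    x = layers.Conv2D(64, (3, 3), padding='same')(x)
    x = layers.Add()([x, inputs])                                         # the skip connection
    outputs = layers.Activation('relu')(x)

    residual_block = Model(inputs, outputs)
    residual_block.summary()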