CapsNet (short for Capsule Network) is an artificial neural network that was proposed in 2017 by Geoffrey Hinton, Sara Sabour and Nicholas Frosst. It is a deep learning architecture used mainly for image classification. CapsNet differs from traditional convolutional neural networks (CNNs) in that it explicitly models the relationships between the parts of an object, such as the relative positions of the eyes, nose and mouth in a face.
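A capsule is a small group of neurons that is treated as a single unit. Instead of each neuron contributing an independent scalar activation, the neurons in a capsule jointly produce a vector that describes one entity. By relating the outputs of capsules to one another, Capsule Networks are able to model the relationships between the various parts and entities in an image, e.g. the features that make up a face in a photograph.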
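To make the idea of a capsule as a group of neurons more concrete, here is a minimal Python sketch. It assumes that a capsule's output is a vector whose length signals how likely the entity is to be present and whose direction encodes the entity's properties, and it uses the "squash" non-linearity described in the 2017 paper. The function names, the four-element vectors and the `eps` parameter are illustrative choices, not something taken from the text above.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction.

    Implements v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    eps is an assumed small constant that avoids division by zero.
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector, read here as 'presence' of the entity."""
    return math.sqrt(sum(x * x for x in v))


# Two hypothetical raw capsule outputs, each made of four neurons.
weak = squash([0.1, 0.0, 0.0, 0.0])    # short input: entity probably absent
strong = squash([3.0, 4.0, 0.0, 0.0])  # long input: entity probably present

print(length(weak), length(strong))
```

Short input vectors are pushed towards length 0 and long ones towards length 1, so the length can be read much like a probability, while the ratio between the components (the direction) is left unchanged.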
Capsule networks operate at two levels: a part level and an object level. At the part level, the neurons of each capsule represent various aspects of the part it detects, such as shape, size, orientation, velocity and deformation. At the object level, each capsule captures the interactions between the parts, which allows the network to reason about the object as a whole. This reasoning covers properties such as whether the object is horizontal or vertical, and its three-dimensional orientation.
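The step from part level to object level can be sketched with the "routing by agreement" procedure from the 2017 paper. The Python below is a simplified illustration and not a full CapsNet: it assumes the part capsules' predictions for each object capsule (`u_hat`) are already given, whereas a real network would compute them from learned weight matrices. The toy numbers, the two-dimensional vectors and all function names are hypothetical.

```python
import math


def squash(s, eps=1e-9):
    """Shrink s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def length(v):
    return math.sqrt(sum(x * x for x in v))


def route(u_hat, iterations=3):
    """u_hat[i][j] is part capsule i's predicted vector for object capsule j."""
    n_parts, n_objects, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_objects for _ in range(n_parts)]  # routing logits
    for _ in range(iterations):
        # Each part distributes its vote across the object capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_objects):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_parts))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Strengthen routes whose prediction agrees with the object's output.
        for i in range(n_parts):
            for j in range(n_objects):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Three hypothetical part capsules (eye, nose, mouth) and two object capsules.
# All parts agree on their prediction for object 0 and disagree for object 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v, c = route(u_hat)
print([round(length(v_j), 3) for v_j in v])
```

Because the three parts make the same prediction for object 0, its output vector grows long and the coupling coefficients shift towards it. The conflicting predictions for object 1 largely cancel out, which leaves that capsule with a short output vector.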
Compared with CNNs, Capsule Networks offer better spatial understanding because they explicitly represent the relationships between the parts of an object. They have also been applied to tasks such as stereo matching and image segmentation.