Recent developments in computer technology have led to another boom in artificial intelligence (AI). AI has been around since the 1950s, but only in recent years has it become practical at a large scale. Now there are many applications where computers can perform tasks automatically with little or no human input!
With all this talk about how advanced AI is becoming, one question often arises: just how powerful are these systems?
How much information do these machines need in order to learn what we want them to know? And once they have learned it, how much data do they need to make sure their model works properly?
These questions play a big part in helping us understand why some people get excited about AI while others remain skeptical.
In this article, we will discuss one type of deep learning – convolutional neural networks (CNNs) – and how they work by looking at different types of images. We will also explore how many example pictures (the “training set”) are needed to effectively train a CNN.
After reading this article, you should be able to determine an appropriate number of training images for your chosen CNN architecture.
Types of neural networks
There are several different types of neural network architectures that can be used to do deep learning. Some are better than others, depending on what you want your model to learn!
The most popular type is the convolutional neural network (ConvNet). ConvNets take pictures or videos as input and scan small sections of the picture/video with learned filters, repeating the process many times and at different scales.
These ConvNets typically have many layers, where each layer performs a specific task – for example, extracting features from the image or reducing the data’s dimensionality. (A minimal code sketch appears after the list below.)
By having lots of layers, all of these individual transformations combine into one complex process that loosely resembles how our brains process information – perceiving something, understanding it, and creating relationships between the things around us. This is why they are referred to as “neural”: they imitate, in a very simplified way, the neurons in our brains.
Here are some examples of tasks done with ConvNets:
Extracting features such as shapes, colors, textures, etc.
Classifying objects (e.g., dog vs. cat)
Recognizing people or animals
Finding patterns in images or sounds
Running computer programs using inputs such as pictures or spoken words
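To make the layer structure concrete, here is a minimal ConvNet sketch in PyTorch (the library used later in this article). The layer sizes and class count are arbitrary choices for illustration, not a recommended architecture:

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Each layer performs one specific task, as described above.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # extract low-level features (edges, textures)
            nn.ReLU(),
            nn.MaxPool2d(2),                               # shrink the image: a form of dimensionality reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # combine features into more complex patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # classify (e.g., dog vs. cat)

    def forward(self, x):
        x = self.features(x)       # scan small sections of the image with learned filters
        x = torch.flatten(x, 1)
        return self.classifier(x)  # one score per class

# Usage: a batch of four 32x32 RGB images -> four class-score vectors.
model = TinyConvNet(num_classes=10)
scores = model(torch.randn(4, 3, 32, 32))
print(scores.shape)  # torch.Size([4, 10])
```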
In this article we will focus only on the first two uses listed above: extracting features and classifying objects.
The number of images needed for neural networks
One side of the debate holds that you do not need very many examples to train your network. With the right approach – so-called one-shot or few-shot learning – even a single example can teach a network enough to recognize a new type of object or pattern.
In practice, though, this is what makes image recognition hard: almost every new picture contains something novel, and a network trained on only a handful of examples will struggle to generalize and tell us what those new pictures mean!
The other side of this debate asserts that the object categories people agree upon across all cultural backgrounds are exactly the ones that require large amounts of training data to learn reliably.
By having lots of examples, the system has more chances to find such generalizations about how humans organize knowledge. Therefore, it can learn better representations of these concepts.
Calculating the number of images needed
When it comes to investing in or using AI, one of the biggest questions is how many resources are needed to achieve good results.
In this article we will talk about how to determine the number of pictures you need for your deep learning experiments. This information can help save you money by not collecting or buying more data than you need!
Deep neural networks require lots of data to work properly. The more examples you have of any given concept, the better the network learns what that concept looks like.
By limiting yourself to only a certain number of examples, you may be creating your own limitations on what concepts the algorithm can learn.
There are two main factors when calculating how many examples are needed: the size of the dataset and the complexity of the model.
This article will focus mostly on the first factor: figuring out the size of the dataset. We will also do an extended experiment comparing different ways of determining the picture count.
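As a lightweight preview of that experiment, here is one common protocol: train on progressively larger slices of your data and watch where validation accuracy stops improving. For brevity, this sketch uses scikit-learn’s small digits dataset and a linear model rather than a CNN; the same procedure applies unchanged to CNNs and your own images:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Measure cross-validated accuracy at 10%, 32.5%, ..., 100% of the training data.
sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

# Where the curve flattens out is roughly "enough" data for this model.
for n, scores in zip(sizes, test_scores):
    print(f"{n:4d} training examples -> {scores.mean():.3f} validation accuracy")
```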
Transfer learning
In computer science, transfer learning is an approach to building new systems by taking pre-existing components and reusing them. This cuts down on development time because you do not have to start from scratch!
A good example of this is determining what animals are in a picture. A network that has looked at lots of pictures of dogs and cats has learned general visual features – edges, textures, shapes – that also help it identify other species such as elephants or lions.
By using this technique here, we can take VGG16 – a network with 16 weight layers and roughly 138 million parameters (the learned numbers that describe each layer of the network) – which has already been trained to recognize everyday objects, including cats and dogs. We then simply need to retrain its final layers to recognize our own classes, such as additional dog breeds!
This article will go into detail about how to implement transfer learning in PyTorch.
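As a preview, here is a minimal transfer-learning sketch in PyTorch using the standard torchvision API. The class count (5 dog breeds) is an assumption for illustration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load VGG16 with weights pre-trained on ImageNet.
weights = models.VGG16_Weights.DEFAULT
model = models.vgg16(weights=weights)

# Freeze the pre-trained convolutional layers so their weights are not updated.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classification layer with one sized for our own classes
# (5 dog breeds here, purely as an example).
num_classes = 5
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
# ...a standard training loop over your (much smaller) labeled dataset goes here.
```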
Applying deep learning to images
Recent developments in computer vision use large networks, pre-trained on lots of data, to apply advanced image processing techniques. These architectures are usually referred to as “deep neural networks” or “neural network models.”
The most famous example of this is probably Google’s now-famous GoogLeNet, which was trained to recognize the 1,000 object categories of the ImageNet challenge – everything from animals to fruits and vegetables. More recent examples include face recognition technology and object classification systems.
By stacking many layers of computation, these architectures can learn far more about images than simple threshold tests or edge detection can. For instance, they may be able to identify what kind of flower a given picture contains, judge whether a scene looks natural or not, or find patterns such as cars or buildings.
There are several benefits to this approach over other methods. First, using a pre-trained network does not require deep machine learning expertise: a few lines of high-level library code are enough. This makes it much easier to implement and potentially less expensive to develop.
Second, thanks to transfer learning, these networks typically do not need very many training samples to work effectively. On the order of a few thousand pictures per class is usually plenty, and even a few hundred will often produce good results!
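To illustrate how little code a pre-trained network needs, here is a hedged sketch that classifies a single image with torchvision’s pre-trained GoogLeNet. The file name "photo.jpg" is a placeholder for any RGB image:

```python
import torch
from PIL import Image
from torchvision.models import googlenet, GoogLeNet_Weights

weights = GoogLeNet_Weights.DEFAULT
model = googlenet(weights=weights).eval()
preprocess = weights.transforms()  # the resize/crop/normalize steps the model expects

image = Image.open("photo.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)          # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

top = probs.argmax(dim=1).item()
print(weights.meta["categories"][top], probs[0, top].item())
```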
Identifying and correcting bad photos
One of the most important steps when preparing your pictures is to correct or remove poor-quality images. This includes cluttered shots with too many people or objects, blurry shots, and clearly out-of-focus photographs.
Having too many people or objects in the photo makes the label ambiguous: the network cannot tell which thing in the frame the label refers to. Blurry photos may need re-cropping or resizing so that the sharp part of the image fills the frame. Out-of-focus photos may require software such as Photoshop or Lightroom to sharpen the foreground, middle ground, or background element – though it is often simpler to just exclude them from the training set.
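A widely used trick for flagging blurry or out-of-focus shots automatically is the variance of the Laplacian: sharp images have strong edges and score high, blurry ones score low. Here is a minimal sketch with OpenCV; the threshold is a rough assumption you should tune on your own data:

```python
import cv2

def is_blurry(path: str, threshold: float = 100.0) -> bool:
    # Sharp edges produce large Laplacian responses, so low variance means blur.
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    score = cv2.Laplacian(gray, cv2.CV_64F).var()
    return score < threshold

# Usage: filter blurry files out of a candidate training set (placeholder paths).
keep = [p for p in ["a.jpg", "b.jpg"] if not is_blurry(p)]
```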
Understanding and using brightness
One of the most important parts of taking good pictures is knowing how to use light effectively. This includes both natural light and artificial light, like studio lights or flash.
When shooting with nature as your source of light, you actually have less control over the exposure than when using artificial lighting. However, there are still things that you can do to make the picture better.
One of these is to understand how light interacts with color. Different colors absorb different amounts of light, depending on their composition. For example, white items like blankets reflect most of the light that hits them and absorb very little, while darker colored objects absorb much more of it.
This is what makes it possible to include bright whites in a photo: with enough light, you can create very vibrant, well-exposed photos.
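If you want to screen training images for exposure problems automatically, mean brightness is a crude but useful signal. A minimal sketch follows; the 0–255 cutoffs are assumptions you should tune for your dataset:

```python
import numpy as np
from PIL import Image

def exposure_label(path: str, low: float = 40.0, high: float = 215.0) -> str:
    # Average grayscale value: near 0 is very dark, near 255 is very bright.
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    mean = gray.mean()
    if mean < low:
        return "underexposed"
    if mean > high:
        return "overexposed"
    return "ok"

print(exposure_label("photo.jpg"))  # placeholder path
```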
Understanding and using contrast
The second important element to consider when choosing which pictures to use as inputs into your model is contrast. Contrast comes in two main forms: color contrast and intensity (or brightness) contrast.
Color contrast refers to the difference in hue or tone between objects or areas. An image that is almost all white with a little gray has low color contrast; an image that places strong colors like red, green, and yellow next to each other has high color contrast.
Intensity contrast is the difference in brightness between the light and dark regions of an image. A photo with deep shadows right next to bright highlights has high intensity contrast; a uniformly lit sheet of paper lying flat on a desk has almost none.
Both these concepts apply to training images as well! For instance, if we took a picture of a person’s face against a very light background that nearly matches their skin tone, its color contrast would be low, because almost nothing in the image distinguishes the skin from the background in hue or tone.
The way to fix this is to make the background darker so that you can see the texture of the skin, or to use a photo where the subject stands out clearly from the surroundings as the input to your AI. Either way gives higher color contrast!
Another option is to convert your pictures to black and white: with color removed, intensity contrast becomes the only signal, which makes differences in brightness easier to see and to measure.
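Intensity contrast is also easy to measure in code. A common proxy is RMS contrast – the standard deviation of the grayscale pixel values – shown in this minimal sketch (grayscale conversion discards color first, matching the black-and-white idea above):

```python
import numpy as np
from PIL import Image

def rms_contrast(path: str) -> float:
    # Standard deviation of normalized grayscale values:
    # ~0 for a flat image, larger for strong light/dark variation.
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
    return float(gray.std())

print(rms_contrast("photo.jpg"))  # placeholder path
```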