Neural networks are one of the hottest topics in computer science at the moment. They have seen dramatic growth thanks to their incredible performance across many domains, including image recognition, language processing, and even games.

The ideas behind neural networks go back to the 1940s, when Warren McCulloch and Walter Pitts proposed a simple mathematical model of the neuron, but it wasn't until backpropagation was popularized in the 1980s, and computing power caught up in the decades that followed, that they became widely applicable and interesting. Since then, there has been an explosion of research around these architectures, with new ways to train them, test hypotheses about how they work, and apply them to new problems.

In this article we will take a look at some fundamental concepts related to neural networks, as well as some easy-to-follow tutorials covering different algorithms such as convolutional neural nets (ConvNets), recurrent neural nets (RNNs) like long short-term memory (LSTM) or gated units, and other more advanced techniques.

By the end you will know what all of these methods do, how to implement them in Python, and why they matter. Along the way, you will also learn a little about the mathematics behind deep learning, which can sometimes be tricky. But don't worry, nothing too difficult!

We will start off looking at feedforward neural networks, which are probably the most common type of NN used today for applications such as image classification and speech processing.

## The K-Nearest Neighbor algorithm

In machine learning, the k-nearest neighbor (kNN) algorithm is a classification method that determines an object's class by looking at which objects in your dataset are most similar to it.

It was made famous through its use in image recognition, where computers can identify what kind of picture you uploaded based on its similarity to other pictures of the same thing.

A lot of the time this works very well, because similarity gives us a starting point for our prediction. For example, if we were trying to classify dogs from photos, then comparing a new dog photo to many photos of different breeds gives a good indication of which breed it is.

However, while kNN does have lots of uses, there are two major drawbacks when using it as a classification technique.

First, the accuracy of the model drops dramatically as the number of features, or descriptors, used to describe each instance grows. This is the well-known curse of dimensionality: in a high-dimensional space every point ends up far from every other point, so the "nearest" neighbors stop being meaningfully near. That matters because kNN relies entirely on those distances to make its predictions.

Second, because kNN keeps the entire training set around and compares every new query against it, prediction becomes slower and more memory-hungry as the dataset grows.
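Despite those drawbacks, the core algorithm is short enough to write from scratch. Here is a minimal pure-Python sketch; the toy dataset, labels, and the `knn_predict` helper name are all invented for illustration:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the class labels of the k closest examples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy dataset: (feature vector, class label).
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
         ((4.0, 4.0), "dog"), ((4.2, 3.9), "dog"), ((3.8, 4.1), "dog")]

print(knn_predict(train, (4.0, 3.8)))  # the 3 nearest neighbours are all dogs
```

Note how the whole training set is scanned for every prediction: that is exactly the efficiency drawback described above.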

## The Linear Regression algorithm

One of the most fundamental algorithms in machine learning is linear regression. This algorithm predicts a numerical value based on other variables it is given as input.

The term "linear" comes from the way the equation relates the independent variable to the dependent one. In our case, we are predicting one number from another, and the relationship between them is modeled as a straight line.

In mathematical terms, this means that if you take the independent value, multiply it by the coefficient (the slope), and then add the intercept, the result will be the predicted value: y = mx + b.

The coefficients for the linear regression algorithm can be determined in more than one way: ordinary least squares has a simple closed-form solution, while iterative methods such as gradient descent are preferred when datasets get very large. We will focus on least squares here.

### How to do linear regression with least squares

We will start off simple with the very basics of doing linear regression using the method of least squares. For this technique, you just need a tool that can fit a straight line to data.

There are many free software packages that include linear regression; popular options are Python's scikit-learn and statsmodels libraries, the R language, and spreadsheet programs such as Excel and Google Sheets.

These tools usually have a separate function or menu item to determine the slope and intercept of the line. In equations these are often written as m and b, as in y = mx + b.

The slope describes how much the dependent variable changes per unit increase in the independent variable, and the intercept is the predicted value when the independent variable is zero.
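The slope and intercept can also be computed directly from the classic least-squares formulas. This small pure-Python sketch (the `least_squares` helper is our own naming) shows how:

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b

# Points lying exactly on y = 2x + 1.
slope, intercept = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```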

## The Decision Tree algorithm

Another very well-known classification technique is called the decision tree (or DT) algorithm. A decision tree works by breaking an example set down into parts that are as pure as possible, ideally containing either all instances of the class or none.

It chooses one feature as its test, splits the dataset along that feature, and repeats on each part until every split contains only examples of a single class (or some stopping condition is reached).

The final result is a tree of feature tests that, followed from root to leaf, determines whether something is an instance of the class or not.

This method has some significant advantages over other algorithms. For one, it makes no assumptions about how much information each feature holds: if a feature doesn't help, the tree simply never splits on it. This also means that noisy features don't hurt the model too badly, since they can be disregarded when making predictions.

However, like most other machine learning algorithms, accuracy suffers slightly in the presence of highly correlated features. Features that carry the same information will sometimes be weighted less than ones that carry distinct information, but overall the method does a good job of selecting important variables.

Interpretability is another advantage: because a decision tree is just a sequence of simple tests, the path that led to any prediction can be read off directly.
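To see the algorithm in practice, here is a decision tree fitted with scikit-learn; the toy features and labels are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy dataset: [height_cm, weight_kg] -> 0 = cat, 1 = dog (invented numbers).
X = [[25, 4.0], [30, 5.0], [28, 4.5], [60, 25.0], [55, 22.0], [65, 30.0]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Two unseen animals: a small one and a large one.
predictions = tree.predict([[27, 4.2], [58, 24.0]])
print(predictions)  # [0 1]
```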

You can read more about the decision tree algorithm here: https://medium.

## The Naive Bayes algorithm

A common pattern in machine learning is using an underlying theory as the basis for a new algorithm. In this case, that theory is probability. If you've taken any statistics courses before, then you may be more familiar with it than you think!

In probability, Bayes' theorem tells us how to update our belief in an event when we learn something related to it: the probability of A given evidence B equals the probability of B given A, multiplied by the prior probability of A, divided by the overall probability of B.

For instance, say there's a one-in-three chance of rain tomorrow (about 33%). If you then notice that the sky is overcast, and you know clouds are far more common on rainy days than on dry ones, Bayes' theorem lets you combine those facts into a sharper estimate of the chance of rain.

The "naive" part of Naive Bayes is an extra assumption: when there are several pieces of evidence (several features), the algorithm treats them as independent of one another given the class. That assumption is rarely exactly true, but it keeps the computation simple and works surprisingly well in practice.
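Here is a rain example of Bayes' theorem worked through in plain Python; the specific probabilities are invented for illustration:

```python
# Bayes' theorem: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds).
p_rain = 1 / 3               # prior: rain on one day in three
p_clouds_given_rain = 0.9    # clouds on 90% of rainy days (invented)
p_clouds_given_dry = 0.4     # clouds on 40% of dry days (invented)

# Total probability of seeing clouds at all.
p_clouds = (p_clouds_given_rain * p_rain
            + p_clouds_given_dry * (1 - p_rain))

p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(round(p_rain_given_clouds, 3))  # 0.529: the clouds raised our estimate
```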

## The Random Forest algorithm

One of the most popular algorithms in machine learning is the random forest algorithm. It was proposed in 2001 by Leo Breiman, building on his earlier idea of "bagging" (bootstrap aggregating). It works best when there are many features or characteristics available to determine whether an object is positive (part of the class) or negative (not part of the class).

The key idea is to train many decision trees, each on a random bootstrap sample of the training data and each considering only a random subset of the features at every split, and then combine the individual tree predictions into one final result by majority vote.

By doing this, you get a more accurate overall prediction: any single tree may be a poor predictor on its own, but the trees make different mistakes, and those errors tend to cancel out when many diverse trees vote.

Random forests are related to other ensemble methods such as gradient boosted trees. Where boosting builds its trees sequentially, each one correcting the errors of the last, the trees in a random forest are independent of one another and can be trained in parallel. This makes them very efficient to use!

This article will go over how to implement the random forest algorithm in Python. For more practical applications of the method, check out our free course on Machine Learning Using SciPy and NumPy! Let’s dive in.
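As a first taste, here is a minimal scikit-learn sketch; the one-feature toy dataset is invented for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: a single feature, perfectly separable around x = 5.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 25 trees, each trained on a bootstrap sample of the rows.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

predictions = forest.predict([[2.5], [8.5]])
print(predictions)  # [0 1]
```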

## The Support Vector Machine algorithm

A support vector machine (SVM) is an example of a supervised learning algorithm. Supervised means the algorithm learns from labeled examples, that is, inputs paired with the correct answers, rather than having to discover structure on its own.

In an SVM, we keep track of two kinds of variables. The features (the independent variables) describe each example, and the label (the dependent variable) is the class we want to predict from those features.

In binary classification there are only two possible labels, "yes" and "no," which can be represented by the numbers 1 and 0. The SVM learns a boundary, called a hyperplane, that separates the two classes while leaving the widest possible margin between them; the training points closest to that boundary are the "support vectors" that give the method its name.

With a trained SVM, you classify something new by checking which side of the boundary its features place it on.

As with any type of machine learning, experimenting with different configurations is important so that you can choose the best one for your problem.
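A minimal scikit-learn sketch of the idea, on an invented two-cluster dataset:

```python
from sklearn.svm import SVC

# Toy dataset: two well-separated clusters in 2-D (invented points).
X = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5],
     [5.0, 5.0], [6.0, 5.0], [5.5, 6.0]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel learns a straight separating boundary.
clf = SVC(kernel="linear")
clf.fit(X, y)

predictions = clf.predict([[0.2, 0.4], [5.8, 5.2]])
print(predictions)  # [0 1]
print(len(clf.support_vectors_))  # only a few points define the boundary
```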

## The Adaboost algorithm

The next machine learning algorithm we will look at is called AdaBoost. AdaBoost was originally designed for binary classification (e.g., determining whether or not someone has a disease), but it can also be extended to multi-class problems.

AdaBoost works by training a sequence of weak classifiers, often decision "stumps" (trees with a single split). After each round, the training examples the current ensemble gets wrong are given more weight, so the next weak classifier concentrates on the hard cases. This process is repeated until the ensemble's accuracy is good enough or a fixed number of rounds is reached.

The key difference between this method and other supervised machine learning algorithms like Naive Bayes and logistic regression is that the model is built incrementally: each new weak classifier is chosen to correct the mistakes of the ones before it, and the final prediction is a weighted vote of all the weak classifiers together.

Because the example weights adapt from round to round, AdaBoost is called an adaptive learner; that is where the "Ada" in its name comes from. The model it ends up with depends heavily on which examples it found hard along the way!

There are several ways to implement AdaBoost in Python. One easy way is through the scikit-learn library. For more information about this algorithm, check out our previous article here.
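For example, here is a minimal scikit-learn sketch; the toy data is invented for illustration:

```python
from sklearn.ensemble import AdaBoostClassifier

# Toy dataset: a single feature, separable around x = 5.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 20 boosting rounds; each round reweights the examples
# the ensemble currently gets wrong.
boost = AdaBoostClassifier(n_estimators=20, random_state=0)
boost.fit(X, y)

predictions = boost.predict([[2], [8]])
print(predictions)  # [0 1]
```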

## The Deep Neural Network algorithm

Recent developments in deep learning have ushered in another era of powerful machine learning algorithms that can perform advanced tasks. These techniques are called neural networks after pioneering work by neuroscientists who studied how neurons in our brains process information.

Neural networks are loosely inspired by this natural pattern of processing, but instead of operating on hand-crafted rules over discrete symbols, they work with continuous numerical values (think shades and colors rather than just black and white). This allows them to ingest more complex data such as pictures or spoken words, which were difficult for previous methods.

By having several layers of nodes connected together, the network is able to take advantage of the relationships between different parts of the input it receives to produce an output.
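To make the layered picture concrete, here is a tiny NumPy sketch of a single forward pass through a two-layer network; the weights are random placeholders rather than trained values:

```python
import numpy as np

def relu(z):
    """Common nonlinearity: pass positives through, clamp negatives to zero."""
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# Layer shapes: 3 input features -> 4 hidden units -> 1 output.
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)

x = np.array([0.5, -1.0, 2.0])      # one input example
hidden = relu(x @ W1 + b1)          # layer 1: linear map + nonlinearity
output = hidden @ W2 + b2           # layer 2: linear readout

print(hidden.shape, output.shape)   # (4,) (1,)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` by backpropagation; here they stay fixed so the example is deterministic.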

Because these networks learn from experience, they can be taught to do things that no one explicitly programmed into them. The "deep" in deep learning refers to the many stacked layers in modern networks. Some examples of deep learning in action include voice recognition software, computer vision applications like self-driving cars and robots, and computational linguistics models used to analyze text.