# Introduction to Machine Learning

Machine Learning is one of the most talked-about and most powerful technologies in computer science today. It is an integral part of many applications we use every day, such as Google search, Google Maps, Netflix recommendations, spam email detection, virtual assistants like Alexa and Siri, and even medical diagnosis and healthcare. Very few domains remain untouched by this rapidly expanding field.

## What is Machine Learning?

> A machine is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
>
> – Tom Mitchell, *Machine Learning*, 1997

Machine Learning is a subset of Artificial Intelligence that develops systems which can automatically learn and improve from experience without being explicitly programmed. In simple words, it is the process of training a piece of software, called a model, to make useful predictions from a data set.

The basic premise of machine learning is to build models from example inputs to make data-driven predictions or decisions. The world is filled with an enormous amount of data today, and it continues to grow. This massive amount of data is useless unless we learn how to analyze it. Machine Learning helps us give meaning to this data: it converts the information in the data into knowledge.

## Major approaches in Machine Learning

The famous No Free Lunch theorem in machine learning states that no single algorithm works well for every task. Each task you try to solve has its own quirks; hence there are many algorithms and approaches to suit them.

Machine Learning is typically classified into three major paradigms – supervised, unsupervised and reinforcement learning. Although not often defined as a fourth class of machine learning, there is another approach called semi-supervised learning which represents a middle ground between supervised and unsupervised learning. These different types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of problems that they are intended to solve.

**Supervised Learning**

Supervised Learning is the most commonly used branch of machine learning, and typically the first that every new practitioner studies. In supervised learning you use labelled data, i.e. a data set whose examples have already been classified, to train a model. This data is called the training dataset. The trained model is then used to predict the classification of other, unlabelled data (also known as the test dataset). So, the basic goal is to find the mapping function that maps the input variable (X) to the output variable (Y) in the training data.

Y = *f*(X)

One practical example of a supervised learning problem is predicting house prices. For this, we first need data about the houses: square footage, number of rooms, whether a house has a garden or not, and other such features. We then need to know the housing prices, i.e. the target labels. By leveraging data from thousands of houses, we can train a supervised machine learning model to predict a new house's price based on the examples the model has observed.
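To make the mapping Y = f(X) concrete, here is a minimal sketch that fits a one-feature linear model, price = a · area + b, by ordinary least squares. The feature values and prices are invented for illustration, and a real model would of course use many features:

```python
# Toy dataset: (square footage, price) pairs -- illustrative numbers only.
data = [(1000, 200_000), (1500, 270_000), (2000, 340_000), (2500, 410_000)]

def fit_line(points):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / \
        sum((x - mean_x) ** 2 for x, _ in points)
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line(data)
predicted = a * 1800 + b   # prediction for an unseen 1800 sq ft house
```

The learned function f is just the pair (a, b); prediction for a new house is a single evaluation of that function.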

Datasets are generally split into training, validation, and testing datasets before use. Models generally perform best on the data they were trained on; being able to adapt to new inputs and make good predictions is the crucial generalisation part of machine learning. Over-training a model often leaves it unable to adapt to new, previously unseen data, which is referred to as overfitting.
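A split like the one described can be sketched as a shuffled partition. The 80/10/10 ratio below is just a common convention, and the fixed seed is only there to make the example reproducible:

```python
import random

def split_dataset(examples, train=0.8, val=0.1, seed=0):
    """Shuffle and partition examples into train/validation/test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
```

Shuffling before splitting matters: if the data is ordered (say, by date or by label), an unshuffled split would give the model a systematically biased view of the problem.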

#### Methods of supervised learning – Classification and Regression

The output or target feature of a supervised machine learning model can be a number, in which case it is known as a **regression** model, or a category, in which case it is called a **classification** model. The housing price example discussed above is a regression problem. A simple, widely used classification task is email spam detection based on spam/non-spam email examples.

**Unsupervised Learning**

Today’s machine learning applications need a lot of labelled data to have good performance, but most of the world’s data is not labelled. For machine learning to advance, algorithms will need to learn from unlabelled data and make sense of the world from pure observation, much like how children learn to operate in the real world after birth without too much guidance.

> Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake.
>
> – Yann LeCun

Unsupervised learning is about extracting underlying structure and detecting patterns in an unlabelled dataset, without any reference to labelled outcomes or predictions. These algorithms are called unsupervised because the patterns that may or may not exist in the dataset are not informed by a target and are left to be discovered by the algorithm. Unlike supervised learning, here only the input data is provided; we lack access to labelled examples.

When presented with data, for example a collection of images, an unsupervised algorithm will search for similarities between the examples and separate them into groups, attaching its own labels to each group. This kind of behaviour is very useful for segmenting customers, as it can separate the data into groups without the bias a human might introduce through pre-existing knowledge about the customers.

#### Methods of Unsupervised Learning

Some of the main methods used in unsupervised learning are **cluster analysis**, **association rules** and **dimensionality reduction**.

**Cluster analysis**, also known as data segmentation, involves grouping or segmenting a collection of objects into subsets or "clusters", so that objects within the same cluster are more similar to each other than to those in different clusters. Hierarchical variants go further and successively group the clusters themselves, so that a whole hierarchy of nested groupings is produced.
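One widely used clustering algorithm is k-means. Below is a minimal one-dimensional sketch on made-up numbers; the initial centroids are taken at evenly spaced positions in the sorted data purely to keep the example deterministic:

```python
def kmeans_1d(values, k, iters=20):
    """Minimal Lloyd's algorithm on scalar data: assign each point to its
    nearest centroid, then recompute each centroid as its cluster mean."""
    centroids = sorted(values)[::len(values) // k][:k]  # spread-out initial guesses
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

values = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]   # two obvious groups
centroids, clusters = kmeans_1d(values, k=2)
```

No labels are provided anywhere: the two groups, and the centroids near 1.0 and 10.0, emerge purely from the structure of the data.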

**Association rule** mining finds interesting associations and relationships among large sets of data items. A rule shows how frequently an itemset occurs across a set of transactions. A typical example is Market Basket Analysis.
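The two basic measures behind association rules, support and confidence, can be computed directly. The transaction data below is invented for illustration:

```python
# Each transaction is the set of items bought together (toy market-basket data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears among transactions with the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

conf = confidence({"bread"}, {"milk"})   # how often bread-buyers also buy milk
```

A rule such as {bread} → {milk} is considered interesting when both its support and its confidence exceed chosen thresholds; algorithms like Apriori search the space of itemsets efficiently instead of enumerating them all.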

Large numbers of input features often make a predictive modelling task more challenging; this is often referred to as the curse of dimensionality. **Dimensionality reduction** techniques reduce the number of input variables in a dataset to simplify a classification or regression problem.
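As a simple illustration of the idea, here is a variance filter, one of the most basic reduction techniques (far simpler than methods like PCA): feature columns whose values barely vary carry little information and can be dropped. The data and threshold are made up for the example:

```python
def drop_low_variance(rows, threshold=0.01):
    """Keep only the feature columns whose variance exceeds the threshold."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows]

# The second column is constant, so it carries no information and is removed.
reduced = drop_low_variance([[1.0, 5.0, 2.0], [2.0, 5.0, 1.0], [3.0, 5.0, 4.0]])
```

More powerful techniques such as PCA go further and build new, fewer features as combinations of the originals rather than merely discarding columns.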

**Semi-supervised Learning**

The most inherent disadvantage of any supervised learning algorithm is that the dataset needs to be hand-labelled, either by a Machine Learning Engineer or a Data Scientist. This is a very costly process, especially when dealing with large volumes of data. On the other hand, the main disadvantage of unsupervised learning is that its range of applications is limited. To counter these disadvantages, the concept of semi-supervised learning was introduced.

In semi-supervised learning, the algorithm trains on a combination of **a small amount of labelled data and a large amount of unlabelled data**. The basic idea is to first cluster the similar data using an unsupervised learning algorithm and then use the existing labelled data to label the rest of the unlabelled data.
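The label-propagation idea can be sketched in a deliberately simplified form: give each unlabelled point the label of its nearest labelled neighbour. The one-dimensional data and label names are invented, and a real system would typically cluster first, as described above:

```python
labelled = [(1.0, "cheap"), (10.0, "expensive")]   # a few hand-labelled points
unlabelled = [1.5, 9.0, 0.5, 10.5]                 # many unlabelled points

def pseudo_label(x):
    """Assign an unlabelled point the label of its nearest labelled point."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - x))
    return label

pseudo = [(x, pseudo_label(x)) for x in unlabelled]
```

The pseudo-labelled points can then be fed back into an ordinary supervised learner, which is how a small labelling effort gets stretched across a much larger dataset.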

Semi-supervised learning models are becoming widely applicable across a large variety of industries. **Text document classifiers** are a classic example. Semi-supervised learning is ideal here because it would be nearly impossible to obtain a large amount of labelled text documents: it is simply not time-efficient to have a person read through entire documents just to assign them a simple classification. So, the algorithm learns from a small amount of labelled text documents while still classifying the large amount of unlabelled documents in the training data.

**Reinforcement Learning**

Reinforcement Learning is one of the hottest research topics currently. This machine learning method is quite different from both supervised and unsupervised learning. It doesn’t use labels and instead uses rewards to learn. Reinforcement learning is a type of machine learning technique that enables an agent to learn by directly interacting with its environment through trial and error using feedback from its own actions and experiences.

To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. The learner is not told which actions to take, but instead it must discover which actions yield the maximum reward by trying them.

Markov Decision Processes (MDPs) are the mathematical framework typically used to describe an environment in reinforcement learning. An MDP consists of a finite set of environment states S, a set of possible actions A(s) in each state, a real-valued reward function R(s) and a transition model P(s′ | s, a). However, real-world environments often lack any prior knowledge of these dynamics. In such cases, model-free RL methods like Q-learning and SARSA (State-Action-Reward-State-Action) are used.
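A minimal model-free sketch of the above: tabular Q-learning on a toy three-state chain. The environment, rewards, and hyperparameters are all invented for illustration; the agent learns purely from trial and error that moving right toward the terminal state pays off:

```python
import random

# Toy chain: states 0, 1, 2; actions 0 = left, 1 = right.
# Reaching the terminal state 2 yields reward 1, everything else reward 0.
N_STATES, GOAL = 3, 2
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] table
rng = random.Random(0)                      # fixed seed for reproducibility
for _ in range(500):                        # episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the current Q-values, sometimes explore.
        if rng.random() < EPSILON:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s2, a').
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the greedy policy at both non-terminal states is "right", and the learned values approach 1.0 at state 1 and gamma times that at state 0, reflecting the discounted reward structure.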

One of the most incredible things about reinforcement learning is that it brings machine learning closer to how we humans learn. Throughout our lives, interactions are undoubtedly a major source of knowledge about the environment and ourselves. Whether it be learning to drive a car or to hold a conversation, we are aware of how our environment responds to what we do, and seek to influence what happens through our behaviour.

Games are very popular in RL research. DeepMind's AlphaGo Zero was able to learn the game of Go, often described as one of the most complex board games in human history, from scratch using reinforcement learning, ultimately achieving superhuman performance. It learned by playing against itself. After 40 days of self-training, AlphaGo Zero was able to outperform the version of AlphaGo known as *Master* that had defeated the world number one, Ke Jie.

Training models that control autonomous cars is another excellent example of a potential application of reinforcement learning. Apart from this, RL also has applications in industry automation, trading and finance, natural language processing (NLP), healthcare, and recommendation systems.

**Mathematics and Machine Learning**

For all the operations we perform on data, there is one common foundation that makes them computable: Mathematics. Despite the immense possibilities of machine and deep learning, a thorough mathematical understanding of many of these techniques is necessary to grasp the inner workings of the algorithms and to get good results. Mathematics defines the concepts underlying the algorithms and tells us which one is better and why.

The major topics of Mathematics widely used in ML are Linear Algebra, Statistics, Probability Theory and Calculus.

Linear Algebra is a key foundation of machine learning, since data and model parameters are typically represented as vectors and matrices. Machine learning algorithms such as linear regression, logistic regression, SVMs and decision trees build on linear algebra. Calculus deals with optimizing the performance of machine learning algorithms. Statistics plays an important role because machine learning deals with large amounts of data; it provides the methods for collecting, presenting, analyzing and interpreting numerical data.
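As a tiny illustration of how calculus drives optimisation, gradient descent on the arbitrary function f(w) = (w - 3)² repeatedly steps against the derivative f′(w) = 2(w - 3); the function, learning rate and step count here are chosen only for the example:

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimise f(w) = (w - 3)**2 by repeatedly following the negative gradient."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)**2
        w -= lr * grad       # step downhill, scaled by the learning rate
    return w

w_opt = gradient_descent()   # converges toward the minimiser w = 3
```

Training a neural network follows exactly this pattern, only with millions of parameters and gradients computed by the chain rule (backpropagation) instead of by hand.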

Probability forms the basis of sampling. In machine learning, uncertainty can arise in many ways, for example as noise in the data, and probability provides a set of tools to model this uncertainty. Probability also forms the basis of specific algorithms such as the Naive Bayes classifier. Estimation techniques like maximum likelihood estimation (MLE) are grounded in probability theory; MLE is used for training models such as linear regression, logistic regression and artificial neural networks. Apart from these, model evaluation techniques require us to summarize the performance of a model based on predicted probabilities.
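As a small worked example of MLE, the maximum likelihood estimate of a coin's heads probability from observed flips is simply the sample frequency; the check at the end confirms that this estimate scores a higher likelihood than a rival value. The flip data is made up:

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 1]   # 1 = heads; toy observations

def log_likelihood(p, data):
    """Log-likelihood of Bernoulli data under heads-probability p."""
    return sum(math.log(p) if x else math.log(1 - p) for x in data)

# For a Bernoulli(p) model, the likelihood p^heads * (1-p)^tails is
# maximised at the sample frequency.
p_hat = sum(flips) / len(flips)
```

The same counting-and-maximising principle, applied to richer models, is what fits the parameters of Naive Bayes and logistic regression.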

**Conclusion**

This article is a brief introduction to a few basic, widely used machine learning approaches. Another very important subfield of machine learning, not discussed here, is Deep Learning, which consists of machine learning techniques based on Artificial Neural Networks. The introduction and advancement of deep learning has driven a revolution in the applications of machine learning in recent years.