Machine Learning

##Table of Contents

- Summary
- Types of Machine Learning Problems
- Data Science Workflow

##Summary

Machine Learning means using algorithms that learn from data, usually by finding patterns, instead of following hand-written rules. We normally need to put the data in a usable format first, and then try different algorithms to see which one has the best predictive power.

##Types of Machine Learning Problems

Machine Learning problems can be described along two axes: Supervised or Unsupervised, and Continuous or Categorical. We’ll go over the definition of each first, then combine the categories to see what kinds of problems we can solve.

####Definitions

So what are these categories?

- Supervised learning means we have paired data: an input and its associated correct output. The data is labeled, meaning that for each input we know what the correct output should be. For example, we can pair an image of handwriting with the typed text it represents, or pair a person’s age (input) with that person’s height (output).
- Unsupervised learning means we try to find some type of hidden structure in a dataset. The data is unlabeled, meaning we have inputs but no correct outputs to compare against. For example, we might get images of some handwriting without knowing what the typed text is. Instead, our algorithms would try to find patterns (like shapes that look like an “e”) and group those together.
- Continuous (aka Quantitative) means numerical data. For example, a height of 172.5 cm. Counts such as 100 cookies are numerical too and are usually treated the same way.
- Categorical (aka Qualitative) means the data falls into categories. For example, pregnant or not pregnant, human/cat/dog, first/second/third.

####Problems Summary

- Supervised Continuous - make quantitative predictions (e.g. regression)
- Supervised Categorical - make qualitative predictions (e.g. classification)
- Unsupervised Continuous - extract quantitative structure (e.g. dimension reduction)
- Unsupervised Categorical - extract qualitative structure (e.g. clustering)

####Supervised Continuous

When we use supervised learning to predict a continuous variable, we call this regression. Let’s say we try to predict the length of a salmon (a continuous variable) based on its age and weight.
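To make the salmon example concrete, here is a minimal sketch of regression using ordinary least squares with a single input. The data is made up for illustration, the function name `fit_line` is hypothetical, and only age is used as an input (not weight) to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data (not real measurements): age in years -> length in cm.
ages = [1.0, 2.0, 3.0, 4.0]
lengths = [30.0, 50.0, 70.0, 90.0]

slope, intercept = fit_line(ages, lengths)

# Predict the length of a 5-year-old salmon from the fitted line.
predicted_length = slope * 5.0 + intercept
print(predicted_length)
```

The labeled pairs (age, length) are what make this supervised, and the numeric output is what makes it regression. In practice a library such as scikit-learn would handle several inputs at once.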

####Supervised Categorical

When we use supervised learning to predict a categorical variable, we call this classification. Let’s say we try to predict the gender of a person (a categorical variable) based on their age and height.
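As a rough illustration of classification, the sketch below uses a 1-nearest-neighbor rule: a new person gets the label of the most similar labeled example. This is just one simple classifier chosen for brevity. The data is made up, and `predict_nearest` is a hypothetical helper name.

```python
import math


def predict_nearest(train, query):
    """Return the label of the training example closest to `query`."""
    nearest = min(train, key=lambda item: math.dist(item[0], query))
    return nearest[1]


# Made-up toy data (not real measurements): (age in years, height in cm) -> label.
train = [
    ((25, 180), "male"),
    ((30, 178), "male"),
    ((27, 162), "female"),
    ((33, 165), "female"),
]

label = predict_nearest(train, (29, 164))
print(label)
```

The output is one of a fixed set of categories rather than a number, which is the difference from regression.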

####Unsupervised Continuous

A common unsupervised task on continuous variables is dimension reduction. We’re basically trying to take data with many numbers per example (dimensions) and squeeze it into a smaller set of dimensions (usually just two or three) while keeping as much of the structure as possible. This helps with data visualization.
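The smallest possible case of dimension reduction is going from two dimensions to one. The sketch below does this with principal component analysis (PCA), one common technique, by projecting each point onto the direction of greatest variance. It relies on the closed-form angle of the principal axis that exists for 2-D data, so it does not generalize as written. The data and the name `project_to_1d` are illustrative assumptions.

```python
import math


def project_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Closed-form angle of the direction of greatest variance (2-D only).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # One coordinate per point: its position along that direction.
    return [x * ux + y * uy for x, y in centered]


# Made-up toy data: points lying along the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
coords = project_to_1d(points)
print(coords)
```

Because these points sit exactly on a line, a single coordinate per point keeps all of the information. Real data loses some detail in the projection.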

####Unsupervised Categorical

When we use unsupervised learning to sort examples into categories that we don’t know in advance, we call this clustering. Let’s say we look at a lot of attributes of a flower (e.g. color, petal length). We can then find the natural groupings among the flowers.
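One widely used clustering algorithm is k-means, sketched below in plain Python. It is not the only option, and this version uses a naive initialization (the first `k` points) so that the result is deterministic. The flower measurements are made up, and only numeric attributes are used because k-means needs distances. An attribute like color would need to be encoded as numbers first.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means: return a cluster index for each point."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments


# Made-up toy data (not real measurements): (petal length, petal width) in cm.
flowers = [(1.4, 0.2), (4.7, 1.4), (1.3, 0.2), (4.5, 1.5), (1.5, 0.3), (4.9, 1.5)]
groups = k_means(flowers, k=2)
print(groups)
```

No labels are given to the algorithm. The group numbers it returns are arbitrary names for the groupings it found.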

##Data Science Workflow

So what’s the basic workflow?

1. Identify the problem
2. Collect data
3. Clean and reformat data
4. Build model
5. Evaluate model
6. Communicate results

The process is iterative. During Step 5 (evaluating your model), you may realize that the data you collected is not enough, so you move back to Step 2 (collecting more data).
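The iterative part of the workflow can be sketched as a loop over Steps 2 to 5. Everything here is a hypothetical stand-in: `run_workflow`, its parameters, and the toy lambdas are assumptions for illustration, and the "score" is just a placeholder number that grows as more data arrives.

```python
def run_workflow(collect, clean, build, evaluate, target_score, max_rounds=5):
    """Repeat Steps 2-5 until the model scores well enough or rounds run out."""
    data = []
    model, score = None, float("-inf")
    for round_number in range(1, max_rounds + 1):
        data.extend(clean(collect()))  # Steps 2 and 3: collect, then clean
        model = build(data)            # Step 4: build the model
        score = evaluate(model)        # Step 5: evaluate the model
        if score >= target_score:
            break                      # good enough: move on to Step 6
    return model, score, round_number


# Toy stand-ins for each step (hypothetical, for illustration only).
model, score, rounds = run_workflow(
    collect=lambda: [1.0, None, 2.0],                       # raw rows, one missing
    clean=lambda rows: [r for r in rows if r is not None],  # drop missing rows
    build=lambda data: {"n_samples": len(data)},            # "model" = a summary
    evaluate=lambda m: m["n_samples"] / 10,                 # placeholder score
    target_score=0.5,
)
print(rounds, score)
```

In a real project the evaluation step would score the model on data it was not trained on, and the decision to go back could land on any earlier step, not only data collection.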
