William Liu

Machine Learning


##Summary

Machine Learning is about building algorithms that learn from data, usually by finding patterns, rather than being explicitly programmed. We normally need to put the data in a usable format first, and then apply different algorithms to see which one has the best predictive power.

##Types of Machine Learning Problems

Machine Learning problems fall into a few categories. A problem is either Supervised or Unsupervised, and the variable we care about is either Continuous or Categorical. We’ll go over the definitions of each first, then combine the categories to see what kinds of problems we can solve.

####Definitions

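So what are these categories?

Supervised means the training data includes the answer (a label or target value) that we want to predict. Unsupervised means there are no answers in the data, so the algorithm has to find structure on its own. A continuous variable is a number that can take any value in a range (e.g. length, weight). A categorical variable takes one of a fixed set of values (e.g. color, species).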

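####Problems Summary

Combining the two pairs of categories gives four kinds of problems: regression (supervised, continuous), classification (supervised, categorical), dimension reduction (unsupervised, continuous), and clustering (unsupervised, categorical).
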
####Supervised Continuous

Supervised learning where the target is a continuous variable is called regression. For example, we might try to predict the length of a salmon (a continuous variable) based on its age and weight.
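
To make the salmon example concrete, here is a minimal sketch of a least-squares linear regression in plain Python. The data is made up (it was generated from length = 10 + 5 * age + 2 * weight so the fit is easy to check), and the helper names `solve`, `fit_linear` and `predict` are only for illustration. A real project would more likely reach for a library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown up to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_linear(features, targets):
    """Least-squares fit of target = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, feature):
    """Apply the fitted coefficients to one new observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature))


# Made-up salmon: (age in years, weight in kg) and the matching length in cm.
salmon = [(1, 0.5), (2, 1.5), (3, 2.0), (4, 4.0), (5, 4.5)]
lengths = [16.0, 23.0, 29.0, 38.0, 44.0]

coefs = fit_linear(salmon, lengths)   # [intercept, per-year, per-kg]
estimate = predict(coefs, (3, 3.0))   # a continuous output, not a category
```

The point to notice is that the prediction is a number on a continuous scale, which is what makes this a regression problem.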

####Supervised Categorical

Supervised learning where the target is a categorical variable is called classification. For example, we might try to predict the gender of a person (a categorical variable) based on their age and height.
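
As a rough sketch of classification, the snippet below uses k-nearest neighbors, which is just one of many possible classifiers and is chosen here only because it fits in a few lines. The people are made up, `knn_predict` is an illustrative name, and the features are left unscaled for simplicity (in practice you would usually rescale age and height first).

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up people: (age in years, height in cm) paired with a known label.
people = [
    ((25, 180), "male"), ((32, 176), "male"), ((41, 183), "male"),
    ((23, 160), "female"), ((35, 165), "female"), ((45, 158), "female"),
]

prediction = knn_predict(people, (30, 178))  # the output is a category, not a number
```

Because the labels in `people` are known up front, this is supervised. Because the answer is one of a fixed set of labels, it is classification.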

####Unsupervised Continuous

Unsupervised learning on continuous variables is called dimension reduction. We’re basically trying to take a lot of numbers/dimensions and squeeze them into a smaller set of numbers/dimensions (usually just two or three) while keeping as much of the information as we can. This helps with data visualization.
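
One common way to do this is principal component analysis (PCA). The sketch below is a simplified version that finds the directions of greatest variance by power iteration with deflation, then projects three made-up measurements per sample down to two. The function names and the data are assumptions for illustration, and a library implementation would be more robust.

```python
from math import sqrt


def center(rows):
    """Subtract each column's mean so the data is centered on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=200):
    """Power iteration: repeatedly multiply a vector by the matrix and renormalize."""
    d = len(matrix)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(rows, k):
    """Project rows onto their k directions of greatest variance."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        # Deflation: remove the direction just found so the next pass finds the runner-up.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    reduced = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
               for row in centered]
    return components, reduced


# Made-up samples with three measurements each.
samples = [
    (2.0, 1.0, 0.1), (4.0, 3.0, 0.0), (6.0, 2.0, 0.2),
    (8.0, 5.0, 0.1), (10.0, 4.0, 0.0), (12.0, 6.0, 0.2),
]

components, reduced = pca(samples, k=2)  # each sample is now two numbers instead of three
```

Each row of `reduced` can be plotted directly as an (x, y) point, which is why this is handy for visualization.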

####Unsupervised Categorical

Unsupervised learning where the result is a categorical variable (a group label) is called clustering. For example, given a lot of attributes of a flower (e.g. color, petal length), we can find the natural groupings in the data.
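
Here is a small sketch of clustering using k-means, one popular clustering algorithm. It assumes numeric attributes only (petal length and petal width, both made up), so a categorical attribute like color would need to be encoded as numbers first. Starting from the first `k` points is a simplification to keep the example deterministic.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(points[:k])  # simplification: start from the first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Made-up flowers: (petal length in cm, petal width in cm). No labels are given.
flowers = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]

labels, centroids = kmeans(flowers, k=2)
```

Nothing told the algorithm which flowers belong together. The group numbers in `labels` come purely from how close the points are to each other.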

##Data Science Workflow

So what’s the basic workflow?

  1. Identify the problem
  2. Collect data
  3. Clean and reformat data
  4. Build model
  5. Evaluate model
  6. Communicate results

The process is iterative: during Step 5 (evaluating your model) you may realize that the data you collected is not enough, so you move back to Step 2 (collecting more data).
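
The loop from Step 5 back to Step 2 can be sketched in code. Everything here is a toy assumption: `run_workflow` is a hypothetical helper, the "model" is just an average, the readings are made up, and the error is measured against a small held-out list. The shape of the loop is the part that matters.

```python
from statistics import mean


def run_workflow(collect, clean, build, evaluate, tolerance, max_rounds=5):
    """Repeat Steps 2 to 5, collecting more data until the model is good enough."""
    data = []
    for rounds in range(1, max_rounds + 1):
        data.extend(clean(collect()))  # Step 2 (collect) and Step 3 (clean)
        model = build(data)            # Step 4 (build model)
        error = evaluate(model)        # Step 5 (evaluate model)
        if error <= tolerance:
            break                      # good enough, move on to Step 6
    return model, error, rounds


# Made-up readings arriving in batches, with some missing values to clean out.
batches = iter([[10.0, None, 14.0], [9.0, 11.0], [10.0, 6.0, None]])
held_out = [9.0, 10.0, 11.0]  # data set aside only for evaluation

model, error, rounds = run_workflow(
    collect=lambda: next(batches),
    clean=lambda batch: [x for x in batch if x is not None],
    build=mean,
    evaluate=lambda m: abs(m - mean(held_out)),
    tolerance=0.5,
)
```

With this toy data the first two rounds are not accurate enough, so the loop goes back for more data and stops after the third batch.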