Machine Learning Introduction

Shalika Prasad
6 min readMay 9, 2019

Machine Learning is a method of teaching machines or computers to make a prediction based on data sets and experiences.

In briefly, Machine learning is subset of artificial intelligence that automates analytical model building by using an algorithm.

It is like a system to ask questions and answers.

1. Application of ML

  • Search Engine Result
  • Number Plate Recognition
  • Voice Recognition
  • Dream Reader

2. How to ML Works

It has some phases that you should want to know.

  1. Phase 1 (Learning)

2. Phase 2 (Prediction)

3. ML Work-flow

4. Type of ML

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

5. Follow these Steps :-

1. Data Gathering :

Firstly, We want to datasets via any sources :

  • Collecting by yourself : (if you have business that you could have data)
  • by third parties : (government or non-government organizational data)
  • crowed sourcing : (if you publish any app, web or some application you could have data)

Data Integration :

You can find datasets with different source types :

  • Databases
  • Files
  • Data Cubes

Then, you cannot use, want to avoid problems with data.

  • Entity Identification : Some people can have store as different name, so we should identify before using datasets. so, we can apply identify person from ID number, then we can avoid that issue.
  • Data value conflict : Some data can store as different data format in different databases. like one database — km and another database — m.
  • Derivable Data : we want to convert some data, like derived age from database.

Handling missing Values :

  • Some data can missing int your dataset.
  • If you have millions of data, then we have 100 data which has missing, we can just ignore. but you have only 1000 of data and out of 100 data is missing we cannot ignore that data.

So, We can

  • ignore the row
  • replace next/previous data in the row
  • replace mean/median
  • predict the values according to distribution.

Noise Removal :

Noise => Unwanted Data : that are no correlation with our target

Reasons for occurring noises : improper data collections

Benefits of remove noises :

  • Reduce the training time
  • Reduce model over-fitting
  • Increase the accuracy

Standardization :

  • Re-scaling the features to have the properties of a Gaussian Distribution.

Normalization :

  • Min-Max scaling (Shrinking data to 0 to 1)
  • the we can convert dataset same format, then we compare data fields and identify patterns.

Sampling :

Handle the class imbalance.

technique :

  • Oversampling : increase the data point in minority class.
  • Under-sampling : reduce the data point in majority class.
  • SMOTE(Synthetic Minority Oversampling Technique) :
SMOTE
  • We can create smooth data point in this way.
  • 5 point increase up to 15 points by using SMOTE.

Out-lier Detection :

  • Record don’t follow the form or relation which rest of the relation.

2. Identify what are the variables :

Firstly, We want identify what are variables that we suppose to use this model. After find variables, if we have input and output dataset, we want to divide 2 parts according to their types.

Two type of Variables :-

  1. Independent Variables:- This variable cannot change values via any effort. We use these variables as X-axis. e.g. Humidity, Pressure, Soil Status, ID Number and etc.
  2. Dependent Variables:- It depends on others, as well as we can tell these as predicted values that we hope to predict. Thus, We use these variables as Y-axis. e.g.: Salary, Demand, Rate and etc.

3. Identify Which Variables are Numerical or not

4. Feature Engineering :

Creating new feature using domain knowledge.

Correlation Analysis ;

  • Identify most related features to our target.
  • Help to remove unwanted and redundancy features.

Features : numerical , categorical, ordinal, date-time, coordinate

Steps :

  • Brainstorm features.
  • Create Features.
  • Check how the features work with the model.
  • Start again from first until the features work perfectly.

6. Identify Which Algorithm Type that you want to follow : Model Construction

Supervised Learning :-

  • If we use this type, we should have past data sets to learn and make future predictions.
  • Input variable(X) and Output variable(Y) and you use and algorithm to the learn mapping function from input to the output.
  • Unsupervised Learning :- If you don’t have past data sets, we can use this type. Then we can analyzing and grouping data.
  • Reinforcement Learning :- According to past action and feedback, This model can learn. Output data depends on the state of the current input data and Next input data depends on the output of the previous input. like chess game.

7. Identify Which Algorithm that you want to follow :

Supervised Learning :-

It can divide 2 parts.

  • Regression
  • Classification

Regression :

  • if you get Numeric data as predicted value, you can use this type.
  • e.g. : prediction of housing Price, Temperature, like that.
  • It has 2 Parts.

Linear Regression :

Simple Linear Regression :

  • If you have one variable as X you can use this.

Multiple Linear Regression :

  • If you have multiple variable as X you can use this.

Polynomial Regression :

  • non-linearly separable data

Logistic Regression :

outcome (dependent variable) has only a limited number of possible values.

outcome is categorical in nature.

For instance,

  • yes/no,
  • true/false,
  • red/green/blue,
  • 1st/2nd/3rd/4th, etc.

Classification :

Separate data in to distinct classes.

e.g. : Color, shape, categorized data.

It has 2 type :

  • Decision Tree
  • Support Vector Machine

Unsupervised Learning :-

Clustering :

Analyzing a grouping data which does not include pre-labeled class attributed.

Algorithms :

  • K-means
  • Hierarchical Clustering

Association :

Discover probability of co-occurrence of item in a collection

Algorithms :

  • Apriori
  • FP- Growth

e.g. :

  • 2 customers are buy foods, predict which foods are 3rd customer will buy next.

Reinforcement Learning :

  • It is learning by interacting with a space or and a environment.
  • It select its actions on basis of its past experiences and also by new choices.

7. Tunning Hyper-parameter :

try to identify combination of parameter.

It depend on selected Algorithm.

e.g. :

  • Random Forest : Dept of tree
  • No. of Parameter

8. Model Evaluation :

  1. Bias VS Variance :

Bias :

  • Model with high bias is too simple.
  • Low number of prediction.
  • Lead to high error on training and testing data.

Variance :

  • Very complex model
  • Has a very large No of predictors.
  • Lead to high error on testing data.
  • Lead to accurate prediction on test data.

2. Over-fit and Under-fit :

6. Limitation of Machine Learning :

failed to solve bellow those:

  • Crucial problem of AI
  • Natural Language processing
  • Image Recognition
  • not useful while working with high dimensional data(large no. of inputs and outputs)

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response