Machine Learning Introduction

6 min readMay 9, 2019

Machine Learning is a method of teaching machines or computers to make a prediction based on data sets and experiences.

In briefly, Machine learning is subset of artificial intelligence that automates analytical model building by using an algorithm.

It is like a system to ask questions and answers.

1. Application of ML

Search Engine Result
Number Plate Recognition
Voice Recognition
Dream Reader

2. How to ML Works

It has some phases that you should want to know.

Phase 1 (Learning)

2. Phase 2 (Prediction)

3. ML Work-flow

4. Type of ML

Supervised Learning
Unsupervised Learning
Reinforcement Learning

5. Follow these Steps :-

1. Data Gathering :

Firstly, We want to datasets via any sources :

Collecting by yourself : (if you have business that you could have data)
by third parties : (government or non-government organizational data)
crowed sourcing : (if you publish any app, web or some application you could have data)

Data Integration :

You can find datasets with different source types :

Databases
Files
Data Cubes

Then, you cannot use, want to avoid problems with data.

Entity Identification : Some people can have store as different name, so we should identify before using datasets. so, we can apply identify person from ID number, then we can avoid that issue.
Data value conflict : Some data can store as different data format in different databases. like one database — km and another database — m.
Derivable Data : we want to convert some data, like derived age from database.

Handling missing Values :

Some data can missing int your dataset.
If you have millions of data, then we have 100 data which has missing, we can just ignore. but you have only 1000 of data and out of 100 data is missing we cannot ignore that data.

So, We can

ignore the row
replace next/previous data in the row
replace mean/median
predict the values according to distribution.

Noise Removal :

Noise => Unwanted Data : that are no correlation with our target

Reasons for occurring noises : improper data collections

Benefits of remove noises :

Reduce the training time
Reduce model over-fitting
Increase the accuracy

Standardization :

Re-scaling the features to have the properties of a Gaussian Distribution.

Normalization :

Min-Max scaling (Shrinking data to 0 to 1)
the we can convert dataset same format, then we compare data fields and identify patterns.

Sampling :

Handle the class imbalance.

technique :

Oversampling : increase the data point in minority class.
Under-sampling : reduce the data point in majority class.
SMOTE(Synthetic Minority Oversampling Technique) :

We can create smooth data point in this way.
5 point increase up to 15 points by using SMOTE.

Out-lier Detection :

Record don’t follow the form or relation which rest of the relation.

2. Identify what are the variables :

Firstly, We want identify what are variables that we suppose to use this model. After find variables, if we have input and output dataset, we want to divide 2 parts according to their types.

Two type of Variables :-

Independent Variables:- This variable cannot change values via any effort. We use these variables as X-axis. e.g. Humidity, Pressure, Soil Status, ID Number and etc.
Dependent Variables:- It depends on others, as well as we can tell these as predicted values that we hope to predict. Thus, We use these variables as Y-axis. e.g.: Salary, Demand, Rate and etc.

3. Identify Which Variables are Numerical or not

4. Feature Engineering :

Creating new feature using domain knowledge.

Correlation Analysis ;

Identify most related features to our target.
Help to remove unwanted and redundancy features.

Features : numerical , categorical, ordinal, date-time, coordinate

Steps :

Brainstorm features.
Create Features.
Check how the features work with the model.
Start again from first until the features work perfectly.

6. Identify Which Algorithm Type that you want to follow : Model Construction

Supervised Learning :-

If we use this type, we should have past data sets to learn and make future predictions.
Input variable(X) and Output variable(Y) and you use and algorithm to the learn mapping function from input to the output.

Unsupervised Learning :- If you don’t have past data sets, we can use this type. Then we can analyzing and grouping data.
Reinforcement Learning :- According to past action and feedback, This model can learn. Output data depends on the state of the current input data and Next input data depends on the output of the previous input. like chess game.

7. Identify Which Algorithm that you want to follow :

Supervised Learning :-

It can divide 2 parts.

Regression
Classification

Regression :

if you get Numeric data as predicted value, you can use this type.
e.g. : prediction of housing Price, Temperature, like that.
It has 2 Parts.

Linear Regression :

Simple Linear Regression :

If you have one variable as X you can use this.

Multiple Linear Regression :

If you have multiple variable as X you can use this.

Polynomial Regression :

non-linearly separable data

Logistic Regression :

outcome (dependent variable) has only a limited number of possible values.

outcome is categorical in nature.

For instance,

yes/no,
true/false,
red/green/blue,
1st/2nd/3rd/4th, etc.

Classification :

Separate data in to distinct classes.

e.g. : Color, shape, categorized data.

It has 2 type :

Decision Tree
Support Vector Machine

Unsupervised Learning :-

Clustering :

Analyzing a grouping data which does not include pre-labeled class attributed.

Algorithms :

K-means
Hierarchical Clustering

Association :

Discover probability of co-occurrence of item in a collection

Algorithms :

Apriori
FP- Growth

e.g. :

2 customers are buy foods, predict which foods are 3rd customer will buy next.

Reinforcement Learning :

It is learning by interacting with a space or and a environment.
It select its actions on basis of its past experiences and also by new choices.

7. Tunning Hyper-parameter :

try to identify combination of parameter.

It depend on selected Algorithm.