Course Overview

1h 41m | 12 Videos | 84280 Views

Intermediate | English [Auto-generated]

What is Cluster Analysis?

Cluster analysis is a statistical technique for classifying objects into groups called clusters, such that objects in the same cluster are more similar to one another than to objects in other clusters. In simple words, cluster analysis divides data into clusters that are meaningful and useful. Clustering is used mainly for two purposes – clustering for understanding and clustering for utility.

Application of cluster analysis

Cluster analysis is used in many fields, such as machine learning, market research, pattern recognition, data analysis, information retrieval, image processing and data compression.
Cluster analysis can help marketers identify distinct groups within their customer base.
Cluster analysis is used in biology to derive plant and animal taxonomies and to categorize genes with similar characteristics.
Cluster analysis is used on earth observation databases to group the houses in a city according to house type, value and location.
Clustering can also be used to segment documents on the web based on specific criteria.
In data mining, cluster analysis is used to gain an in-depth understanding of the characteristics of the data in each cluster.

Clustering Methods

Clustering methods can be divided into the following categories:

Partitioning Method
Hierarchical Method
Density Based Method
Grid Based Method
Model Based Method
Constraint Based Method

Advantages of Cluster Analysis

Given below are the advantages of cluster analysis:

Cluster analysis gives a quick overview of data
It can be used when there are many groups in the data
It can be used when unusual similarity measures are needed
Clusters can be added to ordination plots, and the method is good at identifying nearest neighbours

Approaches to cluster analysis

There are a number of different approaches to cluster analysis, which fall into two groups:

Hierarchical Methods – Agglomerative Methods and Divisive Methods
Non Hierarchical Methods, of which K means clustering is the best known

Cluster Analysis Course Objectives

At the end of this course you will know:

How to use cluster analysis in data mining
The various types of clusters
The marketing applications of cluster analysis
The implications of a wide variety of clustering techniques
How to use clustering in statistical analysis

Pre Requisites for Cluster Analysis course

Basic knowledge of statistics is required. Some familiarity with data analysis is an added advantage, though it is not a necessity.

Target Audience

The target audience of this course is listed below:

Students
Research professionals
Data Analysts
Data Miners
And anyone who is interested in learning about cluster analysis

Cluster Analysis Course Description

Section 1: Introduction

Meaning of Cluster Analysis

The term cluster analysis covers a number of different algorithms and methods for grouping data and objects. It is an exploratory data analysis tool: cluster analysis is used to discover structures in data without explaining why they exist. This section includes a brief introduction to cluster analysis, its history and its benefits.

Understanding of Cluster Analysis

In this section you will learn what makes a good clustering, that is, one which produces high quality clusters, and how to measure the quality of a clustering. The other topics included in this section are the major clustering approaches, the techniques of cluster analysis, and its basic concepts and algorithms.

Example of Cluster Analysis

Clustering is used in many aspects of our daily life. In this chapter you will see some illustrations and practical applications of cluster analysis in various fields. One example is based on a retail chain with stores across various locations. Another example is based on market segmentation. Finally, a simple numerical example explains the objectives of cluster analysis. Examples from fields where cluster analysis is used, such as marketing, land use, biology, psychology, medicine and information retrieval, are also given in this section.

Section 2: Types of Clustering

Hierarchical method of Clustering

Hierarchical clustering produces a set of nested clusters that are organized in the form of a tree. It includes several methods for deciding which clusters should be joined at each stage. There are two main types of hierarchical clustering – Agglomerative and Divisive. The agglomerative clustering algorithm is explained in detail with an example in this section.

The main methods of hierarchical clustering are also explained in brief in this section:

Nearest Neighbour Method (Single Linkage Method)
Furthest Neighbour Method (Complete Linkage Method)
Average Linkage Method (Between Groups)
Centroid Method
Ward’s Method

Single linkage clustering

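The single linkage method is also known as the nearest neighbour method. It defines the distance between two clusters as the smallest distance between any observation in one cluster and any observation in the other, which is how distances are measured once a cluster holds more than one observation. The major topics included in this section are listed below: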

Spanning tree
Contracting Space
Chaining
Dendrogram or tree diagram
Example of nearest neighbour method using diagrams
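
As an illustration of the nearest neighbour method, here is a minimal Python sketch of agglomerative clustering with single linkage. The function name `single_linkage` and the four sample points are made up for this example and are not taken from the course material. The merge history it returns (which clusters were joined, and at what distance) is the information a dendrogram displays.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (left_members, right_members, distance),
    where members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the two closest members.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # a < b, so index a is unaffected
    return history


# Hypothetical sample data: four points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.5, 0.0)]
for left, right, d in single_linkage(points):
    print(left, right, d)
# [0] [1] 1.0
# [2] [3] 1.5
# [0, 1] [2, 3] 4.0
```

The last merge happens at distance 4.0, the gap between points 1 and 2, even though the other members of the two clusters are further apart. This tendency to join clusters through their closest members is what leads to chaining.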

Linkage methods, Ward’s method

The single linkage method is explained in detail in the previous chapter. This section deals with the remaining methods – Complete Linkage, Average Linkage, the Centroid Method and Ward’s Method.

In the complete linkage method the distance between two clusters is taken to be the maximum distance between a member of one cluster and a member of the other. The formula is explained in this section, and a detailed example is given to make it easy to understand.

In the average linkage method the distance between two clusters is the average of the distances between all pairs of observations, one from each cluster. This method is explained in detail in this section with an example.

In the centroid method the mean value of each variable is computed for each cluster, and the distance between these centroids is used to decide which clusters to merge. This method is also explained with an example.

In Ward’s method each possible pair of clusters is considered for merging, and the sum of the squared distances within each resulting cluster is computed. The merge that gives the lowest sum of squares is chosen. This method is widely used. This section contains examples of this method.
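
To compare these definitions side by side, here is a small Python sketch that computes each distance for one pair of clusters. The helper names (`centroid`, `sse`, `linkage_distances`) and the sample clusters are assumptions made for illustration. For Ward’s method the sketch reports the increase in the within-cluster sum of squares that merging the two clusters would cause, which is one common way of expressing the criterion.

```python
from math import dist
from statistics import mean


def centroid(cluster):
    """Mean of each variable (coordinate) over the cluster's members."""
    return tuple(mean(coords) for coords in zip(*cluster))


def sse(cluster):
    """Sum of squared distances from each member to the cluster centroid."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def linkage_distances(a, b):
    """Distance between clusters a and b under several linkage rules."""
    pairs = [dist(p, q) for p in a for q in b]
    return {
        "single": min(pairs),      # nearest neighbour
        "complete": max(pairs),    # furthest neighbour
        "average": mean(pairs),    # mean over all between-cluster pairs
        "centroid": dist(centroid(a), centroid(b)),
        # Ward: growth in within-cluster sum of squares if a and b merge.
        "ward_increase": sse(a + b) - sse(a) - sse(b),
    }


# Hypothetical sample clusters.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(6.0, 0.0), (8.0, 0.0)]
print(linkage_distances(a, b))
# single 4.0, complete 8.0, average 6.0, centroid 6.0, ward_increase 36.0
```

In an agglomerative run, the pair of clusters with the smallest value under the chosen rule is the pair that gets merged at each stage.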

k means clustering

K means clustering is the best known non hierarchical clustering method. Under this method the desired number of clusters is specified beforehand, and the observations are then assigned to that many clusters so as to find the best solution. The steps for carrying out K means clustering are covered in this chapter.
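
The usual steps (assign each point to its nearest centroid, recompute the centroids, repeat until nothing changes) can be sketched in Python as follows. This is a minimal illustration of the standard algorithm, not the course’s own implementation; the function name `k_means`, the sample points and the deliberately poor initial centroids are all assumptions.

```python
from math import dist


def k_means(points, initial_centroids, max_iter=100):
    """Basic K means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical data: two visible groups, with both initial centroids
# placed inside the first group.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, [(1, 1), (2, 1)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the algorithm recovers the two groups despite the poor start. In general the result depends on the initial centroids, which is why the choice of initial centroids is treated as a topic of its own.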

K means and Example of K means, difference between hierarchical and non hierarchical clustering

The important points of K means clustering are covered in this chapter, including the partitional clustering approach, centroids and the K means algorithm. The details of K means clustering are explained using the following points:

Initial Centroids
Closeness
Similarity measures
Convergence
Complexity of K means
Types of K means clustering – Sub optimal Clustering and Optimal Clustering
Solutions to the Initial Centroids problem
Evaluating K means clusters
Difference between Hierarchical Clustering and K means Clustering
Strengths of K means clustering
Limitations of K means clustering

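Example of K means, number of clusters, Statistical tests, Dendrogram, Scree plot

Because of the way it is computed, K means clustering is often described as Analysis of Variance (ANOVA) in reverse: instead of testing whether given groups differ, it forms groups so that the variability within clusters is small relative to the variability between clusters. A physical fitness example is given to explain the K means clustering method. K means clustering is also explained with other examples using plots and graphs.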
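
The ANOVA connection can be made concrete with a short Python sketch that splits the total sum of squares into a within-cluster part and a between-cluster part. The function name `sum_of_squares` and the numbers are invented for illustration and are not the course’s physical fitness data.

```python
from statistics import mean


def sum_of_squares(groups):
    """Return (total, within, between) sums of squares for 1-D groups."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    total = sum((x - grand) ** 2 for x in all_values)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return total, within, between


# Two ways of splitting the same six hypothetical values into two clusters.
good = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
poor = [[1.0, 2.0, 12.0], [3.0, 10.0, 11.0]]

print(sum_of_squares(good))  # (125.5, 4.0, 121.5)
print(sum_of_squares(poor))  # (125.5, 112.0, 13.5)
```

The total is the same for both splits, and in each case it equals within plus between. A good clustering pushes as much of the total as possible into the between-cluster part, which is the quantity an ANOVA would test.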

Dendrogram – The result of a hierarchical cluster analysis can be represented in the form of a diagram known as a dendrogram. This diagram shows which clusters were joined at each stage of the analysis and the distance between them at the time of joining. This helps in selecting the optimum number of clusters. An example of a dendrogram is given under this heading.

Scree Plot – The scree plot displays the eigenvalue associated with each component, in descending order, against the number of the component. The pattern and the properties of the scree plot in cluster analysis are discussed in this section.

Two step cluster analysis, Evaluation

Two step cluster analysis is used to reveal natural clusters within a data set. It runs a pre clustering method first and then a hierarchical method on the resulting pre clusters. This section contains the following topics:

Algorithm of two step cluster analysis
The two steps of the two step cluster analysis
Case study – classifying motor vehicles using two step cluster analysis

Example for Listwise and Pairwise deletion of missing values, SPSS windows of output

Listwise and pairwise deletion are techniques for handling missing data. They are appropriate when the data are missing completely at random. Listwise deletion drops an entire case if it has one or more missing values. Pairwise deletion tries to minimize the loss of data caused by listwise deletion by using, for each calculation, every case that has values for the variables involved. Each technique has its own advantages and disadvantages. This section includes the following topics:

What is Listwise deletion
Example of Listwise deletion
What is Pairwise deletion
Example of Pairwise deletion
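
The difference between the two techniques can be seen in a short Python sketch. The data set, the variable names and the helper functions `listwise` and `pairwise_counts` are hypothetical and only serve to illustrate the idea; missing values are represented by `None`.

```python
from itertools import combinations

# Hypothetical data set with three variables; None marks a missing value.
rows = [
    {"age": 25, "income": 40, "score": 7},
    {"age": 32, "income": None, "score": 8},
    {"age": None, "income": 55, "score": 6},
    {"age": 41, "income": 62, "score": None},
    {"age": 29, "income": 48, "score": 9},
]


def listwise(rows):
    """Keep only the cases that have no missing value at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_counts(rows):
    """For each pair of variables, count the cases where both are present."""
    variables = list(rows[0])
    return {
        (x, y): sum(r[x] is not None and r[y] is not None for r in rows)
        for x, y in combinations(variables, 2)
    }


print(len(listwise(rows)))    # 2
print(pairwise_counts(rows))  # every pair of variables keeps 3 cases
```

Listwise deletion leaves only two of the five cases, while pairwise deletion lets each pair of variables use three. The price is that different statistics are then based on different subsets of cases.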

SPSS windows of output

In SPSS, cluster analysis can be found under Analyze → Classify. SPSS offers three methods of cluster analysis – Hierarchical, K means and Two step cluster. This section includes examples of performing cluster analysis in SPSS.

K means cluster theory, SPSS windows for k means

This section explains what the K means clustering method is, along with its history, algorithm, initialization methods, applications and description.

SPSS is a statistical software package that can be used to perform cluster analysis. The steps to conduct cluster analysis in SPSS are simple, and it lets you choose the variables on which the cluster analysis is performed. You can perform K means in SPSS by going to Analyze → Classify → K means cluster. The steps for performing K means cluster analysis in SPSS are given in this chapter. The necessary screenshots are also provided for easy reference.

FAQs: General Questions

What technical support will be provided?

Our customer support centre is available 24/7. Through it you can ask your queries and contact your instructors. You can also email your queries to the mail id provided on the site for technical support.

How can I get access to my course?

You will be sent an email with your user name and password. A link to your course will also be sent.

How much time commitment is required for each course?

Each course requires at least 8 hours every week. You can choose your own time and complete the course at your convenience. The flexibility to learn on your own time is an advantage of taking an online course with eduCBA.

Testimonials

Samuel

This is an excellent introductory course on Cluster analysis. The course covers mainly two types of cluster analysis – Hierarchical and K means. The quality of the material in this course are of high standards. The course flow from one topic into another is best. The examples under each section makes the learning and understanding process easy. Thanks to educba for offering this course.

Henry Mark

This is my first online course and it provided me a good experience. The syllabus of this course makes it more interesting. It is not stuffed with content. The content is good and self explanatory. It gave me a greater overview of the clustering methods and techniques which I was not aware of before taking this course. This course is recommended to someone who is new to the concept of cluster analysis as well as to one who knows how to apply cluster analysis to data. Overall a great course to begin with cluster analysis.

Richard

This is a good course on cluster analysis. It covers all the important topics and gives good examples to understand the methods and algorithms. It also gives some real life applications of clustering as examples and thus it makes the content more interesting and engaging. I loved this course and would definitely recommend.

Where do our learners come from?

Professionals from around the world have benefited from eduCBA’s Cluster Analysis courses. Some of the top places that our learners come from include New York, Dubai, San Francisco, Bay Area, New Jersey, Houston, Seattle, Toronto, London, Berlin, UAE, Chicago, UK, Hong Kong, Singapore, Australia, New Zealand, India, Bangalore, New Delhi, Mumbai, Pune, Kolkata, Hyderabad and Gurgaon among many.