Course Overview
About Data Mining
We live in a thoroughly digital era. Video games have taken the place of outdoor games, and when we want to learn something we no longer search a library; we simply Google it. Gaining knowledge has become extremely simple, but as resources have multiplied, so have the problems. The world is filled with information, but is all of it relevant to us? Imagine scanning a database for a particular person's name and related details, and instead finding information about everyone except that person. This is information, but it cannot be classified as knowledge.
So what exactly is knowledge?
Relevant or useful information is known as knowledge. From any particular person's point of view, information that is relevant to that person is knowledge; otherwise it is not. The definition of useful knowledge varies from person to person because their requirements vary.
So what is data mining, and why has its importance increased so much in the last decade?
The mining of knowledge from large amounts of data is referred to as data mining. The term is actually a misnomer. Why? Consider an example: mining gold from rocks or sand is called gold mining, not rock or sand mining. By the same logic, a more appropriate name for data mining would have been “knowledge mining from data”, which is unfortunately too long to be practical. A shorter term such as “knowledge mining”, on the other hand, does not convey that the mining is done on large amounts of data. Nonetheless, mining is a vivid term for a process that finds a small set of valuable nuggets in a large amount of raw material. As a result, a misnomer that carries both “data” and “mining” became the popular and widely accepted choice. Many other terms have a similar meaning, such as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. Many people treat data mining as a synonym for knowledge discovery from data (KDD), while others view data mining as one essential step in the knowledge discovery process.
What does data mining do?
- It uncovers valuable information hidden in huge volumes of data.
- It is the analysis of data, using software, to find patterns and regularities in sets of data.
- The computer is responsible for finding the patterns, which it does by identifying the underlying rules and features in the data.
- Interesting patterns can turn up in unexpected places, because data mining software extracts patterns that were not previously noticeable or recognizable.
- The analogy with mining:
- Huge amounts of data are examined in an effort to find something meaningful.
- In a mining operation, large volumes of low-grade material are sifted so that something valuable can be found.
Data Mining: a multidisciplinary field
- Mathematics
- Databases
- High Performance Computing
- Statistics
- Visualization
- Machine Learning
How data mining relates to other technologies
Data mining in relation to machine learning
- Data mining works with very large data sets.
- The efficiency of the algorithms is crucial for analysis.
- The scalability of the algorithms is equally important.
- The data contains many missing values, which leads to inconsistencies.
- The data comes from the real world.
- The data is not generated by the user; it is pre-existing, historical data.
- The data is not static; it is prone to updates.
- Many efficient techniques are available for data retrieval.
- Domain knowledge is available in the form of integrity constraints.
Data mining in relation to DBMS
- Examples of reports a DBMS can produce:
- Sales in the previous month for each service type.
- Sales per service, grouped by customer sex or age bracket.
- A list of all customers who let their policy lapse.
- Questions that are answered using data mining:
- What characteristics do customers who let their policy lapse have in common?
- How do they differ from customers who renew their policy?
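To make the contrast concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The `customers` table, its columns, and every row in it are invented for illustration. The first query is an ordinary DBMS report (list the lapsed customers). The `value_shares` helper, also hypothetical, is only a crude stand-in for a data mining question: it compares how often each attribute value occurs among lapsed and renewing customers. Real data mining tools use far more capable algorithms.

```python
import sqlite3
from collections import Counter

# Hypothetical data: (name, age_bracket, payment, status)
ROWS = [
    ("Asha",  "18-30", "cash", "lapsed"),
    ("Ben",   "18-30", "cash", "lapsed"),
    ("Chen",  "31-50", "card", "renewed"),
    ("Dipa",  "31-50", "card", "renewed"),
    ("Emil",  "18-30", "card", "renewed"),
    ("Farah", "51+",   "cash", "lapsed"),
]

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.execute(
    "CREATE TABLE customers (name TEXT, age_bracket TEXT, payment TEXT, status TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", ROWS)

# DBMS report: a direct lookup of facts that are already stored.
lapsed_names = [
    row["name"]
    for row in conn.execute(
        "SELECT name FROM customers WHERE status = 'lapsed' ORDER BY name"
    )
]

def value_shares(status, column):
    """Share of each value of `column` among customers with the given status."""
    rows = conn.execute(
        "SELECT * FROM customers WHERE status = ?", (status,)
    ).fetchall()
    counts = Counter(row[column] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}

# Data-mining-style question: what do lapsed customers have in common,
# and how do they differ from customers who renewed?
lapsed_payment = value_shares("lapsed", "payment")
renewed_payment = value_shares("renewed", "payment")

print("Lapsed customers:", lapsed_names)
print("Payment shares (lapsed): ", lapsed_payment)
print("Payment shares (renewed):", renewed_payment)
```

In this toy data every lapsed customer paid in cash and every renewing customer paid by card. That is the kind of shared characteristic the questions above are after, and a plain report would not surface it on its own.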
Data Warehouse
A data warehouse is a repository in which multiple heterogeneous data sources are organized under a unified schema and stored at a single site in order to facilitate management decision making. Data warehouse technology includes data cleaning, data integration, and on-line analytical processing (OLAP), that is, analysis techniques with functionalities such as consolidation, summarization, and aggregation, along with the ability to view data from different angles.
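The summarization and aggregation side of OLAP can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SALES` fact table, its dimension names, and the `roll_up` helper are all hypothetical, and a real warehouse would run such queries in a dedicated OLAP engine rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, assumed already cleaned and integrated.
SALES = [
    {"region": "North", "service": "Life", "month": "Jan", "amount": 100},
    {"region": "North", "service": "Auto", "month": "Jan", "amount": 50},
    {"region": "South", "service": "Life", "month": "Jan", "amount": 70},
    {"region": "South", "service": "Life", "month": "Feb", "amount": 30},
    {"region": "North", "service": "Auto", "month": "Feb", "amount": 20},
]

def roll_up(rows, dimensions, measure="amount"):
    """Sum `measure` over all rows that share the same values for `dimensions`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# The same data viewed from different angles.
by_region = roll_up(SALES, ["region"])
by_service_month = roll_up(SALES, ["service", "month"])
grand_total = roll_up(SALES, [])  # no dimensions: a single overall total

print(by_region)
print(by_service_month)
print(grand_total)
```

Changing the list of dimensions changes the angle of view: fewer dimensions give a coarser summary, more dimensions give a finer one.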
Techniques used
- Traditional database-oriented approaches
- Statistics is used at various stages of data mining, such as:
- Data cleaning: irrelevant and erroneous data are removed.
- EDA (exploratory data analysis): includes histograms, graphs, frequency counts, etc.
- Selection and sampling of data: reduces the amount of computation.
- Re-definition of attributes
- Data analysis: measures of association and relationships between variables, interestingness of rules, classification, etc.
- Visualization: enhances EDA and makes patterns more visible.
- Cluster analysis (clustering)
- Clustering, or segmentation, is the partitioning of a database so that the members of each segment or partition are similar according to some criterion or metric.
- Clustering according to similarity is a concept that appears in many disciplines; in chemistry, for example, it is used to cluster molecules.
- Clustering according to similarity also has applications in data mining, for example segmenting a client or customer base.
- For very large databases, it provides sub-groups that are easier to analyze.
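As an illustration of customer segmentation by similarity, the sketch below uses k-means, one common clustering algorithm; the text above does not name a specific algorithm, so this choice is an assumption. The `CUSTOMERS` data (age, monthly spend) is invented, and the initial centroids are simply the first k points so that the run is deterministic. Practical implementations use better initialization and usually scale the attributes first.

```python
import math

def k_means(points, k, iterations=10):
    """Partition `points` into k clusters by Euclidean distance to centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (age, monthly spend)
CUSTOMERS = [(22, 30), (45, 80), (25, 35), (47, 90), (23, 25), (50, 85)]

centroids, clusters = k_means(CUSTOMERS, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With this toy data the customers fall into two segments, a younger low-spend group and an older high-spend group, each of which can then be analyzed separately.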
Applications of Data Mining
- Credit assessment
- Prediction of the stock market
- Diagnosis of faults in Production Systems
- Medical Discovery
- Scientific Discovery
- Detection of fraud
- Forecasting of natural or man-made hazards
- Analysis of marketing and buying trends
- Organizational restructuring
- Target mailing
- Knowledge acquisition
- Semantics-based performance enhancement of DBMS
Testimonials
Partho Das:
Data mining has spread its wings into every field, so at least a basic knowledge of it is essential. This course guides you through the twists and turns of data mining and makes it an interesting subject. It covers the introduction to and applications of data mining, an explanation of KDD with diagrams, the key steps in the KDD process, the convergence of multiple disciplines, discovered and interesting patterns, examples of DMQL, the major issues in data mining, an algorithm for frequent pattern mining, and more. It has helped me grow in my career.
Evelyn Olivia:
With this course you can learn the details of data mining and implement it in your work. The knowledge of data mining enhances your capabilities and gives your career a new turn.