Data Mining Multi-Attribute Decision System. Facilitating Decision Support Through Data Mining Technique by Hierarchical Multi-Attribute Decision Models


Doctoral Thesis / Dissertation, 2020

134 Pages

Dr. Pankaj Pathak (Author)


Excerpt


CONTENTS

List of abbreviations

List of Figures

List of Tables

Abstract

Chapter 1. Introduction to Data Mining and Decision Support
1.1 Introduction
1.2 The KDD Process
1.2.1 Developing an understanding of the application domain
1.2.2 Selecting and creating a data set
1.2.3 Pre-processing and cleansing
1.2.4 Data transformation
1.2.5 Choosing the appropriate Data Mining task
1.2.6 Choosing the Data Mining algorithm
1.2.7 Employing the Data Mining algorithm
1.2.8 Evaluation
1.2.9 Using the discovered knowledge
1.3 Data Mining, a Step of the KDD Process
1.3.1 Database, Data Warehouse, or Other Information Repositories
1.3.2 Database or Data Warehouse Server
1.3.3 Knowledge Base
1.3.4 Data Mining Engine
1.3.5 Pattern Evaluation Module
1.3.6 Graphical User Interface
1.4 Data Mining Functionalities
1.4.1 Concept / Class Description: Characterization and Discrimination
1.4.2 Association Rule Mining
1.4.3 Classification and Prediction
1.4.4 Clustering
1.4.5 Outlier Analysis
1.4.6 Evolution Analysis
1.5 Common Uses of Data Mining
1.6 Decision Support
1.6.1 Basic Discipline
1.6.2 Decision Making
1.6.3 Classification of decision problems
1.6.4 Decision Support System
1.7 Contributions of This Thesis

Chapter 2. A Survey of Existing Work and Problem Definition
2.1 A Survey of existing work
2.1.1 Problem with integration of Data Mining and Decision Support
2.1.2 Evolution of Decision Support System (DSS)
2.1.3 A Survey of existing decision tree algorithms (Traditional)
2.1.4 A Survey of existing decision tree algorithms (Advanced)
2.2 Problems yet to be solved
2.3 Problem Definition
2.4 Research hypothesis, aims and objectives
2.5 Conceptual Research Framework
2.6 Conclusion

Chapter 3. Analysis of Data Mining Methods
3.1 Data Mining Methods
3.2 Discovery Method
3.3 Flat versus hierarchical classification
3.4 Basic Methods
3.5 Hierarchical classification
3.5.1 Why choose hierarchies
3.5.2 Advantages of hierarchies
3.6 Machine Learning and Classification
3.6.1 Classification
3.6.1.1 Evaluation of classification methods
3.7 Classification Based on decision tree
3.8 Classification rules
3.9 The Pruning of Decision Tree
3.9.1 Types of Pruning Technique
3.9.1.1 Pre-Pruning
3.9.1.2 Post-Pruning
3.9.2 Fuzzy Decision Trees
3.10 Conclusion

Chapter 4. Decision Tree Techniques and their Formulation
4.1 Formulation of decision trees
4.2 Characteristics of Classification Trees
4.2.1 Tree Size
4.2.2 The hierarchical nature of decision trees
4.3 Basic concept and algorithm of Decision Tree
4.3.1 ID3
4.3.1.1 Attribute Selection
4.3.1.2 Information Gain
4.3.2 C4.5
4.3.3 CART
4.3.4 CHAID
4.3.5 QUEST
4.4 Advantages and Disadvantages of Decision Trees
4.5 Decision Tree Extensions
4.5.1 Oblivious Decision Trees
4.5.2 Fuzzy Decision Trees
4.6 Decision Trees Inducers for Large Datasets
4.7 Incremental Induction
4.8 Evaluation of Decision Tree Techniques
4.8.1 Generalization Error
4.8.1.1 Theoretical Estimation of Generalization Error
4.8.1.2 Empirical Estimation of Generalization Error
4.8.2 Confusion Matrix
4.8.3 Computational Complexity
4.8.4 Comprehensibility
4.9 Scalability to Large Datasets
4.9.1 Robustness
4.10 Conclusion

Chapter 5. The Development of New Algorithm for Decision Tree Learning
5.1 Proposed Improved ID3 Algorithm
5.2 Steps of Improved ID3 Algorithm
5.3 Pseudocode of Proposed Improved Algorithm
5.4 Experimental Example
5.5 Experiments on Datasets
5.6 Investigation and analysis based on performance parameters
5.6.1 Accuracy
5.6.2 Model Build Time
5.6.3 Predictor Error Measures
5.7 Empirical comparison and Investigation results
5.8 Conclusion

Chapter 6. Decision Support Framework and Related Work
6.1 Introduction
6.2 Proposed Decision Support Framework
6.3 Real world applications
6.3.1 Predicting Usage of Library Books
6.3.2 Intrusion Detection
6.3.3 Machine Learning
6.3.4 Diagnosis
6.3.5 Banking Sector
6.3.6 Credit Risk Analysis
6.4 Decision Tree Construction Using Weka
6.5 Weka Screen Shot
6.6 Conclusion

Chapter 7. Conclusions and Future Work
7.1 Summary and Contributions
7.2 Limitations and Future Work
7.3 Future Work

References

ABOUT THE AUTHORS

Dr. Pankaj Pathak obtained his Master's and Ph.D. in Computer Science in 2005 and 2014, respectively. He works as an Assistant Professor at Symbiosis International (Deemed University). His areas of interest are Data Mining, AI, and Smart Technologies. He has published several research papers in the areas of Data Mining, IoT security, and Speech Recognition Technology. He is passionate about motivating students towards research for the wellbeing of society.

Dr. Parashu Ram Pal obtained his Ph.D. in Computer Science. He works as a Professor at ABES Engineering College, Ghaziabad, India. He has published more than 60 research papers, patents, books, and book chapters. Dr. Pal has been devoted to education, research, and development for more than twenty-two years, and always strives to create a proper environment for imparting quality education with a spirit of service to humanity. He believes in motivating colleagues and students to achieve excellence in the fields of education and research.

ACKNOWLEDGEMENT

From Pankaj Pathak

The process of writing this thesis was an interesting, educating, exciting, and encouraging one, not least because of the support of many people who accompanied me through the ups and downs I faced during that time. I want to take this opportunity to thank these people, who have encouraged and motivated me, who have patiently borne my lack of time, who have encouraged me to think in different directions, and who have generally helped me to improve this thesis and the research efforts behind it.

First of all, I sincerely express my gratitude to Prof. Dr. Parashu Ram Pal for his help, guidance, and encouragement. His patience, insights, research style, and ability to draw research questions from the literature have been integral to significantly improving this work.

I would also like to thank the many anonymous reviewers for their critical and valuable comments on our papers. My gratitude also goes to my colleagues for their help and support, which provided countless assistance and suggestions. Last but not least, I would like to express my gratitude to my parents, my wife, and my son for their love, support, and encouragement, as well as their understanding and patience.

ACKNOWLEDGEMENT

From Parashu Ram Pal

A journey is easier when you travel together. Interdependence is certainly more valuable than independence. In this work, I have been accompanied and supported by many people, and it is a pleasant aspect that I now have the opportunity to express my gratitude to all of them. Without the help of a large number of students and my colleagues, this book would never have existed. I would like to thank the editors and reviewers who took the time to read this book and provided valuable suggestions to Priyanka and me to make this book a reality.

I wish to express my feelings of extreme gratefulness to my parents and my beloved wife Aditi for their continuous moral support, encouragement, inspiration, and patience during the period of my work. I owe thanks to my loving daughter Anwesha and son Atharva, who missed my company quite often but never complained.

List of abbreviations

[Illustration not included in this excerpt]

List of Figures

Figure 1.1: Steps Taken in Data Mining

Figure 1.2: KDD Process

Figure 2.1: Decision Support Research Framework

Figure 3.1: Taxonomy of Data Mining Methods

Figure 3.2(a): A flat multiclass classification problem

Figure 3.2(b): A class hierarchy exhibiting two superclasses

Figure 3.2(c): A valid binarization of the n-ary class hierarchy

Figure 3.3: Hierarchical Classification problems

Figure 3.4: Decision tree with classification rules

Figure 4.1: Decision Tree for Diagnosing Problem

Figure 4.2: Decision Tree Learning Algorithm

Figure 4.3: Decision tree for classification of their patients

Figure 5.1: Proposed Improved Algorithm

Figure 5.2: Accuracy Comparison of Algorithms in %

Figure 5.3: Models Build Time Comparison of Algorithms in Sec

Figure 5.4: Comparison of Mean Absolute Error

Figure 6.1: Proposed Decision Support Framework

Figure 6.2: Hierarchical structure to predict the potential customer

Figure 6.3: Decision tree for assessing credit risk

Figure 6.4: Weka Acquisition of data set

Figure 6.5: Weka Classification of data set

Figure 6.6: Decision Tree view

List of Tables

Table 4.1 Confusion Matrix

Table 5.1 Credit-analysis

Table 5.2 Classifiers Accuracy

Table 5.3 Execution Time to Build the Model

Table 5.4 Mean Absolute Error of Algorithms

Table 6.1 Credit-Analysis (Raw data for decision tree)

Table 6.2 Weather dataset

ABSTRACT

Data mining is considered important for discovering knowledge from large datasets. The term data mining is used more widely than the term knowledge discovery from data, although it names only one step in the process of discovering insights from large amounts of data stored in databases, data warehouses, or other information repositories. Awareness of data mining technology has increased in recent years. This awareness has led leading organizations to collect more data, store it in warehouses, and use it for decision making. Data mining now plays a significant role in providing the decision support with which the modern business world draws higher profits. It is very important to know the key success factors for deploying decision support projects successfully. If the key success factors are studied well in advance and documented properly, the risk of implementing new Decision Support Systems (DSSs) can be reduced. Various researchers have studied the benefits of the data mining process and its adoption by business organizations, but very few have discussed the success factors of decision support projects.

One of the most interesting new problems in theoretical computer science is massive-data algorithmics. This problem is even more urgent and important when organizational decisions depend on the output of data mining algorithms.

A great deal of work has been done in the field of decision support and hierarchical multi-attribute decision models. Ample algorithms are available for classifying the data in datasets. The evaluation criteria for these algorithms include the accuracy of data classification and the time the algorithms take for classification. Most algorithms use the concept of information gain for classification. Some gaps remain, however. There is a need for an algorithm suited to large datasets; for handling missing values; for removing the bias towards choosing a random class when a conflict occurs, i.e. when some attributes in the test dataset have equal values but different classes; and for a decision support model that takes advantage of hierarchical multi-attribute classification algorithms.

The ID3 algorithm was developed by Quinlan in 1986. It uses information gain to classify the data in datasets, but it does not work accurately when some attributes in the test dataset have equal values but different classes. In this situation the algorithm either fails to produce a decision tree or selects a random class for those attributes whose values are all the same except for the class to which they belong. We therefore need to resolve this class conflict, that is, to decide which class the algorithm should adopt to classify the data so that maximum accuracy can be achieved.

The research hypothesis concerns decision tree learning that improves classification accuracy by emphasizing the impact factor, or importance, of the attributes rather than information gain alone. The concept of an impact factor, rather than accuracy alone, can be used to develop a new algorithm whose performance improves over existing algorithms. The aim of this thesis is to propose a new algorithm that improves accuracy and contributes effectively to decision tree learning.

We present an algorithm that handles the classification task more accurately. It resolves the above-stated problem of class conflict, i.e. the selection of the class the algorithm should adopt in order to classify the data with maximum accuracy. We introduce the impact factor and the classified impact factor to resolve the conflict situation. We use a data mining technique to facilitate decision support with improved performance over its existing counterparts, and we address a problem that has not been addressed before. The fusion of data mining and decision support can contribute to problem solving by combining the vast hidden knowledge in data with knowledge received from experts.

To utilize our proposed improved decision tree algorithm, we also establish a decision support framework. In this decision support model, the relationships between the different modules of the proposed framework are expressed in a module diagram. On the basis of this model, we devise our proposed algorithm to discover hidden information and to use this knowledge to facilitate decision support.

In this dissertation, Chapter 1 introduces the background and history of the data mining process and its uses; it also highlights the issues, relevance, and significance of this dissertation. Chapter 2 reviews the literature and existing work, discusses general concepts of data mining, data mining algorithms, and decision support methodologies, and identifies the unsolved problems and research gaps used to develop the research framework. Chapter 3 presents a theoretical analysis of data mining methods, together with the techniques for evaluating them. Chapter 4 presents the details of decision tree techniques and their formulation, analysis, and results, along with the chain of evidence for each decision tree algorithm, the quantitative instrument, and the survey results; it also describes the data collection and data analysis methods used to answer the research questions. Chapter 5 proposes and develops a new improved algorithm for decision tree learning, presents its pseudocode, validates it on various datasets using different performance measures, and includes an empirical comparison and investigation results. Chapter 6 formulates the decision support framework using the proposed improved algorithm, discusses real-world applications, and presents a practical implementation of the decision tree algorithm. Chapter 7 concludes with a summary of the contributions and findings, describes assumptions and limitations, and suggests future research.

Chapter 1

INTRODUCTION TO DATA MINING AND DECISION SUPPORT

1.1 INTRODUCTION

In the present business world, a typically huge task is extracting information from data. This is not only the task of querying a database about its content, but also the process of uncovering non-structured information that lies hidden in the raw data. Data mining [33] is a big umbrella covering many distinct problems. One of the most interesting among them can be described in the following abstract form. Suppose that a shopping mall possesses collections of products, and that all the products come from a limited number of possible product kinds, so that several people can purchase products of the same kind. A reasonable question is which subsets of products are often purchased by people, or which products are likely to be purchased together. In general, given a dataset, it is useful to understand which patterns, if any, occur in the dataset. Moreover, once a pattern is discovered, it can be interesting to understand how frequent that pattern actually is.
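As a concrete illustration of this frequent-pattern question, the following sketch enumerates product pairs in a handful of hypothetical customer baskets and keeps those that appear together in at least 60% of the baskets (both the data and the threshold are invented for illustration):

```python
from itertools import combinations

# Hypothetical market-basket data: each set is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Brute-force enumeration of all 2-item patterns above the support threshold.
items = sorted(set().union(*transactions))
frequent_pairs = {
    frozenset(p): support(set(p), transactions)
    for p in combinations(items, 2)
    if support(set(p), transactions) >= 0.6
}
```

A real mining algorithm such as Apriori avoids this brute-force enumeration by pruning candidate itemsets, but the underlying notion of pattern support is the same.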

The volume and variety of data captured by companies today is staggering. This exponential growth in collected data increases the demand for data analysis and the need to turn this information into business processes and actionable plans for smarter strategic decisions. Data mining technology can provide fresh insights into emerging trends and behaviours that significantly increase revenue and reduce cost.

This thesis is concerned with decision support, that is, collecting and presenting data from several sources to help people make decisions based on data gathered from a wide range of sources, and with data mining, that is, extracting useful insights from large and detailed collections of data. With the advancement of business dealings and the increased possibilities in modern society for companies and institutions to gather data cheaply and efficiently, this subject has become of growing importance. This attention has motivated a rapidly growing research field, with improvements on a theoretical as well as a practical level, and with the availability of a variety of commercial tools. Unfortunately, the extensive application of this technology has been limited by a major assumption in mainstream data mining methods: that all data resides, or can be made to reside, in a single table. This assumption prevents the use of these data mining tools in certain vital domains, or requires substantial massaging and modifying of the data as a pre-processing step. This constraint has led to a relatively recent interest in richer data mining paradigms that do permit structured data, as opposed to the traditional flat representation. Over the last decade, we have seen the emergence of data mining techniques that cater to the analysis of structured data. These techniques are normally upgrades of recognised and accepted data mining techniques for tabular data, with an emphasis on handling the richer representational setting. Within these techniques, which we will collectively refer to as structured data mining techniques, we can identify a number of paradigms or 'traditions', each inspired by an existing and well-known choice for representing and manipulating structured data.
Data mining methods have also become part of integrated Information Technology (IT) software packages, which form the three tiers of the decision support aspect of IT. Starting from the data sources (such as operational databases, semi- and non-structured data and reports, websites, etc.), the first tier is the data warehouse, followed by OLAP (On-Line Analytical Processing) servers, and ending with analysis tools, of which data mining tools are the most advanced. The main advantage of the integrated approach is that the pre-processing steps are much easier and more convenient. Since this part is often the key problem of the KDD process [33] (and can consume most of a KDD project's time), this industry trend is very significant for increasing the use of data mining. However, the risk of the integrated IT approach comes from the fact that data mining techniques are much more complex than OLAP, so users have to be trained appropriately.

1.2 THE KDD PROCESS

The main steps, taken one by one from the extraction of data from large repositories to the interpretation of the results, are shown in Figure 1.1. The knowledge discovery process (Figure 1.2) is iterative and interactive, consisting of nine steps. Note that the process is iterative at each step, meaning that moving back to adjust previous steps may be required. The process has many "artistic" features, in the sense that one cannot give a single formula or a complete taxonomy of the correct choices for each step and application type. It is therefore necessary to understand the process deeply, together with the different needs and possibilities at each step. A taxonomy of data mining methods [33] facilitates this process; it is described in the next section.

The process starts with defining the KDD goals, and "ends" with the use of the discovered knowledge. As a consequence, changes may have to be made in the application domain (such as offering different features to mobile users in order to reduce churn). This closes the loop: the effects are then measured on the new data repositories, and the KDD process [54] is launched again. The following is a brief description of the nine-step KDD process, starting with a managerial step:

[Illustration not included in this excerpt]

Figure 1.1: Steps Taken in Data Mining

1.2.1. Developing an understanding of the application domain

This is the preliminary step. It sets the scene for understanding what should be done with the various decisions (about transformation, algorithms, representation, etc.). Those in charge of a KDD project need to understand and define the goals of the end-user and the environment in which the knowledge discovery process will take place (including relevant prior knowledge). As the KDD process [33] continues, there may even be a revision and tuning of this step. Once the KDD goals are understood, the pre-processing of the data starts, as described in the next three steps (note that some of the approaches here are similar to data mining algorithms, but are employed in the pre-processing context).

1.2.2. Selecting and creating a data set

Selecting and creating a data set on which discovery will be performed. Given the defined goals, the data that will be used for the knowledge discovery should be determined. This includes finding out what data is available, obtaining additional necessary data, and then integrating all the data for the knowledge discovery into one data set, including the attributes that will be considered in the process. This step is very important because the data mining learns and discovers from the available data, which is the evidence base for building the models. If some important attributes are missing, the entire study may fail. For the success of the process, it is good to consider as many attributes as possible at this stage. On the other hand, collecting, organizing, and operating complex data repositories is expensive, and there is a trade-off against the opportunity to best understand the phenomena. This trade-off is one aspect where the interactive and iterative nature of the KDD process comes into play: one starts with the best available data set and later expands it, observing the effect on knowledge discovery and modelling.

1.2.3. Pre-processing and cleansing

In this stage, data reliability is improved. It includes data cleaning, such as handling missing values and removing noise or outliers. These methods can range from doing very little to becoming the major part (in terms of time consumed) of the KDD process in certain projects. The stage may involve complex statistical methods, or the use of a particular data mining algorithm in this context. For example, if one suspects that a certain attribute is not reliable enough or has too many missing values, this attribute could become the target of a supervised data mining algorithm: a prediction model is built for the attribute, and the missing values are then predicted. The extent to which one pays attention to this stage depends on many factors. In any case, studying these aspects is important and usually yields informative insight in itself about enterprise information systems.
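As a minimal sketch of predicting a missing value from the remaining attributes, the fragment below fills a missing field from the complete record whose other attribute is closest; the records, attribute names, and nearest-neighbour choice are all invented for illustration (in practice, the prediction model could be any supervised learner trained on the complete records):

```python
# Hypothetical records with one missing "income" value (None).
records = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": 50000},
    {"age": 45, "income": 70000},
    {"age": 36, "income": None},
]

def impute_nearest(records, target="income", feature="age"):
    """Fill missing target values from the record with the closest feature value."""
    known = [r for r in records if r[target] is not None]
    for r in records:
        if r[target] is None:
            nearest = min(known, key=lambda k: abs(k[feature] - r[feature]))
            r[target] = nearest[target]
    return records

records = impute_nearest(records)
```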

1.2.4. Data transformation

In this stage, the generation of better data for the data mining is prepared and developed. Methods here include dimension reduction (such as feature selection and extraction, and record sampling) and attribute transformation (such as discretization of numerical attributes and functional transformations). The success of the whole KDD project depends on this step, which is considered a crucial one. It is, however, usually very project-specific: in medical examinations, for example, the ratio between attributes may often be the most important factor rather than each attribute by itself. In marketing, one needs to consider effects beyond our control, as well as efforts and temporal issues (such as studying the effect of advertising build-up). However, even if we do not use the correct transformation at the start, we may obtain an unexpected effect that hints at the transformation needed (in the next iteration). Thus the KDD process reflects upon itself and leads to an understanding of the transformations needed (much like the intuition of an expert in a certain field regarding key leading indicators). Having completed the above four steps, the following four steps relate to the data mining part, where the emphasis is on the algorithmic aspects employed in each project.
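One attribute transformation mentioned above is discretization of numerical attributes. A minimal equal-width binning sketch (assuming the values are not all identical, so that the bin width is nonzero; the data is invented):

```python
def discretize(values, bins=3):
    """Equal-width binning: map each numeric value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    def bin_of(v):
        # Clamp the maximum value into the last bin.
        return min(int((v - lo) / width), bins - 1)
    return [bin_of(v) for v in values]

ages = [22, 25, 31, 40, 47, 58, 63]
labels = discretize(ages, bins=3)  # three age bands: young / middle / old
```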

1.2.5. Choosing the appropriate Data Mining task

We are now ready to decide which type of data mining to use: for example, classification, regression, or clustering. This depends mostly on the KDD goals, and also on the previous steps. There are two major goals in data mining: prediction and description. Prediction is often referred to as supervised data mining, while descriptive data mining includes the unsupervised and visualization aspects of data mining. Most data mining methods are based on inductive learning, where a model is built explicitly or implicitly by generalizing from a sufficient number of training examples. The fundamental assumption of the inductive approach is that the trained model is applicable to future cases. The strategy also takes into account the level of meta-learning for the particular set of available data.

1.2.6. Choosing the Data Mining algorithm

Having chosen the strategy, we now decide on the tactics. This stage includes selecting the specific method to be used for searching for patterns (including multiple inducers). For example, when weighing precision against understandability, the former is better served by neural networks, while the latter is better served by decision trees. For each strategy of meta-learning there are several possibilities for how it can be accomplished. Meta-learning focuses on explaining what causes a data mining algorithm to succeed or fail on a particular problem; thus, this approach attempts to identify the conditions under which a data mining algorithm is most suitable. Each algorithm has parameters and tactics of learning (such as ten-fold cross-validation or another division for training and testing).
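Ten-fold cross-validation, mentioned above, partitions the data into ten folds, training on nine and testing on the remaining one in turn. A small index-splitting sketch of the idea:

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        # All other folds together form the training set.
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

splits = list(k_fold_indices(100, k=10))
```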

1.2.7. Employing the Data Mining algorithm

Finally, the implementation of the data mining algorithm is reached. In this step we may need to run the algorithm several times until a satisfying result is obtained, for example by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.

1.2.8. Evaluation

In this step we evaluate and interpret the mined patterns (rules, reliability, etc.) with respect to the goals defined at the beginning. Here we consider the pre-processing steps with respect to their effect on the data mining algorithm's results (for example, adding features in Step 4 and repeating from there). This step focuses on the comprehensibility and usefulness of the induced model. The discovered knowledge is also documented here for further use. The last step is the usage of, and overall feedback on, the patterns and discovery results obtained by the data mining.

1.2.9. Using the discovered knowledge

We are now ready to incorporate the knowledge into another system for further action. The knowledge becomes active, in the sense that we may make changes to the system and measure the effects. In fact, the success of this step determines the effectiveness of the entire KDD process. There are many challenges in this step, such as losing the "laboratory conditions" under which we have operated. For instance, the knowledge was discovered from a certain static snapshot (usually a sample) of the data, but now the data becomes dynamic. Data structures may change (certain attributes become unavailable), and the data domain may be modified (for example, an attribute may take a value that was not assumed before) [31].

1.3 DATA MINING, A STEP OF THE KDD PROCESS

Data mining is a step in the knowledge discovery process. The term data mining is more widespread than the term knowledge discovery from large datasets. Data mining is the procedure of discovering interesting knowledge from large amounts of data residing in databases, data warehouses, or other information repositories. Based on this view, the architecture of a typical data mining system has the following main components:

1.3.1 Database, data warehouse, or other information repositories

This is one database or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.

1.3.2. Database or data warehouse server

The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.

1.3.3 Knowledge base

This is the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included.
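A concept hierarchy can be represented very simply as a mapping from low-level attribute values to higher-level concepts. The sketch below rolls hypothetical city values up to a region level (all names are invented for illustration):

```python
# Hypothetical concept hierarchy: raw city values roll up to a region level.
hierarchy = {
    "Pune": "West", "Mumbai": "West",
    "Delhi": "North", "Chandigarh": "North",
}

def roll_up(records, attribute, hierarchy):
    """Replace attribute values with their more abstract parent concept."""
    return [{**r, attribute: hierarchy.get(r[attribute], r[attribute])}
            for r in records]

customers = [{"city": "Pune"}, {"city": "Delhi"}, {"city": "Mumbai"}]
regions = roll_up(customers, "city", hierarchy)
```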

1.3.4. Data Mining Engine

This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.

1.3.5. Pattern evaluation module

This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search on interesting patterns. It may use interestingness thresholds to filter out discovered patterns. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process, so as to confine the search to only the interesting patterns.

1.3.6. User interface

This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task. It also provides information to help focus the search, and supports exploratory data mining based on intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

1.4 DATA MINING FUNCTIONALITIES

Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions.

Typically, users may have no idea which kinds of patterns in their data may be interesting, and hence may like to search for several different kinds of patterns in parallel. It is therefore important to have a data mining system that can mine multiple kinds of patterns to accommodate different user expectations or applications. Furthermore, data mining systems should be able to discover patterns at various levels of abstraction. They should also allow users to specify hints to guide or focus the search for interesting patterns. Since some patterns may not hold for all of the data in the database, a measure of certainty is associated with each discovered pattern. Data mining functionalities, and the kinds of patterns they can discover, are described below.

1.4.1 Concept / Class Description: Characterization and Discrimination

Data can be associated with classes or concepts. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived via (1) data characterization, by summarizing the data of the class under study, (2) data discrimination, by comparing the target class with one or a set of comparative classes, or (3) both data characterization and discrimination.

Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are usually collected by a database query. Data discrimination is a comparison of the general features of target-class data objects with the general features of objects from one or a set of contrasting classes. The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries.
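As a minimal sketch (not part of the thesis), characterization and discrimination can be expressed as per-class aggregation followed by a comparison of the summaries; the customer records, attribute names and class labels below are invented for illustration.

```python
from statistics import mean

# Hypothetical customer records: (customer_class, annual_spend, store_visits)
records = [
    ("premium", 5200.0, 48),
    ("premium", 4700.0, 52),
    ("regular", 900.0, 12),
    ("regular", 1100.0, 15),
]

def characterize(rows, target_class):
    """Summarize the general features of one class of data objects."""
    selected = [r for r in rows if r[0] == target_class]
    return {
        "count": len(selected),
        "avg_spend": mean(r[1] for r in selected),
        "avg_visits": mean(r[2] for r in selected),
    }

premium = characterize(records, "premium")
regular = characterize(records, "regular")

# Discrimination: compare target-class features against a contrasting class
contrast = {k: premium[k] - regular[k] for k in ("avg_spend", "avg_visits")}
```

In a real system the `selected` rows would come from a database query rather than an in-memory list, but the summarize-then-compare pattern is the same.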

1.4.2 Association Rule Mining

The aim of association rule mining is to find association rules that satisfy predefined minimum support and confidence thresholds in a given database. The problem is usually decomposed into two subproblems. The first is to find those itemsets whose occurrences exceed a predefined threshold in the database; such itemsets are called frequent or large itemsets. The second is to generate association rules from those large itemsets subject to a minimum-confidence constraint. Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules from this itemset are generated as follows: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking its confidence, this rule can be determined to be interesting or not. Further rules are then generated by deleting the last item of the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. This process is repeated until the antecedent becomes empty. Since the second subproblem is quite straightforward, most research has concentrated on the first.
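As an illustrative sketch (not taken from the thesis), the confidence check over a large itemset can be written as follows. The toy transactions, item names and thresholds are assumptions, and for simplicity all non-empty antecedents are enumerated rather than following the exact last-item-deletion order described above.

```python
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_from_itemset(large, min_conf):
    """Generate rules A => (large - A) whose confidence meets min_conf."""
    out = []
    for r in range(len(large) - 1, 0, -1):
        for antecedent in combinations(sorted(large), r):
            # confidence(A => B) = support(A u B) / support(A)
            conf = support(large) / support(frozenset(antecedent))
            if conf >= min_conf:
                out.append((set(antecedent), large - set(antecedent), round(conf, 2)))
    return out

rules = rules_from_itemset(frozenset({"bread", "milk", "butter"}), min_conf=0.6)
```

With these transactions, {bread, butter} => {milk} holds with confidence 1.0, while {milk} => {bread, butter} falls below the 0.6 threshold and is pruned.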

1.4.3 Classification and Prediction

Classification is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of using the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e. data objects whose class label is known). The resulting model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. Decision trees can easily be converted to classification rules. There are many other methods for building classification models, such as naive Bayesian classification, support vector machines, and k-nearest-neighbour classification.

Whereas classification predicts categorical labels, prediction models continuous-valued functions. That is, it is used to forecast missing or unavailable numerical data values rather than class labels. Regression analysis is the statistical methodology most frequently used for numeric prediction, although other methods exist as well.
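A minimal sketch of numeric prediction via least-squares linear regression, the statistical workhorse mentioned above; the advertising-spend data below is invented for illustration.

```python
from statistics import mean

# Hypothetical data: advertising spend (x) vs. observed sales (y)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.1, 5.9, 8.1]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

a, b = fit_line(xs, ys)
predicted = a + b * 5.0  # forecast a value not present in the data
```

The fitted slope is roughly 1.98, so the model forecasts sales of about 10.0 for a spend of 5.0, illustrating how prediction fills in unavailable numeric values rather than class labels.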

1.4.4 Clustering

Unlike classification and prediction, which analyze class-labelled data objects, clustering analyzes data objects without consulting a known class label. Clustering can be used to generate such labels. In general, the class labels are simply not present in the training data, since they are not known to begin with. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. Each cluster that is formed can be viewed as a class of objects, from which rules can be derived.
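The grouping principle above can be sketched with k-means, one common clustering algorithm; the 2-D points and initial centroids below are illustrative assumptions, and the simple fixed-iteration loop stands in for a proper convergence test.

```python
from statistics import mean

# Hypothetical 2-D points with no class labels
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]

def kmeans(points, centroids, iterations=10):
    """Group objects by assigning each point to its nearest centroid,
    then recomputing centroids, thereby minimizing intra-cluster distance."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                              + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        centroids = [(mean(p[0] for p in c), mean(p[1] for p in c))
                     for c in clusters]
    return clusters, centroids

clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The two resulting clusters (points near the origin versus points near (8, 9)) can each be treated as a discovered class, from which rules could then be derived.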

1.4.5 Outlier Analysis

A database may contain data objects that do not comply with the general behaviour or model of the data. These data objects are known as outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications such as fraud detection, the rare events can be more interesting than the more frequently occurring ones. The analysis of outlier data is referred to as outlier mining. Outlier analysis may uncover fraudulent use of credit cards by detecting purchases of extremely large amounts for a given account in comparison with the regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase, or the purchase frequency.
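The credit-card example can be sketched with a simple z-score rule; the charge amounts and the 2-standard-deviation threshold are illustrative assumptions (real fraud detection uses far richer features and models).

```python
from statistics import mean, pstdev

# Hypothetical charges on one credit-card account
charges = [42.0, 55.0, 38.0, 61.0, 47.0, 2500.0, 52.0]

def outliers(values, threshold=2.0):
    """Flag values whose z-score deviates from the mean beyond the threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

suspicious = outliers(charges)
```

The 2500.0 charge is flagged because it is extremely large relative to the account's regular charges, which is exactly the comparison described above.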

1.4.6 Evolution Analysis

Data evolution analysis describes and models regularities or trends for objects whose behaviour changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.

1.5 COMMON USES OF DATA MINING

Data mining tools are used to predict future trends and behaviours, which helps businesses make knowledge-driven decisions. The automated analysis performed by data mining tools is the foundation of decision support systems.

Data mining tools can answer business questions that were traditionally too time-consuming to resolve. Today, data mining is used mostly by companies with a strong consumer focus: retail, financial, communication and marketing organizations. It allows them to determine relationships among internal factors such as price, product positioning, or staff skills, and external factors such as economic indicators, competition and customer demographics. The combination of data mining and decision support systems enables companies to determine the impact on sales, customer satisfaction and corporate profits, and also permits them to drill down into summary information to view detailed transactional data [33].

A large number of corporations use data mining today, and different companies use it for different purposes. Here are a few areas in which corporations use data mining to attain a strategic benefit:

1. Direct marketing: data mining tools are used to identify the customers most likely, or most desirable, to buy certain products. This information can be used for numerous marketing activities.
2. Trend analysis: with trend analysis, companies are able to predict trends in the marketplace. Using this information, companies can gain a strategic advantage from current market trends.
3. Fraud detection: companies use data mining techniques to predict which business transactions or customers are likely to be fraudulent; this is used for loan approval, insurance claims, and cellular phone or credit card purchases.
4. Forecasting in financial markets: there are many opportunities to predict financial markets with data mining methods.

Apart from these applications companies can use data mining also for:

Business information

- Investment analysis
- Loan approval

Manufacturing information

- Controlling and scheduling
- Network management
- Experiment result analysis

Scientific information

- Sky survey cataloguing
- Bio sequence data bases
- Geosciences: quake finder.

1.6 DECISION SUPPORT

When proposing decision support, we need to answer many important questions. What precisely is decision making, how is it accomplished by people, and how should we support it? Can we categorize decisions and decision processes? Which of them can be effectively supported by information technology? How can the main components of decision making be described? For computerised processes, what are the input data and the expected output data? What exactly constitutes a "good decision"?

1.6.1 Basic Discipline

When speaking of decision making, a computer scientist's questions typically start with: who makes the decision, the human or the computer, and what kind of decision has to be made?

In decision support, we aim to help people who make decisions; consequently, we are mostly interested in human decision making. However, in computer science and related disciplines, such as artificial intelligence, the purpose is also to build "intelligent" systems, i.e., computer programs and machines that are capable of making autonomous decisions by themselves. That is, the focus there is on machine decision making. As a consequence, we categorize disciplines concerned with decision making into two main groups: decision sciences and decision systems, concerned with human and machine decision making, respectively. Decision sciences refer to a broad interdisciplinary field concerned with all aspects of human decision making. It draws on economics, forecasting, statistical decision theory, and cognitive psychology, and is normally divided into three main groups:

1. The first group is concerned with rational decision making. The approach is referred to as normative or prescriptive: the decision problem is defined in terms of identifying the best (or optimal) decision, assuming an ideal decision maker who is fully informed, able to compute with perfect accuracy, and fully rational. Approaches developed in this area are mainly theoretical; classic examples include decision theory, multi-attribute utility theory and game theory [6].
2. The second group is interested in how people actually make decisions. It has been clearly shown that people are rational only to some extent; they tend to use rules of thumb and take shortcuts to choose among alternatives. Frequently these shortcuts work well, but often they lead to systematic biases and serious errors [7]. This approach is called descriptive and is typical of research in the cognitive sciences.
3. The third group is concerned with decision support: given what we know about rational decision making and actual behaviour, how can we help people improve their decision making? This is the main area of concern for computer scientists and information technologists, who try to deliver effective methods and tools for assisting human decision makers.

1.6.2 Decision Making

Decision making is typically defined as a mental process that involves judging multiple choices or alternatives in order to select one, so as to best accomplish the aims or goals of the decision maker [6]. Consequently, there are two main components involved in decision making: the set of alternatives judged by the decision maker, and the goals to be satisfied by the choice of one alternative. The output of this process can be an action or an opinion of choice.

Decision making is a process. This means that in general it takes some time and effort until the choice is made, involving several activities, such as [6, 9]:

- Identification of the decision problem;
- Gathering and verifying pertinent information;
- Identifying decision alternatives;
- Anticipating the consequences of decisions;
- Making the decision;
- Informing concerned people and the public of the decision and its rationale;
- Implementing the selected alternative;
- Evaluating the consequences of the decision.

1.6.3 Classification of decision problems

Decision problems differ in nature. On the one hand, we encounter daily problems, which are typically simple and easy to solve: when to get up in the morning, what kind of clothes to buy, whether to stop at the red light or not, etc. On the other hand, there are challenging problems which require large resources, affect numerous people and have significant consequences: which strategies to adopt in the Indian automobile market, how to organize public transportation in a capital city, which customer's creditworthiness is good, etc. Somewhere in between are important problems of individuals (what to study?), families (where to live?) and organisations (how to survive an economic crisis?).

In decision support, we are concerned only with the more difficult decision problems, which are worth approaching in a structured and organized way and which have prominence. In other words, it should make sense to gather information about these problems, ponder and discuss the possible solutions, and in general support the process with some method, computer program or information system. It is also important to understand that it is possible to effectively support only decision problems and processes that are sufficiently well understood. When approaching a problem, we have to identify what precisely we are deciding about, what the goals are and what the probable consequences of the decision are; we should at least partially know the alternatives and their effects; we have to be aware of possible uncertainties; etc.

Decision problems can be classified along diverse dimensions [6, 9]. One classification is into routine and non-routine problems, which often implies a substantial difference in difficulty. Routine decisions are taken often and repeatedly. The decision maker typically knows them well and feels familiar with the problem. All key factors, consequences and uncertainties are well understood and under control. Such decisions are typically easy. In contrast, non-routine decisions tend to be more difficult, predominantly because of the lack of knowledge and experience in taking such decisions. Often, non-routine decisions are risky and have vital consequences [12].

1.6.4 Decision Support System

In decision support systems, we use models to forecast the outcomes of decision options. For multi-attribute decision problems, i.e. decision-making situations in which the alternatives are described by several attributes that cannot be optimized simultaneously, we develop multi-attribute decision models. Most frequently such models are built in a hierarchical fashion, starting from some general but unfocused goal statement, which is progressively refined into more precise sub-goals.

The construction of hierarchical multi-attribute decision models is challenging and may involve many tens of attributes. In most cases, the models are developed manually in a tiresome and prolonged process in which the creators (decision analysts, decision makers, experts, and knowledge engineers) use their knowledge about the problem and apply their skill and experience. On the other hand, since computers offer comparatively economical and readily available means to collect data, there is an increasing volume of data about decisions previously made. This data may contain useful information for decision support, for the discovery of essential principles, and for various analysis tasks.

With this technique, a hierarchical multi-attribute decision model can be developed using decision examples that may be taken either from an existing database of past decisions, or provided explicitly by the decision maker. Each example is described by a set of attributes and its utility [12].
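The hierarchical refinement of a goal into sub-goals can be sketched as follows. This numeric weighted aggregation is a deliberate simplification (methods such as DEX use qualitative utility tables rather than weights), and every attribute name, weight and score below is invented for illustration.

```python
# A tiny hierarchical multi-attribute model: the overall goal "car quality"
# is refined into two sub-goals, each aggregating lower-level attributes.

def comfort(seats, luggage):
    """Sub-goal aggregating two basic attributes (illustrative weights)."""
    return 0.6 * seats + 0.4 * luggage

def economy(price, maintenance):
    """Second sub-goal; higher scores mean cheaper to buy and to run."""
    return 0.7 * price + 0.3 * maintenance

def car_quality(alternative):
    """Root of the hierarchy: aggregates the two sub-goal utilities."""
    c = comfort(alternative["seats"], alternative["luggage"])
    e = economy(alternative["price"], alternative["maintenance"])
    return 0.5 * c + 0.5 * e

# Decision examples: alternatives rated on a 1 (worst) to 5 (best) scale
alternatives = {
    "car_A": {"seats": 4, "luggage": 5, "price": 2, "maintenance": 3},
    "car_B": {"seats": 3, "luggage": 3, "price": 5, "maintenance": 4},
}
best = max(alternatives, key=lambda name: car_quality(alternatives[name]))
```

In the model-development setting described above, the interesting task is the reverse direction: given rated decision examples like these, discover the hierarchy and the aggregation functions rather than specifying them by hand.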

1.7 CONTRIBUTIONS OF THIS THESIS

This thesis embodies several contributions:

1. We discuss a framework and a collection of algorithms that construct decision trees using very large-scale repositories of human knowledge.
2. We propose an improved decision tree (ID3) algorithm and a novel kind of decision tree analysis performed during decision generation.
3. We instantiate our decision tree generation methodology for two specific knowledge repositories, including the Open Directory, as well as for categories with few training examples.
4. We also develop a decision tree model using the Weka tool and describe a way to further enhance the knowledge embedded in the large repositories.

Chapter 2

RESEARCH BACKGROUND AND PROBLEM DEFINITION

2.1 A SURVEY OF EXISTING WORK

There are other accounts of decision support that focus on particular disciplines, such as operations research and management science [37], decision analysis [19], decision support systems [71], and others, including data warehousing, group decision support systems [37, 71] and computer-supported cooperative work. Decision analysis, as introduced by Clemen, applies decision theory [19]. It offers a framework for analyzing decision problems by structuring and breaking them down into more manageable parts, and explicitly considering the possible alternatives, available information, uncertainties involved and relevant preferences. Clemen introduces three models in decision making.

2.1.1 Problem with integration of Data Mining and Decision Support

The problem of poor DM and DS integration has only recently been clearly acknowledged in the scientific literature, and little work has been done along the lines presented here. Some recent conference calls for papers confirm the relevance of our outlook, such as [1]: "Recently emerged the idea that database management systems should provide support and technology for data mining, as it occurred for OLAP and data warehouses in the last decade", while several others mention "integration of data warehousing, OLAP and data mining" [44] in their lists of preferred topics of interest. Some conferences are aimed more specifically at decision making and less so at the technological infrastructure supporting it, as in the work of [7], while others are associated with the field of multi-agent systems [51]: "Integrated knowledge intensive systems emerge in all domains of business and engineering, when intelligent decision support requires knowing how this knowledge is produced, measured, communicated, and interpreted"; see [98] for an agent-based approach to DM. The emerging domain of grid intelligence [50] is concerned with large-scale distributed enterprise information systems and can surely be regarded as promising, since it attempts to integrate the so-called data and computational grids with the knowledge grid [16].

Rapid developments in information and sensor technologies (IT and ST) have occurred, along with the availability of large-scale scientific and business data repositories and database management technologies. These developments, combined with computing technologies, computational methods and processing speeds, have opened the floodgates to data-driven models and pattern matching [9]. The use of sophisticated and computationally intensive analytical methods is expected to become even more commonplace with recent research breakthroughs in computational methods and their commercialization by leading vendors [82, 31]. Advanced models and newly discovered patterns in complex, nonlinear and stochastic systems, including the natural and human environments, have demonstrated the efficiency of these approaches. However, applications that can utilize these tools in the context of scientific databases in a scalable manner have only begun to emerge [34, 31].

2.1.2 Evolution of Decision Support System (DSS)

According to Keen [49], "the concept of decision support has evolved from two main areas of research: the theoretical studies of organizational decision making done at the Carnegie Institute of Technology during the late 1950s and early 1960s, and the technical work on interactive computer systems, mainly carried out at the Massachusetts Institute of Technology in the 1960s." The concept of DSS became an area of research of its own in the middle of the 1970s, before gaining in intensity during the 1980s. In the middle and late 1980s, various types of DSS evolved from single-user, model-oriented DSS, such as executive information systems (EIS), group decision support systems (GDSS), and organizational decision support systems (ODSS).

According to Sol [36], the definition and scope of DSS has been migrating over the years. In the 1970s, DSS was described as "a computer based system to aid decision making". In the late 1970s, the DSS movement started focusing on "interactive computer-based systems which help decision-makers utilize data bases and models to solve ill-structured problems". In the 1980s, DSS should provide systems "using suitable and available technology to improve effectiveness of managerial and professional activities", and by the end of the 1980s DSS faced a new challenge towards the design of intelligent workstations [36].

In 1987, Texas Instruments completed development of the Gate Assignment Display System (GADS) for United Airlines. This decision support system, used in the management of ground operations at various airports, beginning with O'Hare International Airport in Chicago and Stapleton Airport in Denver, Colorado [24, 28], is credited with significantly reducing travel delays. In 1990, data warehousing and on-line analytical processing (OLAP) began broadening the realm of DSS. As the millennium approached, new Web-based analytical applications were introduced.

Relational learning is not a new research area; it has a long history. For learning in multi-relational domains, Muggleton and De Raedt [65] introduced the concept of Inductive Logic Programming (ILP) and its theory, methods and implementations. The selection of aggregation methods and parameters also has important effects on the results in noisy real-world domains [53]. Krogel et al. have presented a comparative evaluation of approaches to Boolean and numeric aggregation in propositionalization, though their results are inconclusive [54].

In contrast, Perlich and Provost [68] have found that logic-based relational learning and logic-based (binary) propositionalization perform poorly on a noisy domain compared to numerical propositionalization. Other variants of relational learning include distance-based methods [68, 52]. The central idea of distance-based methods is that it is possible to compute the mutual distance [38] between each pair of objects for clustering. Probabilistic Relational Models (PRMs) offer another approach to relational data mining that is grounded in a sound statistical framework [29]. Getoor et al. present a model that specifies, for each attribute of an object, its (probabilistic) dependency on other attributes of that object and on attributes of related objects.

For a specific purpose, a combined approach called Structural Logistic Regression (SLR) is used, which combines relational and statistical learning [72]. Database numeric aggregation techniques [32] propose a method in which aggregation is done using some of the built-in functions of a common relational database system [29, 56].

An alternative approach is suggested by Perlich and Provost, whose main technique uses vector distances for dimensionality reduction and is capable of aggregating high-dimensional categorical attributes that have traditionally posed an important challenge in relational modelling [68].

2.1.3 A Survey of existing decision tree algorithm (Traditional)

The ID3 (Iterative Dichotomiser 3) decision tree algorithm was introduced in 1986 by Quinlan Ross [73]. It is based on Hunt's algorithm and is implemented serially. Like other decision tree algorithms, the tree is constructed in two phases: tree growth and tree pruning. ID3 uses the information gain measure to choose the splitting attribute, and it accepts only categorical attributes when constructing a tree model [73]. ID3 does not give accurate results when there is too much noise or detail in the training data set; thus, intensive pre-processing of the data is carried out before building a decision tree model with ID3.

The C4.5 algorithm, developed by Quinlan Ross in 1993 [74], is an improvement of ID3. It is based on Hunt's algorithm and, like ID3, is implemented serially. Pruning takes place in C4.5 by replacing an internal node with a leaf node, thereby reducing the error rate [74]. Unlike ID3, C4.5 accepts both continuous and categorical attributes in building the decision tree. It has an enhanced method of tree pruning that reduces misclassification errors due to noise or excessive detail in the training data set.
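The information-gain measure at the heart of ID3 (and refined by C4.5's gain ratio) can be sketched as follows; the toy weather-style training set and attribute names are illustrative assumptions.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list, as used by ID3/C4.5."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Entropy reduction achieved by splitting on one categorical attribute."""
    n = len(rows)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(subset) / n * entropy(subset)
                    for subset in split.values())
    return entropy(labels) - remainder

# Toy training set: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, 0, labels)
gain_windy = information_gain(rows, 1, labels)
```

Here splitting on outlook separates the classes perfectly (gain 1.0 bit) while splitting on windy gains nothing, so ID3 would choose outlook as the root test; C4.5 additionally divides the gain by the split's intrinsic information to penalize many-valued attributes.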

SLIQ (Supervised Learning In Quest) was introduced by Mehta et al. in 1996 [63]. It is a fast, scalable decision tree algorithm that can be implemented in serial and parallel fashion, and it is not based on Hunt's algorithm for decision tree classification. It partitions a training data set recursively using a breadth-first greedy strategy that is integrated with a pre-sorting technique during the tree-building phase [63]. One of the disadvantages of SLIQ is that it uses a class-list data structure that is memory-resident, thereby imposing memory restrictions on the data.

In 1997, M. Bohanec et al. [12] presented a novel method for the development of hierarchical multi-attribute decision models from an unstructured set of decision examples. The method develops a hierarchical structure by discovering new aggregate attributes and their descriptions. Each new aggregate attribute is described by an example set whose complexity is lower than that of the initial set. The method is based on function decomposition.

2.1.4 A Survey of existing decision tree algorithm (Advanced)

In 2005, A. Zimmermann and B. Bringmann [100] presented CTC (Correlating Tree patterns for Classification), a new approach to structural classification. It uses the predictive power of tree patterns correlating with the class values, and combines tree mining with sophisticated pruning techniques to find the k most discriminative patterns in a dataset. Their experiments show that CTC classifiers achieve good accuracy while the induced models are smaller than those of existing approaches.

In 2007, Siegfried Nijssen et al. [81] presented DL8, an algorithm for finding decision trees that maximize an optimization criterion under constraints, and successfully applied this algorithm to a large number of datasets. They showed that there is a clear link between DL8 and frequent itemset miners, which means that many of the optimizations proposed for itemset miners can also be applied when mining decision trees under constraints. In experiments, they compared the test-set accuracies of trees mined by DL8 and C4.5. Under the same frequency thresholds, they found that the trees learned by DL8 are often significantly more accurate than trees learned by C4.5; when the best settings of both algorithms are compared, J48 performs significantly better for 45% of the datasets. Many open questions regarding the instability of decision trees, the influence of size constraints, heuristics, pruning strategies, and so on, may be answered by further studies of the results of DL8.

In 2008, B. Chandra and P. Paul Varghese found [11] that traditional decision tree algorithms face the problem of having sharp decision boundaries, which are hardly found in any real-life classification problems. They proposed a fuzzy Supervised Learning In Quest (SLIQ) decision tree (FS-DT) algorithm: "It is aimed at constructing a fuzzy decision boundary instead of a hard decision boundary. Size of the decision tree constructed is another very important parameter in decision tree algorithms. Large and deeper decision tree results in incomprehensible induction rules." The proposed FS-DT algorithm modifies the SLIQ decision tree algorithm to construct a fuzzy binary decision tree of considerably reduced size. The performance of the FS-DT algorithm was compared with SLIQ using several real-life datasets taken from the UCI Machine Learning Repository. The FS-DT algorithm outperforms its crisp counterpart in terms of classification accuracy, and also achieves more than a 70% reduction in the size of the decision tree compared to SLIQ.

In 2009, Hanif D. Sherali et al. [35] presented a 0-1 programming approach to optimally prune decision trees, while allowing for a variety of (possibly multiple) objective functions, as well as side-constraints that control the size, statistical characteristics, and structural properties of the resultant tree. They showed that the basic model, which seeks to minimize some general penalty function, possesses a totally unimodular, interval-matrix-based set-partitioning structure that is transformable to a shortest-path problem on an acyclic graph. Additionally, this model with an additional constraint governing the number of leaf nodes in the final tree can be solved in O(mn) time, where m is the number of leaf nodes in the original tree. For more generally constrained problems, they showed empirically that the underlying model structure facilitates the efficient solution of large-scale problems using off-the-shelf commercial software such as CPLEX (2005). To motivate this modelling approach, they also described a variety of objective functions and side-constraints that can be used in practice, expressed in terms of the model variables and predicated on the key feature of whether or not a node turns out to be a leaf node. This was demonstrated through a numerical example, as well as in the context of a real-life transportation planning system.

SPRINT (Scalable Parallelizable Induction of Decision Trees) was introduced by Shafer et al. in 1996 [79]. It is a fast, scalable decision tree classifier. It partitions the training data set recursively using a breadth-first greedy technique until each partition belongs to the same leaf node or class, but it does not rely on Hunt's algorithm in constructing the decision tree.

In 2009, Gjorgji Madzarov and Dejan Gjorgjevikj, Member, IEEE, presented a novel method of arranging binary classifiers [30], such as support vector machines, to solve a multi-class problem. The proposed Support Vector Machines in Decision Tree Architecture (SVM-DTA) method was designed to provide higher recognition speed using a decision tree architecture, while keeping a recognition rate comparable to other known methods. An algorithm that utilizes Mahalanobis distance measures in the kernel space is used to convert the multi-class problem into a series of binary decision problems in a binary decision tree structure; in this structure the binary decisions are made by SVMs. The experiments were performed on two different datasets of handwritten digits and letters, and showed that this method has one of the fastest training times and comparable testing times while keeping a recognition rate similar to the other methods. SVM-DTA also shows very good scalability with the number of classes in the classification problem, as its complexity grows much more slowly than that of the other methods, making it a preferable choice for multi-class classification problems with a large number of classes.

In 2011, Yujin Zhou et al. discussed a multi-classifier combined decision tree hierarchical classification method [97]. This method adopts the C4.5 decision tree algorithm to create an initial decision tree, obtains the decision rules, and connects some of the nodes in the tree with other classifiers to generate a hybrid decision tree (HDT). The HDT can then be used to classify unknown images. Compared with single-classifier methods, this method can assimilate classification information obtained from different classifiers, avoid the one-sidedness produced by single-classifier methods, improve the accuracy of classification and realize the automatic classification of multi-source remote sensing data. The classification structure of the HDT is complicated; the selection of classifiers and attributes for classification will be the focus of further research.

In 2011, XIN Jinguo et al. discussed the decision tree algorithm [95], considered one of the most extensively used classification methods. With large amounts of data in economic statistics, the decision tree can be applied to various classification and prediction tasks. From the practical application in their article, one finds that the decision tree, as a non-parametric data mining method, has very substantial advantages in data analysis and data mining: (a) rapid building, high precision and little computation; (b) output results that are very intuitive, easy to understand and interpret; (c) the ability to process multiple types of data; (d) the ability to handle missing values. But decision tree technology is a "greedy" search, and when using the decision tree method we encounter issues in data preparation and data representation, such as unknown data details, imprecise data translation, fraudulent data, etc. These issues affect the final decision tree construction and the accuracy and practicality of rule extraction; that is, the application of decision trees has certain limitations.

[...]

Excerpt out of 134 pages

Details

Title: Data Mining Multi-Attribute Decision System. Facilitating Decision Support Through Data Mining Technique by Hierarchical Multi-Attribute Decision Models
College: Symbiosis International University
Year: 2020
Pages: 134
Catalog Number: V950605
ISBN (eBook): 9783346292315
ISBN (Book): 9783346292322
Language: English
Keywords: data, mining, multi-attribute, decision, system, facilitating, support, through, technique, hierarchical, models
Quote paper: Dr. Pankaj Pathak (Author), Dr. Parashu Ram Pal (Author), 2020, Data Mining Multi-Attribute Decision System. Facilitating Decision Support Through Data Mining Technique by Hierarchical Multi-Attribute Decision Models, Munich, GRIN Verlag, https://www.grin.com/document/950605
