Neil: Data Mining

Saturday, April 23, 2011

1. Definition: the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.

2. Data mining commonly involves four classes of tasks:^[18]

Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Clustering – Discovers groups and structures in the data whose members are "similar" in some way, without using known structures in the data.
Classification – Generalizes known structure so it can be applied to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification, neural networks and support vector machines.
Regression – Attempts to find a function that models the data with the least error.
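To make the market basket idea concrete, here is a minimal Python sketch that counts item pairs in a few made-up baskets. It keeps the rules whose support (the share of baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second) pass a threshold. The transactions, the function name pair_rules and the thresholds 0.4 and 0.6 are illustrative assumptions. Real algorithms such as Apriori also handle itemsets larger than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for frequent item pairs.

    pair_rules and its thresholds are illustrative, not a library API.
    """
    n = len(transactions)
    # How many baskets contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data it reports, for example, that every basket containing milk also contains bread (confidence 1.00), while the single beer-and-diapers basket falls below the support threshold.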
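Clustering can be illustrated with k-means, one common clustering algorithm (the post does not name a specific one). The sketch below runs plain k-means on unlabelled 2-D points. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sample points, k=2, and the choice of the first k points as starting centroids are assumptions made to keep the example short and deterministic.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments no longer change, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabelled data: two visually separate groups.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
for c, members in zip(centroids, clusters):
    print(f"centroid ({c[0]:.2f}, {c[1]:.2f}): {members}")
```

The two groups emerge from distances alone. No point is told which group it belongs to, and that is what separates clustering from classification. In practice centroids are usually seeded randomly or with k-means++, and k still has to be chosen by the analyst.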
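The spam example can be sketched with nearest neighbor, one of the algorithms listed above. The classifier below labels a new email by majority vote among the k closest labelled emails. The two numeric features (number of links and occurrences of the word "free"), the training data and the helper knn_classify are hypothetical. A real spam filter would use far richer features.

```python
import math
from collections import Counter

# Hypothetical labelled emails: (number of links, count of "free") -> label.
training = [
    ((0, 0), "legitimate"),
    ((1, 0), "legitimate"),
    ((0, 1), "legitimate"),
    ((7, 5), "spam"),
    ((9, 4), "spam"),
    ((8, 6), "spam"),
]

def knn_classify(features, training, k=3):
    """Label a new item by majority vote among its k nearest training items."""
    # Sort the labelled examples by Euclidean distance to the new item.
    neighbors = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((8, 5), training))  # spam
print(knn_classify((1, 1), training))  # legitimate
```

Here the "known structure" is simply the labelled training set, and generalizing it to new data means assuming that emails with similar features share a label.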
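For regression, "least error" is most often taken to mean least squares: choose the line y = slope * x + intercept that minimizes the sum of squared differences between predicted and observed values. The sketch below assumes a straight-line model and made-up data points. It computes the closed-form solution directly. The function names least_squares_line and squared_error are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be nonzero (x values not all equal)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy measurements lying roughly along y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 1.99x + 0.05
print(f"SSE = {squared_error(xs, ys, slope, intercept):.3f}")
```

Nudging either coefficient away from the fitted values can only increase the squared error. Python 3.10 and later also provide statistics.linear_regression, which computes the same ordinary least-squares fit.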

Neil