Are We Really Discovering "Interesting" Knowledge from Data?

Alex A. Freitas
University of Kent, UK

Outline of the Talk
- Introduction to Knowledge Discovery
  - Criteria to evaluate discovered patterns
- Selecting "interesting" rules/patterns
  - User-driven (subjective) approach
  - Data-driven (objective) approach
- Summary
- Research Directions

The Knowledge Discovery Process: the "standard" definition
- "Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad et al. 1996)
- Focus on the quality of discovered patterns
  - independent of the data mining algorithm
- This definition is often quoted, but rarely taken seriously into account

Criteria to Evaluate the "Interestingness" of Discovered Patterns
[Figure: the four criteria (valid/accurate, comprehensible, novel/surprising, useful) plotted against difficulty of measurement and amount of research]

Main Machine Learning / Data Mining Advances in the Last Decade?
- Ensembles (boosting, bagging, etc.)
- Support Vector Machines
- Etc.
- These methods aim at maximizing accuracy, but...
  - they usually discover patterns that are not comprehensible

Motivation for comprehensibility
- User's interpretation and validation of discovered patterns
- Gives the user confidence in the discovered patterns
  - improves actionability and usefulness
- The importance of comprehensibility depends on the application
  - pattern recognition vs. medical/scientific applications

Discovering comprehensible knowledge
- Which kinds of knowledge representation are most comprehensible?
  - Decision trees
  - IF-THEN rules
  - Bayesian nets
  - Etc.
- There has been no major comparison of the comprehensibility of these representations (Pazzani, 2000)
- Typical comprehensibility measure: size of the model
  - a syntactical measure, which ignores semantics

Accuracy and comprehensibility do not imply novelty or surprisingness
- (Brin et al. 1997) found 6,732 rules with the maximum 'conviction' value in Census data, e.g.:
  - "five-year-olds don't work"
  - "unemployed residents don't earn income from work"
  - "men don't give birth"
- (Tsumoto 2000) found 29,050 rules, of which only 220 (less than 1%) were considered interesting or surprising by the user
- (Wong & Leung 2000) found rules with 40-60% confidence which were considered novel and more accurate than some of the doctors' domain knowledge

The intersection between useful (actionable) and surprising patterns
- (Silberschatz & Tuzhilin 1996) argue that there is a large intersection between these two notions
  [Venn diagram: actionable and surprising patterns, with a large overlap]
- The remainder of the talk focuses on the notions of novelty and surprisingness (unexpectedness)

Selecting the Most Interesting Rules in a Post-Processing Step
- Goal: select a subset of interesting rules out of all discovered rules, in order to show the user just the most interesting ones
- User-driven, "subjective" approach
  - uses the domain knowledge, beliefs or preferences of the user
- Data-driven, "objective" approach
  - based on statistical properties of the discovered patterns
  - more generic, independent of the application domain and of the user

Data-driven vs user-driven rule interestingness measures
[Diagram: in both cases all rules are filtered by a measure into a set of selected rules; the data-driven measure uses only the data, while the user-driven measure also uses the user's beliefs / domain knowledge]
- Note: user-driven measures ≠ real user interest

Selecting Interesting Association Rules based on User-Defined Templates
- User-driven approach: the user specifies templates for interesting and non-interesting rules (Klemettinen et al. 1994)
  - Inclusive templates specify interesting rules
  - Restrictive templates specify non-interesting rules
- A rule is considered interesting if it matches at least one inclusive template and matches no restrictive template

Selecting Interesting Association Rules: an Example
- Hierarchies of items and their categories:
  - food: rice, fish, ..., hot dog
  - drink: coke, wine, ..., water
  - clothes: trousers, ..., socks
- Inclusive template: IF (food) THEN (drink)
- Restrictive template: IF (hot dog) THEN (coke)
- The rule IF (fish) THEN (wine) is interesting (a sketch of this matching follows below)
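To make the template test concrete, here is a minimal sketch of the matching logic under a simplifying assumption: a rule is a pair of item sets and a template is a pair of item/category sets over the hierarchy above. This is only in the spirit of (Klemettinen et al. 1994); the paper's actual template language is richer.

```python
# A minimal sketch of template-based rule filtering, in the spirit of
# (Klemettinen et al. 1994). The rule/template representation here is a
# simplification assumed for illustration, not the paper's template language.

# Item hierarchy: each item belongs to a category.
CATEGORY = {
    "rice": "food", "fish": "food", "hot dog": "food",
    "coke": "drink", "wine": "drink", "water": "drink",
    "trousers": "clothes", "socks": "clothes",
}

def generalise(item):
    """The terms an item can stand for: the item itself and its category."""
    return {item, CATEGORY.get(item, item)}

def matches(rule, template):
    """A rule (antecedent, consequent) matches a template if every template
    term is covered, on the same side, by some item or its category."""
    (r_ant, r_con), (t_ant, t_con) = rule, template
    def covers(items, terms):
        return all(any(t in generalise(i) for i in items) for t in terms)
    return covers(r_ant, t_ant) and covers(r_con, t_con)

def interesting(rule, inclusive, restrictive):
    """Interesting = matches at least one inclusive and no restrictive template."""
    return (any(matches(rule, t) for t in inclusive)
            and not any(matches(rule, t) for t in restrictive))

inclusive   = [({"food"}, {"drink"})]        # IF (food) THEN (drink)
restrictive = [({"hot dog"}, {"coke"})]      # IF (hot dog) THEN (coke)

print(interesting(({"fish"}, {"wine"}), inclusive, restrictive))     # True
print(interesting(({"hot dog"}, {"coke"}), inclusive, restrictive))  # False
```

The hierarchy is what lets the concrete rule IF (fish) THEN (wine) satisfy the abstract inclusive template IF (food) THEN (drink) while escaping the restrictive one.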
Selecting Surprising Rules based on General Impressions
- User-driven approach: the user specifies her/his general impressions about the application domain (Liu et al. 1997, 2000), (Romao et al. 2004)
- Example: IF (salary = high) THEN (credit = good)
  - different from "reasonably precise knowledge", e.g. IF (salary > £35,500) ...
- Discovered rules are matched against the user's general impressions in a post-processing step, in order to determine the most surprising (unexpected) rules

Selecting Surprising Rules based on General Impressions (cont.)
- Case I: rule with unexpected consequent
  - the rule and a general impression have similar antecedents but different consequents
- Case II: rule with unexpected antecedent
  - the rule and a general impression have similar consequents, but the rule's antecedent differs from the antecedent of every general impression
- In both cases the rule is considered unexpected

Selecting Surprising Rules based on General Impressions: an example
- GENERAL IMPRESSION:
  IF (salary = high) AND (education_level = high) THEN (credit = good)
- UNEXPECTED RULES:
  IF (salary > £30,000) AND (education ≥ BSc) AND (mortgage = yes) THEN (credit = bad)   /* Case I: unexpected consequent */
  IF (mortgage = no) AND (C/A_balance > £10,000) THEN (credit = good)   /* Case II: unexpected antecedent, assuming no other general impression has a similar antecedent */

Data-driven approaches for discovering interesting patterns
- Two broad groups of approaches:
  - approaches based on the statistical strength of patterns: entirely data-driven, with no user-driven aspect
  - approaches based on the assumption that certain kinds of pattern are surprising to "any user" in general: still independent of the application domain and of the beliefs / domain knowledge of the target user, so mainly data-driven

Rule interestingness measures based on the statistical strength of the patterns
- More than 50 measures have been called rule "interestingness" measures in the literature
- For instance, using the rule notation IF (A) THEN (C):
  Interest = |A ∩ C| - (|A| × |C|) / N   (Piatetsky-Shapiro, 1991)
  - measures the deviation from statistical independence between A and C
  - measures the co-occurrence of A and C, not the implication A → C
  - (a small computational sketch follows below)
- The vast majority of these measures are based only on statistical strength or mathematical properties (Hilderman & Hamilton 2001), (Tan et al. 2002)
- Are they correlated with real human interest?
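As an illustration, here is a minimal sketch of the Piatetsky-Shapiro measure in count form, where |A|, |C| and |A ∩ C| are the numbers of records satisfying the antecedent, the consequent and both, out of N records; the counts in the example call are invented.

```python
# A minimal sketch of the (Piatetsky-Shapiro, 1991) measure in count form,
# for a rule IF (A) THEN (C) over N records. The counts below are invented.
def ps_interest(n_a, n_c, n_ac, n):
    """|A ∩ C| - (|A| × |C|) / N: deviation from statistical independence.
    Zero when A and C are independent, positive when they co-occur more
    often than expected by chance, negative when less often."""
    return n_ac - (n_a * n_c) / n

print(ps_interest(n_a=400, n_c=500, n_ac=300, n=1000))  # 300 - 200 = 100.0
```

Because the measure is symmetric in A and C, a high value signals co-occurrence rather than the implication A → C, as noted above.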
Evaluating the correlation between data-driven measures and real human interest
- 3-step methodology:
  1. Rank the discovered rules according to each of a number of data-driven rule interestingness measures
  2. Show (a subset of) the discovered rules to the user, who evaluates how interesting each rule is
  3. Measure the linear correlation between the ranking produced by each data-driven measure and the real human interest (a sketch of this step follows below)
- Rules are evaluated by an expert user, who also provided the dataset to the data miner

First Study: (Ohsaki et al. 2004)
- A single dataset (hepatitis) and a single user
- 39 data-driven rule interestingness measures
- First experiment: 30 discovered rules
  - 1 measure had a correlation ≥ 0.4 (correlation: 0.48)
- Second experiment: 21 discovered rules
  - 4 measures had correlations ≥ 0.4 (highest correlation: 0.48)
- In total, out of 78 correlations, only 5 (6.4%) were ≥ 0.4
- NB: the paper also reports other performance measures

Second Study: (Carvalho et al. 2005)
- 8 datasets, with one user for each dataset
- 11 data-driven rule interestingness measures
- Each data-driven measure was evaluated over 72 pairs (8 users × 9 rules/dataset)
- 88 correlation values (8 users × 11 measures)
  - 31 correlation values (35%) were ≥ 0.6
  - the correlations associated with each measure varied considerably across the 8 datasets/users
- More recent results with 45 users (5 users × 9 datasets) obtained a significantly lower correlation overall
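Here is a minimal sketch of step 3 of the methodology: the linear (Pearson) correlation between the scores a data-driven measure assigns to the rules shown to the user and the user's own interest ratings. All the numbers below are invented for illustration; the studies above repeat this for many measures, rules and users.

```python
# A minimal sketch of step 3: Pearson correlation between a data-driven
# measure's scores for the rules shown to the user and the user's ratings.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

measure_scores = [0.9, 0.7, 0.6, 0.4, 0.2]  # e.g. PS interest of 5 rules (made up)
user_ratings   = [3, 5, 2, 4, 1]            # expert's ratings on a 1-5 scale (made up)
print(round(pearson(measure_scores, user_ratings), 2))  # about 0.47 here
```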
Data-driven approaches based on finding specific kinds of surprising patterns
- Principle: a few special kinds of patterns tend to be surprising to most users, regardless of the "meaning" of the attributes in the pattern and of the application domain
- Does not require domain knowledge or beliefs specified by the user
- We discuss two approaches:
  - discovery of exception rules contradicting general rules
  - detection of Simpson's paradox

Selecting Surprising Rules Which Are Exceptions of a General Rule
- Data-driven approach: consider the following two rules
  R1: IF Cond1 THEN Class1   (general rule)
  R2: IF Cond1 AND Cond2 THEN Class2   (exception)
  where Cond1 and Cond2 are conjunctions of conditions
- Rule R2 is a specialization of rule R1
- Rules R1 and R2 predict different classes
- An exception rule is interesting if both the rule and its general rule have a high predictive accuracy (Suzuki & Kodratoff 1998), (Suzuki 2004)

Selecting Surprising Exception Rules: an Example
- Mining data about car accidents (Suzuki 2004):
  IF (used_seat_belt = yes) THEN (injury = no)   (general rule)
  IF (used_seat_belt = yes) AND (passenger = child) THEN (injury = yes)   (exception rule)
- If both rules are accurate, the exception rule would be considered interesting: it is both accurate and surprising (a sketch of this test follows below)
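A minimal sketch of the accuracy test on a (general rule, exception rule) pair, under assumptions made purely for illustration: rules are represented as (condition, predicted class) pairs, records as dictionaries, and the 0.9 accuracy threshold and toy records are invented. The cited work uses more refined criteria (e.g. intensity of implication in Suzuki & Kodratoff 1998).

```python
# A minimal sketch: keep the exception rule as surprising only if both the
# general rule and the exception are accurate enough on the data.

def accuracy(rule, records):
    cond, predicted = rule
    covered = [r for r in records if cond(r)]
    if not covered:
        return 0.0
    return sum(r["class"] == predicted for r in covered) / len(covered)

def interesting_exception(general, exception, records, min_acc=0.9):
    """Both rules of the pair must reach the accuracy threshold."""
    return (accuracy(general, records) >= min_acc
            and accuracy(exception, records) >= min_acc)

# The car-accident example, on an assumed toy record format:
general   = (lambda r: r["seat_belt"] == "yes", "no_injury")
exception = (lambda r: r["seat_belt"] == "yes" and r["passenger"] == "child",
             "injury")

records = ([{"seat_belt": "yes", "passenger": "adult", "class": "no_injury"}] * 9
           + [{"seat_belt": "yes", "passenger": "child", "class": "injury"}])

print(interesting_exception(general, exception, records))  # True on this toy data
```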
Simpson's Paradox: an example
- Income and taxes in the USA (in 10^9 US$):

  Year   Income     Tax      Rate (tax / income)
  1974   880.18     123.69   0.141
  1978   1242.40    188.58   0.152

- From 1974 to 1978, the overall tax rate (tax / income) increased

Simpson's Paradox: an example (cont.)
- Broken down by income category (in 10^3 US$):

  Year           <5      5..10   10..15   15..100   >100     Total
  1974  income   41.7    146.4   192.7    470.0     29.4     880.18
        tax      2.2     13.7    21.5     75.0      11.3     123.69
        rate     0.054   0.093   0.111    0.160     0.384    0.141
  1978  income   19.9    122.9   171.9    865.0     62.81    1242.40
        tax      0.7     8.8     17.2     137.9     24.05    188.58
        rate     0.035   0.072   0.100    0.159     0.383    0.152

- Overall the tax rate increased, but it decreased in every income category!

Discovering Instances of Simpson's Paradox as Surprising Patterns
- Simpson's paradox occurs when one value of an attribute A (year) increases the probability of an event X (here, the tax rate) in a given population but, at the same time, decreases the probability of X in every subpopulation defined by an attribute B (income category)
- Although Simpson's paradox is well known to statisticians, occurrences of the paradox in data are surprising to users
- Algorithms that find all instances of the paradox in the data and rank them in decreasing order of surprisingness: (Fabris & Freitas 1999, 2005)
  - algorithms for "flat", single-relation data and for hierarchical, multidimensional (OLAP) data
  - (a sketch of the basic paradox check, on the tax example above, follows below)
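A minimal sketch of the basic check behind such algorithms, applied to the figures in the tables above; the actual algorithms of (Fabris & Freitas 1999, 2005) search over all pairs of attributes and rank the occurrences found, which is not shown here.

```python
# A minimal sketch of checking one candidate occurrence of Simpson's paradox,
# using the income/tax figures from the tables above (in 10^9 US$).

categories = ["<5", "5..10", "10..15", "15..100", ">100"]
income = {1974: [41.7, 146.4, 192.7, 470.0, 29.4],
          1978: [19.9, 122.9, 171.9, 865.0, 62.81]}
tax    = {1974: [2.2, 13.7, 21.5, 75.0, 11.3],
          1978: [0.7, 8.8, 17.2, 137.9, 24.05]}

def rate(year, i=None):
    """Tax rate (tax / income) for the whole year, or for one income category."""
    if i is None:
        return sum(tax[year]) / sum(income[year])
    return tax[year][i] / income[year][i]

overall_increased = rate(1978) > rate(1974)
decreased_in_every_category = all(rate(1978, i) < rate(1974, i)
                                  for i in range(len(categories)))

# Simpson's paradox: the overall trend is reversed in every subpopulation.
print(overall_increased and decreased_in_every_category)  # True for these figures
```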
Summary
- Data mining algorithms can easily discover far too many rules/patterns for the user, so there is a clear motivation for selecting the most interesting rules/patterns
- The main challenge is to discover novel, useful patterns, going beyond accuracy and comprehensibility
- User-driven and data-driven approaches have complementary advantages and disadvantages
  - using a hybrid approach seems sensible
- Discovering patterns that are truly interesting to the user, without using a lot of user-specified domain knowledge, is an open problem

Research Direction: increasing the autonomy of the user-driven approach
- The user-driven approach is very user-dependent: it requires a "prior model" (e.g. general impressions) from the user
- Automatic learning of general impressions with text mining
  [Diagram: Web text → text mining → general impressions; the general impressions and the data are then fed into data mining, which produces patterns]

Research Direction: a broader role for interestingness measures
- Think about interestingness (surprisingness, usefulness) from the start of the KDD process
  [Diagram: pre-processing → data mining → post-processing, with interestingness measures applied at every step]
- E.g. attribute selection and construction trying to maximize not only accuracy and comprehensibility, but also surprisingness and/or usefulness

Any Questions?
Thanks for listening!

References
- (Abe et al. 2005) H. Abe, S. Tsumoto, M. Ohsaki, T. Yamaguchi. A rule evaluation support method with learning models based on objective rule evaluation indices. To appear in 2005 IEEE Int. Conf. on Data Mining.
- (Basu et al. 2001) S. Basu, R.J. Mooney, K.V. Pasupuleti, J. Ghosh. Evaluating the novelty of text-mined rules using lexical knowledge. Proc. KDD-2001, 233-238.
- (Brin et al. 1997) S. Brin, R. Motwani, J.D. Ullman, S. Tsur. Dynamic itemset counting and implication rules for market basket data. Proc. KDD-97.
- (Carvalho et al. 2005) D.R. Carvalho, A.A. Freitas, N.F. Ebecken. Evaluating the correlation between objective rule interestingness measures and real human interest. To appear in Proc. PKDD-2005.
- (Fabris & Freitas 1999) C.C. Fabris and A.A. Freitas. Discovering surprising patterns by detecting occurrences of Simpson's paradox. In: Research and Development in Intelligent Systems XVI, 148-160. Springer.
- (Fabris & Freitas 2005) C.C. Fabris and A.A. Freitas. Discovering surprising instances of Simpson's paradox in hierarchical multidimensional data. To appear in Int. J. of Data Warehousing and Mining.
- (Fayyad et al. 1996) U. Fayyad, G. Piatetsky-Shapiro, P. Smyth. From data mining to knowledge discovery: an overview. In: Advances in Knowledge Discovery and Data Mining, 1-34. AAAI Press.
- (Hilderman & Hamilton 2001) R.J. Hilderman and H.J. Hamilton. Knowledge Discovery and Measures of Interest. Kluwer.
- (Klemettinen et al. 1994) M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. Proc. 3rd Int. Conf. on Information and Knowledge Management, 401-407.
- (Liu et al. 1997) B. Liu, W. Hsu, S. Chen. Using general impressions to analyze discovered classification rules. Proc. KDD-97, 31-36.
- (Liu et al. 1999) B. Liu, W. Hsu, L-F. Mun, H-Y. Lee. Finding interesting patterns using user expectations. IEEE Trans. on Knowledge and Data Engineering 11(6), 817-832.
- (Ohsaki et al. 2004) M. Ohsaki, S. Kitaguchi, K. Okamoto, H. Yokoi, T. Yamaguchi. Evaluation of rule interestingness measures with a clinical dataset on hepatitis. Proc. PKDD-2004, 362-373.
- (Pazzani 2000) M.J. Pazzani. Knowledge discovery from data? IEEE Intelligent Systems, Mar/Apr 2000, 10-13.
- (Piatetsky-Shapiro 1991) G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. In: Knowledge Discovery in Databases, 229-248. AAAI/MIT Press.
- (Romao et al. 2004) W. Romao, A.A. Freitas, I.M.S. Gimenes. Discovering interesting knowledge from a science & technology database with a genetic algorithm. Applied Soft Computing 4, 121-137.
- (Silberschatz & Tuzhilin 1996) A. Silberschatz and A. Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Trans. on Knowledge and Data Engineering 8(6).
- (Suzuki & Kodratoff 1998) E. Suzuki and Y. Kodratoff. Discovery of surprising exception rules based on intensity of implication. Proc. PKDD-98.
- (Suzuki 2004) E. Suzuki. Discovering interesting exception rules with rule pair. Proc. Workshop on Advances in Inductive Rule Learning at PKDD-2004, 163-178.
- (Tan et al. 2002) P-N. Tan, V. Kumar, J. Srivastava. Selecting the right interestingness measure for association patterns. Proc. KDD-2002.
- (Tsumoto 2000) S. Tsumoto. Clinical knowledge discovery in hospital information systems: two case studies. Proc. PKDD-2000, 652-656.
- (Wong & Leung 2000) M.L. Wong and K.S. Leung. Data Mining Using Grammar Based Genetic Programming and Applications. Kluwer.