Center for Machine Learning and Intelligent Systems

Breast Cancer Wisconsin (Original) Data Set

Below are papers that cite this data set, with context shown. Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.

Return to Breast Cancer Wisconsin (Original) data set page.


Gavin Brown. Diversity in Neural Network Ensembles. The University of Birmingham. 2004.

critical to consider values for the strength parameter outside the originally specified range. Table 5.3 shows the classification error rates of two empirical tests, on the Wisconsin breast cancer dataset from the UCI repository (699 patterns), and the Heart disease dataset from Statlog (270 patterns). An ensemble consisting of two networks, each with five hidden nodes, was trained using NC. We use


András Antos and Balázs Kégl and Tamás Linder and Gábor Lugosi. Data-dependent margin-based generalization bounds for classification. Journal of Machine Learning Research, 3. 2002.

attributes were binary coded in a 1-out-of-n fashion. Data points with missing attributes were removed. Each attribute was normalized to have zero mean and 1/√d standard deviation. The four data sets were the Wisconsin breast cancer (n = 683, d = 9), the ionosphere (n = 351, d = 34), the Japanese credit screening (n = 653, d = 42), and the tic-tac-toe endgame (n = 958, d = 27) database.
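The normalization described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' code: the function name normalize_columns is hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption, since the excerpt does not say which was used.

```python
import math


def normalize_columns(rows):
    """Scale every column to zero mean and standard deviation 1/sqrt(d).

    rows is a list of equal-length numeric lists (n points, d attributes).
    Assumption: population standard deviation (divide by n).
    """
    n, d = len(rows), len(rows[0])
    target_std = 1.0 / math.sqrt(d)
    columns = []
    for j in range(d):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if std == 0.0:
            # A constant attribute carries no information; map it to zeros.
            columns.append([0.0] * n)
        else:
            columns.append([(x - mean) / std * target_std for x in col])
    # Transpose back from column-major to one list per data point.
    return [list(point) for point in zip(*columns)]
```

With this scaling the squared norm of a data point averages to 1 over the data set (when no attribute is constant), regardless of the number of attributes d.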


Kristin P. Bennett and Ayhan Demiriz and Richard Maclin. Exploiting unlabeled data in ensemble methods. KDD. 2002.

experiments we used simple multilayer perceptrons with a single layer of hidden units. The networks were trained using backpropagation with a learning rate of 0.15 and a momentum value of 0.90. The datasets for the experiments are breast cancer wisconsin, pima-indians diabetes, and letter-recognition drawn from the UCI Machine Learning repository [3]. The number of units in the hidden layer for the


Hussein A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25. 2002.

well, compared to the previous studies. In another study, Setiono [26] used his rule extraction from ANNs algorithm [28, 29] to extract useful rules that can predict breast cancer from the Wisconsin dataset. He needed first to train an ANN using BP and achieved an accuracy level on the test data of approximately 94%. After applying his rule extraction technique, the accuracy of the extracted rule set


Robert Burbidge and Matthew Trotter and Bernard F. Buxton and Sean B. Holden. STAR - Sparsity through Automated Rejection. IWANN (1). 2001.

available from the UCI Machine Learning Data Repository [11], are as follows. The breast cancer Wisconsin data set has 699 examples in nine dimensions and is `noise-free', one feature has 16 missing values which are replaced with the feature mean. The ionosphere data set has 351 examples in 33 dimensions and is
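Replacing missing values with the feature mean, as this excerpt describes, might look like the sketch below. The function name impute_column_means and the use of None as the missing-value marker are assumptions for illustration (the raw UCI file marks missing entries differently), and the sketch assumes every feature has at least one observed value.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: missing values are represented as None and every column
    has at least one observed value.
    """
    d = len(rows[0])
    means = []
    for j in range(d):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if row[j] is None else row[j] for j in range(d)]
        for row in rows
    ]
```

For example, a column with observed values 4 and 8 and one missing entry would have the gap filled with 6.0.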


Lorne Mason and Peter L. Bartlett and Jonathan Baxter. Improved Generalization Through Explicit Optimization of Margins. Machine Learning, 38. 2000.

chosen as the final solution. In some cases the training sets were reduced in size to make overfitting more likely (so that complexity regularization with DOOM could have an effect). In three of the datasets (Credit Application, Wisconsin Breast Cancer and Pima Indians Diabetes), AdaBoost gained no advantage from using more than a single classifier. In these datasets, the number of classifiers was


Endre Boros and Peter Hammer and Toshihide Ibaraki and Alexander Kogan and Eddy Mayoraz and Ilya B. Muchnik. An Implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng, 12. 2000.

the housing value is above or below the median. Using training sets of 80% of the observations, [16] reports correct prediction rates ranging from 82% to 83.2%. Breast Cancer Wisconsin. The dataset, compiled by O. Mangasarian and K.P. Bennett, is widely used in the machine learning community for comparing learning algorithms. It is, however, difficult to use it for rigorous comparisons since


Justin Bradley and Kristin P. Bennett and Bennett A. Demiriz. Constrained K-Means Clustering. Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. 2000.

the Johns Hopkins Ionosphere dataset and the Wisconsin Diagnostic Breast Cancer dataset (WDBC) [7]. The Ionosphere dataset contains 351 data points in R^33 and values along each dimension


Huan Liu and Hiroshi Motoda and Manoranjan Dash. A Monotonic Measure for Optimal Feature Selection. ECML. 1998.

with unknown relevant attributes, consists of WBC - the Wisconsin Breast Cancer data set, LED-7 - data with 7 Boolean attributes and 10 classes, the set of decimal digits (0..9), Letter - the letter image recognition data, LYM - the lymphography data, and Vote - the U.S. House of


Rudy Setiono and Huan Liu. NeuroLinear: From neural networks to oblique decision rules. Neurocomputing, 17. 1997.

A. Detailed analysis 1: The University of Wisconsin Breast Cancer Dataset. This data set has been used as the test data for several studies on pattern classification methods using linear programming techniques [1, 13] and statistical techniques [23]. Each pattern is


Kristin P. Bennett and Erin J. Bredensteiner. A Parametric Optimization Method for Machine Learning. INFORMS Journal on Computing, 9. 1997.

of the Federal Reserve Bank of Dallas [BS90], has 9 numeric features which range from 0 to 1. The data represent 4311 successful banks and 441 failed banks. Wisconsin Breast Cancer Database This dataset is used to classify a set of 682 patients with breast cancer [WM90]. Each patient is represented by nine integral attributes ranging in value from 1 to 10. The two classes represented are benign and


Jennifer A. Blue and Kristin P. Bennett. Hybrid Extreme Point Tabu Search. Department of Mathematical Sciences Rensselaer Polytechnic Institute. 1996.

(Liver); the PIMA Indians Diabetes dataset (Diabetes), the Wisconsin Breast Cancer Database (Cancer) [23], and the Cleveland Heart Disease Database (Heart) [9]. We used 5-fold cross validation. Each dataset was divided into 5 parts. The
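A 5-fold cross-validation split of the kind mentioned in this excerpt can be sketched as below. The function name k_fold_indices, the shuffling step, and the fixed seed are assumptions for illustration. The excerpt does not say how the five parts were formed.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: indices are shuffled once with a fixed seed, then dealt
    into k folds whose sizes differ by at most one.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each index appears in exactly one test fold, so every example is used for testing once and for training in the remaining four rounds.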


Jarkko Salojarvi and Samuel Kaski and Janne Sinkkonen. Discriminative clustering in Fisher metrics. Neural Networks Research Centre Helsinki University of Technology.

and secondly through the density function estimate that generates the metric used to define the Fisherian Voronoi regions. IV. EXPERIMENTS Experiments were run with the Wisconsin breast cancer data set from the UCI machine learning repository [9]. The 569 samples consisted of 30 attributes, measured from malignant and benign tumors. We chose the ordinary k-means as the baseline reference method.


Włodzisław Duch and Rafał Adamczak and Krzysztof Grąbczewski and Grzegorz Żal. A hybrid method for extraction of logical rules from data. Department of Computer Methods, Nicholas Copernicus University.

obtained from the UCI repository [14]. A. Wisconsin breast cancer data. The Wisconsin cancer dataset [17] contains 699 instances, with 458 benign (65.5%) and 241 (34.5%) malignant cases. Each instance is described by the case number, 9 attributes with integer value in the range 1-10 (for example,


Rafael S. Parpinelli and Heitor S. Lopes and Alex Alves Freitas. An Ant Colony Algorithm for Classification Rule Discovery. CEFET-PR, Curitiba.

2. The numbers after the "±" symbol are the standard deviations of the corresponding accuracy rates. As shown in this table, Ant-Miner discovered rules with a better accuracy rate than C4.5 in four data sets, namely Ljubljana breast cancer, Wisconsin breast cancer, Hepatitis and Heart disease. In two data sets, Ljubljana breast cancer and Heart disease, the difference was quite small. In the other two


Andrew I. Schein and Lyle H. Ungar. A-Optimality for Active Learning of Logistic Regression Classifiers. Department of Computer and Information Science Levine Hall.

54. The lodgepole pine variety of tree happens to represent about 50% of the observations and so we merge all other tree types into a single category. The Wisconsin Diagnostic Breast Cancer (WDBC) data set consists of evaluation measurements (predictors) and final diagnosis for 569 patients. The goal is to predict the diagnosis using the measurements. The number of predictors is 30. The Thyroid Domain


Rudy Setiono and Huan Liu. Neural-Network Feature Selector. Department of Information Systems and Computer Science National University of Singapore.

are described below. 1. The University of Wisconsin Breast Cancer Diagnosis Dataset. The Wisconsin Breast Cancer Data (WBCD) is a large data set that consists of 699 patterns of which 458 are benign samples and 241 are malignant samples. Each of these patterns consists of nine

