Center for Machine Learning and Intelligent Systems
About  Citation Policy  Donate a Data Set  Contact

Repository Web            Google
View ALL Data Sets

Abalone Data Set
Download: Data Folder, Data Set Description

Abstract: Predict the age of abalone from physical measurements

Data Set Characteristics:  


Number of Instances:




Attribute Characteristics:

Categorical, Integer, Real

Number of Attributes:


Date Donated


Associated Tasks:


Missing Values?


Number of Web Hits:



Data comes from an original (non-machine-learning) study:
Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn and Wes B Ford (1994)
"The Population Biology of Abalone (_Haliotis_ species) in Tasmania. I. Blacklip Abalone (_H. rubra_) from the North Coast and Islands of Bass Strait",
Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)

Original Owners of Database:

Marine Resources Division
Marine Research Laboratories - Taroona
Department of Primary Industry and Fisheries, Tasmania
GPO Box 619F, Hobart, Tasmania 7001, Australia
(contact: Warwick Nash +61 02 277277, wnash '@'

Donor of Database:

Sam Waugh (Sam.Waugh '@'
Department of Computer Science, University of Tasmania
GPO Box 252C, Hobart, Tasmania 7001, Australia

Data Set Information:

Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

From the original data examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).

Attribute Information:

Given is the attribute name, attribute type, the measurement unit and a brief description. The number of rings is the value to predict: either as a continuous value or as a classification problem.

Name / Data Type / Measurement Unit / Description
Sex / nominal / -- / M, F, and I (infant)
Length / continuous / mm / Longest shell measurement
Diameter / continuous / mm / perpendicular to length
Height / continuous / mm / with meat in shell
Whole weight / continuous / grams / whole abalone
Shucked weight / continuous / grams / weight of meat
Viscera weight / continuous / grams / gut weight (after bleeding)
Shell weight / continuous / grams / after being dried
Rings / integer / -- / +1.5 gives the age in years

The readme file contains attribute statistics.

Relevant Papers:

Sam Waugh (1995) "Extending and benchmarking Cascade-Correlation", PhD thesis, Computer Science Department, University of Tasmania.
[Web Link]

David Clark, Zoltan Schreter, Anthony Adams "A Quantitative Comparison of Dystal and Backpropagation", submitted to the Australian Conference on Neural Networks (ACNN'96).

Papers That Cite This Data Set1:

Jianbin Tan and David L. Dowe. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes. Australian Conference on Artificial Intelligence. 2003. [View Context].

Edward Snelson and Carl Edward Rasmussen and Zoubin Ghahramani. Warped Gaussian Processes. NIPS. 2003. [View Context].

Alexander G. Gray and Bernd Fischer and Johann Schumann and Wray L. Buntine. Automatic Derivation of Statistical Algorithms: The EM Family and Beyond. NIPS. 2002. [View Context].

Christopher K I Williams and Carl Edward Rasmussen and Anton Schwaighofer and Volker Tresp. Observations on the Nystrom Method for Gaussian Process Prediction. Division of Informatics Gatsby Computational Neuroscience Unit University of Edinburgh University College London. 2002. [View Context].

Anton Schwaighofer and Volker Tresp. Transductive and Inductive Methods for Approximate Gaussian Process Regression. NIPS. 2002. [View Context].

Bernhard Pfahringer and Hilan Bensusan and Christophe G. Giraud-Carrier. Meta-Learning by Landmarking Various Learning Algorithms. ICML. 2000. [View Context].

Matthew Mullin and Rahul Sukthankar. Complete Cross-Validation for Nearest Neighbor Classifiers. ICML. 2000. [View Context].

Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR, 10. 1999. [View Context].

Tapio Elomaa and Juho Rousu. General and Efficient Multisplitting of Numerical Attributes. Machine Learning, 36. 1999. [View Context].

Christopher J. Merz. Combining Classifiers Using Correspondence Analysis. NIPS. 1997. [View Context].

. Efficiently Updating and Tracking the Dominant Kernel Eigenspace. (a) Katholieke Universiteit Leuven Department of Electrical Engineering, ESAT-SCD-SISTA. [View Context].

Luc Hoegaerts and J. A. K Suykens and J. Vandewalle and Bart De Moor. Subset Based Least Squares Subspace Regression in RKHS. Katholieke Universiteit Leuven Department of Electrical Engineering, ESAT-SCD-SISTA. [View Context].

C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in an 3D Immersive Environment: Summer Project 2003. [View Context].

Johannes Furnkranz. Round Robin Rule Learning. Austrian Research Institute for Artificial Intelligence. [View Context].

Christian Borgelt and Rudolf Kruse. Speeding Up Fuzzy Clustering with Neural Network Techniques. Research Group Neural Networks and Fuzzy Systems Dept. of Knowledge Processing and Language Engineering, School of Computer Science Otto-von-Guericke-University of Magdeburg. [View Context].

Miguel Moreira and Alain Hertz and Eddy Mayoraz. Data binarization by discriminant elimination. Proceedings of the ICML-99 Workshop: From Machine Learning to. [View Context].

Johannes Furnkranz. Pairwise Classification as an Ensemble Technique. Austrian Research Institute for Artificial Intelligence. [View Context].

Edward Snelson and Carl Edward Rasmussen and Zoubin Ghahramani. Draft version; accepted for NIPS*03 Warped Gaussian Processes. Gatsby Computational Neuroscience Unit University College London. [View Context].

Citation Request:

Please refer to the Machine Learning Repository's citation policy

[1] Papers were automatically harvested and associated with this data set, in collaboration with

Supported By:

 In Collaboration With:

About  ||  Citation Policy  ||  Donation Policy  ||  Contact  ||  CML