Center for Machine Learning and Intelligent Systems

Browse Through:

Default Task

Classification (19)
Regression (3)
Clustering (0)
Other (1)

Attribute Type

Categorical (8)
Numerical (3)
Mixed (10)

Data Type

Multivariate (20)
Univariate (1)
Sequential (0)
Time-Series (0)
Text (1)
Domain-Theory (0)
Other (2)


Area

Life Sciences (8)
Physical Sciences (1)
CS / Engineering (2)
Social Sciences (4)
Business (0)
Game (2)
Other (5)

# Attributes

Less than 10 (8)
10 to 100 (11)
Greater than 100 (2)

# Instances

Less than 100 (1)
100 to 1000 (13)
Greater than 1000 (7)

Format Type

Matrix (20)
Non-Matrix (2)

22 Data Sets

1. Abalone: Predict the age of abalone from physical measurements

2. Adult: Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset.

3. Annealing: Steel annealing data

4. Anonymous Microsoft Web Data: Log of anonymous users of the Microsoft web site; predict areas of the web site a user visited based on data on other areas the user visited.

5. Arrhythmia: Distinguish between the presence and absence of cardiac arrhythmia and classify it into one of 16 groups.

6. Artificial Characters: Dataset artificially generated using a first-order theory that describes the structure of ten capital letters of the English alphabet

7. Audiology (Original): Nominal audiology dataset from Baylor

8. Audiology (Standardized): Standardized version of the original audiology database

9. Auto MPG: Revised from the CMU StatLib library; data concerns city-cycle fuel consumption

10. Automobile: From 1985 Ward's Automotive Yearbook

11. Badges: Badges labeled with a "+" or "-" as a function of a person's name

12. Balance Scale: Balance scale weight & distance database

13. Balloons: Data previously used in a cognitive psychology experiment; 4 data sets represent different conditions of an experiment

14. Breast Cancer: Breast Cancer Data (Restricted Access)

15. Breast Cancer Wisconsin (Diagnostic): Diagnostic Wisconsin Breast Cancer Database

16. Breast Cancer Wisconsin (Original): Original Wisconsin Breast Cancer Database

17. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database

18. Car Evaluation: Derived from a simple hierarchical decision model; this database may be useful for testing constructive induction and structure discovery methods.

19. Census Income: Predict whether income exceeds $50K/yr based on census data. Also known as "Adult" dataset.

20. Chess (King-Rook vs. King-Knight): Knight Pin Chess End-Game Database Creator

21. Chess (King-Rook vs. King-Pawn): King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7).

22. Pittsburgh Bridges: Bridges database that has original and numeric-discretized datasets
