Code for reading and writing machine learning data formats, in particular ARFF

sdestercke/dataformat

dataformat 0.1 - a collection of modules for reading and writing
	machine learning data sets

The goal of this project is to provide code for reading and writing
machine learning data sets for as many programming languages as
possible. With this code, it should become much easier for programs
written in different languages to exchange data with each other.

Currently, we are focusing on the ARFF file format
(http://weka.sourceforge.net/wekadoc/index.php/en:ARFF_%283.5.6%29),
developed in the Weka project (http://www.cs.waikato.ac.nz/~ml/).

Currently covered languages are:

  - Python
  - Ruby
  - MATLAB
  - Java

C and C++ are next on the list.

ARFF is covered except for:

  - sparse features
  - date attributes
  - relational attributes
  - missing values
  - instance weights
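
As a rough illustration of the covered subset (dense data with numeric,
nominal and string attributes, no missing values or weights), a minimal
reader might look like the Python sketch below. The function name
parse_arff and the sample data are hypothetical and are not taken from
the dataformat modules.

```python
def parse_arff(text):
    """Parse a dense ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration; not the dataformat API.
    """
    relation = None
    attributes = []  # list of (name, kind); kind is 'numeric', 'string' or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and % comments
        if in_data:
            # Naive split: does not handle quoted strings containing commas.
            values = [v.strip() for v in line.split(',')]
            row = []
            for (name, kind), value in zip(attributes, values):
                row.append(float(value) if kind == 'numeric' else value)
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == '@relation':
            relation = line.split(None, 1)[1]
        elif keyword == '@attribute':
            _, name, spec = line.split(None, 2)
            if spec.startswith('{'):
                # Nominal attribute: keep the list of allowed values.
                kind = [v.strip() for v in spec.strip('{}').split(',')]
            elif spec.lower() in ('numeric', 'real', 'integer'):
                kind = 'numeric'
            else:
                kind = 'string'
            attributes.append((name, kind))
        elif keyword == '@data':
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
% toy data set
@relation weather
@attribute temperature numeric
@attribute outlook {sunny, rainy}
@data
21.5, sunny
18.0, rainy
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Sparse rows, date and relational attributes, missing values ("?") and
instance weights are deliberately left out of this sketch, matching the
list above.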

Some things might not work as expected, in particular:

  - strings with commas in them

But we're working on that.
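
To show why commas inside strings are troublesome, the Python sketch
below compares a plain split on "," with a quote-aware split using the
standard csv module. The helper name split_arff_row is hypothetical, and
the sketch assumes values are quoted with single quotes only; it does
not handle double quotes or backslash escapes.

```python
import csv


def split_arff_row(line):
    """Split one ARFF data row, keeping commas inside single-quoted strings.

    Hypothetical helper for illustration; not the dataformat API.
    """
    reader = csv.reader([line], quotechar="'", skipinitialspace=True)
    return next(reader)


row = "'Hello, world', 3"

# A plain split cuts the quoted string in two and yields three fields.
naive = row.split(',')

# The quote-aware split keeps the string intact and yields two fields.
robust = split_arff_row(row)
```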
