What file formats should be supported for data and models? #30

ryanbressler · 2014-02-04T19:59:12Z

No description provided.

ryanbressler · 2014-02-24T16:32:46Z

Libsvm file format has been requested here:

ryanbressler · 2014-03-01T21:21:18Z

ARFF and possibly unlabeled CSV, as commonly used by machine learning repos

ryanbressler · 2014-03-02T00:08:49Z

Basic ARFF support is in, and CSV is supported now, but only if you use CloudForest as a library, since you need to define feature types.

Wondering if sparse ARFF and libsvm should be included, and if a sparse feature representation is needed to do them well.
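A sparse representation stores only the non-zero entries of each row and reports zero for everything else. A minimal Go sketch, assuming a map keyed by feature index; SparseRow is a hypothetical name, not a CloudForest type:

```go
package main

import "fmt"

// SparseRow is a hypothetical sparse representation of one sample's features:
// only non-zero entries are stored, keyed by feature index.
type SparseRow map[int]float64

// Get returns the value of feature i, treating any unstored feature as zero.
func (r SparseRow) Get(i int) float64 {
	return r[i] // a missing key yields the zero value, 0.0
}

func main() {
	row := SparseRow{1: 1.0, 5: 2.5, 16: 8.0}
	fmt.Println(row.Get(5), row.Get(2)) // 2.5 0
}
```

A map keeps memory proportional to the number of non-zero entries, at the cost of slower lookups than a dense slice.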

ryanbressler · 2014-03-02T04:11:19Z

maybe C4.5:

http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm

ryanbressler · 2014-03-03T20:49:55Z

basic libsvm support is in

tungntdhtl · 2014-04-14T15:26:51Z

How can I grow a cloudRF forest from a libsvm file? (I don't know which target to declare.)
e.g:
~/cloudRF/growforest -train usps.libsvm -rfpred usps.sf -target ??? -nTrees 1000
where usps.libsvm is a training data file.

ryanbressler · 2014-04-14T15:38:25Z

-target 0 should do it, since the target is in the first column and there aren't column names.
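In a libsvm file each line starts with the target value, followed by index:value feature pairs, which is why the target is column 0. A minimal Go sketch of splitting one such line, assuming whitespace-separated fields; parseLibSVMLine is a hypothetical helper, not CloudForest's own parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLibSVMLine splits one libsvm line into its target (the first field,
// i.e. column 0) and its index:value feature pairs. Hypothetical helper.
func parseLibSVMLine(line string) (string, map[int]float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, fmt.Errorf("empty line")
	}
	target := fields[0]
	feats := make(map[int]float64, len(fields)-1)
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return "", nil, fmt.Errorf("malformed pair %q", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return "", nil, err
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return "", nil, err
		}
		feats[idx] = val
	}
	return target, feats, nil
}

func main() {
	target, feats, err := parseLibSVMLine("3 1:1 5:2.5 16:8")
	if err != nil {
		panic(err)
	}
	fmt.Println(target, feats[5], len(feats)) // 3 2.5 3
}
```

Note that a pair written with a space after the colon (e.g. "98: 3.8") is split into two fields and rejected by this sketch.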

tungntdhtl · 2014-04-14T15:57:13Z

I received the following error:
~/cloudRF/growforest -train usps -rfpred usps.sf -target 0 -nTrees 500
Threads : 1
nTrees : 500
Loading data from: usps
panic: runtime error: index out of range

goroutine 1 [running]:
runtime.panic(0x8186020, 0x836d037)
/usr/local/go/src/pkg/runtime/panic.c:266 +0xac
github.com/ryanbressler/CloudForest.ParseAFM(0xb772bab8, 0x18600468, 0x836fd50)
/home/gm/golang/gopath/src/github.com/ryanbressler/CloudForest/featurematrix.go:294 +0xccd
github.com/ryanbressler/CloudForest.LoadAFM(0xbff7f407, 0x4, 0x0, 0x0, 0x0)
/home/gm/golang/gopath/src/github.com/ryanbressler/CloudForest/featurematrix.go:367 +0x2d4
main.main()
/home/gm/golang/gopath/src/github.com/ryanbressler/CloudForest/growforest/growforest.go:168 +0x1009

ryanbressler · 2014-04-14T16:04:46Z

You need to rename usps to usps.libsvm so that growforest knows how to parse it.
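The stack trace above shows the extensionless file being handed to ParseAFM, so the loader appears to choose a parser from the file name. A hedged sketch of that kind of extension-based dispatch; formatFor and its extension list are assumptions for illustration, not growforest's actual code:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// formatFor guesses a parser from the file extension. Anything without a
// recognised suffix falls through to the default tab-delimited (AFM) parser,
// which is what happened to "usps". Names and extensions are assumptions.
func formatFor(path string) string {
	switch filepath.Ext(path) {
	case ".libsvm":
		return "libsvm"
	case ".arff":
		return "arff"
	default:
		return "afm"
	}
}

func main() {
	fmt.Println(formatFor("usps"), formatFor("usps.libsvm")) // afm libsvm
}
```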

ryanbressler · 2014-04-14T16:08:33Z

Also, do an update if you haven't, as I recently fixed some small bugs in libsvm support.

tungntdhtl · 2014-04-14T16:15:09Z

Great! It is running.
You should write some comments about this for CloudRF's users :)
Thanks Ryan!

tungntdhtl · 2014-04-15T03:50:54Z

ryanbressler commented: "-target 0 should do it since the target is in the first column and there aren't column names."
How does CloudRF recognize the data type of the target response (B:, N:, or C:)?

ryanbressler · 2014-04-15T04:27:41Z

It checks whether the first entry is an int or a float. Ints are handled
as C or B, floats as N. If you want regression and the first entry is an
int, just make sure it is written with a decimal point (i.e. 0.0, not 0).
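The rule described above can be sketched in Go as follows. This is an illustration under assumptions, not CloudForest's implementation: targetKind is a hypothetical name, and it returns "C" for ints without attempting to decide between categorical (C) and boolean (B):

```go
package main

import (
	"fmt"
	"strconv"
)

// targetKind mimics the rule described above: a target that parses as an int
// is treated as categorical ("C"; the real code may also choose boolean "B"),
// while one that only parses as a float is treated as numeric ("N").
func targetKind(first string) (string, error) {
	if _, err := strconv.Atoi(first); err == nil {
		return "C", nil
	}
	if _, err := strconv.ParseFloat(first, 64); err == nil {
		return "N", nil
	}
	return "", fmt.Errorf("target %q is neither int nor float", first)
}

func main() {
	a, _ := targetKind("0")
	b, _ := targetKind("0.0")
	fmt.Println(a, b) // C N
}
```

This is why writing 0.0 instead of 0 is enough to switch a target from classification to regression.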

tungntdhtl · 2014-04-15T05:05:23Z

OK, thanks! That is a good way.
It can also read sparse libsvm files, right?
i.e. each feature is given as index:value,
e.g. data with 100 features: 3 1:1 5:2.5 16:8 19:0.4 50:-1.2 55:1 72:4 85:6 90:3.2 98:3.8 100:6.2

ryanbressler · 2014-04-15T05:20:44Z

Yes, all unspecified features will be assumed to be zero.
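Filling unspecified features with zero amounts to expanding each sparse row into a dense, zero-initialised vector. A minimal Go sketch, assuming 1-based feature indices as in the 100-feature example above (indices 1 to 100); densify is a hypothetical helper:

```go
package main

import "fmt"

// densify expands index:value pairs into a dense slice of length nFeatures,
// leaving every unspecified feature at zero. Assumes 1-based indices.
func densify(feats map[int]float64, nFeatures int) ([]float64, error) {
	dense := make([]float64, nFeatures) // zero-initialised
	for idx, v := range feats {
		if idx < 1 || idx > nFeatures {
			return nil, fmt.Errorf("feature index %d out of range 1..%d", idx, nFeatures)
		}
		dense[idx-1] = v
	}
	return dense, nil
}

func main() {
	row, err := densify(map[int]float64{1: 1, 5: 2.5}, 6)
	if err != nil {
		panic(err)
	}
	fmt.Println(row) // [1 0 0 0 2.5 0]
}
```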

tungntdhtl · 2014-04-15T05:59:08Z

In a LIBSVM file containing lots of records (e.g. 60,000,000), how can I build trees in cloudRF?

I tried setting a portion of the total records using the "nSamples=0.1" option. Does that mean cloudRF works on only 10% of the total samples?
If yes, how can I take bootstrap samples of the total records at that portion? i.e. each tree grows from 10% of the total records, and each 10% is a random sample drawn from all the records.

ryanbressler · 2014-04-15T06:06:36Z

Random forest bags samples independently for each tree so I think it is
already doing what you are asking for.

tungntdhtl · 2014-04-15T06:15:16Z

I mean RF struggles to build trees from a large sample size because each tree becomes large.
In cloudRF, RF can grow from a portion of the total records.
My question is: what is the scope of that portion? Does it use all bagged records, or only a small independent subset (e.g. 10%)?

ryanbressler mentioned this issue Feb 24, 2014

CloudRF for large data #31

Closed

ryanbressler added enhancement and removed enhancement labels Feb 24, 2014