
Cross validation dataset iterators are written specifically for DenseDesignMatrix #1556

Open
se4u opened this issue Aug 21, 2015 · 0 comments


se4u commented Aug 21, 2015

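  1. The code in `pylearn2/cross_validation/dataset_iterators.py` calls `n = dataset.X.shape[0]` in many places. Since `X` is an attribute of `DenseDesignMatrix` rather than part of the base `Dataset` interface, these statements should be changed to `n = dataset.get_num_examples()`, which is the standard way to get the number of examples according to that interface.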
  2. The `DatasetCV` class in `pylearn2/cross_validation/dataset_iterators.py` is specialized to the `DenseDesignMatrix` constructor and does not work with `VectorSpacesDataset`. Because the `Dataset` abstract interface defines no standard constructor, I added some exception handling and an assertion that checks that the datasets built during cross-validation have the same type as the original dataset. This is not a great solution, but at least the assertion is useful:
```diff
-                X, y = data
-                datasets[label] = DenseDesignMatrix(X=X, y=y)
+                try:
+                    X, y = data
+                    data_subset = DenseDesignMatrix(
+                        X=X, y=y, X_labels=self.dataset.X_labels,
+                        y_labels=self.dataset.y_labels)
+                except Exception:
+                    # Not a DenseDesignMatrix-style dataset (e.g. VectorSpacesDataset):
+                    # rebuild it with the original class from data and data_specs.
+                    data_subset = self.dataset.__class__(
+                        data=data, data_specs=self.dataset.data_specs)
+                assert isinstance(data_subset, self.dataset.__class__)
+                datasets[label] = data_subset
```
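For point 1, here is a minimal sketch of k-fold index generation that relies only on `get_num_examples()`, so it also works for a dataset that has no `X` attribute. `ToyVectorDataset` and `kfold_index_splits` are hypothetical names made up for illustration; this is not the pylearn2 code, just the idea under that assumption.

```python
class ToyVectorDataset(object):
    """Hypothetical dataset with no `X` attribute (like a VectorSpacesDataset);
    it only implements get_num_examples() from the base interface."""

    def __init__(self, features, targets):
        self._data = (features, targets)

    def get_num_examples(self):
        return len(self._data[0])


def kfold_index_splits(dataset, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled k-fold CV.

    The example count comes from dataset.get_num_examples() rather than
    dataset.X.shape[0], so any Dataset implementation is accepted."""
    n = dataset.get_num_examples()
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


dataset = ToyVectorDataset(features=list(range(10)), targets=[0, 1] * 5)
for train_idx, test_idx in kfold_index_splits(dataset, n_folds=3):
    print(len(train_idx), test_idx)
```

With 10 examples and 3 folds, the test folds hold 4, 3 and 3 examples, and `dataset.X` is never touched.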
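For point 2, a possible alternative is to dispatch explicitly on the dataset type instead of relying on a try/except fallback. The `Toy*` classes and `make_subset` below are hypothetical stand-ins that only mimic the two constructor styles (`X`/`y` versus `data`/`data_specs`); they are not the pylearn2 classes, so treat this as a sketch of the idea.

```python
class ToyDenseDesignMatrix(object):
    """Hypothetical stand-in for DenseDesignMatrix: built from X and y."""

    def __init__(self, X, y=None):
        self.X, self.y = X, y


class ToyMNIST(ToyDenseDesignMatrix):
    """Hypothetical subclass whose constructor does not take X and y."""

    def __init__(self):
        super(ToyMNIST, self).__init__(X=[[0.0], [1.0]], y=[0, 1])


class ToyVectorSpacesDataset(object):
    """Hypothetical stand-in for VectorSpacesDataset: built from data + data_specs."""

    def __init__(self, data, data_specs):
        self.data, self.data_specs = data, data_specs


def make_subset(dataset, data):
    """Build a cross-validation subset from `data` that matches `dataset`'s kind."""
    if isinstance(dataset, ToyDenseDesignMatrix):
        X, y = data
        # Use the base class: subclasses may not accept X and y directly.
        subset = ToyDenseDesignMatrix(X=X, y=y)
    elif isinstance(dataset, ToyVectorSpacesDataset):
        subset = type(dataset)(data=data, data_specs=dataset.data_specs)
    else:
        raise TypeError("no subset constructor for %s" % type(dataset).__name__)
    # The original must be an instance of the subset's class; unlike the
    # reverse check, this also holds for subclasses such as ToyMNIST.
    assert isinstance(dataset, type(subset))
    return subset


print(type(make_subset(ToyMNIST(), ([[1.0]], [1]))).__name__)
vsd = ToyVectorSpacesDataset(data=([1, 2], [0, 1]),
                             data_specs=("space", ("features", "targets")))
print(make_subset(vsd, ([2], [1])).data_specs)
```

The assertion is written as `isinstance(dataset, type(subset))` rather than the reverse. As far as I can tell, when the original dataset is a `DenseDesignMatrix` subclass (an MNIST-like dataset, say), the subset is built with the base class, so a check of the form `isinstance(data_subset, self.dataset.__class__)` would fail for it.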
  • Another thing (not an issue): the following check in the constructor of `FiniteDatasetIterator` in `pylearn2/utils/iteration.py` fails unless the YAML file marks the `data_specs` with `!!python/tuple` tags. A list is not a tuple, so it gets wrapped as a single source and `len(source)` no longer matches `len(convert)`. A helpful message suggesting this fix could be added at this point:
```python
# Code that fails when source is a list instead of a tuple.
if not isinstance(source, tuple):
    source = (source,)
...
# Proposed: attach a hint to the existing assertion.
assert len(convert) == len(source), (
    "Try changing the dataset's data_specs in the yaml file to "
    "!!python/tuple ['a', 'b']")
```
```yaml
# Fixed yaml file.
data_specs: !!python/tuple [ 'a', 'b'],
```
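A minimal sketch of the same check with a more helpful error, written as a standalone hypothetical function `normalize_source` (not part of pylearn2). It raises `ValueError` instead of using `assert`, since assertions are stripped when Python runs with `-O`.

```python
def normalize_source(source, convert=None):
    """Hypothetical standalone version of the source/convert check in
    FiniteDatasetIterator.__init__, with a hint for YAML users."""
    # A bare string names a single source; wrap it in a tuple.
    # A list (e.g. from a YAML file without !!python/tuple) is wrapped too,
    # which turns ['a', 'b'] into the single-element tuple (['a', 'b'],).
    if not isinstance(source, tuple):
        source = (source,)
    if convert is not None and len(convert) != len(source):
        raise ValueError(
            "len(convert) == %d but len(source) == %d. If data_specs comes "
            "from a YAML file, write its source part as "
            "!!python/tuple ['a', 'b'] instead of a plain list."
            % (len(convert), len(source)))
    return source


# Usage: a tuple passes; the same names given as a list trigger the hint.
print(normalize_source(('features', 'targets'), [None, None]))
try:
    normalize_source(['features', 'targets'], [None, None])
except ValueError as err:
    print(err)
```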