Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to convert vectors to matrices and vice versa #129

Merged

Conversation

nikhilpinnaparaju
Copy link

No description provided.

@hemalvarambhia
Copy link
Contributor

Hi, Nikhil. First question I have is what is driving the need for the functionality you're developing?

numOfRows isNil ifFalse: [rowNum := numOfRows ].
numOfCols isNil ifFalse: [colNum := numOfCols ].

(rowNum isNil and: colNum isNil) ifFalse: [
Copy link
Contributor

@hemalvarambhia hemalvarambhia May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referrring to the block within, I think here we might extract to an asMatrix message. It feels like you're doing that. The advantage is that there are fewer nested if and thus the code is easier to understand.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand what you mean here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial part of the code checks if both dimensions - num of cols and num of rows have been given. For example in numpy we can reshape a (10,1) vector to (5,-1) which means (5,2) matrix. The '-1' tells the program to automaticaly identify what the other dimension is.

Copy link
Contributor

@hemalvarambhia hemalvarambhia May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit the code is a bit difficult to follow. My motivation is to make it easier to understand, I guess. There are a lot of ifs there. Maybe we start with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also the same concern about the readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop the nil idea, personally. By expecting numbers it keeps the logic simple. It's a contractual obligation on the client to provide those and forces them to think a bit more :D

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how fundamental vectors and matrices are, it's worth spending some time cleaning it up. Both the API and the implementation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else to keep in mind: do we want N-dimensional arrays in PolyMath one day? If yes, we should have an API that scales to more than two dimensions. Everything must be doable without references to "rows" or "columns". In particular, the distinction between "row vectors" and "column vectors" has no useful generalization to higher dimensions.

Copy link
Contributor

@hemalvarambhia hemalvarambhia May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context are N-dimensional arrays really tensors? Forgive my ignorance here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The currently fashionable name is indeed tensors. As a physicist, I prefer to reserve that word for its original meaning (a quantity that respects certain transformation laws under changes of coordinate systems), so I stick to the old-fashioned term "N-dimensional array".

@nikhilpinnaparaju
Copy link
Author

Motivation - Softmax is primarily used on vectors but it can be applied to matrices as well in a couple of ways. Either along any of the axis of the matrix or we can flatten the matrix and then apply softmax (on the matrix that is now a vector) and then reshape it back to the original shape.

Copy link
Contributor

@hemalvarambhia hemalvarambhia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, the tests read really well, @nikhilpinnaparaju. Thank you for taking the proper care! ❤️

numOfCols isNil ifTrue: [ colNum := self size / rowNum ] .
].

self assert: (rowNum * colNum = self size) description: 'Number of Rows not compatible with number of Columns'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'rows' and 'columns' rather than 'Rows'/'Columns' as in this case they're not proper nouns.

@@ -237,6 +237,26 @@ PMVector >> productWithVector: aVector [
into: [ :sum :each | n := n + 1. (aVector at: n) * each + sum]
]

{ #category : #'as yet unclassified' }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this categorised as 'transformation'?

@SergeStinckwich SergeStinckwich merged commit d65800e into PolyMathOrg:development Jun 5, 2019
Copy link
Contributor

@hemalvarambhia hemalvarambhia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've left some messages uncategorised so consider rectifying that where you can, @nikhilpinnaparaju. Otherwise, I am happy. Thank you for addressing our comments.

@SergeStinckwich what are you thoughts on the above?


self assert: (self size = prod) description: 'Imcompatible combination of Dimensions provided'.

^true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary as you're not relying on the output in places where you send this message.

SergeStinckwich added a commit that referenced this pull request Aug 21, 2019
* Added convenience methods with tests
#atAllPut: (Matrix instance protocol)
#atColumn:put:repeat:
#initializeRows:columns:
#rows:columns: (Matrix class protocol)
#rows:columns:element: (Matrix class protocol)

* Clean class PMFloatingPointMachine and sync with the book

* Cleaning

* Code cleaning during book sync session

* Typos

* Sync with book chap 6

* Cleaning code

* Clean PCA code

* Cleaning

* Add a class similar to StandardScaler in Scikit-learn

* Add scale for PMStandardizationScaler

* Use PMMatrix for PMDataTransformer classes

* Add a transform test

* Close #77. Implement class PMStandardizationScaler for doing standardization cleaning on matrices + abstract class PMDataTransformer. 5 unit tests include.

* Add a fit: and transform: methods to PCA but apparently there is a bug. Results different from what you obtain from Python ...

* Clean PMSingularValueDecomposition

* Some optimisation for PMStandardizationScaler>>transform:

* Refactor PMPrincipalComponentAnalyser in order to have 2 implementation classes (Jacobi transformation of the covariance matrix and SVD) and one abstract classes for all PCA.
Still 2 unit tests not green.

* Cleanup setUp in PMSingularValueDecompositionTest

* Clean and sync with book

* Add flattenOnColumns and flattenOnRows + tests
Close #79

* Add argMaxOnColumns + argMaxOnRows for PMMatrix and argMax on vector.
Close #80

* Following discussion #79 and #80, I rename the flatten methods and add more tests for argmax.

* Remove unecessary code in cos, cosh, log, sin, sinh, sqrt, tan and tanh on a matrix

* Add a sign op on PMMAtrix + one test

* Remove PMPrincipalComponentAnalysisServer and tests because we change completely the API and old version got some problems with encapsulation ...

* Commit tests for PCA.

* Add one example with PCA with SVD on IRIS dataset

* Update CONTRIBUTING.md - changed 8 space indentation to 4 spaces

* Update CONTRIBUTING.md

* Moved the tests to a more sensible package

* Moved the tests to a more sensible package.

* Moved the tests to a more sensible package.

* Moved the tests to a more sensible package.

* Corrected the test - of course PMMatrix does not have the assert message as part of its interface.

* Add .project metadata

* Rename Quaternion as PMQuaternion

* QuantileTest => PMQuantileTest

* Working on #16

* Add PM prefix for Math-Tests-Random classes #16

* Add PM prefix for Math-Tests-KolmogorovSmirnov classes #16

* Fix #16 for Math-Tests-KDTree

* #16 for Math-Tests-FastFourierTransform

* Continue to work on #16

* More on #16

* [issue-85] Moved the TestCase classes to more sensible packages.

* [issue-88] Corrected the type.

* [issue-81] Added a test for the eigevalues of a 2 x 2 identity matrix as a starting point.

* [issue-81] Added another missing test to confirm that the eigenvectors of I are the unit basis vectors.

* [issue-81] Simplified the test a bit.

* [issue-81] Extracted Scikit Learn's algorithm to its own class to allow
a little more isolated testing.

* [issue-81] Refactor - improved the name of the method.

* [issue-81] Added a method to demonstrate how the argMax message works for vectors over the field of real numbers.

* [issue-81]  The signs computed by the computeSignsFromU message
match Sci-Kit Learn.

* [issue-81] To make it easier to understand the algebra, return the
signs as a vector object.

* [issue-81] Extracted a helper method and added a test, where the example is taken from scikit-learn.

* [issue-81] Simplified initialisation of the algorithm, removed obselete method.

* [issue-81] Inlined a local variable.

* [issue-81] Developed a method that computes the sign matrix for the U matrix.

* [issue-81] Removed an unused variable.

* [issue-81] Spelling error.

* [issue-81] Removed an obselete comment.

* [issue-81] Store the flipped matrices in better named state variables.

* [issue-81] Removed unused accessor methods.

* [issue-81] Corrected the comment.

* [issue-81] Added a test for uFlipped.

* [issue-81] Fixed the comment, again.

* [issue-81] Added a test for vFlipped, corrected an existing test.

* [issue-81] Removed unused state variables.

* [issue-81] Removed the unused method.

* [issue-81] Extracted duplicate arrange code to an intent-revealing set up helper.

* [issue-81] Use the helper as it generates the same set up.

* [issue-81] Use the signMatrixForU message to be consistent with vFlipped.

* [issue-81] Moved the duplicated  comments to a more sensible place.

* [issue-81] We now have a working SVD flip algorithm.

* [issue-81] Improved the name of the instanciation  method.

* Update .smalltalk.ston

* Reverse

* [issue-81] The sign flipping is now working in that the test for the SVD-based PCA implementation is now failing rather than erroring.

* [issue-81] Name of the class now conforms to the PolyMath codestyle.

* Support for principalDiagonal function #91 (#92)

* Update .travis.yml

* Update PMMatrixTest.class.st

* missing PM before a class name ...

* [Issue 81] - 2 PCA implementations that give same results but different from Python scikit-learn implementation (#96)

* [issue-81] Categorised the methods, removed duplication to the set up method.

* [issue-81] Added additional tests from the example in the Sandia sign ambiguity paper.

* [issue-81] Added tests to confirm that we would get the same flipped U and V.

* [issue-81] Renamed the set up method to follow convention.

* [issue-81] Mobbing with colleagues we decided that it sufficed, because of sign ambiguity, to compare the moduli of the matrices.

* [issue-81] Removed the computation of argMax to an certain level of precision. After discussion with colleagues it was deemed inadvisable.

* [issue-81] Extracted expected outputs to a local variable with a good name.

* [Issue 81] 2 PCA implementations that give same results but different from Python scikit-learn implementation (#97)

* [issue-81] Added an acceptance test with the mean-centred data from the Lindsay Smith PCA Tutorial paper.

* [issue-81] Improved the name of the test method.

* [issue-81] Added an acceptance for the Jacobi-based transform: message.

* [Issue 81] Refactoring/Clean Up (#98)

* [issue-81] Improved the name of the variables.

* [issue-81] Improved the name of the categories.

* [issue-81] Removed obselete comments.

* [issue-81] Emphasised the polymorphism of the two implementations of PCA in their respective tests.

* [issue-81] Removed duplicate code to set up messages.

* [issue-81] Removed unused instance variables.

* [issue-81] Use an existing message that has the same method.

* [issue-81] Moved the duplicate acceptance tests to a more sensible place.

* Add missing tests in BaselineOf

* Add more missing tests package

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* [Issue 62] - Rename packages that have DHB in their names (Part 1) (#102)

* [issue-62] Removed an empty test case.

* [issue-62] Moved the tests to a more appropriate package.

* [issue-62] Moved the test class to a more appropriate package.

* [issue-62] Removed DHB from the package names. (#103)

* Remove all DHB in baseline

* Missing packages in the baseline

* Missing Math-Tests-Core package

* Remove all DHB (even in comments)

* Remove copyright comments

* Rename tests to use CamelCase

* Update README.md

* Rename initialize to setUp in some tests

* - Remove print1On: print2On: methods and use properly printOn:
- Rename PMKolmogorovSmirnov to PMKolmogorovSmirnovSample

* Working on #62 rename TSNE => PMSTE

* - Remove Manifest classes
- work on #62 to rename classes with PM prefix

* Close #62

* Close #62

* Close #62

* fix typo.

* Don't test if a package has been loaded or not, this is manage by Baseline.

* Clean comments

* Still comments fixes

* Clean some tests

* More tests cleaning

* Remove dependancy between PM-RandomDistributionBased and PM-Polynomials

* Ongoing work on #105. Remove Math-RandomDistributionBased package and merge classes with Math-Random.

* Update .travis.yml

* Update .travis.yml

* Update appveyor.yml

* Remove all Transcript show from tests.

* Update .smalltalk.ston

* Update README.md

* Update appveyor.yml

* Update .smalltalk.ston

* [issue-104] Set the seed to zero for more certainty. (#107)

* [Issue 104] Some tests methods fail randomly (#111)

* [issue-104] Removed obselete code.

* [issue-104] Corrected the method category, improved the formatting of the code in the test method.

* [issue-104] Removed unnecessary code.

* [issue-104] Improved code formatting.

* [issue-104] Extracted the computation to an intention-revealing variable.

* [issue-104] Improved code formatting, used data from a series from slides from QMUL

* [issue-104] Separated the assert from the arrange.

* [issue-104] Improved code formatting.

* [issue-104] Removed obselete method.

* [issue-104] We need to call the super class method according to the Pharo style guide.

* [issue-104] Corrected the category of the method.

* [issue-104] Removed obselete code - here I removed them and re-ran the tests. They remained green.

* [issue-104] Corrected formatting.

* [issue-104] Improved formatting, used data from the Oxford University Statistics Dept lecture notes.

* [issue-104] Corrected the link to the data source

* Removed == operator from PMVector (#108)

The == operator should not be overridden.
Removed that block entirely due to no it having no real application.

* Issue 29 - PMVector/PMMatrix >> #dot: does not compute a dot product. (#119)

* [issue-29] Marked PMMatrix's dot message as deprecated and introduced the hadamard product in its place. Replaced calls to dot in places where we can be confident nothing breaks.

* [issue-29] Introduce the new and better-named elementwiseTimes message.

* [issue-29] Marked the PMVector dot message as deprecated.

* [issue-29] Added a separate test for the Hadamard product and reverted the element-wise multiply test to its original state.

* [issue-29] Improved the name of the method.

* Removed PMVector sum for speedup (#126)

* Implementing the t-SNE algorithm (#117)

* Added "Hbeta" function for PMTSNE with test

It is named as `entropyOf: andPRow: withBeta:`.
Added a test for the method ensuring:
- Original pVector reference is maintained
- entropy is calculated correctly
- pVector is calculated correctly

* Added `reduceXToInputDims:` method for PMTSNE

- Added corresponding test for new method.
- Renamed `ndims` to `outputDims`
- Renamed `initialdims` to `initialDims`
- Added accessor `x`

* Added method to get pairwise affinities in PMTSNE

- The new function is called `computePairwiseAffinities` (as in original paper)
- The old one is renamed to `computeLowDimentionalAffinities`
- Removed x2p function (translated it's code to `computePairwiseAffinities`)
- Also added tests

* Added progress bar(Job) for PMTSNE

Moved contents of the `start` method to `initializeJob`.

* Added gradient descent in PMTSNE

- Also added various accessors and instance vars
- Classified existing methods
- Added a test for `computePValues`

* [Issue 29] PMVector/PMMatrix >> #dot: does not compute a dot product. (#127)

* [issue-29] Removed test for a deprecated message.

* [issue-29] Replaced a call to a deprecated message, corrected a style violation.

* [issue-29] Removed deprecated messages.

* Update README.md

* Added test cases for subtraction between scalars and vectors and suggested quickfix (#121)

* Added Basic Scalar subtraction test

* added test for subtract PMVector from a Number

* QuickFix of the scalar subtraction with PMVector Issue

* Fixed ZeroDivideError in PMStandardizationScaler (#128)

This happened when one of the input features was constant.
- Set scale to 1 for the feature which is constant
- Added two tests for this case

* Remove old Math-RandomDistributionBased package from repo

* Refactored PMTSNE to include steps (#131)

New methods `step` and `postStep` have been added.
`postStep` will contain all viz and debug methods.
A new `stateVariables` contains required variables for
the steps.

* [Issue 105] Math-Random Package Is Messy - Improve Naming, Refactor Tests (#132)

* [issue-105] Removed an obselete test.

* [issue-105] Improved the name of the class. The class comment says it is a generator.

* [issue-105] Improved the name of the class.

* [issue-105] Extracted duplicate code to an intention-revealing method.

* [issue-105] Extracted duplicate code to a helper method.

* [issue-105] Inlined the method as it's only used in one place.

* [issue-105] Improved the name of the method.

* [issue-105] Corrected code style.

* [issue-105] Improved the name of the method.

* [issue-105] Removed obselete code.

* [issue-105] Corrected the category.

* [issue-105] Improved the names of the methods.

* [issue-105] Improved the names of the method.

* [issue-105] Removed obselete test.

* [issue-105] Brought back a test deleted accidentally.

* [issue-105] Made similar code even more similar.

* [issue-105] Moved the code to a better place, an intention-revealing method.

* [issue-105] Made the code consistent with that of the other tests. This reads a little better now.

* [issue-105] Made similar code look the same, improved the name of a test method by using domain language.

* Added vizualization examples for PMTSNE (#133)

- Renamed variable `computeErrorEvery` to `updateErrorEvery`
- Added `updateError` variable which calculates error only when true
- Added grid vizualization example
- Added linked-rings vizualization and generator

* Ability to convert vectors to matrices and vice versa (#129)

* Added Basic Scalar subtraction test

* added test for subtract PMVector from a Number

* QuickFix of the scalar subtraction with PMVector Issue

* Added methods for intuitive conversion between vectors and matrices

* Tests for matrix <-> vector interconversions

* Fixed implementation of reshape

* [Issue 105] Refactor Math-Random Tests Package (#134)

* [issue-105] Removed unnecessary dot.

* [issue-105] The code is doing the same thing differently, so I have made it look exactly the same in order that we can extract the duplication elsewhere later.

* [issue-105] Extracted code to an intention-revealing method.

* [issue-105] Improved the name of the test method and extracted generation of a sample to an intention revealing variable.

* [issue-105] Extracted the generation of a sample to an intention-revealing method.

* [issue-105] In the case of the Poisson generator I understand that lambda is the mean/expected value (cf. wikipedia).

* [issue-105] Introduced the expectedValue message for consistency, in view of removing the mean message completely.

* [issue-105] Corrected the message name to conform to the Pharo style.

* [issue-105] Extracted duplicate code to a test helper class.

* [issue-105] Removed obselete comment.

* [issue-105] Made the assertion the same as used in the exponential generator test in hopes of moving it to a super class.

* [issue-105] Corrected and improved the name of the class and the method therein.

* [issue-105] Improved code formatting, replaced implicitly duplicate code with the random sample generator class.

* [issue-105] Improved the names of the temporary variables.

* [issue-105] Improved the name of the assertion.

* [issue-105] Removed an obselete comment.

* [issue-105] Improved the name of the assertion.

* [issue-105] Made the name of the test clearer.

* [issue-105] Inlined the temp variable as it does not enhance understandability.

* [#122] Fix PMVector comparison operators (#123)

- Also added relevant tests for the operators

* [Issue 105] Math-Random package is messy (#135)

* [issue-105] Moved the incrementing to the bottom so we can extract the tempering to a method in the next refactor.

* [issue-105] Extracted the tempering section to an intention-revealing method.

* [issue-105] Corrected the categorisation to match the super class's.

* [issue-105] Corrected the categorisation to match the super class's.

* [issue-105] Corrected the initialisation.

* [Issue 106] Support For coveralls.io (#137)

* [issue-106] Fake it til we make it - just test the random package for now.

* [issue-106] Corrected structure.

* [issue-106] Added another package to track.

* [issue-106] Added another two packages.

* [issue-106] Added the benchmark packages.

* [issue-106] Added the Chromosome, Clustering, Complex and Core packages.

* [issue-106] Added the Distribution packages.

* [issue-106] Added the FFT and FunctionFit packages.

* [issue-106] The FFT and FunctionFit packages seem broken so we replace
it with the K-packages.

* [issue-106] Added the Math-Matrix (and any Math-M*) package.

* [issue-106] Added all Math-N* packages.

* [issue-106] Added the Math-O* packages.

* [issue-106] The Math-ODE packages seems to cause the CI job to run out of
memory, so let's proceed to the Math-P* packages.

* [issue-106] Added the Math-Q* packages.

* [issue-106] Added the coverage status to the README.

* [issue-106] Replaced hard-coded package name with something that
keeps overage open to additional R-packages.

* [issue-106] Added the TNSE package.

* [issue-106] Support For coveralls.io (#138)

* [issue-106] The link to the badge should point to the development branch.

* [issue-106] We can at least cover the FunctionFit package.

* [Issue 106] Refactor Math-Accuracy (#139)

* [issue-106] Improved message names.

* [issue-106] The happy path should not be the guard clause; normally the subclass should respond to the message.

* [issue-106] Improved the name of the block variable.

* [issue-106] Moved related code closer together. initRest goes on to populate parameters, arguments and results.

* [issue-106] The asString message send is redundant.

* [issue-106] Improved the name of the temp variable.

* [issue-106] Improved the names of the block variables.

* [issue-106] The asString is redundant as the tests remain green.

* [issue-106] Extracted code to an intention-revealing method.

* [issue-106] Simplified the calculation of the occurence of a key.

* [issue-106] Inlined the temporary variable as it is now redundant.

* [issue-106] Extracted an code to an intention-revealing local temp and improved ormatting.

* [issue-106] Inlined the variable as it over wrote the earlier assignment.

* [issue-106] The initialize call looks like a special case, so transformed it into a guard clause.

* [issue-106] Improved code formatting.

* [issue-106] The check is redundant now because we have got passed the guard clause.

* [issue-106] The nil check is unnecessary now given the ifNone: call. The call to join: is for efficiency (O(n) vs O(n^2) for concatenation). This was recommended by Pharo.

* [issue-106] Reverted the change as we are only concatenating two strings which is not super inefficient.

* [issue-106] Removed obselete code.

* [Issue 106] Get Math-ODE Covered By Coveralls.io  (#140)

* [issue-106] Cover all packages starting with Math-O.

* [issue-106] Back to hard-coding the package name (Travis ran out of memory).

* Use double dispatch for PMDualNumber (#142)

* Clean tests of AutomaticDifferenciation

* Use double dispatch for +

* Double dispatch for * in DualNumber

* Use double dispatch for *,+ and /.
Now all tests are green again

* Add a test about derivatives

* PMDualNumber is now a subclass of Number and not Magnitude.

* Add conjugated and real method (to support dual numbers with complex numbers).
Add support in PMComplex classes also.
Add green tests.

* Use absSquared instead of abs2

* Remove constant comments

* Fix formatting

* Reformat correctly PMHyperDualNumber

* [skip ci]

* [skip CI]

* Update README.md

* [skip CI]

* [skip CI]

* [skip CI]

* [skip CI]

* [skip CI]

* Update README.md

* [skip CI]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants