Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lda opt #99

Open
wants to merge 136 commits into
base: master
Choose a base branch
from
Open

Lda opt #99

wants to merge 136 commits into from

Conversation

hyu596
Copy link

@hyu596 hyu596 commented Oct 15, 2018

This PR implements distributed Latent Dirichlet Allocation using the Cirrus system and interface.

@hyu596 hyu596 force-pushed the lda_opt branch 4 times, most recently from e319875 to 767aeb5 Compare January 31, 2019 22:12
@hyu596 hyu596 closed this Jan 31, 2019
@hyu596 hyu596 reopened this Jan 31, 2019
@hyu596 hyu596 closed this Feb 14, 2019
@hyu596 hyu596 reopened this Feb 14, 2019
Copy link
Owner

@jcarreira jcarreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look.


return config

def launch_ps(self, command_dict=None):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs some work to improve a few things:

  1. The lambdas should be kept alive by the automate.py (as is done in the other algorithms).
  2. The parameter server should be launched by the automate.py (see logistic_regression,ipynb)

};

static constexpr int WORKERS_BASE[5] = {-1, 3, -1, -1, 4};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

src/LDAModel.cpp Outdated
K_ = load_value<int16_t>(info);

// load the current topic assignments
t.clear();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is the constructor, t/d/w are empty vectors. No need for clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

src/Tasks.h Outdated
std::vector<std::vector<int>>& topic_scope);

private:
std::array<int, VOCAB_DIM_UPPER> lookup_map;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit dangerous. If VOCAB_DIM_UPPER is too big it can lead to crashes be cause std::array puts data on the stack.

You can replace this with an std::vector and then resize to VOCAB_DIM_UPPER.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens in other places. We can fix this later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookup_map is now vector and its initializing part (filling with -1) is in the beginning of LoadingLDATaskS3.run function.

// Storing local variables (LDAStatistics)
for (unsigned int i = 1; i < num_s3_objs + 1; ++i) {
// for (unsigned int i = 1; i < 3; ++i) {
std::vector<int> w;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to right before count_dataset().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

float est_time_one_iter =
((get_time_ms() - start_time) / 1000.) / full_iteration;

if (elapsed_sec > (lambda_time_out - est_time_one_iter - 10.)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much code here. Would put this code below in a separate function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code refactoring has been done. Please let me know if it remains messy.

}
}
int since_start_sec = (get_time_ms() - start_time) / 1000;
if (since_start_sec > benchmark_time) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing, this should go into separate function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^Code refactoring has been done. Please let me know if it remains messy.

*/
void load_serialized_indices(char* mem_begin);

std::vector<std::unique_ptr<std::thread>> help_upload_threads;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem used

Copy link
Author

@hyu596 hyu596 Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being used now [1]. Previously help_upload_threads was removed since in all the experiments I ensured that each worker was assigned with exactly one LDAStatistics and thus it was not needed.

[1] https://github.com/hyu596/cirrus-1/blob/lda_opt/src/LDATaskS3.cpp#L201

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work has been done to ensure it works fine.

uint64_t to_send_size,
int& upload_lock,
int bucket_id) {
while (true) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand this code. As far as I can understand this task is single threaded (help_upload_threads is not used).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^

bootstrap.sh Outdated
fi
cd lz4-dev
make
make install
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the LDA tests are failing on travis because we don't have permissions to do 'make install' there. To fix this do:

  1. Remove this make install and sudo make install.
  2. Add to the .travis.yml file an "apt-get install liblz4-dev".
  3. Add liblz4-dev to the cirrus/README

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants