This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

[High-Level-API] Rewrite Chapter 5 Personalized Recommendation in Book to use new Flui… #526


Contributor

@nickyfantasy nickyfantasy commented May 31, 2018

…d API

I will add the plot in the next PR.


Our program starts with importing necessary packages and initializes some global variables:
Contributor

starts with importing necessary packages and initializing

```

Movie title, a sequence of words represented by an integer word index sequence, will be feed into a `sequence_conv_pool` layer, which will apply convolution and pooling on time dimension. Because pooling is done on time dimension, the output will be a fixed-length vector regardless the length of the input sequence.
Movie title, which is a sequence of words represented by an integer word index sequence, will be feed into a `sequence_conv_pool` layer, which will apply convolution and pooling on time dimension. Because pooling is done on time dimension, the output will be a fixed-length vector regardless the length of the input sequence.
Contributor

will be fed
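
For readers of this thread, here is a minimal plain-Python sketch of why pooling over the time dimension gives a fixed-length vector regardless of the title length. It is not the Fluid `sequence_conv_pool` implementation: the function name, the toy filters, the zero padding, and the choice of max pooling are all assumptions made for illustration, and the chapter's layer may be configured with a different pooling type or activation.

```python
def conv_max_pool_over_time(embedded, filters, window):
    """Toy convolution + max pooling over the time dimension.

    embedded: list of T word vectors, each of length D
    filters:  list of F weight vectors, each of length window * D
    Returns a list of F values, independent of T.
    """
    dim = len(embedded[0])
    # Assumption for this sketch: zero-pad sequences shorter than the window.
    pad = [[0.0] * dim for _ in range(max(0, window - len(embedded)))]
    padded = list(embedded) + pad
    # Each window is `window` consecutive word vectors, concatenated.
    windows = [
        [x for vec in padded[t:t + window] for x in vec]
        for t in range(len(padded) - window + 1)
    ]
    # One output per filter: the maximum response over all time steps.
    return [
        max(sum(w * x for w, x in zip(f, win)) for win in windows)
        for f in filters
    ]


# Three hypothetical filters for window=2 and embedding size 2.
filters = [[1.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]]
short_title = [[1.0, 2.0], [3.0, 4.0]]
long_title = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0], [2.0, 2.0], [5.0, 0.0]]

short_out = conv_max_pool_over_time(short_title, filters, window=2)
long_out = conv_max_pool_over_time(long_title, filters, window=2)
print(len(short_out), len(long_out))  # both equal the number of filters
```

The output length depends only on the number of filters, which is what lets titles of different lengths feed the same downstream layers.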


Finally, we can use cosine similarity to calculate the similarity between user characteristics and movie features.
Finally, we can define a `inference_program` that use cosine similarity to calculate the similarity between user characteristics and movie features.
Contributor

an inference_program that uses
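
As a reference point for the sentence under review, a minimal sketch of cosine similarity between a user feature vector and a movie feature vector, in plain Python rather than Fluid layers. The vectors and the zero-vector convention are assumptions for illustration, not values or behavior taken from the chapter.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch only: no direction, no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors standing in for the two network branches.
usr_feature = [0.2, 0.8, 0.1]
mov_feature = [0.1, 0.9, 0.0]
print(cosine_similarity(usr_feature, mov_feature))
```

The result lies in [-1, 1], so mapping it onto the 1~5 rating range requires some rescaling step; how the chapter does that is not shown in this diff.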


Before jumping into creating a training module, algorithm setting is also necessary. Here we specified Adam optimization algorithm via `paddle.optimizer`.
Next we define data feeders for test and train. The feeder reads a `BATCH_SIZE` of data each time and feed them to the training/testing process.
`paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input of `buf_size` is generated for training.
Contributor

The sentence is not clear. Also, `buf_size` is larger than `BATCH_SIZE`, so I think the logic is reversed.
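
To make the intended order concrete, here is a plain-Python sketch of the two steps as I understand them: records are first shuffled inside a buffer of `buf_size`, and the shuffled stream is then cut into batches of `BATCH_SIZE`. The helpers `shuffle_reader` and `batch_reader` are hypothetical stand-ins written for this comment, not the Paddle reader decorators themselves, and the toy sizes are assumptions.

```python
import random


def shuffle_reader(reader, buf_size, seed=None):
    """Hypothetical stand-in: shuffle records within a buffer of buf_size."""
    rng = random.Random(seed)

    def shuffled():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush whatever is left at the end of the pass.
        rng.shuffle(buf)
        for item in buf:
            yield item

    return shuffled


def batch_reader(reader, batch_size):
    """Hypothetical stand-in: group records into lists of batch_size."""

    def batched():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # this sketch keeps the final partial batch
            yield batch

    return batched


def toy_records():
    """Ten integer records standing in for one pass over the dataset."""
    for i in range(10):
        yield i


BUF_SIZE = 8    # shuffle buffer, larger than the batch
BATCH_SIZE = 4  # records handed to the trainer per step

train_reader = batch_reader(shuffle_reader(toy_records, BUF_SIZE, seed=0), BATCH_SIZE)
batches = list(train_reader())
print([len(b) for b in batches])
```

Read this way, `buf_size` controls how well the data is mixed and `BATCH_SIZE` controls how much is fed per step, which is why the buffer is normally the larger of the two.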


`paddle.dataset.movielens.train` will yield records during each pass, after shuffling, a batch input is generated for training.
Create a trainer that takes `train_program` as input and specifies optimizer.
Contributor

Create ... and specify

if step % 100 == 0:  # every 100 batches, update cost plot
    cost_ploter.plot()
Use create_lod_tensor(data, lod, place) API to generate LoD Tensor, where `data` is a list of sequences of index numbers, `lod` is the level of detail (lod) info associated with `data`.
For example, data = [[10, 2, 3], [2, 3]] means that it contains two sequences of indexes, of length 3 and 2, respectively.
Contributor

indexes => indices
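
A small sketch of the example in the line under review, using plain Python lists only. It shows how `data = [[10, 2, 3], [2, 3]]` can be described by per-sequence lengths or by cumulative offsets, and how the values look once flattened. The helper names are hypothetical, and which of the two forms `create_lod_tensor` expects for `lod` is not stated in this diff, so both are shown without claiming either is the required input.

```python
def sequence_lengths(data):
    """Length of each sequence, e.g. [3, 2]."""
    return [len(seq) for seq in data]


def lengths_to_offsets(lengths):
    """Cumulative start positions, e.g. [0, 3, 5]."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def flatten(data):
    """All indices in one flat list, sequence after sequence."""
    return [idx for seq in data for idx in seq]


data = [[10, 2, 3], [2, 3]]
lengths = sequence_lengths(data)
offsets = lengths_to_offsets(lengths)
flat = flatten(data)

# Recover the second sequence from the flat list and the offsets.
second = flat[offsets[1]:offsets[2]]
print(lengths, offsets, flat, second)
```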

Finally, we can invoke `trainer.train` to start training:
### Infer

Now we can infer with inputs that matched with the yield records that we provide in `feed_order` during training.
Contributor

matched => match

Contributor

This sentence is not clear. Maybe break it into two?

@@ -98,13 +98,13 @@ Figure 4. A hybrid recommendation model.

We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) to train our model. This dataset includes 10,000 ratings of 4,000 movies from 6,000 users to 4,000 movies. Each rate is in the range of 1~5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.

`paddle.v2.datasets` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `moivelens` and `wmt14`, etc. There's no need for us to manually download and preprocess `MovieLens` dataset.
Contributor

good catch!

Contributor

I told Nicki the same. He really has a keen eye 👀

```

Finally, we can invoke `trainer.train` to start training:
### Infer
Contributor

Inference

@jetfuel jetfuel changed the title Rewrite Chapter 5 Personalized Recommendation in Book to use new Flui… [High-Level-API] Rewrite Chapter 5 Personalized Recommendation in Book to use new Flui… Jun 5, 2018
},
return_numpy=False)

print("infer results: ", np.array(results[0]))
Contributor

Can we show a comparison between the prediction and the real data? For example, user 23::M::35::0::90049 rated movie 2278::Ronin (1998)::Action|Crime|Thriller a 4.0 score. Our prediction is 3.458.
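
Something along these lines could work, as a rough sketch only. The helper `format_comparison` is hypothetical, and the numbers are the ones quoted in this comment; in the real script the predicted value would presumably be read out of `np.array(results[0])`, whose exact shape is an assumption here.

```python
def format_comparison(user, movie, actual, predicted):
    """Build one line comparing a real rating with the model's prediction."""
    error = abs(actual - predicted)
    return (
        "user {} | movie {} | actual {:.1f} | predicted {:.3f} | abs error {:.3f}"
        .format(user, movie, actual, predicted, error)
    )


# Values taken from the example in this comment.
line = format_comparison(
    user="23::M::35::0::90049",
    movie="2278::Ronin (1998)::Action|Crime|Thriller",
    actual=4.0,
    predicted=3.458,
)
print(line)
```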

Contributor Author

sure

Contributor

@sidgoyal78 sidgoyal78 Jun 5, 2018

Good suggestion, I think it would be helpful.

@nickyfantasy nickyfantasy merged commit e82c7d4 into PaddlePaddle:high-level-api-branch Jun 5, 2018
4 participants