App recommendation ranking in the Google Play store.

- Two-step recommendation:

- Retrieval system, which returns a short list of items as the candidate pool;

- Ranking system, which ranks all candidate items by their scores.

- The score of an item (e.g. an app) is the probability of a user action label (e.g. app acquisition) conditioned on

- user features

- context features

- item features

- Two objectives:

- Memorization: learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.

- Generalization: exploring new feature combinations that have never or rarely occurred in the past, based on transitivity of correlation.

- Background knowledge:

- Collaborative filtering achieves memorization by using the sparse user-item interaction matrix.

- Typically, serving a query entails finding the top-k similar items to the items that this user has historically interacted with.

- Matrix factorization learns low-dimensional latent vectors for users and items (via SVD++, etc.).
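As a toy illustration (plain SGD matrix factorization, not SVD++; the ratings matrix and hyperparameters below are made up), low-rank user and item vectors can be fit to the observed entries of a sparse interaction matrix:

```python
import numpy as np

# Approximate a sparse ratings matrix R ≈ U @ V.T by SGD on observed entries.
rng = np.random.default_rng(0)
R = np.array([[5, 3, 0], [4, 0, 0], [1, 1, 0], [0, 1, 5]], dtype=float)
observed = R > 0                      # treat zeros as missing, not as ratings
k, lr, reg = 2, 0.05, 0.01           # latent dimension, step size, L2 penalty
U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))

for _ in range(2000):
    for i, j in zip(*np.nonzero(observed)):
        err = R[i, j] - U[i] @ V[j]                  # residual on one entry
        U[i] += lr * (err * V[j] - reg * U[i])       # gradient step on user
        V[j] += lr * (err * U[i] - reg * V[j])       # gradient step on item
```

After training, `U @ V.T` closely reproduces the observed entries while the missing entries hold the model's predictions.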

- Logistic regression is a simple, scalable and interpretable model for predicting user actions (e.g. CTR) from manually engineered features (or GBDT-generated features).

- Factorization machines learn low-dimensional embeddings for features and include pairwise combinations of features.

- Problems with past work:

- Linear models generalize poorly to unseen feature interactions.

- Embedding-based models can over-generalize when the user-item interactions are sparse and high-rank.

- The approach of the paper:

- The model includes a linear combination of a wide component and a deep component.

- The wide component is a generalized linear model with cross-product transformations.
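A cross-product transformation can be sketched as a binary feature that fires only when all of its constituent categorical features take given values; the feature names follow the paper's Netflix/Pandora example:

```python
# Cross-product transformation used by the wide component: a binary feature
# equal to 1.0 only when every constituent condition holds, e.g.
# AND(user_installed_app=netflix, impression_app=pandora).
def cross_product(example, conditions):
    """Return 1.0 if every (feature, value) pair in `conditions` holds."""
    return float(all(example.get(f) == v for f, v in conditions))

x = {"user_installed_app": "netflix", "impression_app": "pandora"}
cross = cross_product(x, [("user_installed_app", "netflix"),
                          ("impression_app", "pandora")])
# cross == 1.0: this co-occurrence now gets its own linear weight,
# which is exactly how the wide part memorizes frequent combinations.
```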

- The deep component is a feed-forward neural network that contains 3 ReLU layers.

- The two components undergo joint training instead of simple ensemble.
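A minimal forward pass shows what joint training optimizes: the wide logit and the deep logit are summed before a single sigmoid, so both components receive gradients from the same log loss. Dimensions and weights here are arbitrary placeholders, not the production sizes:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_wide = rng.integers(0, 2, size=8).astype(float)   # raw + cross-product binaries
x_deep = rng.normal(size=16)                        # concatenated dense embeddings
W1, W2, W3 = (rng.normal(scale=0.1, size=s) for s in [(16, 12), (12, 8), (8, 4)])
w_wide = rng.normal(scale=0.1, size=8)
w_deep = rng.normal(scale=0.1, size=4)
b = 0.0

a = relu(relu(relu(x_deep @ W1) @ W2) @ W3)   # three ReLU layers (deep part)
p = sigmoid(x_wide @ w_wide + a @ w_deep + b) # wide logit + deep logit, one sigmoid
# p is the predicted probability of the action label (e.g. acquisition).
```

Because the logits are summed (rather than averaging two independently trained models' probabilities, as an ensemble would), each part only needs to cover what the other misses.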

- System implementation:

- The training set contains over 500 billion examples, each corresponding to one impression.

- Categorical feature strings are mapped to integer IDs, which then index 32-dimensional embeddings.
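A sketch of that pipeline, with a made-up vocabulary and an out-of-vocabulary bucket (the paper's exact vocabulary construction is not reproduced here):

```python
import numpy as np

# String -> integer ID -> 32-d embedding row. Vocabulary and embedding
# values are illustrative; in training the table is learned by backprop.
vocab = {"netflix": 0, "pandora": 1, "spotify": 2}
UNK = len(vocab)                                   # out-of-vocabulary bucket
rng = np.random.default_rng(0)
emb_table = rng.normal(scale=0.01, size=(len(vocab) + 1, 32))

def embed(feature_string):
    return emb_table[vocab.get(feature_string, UNK)]

v = embed("pandora")   # a 32-dimensional dense vector
```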

- Continuous feature values are normalized to [0,1] based on their CDF quantiles.
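A simple way to implement this, using the full empirical CDF of the training values instead of the paper's fixed number of quantile boundaries:

```python
import numpy as np

# Map each raw value to its quantile in [0, 1] under the training
# distribution: the fraction of training values <= v.
train_values = np.array([1.0, 2.0, 2.0, 3.0, 10.0, 50.0])
sorted_vals = np.sort(train_values)

def cdf_normalize(v):
    return np.searchsorted(sorted_vals, v, side="right") / len(sorted_vals)

cdf_normalize(3.0)   # 4 of 6 training values are <= 3.0, so ≈ 0.667
```

This makes the normalized value robust to heavy-tailed features (e.g. the jump from 10 to 50 above moves the output by only one quantile step).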

- All the feature embeddings are concatenated into a ~1200-dimensional dense vector before being fed into the deep component.

- For each new training set, the embeddings and the linear model weights are warm-started from the previous model.

- Multithreading parallelism is used to reduce the serving latency.
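The idea can be sketched with Python threads, splitting a scoring batch into smaller sub-batches that run in parallel (the scorer and chunk size below are placeholders, not the production setup):

```python
from concurrent.futures import ThreadPoolExecutor

def score_batch(batch):
    """Placeholder scorer; in production this is the model's forward pass."""
    return [hash(item) % 100 / 100.0 for item in batch]

candidates = [f"app_{i}" for i in range(1000)]
# Split the batch into smaller chunks and score them in parallel threads,
# then flatten the per-chunk results back into one score list.
chunks = [candidates[i:i + 250] for i in range(0, len(candidates), 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = [s for part in pool.map(score_batch, chunks) for s in part]
```

`pool.map` preserves chunk order, so the flattened scores line up with the original candidate list.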

- Evaluation metrics:

- acquisition rates in online A/B tests

- AUC on offline holdout data
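Offline AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen positive example is scored above a randomly chosen negative one (ties count half):

```python
def auc(labels, scores):
    """Pairwise AUC over binary labels; O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([1, 0, 1, 0], [0.9, 0.2, 0.3, 0.4])   # → 0.75 (3 of 4 pairs ordered correctly)
```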

## References:

- Cheng, Heng-Tze, et al. *Wide & Deep Learning for Recommender Systems*. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016.