App recommendation ranking in the Google Play store.

- Two-step recommendation:

- Retrieval system, which returns a short list of items as the candidate pool;

- Ranking system, which ranks all candidate items by their scores.

- The score of an item (e.g. an app) is the probability of a user action label (e.g. app acquisition) conditioned on

- user features

- context features

- item features

- Two objectives:

- Memorization: learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data.

- Generalization: exploring new feature combinations that have never or rarely occurred in the past, based on transitivity of correlation.

- Background knowledge:

- Collaborative filtering achieves memorization by using the sparse user-item interaction matrix.

- Typically, serving a query entails finding the top-k similar items to the items that this user has historically interacted with.

- Matrix factorization learns low-dimensional latent vectors for users and items (via SVD++, etc.).
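As a toy illustration (plain SGD matrix factorization, not SVD++; the ratings matrix and hyperparameters below are made up), low-rank user and item vectors can be fit to the observed entries of a sparse interaction matrix:

```python
import numpy as np

# Approximate a sparse ratings matrix R ≈ U @ V.T by SGD on observed entries.
rng = np.random.default_rng(0)
R = np.array([[5, 3, 0], [4, 0, 0], [1, 1, 0], [0, 1, 5]], dtype=float)
observed = R > 0                      # treat zeros as missing, not as ratings
k, lr, reg = 2, 0.05, 0.01           # latent dimension, step size, L2 penalty
U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))

for _ in range(2000):
    for i, j in zip(*np.nonzero(observed)):
        err = R[i, j] - U[i] @ V[j]                  # residual on one entry
        U[i] += lr * (err * V[j] - reg * U[i])       # gradient step on user
        V[j] += lr * (err * U[i] - reg * V[j])       # gradient step on item
```

After training, `U @ V.T` closely reproduces the observed entries while the missing entries hold the model's predictions.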

- Logistic regression is a simple, scalable and interpretable model for predicting user actions (e.g. CTR) from manually engineered features (or GBDT-generated features).

- Factorization machines learn low-dimensional embeddings for features and include pairwise combinations of features.

- Problems with past work:

- Linear models generalize poorly to unseen feature interactions.

- Embedding-based models can over-generalize when the user-item interactions are sparse and high-rank.

- The approach of the paper:

- The model includes a linear combination of a wide component and a deep component.

- The wide component is a generalized linear model with cross-product transformations.
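A cross-product transformation can be sketched as a binary feature that fires only when all of its constituent categorical features take given values; the feature names follow the paper's Netflix/Pandora example:

```python
# Cross-product transformation used by the wide component: a binary feature
# equal to 1.0 only when every constituent condition holds, e.g.
# AND(user_installed_app=netflix, impression_app=pandora).
def cross_product(example, conditions):
    """Return 1.0 if every (feature, value) pair in `conditions` holds."""
    return float(all(example.get(f) == v for f, v in conditions))

x = {"user_installed_app": "netflix", "impression_app": "pandora"}
cross = cross_product(x, [("user_installed_app", "netflix"),
                          ("impression_app", "pandora")])
# cross == 1.0: this co-occurrence now gets its own linear weight,
# which is exactly how the wide part memorizes frequent combinations.
```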

- The deep component is a feed-forward neural network that contains 3 ReLU layers.

- The two components undergo joint training instead of simple ensemble.
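A minimal forward pass shows what joint training optimizes: the wide logit and the deep logit are summed before a single sigmoid, so both components receive gradients from the same log loss. Dimensions and weights here are arbitrary placeholders, not the production sizes:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_wide = rng.integers(0, 2, size=8).astype(float)   # raw + cross-product binaries
x_deep = rng.normal(size=16)                        # concatenated dense embeddings
W1, W2, W3 = (rng.normal(scale=0.1, size=s) for s in [(16, 12), (12, 8), (8, 4)])
w_wide = rng.normal(scale=0.1, size=8)
w_deep = rng.normal(scale=0.1, size=4)
b = 0.0

a = relu(relu(relu(x_deep @ W1) @ W2) @ W3)   # three ReLU layers (deep part)
p = sigmoid(x_wide @ w_wide + a @ w_deep + b) # wide logit + deep logit, one sigmoid
# p is the predicted probability of the action label (e.g. acquisition).
```

Because the logits are summed (rather than averaging two independently trained models' probabilities, as an ensemble would), each part only needs to cover what the other misses.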

- System implementation:

- The training set contains over 500 billion examples, each corresponding to one impression.

- Categorical feature strings are mapped to integer IDs, which then index 32-dimensional embeddings.
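A sketch of that pipeline, with a made-up vocabulary and an out-of-vocabulary bucket (the paper's exact vocabulary construction is not reproduced here):

```python
import numpy as np

# String -> integer ID -> 32-d embedding row. Vocabulary and embedding
# values are illustrative; in training the table is learned by backprop.
vocab = {"netflix": 0, "pandora": 1, "spotify": 2}
UNK = len(vocab)                                   # out-of-vocabulary bucket
rng = np.random.default_rng(0)
emb_table = rng.normal(scale=0.01, size=(len(vocab) + 1, 32))

def embed(feature_string):
    return emb_table[vocab.get(feature_string, UNK)]

v = embed("pandora")   # a 32-dimensional dense vector
```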

- Continuous feature values are normalized to [0,1] based on their CDF quantiles.
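A simple way to implement this, using the full empirical CDF of the training values instead of the paper's fixed number of quantile boundaries:

```python
import numpy as np

# Map each raw value to its quantile in [0, 1] under the training
# distribution: the fraction of training values <= v.
train_values = np.array([1.0, 2.0, 2.0, 3.0, 10.0, 50.0])
sorted_vals = np.sort(train_values)

def cdf_normalize(v):
    return np.searchsorted(sorted_vals, v, side="right") / len(sorted_vals)

cdf_normalize(3.0)   # 4 of 6 training values are <= 3.0, so ≈ 0.667
```

This makes the normalized value robust to heavy-tailed features (e.g. the jump from 10 to 50 above moves the output by only one quantile step).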

- All the feature embeddings are concatenated into a ~1200-dimensional dense vector before being fed into the deep component.

- For each new training set, the embeddings and the linear model weights are warm-started from the previous model.

- Multithreading parallelism is used to reduce the serving latency.
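The idea can be sketched with Python threads, splitting a scoring batch into smaller sub-batches that run in parallel (the scorer and chunk size below are placeholders, not the production setup):

```python
from concurrent.futures import ThreadPoolExecutor

def score_batch(batch):
    """Placeholder scorer; in production this is the model's forward pass."""
    return [hash(item) % 100 / 100.0 for item in batch]

candidates = [f"app_{i}" for i in range(1000)]
# Split the batch into smaller chunks and score them in parallel threads,
# then flatten the per-chunk results back into one score list.
chunks = [candidates[i:i + 250] for i in range(0, len(candidates), 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = [s for part in pool.map(score_batch, chunks) for s in part]
```

`pool.map` preserves chunk order, so the flattened scores line up with the original candidate list.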

- Evaluation metrics:

- acquisition rates in online A/B tests

- AUC on offline holdout data
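Offline AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen positive example is scored above a randomly chosen negative one (ties count half):

```python
def auc(labels, scores):
    """Pairwise AUC over binary labels; O(P*N), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([1, 0, 1, 0], [0.9, 0.2, 0.3, 0.4])   # → 0.75 (3 of 4 pairs ordered correctly)
```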

## References:

- Cheng, Heng-Tze, et al. *Wide & Deep Learning for Recommender Systems*. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016.