Keywords: feature, interaction, data, deep, click-through rate, model, new trends, Weinan Zhang
Weinan Zhang, Shanghai Jiao Tong University
http:/
DataFunSummit

From Feature Interaction to Sample Interaction: The New Trend for Deep CTR Models

A Display Ad Example
- Click or not?

User Response Estimation Problem
- Click-through rate estimation as an example
- Input features:
    Date: 20160320
    Hour: 14
    Weekday: 7
    IP: 119.163.222.*
    Region: England
    City: London
    Country: UK
    Ad Exchange: Google
    Domain: yahoo.co.uk
    URL: http://www.yahoo.co.uk/abc/xyz.html
    OS: Windows
    Browser: Chrome
    Ad size: 300*250
    Ad ID: a1890
    User tags: Sports, Electronics
- Click (1) or not (0)?
- Predicted CTR (0.15)

CTR and Data
- Click-through rate (CTR) estimation is an essential task
- It predicts the probability of a user clicking an item based on the logged behavior data
- Applications: e-commerce, online advertising, recommender systems
- Data format: multi-categorical data with binary labels
- CTR estimation is formulated as binary classification
Weinan Zhang, Jiarui Qin, Wei Guo, Ruiming Tang, Xiuqiang He. Deep Learning for Click-Through Rate Estimation. IJCAI 2021 (Survey Track).

CTR Models (2007 to 2021)
- Developing trend of CTR estimation models
- [Figure: CTR models arranged by feature engineering requirement and model capacity, with the focus of this talk marked]

Feature Representation
- Binary one-hot encoding of categorical data
    x = [Weekday=Wednesday, Gender=Male, City=London]
    x = [0,0,1,0,0,0,0, 0,1, 0,0,1,0,...,0]
- High-dimensional sparse binary feature vector
- Label representation: y = 1 (click) or 0 (non-click)

Logistic Regression
- Prediction
- Cross entropy loss
- Stochastic gradient descent learning
Lee et al. Estimating Conversion Rate in Display Advertising from Past Performance Data. KDD '12.

Factorization Machines
- Prediction based on feature embedding
- The prediction has a logistic regression part and a feature interactions part
- Example: x = [Weekday=Friday, Gender=Male, City=Shanghai]
Oentaryo et al. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. WSDM '14.
Rendle. Factorization machines. ICDM 2010.

Combined Models: GBDT+LR
He et al. Practical Lessons from
Predicting Clicks on Ads at Facebook. ADKDD 2014.

Combined Models: GBDT+FM
http://www.csie.ntu.edu.tw/r01922136/kaggle-2014-criteo.pdf

Neural Network Models
- Difficulty: impossible to directly deploy neural network models on such data
- E.g., with 1M input features and a first layer of 500 units, the first layer alone has 500M parameters

Review Factorization Machines
- Prediction based on feature embedding
    - Embed features into a k-dimensional latent space
    - Explore the feature interaction patterns using vector inner-product
- The prediction has a logistic regression part and a feature interactions part
Oentaryo et al. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. WSDM '14.
Rendle. Factorization machines. ICDM 2010.

Factorization Machine is a Neural Network
Zhang et al. Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction. ECIR '16.

Factorization-machine supported Neural Networks (FNN)
- Factorization machine initialised

Factorization-machine supported Neural Networks (FNN)
- Chain rule to update factorization machine parameters
- But the factorization machine is still different from common additive neural networks

Product Operations as Feature Interactions
Yanru Qu et al. Product-based Neural Networks for User Response Prediction. ICDM 2016.

Feature Interaction
- Feature interaction operators: product operator
    - PNN (IPNN, OPNN) [1], NFM [2], DeepFM: inner and outer product
    - PIN [3]: micro neural network as feature interaction operator
    - DCN [4] (DCN V2 [5]), CIN [6]: cross network
[1] Qu et al. Product-based neural networks for user response prediction. ICDM, 2016.
[2] He et al. Neural factorization machines for sparse predictive analytics. SIGIR, 2017.
[3] Qu et al. Product-based neural networks for user response prediction over multi-field categorical data. TOIS, 2018.
[4] Wang et al. Deep & cross network for ad click predictions. ADKDD, 2017.
[5] Wang et al. DCN-M: Improved deep & cross network for feature cross learning in web-scale
learning to rank systems.
[6] Lian et al. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. KDD 2018.

Feature Interaction
- Feature interaction operators: convolutional operators
    - Use convolution networks to model the feature interaction on the feature map: CCPM [1], FGCNN [2]
    - GCN: Fi-GNN [3]
[1] Liu et al. A convolutional click prediction model. CIKM, 2015.
[2] Liu et al. Feature generation by convolutional neural network for click-through rate prediction. WWW 2019.
[3] Li et al. Fi-GNN: Modeling feature interactions via graph neural networks for CTR prediction. CIKM, 2019.

Feature Interaction
- Feature interaction operators: attention operators
    - Use an attention mechanism to weight different feature interactions: AFM [1], FiBiNet [2]
    - Self-attention: AutoInt [3], InterHAt [4]
[1] Xiao et al. Learning the weight of feature interactions via attention networks. IJCAI, 2017.
[2] Huang et al. FiBiNet: Combining feature importance and bilinear feature interaction for click-through rate prediction. RecSys 2019.
[3] Song et al. AutoInt: Automatic feature interaction learning via self-attentive neural networks. CIKM, 2019.
[4] Li et al. Interpretable click-through rate prediction through hierarchical attention. WSDM 2020.

User Behavior Modeling
- Basic structure of CTR estimation based on user behavior modeling
- The behavior feature is a special multi-value feature consisting of the item sequence one user has interacted with
- Model the temporal patterns inside the behavior sequences

User Behavior Modeling
- Attention-based models
    - Attention is utilized to model the importance of the user's different behaviors to the prediction target
    - DIN [1], DIEN [2] (GRU + attention)
    - Self-attention: BST [3]; session-split behavior model: DSIN [4]
[1] Zhou et al. Deep interest network for click-through rate prediction. KDD 2018.
[2] Zhou et al. Deep interest evolution network for click-through rate prediction. AAAI 2019.
[3] Chen et al. Behavior sequence transformer for e-commerce recommendation in Alibaba. DLP-KDD, 2019.
[4] Li et al. Deep session interest network for click-through rate
prediction. IJCAI 2019.
[Figure: DIEN]

User Behavior Modeling
- Memory network based models
    - Use a memory network to store the user states and update the states when new behaviors happen
    - Model very long user behavior sequences
    - HPMN [1], MIMN [2]
[1] Ren et al. Lifelong sequential modeling with personalized memorization for user response prediction. SIGIR 2019.
[2] Pi et al. Practice on long sequential user behavior modeling for click-through rate prediction. KDD 2019.

User Behavior Modeling
- Retrieval based models
    - Use retrieved relevant behaviors instead of recent consecutive behaviors
    - Especially useful for utilizing very long user behavior sequences
    - Build an index structure for retrieving behaviors
    - UBR [1], SIM [2], RIM [3]
    - As already hinted in HPMN
[1] Qin et al. User behavior retrieval for click-through rate prediction. SIGIR 2020.
[2] Pi et al. Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction. CIKM 2020.
[3] Qin et al. Retrieval & interaction machine for tabular data prediction. KDD 2021.

User Behavior Retrieval
Qin et al. User behavior retrieval for click-through rate prediction. SIGIR 2020.

UBR Architecture
- Prediction module: attention-based prediction network
- User behavior retrieval module: feature selection model using self-attention

UBR Experiments
- UBR achieves significant performance improvement on open CTR benchmarks
- The retrieved set size should be tuned over datasets

Search-based Interest Model
Pi Qi et al. Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction. CIKM 2020.

Extending from User Modeling to Sample Interaction
Jiarui Qin, Weinan Zhang, Rong Su, Zhirong Liu, Weiwen Liu, Ruiming Tang, Xiuqiang He, Yong Yu. Retrieval & Interaction Machine for Tabular Data Prediction. KDD 2021.

Extending from User Modeling to Sample Interaction
- Traditional models for tabular data prediction
    - LR, GBDT, SVM
    - Feature interaction-based models (FM, DeepFM, PIN, Fi-GNN)
    - Sequential behavioral models (DIN, DIEN, UBR)

General Tabular Data
- Traditional models could
be categorized as single-row-multi-column
- The relations and interactions between different rows (samples) are ignored

General Tabular Data
- It could be beneficial to consider the neighbor samples (kNN) as additional discriminative information
- Multi-row-multi-column: 1. retrieve the relevant rows, 2. aggregate, 3. interact with the target row (RIM, Retrieval & Interaction Machine)

RIM: Retrieval & Interaction Machine
- Overall idea: using neighbor samples (rows) to assist the inference of the target sample (row)
- RIM architecture
    - Retrieval module: retrieval pool, ranking function, aggregation
    - Prediction module: interaction functions, deep layers

Retrieval Module
- Use search engine techniques to retrieve relevant samples from the retrieval pool
    - Target sample as "query"
    - Data samples as "documents"
    - Features as "terms"
- Indexing & storage
    - An inverted index is used to store the data samples

Retrieval Module
- Query: use the target sample as the query
- Ranking function: BM25; the top-K results are retrieved

Retrieval Module
- Target-row-aware, order-invariant aggregation function
- An attention mechanism serves as the aggregation function
- Label information is also aggregated
Jiarui Qin, Weinan Zhang, Rong Su, Zhirong Liu, Weiwen Liu, Ruiming Tang, Xiuqiang He, Yong Yu. Retrieval & Interaction Machine for Tabular Data
Prediction. KDD 2021.

Prediction Module
- To better model the relations between the target and retrieved samples, we use a feature interaction unit
    - Intra-sample feature interaction
    - Cross-sample feature interaction
- Deep layers follow the interaction layer and output the result
- The loss function depends on the specific task (log loss, MSE, etc.)

Practical Issues
- Time complexity analysis
    - Time of building the index (offline, once)
    - Retrieval: accessing the inverted index, then scoring
    - Quantities involved: the retrieval pool, the total number of unique features in it, and the feature column count
- Speed up
    - Do the retrieval process offline and store the retrieval results
    - Down-sampling on the retrieval pool

Experimental Results
- RQ 1: Does RIM work?
- Experiments are conducted on 11 datasets across three different tasks: CTR prediction, top-n ranking, and regression
    - Exp Group 1-1 (CTR prediction, against sequential behavioral models)
    - Exp Group 1-2 (CTR prediction, against feature interaction models)
    - Exp Group 2 (top-n ranking, against sequential recommendation models)
    - Exp Group 3 (regression task: rating prediction)
Jiarui Qin, Weinan Zhang, Rong Su, Zhirong Liu, Weiwen Liu, Ruiming Tang, Xiuqiang He, Yong Yu. Retrieval & Interaction Machine for Tabular Data
Prediction. KDD 2021.

Experimental Results
- RQ 2: About different retrieval approaches
- Three different retrieval mechanisms:
    - Random retrieval
    - Filtered retrieval (filter by user id)
    - RIM

Experimental Results
- RQ 3: About different interaction functions
- Three different interaction functions:
    - Inner product
    - Kernel product
    - Micro-network

Experimental Results
- RQ 4: About different sizes of the retrieval set
    - There exists a best size of the retrieval set
- RQ 5: About the necessity of the label information
    - Label information of the retrieved samples is important to the performance of RIM

Possible Future Directions of RIM
- More flexible and powerful retrieval function
    - From fixed to parametric function
    - Effective training algorithm
- Highly efficient DB system for tabular data retrieval
    - Online inference
    - Mini-batch training
- More complex relations between different data instances other than feature similarity
    - Time
    - Causal relations
- Different data interaction models
    - GNNs

Summary
- Current trends
    - 2019 to now: AutoML for CTR model design
    - 2020 to now: sample interaction paradigm
- Past developments
    - 2000 to 2015: from linear models to bi-linear models
    - 2015 to 2019: deep models on feature interaction mining

Thank You! Questions?
Jiarui Qin, Weinan Zhang, Xin Wu, Jiarui Jin, Yuchen Fang and Yong Yu. User Behavior Retrieval for Click-Through Rate Prediction. SIGIR 2020. https://arxiv.org/abs/2005.14171
Jiarui Qin, Weinan Zhang, Rong Su, Zhirong Liu, Weiwen Liu, Ruiming Tang, Xiuqiang He, Yong Yu. Retrieval & Interaction Machine for Tabular Data Prediction. KDD
2021. https://arxiv.org/abs/2108.05252
Weinan Zhang, Jiarui Qin, Wei Guo, Ruiming Tang, Xiuqiang He. Deep Learning for Click-Through Rate Estimation. IJCAI 2021. https://arxiv.org/abs/2104.10584

And special thanks to Huawei Noah's Ark Lab and the student team of SJTU
Jiarui Qin, Xin Wu, Yuchen Fang, Chenxu Zhu

Find more at my website http:/
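
Appendix sketch: the retrieval module in the RIM slides treats the target row as the query, the pool rows as documents, and feature values as terms, stores the pool in an inverted index, and ranks with BM25 to keep the top-K rows. The following is a minimal illustration of that idea, not the authors' implementation. The function names, the toy rows, and the k1 and b values are assumptions made for the example, and the BM25 form is the common Okapi variant with a +1 inside the logarithm.

```python
import math
from collections import defaultdict


def build_inverted_index(pool):
    """Map each (column, value) term to the ids of pool rows containing it."""
    index = defaultdict(set)
    for row_id, (features, _label) in enumerate(pool):
        for term in features.items():
            index[term].add(row_id)
    return index


def bm25_retrieve(target, pool, index, top_k=2, k1=1.2, b=0.75):
    """Return ids of the top_k pool rows ranked by BM25 against the target row."""
    n = len(pool)
    avg_len = sum(len(features) for features, _ in pool) / n
    scores = defaultdict(float)
    for term in target.items():
        postings = index.get(term)
        if not postings:
            continue  # this feature value never occurs in the pool
        df = len(postings)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        for row_id in postings:
            tf = 1.0  # a feature value occurs at most once per row
            length_norm = 1.0 - b + b * len(pool[row_id][0]) / avg_len
            scores[row_id] += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    # Highest score first; ties are broken by the smaller row id.
    ranked = sorted(scores, key=lambda row_id: (-scores[row_id], row_id))
    return ranked[:top_k]


# Toy retrieval pool: (features, click label). Labels are kept out of the
# index so that only features act as query terms.
pool = [
    ({"weekday": "Fri", "gender": "M", "city": "Shanghai"}, 1),
    ({"weekday": "Fri", "gender": "F", "city": "London"}, 0),
    ({"weekday": "Wed", "gender": "M", "city": "London"}, 0),
    ({"weekday": "Fri", "gender": "M", "city": "London"}, 1),
]
index = build_inverted_index(pool)
target = {"weekday": "Fri", "gender": "M", "city": "Shanghai"}

for row_id in bm25_retrieve(target, pool, index, top_k=2):
    print(row_id, pool[row_id])
```

Because every row has the same number of columns and each term occurs at most once per row, the score here reduces to the sum of the IDF weights of the matched feature values, so rare matches (such as the city) count more than common ones. The retrieved rows keep their labels, which is what lets the aggregation step use label information.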