Interactive Intent Modeling Based on Probabilistic Sparse Models

Loading...
Thumbnail Image
Journal Title
Journal ISSN
Volume Title
Perustieteiden korkeakoulu | Master's thesis
Date
2017-04-03
Department
Major/Subject
Machine Learning and Data Mining
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
56+0
Series
Abstract
In the exploratory search system, the user interacts with the system by providing feedback on the relevance of the recommended documents and keywords. It is often that the user is unfamiliar with the topic she is investigating, so the system should be able to help her form a precise query and explore the information space. Typically, the exploratory search process is modeled as a contextual bandit problem, a sequential learning algorithm which adopts the recommendation strategy based on user's feedback, aiming at suggesting more precise keywords and retrieving the most relevant documents with minimum user interactions. One big challenge in the exploratory search is that the corpus in which a bandit algorithm explores is huge while the feedback from the user is always scarce, leading to a non-trivial learning problem with large dimensionality and limited observations. In this thesis, I tackle this challenge by adopting Bayesian linear regression with spike and slab priors which enforce sparsity on the feature space, so the bandit algorithm could narrow down the search to the most relevant documents. I incorporate the Expectation Propagation algorithm to approximate the posterior distribution of the sparse model, Thompson sampling to address the exploration-exploitation dilemma, and Topic model to discover the structure of the documents which could provide group information that can further constrain the search space to specific topics. To assess the models, I simulate the user behavior in an exploratory search process and compare the model coefficients learned by linear models using Gaussian prior and, spike-and-slab prior with or without group information. Several performance metrics are also evaluated. Empirically, the spike-and-slab with or without group information perform similarly and outperform Gaussian prior which does not encourage sparsity. The learned model coefficients justify the assumption that most of the coefficients do not contribute to the model. Besides, the model of group spike-and-slab prior has fewer coefficients needed to be estimated than spike-and-slab prior without group information, and potentially could be applied to larger corpora.
Description
Supervisor
Kaski, Samuel
Thesis advisor
Daee, Pedram
Keywords
exploratory search, Bayesian sparse linear model, multi-armed bandit, Thompson sampling, spike and slab priors
Other note
Citation