Diverse Data Expansion with Semi-Supervised k-Determinantal Point Processes

S.V. Johansson, O. Engkvist, M. Chehreghani and A. Schliep

In 2023 IEEE International Conference on Big Data (BigData), IEEE Computer Society, 5260–5265, Dec 2023.

Determinantal point processes (DPPs) have become prominent in data summarization and recommender system tasks for their ability to simultaneously model diversity as well as relevance. In practical applications, k-determinantal point processes (k-DPPs) are used to yield a selection of k items from a set of size N that are the most representative of the set. In this paper, we study a special case of the diverse subset selection problem where a fixed set G0 is already given as a forced recommendation and the task is to determine the remainder of the recommendation G1. The standard k-DPP optimization objectives here can suggest items that are close to optimal when considering only items in G1, but are arbitrarily close to items in G0, i.e., they might not be sufficiently diverse w.r.t. G0. We explore a semi-supervised k-DPP objective that simultaneously considers G0 and G1 and compares the difference between the two recommendations. We demonstrate our findings using multiple examples where the diverse subset selection problem with forced recommendation is important in practice.
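The failure mode described in the abstract can be illustrated with a small toy sketch. It is not the objective from the paper: it assumes an RBF similarity kernel, uses the plain determinant of the kernel submatrix as the diversity score, and finds the best subset by brute force, which is only feasible for tiny inputs. The names rbf_kernel, best_subset and include_fixed are hypothetical and chosen for illustration only.

```python
from itertools import combinations

import numpy as np


def rbf_kernel(points, gamma=0.05):
    """Similarity kernel L[i, j] = exp(-gamma * ||x_i - x_j||^2) (an assumed choice)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))


def best_subset(L, k, fixed, include_fixed):
    """Brute-force search for the k items outside `fixed` with the largest determinant.

    include_fixed=False scores det(L[G1, G1]) only (G0 is ignored).
    include_fixed=True scores det(L[G0 + G1, G0 + G1]), so items close to G0
    are penalised as well.
    """
    candidates = [i for i in range(L.shape[0]) if i not in fixed]

    def score(subset):
        idx = (list(fixed) if include_fixed else []) + list(subset)
        return np.linalg.det(L[np.ix_(idx, idx)])

    return max(combinations(candidates, k), key=score)


# Four items on a line; item 0 is the forced recommendation G0.
# Item 1 is almost a duplicate of item 0.
points = np.array([[0.0], [0.1], [5.0], [10.0]])
L = rbf_kernel(points)
G0 = [0]

standard = best_subset(L, k=2, fixed=G0, include_fixed=False)
joint = best_subset(L, k=2, fixed=G0, include_fixed=True)
print(standard, joint)  # (1, 3) (2, 3)
```

In this toy setting, scoring G1 alone selects items 1 and 3, the most spread-out pair among the candidates, even though item 1 nearly coincides with the forced item 0. Scoring G0 and G1 jointly selects items 2 and 3 instead, which are diverse with respect to G0 as well as to each other.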

DOI: 10.1109/BigData59044.2023.10386642.

The publication includes results from the following projects or software tools: IDADrugDesign.

Further publications by Alexander Schliep, Simon Johansson.