Star Discrepancy Subset Selection: Problem Formulation and Efficient Approaches for Low Dimensions

Article “Star Discrepancy Subset Selection: Problem Formulation and Efficient Approaches for Low Dimensions”, C. Doerr and L. Paquete, available at arXiv.

Abstract:

Motivated by applications in instance selection, we introduce the star discrepancy subset selection problem, which consists of finding a subset of m out of n points that minimizes the star discrepancy. We introduce two mixed integer linear formulations (MILP) and a combinatorial branch-and-bound (BB) algorithm for this problem and we evaluate our approaches against random subset selection and a greedy construction on different use-cases in dimension two and three. Our results show that one of the MILPs and BB are efficient in dimension two for large and small m/n ratio, respectively, and for not too large n. However, the performance of both approaches decays strongly for larger dimensions and set sizes. As a side effect of our empirical comparisons we obtain point sets of discrepancy values that are much smaller than those of common low-discrepancy sequences, random point sets, and of Latin Hypercube Sampling. This suggests that subset selection could be an interesting approach for generating point sets of small discrepancy value.