Nuit Blanche: Phase Transitions in Group Testing

Wednesday, September 16, 2015

Phase Transitions in Group Testing

Great, phase transitions in group testing problems!

Much like previously, when I don't see a citation in the reference section I always wonder if the paper provides a new or a very new insight from previously [0,1,2,3]. What is most fascinating here is that some of these recent results on group testing, are coming from different communities and it is definitely a challenge to delineate which work provides the most advances.

For one, I am also convinced that, much like what happened in compressive sensing, the use of empirical phase transitions (read the Map Makers) goes a long way toward avoid endless discussions about asymptotics for real life reconstruction solvers. Without further due:

Phase Transitions in Group Testing by Scarlett, Jonathan; Cevher, Volkan

We consider the fundamental limits on the number of tests required for the group testing problem, which has applications in medical testing, fault detection, communication protocols, pattern matching, and database systems. Specifically, we have p items labeled 1,…,p, among which a uniformly random subset S of cardinality k are defective. The goal is to recover the set S based on n possibly noisy measurements, each taking the form Y = \boldsymbol{1}\bigg{ \bigcup_{i \in S} {X_i = 1} \bigg} \oplus Z, \nonumber where X=(X1,…,Xp)∈0,1p is a measurement vector indicating the items involved in each test, and Z∈0,1 represents possible noise under modulo-2 addition. We consider Bernoulli designs, in which each entry of each measurement vector is independently drawn at random from a Bernoulli distribution. In the noiseless case with k=Θ(pθ) (θ≤13), we show the following for an optimal recovery algorithm: n \ge \Big( k\log_2\frac{p}{k} \Big) (1+\eta) \implies \PP[\mathrm{error}] \to 0 \nonumber
n \le \Big( k\log_2\frac{p}{k} \Big) (1-\eta) \implies \PP[\mathrm{error}] \to 1 \nonumber for any η>0, where an error means that S^≠S, with S^ denoting the estimate of S. That is, we provide an exact threshold at which the error probability under goes a phase transition from 0 to 1, for a measurement-optimal recovery rule minimizing the required number of measurements regardless of the computational complexity. The same threshold was previously shown to be the best possible when the measurement vectors are designed adaptively, with each vector chosen based on the outcomes of previous tests. Thus, despite being non-adaptive, Bernoulli designs are asymptotically optimal when θ≤13. In the case of partial recovery, we show that n \le \Big( (1-\alpha^{*}) k\log_2\frac{p}{k} \Big) (1-\eta) \implies \PP[\mathrm{error}] \to 1, \nonumber for any η>0, where an error means that |S∖S^|≥α∗|S| or |S^∖S|≥α∗|S|, so that α∗ indicates the allowed ``distance'' from S^ to S. Thus, there is little gain to be obtained in the required number of measurements even under this relaxed criterion. Finally, analogous necessary and sufficient conditions are derived for the noisy setting with Z∼Bernoulli(ρ), and we again provide exact thresholds for sufficiently small θ. Our analysis takes a novel approach revealing that the error probability is characterized by tail probabilities containing quantities of the form ı(x;y):=logPY|X(y|x)PY(y), permitting precise characterizations due to the fact that independent summations concentrate around their mean. This approach is a significant departure from widely-used tools such as maximum-likelihood decoding and Fano's inequality, and leads to sharper bounds for the case that k scales with p, as well as the stronger statement Pr[error]→1 (as opposed to merely Pr[error]↛0) in the converse bounds.

As a reminder, all the preprints/papers mentioned on this blog related to Group Testing are listed under the GroupTesting tag. There are 50 blog entries using that tag. A while ago, I explained how compressive sensing works on Quora using the concepts behind group testing: What is compressed sensing (compressive sampling) in layman's terms?

[0] Group testing with Random Pools: Phase Transitions and Optimal Strategy by M. Mézard, M.Tarzia, C. Toninelli

The problem of Group Testing is to identify defective items out of a set of objects by means of pool queries of the form "Does the pool contain at least a defective?". The aim is of course to perform detection with the fewest possible queries, a problem which has relevant practical applications in different fields including molecular biology and computer science. Here we study GT in the probabilistic setting focusing on the regime of small defective probability and large number of objects, p→0 and N→∞. We construct and analyze one-stage algorithms for which we establish the occurrence of a non-detection/detection phase transition resulting in a sharp threshold, M¯, for the number of tests. By optimizing the pool design we construct algorithms whose detection threshold follows the optimal scaling M¯∝Np|logp|. Then we consider two-stages algorithms and analyze their performance for different choices of the first stage pools. In particular, via a proper random choice of the pools, we construct algorithms which attain the optimal value (previously determined in Ref. [16]) for the mean number of tests required for complete detection. We finally discuss the optimal pool design in the case of finite p.