
Monday, September 07, 2015

Group Testing and SAFFRON

A few days ago, we featured the preprint entitled "SAFFRON: A Fast, Efficient, and Robust Framework for Group Testing based on Sparse-Graph Codes". I wondered then why I did not see a reference to some similar work in that area featured here two years ago. One of the authors, Kannan Ramchandran, kindly responded to my question. Outstanding! Here is what Kannan sent me:

Dear Igor,



We recently read your blog article introducing our new arXiv preprint on group testing.
http://nuit-blanche.blogspot.fr/2015/08/saffron-fast-efficient-and-robust.html


We would like to first thank you for sharing our paper on your well-read blog! In your post, you referred to a paper from two years ago (by Mézard et al.), which we didn't cite in our new paper. We didn't know about this prior work; thank you for pointing it out. We read it and found that the two papers are very different. While they appear related on the surface, perhaps due to the common use of spatially coupled codes to tackle a group testing problem, we would like to clarify that the two works are indeed very different, and to describe some clear distinctions between them.


First, and most important, the two papers work on different problems: the paper by Mézard et al. is on linear group testing, while our work is on Boolean group testing. In linear group testing, a test reveals the number of defective items in the pool, while in Boolean group testing, a test result is simply the Boolean OR of the pool: it is 1 if the pool contains one or more defective items, and 0 otherwise. Boolean group testing is therefore a strictly harder problem than linear group testing, since in our case each test gives only one bit of information, which is moreover a non-linear function of the components of the sparse input vector.


Further, our use of spatially coupled LDPC codes is completely different from theirs. The use of LDPC codes for the linear problem is not that surprising, as the problem is linear. However, the non-linear nature of our Boolean problem makes the use of spatially coupled LDPC codes somewhat challenging. Specifically, we design our pooling matrix using a two-layer approach. The upper layer is designed from simple left-regular random bipartite graphs. The lower layer can be designed from any error-correcting code. We mentioned spatially coupled LDPC codes in our work because they are a good candidate for this lower layer; however, our scheme is very general and is not limited to spatially coupled LDPC codes. On the other hand, the paper by Mézard et al. uses a pooling matrix that is entirely based on a spatially coupled code. Further, we were able to achieve the optimal computational complexity with this layered pooling design, while their paper uses a standard belief propagation algorithm, which has suboptimal computational complexity.


Though the works are different, now that we are aware of it, we will reference the paper by Mézard et al. in our revised version, as they proposed the use of spatially coupled codes to tackle a linear variant of the group testing problem that we address. We will also add a reference to the Cheraghchi paper on group testing with thresholds that you pointed out in your blog.


Again, thank you for sharing our paper on your blog and pointing out a useful reference that we were unaware of.


Regards,
Kannan 

Thank you Kannan !
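Kannan's first point, the distinction between the linear and Boolean observation models, can be made concrete with a tiny simulation. This is only a sketch: the item count, number of defectives, and pool density below are made-up illustrative values, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

p, k = 100, 5                            # items, defectives (illustrative sizes)
x = np.zeros(p, dtype=int)
x[rng.choice(p, size=k, replace=False)] = 1

pool = rng.random(p) < 0.1               # random pool: each item joins w.p. 0.1

linear_result = int(pool @ x)            # linear model: counts defectives in the pool
boolean_result = int(linear_result > 0)  # Boolean model: OR, a single bit per test

print(linear_result, boolean_result)
```

The linear test can return any count between 0 and k, while the Boolean test collapses all of that to one bit, which is why the Boolean problem is strictly harder.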

As a reminder, all the preprints/papers mentioned on this blog related to Group Testing are listed under the GroupTesting tag. There are 48 blog entries using that tag. A while ago, I explained how compressive sensing works on Quora using the concepts behind group testing: What is compressed sensing (compressive sampling) in layman's terms?


Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Wednesday, September 16, 2015

Phase Transitions in Group Testing

Great, phase transitions in group testing problems!
 
Much like previously, when I don't see a citation in the reference section, I always wonder whether the paper provides a genuinely new insight relative to earlier work [0,1,2,3]. What is most fascinating here is that some of these recent results on group testing come from different communities, and it is definitely a challenge to delineate which work provides the most advances.
 
For one, I am also convinced that, much like what happened in compressive sensing, the use of empirical phase transitions (read The Map Makers) goes a long way toward avoiding endless discussions about asymptotics for real-life reconstruction solvers. Without further ado:
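As a rough illustration of what an empirical phase transition looks like, here is a Monte Carlo sketch using a Bernoulli test design and the simple COMP decoder (an item is declared defective iff it never appears in a negative test). The sizes, the ln(2)/k design density, and the decoder choice are my own assumptions for the sketch, not the paper's optimal recovery rule, so COMP's empirical transition sits above the information-theoretic threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

def comp_success(p, k, n, trials=200):
    """Fraction of trials where COMP exactly recovers S under a Bernoulli design."""
    nu = np.log(2) / k                    # inclusion probability (a standard choice)
    wins = 0
    for _ in range(trials):
        S = rng.choice(p, size=k, replace=False)
        X = rng.random((n, p)) < nu       # Bernoulli test matrix
        y = X[:, S].any(axis=1)           # Boolean OR test outcomes
        # COMP: declare defective iff the item appears in no negative test
        est = ~(X[~y].any(axis=0))
        truth = np.zeros(p, dtype=bool)
        truth[S] = True
        wins += np.array_equal(est, truth)
    return wins / trials

p, k = 500, 5
threshold = k * np.log2(p / k)            # the k*log2(p/k) information-theoretic scale
lo = comp_success(p, k, int(0.5 * threshold))   # well below: recovery should fail
hi = comp_success(p, k, int(4 * threshold))     # well above: recovery should succeed
print(lo, hi)
```

Sweeping n between those two extremes and plotting the success fraction traces out the empirical phase transition curve.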



Phase Transitions in Group Testing  by Scarlett, Jonathan; Cevher, Volkan

We consider the fundamental limits on the number of tests required for the group testing problem, which has applications in medical testing, fault detection, communication protocols, pattern matching, and database systems. Specifically, we have $p$ items labeled $1,\dots,p$, among which a uniformly random subset $S$ of cardinality $k$ are defective. The goal is to recover the set $S$ based on $n$ possibly noisy measurements, each taking the form
$$Y = \mathbf{1}\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\} \oplus Z,$$
where $X = (X_1,\dots,X_p) \in \{0,1\}^p$ is a measurement vector indicating the items involved in each test, and $Z \in \{0,1\}$ represents possible noise under modulo-2 addition. We consider Bernoulli designs, in which each entry of each measurement vector is independently drawn at random from a Bernoulli distribution. In the noiseless case with $k = \Theta(p^{\theta})$ ($\theta \le \frac{1}{3}$), we show the following for an optimal recovery algorithm:
$$n \ge \Big( k\log_2\frac{p}{k} \Big)(1+\eta) \implies \mathbb{P}[\mathrm{error}] \to 0$$
$$n \le \Big( k\log_2\frac{p}{k} \Big)(1-\eta) \implies \mathbb{P}[\mathrm{error}] \to 1$$
for any $\eta > 0$, where an error means that $\hat{S} \ne S$, with $\hat{S}$ denoting the estimate of $S$. That is, we provide an exact threshold at which the error probability undergoes a phase transition from 0 to 1, for a measurement-optimal recovery rule minimizing the required number of measurements regardless of the computational complexity. The same threshold was previously shown to be the best possible when the measurement vectors are designed adaptively, with each vector chosen based on the outcomes of previous tests. Thus, despite being non-adaptive, Bernoulli designs are asymptotically optimal when $\theta \le \frac{1}{3}$. In the case of partial recovery, we show that
$$n \le \Big( (1-\alpha^{*})\, k\log_2\frac{p}{k} \Big)(1-\eta) \implies \mathbb{P}[\mathrm{error}] \to 1$$
for any $\eta > 0$, where an error means that $|S \setminus \hat{S}| \ge \alpha^{*}|S|$ or $|\hat{S} \setminus S| \ge \alpha^{*}|S|$, so that $\alpha^{*}$ indicates the allowed "distance" from $\hat{S}$ to $S$. Thus, there is little gain to be had in the required number of measurements even under this relaxed criterion. Finally, analogous necessary and sufficient conditions are derived for the noisy setting with $Z \sim \mathrm{Bernoulli}(\rho)$, and we again provide exact thresholds for sufficiently small $\theta$. Our analysis takes a novel approach revealing that the error probability is characterized by tail probabilities containing quantities of the form $\imath(x;y) := \log \frac{P_{Y|X}(y|x)}{P_Y(y)}$, permitting precise characterizations due to the fact that independent summations concentrate around their mean. This approach is a significant departure from widely-used tools such as maximum-likelihood decoding and Fano's inequality, and leads to sharper bounds for the case that $k$ scales with $p$, as well as the stronger statement $\Pr[\mathrm{error}] \to 1$ (as opposed to merely $\Pr[\mathrm{error}] \not\to 0$) in the converse bounds.

As a reminder, all the preprints/papers mentioned on this blog related to Group Testing are listed under the GroupTesting tag. There are 50 blog entries using that tag. A while ago, I explained how compressive sensing works on Quora using the concepts behind group testing: What is compressed sensing (compressive sampling) in layman's terms?

[0] Group testing with Random Pools: Phase Transitions and Optimal Strategy by  M. Mézard, M.Tarzia, C. Toninelli

The problem of Group Testing is to identify defective items out of a set of objects by means of pool queries of the form "Does the pool contain at least one defective?". The aim is of course to perform detection with the fewest possible queries, a problem which has relevant practical applications in different fields including molecular biology and computer science. Here we study group testing in the probabilistic setting, focusing on the regime of small defective probability and large number of objects, $p \to 0$ and $N \to \infty$. We construct and analyze one-stage algorithms for which we establish the occurrence of a non-detection/detection phase transition resulting in a sharp threshold, $\bar{M}$, for the number of tests. By optimizing the pool design we construct algorithms whose detection threshold follows the optimal scaling $\bar{M} \sim Np|\log p|$. Then we consider two-stage algorithms and analyze their performance for different choices of the first-stage pools. In particular, via a proper random choice of the pools, we construct algorithms which attain the optimal value (previously determined in Ref. [16]) for the mean number of tests required for complete detection. We finally discuss the optimal pool design in the case of finite $p$.
see recently also:
  1. Group Testing and SAFFRON
  2. SAFFRON: A Fast, Efficient, and Robust Framework for Group Testing based on Sparse-Graph Codes
  3. Non-adaptive pooling strategies for detection of rare faulty items / Improved Constructions for Non-adaptive Threshold Group Testing

 



Monday, July 15, 2013

Group Testing: GROTESQUE: Noisy Group Testing (Quick and Efficient), Efficient Probabilistic Group Testing Based on Traitor Tracing, Group testing algorithms: bounds and simulations

So today, our emphasis is on group testing; here is what we could find on the subject on arXiv in the past two months. I sure would love for more of the authors to make an implementation of their work available, especially now that it looks like we have gone beyond the asymptotics games.



Group testing refers to the problem of identifying (with high probability) a (small) subset of $D$ defectives from a (large) set of $N$ items via a "small" number of "pooled" tests. For ease of presentation in this work we focus on the regime when $D = O(N^{1-\delta})$ for some $\delta > 0$. The tests may be noiseless or noisy, and the testing procedure may be adaptive (the pool defining a test may depend on the outcome of a previous test) or non-adaptive (each test is performed independent of the outcome of other tests). A rich body of literature demonstrates that $\Theta(D\log(N))$ tests are information-theoretically necessary and sufficient for the group-testing problem, and provides algorithms that achieve this performance. However, it is only recently that reconstruction algorithms with computational complexity that is sub-linear in $N$ have started being investigated (recent work [GurI:04, IndN:10, NgoP:11] gave some of the first such algorithms). In the scenario with adaptive tests with noisy outcomes, we present the first scheme that is simultaneously order-optimal (up to small constant factors) in both the number of tests and the decoding complexity ($O(D\log(N))$ in both performance metrics). The total number of stages of our adaptive algorithm is "small" ($O(\log(D))$). Similarly, in the scenario with non-adaptive tests with noisy outcomes, we present the first scheme that is simultaneously near-optimal in both the number of tests and the decoding complexity (via an algorithm that requires $O(D\log(D)\log(N))$ tests and has a decoding complexity of $O(D(\log N + \log^{2} D))$). Finally, we present an adaptive algorithm that only requires 2 stages, and for which both the number of tests and the decoding complexity scale as $O(D(\log N + \log^{2} D))$. For all three settings the probability of error of our algorithms scales as $O(1/\mathrm{poly}(D))$.



Inspired by recent results from collusion-resistant traitor tracing, we provide a framework for constructing efficient probabilistic group testing schemes. In the traditional group testing model, our scheme asymptotically requires T ~ 2 K ln N tests to find (with high probability) the correct set of K defectives out of N items. Several other models are also considered, such as some noisy group testing and threshold group testing models, but we emphasize that this framework can be applied to many other variants of the classical model as well, both in adaptive and in non-adaptive settings.
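To get a feel for the T ~ 2 K ln N scaling quoted above, here is a quick back-of-the-envelope comparison against testing every item individually (the values of N and K are illustrative, not from the paper):

```python
import math

N, K = 100_000, 50                 # items and defectives (illustrative values)
T = 2 * K * math.log(N)            # ~ number of pooled tests in the T ~ 2 K ln N regime
print(round(T), N)                 # roughly a thousand pooled tests vs. 100000 individual ones
```

The logarithmic dependence on N is what makes pooled schemes attractive when defectives are rare.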



We consider the problem of non-adaptive noiseless group testing. We develop several new algorithms and analyse their probability of success. In particular, we describe the Definite Defectives (DD) algorithm, and prove bounds on its probability of success, comparing it to the best known algorithms in the literature. Further, we describe the SCOMP algorithm, and perform simulations which show that it performs even better than the DD algorithm. Finally, we derive upper bounds on the success probability for any algorithm based on a Bernoulli sampled test matrix. We analyse the normalized number of tests required, in terms of information-theoretic capacity. In certain regimes we argue that in an asymptotic sense the DD algorithm is optimal, and that less sparse problems (more defectives) have a larger 'adaptivity gap'.
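The DD idea is simple enough to sketch in a few lines of noiseless simulation (the sizes and the Bernoulli design density below are my own toy choices, not the paper's): first discard every item seen in a negative test, then declare an item defective when it is the sole remaining candidate in some positive test.

```python
import numpy as np

rng = np.random.default_rng(2)

p, k, n = 200, 4, 90
S = set(rng.choice(p, size=k, replace=False).tolist())
X = rng.random((n, p)) < np.log(2) / k        # Bernoulli test design, density ln(2)/k
y = X[:, sorted(S)].any(axis=1)               # noiseless OR test outcomes

# Step 1 (as in COMP): anything in a negative test is definitely non-defective.
pd_mask = ~(X[~y].any(axis=0))                # remaining "possible defectives"

# Step 2 (DD): a possible defective that is the *only* possible defective in
# some positive test must be truly defective.
positives = X[y]
counts = (positives & pd_mask).sum(axis=1)
dd = {int(np.flatnonzero(positives[t] & pd_mask)[0])
      for t in np.flatnonzero(counts == 1)}

print(dd <= S)   # prints True: DD makes no false positives in the noiseless model
```

Because every positive test contains at least one defective and defectives are never eliminated in step 1, the DD output is always a subset of the true defective set; the analysis in the paper is about how often it recovers all of it.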

Monday, August 01, 2011

Group Testing and Its Connection to Compressed Sensing.

Today is the first day of the Coding, Complexity and Sparsity Workshop at the University of Michigan. Some of the organizers of this meeting are also featured in this post. I look forward to some of these presentations.

Two preprints and a thesis just came out that provide some insight on group testing and how it relates to compressed sensing. If you recall, group testing is probably one of the most easily understandable techniques for non-specialists, and is therefore a nice way to explain compressed sensing in layman's terms (see some examples here, which include finding Cylons on some of Galactica's ships). A recent meeting was held on the subject. It is very nice to see some sort of convergence between these two subjects.

Recovering Simple Signals by Anna Gilbert, Brett Hemenway, Atri Rudra, Martin Strauss, and Mary Wootters. The abstract reads:
The primary goal of compressed sensing and (non-adaptive) combinatorial group testing is to recover a sparse vector x from an underdetermined set of linear equations \phi x = y. Both problems entail solving \phi x = y given \phi and y, but they use different models of arithmetic, different randomness models for \phi, and different guarantees upon the solution x and the class of signals from which it is drawn. In [32], Lipton introduced a model for error correction where the channel is computationally bounded, subject to standard cryptographic assumptions, and produces the error vector x that must be found and then corrected. This has been extended in [24, 34] to create more efficient schemes against polynomial and logspace bounded channels. Inspired by these results in error correction, we view compressed sensing and combinatorial group testing as an adversarial process, where Mallory the adversary produces the vector x to be measured, with limited information about the matrix \phi. We define a number of computationally bounded models for Mallory and show that there are significant gains (in the minimum number of measurements) to be had by relaxing the model from adversarial to computationally or information-theoretically bounded, and not too much (in some cases, nothing at all) is lost by assuming these models over oblivious or statistical models. We also show that differences in adversarial power give rise to different lower bounds for the number of measurements required to defeat such an adversary. By contrast, we show that a randomized one-pass logspace streaming Mallory is almost as powerful as a fully adversarial one for group testing, while for compressed sensing such an adversary is as weak as an oblivious one.
The introduction starts with:
....Group testing was introduced by Dorfman in his seminal 1943 paper to identify soldiers in WWII who had syphilis [15]. Combinatorial group testing has found applications in drug and DNA library screening (the literature is large, see [37, 26, 46, 25] and references therein), multiple access control protocols [5, 45], live baiting of DoS attackers [27], data forensics [23] and data streams [13], among others. The reader is referred to the standard monograph on group testing for more details [16]. Compressed sensing was introduced much more recently in 2006 by [8, 14] as a method for acquiring signals in a "compressed" fashion and then reconstructing good (i.e., sparse) approximations to the signals from those observations. The problem has many practical applications, from analog-to-digital conversion [28, 43, 31], novel optical device design [47, 42, 17], pooling designs for rare allele detection [20], and video acquisition [44], to cite a few examples. Even though they were introduced more than half a century apart, compressed sensing and (non-adaptive) combinatorial group testing are similar. Their primary goal is to recover a sparse vector x from an underdetermined set of linear equations \phi x = y. Both problems include solving \phi x = y given \phi and y, but they use different models of arithmetic, different randomness models for \phi, and different guarantees upon the solution x and the class of signals from which it is drawn. Compressed sensing attempts to recover a sparse signal x ∈ R^N from a small number of linear measurements \phi x. The signal x is usually assumed to be composed of a "head", the k largest entries (in magnitude) of the vector, supported on a small number of indices, and a tail whose l_1 or l_2 norm is small. The goal is to recover a close approximation x^ to x from the measurements \phi x. In the combinatorial group testing model the binary matrix \phi represents a pooling design, and the goal is to recover a small set (of size d) of "defective" elements from a set of tests \phi x. (Here x ∈ {0, 1}^N.) In the group testing model, a test fails if any items in the group being tested are defective, so the matrix-vector product \phi x is computed using boolean OR....
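The boolean matrix-vector product described above can be written in a couple of lines, side by side with its compressed-sensing analogue (a sketch with made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(4)

N, d, m = 40, 3, 15                          # items, defectives, tests (illustrative)
phi = rng.random((m, N)) < 0.15              # binary pooling design
x = np.zeros(N, dtype=bool)
x[rng.choice(N, size=d, replace=False)] = True

y_boolean = (phi & x).any(axis=1)            # group testing: OR within each test
y_linear = phi.astype(int) @ x.astype(int)   # compressed-sensing analogue: real sum

print((y_boolean == (y_linear > 0)).all())   # prints True
```

The two measurement vectors agree on which tests are "hit", but the linear product retains the counts that the boolean OR throws away, which is exactly the difference in arithmetic models the abstract describes.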

We consider the problem of detecting a small subset of defective items from a large set via non-adaptive "random pooling" group tests. We consider both the case when the measurements are noiseless, and the case when the measurements are noisy (the outcome of each group test may be independently faulty with probability q). Order-optimal results for these scenarios are known in the literature. We give information-theoretic lower bounds on the query complexity of these problems, and provide corresponding computationally efficient algorithms that match the lower bounds up to a constant factor. To the best of our knowledge this work is the first to explicitly estimate such a constant that characterizes the gap between the upper and lower bounds for these problems.

The last item is a PhD thesis entitled Applications of Derandomization Theory in Coding by Mahdi Cheraghchi. The abstract reads:
Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem which involves a communication system in which an intruder can eavesdrop a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives pass a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity achieving codes.

Image taken from a slide presentation entitled Group Testing: How to find out what’s important in life by Mark Iwen and Tsvetanka Sendova explaining group testing for educational outreach purposes.
