Nuit Blanche

Tuesday, January 17, 2012

Analysis Dictionary Learning: A New Matrix Factorization ?

I eventually was able to watch Miki Elad's presentation at MIA2012. While that presentation is not on his website yet, the closest I could find is: K-SVD Dictionary-Learning for Analysis Sparse Models. I had read these papers before on analysis versus synthesis and while I am still not quite clear on what is being really said (it needs to sink in first), there is I think a very interesting matrix decomposition I had not seen before:

Namely, \Omega and A are unknowns, X is known and A is made of sparse vector columns. While talking to Miki, we were wondering if this type of matrix decomposition existed or was needed in other fields of investigation (besides analysis dictionary learning for sparse models) ? Recall that the current and now mainstream dictionary learning solvers (featured in the Matrix Factorization Jungle Page) solve the following problem:

(Synthesis) Dictionary Learning: A = DX with unknown D and X, solve for sparse X

which is to be contrasted with the new decomposition:

(Analysis) Dictionary Learning A = \Omega X with unknown \Omega and A, solve for sparse A

Of the results that surprised me, the first one was pretty telling:

In short, the operator \Omega found by this dictionary learning decomposition seems to find back a TV like operator! (The Xs were made of patches).

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, January 16, 2012

The Nuit Blanche Mailbag

MIA2012 starts today. The Call for papers: Advances in signal and image processing for physico-chemical analysis has been extended. Following up on the X-Prize Tricorder Challenge, I already have gotten a person interested. Also Phil Schniter sent me this:

Dear Igor,

I'm writing to let you know about two faculty positions in _Machine Learning_ that we recently announced at the Ohio State University. Both faculty will be jointly appointed across the departments of ECE and Biomedical Informatics (BMI). We seek candidates who will make solid theoretical contributions, and who are also excited by the real-world datasets that BMI has to offer.

More details about these positions can be found at: https://ece.osu.edu/about/employment

Perhaps you can mention this on your blog.

Thanks and best regards,

Phil

Thanks Phil for the heads-up. Dick Gordon also mentioned one of his paper in Biotechnology Focus back in 2000:

Dear Igor,

Wrote:

Blyden, E.R. & R. Gordon (2000). Genomics, pharmacology and 3D imaging: self-knowledge in the post-genomic era. Biotechnology Focus 3(6), 14, 16.

a while ago. Must be 15 diseases diagnosable via ultrasound, so that’s where I’d start.

Fifteen diseases detected through ultrasound ?! I did not realize they was even one. Thanks Dick. Finally, Jim Fowler sent the following:

Hi Igor,

I would like to let you know of a journal article that we will have appearing soon in Foundations and Trends in Signal Processing. The details are:

J. E. Fowler, S. Mun, and E. W. Tramel, “Block-Based Compressed Sensing of Images and Video,” Foundations and Trends in Signal Processing, to appear.
Abstract:

A number of techniques for the compressed sensing of imagery are surveyed. Various imaging media are considered, including still images, motion video, as well as multiview image sets and multiview video. A particular emphasis is placed on block-based compressed sensing due to its advantages in terms of both lightweight reconstruction complexity as well as a reduced memory burden for the random-projection measurement operator. For multiple-image scenarios, including video and multiview imagery, motion and disparity compensation is employed to exploit frame-to-frame redundancies due to object motion and parallax, resulting in residual frames which are more compressible and thus more easily reconstructed from compressed-sensing measurements. Extensive experimental comparisons evaluate various prominent reconstruction algorithms for still-image, motion-video, and multiview scenarios in terms of both reconstruction quality as well as computational complexity.
PDF at: http://www.ece.msstate.edu/~fowler/Publications/FMT2012.html
Best Regards,
-Jim

Thanks Jim.

One of you asked for permission to make this blog more accessible in the Middle Empire, it is a great initiative.For those of you who want to catch, you may want to download The Nuit Blanche Chronicles featuring a pdf of all the entries of most of 2011.

Finally, in the Matrix Factorization Jungle Group on LinkedIn, Danny Bickson asked the following about a GraphLab workshop:

GraphLab workshop?

Hi all,

We are thinking about arranging a graphlab workshop in the bay area around April. We thought about having demos and tutorials about graphlab v2 with some contributed talks from industry about future challenges in large scale machine learning. A preliminary list of companies who already confirmed their participation: Intel, Cloudera, WallMart Labs, Technicolor Labs, LinkedIn, Pandora Internet Radio, Oracle Labs and multiple startups. If you are interested in participating and or giving a talk please contact me.

We are also checking the possibility of planning a similar workshop at the east coast. Here we are less sure about the demand. Please contact me if you are interested!

Thanks a lot!

As a reminder, there are currently 154 members in that group and 1253 members in the compressive sensing group.

Image Credit: NASA/JPL/Space Science Institute
Full-Res: W00071706.jpg
W00071706.jpg was taken on January 13, 2012 and received on Earth January 15, 2012. The camera was pointing toward SATURN at approximately 2,801,959 kilometers away, and the image was taken using the CB2 and CL2 filters.

Saturday, January 14, 2012

Another clue Journals are in the Branding Business

Yaroslav pointed this entry on Christian Robert's blog. The astute reader may recall a similar point in "You think peer review as currently implemented suck ? Wait till it is going to be the only way to publish"

Friday, January 13, 2012

Request for Expression of Interest: The Qualcomm Tricorder X Prize

The Qualcomm Tricorder X Prize was announced at CES this week. The goal of the competition is to build a "thing" that can be

".....capable of capturing key health metrics and diagnosing a set of 15 diseases. Metrics for health could include such elements as blood pressure, respiratory rate, and temperature. Ultimately, this tool will collect large volumes of data from ongoing measurement of health states through a combination of wireless sensors, imaging technologies, and portable, non-invasive laboratory replacements...".

I say a "thing" because it does not have to be just a portable device as one could envision by remembering the original Tricorder. The current description of the prize is pretty light on the details for the moment and the FAQ is not extraordinarily helpful....yet. For instance, right now we don't know if the 15 diseases are already known or are just up to the team to define.

Let us look at some of the response of the FAQ to get a sense of what this challenge is really about.. First of all, are we talking about Star Treck type of technology, where most of the technology either does not exist or is very low in the TRL scale ? Probably not:

"....What's new about the technology? Doesn't most of this already exist?
Yes, some of the technology exists today. However, the the teams in this competition will pull this all together in one seamless system. The resulting instrument will also push the sensing component of technology in different ways: Smaller, lighter, cheaper, faster, better. Integration of these many different components is expected to be very challenging...."

While the emphasis is on diagnostics, I note the importance of continuous monitoring

What will the Device actually do?

Diagnose diseases

Provide ongoing metrics of health (vitals)

Allow monitoring or continuous use of sensors to diagnose and measure health

Provide awareness of health state

Give confirmation that everything is ok with a consumer

Notify that something is not ok (a "check engine light")

Of related interest, there seems to be an interest for non invasive capabilities, this one is tough one.

Do the sensors have to be wireless?
No; however, due to consumer experience requirements it's unlikely a non-wireless sensor will be successful.

Can the sensors be invasive? (What is "invasive?")
There is no requirement or limit on sensing; we define a grand challenge and let teams find the best, innovative new solutions. "Invasive" means it punctures the skin. The competition allows this but it's very unlikely this would be acceptable to a consumer. For example, drawing blood is invasive but the accelerometer in your phone is non-invasive.

Finally, when they talk about sensing, they really mean sensing and make sense of it:

What is the difference between sensors and sensing? (What is "sensing?")
Sensors are generally physical hardware. These are used to collect health metrics and data about a person. The sensor can collect data for a short or long period of time. Sensing is the process of taking the data and interpreting it for patterns. These patterns can be analyzed to show unusual variations within one person, or compared to other people.

From the compressive sensing standpoint, there are obvious subjects of interest in this description, some of which have somehow already been implemented by some research teams. However, besides ECG, EEG, there might be some trickier inverse problems if we want to avoid the issue of non invasiveness. In particular, there may be a need for the fusion of inverse problems that generally are not considered together.

I would be interested in being part of a team that competes in this challenge. I may contact some of you in the future on the matter but if you want to just talk about it, we can do that as well. Obviously, all these discussions will remain private unless we, both parties, agree to communicate on these matters. Wave me in if interested.

Request for Expression of Interest: The Qualcomm Tricorder X Prize

The Qualcomm Tricorder X Prize was announced at CES this week. The goal of the competition is to build a "thing" that can

".....capable of capturing key health metrics and diagnosing a set of 15 diseases. Metrics for health could include such elements as blood pressure, respiratory rate, and temperature. Ultimately, this tool will collect large volumes of data from ongoing measurement of health states through a combination of wireless sensors, imaging technologies, and portable, non-invasive laboratory replacements...".

"....What's new about the technology? Doesn't most of this already exist?
Yes, some of the technology exists today. However, the the teams in this competition will pull this all together in one seamless system. The resulting instrument will also push the sensing component of technology in different ways: Smaller, lighter, cheaper, faster, better. Integration of these many different components is expected to be very challenging...."

While the emphasis is on diagnostics, I note the importance of continuous monitoring

What will the Device actually do?

Diagnose diseases

Provide ongoing metrics of health (vitals)

Allow monitoring or continuous use of sensors to diagnose and measure health

Provide awareness of health state

Give confirmation that everything is ok with a consumer

Notify that something is not ok (a "check engine light")

Of related interest, there seems to be an interest for non invasive capabilities, this one is tough one.

Do the sensors have to be wireless?
No; however, due to consumer experience requirements it's unlikely a non-wireless sensor will be successful.

Can the sensors be invasive? (What is "invasive?")
There is no requirement or limit on sensing; we define a grand challenge and let teams find the best, innovative new solutions. "Invasive" means it punctures the skin. The competition allows this but it's very unlikely this would be acceptable to a consumer. For example, drawing blood is invasive but the accelerometer in your phone is non-invasive.

Finally, when they talk about sensing, they really mean sensing and make sense of it:

What is the difference between sensors and sensing? (What is "sensing?")
Sensors are generally physical hardware. These are used to collect health metrics and data about a person. The sensor can collect data for a short or long period of time. Sensing is the process of taking the data and interpreting it for patterns. These patterns can be analyzed to show unusual variations within one person, or compared to other people.

Thursday, January 12, 2012

OSTP RFI Last day for comments, Around the blogs in 80 hours and Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation

Various items on the block today:

First Today is the last for getting your throughts to two Request For Information by the Office of Science and Technology Policy:

I expressed my view on the former in Toward Robust Science: Why Open Access of Government Funded Peer Review Work is Important but it could rightly apply to the latter as well. I listed the questions here.

"...How To Submit a Response All comments must be submitted electronically to: publicaccess@ostp.gov. Responses to this RFI will be accepted through January 12, 2012. You will receive an electronic confirmation acknowledging receipt of your response,..."

Please note that you do not have to be a US person to submit a response. Also note that any information (that includes your identity) will be a matter of public record as normally expected in this type of generic inquiry. You can still make your submission anonymous if this is bothering you.

I will not be watching the presentations at MIA 2012 but I may or may not drop in in the cafeteria next door in between some presentations for the coffee breaks.. I removed myself from the list of participants early on so that younger participants could have a chance to attend and learn. Gabriel, one of the organizers, told me that more than 300 people applied but that the rooms at IHP could only safely host 200 or so folks. Congratulations to the organizers, it looks like it will be an impressive series of talks with a large audience.

On LinkedIn, someone asked Does anybody know how to implement compressive sensing in simulink? and another I'm building a compressed sensing Receiver for UWB signals...do You know how to construct the matrices using amplifiers?... Also, Ruiliang Zhang provided an answer to What's the most important paper to deal with Bregman iteration?

Rich mentioned the upcoming Connexions conference, Bob talks about Strange behavior in sparse representation classification?, Zhilin provides some information on his new paper (see his email below for more information). Danny provides us with a Vowpal Wabbit Tutorial. Terry reviews Random matrices and specifically The Four Moment Theorem for Wigner ensembles. Here is a review of Persi Diaconis' latest book in the WSJ , I am going to get it on the Kindle app for the iPhone/iPad,

At UBC there are two courses related to compressed sensing: MATH 555 taught by Ozgur Yilmaz and

EOSC 513 taught by Felix Hermann. Ar University of Michigan, Anna Gilbert has a blog where she writes down some of her lectures there. At Iowa State, Namrata Vaswani teaches EE 527: Detection and Estimation Theory with relevant parts related to compressive sensing.

Justin Romberg's lectures at ENS Lyon last week just showed up on the interwebs::

lecture-1-1-basis.pdf

lecture-1-2-frames.pdf

lecture-1-3-dct-notes.pdf

lecture-1-4-wavelets.pdf

lecture-1-5-sparsity-overview.pdf

lecture-2-1-sparseapprox.pdf

lecture-2-2-bp.pdf

lecture-2-3-upsparse.pdf

lecture-2-4-1-csoverview.pdf

lecture-2-4-1-csoverview1.pdf

lecture-2-4-2-csoverview.pdf

lecture-2-4-3-csoverview.pdf

lecture-2-4-4csoverview.pdf

lecture-3-1-invprobs.pdf

lecture-3-2-ls.pdf

lecture-3-3-l1dual.pdf

lecture-3-4-l1cone.pdf

lecture-3-5-stable.pdf

lecture-4-1-gaussrip.pdf

notation.pdf

At the Joint Mathematics Meeting, Eric Price gave a talk on the Geometric Aspects of Compressive Sensing

Knome Expands Scientific Advisory Board and brings Dr. Yaniv Erlich bring genomics expertise as company pursues future growth

Finally, Zhilin Zhang sent me the following:

".....Hi, Igor,

....We just submitted a paper on block sparse model, which considers to exploit intra-block correlation:, Zhilin Zhang, Bhaskar D. Rao , Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation, submitted to IEEE Transaction on Signal Processing, January 2012. The preprint can be downloaded here: http://arxiv.org/abs/1201.0862. Here is the abstract:

We examine the recovery of block sparse signals and extend the framework in two important directions; one by exploiting intra-block correlation and the other by generalizing the block structure. We propose two families of algorithms based on the framework of block sparse Bayesian learning (bSBL). One family, directly derived from the bSBL framework, requires knowledge of the block partition. Another family, derived from an expanded bSBL framework, is based on a weaker assumption about the a priori information of the block structure, and can be used in the cases when block partition, block size, block sparsity are all unknown. Using these algorithms we show that exploiting intra-block correlation is very helpful to improve recovery performance. These algorithms also shed light on how to modify existing algorithms or design new ones to exploit such correlation for improved performance.

Please note that:

Our proposed algorithms have the best recovery performance among ALL the existing algorithms (I've sent more than one month to carry out experiments to compare algorithms, but didn't find any algorithms have the similar performance as ours).
These algorithms are the first algorithms that adaptively exploit intra-block correlation.
We revealed that intra-block correlation, if exploited, can significantly improve recovery performance. I think you may not be surprised by this observation, since we obtained similar observation from our previous MMV work (i.e. temporal correlation, if exploited, can significantly improve recovery performance of MMV algorithms)
But interestingly, we found that the intra-block correlation has little effects on the performance of existing algorithms. This observation is entirely different to our previous finding on the MMV model, where we found temporal correlation has obvious negative effects on the performance of existing algorithms. For example, group Lasso keeps almost the same recovery performance no matter what's the intra-block correlation value. I guess maybe this is the reason that why intra-block correlation has not drawn attention from the people working on the block sparse model. But as you can see from our paper, exploiting the intra-block correlation can be very helpful to improve recovery performance (or reduce the number of measurements with the same recovery performance).

The codes will be posted on the website: http://dsp.ucsd.edu/~zhilin/BSBL.html (probably at the end of this month). But any one, if interested in, can send email to me to get these codes...."

Thanks Zhilin for the heads-up.

Wednesday, January 11, 2012

Implementation for the Generalization of the Column-Row Matrix Decomposition to Multi-way Arrays

Cesar Caiafa sent me the following:

Dear Igor;

My name is Cesar Caiafa, some time ago I found your excellent blog with very useful and important references to works on Compressed Sensing and more recently, about matrix factorization (MF) and low rank approximations which is one of my research interests.

One of the generalizations of MF are tensor factorizations, in particular, one of the possible tensor factorizations is the Tucker model used for multiway data since long time ago.

I would like to drive your attention to our work from 2010 in Linear Algebra and Its applications:

“Generalizing the Column-Row Matrix Decomposition to Multi-way Arrays”, Cesar F. Caiafa, A. Cichocki, Linear Algebra and its Applications, Vol. 433, pp. 557–573, 2010 (Elsevier).

In this paper, we developed new formulas and algorithms to compute a compressed format of an N-way tensor based only on the information contained in few selected n-mode fibers, i.e. for a 3D tensor, we are able to approximate it based on the entries on few columns (1-mode), rows (2-mode) and tubes (3-modes). We called this method as Fiber Sampling Tensor Decomposition (FSTD). Our idea was motivated by several previous works by M Mahoney, P. Drineas, I. Oseledets and E. Tyrtyshnikov. The abstract reads as follows:

"In this paper, we provide two generalizations of the CUR matrix decomposition Y=CUR (also known as pseudo-skeleton approximation method [1]) to the case of N-way arrays (tensors). These generalizations, which we called Fiber Sampling Tensor Decomposition types 1 and 2 (FSTD1 and FSTD2), provide explicit formulas for the parameters of a rank-(R,R,…,R) Tucker representation (the core tensor of size R×R×⋯×R and the matrix factors of sizes In×R, n=1,2,…N) based only on some selected entries of the original tensor. FSTD1 uses PN-1(P⩾R)n-mode fibers of the original tensor while FSTD2 uses exactly R fibers in each mode as matrix factors, as suggested by the existence theorem provided in Oseledets et al. (2008) [2], with a core tensor defined in terms of the entries of a subtensor of size R×R×⋯×R. For N=2 our results are reduced to the already known CUR matrix decomposition where the core matrix is defined as the inverse of the intersection submatrix, i.e. U=W-1. Additionally, we provide an adaptive type algorithm for the selection of proper fibers in the FSTD1 model which is useful for large scale applications. Several numerical results are presented showing the performance of our FSTD1 Adaptive Algorithm compared to two recently proposed approximation methods for 3-way tensors."

The preprint of our paper is available at

http://web.fi.uba.ar/~ccaiafa/preprints/TensorCUR_v6.pdf

Also Matlab codes for FSTD are available at

http://web.fi.uba.ar/~ccaiafa/Code/FSTD1_package.rar

You can share this information if you like.

Best Regards

Cesar

http://web.fi.uba.ar/~ccaiafa/Cesar/Cesar.html

Thank you Cesar.

Tuesday, January 10, 2012

Implementation of Mirror Prox: ℓ1 Minimization via Randomized First Order Algorithm

Anatoli Juditsky, Fatma Kilinc Karzan, and Arkadi Nemirovski let me know that their MATLAB First Order software for solving min_x{∥x∥_1 : ∥Ax − b∥_p ≤ \delta } with p=2 or p=\infty is available here. The attendant manual for the Mirror Prox Solver is here. They mention that all comments are highly welcome. As a reminder one of the attendant paper is:

L1 Minimization via Randomized First Order Algorithms by Anatoli Juditsky, Fatma Kilinc Karzan, Arkadi Nemirovski. The abstract reads:

In this paper we propose randomized first-order algorithms for solving bilinear saddle points problems. Our developments are motivated by the need for sublinear time algorithms to solve large-scale parametric bilinear saddle point problems where cheap online assessment of solution quality is crucial. We present the theoretical efficiency estimates of our algorithms and discuss a number of applications, primarily to the problem of $\ell_1$ minimization arising in sparsity-oriented Signal Processing. We demonstrate, both theoretically and by numerical examples, that when seeking for medium-accuracy solutions of large-scale $\ell_1$ minimization problems, our randomized algorithms outperform significantly (and progressively as the sizes of the problem grow) the state-of-the art deterministic methods.

From the paper:

"...In our opinion, the preliminary numerical results we have reported suggest that “acceleration via randomization” possesses a signiﬁcant practical potential when solving extremely large-scale convex programs of appropriate structure..."

Arkadi reminded me that the current package featured today

"... implements deterministic Mirror Prox algorithm. The reference to the paper on randomized minimization is related to certain ``upper-level'' routine (the technique for solving generalized saddle point problems), common for both deterministic and randomized Mirror Prox, and presented in the paper on randomization. In this release, the ``working horse'' -- the Mirror Prox solver for saddle point subproblems -- is deterministic, which is better for ``medium'' problems (sizes of matrices in the range of few thousands) and for problems with much larger matrices allowing for reasonably fast matrix-vector multiplications, as is the case when the matrix is sparse, or represents DFT, or convolution, etc. The manual explains exactly what is inside. While we do have randomized version of the solver, its release, provided it is needed, will take some time..."

The algorithm will be added to the compressive sensing reconstruction page. Thanks Anatoli, Fatma, and Arkadi

Credit: NASA, GOES 15 view of the Sun.

Monday, January 09, 2012

Toward Robust Science: Why Open Access of Government Funded Peer Review Work is Important

Here are some of my thoughts with regards to comments to the OSTP RFI on Open Access to Government Funded Peer Review work. If you do respond to the RFI directly do not hesitate to use any of the words below.

Executive Summary

In the past few years, we have seen a perfect storm mixing the growth of two phenomena, a data deluge stemming from access to cheap sensing and computational equipment and the growth of scholarly publications. At the same time, there has been a near constant supply of reviewers. Open access to government funded work is the only short- and long-term policy decision that quickly enables a larger pool of quality reviewing capability aside from imposing reproducible research standards. In the end, it enables a more robust scientific process.

Introduction

With the advent of cheap high throughput equipment, we are seeing the emergence of what one would call "the curse of dimensionality", .i.e. the ability to produce cheaply large amount of data and the somewhat still limited ability to makes sense of them. This data deluge is, in turn, the primary reason behind the growth of the number of scholarly journals and journal articles over the recent few years. Unfortunately, the pool of potential reviewers has remained about the same and has not caught up to the level needed to deal with these two growth factors. One can certainly wonder how this is having an impact on how Science is performed (i.e. judged). In particular, the growth of the number of journals has eventually yielded a reliance on a lower number of potential high quality reviewers per journals. More insidiously, the growth in data production and/or computational experiments has removed from most time-constrained reviewers the physical ability to take on real reviews.

Peer-Review

In light of this situation, the current response by non-profit and commercial publishing entities has been to exacerbate the problem by opening the gate for newer journals and conference venues instead of developing innovative processes to do the one function that is generally thought to be their value added to the process of scientific discovery: The management of the peer-review process. An item of considerable interest is the current lukewarm ability by publishers (commercial or non-profit) to deal pro-actively and fairly with retraction. In particular, there is currently no system in place for publications to address the fact that they may have referenced a recently retracted publication for instance.

Under a regime of government funded open access of publications, new or older players could change the way peer review is performed by enabling systems like a post-peer review capability. This is just an example but innovation has to enter this market in order for the different stakeholders to continue on producing high quality work, at the lowest price to the government.

Conclusions

The interest of the US Government to have open access of government funded work can be clearly delineated into the following reasons:

Open access opens the ability of non-time constrained post peer-review processes by a larger pool of reviewers, thereby enabling a more robust scientific discovery process.
Open access provides the ability for innovation in the marketplace by enabling new (commercial or non-profit) actors in the peer review process. The new players may provide the ability to create new opportunities that are currently seldom explored by the current landscape.
Open access potentially reduces some large cost to the government in its ability to deal effectively with past flawed work and attendant retractions. Some of these retracted works may have had broad policy implications.
Open access comforts the United States leadership in manners related to Science and Technology development.

Saturday, January 07, 2012

Extended till Jan12: OSTP RFI Public Access to Peer-Reviewed Scholarly Publications Resulting From Federally Funded Research

In Could SOPA shut down Nuit Blanche ? and responding to OSTP's RFI I mentioned that the Request for Information on Public Access to Peer-Reviewed Scholarly Publications Resulting From Federally Funded Research ended January 2. I was wrong, the request has been extended, see the end of page 80418 of the Federal Register of December 23rd:.