A while back, we wondered whether Education is a low-rank problem, or rather how one might use the fact that Aha moments are sparse. How about using the fact that most key concepts are really sparse? It looks like a probabilistic matrix factorization problem:
Armed with this model and given incomplete observations of the graded learner–question responses Y_{i,j}, our goal is to estimate the factors W, C, and M. Such a factor-analysis problem is ill-posed in general, especially when each learner answers only a small subset of the collection of questions (see Harman (1976) for a factor analysis overview). Our first key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts (i.e., we have K ≪ N, Q in Figure 1).
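For concreteness, here is a minimal sketch of that generative picture, assuming a probit inverse link Pr(Y_{i,j} = 1) = Φ(w_i^T c_j + μ_i). The variable names follow the paper's notation (W, C, K, N, Q), but the sizes, sparsity level, and distributions below are illustrative assumptions, not values from the paper:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
Q, N, K = 50, 100, 5  # questions, learners, key concepts (K << N, Q)

# Sparse, non-negative question-concept associations W (Q x K).
W = rng.exponential(1.0, (Q, K)) * (rng.random((Q, K)) < 0.2)
C = rng.normal(0.0, 1.0, (K, N))   # concept knowledge per learner (K x N)
mu = rng.normal(0.0, 1.0, (Q, 1))  # intrinsic question difficulty (the paper's M, broadcast)

P = norm.cdf(W @ C + mu)                       # Pr(Y_{i,j} = 1) under a probit link
Y = (rng.random((Q, N)) < P).astype(float)     # graded binary responses
observed = rng.random((Q, N)) < 0.3            # each learner answers only a small subset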
We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner’s knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question’s intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood-based solution and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.
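The bi-convex maximum-likelihood formulation suggests a simple alternating scheme: with the link fixed, the negative log-likelihood is convex in W for fixed C and vice versa. The toy estimator below is my own sketch in that spirit, not the authors' SPARFA-M implementation; it uses a logistic link and plain proximal-gradient steps, with an l1 penalty to promote sparsity in W and a projection to enforce W ≥ 0. The nonneg flag is a hypothetical addition, used in the comment-thread experiment further down:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparfa_m_sketch(Y, mask, K, iters=500, step=0.01, lam=0.1, nonneg=True, seed=0):
    """Toy alternating proximal-gradient fit of Y ~ Ber(sigmoid(W C + mu))."""
    rng = np.random.default_rng(seed)
    Q, N = Y.shape
    W = np.abs(rng.normal(0.0, 0.1, (Q, K)))
    C = rng.normal(0.0, 0.1, (K, N))
    mu = np.zeros((Q, 1))
    for _ in range(iters):
        # W-step (C, mu fixed): gradient step, then soft-threshold (prox of
        # lam*||.||_1), then optional projection onto W >= 0.
        R = mask * (sigmoid(W @ C + mu) - Y)   # NLL gradient w.r.t. W C + mu
        W = W - step * (R @ C.T)
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)
        if nonneg:
            W = np.maximum(W, 0.0)
        # C, mu-step (W fixed): plain gradient descent, no constraints.
        R = mask * (sigmoid(W @ C + mu) - Y)
        C = C - step * (W.T @ R)
        mu = mu - step * R.sum(axis=1, keepdims=True)
    return W, C, mu

On the synthetic Y, observed, and K from the previous sketch, sparfa_m_sketch(Y, observed, K) should return a sparse, non-negative estimate of W, though only up to the permutation and scaling ambiguities inherent to any factor analysis.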
An interesting paper with an interesting pair of algorithms, but I question the validity of this assumption:
ReplyDelete"Our third observation is that the entries of W should be non-negative, since we postulate that having strong concept knowledge should never hurt a learner’s chances to answer questions correctly."
I've noticed that many people learn an incorrect model which harms their learning of other related concepts. Dijkstra once wrote, "It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration." This might be a little harsh, but I've seen the same effect with students, and with my own learning. To test this, it might be interesting to see whether relaxing the non-negativity constraint leads to sparser solutions (a rough sketch of such a test follows after this comment).
(Or have I misunderstood something, and is that wrong model now making this paper harder for me to understand?)
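One hedged way to run the commenter's proposed test, reusing the hypothetical sparfa_m_sketch and the synthetic Y, observed, and K from the sketches above: fit the same model with and without the non-negativity projection and compare how many entries of W survive.

# Commenter's proposed experiment (a sketch, not from the paper): does
# dropping the W >= 0 constraint change the sparsity of the recovered W?
W_pos, _, _ = sparfa_m_sketch(Y, observed, K, nonneg=True)
W_free, _, _ = sparfa_m_sketch(Y, observed, K, nonneg=False)

nnz = lambda M, tol=1e-3: int((np.abs(M) > tol).sum())
print(f"nonzeros in W: constrained={nnz(W_pos)}, unconstrained={nnz(W_free)}")
# Negative entries, if they appear and improve the fit, would correspond to
# the "harmful prior knowledge" effect the commenter describes.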
fyi,
http://nuit-blanche.blogspot.com/2014/02/is-bad-learning-learning.html
igor