Late-interaction models take a different approach: instead of compressing everything into a single vector, they keep one embedding per token and score query-document pairs with the MaxSim operator, matching each query token against its best document token.
This simple but powerful insight has sparked an open-source ecosystem that’s now shaping both academic research and production-scale AI systems.
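To make the idea concrete, here is a minimal sketch of late-interaction scoring with the MaxSim operator, written in plain PyTorch rather than any particular library; the tensor shapes and random embeddings are purely illustrative.

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late-interaction relevance score.

    query_emb: (num_query_tokens, dim) -- one embedding per query token
    doc_emb:   (num_doc_tokens, dim)   -- one embedding per document token
    """
    # Cosine similarity between every query token and every document token.
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens)
    # MaxSim: each query token keeps its best-matching document token,
    # and the per-token maxima are summed into a single relevance score.
    return sim.max(dim=1).values.sum()

# Toy example: a 4-token query scored against two documents of different lengths.
torch.manual_seed(0)
query = torch.randn(4, 128)
doc_a, doc_b = torch.randn(32, 128), torch.randn(12, 128)
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because every token embedding is kept, no information is squeezed away before scoring, which is precisely what helps on out-of-domain and reasoning-heavy queries.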
PyLate began as an internal experiment to simplify multi-vector training. Today, it’s a full-fledged library with 527 GitHub stars and growing adoption.
👉 If you want to learn more about the library: PyLate documentation
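For a flavor of what this looks like in practice, here is a short indexing-and-retrieval sketch that follows the patterns shown in the PyLate documentation; the exact class and argument names (and the GTE-ModernColBERT checkpoint id) should be checked against the docs before relying on them.

```python
# Sketch based on the PyLate documentation; verify names against the docs.
from pylate import indexes, models, retrieve

# Load a late-interaction model (GTE-ModernColBERT was trained with PyLate).
model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1")

# Build an index and a retriever on top of it.
index = indexes.Voyager(index_folder="pylate-index", index_name="demo", override=True)
retriever = retrieve.ColBERT(index=index)

# Encode and index a tiny corpus (one multi-vector representation per document).
documents_ids = ["1", "2"]
documents = ["PyLate trains multi-vector retrievers.", "FastPlaid serves them in production."]
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

# Encode queries and retrieve the top documents via MaxSim.
queries_embeddings = model.encode(["how do I train a ColBERT model?"], is_query=True)
print(retriever.retrieve(queries_embeddings=queries_embeddings, k=2))
```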
In partnership with Answer.AI, LightOn co-developed ModernBERT, a model that fundamentally rethinks encoder architecture.
ModernBERT has already been cited 305+ times, with variants like BioClinical ModernBERT emerging for healthcare applications.
👉 Explore: ModernBERT LightOn blog post
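Since ModernBERT ships as a standard Hugging Face checkpoint, trying it takes only a few lines; the snippet below uses the fill-mask pipeline with the publicly released answerdotai/ModernBERT-base model and assumes a recent transformers release that includes ModernBERT support.

```python
# Quick ModernBERT smoke test (requires a transformers version with ModernBERT support).
from transformers import pipeline

# Base checkpoint released by the Answer.AI / LightOn collaboration.
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

for prediction in fill_mask("Paris is the [MASK] of France."):
    print(f"{prediction['token_str']!r:>12}  score={prediction['score']:.3f}")
```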
Building great models is only half the challenge; making them work in production is the other. That’s where FastPlaid comes in.
As Raphael Sourty explains, static indexes solve many use cases, but mutable indexes (new in v1.10.0) unlock real-world applications where data evolves continuously.
👉 Read more: FastPlaid LightOn blog post
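The static-vs-mutable distinction is easy to picture in code. The sketch below is not FastPlaid’s actual API; it is a deliberately naive, hypothetical index used only to illustrate why in-place adds and deletes matter when the corpus keeps changing (a real engine like FastPlaid adds compression and pruning on top).

```python
import torch

class ToyMutableIndex:
    """Hypothetical mutable multi-vector index (illustration only, not FastPlaid's API).

    A static index has to be rebuilt whenever the corpus changes; a mutable
    index accepts adds and deletes in place, which evolving data requires.
    """

    def __init__(self):
        self.docs = {}  # doc_id -> (num_tokens, dim) normalized token embeddings

    def add(self, doc_id: str, token_embeddings: torch.Tensor) -> None:
        self.docs[doc_id] = torch.nn.functional.normalize(token_embeddings, dim=-1)

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)

    def search(self, query_embeddings: torch.Tensor, k: int = 5):
        q = torch.nn.functional.normalize(query_embeddings, dim=-1)
        # Brute-force MaxSim over every stored document (a real engine prunes this).
        scores = {doc_id: float((q @ d.T).max(dim=1).values.sum())
                  for doc_id, d in self.docs.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Documents come and go without rebuilding anything.
index = ToyMutableIndex()
index.add("doc-1", torch.randn(32, 128))
index.add("doc-2", torch.randn(20, 128))
print(index.search(torch.randn(4, 128), k=2))
index.delete("doc-1")
```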
Finally, to push accessibility even further, PyLate-rs compiles late-interaction inference to WebAssembly (WASM).
That means: late-interaction inference can run directly in the browser or any other WASM runtime, with no Python environment or GPU server required.
This lowers the barrier for demos, education, and lightweight deployments, proving that late interaction isn’t just powerful; it’s portable.
Taken together, these projects form a technical symphony: ModernBERT provides the encoder backbone, PyLate covers training and experimentation, FastPlaid serves multi-vector indexes in production, and PyLate-rs carries inference all the way to the browser.
The ecosystem has grown from an academic curiosity into a reasoning-first retrieval stack. With recognition at CIKM and ACL, adoption across GitHub and HuggingFace, and practical tools for real-world workflows, LightOn is helping shape the next era of AI search.
📖 Explore LightOn’s open-source ecosystem. As a starting point, here is the abstract of the PyLate paper:
Neural ranking has become a cornerstone of modern information retrieval. While single vector search remains the dominant paradigm, it suffers from the shortcoming of compressing all the information into a single vector. This compression leads to notable performance degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving individual token embeddings and computing similarity via the MaxSim operator. This architecture has demonstrated superior empirical advantages, including enhanced out-of-domain generalization, long-context handling, and performance in complex retrieval scenarios. Despite these compelling empirical results and clear theoretical advantages, the practical adoption and public availability of late interaction models remain low compared to their single-vector counterparts, primarily due to a lack of accessible and modular tools for training and experimenting with such models. To bridge this gap, we introduce PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector architectures natively, inheriting its efficient training, advanced logging, and automated model card generation while requiring minimal code changes to code templates users are already familiar with. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models, thereby unlocking their full potential in modern IR systems. Finally, PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.
“True happiness comes from the joy of deeds well done, the zest of creating things new” Antoine de Saint-Exupéry
Hi Igor,

I am an algorithms researcher at Google (http://theory.stanford.edu/~rinap) and I am organizing this workshop on "Conceptual Understanding of Deep Learning" (details below). It's trying to understand the Brain/Mind as an algorithm from a mathematical/theoretical perspective. I believe that a mathematical/algorithmic approach for understanding the Mind is crucial and very much missing. I'd appreciate any help I can get with advertising this on your blog/mailing-lists/twitter.

Best,
Rina
Please join us for a virtual Google workshop on “Conceptual Understanding of Deep Learning”.

When: May 17th, 9am-4pm PST.

Where: Live over YouTube.

Goal: How does the Brain/Mind (perhaps even an artificial one) work at an algorithmic level? While deep learning has produced tremendous technological strides in recent decades, there is an unsettling feeling of a lack of “conceptual” understanding of why it works and to what extent it will work in its current form. The goal of the workshop is to bring together theorists and practitioners to develop an understanding of the right algorithmic view of deep learning: characterizing the class of functions that can be learned, coming up with the right learning architecture that may (provably) learn multiple functions and concepts and remember them over time as humans do, and a theoretical understanding of language, logic, RL, meta-learning and lifelong learning.

The speakers and panelists include Turing award winners Geoffrey Hinton, Leslie Valiant, and Gödel Prize winner Christos Papadimitriou (full details).

Panel Discussion: There will also be a panel discussion on the fundamental question “Is there a mathematical model for the Mind?”. We will explore basic questions such as “Is there a provable algorithm that captures the essential capabilities of the mind?”, “How do we remember complex phenomena?”, “How is a knowledge graph created automatically?”, “How do we learn new concepts, functions and action hierarchies over time?” and “Why do human decisions seem so interpretable?”

Twitter: #ConceptualDLWorkshop.

Please help advertise on mailing lists/blog posts and retweet. Hope to see you there!
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
Progress usually comes from a steady technology bootstrap…until it doesn’t.
Take for instance the race for the $1,000 genome that started in the early 2000s. Initially, sequencing the human genome was a race between well-funded public and private efforts; more importantly, that first breakthrough ended up costing upwards of $450M. Yet despite all the economic promise of genome sequencing, had costs merely followed Moore’s law, sequencing one full genome would still cost $100,000 today. However, once the goal became clear to everyone, a diversity of technologies and challengers emerged. This intense competition eventually yielded growth faster than Moore’s Law. The main takeaway is that one cannot rely on the steady progress of one specific technology alone to commoditize tools.
What does this have to do with the current state of silicon computing and the new demand for Large Language Models (LLMs)? Everything, if you ask us, and here is how.
Less than a year into existence, Large Language Models like GPT-3 have already spawned a new generation of startups built on the ability of the model to respond to requests for which it was not trained. More importantly for us, hardware manufacturers are positing that one or several customers will be willing to put a billion dollars on the table to train an even larger model in the coming years.
Interestingly, much like the mass industrialization in the 1930s, the good folks at OpenAI are sketching new scaling laws for the industrialization of these larger models.
The sad truth is that extrapolating their findings to the training of a 10-trillion-parameter model implies a supercomputer running continuously for two decades. The minimum capital expenditure for this adventure is estimated in the realm of several hundred million dollars.
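For intuition only, here is the kind of back-of-envelope arithmetic behind such an extrapolation, using the common "6 × parameters × tokens" approximation for training FLOPs. The token budget and the sustained throughput below are assumptions chosen for illustration, not the figures used in the post, so treat the output as a ballpark rather than a reproduction of the estimate above.

```python
# Back-of-envelope training-time estimate (illustrative assumptions only).
params = 10e12                  # 10 trillion parameters
tokens = 2e12                   # assumed training token budget
flops = 6 * params * tokens     # ~6*N*D FLOPs, a standard approximation

sustained = 2e17                # assumed sustained throughput: 200 PFLOP/s
seconds = flops / sustained
years = seconds / (3600 * 24 * 365)
print(f"{flops:.1e} FLOPs -> about {years:.0f} years at {sustained:.0e} FLOP/s sustained")
```

With these particular assumptions the answer lands in the same ballpark as the "two decades" above, but every input is debatable, which is exactly the point: the numbers only move decisively with a different kind of hardware.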
Much like what happened in sequencing, while silicon and architecture improvements may deliver speedups in the coming years, it is fair to say that, even with Moore’s law, no foreseeable technology can reasonably train a fully scaled-up GPT-4 and capture the economic value associated with it.
Rebooting silicon with a different physics, light, and NvNs
For a real breakthrough to occur, much like what happened in the sequencing story, different technologies need to be jointly optimized. In our case, this means performing co-design with new hardware and physics but also going rogue on full programmability.
LightOn’s photonic hardware can produce massively parallel matrix-vector multiplications with an equivalent of 2 trillion parameters “for free”: this is about one-fifth of the number of parameters needed for GPT-4. Next comes revisiting programmability. LightOn’s current technology keeps these weights fixed by design. Co-design means finding algorithms in which CPUs and GPUs perform the most intelligent computations while LightOn’s massive Non-von Neumann (NvN) hardware does the heavy lifting. We have already published how we replace backpropagation, the workhorse of Deep Learning, with an algorithm that unleashes the full potential of our hardware in distributed training. We are working similarly on an inference step that will take full advantage of the massive number of parameters at our disposal. This effort relies in large part on our access to half a million GPU hours on some of France’s and Europe’s largest supercomputers.
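For a feel of what the photonic hardware computes, here is a small NumPy simulation of the operation an optical processing unit performs: a matrix-vector product against a huge fixed random matrix, followed by an intensity measurement. The |Rx|² form and the fixed-by-design weights follow LightOn's published work on its OPUs; the matrix sizes here are tiny and purely illustrative, since the whole point of the optics is to make this operation essentially free at a scale electronics cannot match.

```python
import numpy as np

rng = np.random.default_rng(0)

# The optics implement a fixed random matrix R: the weights are "printed"
# in the physics, which is why they stay fixed by design. A device with
# ~10^6 inputs and ~2*10^6 outputs holds ~2 trillion such fixed parameters.
n_in, n_out = 1_000, 10_000  # tiny simulation-friendly sizes
R = rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))

def opu_like_transform(x: np.ndarray) -> np.ndarray:
    """Simulate y = |R x|^2: random projection followed by intensity detection."""
    return np.abs(R @ x) ** 2

x = rng.normal(size=n_in)
y = opu_like_transform(x)
print(y.shape)  # (10000,) random features computed "in one shot" by the optics
```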
And this is just the beginning. There is a vast untapped potential for repurposing large swaths of optical technologies, aimed primarily at entertainment and telecommunications, into computing.
The road towards a $1,000 GPT-3
Based on GPT-3 training cost estimates, achieving a $1,000 GPT-3 requires improvements of four orders of magnitude. Much like what occurred in 2007 with the genome sequencing revolution, Moore’s law may take care of the first two orders of magnitude in the coming decade, but the next two rely on an outburst of new, efficient technologies, both hardware and algorithms. It just so happens that GPT-3 has close to 100 layers; if the heavy lifting of each layer is offloaded to photonic hardware, those two orders of magnitude of savings may arise faster than you can imagine. Stay tuned!
Igor Carron is the CEO and co-founder of LightOn.