Paris Machine Learning Meetup Archives




Related links

 If you want to:

  • give a presentation, fill in this form here.
  • add something to our monthly newsletter, fill in this form here.
  • receive a low-frequency newsletter, go here.


All the slides and videos of the previous meetups can be found below:

Season 12 (September 2025-June 2026)

Meetup #4 at Linkup (Organized by Linkup and LightOn)

18:45 – 19:00 · Julia Sartre (10 + 5) Workflows That Design Themselves

Building the right agentic workflow for a client use case usually takes days of trial and error to get to the best possible ROI. What if the agent figured it out on its own? We'll show how we applied an autonomous research loop on top of Paradigm to automatically explore workflow architecture and call patterns to converge on the optimal pipeline without human iteration.

***

19:00 – 19:15 · Wajdi Ben Saad (10 + 5) I Built a Search Agent, Then Spent Months Teaching It When Not to Answer

Building an AI assistant often starts with a simple idea: connect an LLM to internal knowledge, retrieve the right documents, and generate useful answers. In practice, the hard part begins after the first demo works.

This talk shares the story of building a search-based assistant for customer-service knowledge, using sources such as templates, PDFs, operational rules, and structured store information. The first version followed a classic RAG pattern: chunk documents, embed them, retrieve context, and ask the model to answer. But real usage exposed harder problems: personal data, weak retrieval confidence, outdated content, conflicting sources, missing context, and cases where the safest answer was not to answer at all.

I will walk through how the system evolved from a simple RAG pipeline into a more controlled search agent with routing, source prioritization, PII detection, moderation, SQL-backed tools, and feedback loops. The focus is not on a perfect architecture, but on the practical decisions needed to make an AI assistant reliable in production.

The main lesson: a good search agent is not the one that always answers. It is the one that knows when it has enough evidence, when it needs a tool, and when it should stay silent.
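As a minimal sketch of that last point (not the speaker's implementation), the three outcomes can be modelled as a threshold check on retrieval confidence. The `Hit` type, the `decide` function, and both threshold values are hypothetical, chosen only for illustration; a production system would combine more signals (source priority, freshness, PII checks).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One retrieved passage (hypothetical structure for this sketch)."""
    source: str
    text: str
    score: float  # retrieval similarity, assumed to lie in [0, 1]


def decide(hits: List[Hit],
           answer_threshold: float = 0.75,
           tool_threshold: float = 0.40) -> str:
    """Return 'answer', 'use_tool', or 'abstain' from the best retrieval score.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    best = max((h.score for h in hits), default=0.0)
    if best >= answer_threshold:
        return "answer"      # enough evidence to generate a reply
    if best >= tool_threshold:
        return "use_tool"    # weak evidence: try a structured lookup instead
    return "abstain"         # nothing reliable: stay silent


strong = [Hit("faq", "Returns are accepted within 30 days.", 0.90)]
weak = [Hit("pdf", "Opening hours vary by store.", 0.50)]

print(decide(strong), decide(weak), decide([]))
```

The point of the design is that "abstain" is an explicit return value rather than a failure mode, so it can be logged and tuned like any other decision.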


19:15 – 19:30 · Guillaume Desforges (10 + 5) LLM-wiki systems

Who hasn't heard of the "karpathy LLM-wiki"? This gist quickly racked up stars on GitHub. Why such success? What is it about? In which situations is it useful?

19:30 – 19:50 · Antoine Chaffin (15 + 5) LightOn's Bet on Late Interaction Paying Off

LightOn has been betting on late interaction retrieval for a while, but is it worth it? Come and learn how the simple difference between single-vector and multi-vector retrievers leads to much more effective, modern retrievers that outperform models up to 54 times bigger, and how PyLate makes them as easy to train as Sentence Transformers.
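To make the single-vector versus multi-vector difference concrete, here is a small sketch of the MaxSim scoring commonly used in late interaction retrievers. It is plain Python, not PyLate's API; the function names and the toy 2-d embeddings are made up for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Single-vector retrieval: one embedding per text, one dot product."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_tokens: List[Vector],
                           doc_tokens: List[Vector]) -> float:
    """MaxSim: each query token keeps its best-matching document token,
    and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first one

score_a = late_interaction_score(query, doc_a)
score_b = late_interaction_score(query, doc_b)
print(score_a, score_b)
```

Because the comparison happens per token, a document is rewarded for covering every part of the query, which a single pooled vector can blur.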

Transforming causal generative language models into bidirectional encoders offers a powerful alternative to BERT-style architectures. However, current approaches remain limited: they lack consensus on optimal training objectives, suffer from catastrophic forgetting at scale, and fail to flexibly integrate the vast ecosystem of specialized generative models. In this work, through systematic ablations on the Gemma3 and Qwen3 families, we identify the key factors driving successful adaptation, highlighting the critical role of an often-omitted prior masking phase. To scale this process without original pre-training data, we introduce a dual strategy combining linear weight merging with a lightweight multi-domain data mixture that mitigates catastrophic forgetting. Finally, we augment our encoders by merging them with specialized causal models, seamlessly transferring modality- and domain-specific capabilities. This open-source recipe, designed for any causal decoder LLM, yields BidirLM, a family of five encoders that outperform alternatives on text, vision, and audio representation benchmarks.

***

20:10 – 20:30 · Thibault Formal (15 + 5) Learning Retrieval Models with Sparse Autoencoders

Paper abstract: Sparse autoencoders (SAEs) provide a powerful mechanism for decomposing the dense representations produced by Large Language Models (LLMs) into interpretable latent features. We posit that SAEs constitute a natural foundation for Learned Sparse Retrieval (LSR), whose objective is to encode queries and documents into high-dimensional sparse representations optimized for efficient retrieval. In contrast to existing LSR approaches that project input sequences into the vocabulary space, SAE-based representations offer the potential to produce more semantically structured, expressive, and language-agnostic features. By leveraging recently released open-source SAEs, we show that their latent features can serve as effective indexing units for representing documents and queries for sparse retrieval. Our experiments demonstrate that SAE-based LSR models consistently outperform their vocabulary-based counterparts in multilingual and out-of-domain settings. Finally, we introduce SPLARE, a 7B-parameter multilingual retrieval model capable of producing generalizable sparse latent embeddings for a wide range of languages and domains, achieving top results on MMTEB's multilingual and English retrieval tasks. We also release a more efficient 2B-parameter variant, offering strong performance with a significantly lighter footprint.
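As a rough illustration of how sparse representations are compared at retrieval time, the sketch below scores a query against documents with a dot product over shared feature ids. It does not reproduce SPLARE; the feature ids, weights, and the `sparse_score` helper are hypothetical. In the approach described above, the ids would index SAE latent features rather than vocabulary tokens.

```python
from typing import Dict

# Sparse vector: feature id -> weight. Only non-zero features are stored.
SparseVec = Dict[int, float]


def sparse_score(query: SparseVec, doc: SparseVec) -> float:
    """Dot product restricted to the features both vectors activate."""
    # Iterate over the smaller vector to limit the number of lookups.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(w * doc[f] for f, w in query.items() if f in doc)


# Illustrative values only.
query = {12: 1.5, 907: 0.5}
doc_x = {12: 2.0, 44: 1.0}   # shares feature 12 with the query
doc_y = {907: 1.0, 3: 4.0}   # shares feature 907 with the query

print(sparse_score(query, doc_x), sparse_score(query, doc_y))
```

Since only shared non-zero features contribute, such vectors can be served from an inverted index, which is where the efficiency of sparse retrieval comes from.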





Season 7 (September 2019 - June 2020)

TBD

Previously on the Paris Machine Learning meetup:

Season 6 (September 2018 - June 2019)


Season 5 (September 2017 - July 2018)

Season 4 (September 2016 - June 2017)

Season 3 (September 2015 - June 2016)


Paris Machine Learning Newsletter, March 2016 [In French]



Season 2 (Sept 2014 - July 2015)

Season 1 (June 2013 - July 2014)
  • Epilogue Season 1 (July 2014 at DojoCrea)

General links:
Academic Paris based Machine Learning groups

all blog entries related to Meetups.
