Claude Code Leak: 16 Lessons on Building Production-Ready AI Systems
Over the past 24 hours, the developer community has been obsessed with one thing. A leak. The source code of Claude Code, one of the most advanced AI coding systems, surfaced online. Within hours, GitHub was flooded with forks, breakdowns, and deep dives. For developers, it felt like rare access. Wh...
IBM Releases Granite 4.0 3B Vision: A New Vision Language Model for Enterprise Grade Document Data Extraction
IBM has announced the release of Granite 4.0 3B Vision, a vision-language model (VLM) engineered specifically for enterprise-grade document data extraction. Departing from the monolithic approach of larger multimodal models, the 4.0 Vision release is architected as a specialized adapter designed to ...
Two-Stage Optimizer-Aware Online Data Selection for Large Language Models
arXiv:2604.00001v1 Announce Type: new
Abstract: Gradient-based data selection offers a principled framework for estimating sample utility in large language model (LLM) fine-tuning, but existing methods are mostly designed for offline settings. They are therefore less suited to online fine-tuning, w...
Task-Centric Personalized Federated Fine-Tuning of Language Models
arXiv:2604.00050v1 Announce Type: new
Abstract: Federated Learning (FL) has emerged as a promising technique for training language models on distributed and private datasets of diverse tasks. However, aggregating models trained on heterogeneous tasks often degrades the overall performance of indivi...
MIT researchers developed a testing framework that pinpoints situations where AI decision-support systems are not treating people and communities fairly.
Human-in-the-Loop Control of Objective Drift in LLM-Assisted Computer Science Education
arXiv:2604.00281v1 Announce Type: new
Abstract: Large language models (LLMs) are increasingly embedded in computer science education through AI-assisted programming tools, yet such workflows often exhibit objective drift, in which locally plausible outputs diverge from stated task specifications. E...
A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation
arXiv:2604.00249v1 Announce Type: new
Abstract: Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed...
Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents
arXiv:2604.00137v1 Announce Type: new
Abstract: Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well an agent invokes a tool) and intrinsic tool accurac...
One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction
arXiv:2604.00085v1 Announce Type: new
Abstract: Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from on...
Temporal Memory for Resource-Constrained Agents: Continual Learning via Stochastic Compress-Add-Smooth
arXiv:2604.00067v1 Announce Type: new
Abstract: An agent that operates sequentially must incorporate new experience without forgetting old experience, under a fixed memory budget. We propose a framework in which memory is not a parameter vector but a stochastic process: a Bridge Diffusion on a repl...
Two weeks of dogfooding Engram, Weaviate's memory product, in daily Claude Code sessions surfaced where a dedicated memory product adds value, and the specific mechanics that keep integrations with coding assistants from working well.
Z.ai Launches GLM-5V-Turbo: A Native Multimodal Vision Coding Model Optimized for OpenClaw and High-Capacity Agentic Engineering Workflows Everywhere
In the field of vision-language models (VLMs), bridging the gap between visual perception and logical code execution has traditionally involved a performance trade-off. Many models excel at describing an image but struggle to translate that visual information into the rigorous syntax requi...
It feels like spring has sprung here, and so have a new NVIDIA integration, ticket sales for Interrupt 2026, and the announcement of LangSmith Fleet (formerly Agent Builder).
ADeLe: Predicting and explaining AI performance across tasks
AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers in collaboration ...
While we've seen remarkable progress in AI for coding and mathematics, creating agents that can navigate the messy, open-ended nature of real research (where things break for no obvious reason) has proven far more challenging.
Why thinking longer can matter more than being bigger
The post How Can A Model 10,000× Smaller Outsmart ChatGPT? appeared first on Towards Data Science.
The Model You Love Is Probably Just the One You Use
The following article originally appeared on Medium and is being republished here with the author’s permission. Ask 10 developers which LLM they’d recommend and you’ll get 10 different answers—and almost none of them are based on objective comparison. What you’ll get instead is a reflection of the m...
The gig workers who are training humanoid robots at home
When Zeus, a medical student living in a hilltop city in central Nigeria, returns to his studio apartment from a long day at the hospital, he turns on his ring light, straps his iPhone to his forehead, and starts recording himself. He raises his hands in front of him like a sleepwalker and puts a…
Speculative Decoding: How LLMs Generate Text 3x Faster
You probably use Google on a daily basis, and nowadays, you might have noticed AI-powered search results that compile answers from multiple sources. But you might have wondered how the AI can gather all this information and respond at such blazing speeds, especially when compared to the medium-sized...
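The core idea behind speculative decoding can be sketched in a few lines: a cheap draft model proposes several tokens ahead, and the expensive target model only verifies them, keeping the longest agreeing prefix plus one corrected token. The toy "models" below are stand-in functions (not real LLMs) chosen so drafts sometimes disagree with the target; this is a minimal illustration of the accept/reject loop, not the article's implementation.

```python
# Toy sketch of speculative decoding (greedy variant).
# Stand-in "models" over integer tokens: the draft model guesses
# next = last + 1; the target model agrees up to 5, then wraps to 0,
# so some drafts get rejected.

def draft_model(context):
    # Cheap proposer: guesses the next token.
    return context[-1] + 1

def target_model(context):
    # Expensive "ground truth": agrees with the draft up to 5.
    nxt = context[-1] + 1
    return nxt if nxt <= 5 else 0

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then verify them in order against the
    target model; keep the longest agreeing prefix, plus one token
    from the target model (a correction on mismatch, a bonus token
    if every draft was accepted)."""
    # 1. Drafting: k sequential calls to the cheap model.
    drafts, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(ctx)
        drafts.append(t)
        ctx.append(t)

    # 2. Verification: in a real system this is ONE batched forward
    #    pass of the large model over all k positions, which is where
    #    the speedup comes from.
    accepted, ctx = [], list(context)
    for t in drafts:
        expected = target_model(ctx)
        if t == expected:
            accepted.append(t)   # draft accepted
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction
            break
    else:
        accepted.append(target_model(ctx))  # bonus token
    return context + accepted

print(speculative_step([1, 2], k=4))  # → [1, 2, 3, 4, 5, 0]
```

Here the drafts 3, 4, 5 are accepted and the fourth draft (6) is rejected in favor of the target's 0, so one verification round yields four tokens instead of one; real systems preserve the target model's output distribution exactly via probabilistic acceptance rather than this greedy comparison.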