RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
Dense image captioning is critical for cross-modal alignment in vision-language pretraining and text-to-image generation, but scaling expert-quality annotations is prohibitively expensive. While synthetic captioning via strong vision-language models (VLMs) is a practical alternative, supervised dist...
Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information
arXiv:2603.11094v1 Announce Type: new
Abstract: Streaming sources of data are becoming more common as the ability to collect data in real-time grows. A major concern in dealing with data streams is concept drift, a change in the distribution of data over time, for example, due to changes in environ...
Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates
arXiv:2603.11052v1 Announce Type: new
Abstract: Neural operators (NOs) provide fast, resolution-invariant surrogates for mapping input fields to PDE solution fields, but their predictions can exhibit significant epistemic uncertainty due to finite data, imperfect optimization, and distribution shif...
Comparison of Outlier Detection Algorithms on String Data
arXiv:2603.11049v1 Announce Type: new
Abstract: Outlier detection is a well-researched and crucial problem in machine learning. However, there is little research on string data outlier detection, as most literature focuses on outlier detection of numerical data. A robust string data outlier detecti...
Gated Adaptation for Continual Learning in Human Activity Recognition
arXiv:2603.10046v1 Announce Type: new
Abstract: Wearable sensors in Internet of Things (IoT) ecosystems increasingly support applications such as remote health monitoring, elderly care, and smart home automation, all of which rely on robust human activity recognition (HAR). Continual learning syste...
Multi-level meta-reinforcement learning with skill-based curriculum
arXiv:2603.08773v1 Announce Type: new
Abstract: We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding chall...
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
arXiv:2603.06797v1 Announce Type: new
Abstract: Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: “optimi...
arXiv:2603.06602v1 Announce Type: new
Abstract: As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to address this challenge, finds informative summaries of datasets in terms o...
The emergence of the AI Architect: Engineering the future of tech
According to Gartner, over 80% of enterprise AI projects fail to move beyond the prototype stage, highlighting the need for professionals who can design systems that work in the real world. Enter the AI Architect...
Capability Thresholds and Manufacturing Topology: How Embodied Intelligence Triggers Phase Transitions in Economic Geography
arXiv:2603.04457v1 Announce Type: new
Abstract: The fundamental topology of manufacturing has not undergone a paradigm-level transformation since Henry Ford's moving assembly line in 1913. Every major innovation of the past century, from the Toyota Production System to Industry 4.0, has optimized w...
FedEMA-Distill: Exponential Moving Average Guided Knowledge Distillation for Robust Federated Learning
arXiv:2603.04422v1 Announce Type: new
Abstract: Federated learning (FL) often degrades when clients hold heterogeneous non-Independent and Identically Distributed (non-IID) data and when some clients behave adversarially, leading to client drift, slow convergence, and high communication overhead. T...
Enterprise adoption is shifting from “capability” to “credibility.” Organizations without strong oversight, documentation, and risk management risk losing trust and market momentum. Are you ready?
arXiv:2603.02365v1 Announce Type: new
Abstract: The paper investigates whether and how AI systems can realize states of uncertainty. By adopting a functionalist and behavioral perspective, it examines how symbolic, connectionist and hybrid architectures make room for uncertainty. The paper distingu...
Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach
arXiv:2603.02359v1 Announce Type: new
Abstract: Digital advertising increasingly relies on visual content, yet marketers lack rigorous methods for understanding how specific visual attributes causally affect consumer engagement. This paper addresses a fundamental methodological challenge: estimatin...
Trailer: The Shape of Things to Come
Microsoft research lead Doug Burger introduces his new podcast series, The Shape of Things to Come, an exploration into the fundamental truths about AI and how the technology will reshape the future. (Source: Microsoft Research)
StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser
arXiv:2603.00037v1 Announce Type: new
Abstract: Diffusion models have been used for probabilistic time series forecasting and show strong potential. However, fixed noise schedules often produce intermediate states that are hard to invert and a terminal state that deviates from the near noise assump...
Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies
arXiv:2603.00041v1 Announce Type: new
Abstract: Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time ser...
EMBridge: Enhancing Gesture Generalization from EMG Signals through Cross-Modal Representation Learning
Hand gesture classification using high-quality structured data such as videos, images, and hand skeletons is a well-explored problem in computer vision. Alternatively, leveraging low-power, cost-effective bio-signals, e.g., surface electromyography (sEMG), allows for continuous gesture predict...
ChatGPT as a therapist? New study reveals serious ethical risks
As millions turn to ChatGPT and other AI chatbots for therapy-style advice, new research from Brown University raises a serious red flag: even when instructed to act like trained therapists, these systems routinely break core ethical standards of mental health care. In side-by-side evaluations with ...
Causal Identification from Counterfactual Data: Completeness and Bounding Results
arXiv:2602.23541v1 Announce Type: new
Abstract: Previous work establishing completeness results for $\textit{counterfactual identification}$ has been circumscribed to the setting where the input data belongs to observational or interventional distributions (Layers 1 and 2 of Pearl's Causal Hierarch...