arXiv:2601.15380v1 Announce Type: new
Abstract: We generalize the attention mechanism by viewing it through the lens of Entropic Optimal Transport, revealing that standard attention corresponds to a transport problem regularized by an implicit uniform prior. We introduce Generalized Optimal transpo...
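The entropic-OT reading of standard attention can be checked numerically. The sketch below is not the paper's generalized method. It assumes unit temperature `eps`, a uniform prior, and hypothetical helper names. For one query row with scores s, the weights that maximize <p, s> - eps * KL(p || uniform) over the simplex are softmax(s / eps). This is the same result as a single row-normalization step of Sinkhorn on K = exp(S / eps). Continuing Sinkhorn so that column sums are also enforced gives a doubly stochastic plan instead.

```python
import math

def softmax_rows(S, eps=1.0):
    """Standard attention weights: row-wise softmax of S / eps (max-shifted for stability)."""
    out = []
    for row in S:
        m = max(row)
        w = [math.exp((s - m) / eps) for s in row]
        z = sum(w)
        out.append([x / z for x in w])
    return out

def entropic_objective(p, scores, eps=1.0):
    """<p, s> - eps * KL(p || uniform) for a single query row."""
    n = len(p)
    kl = sum(pi * math.log(pi * n) for pi in p if pi > 0)
    return sum(pi * si for pi, si in zip(p, scores)) - eps * kl

def sinkhorn(S, eps=1.0, iters=100):
    """Alternate row/column normalization of K = exp(S / eps) for a square S.
    With iters=0 only the final row step runs, which reproduces softmax;
    more iterations also drive the column sums to 1 (doubly stochastic)."""
    P = [[math.exp(s / eps) for s in row] for row in S]
    n = len(P)
    for _ in range(iters):
        P = [[x / sum(row) for x in row] for row in P]                 # rows sum to 1
        col = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col[j] for j in range(n)] for i in range(n)]   # columns sum to 1
    return [[x / sum(row) for x in row] for row in P]                  # end on a row step

S = [[1.0, 0.2, -0.5], [0.3, 2.0, 0.1], [-1.0, 0.0, 0.5]]
A = softmax_rows(S)
```

The optimal value for a row equals eps * log(mean(exp(s / eps))). This Gibbs variational identity is a convenient sanity check.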
Improving MoE Compute Efficiency by Composing Weight and Data Sparsity
arXiv:2601.15370v1 Announce Type: new
Abstract: Mixture-of-Experts layers achieve compute efficiency through weight sparsity: each token activates only a subset of experts. Data sparsity, where each expert processes only a subset of tokens, offers a complementary axis. Expert-choice routing impleme...
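Expert-choice routing, named in the abstract as the existing way to obtain data sparsity, can be sketched as follows. Each expert selects its top-`capacity` tokens by router affinity, rather than each token selecting experts. This is a generic illustration of expert-choice routing and not the paper's composed method. The function names and the `capacity` parameter are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over one token's expert logits."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def expert_choice_route(logits, capacity):
    """logits[t][e] is the router logit of token t for expert e.
    Each expert keeps only its `capacity` highest-affinity tokens, so every
    expert processes a fixed-size subset of tokens (data sparsity).
    Returns {expert: [(token, gate_weight), ...]}."""
    probs = [softmax(row) for row in logits]   # token-to-expert affinities
    n_tokens, n_experts = len(logits), len(logits[0])
    assignment = {}
    for e in range(n_experts):
        ranked = sorted(range(n_tokens), key=lambda t: probs[t][e], reverse=True)
        assignment[e] = [(t, probs[t][e]) for t in ranked[:capacity]]
    return assignment

logits = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
route = expert_choice_route(logits, capacity=2)
```

Because experts choose tokens, a given token may be picked by several experts or by none. The per-expert load is fixed at `capacity` by construction.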
The Ontological Neutrality Theorem: Why Neutral Ontological Substrates Must Be Pre-Causal and Pre-Normative
arXiv:2601.14271v1 Announce Type: new
Abstract: Modern data systems must support accountability across persistent legal, political, and analytic disagreement. This requirement imposes strict constraints on the design of any ontology intended to function as a shared substrate. We establish an imposs...
Epistemic Constitutionalism Or: how to avoid coherence bias
arXiv:2601.14295v1 Announce Type: new
Abstract: Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for ...
Scalable Knee-Point Guided Activity Group Selection in Multi-Tree Genetic Programming for Dynamic Multi-Mode Project Scheduling
arXiv:2601.14485v1 Announce Type: new
Abstract: The dynamic multi-mode resource-constrained project scheduling problem is a challenging scheduling problem that requires making decisions on both the execution order of activities and their corresponding execution modes. Genetic programming has been w...
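The truncated abstract does not say how the knee point is defined. As a hedged illustration of the general idea only, a common definition on a two-objective minimization front picks the point farthest from the straight line joining the two extreme solutions. Whether the paper uses this criterion is an assumption, and `knee_point` is a hypothetical name.

```python
import math

def knee_point(front):
    """front: list of (f1, f2) points on a 2-objective minimization Pareto front
    (at least two distinct points). Returns the index of the point with the
    largest perpendicular distance to the line through the two extremes."""
    order = sorted(range(len(front)), key=lambda i: front[i][0])
    (x1, y1), (x2, y2) = front[order[0]], front[order[-1]]
    length = math.hypot(x2 - x1, y2 - y1)

    def dist(i):
        x, y = front[i]
        # Point-to-line distance for the line through (x1, y1) and (x2, y2).
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length

    return max(range(len(front)), key=dist)

front = [(0.0, 10.0), (1.0, 3.0), (2.0, 2.5), (5.0, 1.0), (10.0, 0.0)]
```

Here `knee_point(front)` selects index 1, the point where improving one objective further starts to cost disproportionately in the other.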
arXiv:2601.11620v1 Announce Type: new
Abstract: Whether machines can be conscious depends not only on what they compute, but when they compute it. Most deployed artificial systems realise their functions via sequential or time-multiplexed updates. Conscious experience appears unified and sim...
PRISM: Learning Design Knowledge from Data for Stylistic Design Improvement
arXiv:2601.11747v1 Announce Type: new
Abstract: Graphic design often involves exploring different stylistic directions, which can be time-consuming for non-experts. We address this problem of stylistically improving designs based on natural language instructions. While VLMs have shown initial succe...
Analytic Bijections for Smooth and Interpretable Normalizing Flows
arXiv:2601.10774v1 Announce Type: new
Abstract: A key challenge in designing normalizing flows is finding expressive scalar bijections that remain invertible with tractable Jacobians. Existing approaches face trade-offs: affine transformations are smooth and analytically invertible but lack express...
Attention Consistency Regularization for Interpretable Early-Exit Neural Networks
arXiv:2601.08891v1 Announce Type: new
Abstract: Early-exit neural networks enable adaptive inference by allowing predictions at intermediate layers, reducing computational cost. However, early exits often lack interpretability and may focus on different features than deeper layers, limiting trust a...
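The early-exit inference loop that the abstract describes can be sketched with a confidence threshold. Layers run stage by stage, and prediction stops at the first exit whose top softmax probability reaches `threshold`. This sketch covers only that inference loop, not the paper's attention-consistency regularizer. The confidence rule, the `stages`/`heads` structure, and the toy scalar "layers" are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]

def early_exit_predict(stages, heads, x, threshold=0.9):
    """stages[i] maps the hidden state to the next one; heads[i] maps it to
    class logits. Only the stages up to the chosen exit are computed.
    Returns (predicted_class, exit_index)."""
    h = x
    for i, (stage, head) in enumerate(zip(stages, heads)):
        h = stage(h)
        probs = softmax(head(h))
        conf = max(probs)
        # Exit when confident enough, or at the final exit regardless.
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i

# Toy two-exit "network" on a scalar input (hypothetical stand-ins).
stages = [lambda h: h, lambda h: 3 * h]
heads = [lambda h: [h, 0.0], lambda h: [h, 0.0]]
```

An easy input such as `5.0` leaves at exit 0. The input `1.0` needs the second stage, and `0.1` falls through to the last exit without ever reaching the threshold.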
Affect and Effect: Limitations of regularisation-based continual learning in EEG-based emotion classification
arXiv:2601.07858v1 Announce Type: new
Abstract: Generalisation to unseen subjects in EEG-based emotion classification remains a challenge due to high inter- and intra-subject variability. Continual learning (CL) poses a promising solution by learning from a sequence of tasks while mitigating catastr...
HOSC: A Periodic Activation with Saturation Control for High-Fidelity Implicit Neural Representations
arXiv:2601.07870v1 Announce Type: new
Abstract: Periodic activations such as sine preserve high-frequency information in implicit neural representations (INRs) through their oscillatory structure, but often suffer from gradient instability and limited control over multi-scale behavior. We introduce...
arXiv:2601.08005v1 Announce Type: new
Abstract: Frontier AI regulations primarily focus on systems deployed to external users, where deployment is more visible and subject to outside scrutiny. However, high-stakes applications can occur internally when companies deploy highly capable systems within...
The Hessian of tall-skinny networks is easy to invert
arXiv:2601.06096v1 Announce Type: new
Abstract: We describe an exact algorithm for solving linear systems $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or its inverse in time and storage that scale linearly in the numb...
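The sketch below shows what solving Hx = b using only Hessian-vector products means. It uses a different, standard technique, conjugate gradient, and not the paper's exact algorithm. CG assumes H is symmetric positive definite, which a deep-net Hessian need not be. A small explicit matrix stands in for the Hessian, and the solver touches it only through `hvp`, a hypothetical callback.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=1000):
    """Solve H x = b using only hvp(v) = H v (H assumed symmetric positive definite)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - H x, with x = 0 initially
    p = list(r)              # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Hp = hvp(p)
        alpha = rs / dot(p, Hp)                          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # H-conjugate update
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]                 # stand-in SPD "Hessian"
hvp = lambda v: [dot(row, v) for row in A]   # matrix-free access only
x = conjugate_gradient(hvp, [1.0, 2.0])
```

CG needs up to n products for an n-dimensional H in exact arithmetic and only converges approximately otherwise. This contrast is what makes an exact, linear-time solver of the kind the abstract describes notable.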
Automatic Question Generation for Intuitive Learning Utilizing Causal Graph Guided Chain of Thought Reasoning
arXiv:2601.06098v1 Announce Type: new
Abstract: Intuitive learning is crucial for developing deep conceptual understanding, especially in STEM education, where students often struggle with abstract and interconnected concepts. Automatic question generation has become an effective strategy for perso...
Dynamic Intelligence Ceilings: Measuring Long-Horizon Limits of Planning and Creativity in Artificial Systems
arXiv:2601.06102v1 Announce Type: new
Abstract: Recent advances in artificial intelligence have produced systems capable of remarkable performance across a wide range of tasks. These gains, however, are increasingly accompanied by concerns regarding long-horizon developmental behavior, as many syst...
Comment on arXiv:2511.21731v1: Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition
arXiv:2601.06104v1 Announce Type: new
Abstract: This note is a friendly technical check of arXiv:2511.21731v1. I highlight a few places where the manuscript's interpretation of (i) the reported CHSH/Bell-type calculations and (ii) Bose–Einstein (BE) fits to rank-frequency data seems to go beyond w...
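For reference when reading CHSH/Bell-type claims, the standard quantity is S = E(a,b) - E(a,b') + E(a',b) + E(a',b'). Any local deterministic assignment of ±1 outcomes gives |S| = 2, and local hidden-variable models, as mixtures of these, satisfy |S| ≤ 2. The quantum singlet correlation E = -cos(x - y) reaches 2√2 (Tsirelson's bound) at the usual angles. This textbook check is not a reproduction of either manuscript's calculation, and the function names are illustrative.

```python
import itertools
import math

def chsh(E, a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)

def singlet(x, y):
    """Quantum correlation for spin measurements on a singlet at angles x, y."""
    return -math.cos(x - y)

def max_local_chsh():
    """Largest |S| over all deterministic local +/-1 outcome assignments."""
    best = 0
    for A, A2, B, B2 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(A * B - A * B2 + A2 * B + A2 * B2))
    return best

S = chsh(singlet, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
```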
Active Sensing Shapes Real-World Decision-Making through Dynamic Evidence Accumulation
arXiv:2601.04214v1 Announce Type: new
Abstract: Human decision-making heavily relies on active sensing, a well-documented cognitive behaviour for evidence gathering to accommodate ever-changing environments. However, its operational mechanism in the real world remains non-trivial. Currently, an in-...
Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting
arXiv:2601.03321v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation, yet their clinical translation is hindered by architectural heterogeneity and the prevalence of factual hallucinations. Standard supervised fine-tuni...
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
arXiv:2601.03315v1 Announce Type: new
Abstract: We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Of these four, three attempts failed during implementation or evaluation. On...
arXiv:2601.03322v1 Announce Type: new
Abstract: Electroencephalography (EEG)-based brain-computer interfaces facilitate direct communication with a computer, enabling promising applications in human-computer interactions. However, their utility is currently limited because EEG decoding often suffer...
mHC-GNN: Manifold-Constrained Hyper-Connections for Graph Neural Networks
arXiv:2601.02451v1 Announce Type: new
Abstract: Graph Neural Networks (GNNs) suffer from over-smoothing in deep architectures and expressiveness bounded by the 1-Weisfeiler-Leman (1-WL) test. We adapt Manifold-Constrained Hyper-Connections (mHC) [xie2025mhc], recently proposed for Transforme...
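The over-smoothing problem named in the abstract can be seen with plain mean aggregation and no hyper-connections. Repeatedly averaging each node with its neighbours, including a self-loop, drives all node features toward a common value, so deep stacks lose the ability to tell nodes apart. This is a generic illustration, not mHC-GNN, and the graph, features, and helper names are hypothetical.

```python
def propagate(adj, X, steps):
    """Apply mean aggregation over neighbours plus a self-loop `steps` times.
    adj: adjacency lists; X: one scalar feature per node."""
    n = len(adj)
    nbrs = [set(adj[i]) | {i} for i in range(n)]   # add self-loops
    for _ in range(steps):
        X = [sum(X[j] for j in nbrs[i]) / len(nbrs[i]) for i in range(n)]
    return X

def spread(X):
    """Range of node features; 0 means all nodes are indistinguishable."""
    return max(X) - min(X)

adj = [[1], [0, 2], [1, 3], [2]]   # path graph 0-1-2-3
X0 = [1.0, 0.0, 0.0, -1.0]
```

On this path graph the feature spread falls from 2.0 to 1.0 after one layer and becomes negligible after a few dozen layers.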
An Empirical Study of On-Device Translation for Real-Time Live-Stream Chat on Mobile Devices
arXiv:2601.02641v1 Announce Type: new
Abstract: Despite its efficiency, there has been little research on the practical aspects required for real-world deployment of on-device AI models, such as the device's CPU utilization and thermal conditions. In this paper, through extensive experiments, we in...
Value-guided action planning with JEPA world models
arXiv:2601.00844v1 Announce Type: new
Abstract: Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedding Predictive Architectures (JEPA) provide a promising framework to model such dynamics by learning representations and predi...
SLO-Conditioned Action Routing for Retrieval-Augmented Generation: Objective Ablation and Failure Modes
arXiv:2601.00841v1 Announce Type: new
Abstract: Retrieval-augmented generation (RAG) introduces a practical control problem: retrieval depth and generation behavior must be chosen per query to satisfy service-level objectives (SLOs) such as cost, refusal rate, and hallucination risk. This work mode...