Agent Research Papers

Automatically Updated on 2025.10.30

Current Search Keywords: Agent, Multi-Agent, Tool Learning, Agent RL, Autonomous Agent, LLM Agent

If you would like other keywords to be tracked, please let us know :)
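As a rough illustration of how a keyword list like the one above could drive an automatic update, the sketch below builds one query URL per keyword against the public arXiv API. This is an assumption about the workflow, not the repository's actual scrape code: the function name `build_query_url`, the `all:` search field, and the `max_results` default are hypothetical choices made for the example.

```python
from urllib.parse import urlencode

# Keywords copied from the "Current Search Keywords" line above.
KEYWORDS = [
    "Agent",
    "Multi-Agent",
    "Tool Learning",
    "Agent RL",
    "Autonomous Agent",
    "LLM Agent",
]

# Public arXiv API endpoint; whether this list is built from it is an assumption.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keyword: str, max_results: int = 10) -> str:
    """Return an arXiv API URL listing the newest papers that match `keyword`.

    Hypothetical helper for illustration only.
    """
    params = {
        "search_query": f'all:"{keyword}"',  # quoted phrase search over all fields
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one query URL per keyword; no network request is made here.
    for kw in KEYWORDS:
        print(build_query_url(kw))
```

Adding a new keyword would then only mean appending one string to `KEYWORDS`; fetching and parsing the Atom feed returned by each URL is left out of this sketch.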

Web Page (Scrape Code)

Agent

Publish Date Title Authors PDF Code
2025-10-28 Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Yueqi Song et.al. 2510.24702 null
2025-10-28 AgentFold: Long-Horizon Web Agents with Proactive Context Management Rui Ye et.al. 2510.24699 null
2025-10-28 AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis Xuanzhong Chen et.al. 2510.24695 null
2025-10-28 Repurposing Synthetic Data for Fine-grained Search Agent Supervision Yida Zhao et.al. 2510.24694 null
2025-10-28 OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs Yifu Lu et.al. 2510.24663 null
2025-10-28 FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling Zengzhuang Xu et.al. 2510.24645 null
2025-10-28 ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? Christine Ye et.al. 2510.24591 null
2025-10-28 Affordance Representation and Recognition for Autonomous Agents Habtom Kahsay Gidey et.al. 2510.24459 null
2025-10-28 Law in Silico: Simulating Legal Society with LLM-Based Agents Yiding Wang et.al. 2510.24442 null
2025-10-28 Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content Abdullah Mushtaq et.al. 2510.24438 null
2025-10-28 Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents Juraj Mavračić et.al. 2510.24383 null
2025-10-28 Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation Lingyue Fu et.al. 2510.24358 null
2025-10-28 Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents María Sanz-Gómez et.al. 2510.24317 null
2025-10-28 Retrieval and Argumentation Enhanced Multi-Agent LLMs for Judgmental Forecasting Deniz Gorur et.al. 2510.24303 null
2025-10-28 MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools Wenhao Wang et.al. 2510.24284 null
2025-10-28 Investigating Software Aging in LLM-Generated Software Systems César Santos et.al. 2510.24188 null
2025-10-28 BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning Wentao Tan et.al. 2510.24161 null
2025-10-28 From Observability Data to Diagnosis: An Evolving Multi-agent System for Incident Management in Cloud Systems Yu Luo et.al. 2510.24145 null
2025-10-28 Reinforcement Learning for Long-Horizon Multi-Turn Search Agents Vivek Kalyan et.al. 2510.24126 null
2025-10-28 PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI Wenbin Ding et.al. 2510.24109 null
2025-10-28 BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents Litu Ou et.al. 2510.23458 null
2025-10-28 Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views Anna Deichler et.al. 2510.22672 null
2025-10-27 Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy Roham Koohestani et.al. 2510.23487 null
2025-10-27 Model Proficiency in Centralized Multi-Agent Systems: A Performance Study Anna Guerra et.al. 2510.23447 null
2025-10-27 AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines Abolfazl Younesi et.al. 2510.23408 null
2025-10-27 Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring Alexandre P Uchoa et.al. 2510.23245 null
2025-10-27 Evaluation of Vision-LLMs in Surveillance Video Pascal Benschop et.al. 2510.23190 null
2025-10-27 SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations Shuai Huang et.al. 2510.23182 null
2025-10-27 Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI Aryan Mathur et.al. 2510.23148 null
2025-10-27 Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs Kai Zhuang et.al. 2510.23127 null
2025-10-27 Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning Ran Xu et.al. 2510.23038 null
2025-10-27 P1GPT: a multi-agent LLM workflow module for multi-modal financial information analysis Chen-Che Lu et.al. 2510.23032 null
2025-10-27 TALM: Dynamic Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation Ming-Tung Shen et.al. 2510.23010 null
2025-10-27 CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs Junjie Huang et.al. 2510.22986 null
2025-10-27 Language Server CLI Empowers Language Agents with Process Rewards Yifan Zhang et.al. 2510.22907 null
2025-10-27 On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset Vishvesh Bhat et.al. 2510.22898 null
2025-10-26 Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks Jingyuan Liu et.al. 2510.22811 null
2025-10-26 Collaborative LLM Agents for C4 Software Architecture Design Automation Kamil Szczepanik et.al. 2510.22787 null
2025-10-26 How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations Zora Zhiruo Wang et.al. 2510.22780 null
2025-10-26 ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation Jiali Cheng et.al. 2510.22732 null
2025-10-24 A Knowledge-Graph Translation Layer for Mission-Aware Multi-Agent Path Planning in Spatiotemporal Dynamics Edward Holmberg et.al. 2510.21695 null
2025-10-24 AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite Jonathan Bragg et.al. 2510.21652 null
2025-10-24 Five-loop beta function for gauge theories: computations, results and consequences F. Herzog et.al. 2510.21624 null
2025-10-24 DeepAgent: A General Reasoning Agent with Scalable Toolsets Xiaoxi Li et.al. 2510.21618 null
2025-10-24 Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Wenyi Wang et.al. 2510.21614 null
2025-10-24 Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research Kuicai Dong et.al. 2510.21603 null
2025-10-24 EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law Ilija Lichkovski et.al. 2510.21524 null
2025-10-24 OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields Lisa Weijler et.al. 2510.21441 null
2025-10-24 Context Engineering for AI Agents in Open-Source Software Seyedmoein Mohsenimofidi et.al. 2510.21413 null
2025-10-24 HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences Zain Ul Abideen Tariq et.al. 2510.21370 null
2025-10-24 Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation Lufan Chang et.al. 2510.21341 null
2025-10-24 Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning Sanghyun Ahn et.al. 2510.21302 null
2025-10-24 Securing AI Agent Execution Christoph Bühler et.al. 2510.21236 null
2025-10-24 DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services Xiang Li et.al. 2510.21228 null
2025-10-24 DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance Chunghyun Han et.al. 2510.21117 null
2025-10-24 Soft Instruction De-escalation Defense Nils Philipp Walter et.al. 2510.21057 null
2025-10-24 Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding Yuhang Zhou et.al. 2510.20176 null
2025-10-23 From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL Ali Khosravi Kazazi et.al. 2510.21045 null
2025-10-23 AgentArcEval: An Architecture Evaluation Method for Foundation Model based Agents Qinghua Lu et.al. 2510.21031 null
2025-10-23 Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems Xi He et.al. 2510.20728 null
2025-10-23 C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Ming-Ming Yu et.al. 2510.20685 null
2025-10-23 Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Jiahao Meng et.al. 2510.20579 null
2025-10-23 EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence Ding Zou et.al. 2510.20578 null
2025-10-23 Designing Intent Communication for Agent-Human Collaboration Yi Li et.al. 2510.20409 null
2025-10-23 Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control Tom Maus et.al. 2510.20408 null
2025-10-23 GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments? Chiyu Chen et.al. 2510.20333 null
2025-10-23 From Generation to Attribution: Music AI Agent Architectures for the Post-Streaming Era Wonil Kim et.al. 2510.20276 null
2025-10-23 ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases Ziqian Zhong et.al. 2510.20270 null
2025-10-23 Towards AI Agents for Course Instruction in Higher Education: Early Experiences from the Field Yogesh Simmhan et.al. 2510.20255 null
2025-10-23 Automated Cloud Infrastructure-as-Code Reconciliation with AI Agents Zhenning Yang et.al. 2510.20211 null
2025-10-23 Merge and Conquer: Evolutionarily Optimizing AI for 2048 Maggie Bai et.al. 2510.20205 null
2025-10-23 Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions Gyuyeon Na et.al. 2510.20102 null
2025-10-22 ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering Marianne Menglin Liu et.al. 2510.20036 null
2025-10-22 Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication Yiming Lu et.al. 2510.19995 null
2025-10-22 A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks Hatim Chergui et.al. 2510.19973 null
2025-10-22 Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets Jiashi Feng et.al. 2510.19944 null
2025-10-22 Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation Jackson Hassell et.al. 2510.19897 null
2025-10-22 Large Language Model enabled Mathematical Modeling Guoyun Zhang et.al. 2510.19895 null
2025-10-22 Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents Gil Pasternak et.al. 2510.19771 null
2025-10-22 Review of Tools for Zero-Code LLM Based Application Development Priyaranjan Pattnayak et.al. 2510.19747 null
2025-10-22 Misalignment Bounty: Crowdsourcing AI Agent Misbehavior Rustem Turtayev et.al. 2510.19738 null
2025-10-22 Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning Gunshi Gupta et.al. 2510.19732 null
2025-10-22 Are Large Language Models Sensitive to the Motives Behind Communication? Addison J. Wu et.al. 2510.19687 null
2025-10-22 Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism Junfei Zhou et.al. 2510.19618 null
2025-10-22 Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Qianli Ma et.al. 2510.19600 null
2025-10-22 gem5 Co-Pilot: AI Assistant Agent for Architectural Design Space Exploration Zuoming Fu et.al. 2510.19577 null
2025-10-22 AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices Zhonghao Zhan et.al. 2510.19462 null
2025-10-22 MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration Jia-Kai Dong et.al. 2510.19423 null
2025-10-22 ColorAgent: Building A Robust, Personalized, and Interactive OS Agent Ning Li et.al. 2510.19386 null
2025-10-22 Nonmonotone subgradient methods based on a local descent lemma Francisco J. Aragón-Artacho et.al. 2510.19341 null
2025-10-22 Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties Philipp J. Schneider et.al. 2510.19299 null
2025-10-22 Trace: Securing Smart Contract Repository Against Access Control Vulnerability Chong Chen et.al. 2510.19254 null
2025-10-22 SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets Ziwei Wang et.al. 2510.19247 null
2025-10-22 DiSRouter: Distributed Self-Routing for LLM Selections Hang Zheng et.al. 2510.19208 null
2025-10-22 Defending Against Prompt Injection with DataFilter Yizhu Wang et.al. 2510.19207 null
2025-10-22 WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation Yaoyao Qian et.al. 2510.19205 null
2025-10-21 When Your AI Agent Succumbs to Peer-Pressure: Studying Opinion-Change Dynamics of LLMs Aliakbar Mehdizadeh et.al. 2510.19107 null
2025-10-21 Plural Voices, Single Agent: Towards Inclusive AI in Multi-User Domestic Spaces Joydeep Chandra et.al. 2510.19008 null
2025-10-21 Search Self-play: Pushing the Frontier of Agent Capability without Supervision Hongliang Lu et.al. 2510.18821 null
2025-10-21 WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection Guanzhong He et.al. 2510.18798 null
2025-10-21 KAT-Coder Technical Report Zizheng Zhan et.al. 2510.18779 null
2025-10-21 Fetch.ai: An Architecture for Modern Multi-Agent Systems Michael J. Wooldridge et.al. 2510.18699 null
2025-10-21 Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications Zhuohang Bian et.al. 2510.18586 null
2025-10-21 WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality Chunyang Li et.al. 2510.18560 null
2025-10-21 SOCIA-Nabla: Textual Gradient Meets Multi-Agent Orchestration for Automated Simulator Generation Yuncheng Hua et.al. 2510.18551 null
2025-10-21 JAUNT: Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing Enhan Li et.al. 2510.18550 null
2025-10-21 EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Zebin Yang et.al. 2510.18546 null
2025-10-21 Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models Sureyya Akin et.al. 2510.18515 null
2025-10-21 Crucible: Quantifying the Potential of Control Algorithms through LLM Agents Lianchen Jia et.al. 2510.18491 null
2025-10-21 LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources Haichao Ji et.al. 2510.18477 null
2025-10-21 Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents Feifan Xia et.al. 2510.18476 null
2025-10-21 Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents Guangfu Guo et.al. 2510.18424 null
2025-10-21 Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games Runnan Qi et.al. 2510.18395 null
2025-10-21 MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models ChangSu Choi et.al. 2510.18383 null
2025-10-21 InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration Yunkun Wang et.al. 2510.18327 null
2025-10-21 Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning Aaron Bell et.al. 2510.18318 null
2025-10-21 Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming Zheng Zhang et.al. 2510.18314 null
2025-10-21 Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata Zhengqing Yuan et.al. 2510.18289 null
2025-10-21 Optimal allocations with distortion risk measures and mixed risk attitudes Mario Ghossoub et.al. 2510.18236 null
2025-10-21 Applying voxel-based analysis to oropharyngeal cancer proton therapy patients: a correlation study on radiation-induced acute dysphagia Qianxia Wang et.al. 2510.18210 null
2025-10-21 Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning Rui Jerry Huang et.al. 2510.18179 null
2025-10-21 NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly? Jierui Peng et.al. 2510.16263 null
2025-10-21 SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection Yang Feng et.al. 2510.16219 null
2025-10-21 PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold Yi Wan et.al. 2510.15862 null
2025-10-21 FinAI Data Assistant: LLM-based Financial Database Query Processing with the OpenAI Function Calling API Juhyeong Kim et.al. 2510.14162 null
2025-10-21 A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning Qianben Chen et.al. 2510.12838 null
2025-10-20 AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI Manik Rana et.al. 2510.18170 null
2025-10-20 World-in-World: World Models in a Closed-Loop World Jiahan Zhang et.al. 2510.18135 null
2025-10-20 SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving Xiangbo Gao et.al. 2510.18123 null
2025-10-20 Investigating the Impact of Dark Patterns on LLM-Based Web Agents Devin Ersoy et.al. 2510.18113 null
2025-10-20 Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment Patricia Delafuente et.al. 2510.18112 null
2025-10-20 CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows Joong Ho Choi et.al. 2510.18043 null
2025-10-20 OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning Zhenyu Bi et.al. 2510.18032 null
2025-10-20 FABRIC: Framework for Agent-Based Realistic Intelligence Creation Abhigya Verma et.al. 2510.17995 null
2025-10-20 PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits Neeladri Bhuiya et.al. 2510.17947 null
2025-10-20 Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Akshara Prabhakar et.al. 2510.17797 null
2025-10-20 Executable Knowledge Graphs for Replicating AI Research Yujie Luo et.al. 2510.17795 null
2025-10-20 A Mimamsa Inspired Framework For Instruction Sequencing In AI Agents Bama Srinivasan et.al. 2510.17691 null
2025-10-20 ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling Shuyuan Zhang et.al. 2510.17603 null
2025-10-20 MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning Mir Nafis Sharear Shopnil et.al. 2510.17590 null
2025-10-20 Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs Francesco Balassone et.al. 2510.17521 null
2025-10-20 Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents Yihong Tang et.al. 2510.17491 null
2025-10-20 Agentic Reinforcement Learning for Search is Unsafe Yushi Yang et.al. 2510.17431 null
2025-10-20 Diverse Planning with Simulators via Linear Temporal Logic Mustafa F. Abdelwahed et.al. 2510.17418 null
2025-10-20 Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems Rishi Jha et.al. 2510.17276 null
2025-10-20 Coinvisor: An RL-Enhanced Chatbot Agent for Interactive Cryptocurrency Investment Analysis Chong Chen et.al. 2510.17235 null
2025-10-20 ALPINE: A Lightweight and Adaptive Privacy-Decision Agent Framework for Dynamic Edge Crowdsensing Guanjie Cheng et.al. 2510.17162 null
2025-10-20 Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning Shantnav Agarwal et.al. 2510.17143 null
2025-10-20 Do LLMs Recognize Your Latent Preferences? A Benchmark for Latent Information Discovery in Personalized Interaction Ioannis Tsaknakis et.al. 2510.17132 null
2025-10-20 Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents Wenbing Tang et.al. 2510.17129 null
2025-10-20 Verification-Aware Planning for Multi-Agent Systems Tianyang Xu et.al. 2510.17109 null
2025-10-20 Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models Elias Hossain et.al. 2510.17098 null
2025-10-20 A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation Rongbin Li et.al. 2510.17064 null
2025-10-20 Consistent Zero-Shot Imitation with Contrastive Goal Inference Kathryn Wantlin et.al. 2510.17059 null
2025-10-20 Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks Trilok Padhi et.al. 2510.14207 null
2025-10-19 ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems Hassan Hamad et.al. 2510.17052 null
2025-10-19 ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI Eleftherios Tzanis et.al. 2510.17004 null
2025-10-19 EEschematic: Multimodal-LLM Based AI Agent for Schematic Generation of Analog Circuit Chang Liu et.al. 2510.17002 null
2025-10-19 STARK: Strategic Team of Agents for Refining Kernels Juncheng Dong et.al. 2510.16996 null
2025-10-19 Towards Interpretable and Trustworthy Time Series Reasoning: A BlueSky Vision Kanghui Ning et.al. 2510.16980 null
2025-10-19 Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents Dheeraj Chintapalli et.al. 2510.16978 null
2025-10-19 Learning Ecology with VERA Using Conceptual Models and Simulations Spencer Rugaber et.al. 2510.16944 null
2025-10-19 VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents Kangrui Wang et.al. 2510.16907 null
2025-10-19 Agentic Inequality Matthew Sharp et.al. 2510.16853 null
2025-10-19 FinSight: Towards Real-World Financial Deep Research Jiajie Jin et.al. 2510.16844 null
2025-10-19 More with Less: An Empirical Study of Turn-Control Strategies for Efficient Coding Agents Pengfei Gao et.al. 2510.16786 null
2025-10-19 Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI Jitao Sang et.al. 2510.16720 null
2025-10-19 An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems Ni Zhang et.al. 2510.16701 null
2025-10-19 Pursuing Minimal Sufficiency in Spatial Reasoning Yejie Guo et.al. 2510.16688 null
2025-10-19 Agentic Design of Compositional Machines Wenqian Zhang et.al. 2510.14980 null
2025-10-18 Unleashing Diverse Thinking Modes in LLMs through Multi-Agent Collaboration Zhixuan He et.al. 2510.16645 null
2025-10-18 Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis Wonduk Seo et.al. 2510.16635 null
2025-10-18 Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods Avrim Blum et.al. 2510.16609 null
2025-10-18 Ripple Effect Protocol: Coordinating Agent Populations Ayush Chopra et.al. 2510.16572 null
2025-10-18 BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction Tian Xia et.al. 2510.16559 null
2025-10-18 Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety Vamshi Krishna Bonagiri et.al. 2510.16492 null
2025-10-18 REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting Changyue Shi et.al. 2510.16410 null
2025-10-18 ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents David Peer et.al. 2510.16381 null
2025-10-18 Synergizing chemical and AI communities for advancing laboratories of the future Saejin Oh et.al. 2510.16293 null
2025-10-17 Outraged AI: Large language models prioritise emotion over cost in fairness enforcement Hao Liu et.al. 2510.17880 null
2025-10-17 WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale Yuxuan Lu et.al. 2510.16252 null
2025-10-17 Towards Automatic Evaluation and Selection of PHI De-identification Models via Multi-Agent Collaboration Guanchen Wu et.al. 2510.16194 null
2025-10-17 Agentic AI for Ultra-Modern Networks: Multi-Agent Framework for RAN Autonomy and Assurance Sukhdeep Singh et.al. 2510.16144 null
2025-10-17 Narrowing Action Choices with AI Improves Human Sequential Decisions Eleni Straitouri et.al. 2510.16097 null
2025-10-17 TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration Kerem Delikoyun et.al. 2510.16080 null
2025-10-17 EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle Rong Wu et.al. 2510.16079 null
2025-10-17 SIADAFIX: issue description response for adaptive program repair Xin Cao et.al. 2510.16059 null
2025-10-17 PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction Simon Yu et.al. 2510.15863 null
2025-10-17 Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL Richard M. Bailey et.al. 2510.15772 null
2025-10-17 AURA: An Agent Autonomy Risk Assessment Framework Lorenzo Satta Chiris et.al. 2510.15739 null
2025-10-17 Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation Ed Li et.al. 2510.15624 null
2025-10-17 The Spark Effect: On Engineering Creative Diversity in Multi-Agent AI Systems Alexander Doudkin et.al. 2510.15568 null
2025-10-17 MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games Huining Yuan et.al. 2510.15414 null
2025-10-17 SHARE: Scene-Human Aligned Reconstruction Joshua Li et.al. 2510.15342 null
2025-10-17 VERA-MH Concept Paper Luca Belli et.al. 2510.15297 null
2025-10-17 Exemplar-Guided Planing: Enhanced LLM Agent for KGQA Jingao Xu et.al. 2510.15283 null
2025-10-17 Experience-Driven Exploration for Efficient API-Free AI Agents Chenwei Tang et.al. 2510.15259 null
2025-10-17 Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions Xi Wang et.al. 2510.15258 null
2025-10-17 Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding Sensen Gao et.al. 2510.15253 null
2025-10-17 Where to Search: Measure the Prior-Structured Search Space of LLM Agents Zhuo-Yang Song et.al. 2510.14846 null
2025-10-16 GUIrilla: A Scalable Framework for Automated Desktop UI Exploration Sofiya Garkot et.al. 2510.16051 null
2025-10-16 MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation Gurusha Juneja et.al. 2510.15186 null
2025-10-16 Internalizing World Models via Self-Play Finetuning for Agentic RL Shiqi Chen et.al. 2510.15047 null
2025-10-16 Generalized Dynamics Generation towards Scannable Physical World Model Yichen Li et.al. 2510.15041 null
2025-10-16 UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos Mingxuan Liu et.al. 2510.15018 null
2025-10-16 Data-driven Calibration Sample Selection and Forecast Combination in Electricity Price Forecasting: An Application of the ARHNN Method Tomasz Serafin et.al. 2510.15011 null
2025-10-16 Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Guoqing Wang et.al. 2510.14967 null
2025-10-16 VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation Han Zhao et.al. 2510.14902 null
2025-10-16 The Gatekeeper Knows Enough Fikresilase Wondmeneh Abebayew et.al. 2510.14881 null
2025-10-16 LabOS: The AI-XR Co-Scientist That Sees and Works With Humans Le Cong et.al. 2510.14861 null
2025-10-16 RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning Jinrui Liu et.al. 2510.14828 null
2025-10-16 To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models Eran Malach et.al. 2510.14826 null
2025-10-16 ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling Jianghao Lin et.al. 2510.14703 null
2025-10-16 LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet? Bin Liu et.al. 2510.14700 null
2025-10-16 LLM Agents Beyond Utility: An Open-Ended Perspective Asen Nachkov et.al. 2510.14548 null
2025-10-16 Agentic Entropy-Balanced Policy Optimization Guanting Dong et.al. 2510.14545 null
2025-10-16 Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration Haoyuan Li et.al. 2510.14512 null
2025-10-16 LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models Haolin Li et.al. 2510.14466 null
2025-10-16 Towards Automated Governance: A DSL for Human-Agent Collaboration in Software Projects Adem Ait et.al. 2510.14465 null
2025-10-16 Why Instant-Runoff Voting Is So Resilient to Coalitional Manipulation: Phase Transitions in the Perturbed Culture François Durand et.al. 2510.14450 null
2025-10-16 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents Rui Wang et.al. 2510.14438 null
2025-10-16 Bounds and asymptotic expansions for the radii of convexity and uniform convexity of normalized Bessel functions Árpád Baricz et.al. 2510.14323 null
2025-10-16 Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies Mason Nakamura et.al. 2510.14312 null
2025-10-16 ReUseIt: Synthesizing Reusable AI Agent Workflows for Web Automation Yimeng Liu et.al. 2510.14308 null
2025-10-16 AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading Zheye Deng et.al. 2510.14264 null
2025-10-16 MAFA: A Multi-Agent Framework for Enterprise-Scale Annotation with Configurable Task Adaptation Mahmood Hegazy et.al. 2510.14184 null
2025-10-16 Training LLM Agents to Empower Humans Evan Ellis et.al. 2510.13709 null
2025-10-16 OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies Peng Di et.al. 2510.13561 null
2025-10-16 SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding Tanveer Hannan et.al. 2510.13016 null
2025-10-16 Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics Marco Del Tredici et.al. 2510.12787 null
2025-10-16 Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning Xingang Guo et.al. 2510.12712 null
2025-10-16 MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites Zhenxin Lei et.al. 2510.12126 null
2025-10-15 When “Correct” Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents? Yibo Peng et.al. 2510.17862 null
2025-10-15 CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation Yee Man Choi et.al. 2510.17853 null
2025-10-15 CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization Henrique Assumpção et.al. 2510.14150 null
2025-10-15 Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems Edoardo Allegrini et.al. 2510.14133 null
2025-10-15 Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving Nikos Pagonas et.al. 2510.14126 null
2025-10-15 STEMS: Spatial-Temporal Enhanced Safe Multi-Agent Coordination for Building Energy Management Huiliang Zhang et.al. 2510.14112 null
2025-10-15 Three-Dimensional Simulation of the University of Hawai`i FEL Oscillator: Superradiant Emission and Cavity Desynchronization Amir Weinberg et.al. 2510.14061 null
2025-10-15 Sequential Quantum Measurements and the Instrumental Group Algebra Christopher S. Jackson et.al. 2510.13980 null
2025-10-15 An LLM-Powered AI Agent Framework for Holistic IoT Traffic Interpretation Daniel Adu Worae et.al. 2510.13925 null
2025-10-15 FACTS: Table Summarization via Offline Template Generation with Agentic Workflows Ye Yuan et.al. 2510.13920 null
2025-10-15 Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms Shrey Pandit et.al. 2510.13913 null
2025-10-15 RECODE: Reasoning Through Code Generation for Visual Question Answering Junhong Shen et.al. 2510.13756 null
2025-10-15 From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails Ravi Pandya et.al. 2510.13727 null
2025-10-15 Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module Ruitao Feng et.al. 2510.13558 null
2025-10-15 Tandem Training for Language Models Robert West et.al. 2510.13551 null
2025-10-15 In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers Avihay Cohen et.al. 2510.13543 null
2025-10-15 MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation Jiin Park et.al. 2510.13371 null
2025-10-15 Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan’s Intelligent Interaction Systems Xuxin Cheng et.al. 2510.13291 null
2025-10-15 Automated Network Protocol Testing with LLM Agents Yunze Wei et.al. 2510.13248 null
2025-10-15 EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems Yufei He et.al. 2510.13220 null
2025-10-15 Addressing the alignment problem in transportation policy making: an LLM approach Xiaoyu Yan et.al. 2510.13139 null
2025-10-14 Using Kolmogorov-Smirnov Distance for Measuring Distribution Shift in Machine Learning Ozan K. Tonguz et.al. 2510.15996 null
2025-10-14 MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents Dongsen Zhang et.al. 2510.15994 null
2025-10-14 Benefits and Limitations of Communication in Multi-Agent Reasoning Michael Rizvi-Martel et.al. 2510.13903 null
2025-10-14 GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents Xi Yu et.al. 2510.13896 null
2025-10-14 MultiFoodhat: A potential new paradigm for intelligent food quality inspection Yue Hu et.al. 2510.13889 null
2025-10-14 Deliberate Lab: A Platform for Real-Time Human-AI Social Experiments Crystal Qian et.al. 2510.13011 null
2025-10-14 SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents Simon Sinong Zhan et.al. 2510.12985 null
2025-10-14 From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models Imran Khan et.al. 2510.12864 null
2025-10-14 Three Lenses on the AI Revolution: Risk, Transformation, Continuity Masoud Makrehchi et.al. 2510.12859 null
2025-10-14 VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage A. Alfarano et.al. 2510.12750 null
2025-10-14 SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding Zhiliu Yang et.al. 2510.12749 null
2025-10-14 Multi-Agent Debate for LLM Judges with Adaptive Stability Detection Tianyu Hu et.al. 2510.12697 null
2025-10-14 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Hanyang Chen et.al. 2510.12693 null
2025-10-14 Designing Tools with Control Confidence Ajith Anil Meera et.al. 2510.12630 null
2025-10-14 A Survey of Vibe Coding with Large Language Models Yuyao Ge et.al. 2510.12399 null
2025-10-14 GOAT: A Training Framework for Goal-Oriented Agent with Tools Hyunji Min et.al. 2510.12218 null
2025-10-14 Agent-Based Simulation of a Financial Market with Large Language Models Ryuji Hashimoto et.al. 2510.12189 null
2025-10-14 IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation Wenxu Zhou et.al. 2510.12095 null
2025-10-14 ToPolyAgent: AI Agents for Coarse-Grained Topological Polymer Simulations Lijie Ding et.al. 2510.12091 null
2025-10-14 Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models Rabimba Karanjai et.al. 2510.12080 null
2025-10-14 EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making Zixing Lei et.al. 2510.12072 null
2025-10-14 AI Agents as Universal Task Solvers Alessandro Achille et.al. 2510.12066 null
2025-10-14 Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response Yiheng Chen et.al. 2510.12061 null
2025-10-14 On the Number of Small Points for Rational Maps Jit Wu Yap et.al. 2510.12039 null
2025-10-14 ManiAgent: An Agentic Framework for General Robotic Manipulation Yi Yang et.al. 2510.11660 null
2025-10-14 Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs Yujie Zhao et.al. 2510.11062 null
2025-10-13 Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation Sayash Kapoor et.al. 2510.11977 null
2025-10-13 Scaling Long-Horizon LLM Agent via Context-Folding Weiwei Sun et.al. 2510.11967 null
2025-10-13 DMAS-Forge: A Framework for Transparent Deployment of AI Applications as Distributed Systems Alessandro Cornacchia et.al. 2510.11872 null
2025-10-13 Demystifying Reinforcement Learning in Agentic Reasoning Zhaochen Yu et.al. 2510.11701 null
2025-10-13 When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents Lingfei Qian et.al. 2510.11695 null
2025-10-13 Chronologically Consistent Generative AI Songrun He et.al. 2510.11677 null
2025-10-13 FinVet: A Collaborative Framework of RAG and External Fact-Checking Agents for Financial Misinformation Detection Daniel Berhane Araya et.al. 2510.11654 null
2025-10-13 Analyzing and Internalizing Complex Policy Documents for LLM Agents Jiateng Liu et.al. 2510.11588 null
2025-10-13 Uncertainty-Aware, Risk-Adaptive Access Control for Agentic Systems using an LLM-Judged TBAC Model Charles Fleming et.al. 2510.11414 null
2025-10-13 DocReward: A Document Reward Model for Structuring and Stylizing Junpeng Liu et.al. 2510.11391 null
2025-10-13 Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics Sheng Jin et.al. 2510.11290 null
2025-10-13 PADME: Procedure Aware DynaMic Execution Deepeka Garg et.al. 2510.11281 null
2025-10-13 A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images Yuxuan Chen et.al. 2510.11260 null
2025-10-13 Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems Pengyu Zhu et.al. 2510.11246 null
2025-10-13 Attacks by Content: Automated Fact-checking is an AI Security Issue Michael Schlichtkrull et.al. 2510.11238 null
2025-10-13 WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent Tao Li et.al. 2510.11221 null
2025-10-13 Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? Zhengyu Chen et.al. 2510.11184 null
2025-10-13 $How^{2}$: How to learn from procedural How-to questions Gautier Dagan et.al. 2510.11144 null
2025-10-13 video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory Guangzhi Sun et.al. 2510.11129 null
2025-10-13 SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents Longjie Guo et.al. 2510.11035 null
2025-10-13 A Survey on Agentic Multimodal Large Language Models Huanjin Yao et.al. 2510.10991 null
2025-10-13 Rethinking Reward Miscalibration of GRPO in Agentic RL Jingyu Liu et.al. 2509.23870 null
2025-10-13 EvoEmo: Towards Evolved Emotional Policies for Adversarial LLM Agents in Multi-Turn Price Negotiation Yunbo Long et.al. 2509.04310 null
2025-10-12 Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning Dongrong Yang et.al. 2510.11754 null
2025-10-12 GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search Heng Zhang et.al. 2510.10581 null
2025-10-12 MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision Hongjie Zheng et.al. 2510.10461 null
2025-10-12 Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval Junwei Lan et.al. 2509.24869 null
2025-10-12 Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting Saksorn Ruangtanusak et.al. 2509.00482 null
2025-10-11 KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments Walid Abdela et.al. 2510.10325 null
2025-10-11 Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models Christopher Chiu et.al. 2510.10278 null
2025-10-11 Don’t Just Fine-tune the Agent, Tune the Environment Siyuan Lu et.al. 2510.10197 null
2025-10-11 ALLOY: Generating Reusable Agent Workflows from User Demonstration Jiawen Li et.al. 2510.10049 null
2025-10-11 SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning Ruohao Li et.al. 2510.10047 null
2025-10-11 Leveraging Large Language Models for Cybersecurity Risk Assessment – A Case from Forestry Cyber-Physical Systems Fikret Mert Gultekin et.al. 2510.06343 null
2025-10-11 Tree Search for LLM Agent Reinforcement Learning Yuxiang Ji et.al. 2509.21240 null
2025-10-11 ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy Alejandro D. Mousist et.al. 2509.13380 null
2025-10-10 Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics Lianhao Zhou et.al. 2510.09901 null
2025-10-10 How can we assess human-agent interactions? Case studies in software agent design Valerie Chen et.al. 2510.09801 null
2025-10-10 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data Yue Huang et.al. 2510.09781 null
2025-10-10 Preference-Aware Memory Update for Long-Term LLM Agents Haoran Sun et.al. 2510.09720 null
2025-10-10 StreamingVLM: Real-Time Understanding for Infinite Video Streams Ruyi Xu et.al. 2510.09608 null
2025-10-10 Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols Mikhail Terekhov et.al. 2510.09462 null
2025-10-10 Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers Tuan Nguyen et.al. 2510.09330 null
2025-10-10 Fundamentals of Building Autonomous LLM Agents Victor de Lamo Castrillo et.al. 2510.09244 null
2025-10-10 Leading the Follower: Learning Persuasive Agents in Social Deduction Games Zhang Zheng et.al. 2510.09087 null
2025-10-10 When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach Zhihan Zhang et.al. 2510.08952 null
2025-10-10 Reimagining Agent-based Modeling with Large Language Model Agents via Shachi So Kuroki et.al. 2509.21862 null
2025-10-09 CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization Debeshee Das et.al. 2510.08829 null
2025-10-09 COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context Guangya Wan et.al. 2510.08790 null
2025-10-09 Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools Ha Min Son et.al. 2510.08640 null
2025-10-09 CaRT: Teaching LLM Agents to Know When They Know Enough Grace Liu et.al. 2510.08517 null
2025-10-09 Opponent Shaping in LLM Agents Marta Emili Garcia Segura et.al. 2510.08255 null
2025-10-09 Simulating Teams with LLM Agents: Interactive 2D Environments for Studying Human-AI Dynamics Mohammed Almutairi et.al. 2510.08242 null
2025-10-09 Training-Free Group Relative Policy Optimization Yuzheng Cai et.al. 2510.08191 null
2025-10-09 AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment Xiaochong Lan et.al. 2510.08081 null
2025-10-09 Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks Cheng Yang et.al. 2510.08002 null
2025-10-09 Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception – Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track Erjia Xiao et.al. 2510.07871 null
2025-10-09 Self-Improving LLM Agents at Test-Time Emre Can Acikgoz et.al. 2510.07841 null
2025-10-09 Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models Eric Hanchen Jiang et.al. 2510.07799 null
2025-10-09 Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics Antonin Sulc et.al. 2509.11943 null
2025-10-08 PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction Anubhav Shrimal et.al. 2510.08623 null
2025-10-08 L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint) Tianxiang Xu et.al. 2510.07363 null
2025-10-08 LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding Zhivar Sourati et.al. 2510.07233 null
2025-10-08 Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping Ziyi Wang et.al. 2510.07230 null
2025-10-08 Exposing LLM User Privacy via Traffic Fingerprint Analysis: A Study of Privacy Risks in LLM Agent Interactions Yixiang Zhang et.al. 2510.07176 null
2025-10-08 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Tianshi Zheng et.al. 2510.07172 null
2025-10-08 Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations Manh Hung Nguyen et.al. 2510.07064 null
2025-10-08 COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization Tian Qin et.al. 2510.07043 null
2025-10-08 LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling Zecheng Tang et.al. 2510.06915 null
2025-10-08 When Machines Meet Each Other: Network Effects and the Strategic Role of History in Multi-Agent AI Yu Liu et.al. 2510.06903 null
2025-10-08 SID: Multi-LLM Debate Driven by Self Signals Xuhang Chen et.al. 2510.06843 null
2025-10-08 Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management Miao Lu et.al. 2510.06727 null
2025-10-08 WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks Jingbo Yang et.al. 2510.06587 null
2025-10-08 Spiral of Silence in Large Language Model Agents Mingze Zhong et.al. 2510.02360 null
2025-10-08 Toward Causal-Visual Programming: Enhancing Agentic Reasoning in Low-Code Environments Jiexi Xu et.al. 2509.25282 null
2025-10-07 A Survey on Agentic Security: Applications, Threats and Defenses Asif Shahriar et.al. 2510.06445 null
2025-10-07 Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents Mingkang Zhu et.al. 2510.06214 null
2025-10-07 RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback Chunyu Miao et.al. 2510.06186 null
2025-10-07 LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams Aju Ani Justus et.al. 2510.06151 null
2025-10-07 Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents Tao Zhe et.al. 2510.06078 null
2025-10-07 Training-Free Time Series Classification via In-Context Reasoning with LLM Agents Songyuan Sui et.al. 2510.05950 null
2025-10-07 EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models Zheyue Tan et.al. 2510.05943 null
2025-10-07 LLM-FS-Agent: A Deliberative Role-based Large Language Model Architecture for Transparent Feature Selection Mohamed Bal-Ghaoui et.al. 2510.05935 null
2025-10-07 Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches Hachem Madmoun et.al. 2510.05748 null
2025-10-07 AutoPentester: An LLM Agent-based Framework for Automated Pentesting Yasod Ginige et.al. 2510.05605 null
2025-10-07 AgentDR Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents Mingdai Yang et.al. 2510.05598 null
2025-10-07 From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions Changyuan Zhao et.al. 2510.05596 null
2025-10-07 BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Sagnik Anupam et.al. 2510.02418 null
2025-10-06 Adversarial Reinforcement Learning for Large Language Model Agent Safety Zizhao Wang et.al. 2510.05442 null
2025-10-06 A Lightweight Large Language Model-Based Multi-Agent System for 2D Frame Structural Analysis Ziheng Geng et.al. 2510.05414 null
2025-10-06 Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents Wenda Xie et.al. 2510.05188 null
2025-10-06 RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection Yuxin Wen et.al. 2510.04885 null
2025-10-06 Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails Siwei Han et.al. 2510.04860 null
2025-10-06 Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents Yiding Wang et.al. 2510.04695 null
2025-10-06 Multi-Agent Tool-Integrated Policy Optimization Zhanfeng Mo et.al. 2510.04678 null
2025-10-06 Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents Zeyi Zhang et.al. 2510.04637 null
2025-10-06 Autonomy Matters: A Study on Personalization-Privacy Dilemma in LLM Agents Zhiping Zhang et.al. 2510.04465 null
2025-10-06 Beyond Manuals and Tasks: Instance-Level Context Learning for LLM Agents Kuntai Cai et.al. 2510.02369 null
2025-10-05 Internal World Models as Imagination Networks in Cognitive Agents Saurabh Ranjan et.al. 2510.04391 null
2025-10-05 Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation Hadi Nekoei et.al. 2510.04373 null
2025-10-05 Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales Jinyang Jiang et.al. 2510.04272 null
2025-10-05 AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework Hanchen Zhang et.al. 2510.04206 null
2025-10-05 Constructing coherent spatial memory in LLM agents through graph rectification Puzhen Zhang et.al. 2510.04195 null
2025-10-05 From Shadow to Light: Toward Safe and Efficient Policy Learning Across MPC, DeePC, RL, and LLM Agents Amin Vahidi-Moghaddam et.al. 2510.04076 null
2025-10-04 Adversarial Agent Collaboration for C to Rust Translation Tianyu Li et.al. 2510.03879 null
2025-10-04 InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents Yaxin Du et.al. 2510.02271 null
2025-10-04 Extracting Conceptual Knowledge to Locate Software Issues Ying Wang et.al. 2509.21427 null
2025-10-03 VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation Lesly Miculicich et.al. 2510.05156 null
2025-10-03 LLM Agents for Automated Dependency Upgrades Vali Tawosi et.al. 2510.03480 null
2025-10-03 ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework Vali Tawosi et.al. 2510.03463 null
2025-10-03 Improving GUI Grounding with Explicit Position-to-Coordinate Mapping Suyuchen Wang et.al. 2510.03230 null
2025-10-03 CoDA: Agentic Systems for Collaborative Data Visualization Zichen Chen et.al. 2510.03194 null
2025-10-03 AudioToolAgent: An Agentic Framework for Audio-Language Models Gijs Wijngaard et.al. 2510.02995 null
2025-10-03 Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents Wonjoong Kim et.al. 2510.02837 null
2025-10-02 AgentCaster: Reasoning-Guided Tornado Forecasting Michael Chen et.al. 2510.03349 null
2025-10-02 Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge Charlie Masters et.al. 2510.02557 null
2025-10-02 StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Yanxu Chen et.al. 2510.02209 null
2025-10-02 TACOS: Task Agnostic COordinator of a multi-drone System Alessandro Nazzari et.al. 2510.01869 null
2025-10-02 Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets Yannis Belkhiter et.al. 2510.01842 null
2025-10-02 GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents Yejin Kim et.al. 2510.01664 null
2025-10-02 SoK: Measuring What Matters for Closed-Loop Security Agents Mudita Khurana et.al. 2510.01654 null
2025-10-02 Position: Privacy Is Not Just Memorization! Niloofar Mireshghallah et.al. 2510.01645 null
2025-10-02 GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments Hanlin Zhu et.al. 2509.21998 null
2025-10-02 Gala: Global LLM Agents for Text-to-Model Translation Junyang Cai et.al. 2509.08970 null
2025-10-01 Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents Yang Liu et.al. 2510.01398 null
2025-10-01 Beyond Single LLMs: Enhanced Code Generation via Multi-Stage Performance-Guided LLM Orchestration Huashan Chen et.al. 2510.01379 null
2025-10-01 Fine-tuning with RAG for Improving LLM Learning of New Skills Humaid Ibrahim et.al. 2510.01375 null
2025-10-01 Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks Shoumik Saha et.al. 2510.01359 null
2025-10-01 The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation Zarreen Reza et.al. 2510.01295 null
2025-10-01 TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Zhangchen Xu et.al. 2510.01179 null
2025-10-01 Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare Zhengliang Shi et.al. 2510.01164 null
2025-10-01 A Practitioner’s Guide to Multi-turn Agentic Reinforcement Learning Ruiyi Wang et.al. 2510.01132 null
2025-10-01 QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL Cong Yu et.al. 2510.00967 null
2025-10-01 ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs Adi Simhi et.al. 2510.00857 null
2025-10-01 ACON: Optimizing Context Compression for Long-horizon LLM Agents Minki Kang et.al. 2510.00615 null
2025-10-01 JoyAgent-JDGenie: Technical Report on the GAIA Jiarun Liu et.al. 2510.00510 null
2025-10-01 Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation Yiyuan Pan et.al. 2510.00441 null
2025-10-01 RELATE-Sim: Leveraging Turning Point Theory and LLM Agents to Predict and Understand Long-Term Relationship Dynamics through Interactive Narrative Simulations Matthew Yue et.al. 2510.00414 null
2025-10-01 Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Siyu Zhu et.al. 2509.25779 null
2025-10-01 Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development Yuxuan Wan et.al. 2509.25297 null
2025-10-01 Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks Aaron Xuxiang Tian et.al. 2509.23537 null
2025-10-01 On the Soundness and Consistency of LLM Agents for Executing Test Cases Written in Natural Language Sébastien Salva et.al. 2509.19136 null
2025-10-01 A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks S M Asif Hossain et.al. 2509.14285 null
2025-09-30 From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization Haoran Xi et.al. 2510.02389 null
2025-09-30 CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage Bowen Wei et.al. 2510.00311 null
2025-09-30 Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents Zhen Yang et.al. 2509.26539 null
2025-09-30 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Wei He et.al. 2509.26490 null
2025-09-30 ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems Junsong Pu et.al. 2509.26463 null
2025-09-30 Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents Shuai Shao et.al. 2509.26354 null
2025-09-30 LLM Agents for Knowledge Discovery in Atomic Layer Processing Andreas Werbrouck et.al. 2509.26201 null
2025-09-30 RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning Gang Li et.al. 2509.25958 null
2025-09-30 Mem-α: Learning Memory Construction via Reinforcement Learning Yu Wang et.al. 2509.25911 null
2025-09-30 SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents Ruolin Chen et.al. 2509.25885 null
2025-09-30 Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs Hankun Dai et.al. 2509.25873 null
2025-09-30 STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents Jing-Jing Li et.al. 2509.25624 null
2025-09-30 MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning Huihao Jing et.al. 2509.24922 null
2025-09-30 TENET: Leveraging Tests Beyond Validation for Code Generation Yiran Hu et.al. 2509.24148 null
2025-09-30 Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems Minsoo Kim et.al. 2509.24116 null
2025-09-30 InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios Chenglin Yu et.al. 2509.22502 null
2025-09-30 Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents Davide Paglieri et.al. 2509.03581 null
2025-09-30 Towards Agentic OS: An LLM Agent Framework for Linux Schedulers Yusheng Zheng et.al. 2509.01245 null
2025-09-29 A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory Qianshan Wei et.al. 2510.02373 null
2025-09-29 Causal Autoencoder-like Generation of Feedback Fuzzy Cognitive Maps with an LLM Agent Akash Kumar Panda et.al. 2509.25593 null
2025-09-29 RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale Jason Holmes et.al. 2509.25540 null
2025-09-29 Where LLM Agents Fail and How They can Learn From Failures Kunlun Zhu et.al. 2509.25370 null
2025-09-29 Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents Boxuan Zhang et.al. 2509.25302 null
2025-09-29 PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion Yuyang Yin et.al. 2509.24997 null
2025-09-29 When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training Sanxing Chen et.al. 2509.24923 null
2025-09-29 MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems Kun Wang et.al. 2509.24323 null
2025-09-29 SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents Gyuhyeon Seo et.al. 2509.24282 null
2025-09-28 WAREX: Web Agent Reliability Evaluation on Existing Benchmarks Su Kara et.al. 2510.03285 null
2025-09-28 Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning Runyu Zhang et.al. 2509.24047 null
2025-09-28 PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features Lingyao Li et.al. 2509.24046 null
2025-09-28 LLM/Agent-as-Data-Analyst: A Survey Zirui Tang et.al. 2509.23988 null
2025-09-28 Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation Pengxiang Li et.al. 2509.23866 null
2025-09-28 AgentGuard: Runtime Verification of AI Agents Roham Koohestani et.al. 2509.23864 null
2025-09-28 Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules Chenyu Zhou et.al. 2509.23836 null
2025-09-28 FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents Pramit Saha et.al. 2509.23803 null
2025-09-28 GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks Cong Chen et.al. 2509.23738 null
2025-09-28 Improving the Efficiency of LLM Agent Systems through Trajectory Reduction Yuan-An Xiao et.al. 2509.23586 null
2025-09-28 Agentic Reinforcement Learning with Implicit Step Rewards Xiaoqian Liu et.al. 2509.19199 null
2025-09-27 Memory Management and Contextual Consistency for Long-Running Low-Code Agents Jiexi Xu et.al. 2509.25250 null
2025-09-27 BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software Zehua Zhang et.al. 2509.25248 null
2025-09-27 Situational Awareness for Safe and Robust Multi-Agent Interactions Under Uncertainty Benjamin Alcorn et.al. 2509.23425 null
2025-09-27 “Shall We Dig Deeper?”: Designing and Evaluating Strategies for LLM Agents to Advance Knowledge Co-Construction in Asynchronous Online Discussions Yuanhao Zhang et.al. 2509.23327 null
2025-09-27 Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents Yaorui Shi et.al. 2509.23040 null
2025-09-26 Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents Heyang Gao et.al. 2510.03253 null
2025-09-26 AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering Ziqing Wang et.al. 2510.02328 null
2025-09-26 Infusing Theory of Mind into Socially Intelligent LLM Agents EunJeong Hwang et.al. 2509.22887 null
2025-09-26 ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents Hwan Chang et.al. 2509.22830 null
2025-09-26 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Wujiang Xu et.al. 2509.22576 null
2025-09-26 The Emergence of Altruism in Large-Language-Model Agents Society Haoyang Li et.al. 2509.22537 null
2025-09-26 Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents Jiaqi Shao et.al. 2509.22391 null
2025-09-26 Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach Ahmet Onur Akman et.al. 2509.22216 null
2025-09-26 Leveraging LLM Agents for Automated Video Game Testing Chengjia Wang et.al. 2509.22170 null
2025-09-26 CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration Zhimin Wang et.al. 2509.21981 null
2025-09-26 What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness Yuxuan Li et.al. 2509.21868 null
2025-09-26 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios Haotian Luo et.al. 2509.21766 null
2025-09-26 JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer Zhichao Shi et.al. 2509.02097 null
2025-09-25 LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants? Lu Sun et.al. 2509.21501 null
2025-09-25 What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns Stefan Szeider et.al. 2509.21224 null
2025-09-25 CORE: Full-Path Evaluation of LLM Agents Beyond Final State Panagiotis Michelakis et.al. 2509.20998 null
2025-09-25 LIMI: Less is More for Agency Yang Xiao et.al. 2509.17567 null
2025-09-24 EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis Mohammad Hossein Samaei et.al. 2510.00024 null
2025-09-24 Blueprint-Bench: Comparing spatial intelligence of LLMs, agents and image models Lukas Petersson et.al. 2509.25229 null
2025-09-24 LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet? Rushil Gupta et.al. 2509.21403 null
2025-09-24 Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning Hanjiang Hu et.al. 2509.20616 null
2025-09-24 SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection Yubin Ge et.al. 2509.20562 null
2025-09-24 Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation Yiren Liu et.al. 2509.20553 null
2025-09-24 Agentic Metacognition: Designing a “Self-Aware” Low-Code Agent for Failure Prediction and Human Handoff Jiexi Xu et.al. 2509.19783 null
2025-09-23 Structured Cognition for Behavioral Intelligence in Large Language Model Agents: Preliminary Study Myung Ho Kim et.al. 2510.05107 null
2025-09-23 The Heterogeneous Multi-Agent Challenge Charles Dansereau et.al. 2509.19512 null
2025-09-23 Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data Elisa Composta et.al. 2509.18985 null
2025-09-23 MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service Yizhe Huang et.al. 2509.18713 null
2025-09-23 LCMF: Lightweight Cross-Modality Mambaformer for Embodied Robotics VQA Zeyi Kang et.al. 2509.18576 null
2025-09-23 LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs Tom Pawelek et.al. 2509.18557 null
2025-09-23 LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology Renan Souza et.al. 2509.13978 null
2025-09-22 ARK-V1: An LLM-Agent for Knowledge Graph Question Answering Requiring Commonsense Reasoning Jan-Felix Klein et.al. 2509.18063 null
2025-09-22 Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration Bingsheng Yao et.al. 2509.18008 null
2025-09-22 MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents Yuzhen Lei et.al. 2509.17628 null
2025-09-22 Human vs. Agent in Task-Oriented Conversations Zhefan Wang et.al. 2509.17619 null
2025-09-22 Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents Shouju Wang et.al. 2509.17488 null
2025-09-22 Asteria: Semantic-Aware Cross-Region Caching for Agentic LLM Tool Access Chaoyi Ruan et.al. 2509.17360 null
2025-09-22 UIPro: Unleashing Superior Interaction Capability For GUI Agents Hongxin Li et.al. 2509.17328 null
2025-09-22 Generalizable End-to-End Tool-Use RL with Synthetic CodeGym Weihua Du et.al. 2509.17325 null
2025-09-21 SignalLLM: A General-Purpose LLM Agent Framework for Automated Signal Processing Junlong Ke et.al. 2509.17197 null
2025-09-21 LLMs as Layout Designers: A Spatial Reasoning Perspective Sha Li et.al. 2509.16891 null
2025-09-20 Towards Transparent and Incentive-Compatible Collaboration in Decentralized LLM Multi-Agent Systems: A Blockchain-Driven Approach Minfeng Qi et.al. 2509.16736 null
2025-09-20 OPEN-THEATRE: An Open-Source Toolkit for LLM-based Interactive Drama Tianyang Xu et.al. 2509.16713 null
2025-09-20 Governed By Agents: A Survey On The Role Of Agentic AI In Future Computing Environments Nauman Ali Murad et.al. 2509.16676 null
2025-09-19 Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans Deuksin Kwon et.al. 2509.16394 null
2025-09-19 Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap Andrew Zhu et.al. 2509.16325 null
2025-09-19 Towards Robust Visual Continual Learning with Multi-Prototype Supervision Xiwei Liu et.al. 2509.16011 null
2025-09-19 How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages Siyang Wu et.al. 2509.15518 null
2025-09-19 LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring Jinhee Jang et.al. 2509.14834 null
2025-09-18 SecureFixAgent: A Hybrid LLM Agent for Automated Python Static Vulnerability Repair Jugal Gajjar et.al. 2509.16275 null
2025-09-18 Diagnostics of cognitive failures in multi-agent expert systems using dynamic evaluation protocols and subsequent mutation of the processing context Andrejs Sorstkins et.al. 2509.15366 null
2025-09-18 A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making Xiao Wu et.al. 2509.14998 null
2025-09-18 ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning Zihao Feng et.al. 2509.14718 null
2025-09-18 SWE-QA: Can Language Models Answer Repository-level Code Questions? Weihan Peng et.al. 2509.14635 null
2025-09-17 Ticket-Bench: A Kickoff for Multilingual and Regionalized Agent Evaluation Thales Sales Almeida et.al. 2509.14477 null
2025-09-17 TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits Ziming Wei et.al. 2509.14169 null
2025-09-17 Understanding the Process of Human-AI Value Alignment Jack McKinlay et.al. 2509.13854 null
2025-09-17 From Legacy Fortran to Portable Kokkos: An Autonomous Agentic AI Workflow Sparsh Gupta et.al. 2509.12443 null
2025-09-17 Co-Investigator AI: The Rise of Agentic AI for Smarter, Trustworthy AML Compliance Narratives Prathamesh Vasudeo Naik et.al. 2509.08380 null
2025-09-17 Emergent Social Dynamics of LLM Agents in the El Farol Bar Problem Ryosuke Takata et.al. 2509.04537 null
2025-09-17 How Does Cognitive Bias Affect Large Language Models? A Case Study on the Anchoring Effect in Price Negotiation Simulations Yoshiki Takenami et.al. 2508.21137 null
2025-09-16 Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents Abhishek Goswami et.al. 2509.13597 null
2025-09-16 AI Agents with Human-Like Collaborative Tools: Adaptive Strategies for Enhanced Problem-Solving Harper Reed et.al. 2509.13547 null
2025-09-16 An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software Sina Gogani-Khiabani et.al. 2509.13471 null
2025-09-16 WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Kuan Li et.al. 2509.13305 null
2025-09-16 Agentic AI for Financial Crime Compliance Henrik Axelsen et.al. 2509.13137 null
2025-09-16 Toward PDDL Planning Copilot Yarin Benyamin et.al. 2509.12987 null
2025-09-16 H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents Shicheng Ye et.al. 2509.12810 null
2025-09-16 Agentic Lybic: Multi-Agent Execution System with Tiered Reasoning and Orchestration Liangxuan Guo et.al. 2509.11067 null
2025-09-16 PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance Mengxiao Wang et.al. 2508.20890 null
2025-09-16 Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning Antonio Guillen-Perez et.al. 2508.18397 null
2025-09-16 Enhancing LLM-Based Social Bot via an Adversarial Learning Framework Fanqi Kong et.al. 2508.17711 null
2025-09-15 Emotions are Recognized Patterns of Cognitive Activities Yue Jin et.al. 2509.16232 null
2025-09-15 Redefining Website Fingerprinting Attacks With Multiagent LLMs Chuxu Song et.al. 2509.12462 null
2025-09-15 Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm Alireza Mohamadi et.al. 2509.12190 null
2025-09-15 VisDocSketcher: Towards Scalable Visual Documentation with Agentic Systems Luís F. Gomes et.al. 2509.11942 null
2025-09-15 $ε$-Optimal Multi-Agent Patrol using Recurrent Strategy Deepak Mallya et.al. 2509.11640 null
2025-09-15 Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as Tools Prerna Agarwal et.al. 2509.11626 null
2025-09-15 MedicalOS: An LLM Agent based Operating System for Digital Healthcare Jared Zhu et.al. 2509.11507 null
2025-09-14 Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning Anis Koubaa et.al. 2509.13352 null
2025-09-14 Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble Bingchen Wang et.al. 2509.11311 null
2025-09-14 Free-MAD: Consensus-Free Multi-Agent Debate Yu Cui et.al. 2509.11035 null
2025-09-12 FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering Gyubok Lee et.al. 2509.19319 null
2025-09-12 V-Math: An Agentic Approach to the Vietnamese National High School Graduation Mathematics Exams Duong Q. Nguyen et.al. 2509.12251 null
2025-09-12 Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight Jingyu Tang et.al. 2509.10723 null
2025-09-12 Self-Supervised Goal-Reaching Results in Multi-Agent Cooperation and Exploration Chirayu Nimonkar et.al. 2509.10656 null
2025-09-12 SciML Agents: Write the Solver, Not the Solution Saarth Gaonkar et.al. 2509.09936 null
2025-09-12 Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building Alexandra Fetsch et.al. 2509.09906 null
2025-09-12 Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining Crystal Qian et.al. 2509.09071 null
2025-09-11 TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes Jialiang Huang et.al. 2509.09525 null
2025-09-11 Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning Abdel Hakim Drid et.al. 2509.09356 null
2025-09-11 Flip Co-op: Cooperative Takeovers in Shared Autonomy Sandeep Banik et.al. 2509.09281 null
2025-09-11 Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Jiawei Wang et.al. 2509.09265 null
2025-09-11 Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions Qinnan Hu et.al. 2509.09215 null
2025-09-10 HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets Ying Yuan et.al. 2509.09740 null
2025-09-10 AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Zhiheng Xi et.al. 2509.08755 null
2025-09-10 Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations Ron F. Del Rosario et.al. 2509.08646 null
2025-09-10 AutoODD: Agentic Audits via Bayesian Red Teaming in Black-Box Models Rebecca Martin et.al. 2509.08638 null
2025-09-09 Multi Robot Coordination in Highly Dynamic Environments: Tackling Asymmetric Obstacles and Limited Communication Vincenzo Suriani et.al. 2509.08859 null
2025-09-09 EnvX: Agentize Everything with Agentic AI Linyao Chen et.al. 2509.08088 null
2025-09-09 Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees Katsuaki Nakano et.al. 2509.07939 null
2025-09-09 Getting In Contract with Large Language Models – An Agency Theory Perspective On Large Language Model Alignment Sascha Kaltenpoth et.al. 2509.07642 null
2025-09-09 Astra: A Multi-Agent System for GPU Kernel Performance Optimization Anjiang Wei et.al. 2509.07506 null
2025-09-09 Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents Sankalp Tattwadarshi Swain et.al. 2509.07389 null
2025-09-09 Autonomous Code Evolution Meets NP-Completeness Cunxi Yu et.al. 2509.07367 null
2025-09-09 CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation Alyssa Unell et.al. 2509.07325 null
2025-09-08 AxelSMOTE: An Agent-Based Oversampling Algorithm for Imbalanced Classification Sukumar Kishanthan et.al. 2509.06875 null
2025-09-08 RAFFLES: Reasoning-based Attribution of Faults for LLM Systems Chenyang Zhu et.al. 2509.06822 null
2025-09-08 Reinforcement Learning Foundations for Deep Research Systems: A Survey Wenjun Li et.al. 2509.06733 null
2025-09-08 REMI: A Novel Causal Schema Memory Architecture for Personalized Lifestyle Recommendation Agents Vishal Raman et.al. 2509.06269 null
2025-09-08 TalkToAgent: A Human-centric Explanation of Reinforcement Learning Agents with Large Language Models Haechang Kim et.al. 2509.04809 null
2025-09-08 Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent Chunlong Wu et.al. 2509.03990 null
2025-09-07 From Digital Distrust to Codified Honesty: Experimental Evidence on Generative AI in Credence Goods Markets Alexander Erlei et.al. 2509.06069 null
2025-09-07 Let’s Roleplay: Examining LLM Alignment in Collaborative Dialogues Abhijnan Nath et.al. 2509.05882 null
2025-09-06 DRF: LLM-AGENT Dynamic Reputation Filtering Framework Yuwei Lou et.al. 2509.05764 null
2025-09-05 Internet 3.0: Architecture for a Web-of-Agents with it’s Algorithm for Ranking Agents Rajesh Tembarai Krishnamachari et.al. 2509.04979 null
2025-09-05 OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration Jusheng Zhang et.al. 2509.04876 null
2025-09-05 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Haoming Wang et.al. 2509.02544 null
2025-09-04 Maestro: Joint Graph & Config Optimization for Reliable AI Agents Wenxiao Wang et.al. 2509.04642 null
2025-09-04 Psychologically Enhanced AI Agents Maciej Besta et.al. 2509.04343 null
2025-09-04 Are LLM Agents the New RPA? A Comparative Study with RPA Across Enterprise Workflows Petr Průcha et.al. 2509.04198 null
2025-09-04 MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions Aishik Mandal et.al. 2509.04183 null
2025-09-04 Real-time adaptive quantum error correction by model-free multi-agent learning Manuel Guatto et.al. 2509.03974 null
2025-09-04 FaMA: LLM-Empowered Agentic Assistant for Consumer-to-Consumer Marketplace Yineng Yan et.al. 2509.03890 null
2025-09-04 Leveraging LLM-Based Agents for Intelligent Supply Chain Planning Yongzhi Qi et.al. 2509.03811 null
2025-09-04 AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems? Guibin Zhang et.al. 2509.03312 null
2025-09-03 Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation James Mooney et.al. 2509.03736 null
2025-09-02 DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence Pranav Narayanan Venkit et.al. 2509.04499 null
2025-09-02 Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics Matthew Russo et.al. 2509.02751 null
2025-09-02 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Guibin Zhang et.al. 2509.02547 null
2025-09-02 Towards Agents That Know When They Don’t Know: Uncertainty as a Control Signal for Structured Reasoning Josefa Lia Stoisser et.al. 2509.02401 null
2025-09-02 When Agents go Astray: Course-Correcting SWE Agents with PRMs Shubham Gandhi et.al. 2509.02360 null
2025-09-01 The Need for Verification in AI-Driven Scientific Discovery Cristina Cornelio et.al. 2509.01398 null
2025-09-01 Multi-Agent Reinforcement Learning for Task Offloading in Wireless Edge Networks Andrea Fox et.al. 2509.01257 null
2025-09-01 ORCA: ORchestrating Causal Agent Joanie Hayoun Chung et.al. 2508.21304 null
2025-09-01 How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench Venkatesh Mishra et.al. 2508.20931 null
2025-09-01 Instructional Agents: LLM Agents on Automated Course Material Generation for Teaching Faculties Huaiyuan Yao et.al. 2508.19611 null
2025-08-31 Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First Shu Liu et.al. 2509.00997 null
2025-08-30 Inducing State Anxiety in LLM Agents Reproduces Human-Like Biases in Consumer Decision-Making Ziv Ben-Zion et.al. 2510.06222 null
2025-08-30 Exploring Decision-Making Capabilities of LLM Agents: An Experimental Study on Jump-Jump Game Juwu Li et.al. 2509.00483 null
2025-08-29 COCORELI: Cooperative, Compositional Reconstitution & Execution of Language Instructions Swarnadeep Bhar et.al. 2509.04470 null
2025-08-29 ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition Ahmed E. Helal et.al. 2509.00280 null
2025-08-29 HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution Jinzhou Tang et.al. 2509.00189 null
2025-08-28 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Ming Hu et.al. 2508.21148 null
2025-08-28 Provable Benefits of In-Tool Learning for Large Language Models Sam Houliston et.al. 2508.20755 null
2025-08-28 rStar2-Agent: Agentic Reasoning Technical Report Ning Shang et.al. 2508.20722 null
2025-08-28 CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics Stefano Fumero et.al. 2508.20643 null
2025-08-28 MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Zhenting Wang et.al. 2508.20453 null
2025-08-28 MindGuard: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph Zhiqiang Wang et.al. 2508.20412 null
2025-08-27 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Zeyi Sun et.al. 2508.20096 null
2025-08-27 AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios Lisa Alazraki et.al. 2508.19988 null
2025-08-27 Evaluating Language Model Reasoning about Confidential Information Dylan Sam et.al. 2508.19980 null
2025-08-27 Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust: A Survey Yinqiu Liu et.al. 2508.19870 null
2025-08-27 Survey of Specialized Large Language Model Chenghan Yang et.al. 2508.19667 null
2025-08-27 CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation Zhejing Hu et.al. 2508.19603 null
2025-08-27 Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning Zhiwei Li et.al. 2508.19598 null
2025-08-27 Aegis: Taxonomy and Optimizations for Overcoming Agent-Environment Failures in LLM Agents Kevin Song et.al. 2508.19504 null
2025-08-27 Interactive Graph Visualization and Teaming Recommendation in an Interdisciplinary Project’s Talent Knowledge Graph Jiawei Xu et.al. 2508.19489 null
2025-08-26 Reliable Weak-to-Strong Monitoring of LLM Agents Neil Kale et.al. 2508.19461 null
2025-08-26 Real-Time Model Checking for Closed-Loop Robot Reactive Planning Christopher Chandler et.al. 2508.19186 null
2025-08-26 MATRIX: Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation Ernest Lim et.al. 2508.19163 null
2025-08-26 A Concurrent Modular Agent: Framework for Autonomous LLM Agents Norihiro Maruyama et.al. 2508.19042 null
2025-08-26 CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks Qi Chai et.al. 2508.18797 null
2025-08-26 Toward Edge General Intelligence with Agentic AI and Agentification: Concepts, Technologies, and Future Directions Ruichen Zhang et.al. 2508.18725 null
2025-08-26 FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation Shaswata Mitra et.al. 2508.18684 null
2025-08-26 Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding Chufan Gao et.al. 2508.18676 null
2025-08-26 Bias-Adjusted LLM Agents for Human-Like Decision-Making via Behavioral Economics Ayato Kitadai et.al. 2508.18600 null
2025-08-26 Generative Artificial Intelligence and Agents in Research and Teaching Jussi S. Jauhiainen et.al. 2508.16701 null
2025-08-25 Toward Generalized Autonomous Agents: A Neuro-Symbolic AI Framework for Integrating Social and Technical Support in Education Ryan Hare et.al. 2508.18406 null
2025-08-25 The AI Data Scientist Farkhad Akimov et.al. 2508.18113 null
2025-08-25 Memento: Fine-tuning LLM Agents without Fine-tuning LLMs Huichi Zhou et.al. 2508.16153 null
2025-08-24 FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series Gunjan Jalori et.al. 2508.19279 null
2025-08-24 Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents Sameer Komoravolu et.al. 2508.17393 null
2025-08-24 From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users Sadia Sultana Chowa et.al. 2508.17281 null
2025-08-22 AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications Dawei Gao et.al. 2508.16279 null
2025-08-22 IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra Heewoong Noh et.al. 2508.16112 null
2025-08-21 Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making Yuanjun Feng et.al. 2508.15926 null
2025-08-21 End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning Qiaoyu Zheng et.al. 2508.15746 null

Large Language Models

Publish Date Title Authors PDF Code
2025-10-29 OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning Ziyou Hu et.al. 2510.24636 null
2025-10-28 Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Yujie Wei et.al. 2510.24711 null
2025-10-28 ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games? Shuqing Li et.al. 2510.24706 null
2025-10-28 Tongyi DeepResearch Technical Report Tongyi DeepResearch Team et.al. 2510.24701 null
2025-10-28 Greedy Sampling Is Provably Efficient for RLHF Di Wu et.al. 2510.24700 null
2025-10-28 WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking Zhengwei Tao et.al. 2510.24697 null
2025-10-28 AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis Xuanzhong Chen et.al. 2510.24695 null
2025-10-28 STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Zihan Liu et.al. 2510.24693 null
2025-10-28 Dissecting Role Cognition in Medical LLMs via Neuronal Ablation Xun Liang et.al. 2510.24677 null
2025-10-28 Evolving Diagnostic Agents in a Virtual Clinical Environment Pengcheng Qiu et.al. 2510.24654 null
2025-10-28 Optimizing Retrieval for RAG via Reinforced Contrastive Learning Jiawei Zhou et.al. 2510.24652 null
2025-10-28 Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning Nitin Rai et.al. 2510.24650 null
2025-10-28 FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling Zengzhuang Xu et.al. 2510.24645 null
2025-10-28 Relative Scaling Laws for LLMs William Held et.al. 2510.24626 null
2025-10-28 Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation Snegha A et.al. 2510.24619 null
2025-10-28 Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way Yicun Yang et.al. 2510.24605 null
2025-10-28 ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Guoxin Chen et.al. 2510.24592 null
2025-10-28 ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? Christine Ye et.al. 2510.24591 null
2025-10-28 Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives Gang Chen et.al. 2510.24551 null
2025-10-28 Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts Seyoung Song et.al. 2510.24541 null
2025-10-28 Multi-Agent Evolve: LLM Self-Improve through Co-evolution Yixing Chen et.al. 2510.23595 null
2025-10-28 PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection Yusu Qian et.al. 2510.23594 null
2025-10-27 PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Yuqian Yuan et.al. 2510.23603 null
2025-10-27 Alita-G: Self-Evolving Generative Agent for Agent Generation Jiahao Qiu et.al. 2510.23601 null
2025-10-27 Think Twice: Branch-and-Rethink Reasoning Reward Model Yizhu Jiao et.al. 2510.23596 null
2025-10-27 Lightweight Robust Direct Preference Optimization Cheol Woo Kim et.al. 2510.23590 null
2025-10-27 FARMER: Flow AutoRegressive Transformer over Pixels Guangting Zheng et.al. 2510.23588 null
2025-10-27 A Survey of Data Agents: Emerging Paradigm or Overstated Hype? Yizhang Zhu et.al. 2510.23587 null
2025-10-27 RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation Yash Jangir et.al. 2510.23571 null
2025-10-27 EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT Baoqi Pei et.al. 2510.23569 null
2025-10-27 ReCode: Unify Plan and Action for Universal Granularity Control Zhaoyang Yu et.al. 2510.23564 null
2025-10-27 ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models Bohan Li et.al. 2510.23558 null
2025-10-27 Minimizing Human Intervention in Online Classification William Réveillard et.al. 2510.23557 null
2025-10-27 IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering Jieyong Kim et.al. 2510.23536 null
2025-10-27 Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof Uijeong Jang et.al. 2510.23513 null
2025-10-27 Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model Weizheng Wang et.al. 2510.23509 null
2025-10-27 Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier Hyeongseop Rha et.al. 2510.23506 null
2025-10-27 VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation Walid Bousselham et.al. 2510.23497 null
2025-10-27 Learning the PTM Code through a Coarse-to-Fine, Mechanism-Aware Framework Jingjie Zhang et.al. 2510.23492 null
2025-10-27 Learning to Reason Efficiently with Discounted Reinforcement Learning Alex Ayoub et.al. 2510.23486 null
2025-10-24 A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection Gaku Morio et.al. 2510.21679 null
2025-10-24 A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance Yabo Yin et.al. 2510.21671 null
2025-10-24 The Universal Landscape of Human Reasoning Qiguang Chen et.al. 2510.21623 null
2025-10-24 Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Wenyi Wang et.al. 2510.21614 null
2025-10-24 Modest-Align: Data-Efficient Alignment for Vision-Language Models Jiaxiang Liu et.al. 2510.21606 null
2025-10-24 RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models Xueyuan Lin et.al. 2510.21604 null
2025-10-24 From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene Mojca Brglez et.al. 2510.21575 null
2025-10-24 ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem Fangwen Wu et.al. 2510.21566 null
2025-10-24 Are the LLMs Capable of Maintaining at Least the Language Genus? Sandra Mitrović et.al. 2510.21561 null
2025-10-24 EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law Ilija Lichkovski et.al. 2510.21524 null
2025-10-24 Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models Omer Moussa et.al. 2510.21520 null
2025-10-24 Head Pursuit: Probing Attention Specialization in Multimodal Transformers Lorenzo Basile et.al. 2510.21518 null
2025-10-24 Wisdom and Delusion of LLM Ensembles for Code Generation and Repair Fernando Vallecillos Ruiz et.al. 2510.21513 null
2025-10-24 Actionable Cybersecurity Notifications for Smart Homes: A User Study on the Role of Length and Complexity Victor Jüttner et.al. 2510.21508 null
2025-10-24 MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization Chenglong Wang et.al. 2510.21473 null
2025-10-24 Risk Management for Mitigating Benchmark Failure Modes: BenchRisk Sean McGregor et.al. 2510.21460 null
2025-10-24 SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots Adetayo Adebimpe et.al. 2510.21459 null
2025-10-24 ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models Federico Danieli et.al. 2510.21450 null
2025-10-24 MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection Shengtian Yang et.al. 2510.21449 null
2025-10-24 REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring Thanh Cong Ho et.al. 2510.21445 null
2025-10-23 KL-Regularized Reinforcement Learning is Designed to Mode Collapse Anthony GX-Chen et.al. 2510.20817 null
2025-10-23 Generative Reasoning Recommendation via LLMs Minjie Hong et.al. 2510.20815 null
2025-10-23 Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation Yuhan Liu et.al. 2510.20812 null
2025-10-23 On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text? Mingmeng Geng et.al. 2510.20810 null
2025-10-23 Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers Dean L Slack et.al. 2510.20807 null
2025-10-23 ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Xiaolong Wang et.al. 2510.20803 null
2025-10-23 Simple Context Compression: Mean-Pooling and Multi-Ratio Training Yair Feldman et.al. 2510.20797 null
2025-10-23 A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text Alicia Sagae et.al. 2510.20782 null
2025-10-23 RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines Austin Jia et.al. 2510.20768 null
2025-10-23 Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations Lorenzo Stacchio et.al. 2510.20743 null
2025-10-23 Learning to Triage Taint Flows Reported by Dynamic Program Analysis in Node.js Packages Ronghao Ni et.al. 2510.20739 null
2025-10-23 Automated Extraction of Fluoropyrimidine Treatment and Treatment-Related Toxicities from Clinical Notes Using Natural Language Processing Xizhi Wu et.al. 2510.20727 null
2025-10-23 User Perceptions of Privacy and Helpfulness in LLM Responses to Privacy-Sensitive Scenarios Xiaoyuan Wu et.al. 2510.20721 null
2025-10-23 Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models Xuyang Liu et.al. 2510.20707 null
2025-10-23 Structure-Conditional Minimum Bayes Risk Decoding Bryan Eikema et.al. 2510.20700 null
2025-10-23 Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward Jing Bi et.al. 2510.20696 null
2025-10-23 Exploring Large Language Models for Access Control Policy Synthesis and Summarization Adarsh Vatsa et.al. 2510.20692 null
2025-10-23 Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs Yanlin Song et.al. 2510.20691 null
2025-10-23 Neural Diversity Regularizes Hallucinations in Small Models Kushal Chakrabarti et.al. 2510.20690 null
2025-10-23 Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts Mariona Jaramillo-Civill et.al. 2510.20666 null
2025-10-23 Zhyper: Factorized Hypernetworks for Conditioned LLM Fine-Tuning M. H. I. Abdalla et.al. 2510.19733 null
2025-10-23 Fast Inference via Hierarchical Speculative Decoding Clara Mohri et.al. 2510.19705 null
2025-10-22 Semantic World Models Jacob Berg et.al. 2510.19818 null
2025-10-22 olmOCR 2: Unit Test Rewards for Document OCR Jake Poznanski et.al. 2510.19817 null
2025-10-22 Hubble: a Model Suite to Advance the Study of LLM Memorization Johnny Tian-Zheng Wei et.al. 2510.19811 null
2025-10-22 Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning Xichen Zhang et.al. 2510.19807 null
2025-10-22 The Art of Asking: Multilingual Prompt Optimization for Synthetic Data David Mora et.al. 2510.19806 null
2025-10-22 Forbidden Sidon subsets of perfect difference sets, featuring a human-assisted proof Boris Alexeev et.al. 2510.19804 null
2025-10-22 Class-Aware Prototype Learning with Negative Contrast for Test-Time Adaptation of Vision-Language Models Xiaozhen Qiao et.al. 2510.19802 null
2025-10-22 The Feasibility of Training Sovereign Language Models in the Global South: A Study of Brazil and Mexico Sandra Malagon et.al. 2510.19801 null
2025-10-22 Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop: A Case of Nonprofit Program Evaluation Ji Ma et.al. 2510.19799 null
2025-10-22 Blackbox Model Provenance via Palimpsestic Membership Inference Rohith Kuditipudi et.al. 2510.19796 null
2025-10-22 On Controlled Change: Generative AI’s Impact on Professional Authority in Journalism Tomás Dodds et.al. 2510.19792 null
2025-10-22 ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers Saptarshi Sengupta et.al. 2510.19791 null
2025-10-22 AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Yuezhou Hu et.al. 2510.19779 null
2025-10-22 The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models Euodia Dodd et.al. 2510.19773 null
2025-10-22 SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration Xichen Zhang et.al. 2510.19767 null
2025-10-22 Top-P Masking for Cross Language Information Retrieval Joseph Casale et.al. 2510.19758 null
2025-10-22 Review of Tools for Zero-Code LLM Based Application Development Priyaranjan Pattnayak et.al. 2510.19747 null
2025-10-22 RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models Yang Yang et.al. 2510.19698 null
2025-10-22 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Haochen Wang et.al. 2510.18876 null
2025-10-21 Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting Howard Chen et.al. 2510.18874 null
2025-10-21 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence Ziang Zhang et.al. 2510.18873 null
2025-10-21 How Do LLMs Use Their Depth? Akshat Gupta et.al. 2510.18871 null
2025-10-21 LightMem: Lightweight and Efficient Memory-Augmented Generation Jizhan Fang et.al. 2510.18866 null
2025-10-21 EffiReasonTrans: RL-Optimized Reasoning for Code Translation Yanlin Wang et.al. 2510.18863 null
2025-10-21 Streamlining Acceptance Test Generation for Mobile Applications Through Large Language Models: An Industrial Case Study Pedro Luís Fonseca et.al. 2510.18861 null
2025-10-21 An Encoder-Decoder Foundation Chemical Language Model for Generative Polymer Design Harikrishna Sahu et.al. 2510.18860 null
2025-10-21 Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning Chenghao Zhu et.al. 2510.18849 null
2025-10-21 See the Text: From Tokenization to Visual Reading Ling Xing et.al. 2510.18840 null
2025-10-21 FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning Yubin Zheng et.al. 2510.18837 null
2025-10-21 MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training Wenxuan Li et.al. 2510.18830 null
2025-10-21 Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework Yujie Xing et.al. 2510.18825 null
2025-10-21 Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring Shuxin Lin et.al. 2510.18817 null
2025-10-21 Integrating Large Language Models and Evaluating Student Outcomes in an Introductory Computer Science Course Annapurna Vadaparty et.al. 2510.18806 null
2025-10-21 FeClustRE: Hierarchical Clustering and Semantic Tagging of App Features from User Reviews Max Tiessler et.al. 2510.18799 null
2025-10-21 ShaRE your Data! Characterizing Datasets for LLM-based Requirements Engineering Quim Motger et.al. 2510.18787 null
2025-10-21 KAT-Coder Technical Report Zizheng Zhan et.al. 2510.18779 null
2025-10-21 Seg the HAB: Language-Guided Geospatial Algae Bloom Reasoning and Segmentation Patterson Hsieh et.al. 2510.18751 null
2025-10-21 Topoformer: brain-like topographic organization in Transformer language models through spatial querying and reweighting Taha Binhuraib et.al. 2510.18745 null
2025-10-21 Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation Ming Li et.al. 2510.18731 null
2025-10-21 HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models Sidhant Narula et.al. 2510.18728 null
2025-10-21 IF-VidCap: Can Video Caption Models Follow Instructions? Shihao Li et.al. 2510.18726 null
2025-10-21 SemiAdapt and SemiLoRA: Efficient Domain Adaptation for Transformer-based Low-Resource Language Translation with a Case Study on Irish Josh McGiff et.al. 2510.18725 null
2025-10-21 SSD: Spatial-Semantic Head Decoupling for Efficient Autoregressive Image Generation Siyong Jian et.al. 2510.18716 null
2025-10-21 Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options Joongkyu Lee et.al. 2510.18713 null
2025-10-21 Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents Yiqi Lin et.al. 2510.18703 null
2025-10-21 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Yibin Wang et.al. 2510.18701 null
2025-10-21 MLMA: Towards Multilingual with Mamba Based Architectures Mohamed Nabih Ali et.al. 2510.18684 null
2025-10-21 Exploring Membership Inference Vulnerabilities in Clinical Large Language Models Alexander Nemecek et.al. 2510.18674 null
2025-10-21 Reasoning Language Model Inference Serving Unveiled: An Empirical Study Qi Li et.al. 2510.18672 null
2025-10-21 Hardness of Learning Regular Languages in the Next Symbol Prediction Setting Satwik Bhattamishra et.al. 2510.18634 null
2025-10-21 Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Zhangquan Chen et.al. 2510.18632 null
2025-10-21 VAR: Visual Attention Reasoning via Structured Search and Backtracking Wei Cai et.al. 2510.18619 null
2025-10-21 Evaluating Large Language Models in detecting Secrets in Android Apps Marco Alecci et.al. 2510.18601 null
2025-10-21 CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent Haojia Lin et.al. 2510.18596 null
2025-10-21 Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications Zhuohang Bian et.al. 2510.18586 null
2025-10-21 CLASP: Cost-Optimized LLM-based Agentic System for Phishing Detection Fouad Trad et.al. 2510.18585 null
2025-10-21 CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder Yongmin Lee et.al. 2510.18583 null
2025-10-21 The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability Zijie Xu et.al. 2510.18563 null
2025-10-21 Large language models for folktale type automation based on motifs: Cinderella case study Tjaša Arčon et.al. 2510.18561 null
2025-10-21 Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency Svetlana Maslenkova et.al. 2510.18556 null
2025-10-21 JAUNT: Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing Enhan Li et.al. 2510.18550 null
2025-10-21 EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Zebin Yang et.al. 2510.18546 null
2025-10-21 SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices Pan Zhou et.al. 2510.18544 null
2025-10-21 Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification Bin Gu et.al. 2510.18533 null
2025-10-21 LLMs as Sparse Retrievers: A Framework for First-Stage Product Search Hongru Song et.al. 2510.18527 null
2025-10-21 Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models Hanze Guo et.al. 2510.18526 null
2025-10-21 From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing Yushu Zhao et.al. 2510.18525 null
2025-10-21 Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models Sureyya Akin et.al. 2510.18515 null
2025-10-21 Identity-Aware Large Language Models require Cultural Reasoning Alistair Plum et.al. 2510.18510 null
2025-10-21 Prompting the Priorities: A First Look at Evaluating LLMs for Vulnerability Triage and Prioritization Osama Al Haddad et.al. 2510.18508 null
2025-10-21 Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation Wei-Chia Chang et.al. 2510.18502 null
2025-10-21 One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam Detection Kangzhong Wang et.al. 2510.18493 null
2025-10-21 The Attribution Story of WhisperGate: An Academic Perspective Oleksandr Adamov et.al. 2510.18484 null
2025-10-21 StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking Haoran Zhang et.al. 2510.18483 null
2025-10-21 How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices Han Peng et.al. 2510.18480 null
2025-10-21 LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources Haichao Ji et.al. 2510.18477 null
2025-10-21 Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents Feifan Xia et.al. 2510.18476 null
2025-10-21 DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP Mariano Barone et.al. 2510.18475 null
2025-10-21 CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment Xue Jiang et.al. 2510.18471 null
2025-10-21 CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs Shaobo Wang et.al. 2510.18470 null
2025-10-21 IMB: An Italian Medical Benchmark for Question Answering Antonio Romano et.al. 2510.18468 null
2025-10-21 Simple and Efficient Heterogeneous Temporal Graph Neural Network Yili Wang et.al. 2510.18467 null
2025-10-21 CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning Masato Kikuchi et.al. 2510.18466 null
2025-10-21 Large Language Models in Thematic Analysis: Prompt Engineering, Evaluation, and Guidelines for Qualitative Software Engineering Research Cristina Martinez Montes et.al. 2510.18456 null
2025-10-21 Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models Atharvan Dogra et.al. 2510.18454 null
2025-10-21 PlanU: Large Language Model Decision Making through Planning under Uncertainty Ziwei Deng et.al. 2510.18442 null
2025-10-21 Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation Yasser Hamidullah et.al. 2510.18439 null
2025-10-21 DeepTx: Real-Time Transaction Risk Analysis via Multi-Modal Features and LLM Reasoning Yixuan Liu et.al. 2510.18438 null
2025-10-21 Chain-of-Conceptual-Thought: Eliciting the Agent to Deeply Think within the Response Qingqing Gu et.al. 2510.18434 null
2025-10-21 ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization Yuanhe Guo et.al. 2510.18433 null
2025-10-21 Automated urban waterlogging assessment and early warning through a mixture of foundation models Chenxu Zhang et.al. 2510.18425 null
2025-10-21 Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents Guangfu Guo et.al. 2510.18424 null
2025-10-21 SegTune: Structured and Fine-Grained Control for Song Generation Pengfei Cai et.al. 2510.18416 null
2025-10-21 Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference Siyuan Yan et.al. 2510.18413 null
2025-10-21 MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models ChangSu Choi et.al. 2510.18383 null
2025-10-21 Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study Gangda Deng et.al. 2510.18370 null
2025-10-21 KoSimpleQA: A Korean Factuality Benchmark with an Analysis of Reasoning LLMs Donghyeon Ko et.al. 2510.18368 null
2025-10-21 Evaluating LLM-Based Mobile App Recommendations: An Empirical Study Quim Motger et.al. 2510.18364 null
2025-10-21 KrishokBondhu: A Retrieval-Augmented Voice-Based Agricultural Advisory Call Center for Bengali Farmers Mohd Ruhul Ameen et.al. 2510.18355 null
2025-10-21 GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data Yudong Li et.al. 2510.18345 null
2025-10-21 Combining Distantly Supervised Models with In Context Learning for Monolingual and Cross-Lingual Relation Extraction Vipul Rathore et.al. 2510.18344 null
2025-10-21 Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs Jongmin Lee et.al. 2510.18340 null
2025-10-21 ECG-LLM – training and evaluation of domain-specific large language models for electrocardiography Lara Ahrens et.al. 2510.18339 null
2025-10-21 Position: LLM Watermarking Should Align Stakeholders’ Incentives for Practical Adoption Yepeng Liu et.al. 2510.18333 null
2025-10-21 InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration Yunkun Wang et.al. 2510.18327 null
2025-10-21 Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding Jinlin Li et.al. 2510.18321 null
2025-10-21 Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming Zheng Zhang et.al. 2510.18314 null
2025-10-21 ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation Haowei Lou et.al. 2510.18308 null
2025-10-21 The Impact of Image Resolution on Biomedical Multimodal Large Language Models Liangyu Chen et.al. 2510.18304 null
2025-10-21 Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models Lehan Wang et.al. 2510.18303 null
2025-10-21 From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering Lei Li et.al. 2510.18297 null
2025-10-21 BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks Tianyuan Huang et.al. 2510.18288 null
2025-10-21 Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs Yanhong Li et.al. 2510.18279 null
2025-10-21 Enhancing Hotel Recommendations with AI: LLM-Based Review Summarization and Query-Driven Insights Nikolaos Belibasakis et.al. 2510.18277 null
2025-10-21 StreamingTOM: Streaming Token Compression for Efficient Video Understanding Xueyi Chen et.al. 2510.18269 null
2025-10-21 UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding Da Zhang et.al. 2510.18262 null
2025-10-21 DelvePO: Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization Tao Tao et.al. 2510.18257 null
2025-10-21 Illusions of reflection: open-ended task reveals systematic failures in Large Language Models’ reflective reasoning Sion Weatherhead et.al. 2510.18254 null

Reinforcement Learning

Publish Date Title Authors PDF Code
2025-10-29 Prospects for a 95 GeV Higgs Boson at Future Higgs Factories with Transformer Networks Yabo Dong et.al. 2510.24662 null
2025-10-29 OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning Ziyou Hu et.al. 2510.24636 null
2025-10-28 Cluster Dose Prediction in Carbon Ion Therapy: Using Transfer Learning from a Pretrained Dose Prediction U-Net Miriam Schwarze et.al. 2510.24703 null
2025-10-28 Greedy Sampling Is Provably Efficient for RLHF Di Wu et.al. 2510.24700 null
2025-10-28 How Flat is a Plateau? Evolution of Late-Time TDE Disks Yael Alush et.al. 2510.24696 null
2025-10-28 SPICE: Self-Play In Corpus Environments Improves Reasoning Bo Liu et.al. 2510.24684 null
2025-10-28 Fare: Failure Resilience in Learned Visual Navigation Control Zishuo Wang et.al. 2510.24680 null
2025-10-28 Learning to Drive Safely with Hybrid Options Bram De Cooman et.al. 2510.24674 null
2025-10-28 Evolving Diagnostic Agents in a Virtual Clinical Environment Pengcheng Qiu et.al. 2510.24654 null
2025-10-28 Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning Nitin Rai et.al. 2510.24650 null
2025-10-28 Fast Bayesian Multilevel Quasi-Monte Carlo Aleksei G. Sorokin et.al. 2510.24604 null
2025-10-28 Low-lying baryon resonances from lattice QCD Colin Morningstar et.al. 2510.24596 null
2025-10-28 Towards Quadrupedal Jumping and Walking for Dynamic Locomotion using Reinforcement Learning Jørgen Anker Olsen et.al. 2510.24584 null
2025-10-28 Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks Lingyi Wang et.al. 2510.24546 null
2025-10-28 Sample-efficient and Scalable Exploration in Continuous-Time RL Klemens Iten et.al. 2510.24482 null
2025-10-28 Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks Korneel Van den Berghe et.al. 2510.24461 null
2025-10-28 Pair Approximation Meets Reality: Diffusion of Innovation in Organizational Networks within the biased-independence q-Voter Model Angelika Abramiuk-Szurlej et.al. 2510.24447 null
2025-10-28 SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space Viktoriia Zinkovich et.al. 2510.24446 null
2025-10-28 Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings Seyed Mahdi Basiri Azad et.al. 2510.24432 null
2025-10-28 MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation Xiaoyu Kong et.al. 2510.24431 null
2025-10-28 Multi-Agent Evolve: LLM Self-Improve through Co-evolution Yixing Chen et.al. 2510.23595 null
2025-10-28 VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation Walid Bousselham et.al. 2510.23497 null
2025-10-28 SGFusion: Stochastic Geographic Gradient Fusion in Federated Learning Khoa Nguyen et.al. 2510.23455 null
2025-10-27 Think Twice: Branch-and-Rethink Reasoning Reward Model Yizhu Jiao et.al. 2510.23596 null
2025-10-27 Cosmic magnification on multi-catalogue Herschel submillimetre galaxies R. Fernandez-Fernandez et.al. 2510.23582 null
2025-10-27 Towards Stochastic (N-1)-Secure Redispatch Oleksii Molodchyk et.al. 2510.23551 null
2025-10-27 Variational Thermal State Preparation on Digital Quantum Processors Assisted by Matrix Product States Rui-Hao Li et.al. 2510.23546 null
2025-10-27 Approximately optimal distributed controls for high-dimensional stochastic systems with pairwise interaction through controls Elise Devey et.al. 2510.23537 null
2025-10-27 Sequential Multi-Agent Dynamic Algorithm Configuration Chen Lu et.al. 2510.23535 null
2025-10-27 Learning to Reason Efficiently with Discounted Reinforcement Learning Alex Ayoub et.al. 2510.23486 null
2025-10-27 MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding Xin Jin et.al. 2510.23479 null
2025-10-27 Video-Thinker: Sparking “Thinking with Videos” via Reinforcement Learning Shijian Wang et.al. 2510.23473 null
2025-10-27 Adaptive Multilevel Splitting: First Application to Rare-Event Derivative Pricing Riccardo Gozzo et.al. 2510.23461 null
2025-10-27 Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Zhuoran Jin et.al. 2510.23451 null
2025-10-27 An Information-Theoretic Analysis of Out-of-Distribution Generalization in Meta-Learning with Applications to Meta-RL Xingtu Liu et.al. 2510.23448 null
2025-10-27 Causal Deep Q Network Elouanes Khelifi et.al. 2510.23424 null
2025-10-27 A Sequential Planning Framework for the Operational Reality of Interacting Air Traffic Flow Regulations and Traffic Flow Programs Thinh Hoang et.al. 2510.23402 null
2025-10-27 VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations Lu Dong et.al. 2510.23397 null
2025-10-27 The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation Farid Bagirov et.al. 2510.23393 null
2025-10-27 Ground-state phase diagram of S = 1/2 Heisenberg model on 2D square-hexagon-octagon lattice Yumeng Luo et.al. 2510.23376 null
2025-10-24 Mechanistic Interpretability for Neural TSP Solvers Reuben Narad et.al. 2510.21693 null
2025-10-24 Reduced Floating-Point Precision Implicit Monte Carlo Simon Butson et.al. 2510.21683 null
2025-10-24 Goal-based portfolio selection with fixed transaction costs Erhan Bayraktar et.al. 2510.21650 null
2025-10-24 Electroweak corrections to $gg\rightarrow γγ$ Gabriele Fiore et.al. 2510.21643 null
2025-10-24 Predicted observational effects of rapid rotation for Be stars Rina G. Rast et.al. 2510.21640 null
2025-10-24 DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection Tala Aljaafari et.al. 2510.21638 null
2025-10-24 DeepAgent: A General Reasoning Agent with Scalable Toolsets Xiaoxi Li et.al. 2510.21618 null
2025-10-24 Enhancing Tactile-based Reinforcement Learning for Robotic Control Elle Miller et.al. 2510.21609 null
2025-10-24 Multilevel Picard scheme for solving high-dimensional drift control problems with state constraints Yuan Zhong et.al. 2510.21607 null
2025-10-24 RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models Xueyuan Lin et.al. 2510.21604 null
2025-10-24 Three-nucleon lepton-number-violating potentials in chiral EFT and their matrix elements in light nuclei Graham Chambers-Wall et.al. 2510.21564 null
2025-10-24 System-Theoretic Analysis of Dynamic Generalized Nash Equilibrium Problems – Turnpikes and Dissipativity Sophie Hall et.al. 2510.21556 null
2025-10-24 Cost Minimization for Space-Air-Ground Integrated Multi-Access Edge Computing Systems Weihong Qin et.al. 2510.21541 null
2025-10-24 A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment Huatian Gong et.al. 2510.21525 null
2025-10-24 Surrogate-based quantification of policy uncertainty in generative flow networks Ramón Nartallo-Kaluarachchi et.al. 2510.21523 null
2025-10-24 The population of Galactic young massive star clusters in the TeV range Rowan Batzofin et.al. 2510.21480 null
2025-10-24 MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization Chenglong Wang et.al. 2510.21473 null
2025-10-24 Constraints on ultra-heavy dark matter from the CDEX-10 experiment at the China Jinping Underground Laboratory Y. F. Wang et.al. 2510.21458 null
2025-10-24 Unified token representations for sequential decision models Zhuojing Tian et.al. 2510.21448 null
2025-10-24 Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems Hao Liang et.al. 2510.21427 null
2025-10-24 Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning Prakrut Kotecha et.al. 2510.20706 null
2025-10-23 KL-Regularized Reinforcement Learning is Designed to Mode Collapse Anthony GX-Chen et.al. 2510.20817 null
2025-10-23 GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation Guangqi Jiang et.al. 2510.20813 null
2025-10-23 A Microphysical Probe of Neutron Star Interiors: Constraining the Equation of State with Glitch Dynamics Zhonghao Tu et.al. 2510.20791 null
2025-10-23 Consumption-Investment Problem in Rank-Based Models David Itkin et.al. 2510.20763 null
2025-10-23 Reinforcement Learning and Consumption-Savings Behavior Brandon Kaplowitz et.al. 2510.20748 null
2025-10-23 No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes Jasmine Bayrooti et.al. 2510.20725 null
2025-10-23 Measuring cosmic dipole with the GRB luminosity-time relation Jessica Santiago et.al. 2510.20705 null
2025-10-23 Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs Yanlin Song et.al. 2510.20691 null
2025-10-23 Downsizing Diffusion Models for Cardinality Estimation Xinhe Mu et.al. 2510.20681 null
2025-10-23 The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models Xue Wen Tan et.al. 2510.20665 null
2025-10-23 Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Jiahao Meng et.al. 2510.20579 null
2025-10-23 EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence Ding Zou et.al. 2510.20578 null
2025-10-23 Monte Carlo Sampling for Wave Functions Requiring (Anti)Symmetrization Koyena Bose et.al. 2510.20577 null
2025-10-23 AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN Wei Shao et.al. 2510.20566 null
2025-10-23 GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning Jinchang Luo et.al. 2510.20548 null
2025-10-23 A Unified Framework for Zero-Shot Reinforcement Learning Jacopo Di Ventura et.al. 2510.20542 null
2025-10-23 Detection of ultra-high-energy cosmic rays in the southern hemisphere with FAST: data acquisition and preliminary results Jakub Kmec et.al. 2510.20522 null
2025-10-23 Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence Kun Ouyang et.al. 2510.20470 null
2025-10-23 On Multiple Robustness of Proximal Dynamic Treatment Regimes Yuanshan Gao et.al. 2510.20451 null
2025-10-23 DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning Runpeng Xie et.al. 2510.19562 null
2025-10-22 olmOCR 2: Unit Test Rewards for Document OCR Jake Poznanski et.al. 2510.19817 null
2025-10-22 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Yusu Qian et.al. 2510.19808 null
2025-10-22 Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning Xichen Zhang et.al. 2510.19807 null
2025-10-22 SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration Xichen Zhang et.al. 2510.19767 null
2025-10-22 SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas Hongyu Ding et.al. 2510.19766 null
2025-10-22 Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning Gunshi Gupta et.al. 2510.19732 null
2025-10-22 Semi-Implicit Approaches for Large-Scale Bayesian Spatial Interpolation Sébastien Garneau et.al. 2510.19722 null
2025-10-22 MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom Yifan Li et.al. 2510.19626 null
2025-10-22 Demonstrating Real Advantage of Machine-Learning-Enhanced Monte Carlo for Combinatorial Optimization Luca Maria Del Bono et.al. 2510.19544 null
2025-10-22 Quantum Monte Carlo study of low-dimensional Fermi fluids of dipolar atoms Clio Johnson et.al. 2510.19533 null
2025-10-22 The Confusing Instance Principle for Online Linear Quadratic Control Waris Radji et.al. 2510.19531 null
2025-10-22 Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning Ruiyao Miao et.al. 2510.19530 null
2025-10-22 Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach Sebastian Reboul et.al. 2510.19528 null
2025-10-22 Practical algorithm for simulating thermal pure quantum states Wei-Bo He et.al. 2510.19504 null
2025-10-22 Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning Kevin Huang et.al. 2510.19495 null
2025-10-22 Quantum Machine Learning methods for Fourier-based distribution estimation with application in option pricing Fernando Alonso et.al. 2510.19494 null
2025-10-22 Monte Carlo study of the $O(2)$-invariant $φ^4$ theory with a cubic perturbation in three dimensions Martin Hasenbusch et.al. 2510.19473 null
2025-10-22 Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis Xueqi Ma et.al. 2510.19451 null
2025-10-22 Universal Quantitative Abstraction: Categorical Duality and Logical Completeness for Probabilistic Systems Nivar Anwer et.al. 2510.19444 null
2025-10-21 Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting Howard Chen et.al. 2510.18874 null
2025-10-21 EffiReasonTrans: RL-Optimized Reasoning for Code Translation Yanlin Wang et.al. 2510.18863 null
2025-10-21 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Ling Team et.al. 2510.18855 null
2025-10-21 Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study Nutkritta Kraipatthanapong et.al. 2510.18852 null
2025-10-21 Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning Chenghao Zhu et.al. 2510.18849 null
2025-10-21 MADR: MPC-guided Adversarial DeepReach Ryan Teoh et.al. 2510.18845 null
2025-10-21 PCMS: Parallel Coupler For Multimodel Simulations Jacob S. Merson et.al. 2510.18838 null
2025-10-21 Actor-Free Continuous Control via Structurally Maximizable Q-Functions Yigit Korkmaz et.al. 2510.18828 null
2025-10-21 Search Self-play: Pushing the Frontier of Agent Capability without Supervision Hongliang Lu et.al. 2510.18821 null
2025-10-21 Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards Mengqi Li et.al. 2510.18814 null
2025-10-21 Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity Vik Pant et.al. 2510.18802 null
2025-10-21 Two-loop QCD corrections for real and off-shell diphoton and triphoton production via quark loops Dario Kermanschah et.al. 2510.18801 null
2025-10-21 WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection Guanzhong He et.al. 2510.18798 null
2025-10-21 Beware of the running $n_s$ when producing heavy primordial black holes Sasha Allegrini et.al. 2510.18791 null
2025-10-21 Analysis note: measurement of thrust and track energy-energy correlator in $e^+e^-$ collisions at 91.2 GeV with DELPHI open data Jingyu Zhang et.al. 2510.18762 null
2025-10-21 Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation Ming Li et.al. 2510.18731 null
2025-10-21 Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options Joongkyu Lee et.al. 2510.18713 null
2025-10-21 Chemistry, Climate, and Transmission Spectra of TRAPPIST-1 e Explored with a Multimodel Sparse Sampled Ensemble Eric T. Wolf et.al. 2510.18704 null
2025-10-21 Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach Chenbei Lu et.al. 2510.18687 null
2025-10-21 Sherlock Your Queries: Learning to Ask the Right Questions for Dialogue-Based Retrieval Dong Yun et.al. 2510.18659 null
2025-10-21 An integrated neural wavefunction solver for spinful Fermi systems Alexander Avdoshkin et.al. 2510.18621 null
2025-10-21 CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent Haojia Lin et.al. 2510.18596 null
2025-10-21 Deep Q-Learning Assisted Bandwidth Reservation for Multi-Operator Time-Sensitive Vehicular Networking Abdullah Al-Khatib et.al. 2510.18553 null
2025-10-21 Improved thermonuclear rate of $^{42}$Ti($p$,$γ$)$^{43}$V and its astrophysical implication in rp-process S. Q. Hou et.al. 2510.18531 null
2025-10-21 Efficient Model-Based Reinforcement Learning for Robot Control via Online Learning Fang Nan et.al. 2510.18518 null
2025-10-21 Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models Sureyya Akin et.al. 2510.18515 null
2025-10-21 Learning to Navigate Under Imperfect Perception: Conformalised Segmentation for Safe Reinforcement Learning Daniel Bethell et.al. 2510.18485 null
2025-10-21 Safe But Not Sorry: Reducing Over-Conservatism in Safety Critics via Uncertainty-Aware Modulation Daniel Bethell et.al. 2510.18478 null
2025-10-21 CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment Xue Jiang et.al. 2510.18471 null
2025-10-21 Uncovering critical temperature dependence in Heusler magnets via explicit machine learning Jean-Baptiste Morée et.al. 2510.18469 null
2025-10-21 DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation Tong Liu et.al. 2510.18459 null
2025-10-21 Fingerprints of cluster-based Haldane and bound-magnon states in a spin-1 Heisenberg diamond chain Azam Zoshki et.al. 2510.18447 null
2025-10-21 PlanU: Large Language Model Decision Making through Planning under Uncertainty Ziwei Deng et.al. 2510.18442 null
2025-10-21 Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents Guangfu Guo et.al. 2510.18424 null
2025-10-21 On AI Verification in Open RAN Rahul Soundrarajan et.al. 2510.18417 null
2025-10-21 MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models ChangSu Choi et.al. 2510.18383 null
2025-10-21 Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback Yi-Lun Wu et.al. 2510.18353 null
2025-10-21 PGTT: Phase-Guided Terrain Traversal for Perceptive Legged Locomotion Alexandros Ntagkas et.al. 2510.18348 null
2025-10-21 Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs Jongmin Lee et.al. 2510.18340 null
2025-10-21 The implications of inflation for the last ACT Zhi-Chong Qiu et.al. 2510.18320 null
2025-10-21 MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation Chengshu Li et.al. 2510.18316 null
2025-10-21 Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task Brady Bhalla et.al. 2510.18315 null
2025-10-21 Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models Lehan Wang et.al. 2510.18303 null
2025-10-21 Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata Zhengqing Yuan et.al. 2510.18289 null
2025-10-21 From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation Ziwei Huang et.al. 2510.18263 null
2025-10-21 NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective Xiaohan Qin et.al. 2510.18258 null
2025-10-21 The Picard-Lagrange Framework for Higher-Order Langevin Monte Carlo Jaideep Mahajan et.al. 2510.18242 null
2025-10-21 Nash Policy Gradient: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria Eason Yu et.al. 2510.18183 null
2025-10-20 Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains Soumya Rani Samineni et.al. 2510.18176 null
2025-10-20 LLMs Encode How Difficult Problems Are William Lugoloobi et.al. 2510.18147 null
2025-10-20 Measuring Reasoning in LLMs: a New Dialectical Angle Soheil Abbasloo et.al. 2510.18134 null
2025-10-20 R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations Connor Mattson et.al. 2510.18085 null
2025-10-20 RL-Driven Security-Aware Resource Allocation Framework for UAV-Assisted O-RAN Zaineh Abughazzah et.al. 2510.18084 null
2025-10-20 Provably Optimal Reinforcement Learning under Safety Filtering Donggeon David Oh et.al. 2510.18082 null
2025-10-20 R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning Nadir Farhi et.al. 2510.18074 null
2025-10-20 Fine-tuning Flow Matching Generative Models with Intermediate Feedback Jiajun Fan et.al. 2510.18072 null
2025-10-20 Oxidation State Dynamics and Emerging Patterns in Magnetite Emre Gürsoy et.al. 2510.18061 null
2025-10-20 SPACeR: Self-Play Anchoring with Centralized Reference Models Wei-Jer Chang et.al. 2510.18060 null
2025-10-20 Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models Jiajun Fan et.al. 2510.18053 null
2025-10-20 OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning Zhenyu Bi et.al. 2510.18032 null
2025-10-20 Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints Junli Ren et.al. 2510.18002 null
2025-10-20 Collider Searches for Near-Continuum Dark Matter Steven Ferrante et.al. 2510.17989 null
2025-10-20 Accelerating Bayesian Inference via Multi-Fidelity Transport Map Coupling Sanjan C. Muchandimath et.al. 2510.17946 null
2025-10-20 An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation Sandro Andric et.al. 2510.17945 null
2025-10-20 UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts Fu-Yun Wang et.al. 2510.17937 null
2025-10-20 EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning He Du et.al. 2510.17928 null
2025-10-20 Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning Chenwei Tang et.al. 2510.17923 null
2025-10-20 CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections Keuntae Kim et.al. 2510.17921 null
2025-10-20 Functional Distribution Networks (FDN) Omer Haq et.al. 2510.17794 null
2025-10-20 Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains Austin Xu et.al. 2510.17793 null
2025-10-20 SoftMimic: Learning Compliant Whole-body Control from Examples Gabriel B. Margolis et.al. 2510.17792 null
2025-10-20 UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action Yuhao Yang et.al. 2510.17790 null
2025-10-20 B-Meson Anomalies: Effective Field Theory Meets Machine Learning Alejandro Mir et.al. 2510.17742 null
2025-10-20 Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations Tong Chen et.al. 2510.17733 null
2025-10-20 QueST: Incentivizing LLMs to Generate Difficult Problems Hanxu Hu et.al. 2510.17715 null
2025-10-20 The Marked Edge Walk: A Novel MCMC Algorithm for Sampling of Graph Partitions Atticus McWhorter et.al. 2510.17714 null
2025-10-20 A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning Anjie Liu et.al. 2510.17697 null
2025-10-20 Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning Xihong Su et.al. 2510.17690 null
2025-10-20 CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks Xu Zhang et.al. 2510.17687 null
2025-10-20 RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation Yuquan Xue et.al. 2510.17640 null
2025-10-20 Colour coherence in small collision systems Isobel Kolbé et.al. 2510.17570 null
2025-10-20 An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning Lindsay Spoor et.al. 2510.17564 null
2025-10-20 Towards Optimal Control and Algorithmic Structure of Decompression Schedules Benjamin Marsh et.al. 2510.17551 null
2025-10-20 OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction Raghu Vamshi Hemadri et.al. 2510.17532 null
2025-10-20 Plasma Shape Control via Zero-shot Generative Reinforcement Learning Niannian Wu et.al. 2510.17531 null
2025-10-20 Toward Autonomous Neural VMC: An Energy-Variance Convergence Criterion for Quantum Systems Huan-Chen Shi et.al. 2510.17490 null
2025-10-20 Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs Paula Cordero-Encinar et.al. 2510.17472 null
2025-10-20 Estimating Orbital Parameters of Direct Imaging Exoplanet Using Neural Network Bo Liang et.al. 2510.17459 null
2025-10-20 Agentic Reinforcement Learning for Search is Unsafe Yushi Yang et.al. 2510.17431 null
2025-10-20 Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine Jiacheng Xie et.al. 2510.17402 null
2025-10-20 Finite-Time Bounds for Average-Reward Fitted Q-Iteration Jongmin Lee et.al. 2510.17391 null
2025-10-20 Inference of Deterministic Finite Automata via Q-Learning Elaheh Hosseinkhani et.al. 2510.17386 null
2025-10-20 TabR1: Taming GRPO for tabular reasoning LLMs Pengxiang Cai et.al. 2510.17385 null
2025-10-20 Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks Julen Cestero et.al. 2510.17380 null
2025-10-20 When 5G NTN Meets GNSS: Tracking GNSS Signals under Overlaid 5G Waveforms Idir Edjekouane et.al. 2510.17324 null
2025-10-20 Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling Lipeng Xie et.al. 2510.17314 null
2025-10-20 Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks Xinkai Wang et.al. 2510.17277 null
2025-10-20 Characterizing expansivity through $C^*$-algebras S. Bautista et.al. 2510.17255 null
2025-10-20 From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models Zefan Cai et.al. 2510.17247 null
2025-10-20 Deep Neural Network extraction of Unpolarized Transverse Momentum Distributions I. P. Fernando et.al. 2510.17243 null
2025-10-20 Coinvisor: An RL-Enhanced Chatbot Agent for Interactive Cryptocurrency Investment Analysis Chong Chen et.al. 2510.17235 null
2025-10-20 D2C-HRHR: Discrete Actions with Double Distributional Critics for High-Risk-High-Return Tasks Jundong Zhang et.al. 2510.17212 null
2025-10-20 Trading with the Devil: Risk and Return in Foundation Model Strategies Jinrui Zhang et.al. 2510.17165 null
2025-10-20 ALPINE: A Lightweight and Adaptive Privacy-Decision Agent Framework for Dynamic Edge Crowdsensing Guanjie Cheng et.al. 2510.17162 null
2025-10-20 GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image Yinghui Wang et.al. 2510.17157 null
2025-10-20 Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning Shantnav Agarwal et.al. 2510.17143 null
2025-10-20 Rethinking On-policy Optimization for Query Augmentation Zhichao Xu et.al. 2510.17139 null
2025-10-20 Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control Chengxiu Hua et.al. 2510.17122 null
2025-10-20 Learning to Design Soft Hands using Reward Models Xueqian Bai et.al. 2510.17086 null
2025-10-20 Consistent Zero-Shot Imitation with Contrastive Goal Inference Kathryn Wantlin et.al. 2510.17059 null

Notes:

Function added: