Agent Research Papers

Automatically Updated on 2025.10.30

Current Search Keywords: Agent, Multi-Agent, Tool Learning, Agent RL, Autonomous Agent, LLM Agent

If you would like other keywords added to the search, please let us know :)

Agent

Publish Date	Title	Authors	PDF	Code
2025-10-28	Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents	Yueqi Song et.al.	2510.24702	null
2025-10-28	AgentFold: Long-Horizon Web Agents with Proactive Context Management	Rui Ye et.al.	2510.24699	null
2025-10-28	AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis	Xuanzhong Chen et.al.	2510.24695	null
2025-10-28	Repurposing Synthetic Data for Fine-grained Search Agent Supervision	Yida Zhao et.al.	2510.24694	null
2025-10-28	OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs	Yifu Lu et.al.	2510.24663	null
2025-10-28	FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling	Zengzhuang Xu et.al.	2510.24645	null
2025-10-28	ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?	Christine Ye et.al.	2510.24591	null
2025-10-28	Affordance Representation and Recognition for Autonomous Agents	Habtom Kahsay Gidey et.al.	2510.24459	null
2025-10-28	Law in Silico: Simulating Legal Society with LLM-Based Agents	Yiding Wang et.al.	2510.24442	null
2025-10-28	Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content	Abdullah Mushtaq et.al.	2510.24438	null
2025-10-28	Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents	Juraj Mavračić et.al.	2510.24383	null
2025-10-28	Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation	Lingyue Fu et.al.	2510.24358	null
2025-10-28	Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents	María Sanz-Gómez et.al.	2510.24317	null
2025-10-28	Retrieval and Argumentation Enhanced Multi-Agent LLMs for Judgmental Forecasting	Deniz Gorur et.al.	2510.24303	null
2025-10-28	MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools	Wenhao Wang et.al.	2510.24284	null
2025-10-28	Investigating Software Aging in LLM-Generated Software Systems	César Santos et.al.	2510.24188	null
2025-10-28	BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning	Wentao Tan et.al.	2510.24161	null
2025-10-28	From Observability Data to Diagnosis: An Evolving Multi-agent System for Incident Management in Cloud Systems	Yu Luo et.al.	2510.24145	null
2025-10-28	Reinforcement Learning for Long-Horizon Multi-Turn Search Agents	Vivek Kalyan et.al.	2510.24126	null
2025-10-28	PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI	Wenbin Ding et.al.	2510.24109	null
2025-10-28	BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents	Litu Ou et.al.	2510.23458	null
2025-10-28	Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views	Anna Deichler et.al.	2510.22672	null
2025-10-27	Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy	Roham Koohestani et.al.	2510.23487	null
2025-10-27	Model Proficiency in Centralized Multi-Agent Systems: A Performance Study	Anna Guerra et.al.	2510.23447	null
2025-10-27	AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines	Abolfazl Younesi et.al.	2510.23408	null
2025-10-27	Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring	Alexandre P Uchoa et.al.	2510.23245	null
2025-10-27	Evaluation of Vision-LLMs in Surveillance Video	Pascal Benschop et.al.	2510.23190	null
2025-10-27	SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations	Shuai Huang et.al.	2510.23182	null
2025-10-27	Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI	Aryan Mathur et.al.	2510.23148	null
2025-10-27	Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs	Kai Zhuang et.al.	2510.23127	null
2025-10-27	Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning	Ran Xu et.al.	2510.23038	null
2025-10-27	P1GPT: a multi-agent LLM workflow module for multi-modal financial information analysis	Chen-Che Lu et.al.	2510.23032	null
2025-10-27	TALM: Dynamic Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation	Ming-Tung Shen et.al.	2510.23010	null
2025-10-27	CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs	Junjie Huang et.al.	2510.22986	null
2025-10-27	Language Server CLI Empowers Language Agents with Process Rewards	Yifan Zhang et.al.	2510.22907	null
2025-10-27	On Generalization in Agentic Tool Calling: CoreThink Agentic Reasoner and MAVEN Dataset	Vishvesh Bhat et.al.	2510.22898	null
2025-10-26	Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks	Jingyuan Liu et.al.	2510.22811	null
2025-10-26	Collaborative LLM Agents for C4 Software Architecture Design Automation	Kamil Szczepanik et.al.	2510.22787	null
2025-10-26	How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations	Zora Zhiruo Wang et.al.	2510.22780	null
2025-10-26	ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation	Jiali Cheng et.al.	2510.22732	null
2025-10-24	A Knowledge-Graph Translation Layer for Mission-Aware Multi-Agent Path Planning in Spatiotemporal Dynamics	Edward Holmberg et.al.	2510.21695	null
2025-10-24	AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite	Jonathan Bragg et.al.	2510.21652	null
2025-10-24	Five-loop beta function for gauge theories: computations, results and consequences	F. Herzog et.al.	2510.21624	null
2025-10-24	DeepAgent: A General Reasoning Agent with Scalable Toolsets	Xiaoxi Li et.al.	2510.21618	null
2025-10-24	Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine	Wenyi Wang et.al.	2510.21614	null
2025-10-24	Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research	Kuicai Dong et.al.	2510.21603	null
2025-10-24	EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law	Ilija Lichkovski et.al.	2510.21524	null
2025-10-24	OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields	Lisa Weijler et.al.	2510.21441	null
2025-10-24	Context Engineering for AI Agents in Open-Source Software	Seyedmoein Mohsenimofidi et.al.	2510.21413	null
2025-10-24	HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences	Zain Ul Abideen Tariq et.al.	2510.21370	null
2025-10-24	Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation	Lufan Chang et.al.	2510.21341	null
2025-10-24	Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning	Sanghyun Ahn et.al.	2510.21302	null
2025-10-24	Securing AI Agent Execution	Christoph Bühler et.al.	2510.21236	null
2025-10-24	DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services	Xiang Li et.al.	2510.21228	null
2025-10-24	DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance	Chunghyun Han et.al.	2510.21117	null
2025-10-24	Soft Instruction De-escalation Defense	Nils Philipp Walter et.al.	2510.21057	null
2025-10-24	Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding	Yuhang Zhou et.al.	2510.20176	null
2025-10-23	From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL	Ali Khosravi Kazazi et.al.	2510.21045	null
2025-10-23	AgentArcEval: An Architecture Evaluation Method for Foundation Model based Agents	Qinghua Lu et.al.	2510.21031	null
2025-10-23	Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems	Xi He et.al.	2510.20728	null
2025-10-23	C-NAV: Towards Self-Evolving Continual Object Navigation in Open World	Ming-Ming Yu et.al.	2510.20685	null
2025-10-23	Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence	Jiahao Meng et.al.	2510.20579	null
2025-10-23	EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence	Ding Zou et.al.	2510.20578	null
2025-10-23	Designing Intent Communication for Agent-Human Collaboration	Yi Li et.al.	2510.20409	null
2025-10-23	Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control	Tom Maus et.al.	2510.20408	null
2025-10-23	GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?	Chiyu Chen et.al.	2510.20333	null
2025-10-23	From Generation to Attribution: Music AI Agent Architectures for the Post-Streaming Era	Wonil Kim et.al.	2510.20276	null
2025-10-23	ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases	Ziqian Zhong et.al.	2510.20270	null
2025-10-23	Towards AI Agents for Course Instruction in Higher Education: Early Experiences from the Field	Yogesh Simmhan et.al.	2510.20255	null
2025-10-23	Automated Cloud Infrastructure-as-Code Reconciliation with AI Agents	Zhenning Yang et.al.	2510.20211	null
2025-10-23	Merge and Conquer: Evolutionarily Optimizing AI for 2048	Maggie Bai et.al.	2510.20205	null
2025-10-23	Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions	Gyuyeon Na et.al.	2510.20102	null
2025-10-22	ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering	Marianne Menglin Liu et.al.	2510.20036	null
2025-10-22	Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication	Yiming Lu et.al.	2510.19995	null
2025-10-22	A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks	Hatim Chergui et.al.	2510.19973	null
2025-10-22	Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets	Jiashi Feng et.al.	2510.19944	null
2025-10-22	Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation	Jackson Hassell et.al.	2510.19897	null
2025-10-22	Large Language Model enabled Mathematical Modeling	Guoyun Zhang et.al.	2510.19895	null
2025-10-22	Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents	Gil Pasternak et.al.	2510.19771	null
2025-10-22	Review of Tools for Zero-Code LLM Based Application Development	Priyaranjan Pattnayak et.al.	2510.19747	null
2025-10-22	Misalignment Bounty: Crowdsourcing AI Agent Misbehavior	Rustem Turtayev et.al.	2510.19738	null
2025-10-22	Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning	Gunshi Gupta et.al.	2510.19732	null
2025-10-22	Are Large Language Models Sensitive to the Motives Behind Communication?	Addison J. Wu et.al.	2510.19687	null
2025-10-22	Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism	Junfei Zhou et.al.	2510.19618	null
2025-10-22	Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1	Qianli Ma et.al.	2510.19600	null
2025-10-22	gem5 Co-Pilot: AI Assistant Agent for Architectural Design Space Exploration	Zuoming Fu et.al.	2510.19577	null
2025-10-22	AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices	Zhonghao Zhan et.al.	2510.19462	null
2025-10-22	MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration	Jia-Kai Dong et.al.	2510.19423	null
2025-10-22	ColorAgent: Building A Robust, Personalized, and Interactive OS Agent	Ning Li et.al.	2510.19386	null
2025-10-22	Nonmonotone subgradient methods based on a local descent lemma	Francisco J. Aragón-Artacho et.al.	2510.19341	null
2025-10-22	Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties	Philipp J. Schneider et.al.	2510.19299	null
2025-10-22	Trace: Securing Smart Contract Repository Against Access Control Vulnerability	Chong Chen et.al.	2510.19254	null
2025-10-22	SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets	Ziwei Wang et.al.	2510.19247	null
2025-10-22	DiSRouter: Distributed Self-Routing for LLM Selections	Hang Zheng et.al.	2510.19208	null
2025-10-22	Defending Against Prompt Injection with DataFilter	Yizhu Wang et.al.	2510.19207	null
2025-10-22	WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation	Yaoyao Qian et.al.	2510.19205	null
2025-10-21	When Your AI Agent Succumbs to Peer-Pressure: Studying Opinion-Change Dynamics of LLMs	Aliakbar Mehdizadeh et.al.	2510.19107	null
2025-10-21	Plural Voices, Single Agent: Towards Inclusive AI in Multi-User Domestic Spaces	Joydeep Chandra et.al.	2510.19008	null
2025-10-21	Search Self-play: Pushing the Frontier of Agent Capability without Supervision	Hongliang Lu et.al.	2510.18821	null
2025-10-21	WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection	Guanzhong He et.al.	2510.18798	null
2025-10-21	KAT-Coder Technical Report	Zizheng Zhan et.al.	2510.18779	null
2025-10-21	Fetch.ai: An Architecture for Modern Multi-Agent Systems	Michael J. Wooldridge et.al.	2510.18699	null
2025-10-21	Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications	Zhuohang Bian et.al.	2510.18586	null
2025-10-21	WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality	Chunyang Li et.al.	2510.18560	null
2025-10-21	SOCIA-Nabla: Textual Gradient Meets Multi-Agent Orchestration for Automated Simulator Generation	Yuncheng Hua et.al.	2510.18551	null
2025-10-21	JAUNT: Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing	Enhan Li et.al.	2510.18550	null
2025-10-21	EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval	Zebin Yang et.al.	2510.18546	null
2025-10-21	Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models	Sureyya Akin et.al.	2510.18515	null
2025-10-21	Crucible: Quantifying the Potential of Control Algorithms through LLM Agents	Lianchen Jia et.al.	2510.18491	null
2025-10-21	LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources	Haichao Ji et.al.	2510.18477	null
2025-10-21	Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents	Feifan Xia et.al.	2510.18476	null
2025-10-21	Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents	Guangfu Guo et.al.	2510.18424	null
2025-10-21	Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games	Runnan Qi et.al.	2510.18395	null
2025-10-21	MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models	ChangSu Choi et.al.	2510.18383	null
2025-10-21	InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration	Yunkun Wang et.al.	2510.18327	null
2025-10-21	Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning	Aaron Bell et.al.	2510.18318	null
2025-10-21	Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming	Zheng Zhang et.al.	2510.18314	null
2025-10-21	Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata	Zhengqing Yuan et.al.	2510.18289	null
2025-10-21	Optimal allocations with distortion risk measures and mixed risk attitudes	Mario Ghossoub et.al.	2510.18236	null
2025-10-21	Applying voxel-based analysis to oropharyngeal cancer proton therapy patients: a correlation study on radiation-induced acute dysphagia	Qianxia Wang et.al.	2510.18210	null
2025-10-21	Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning	Rui Jerry Huang et.al.	2510.18179	null
2025-10-21	NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?	Jierui Peng et.al.	2510.16263	null
2025-10-21	SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection	Yang Feng et.al.	2510.16219	null
2025-10-21	PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold	Yi Wan et.al.	2510.15862	null
2025-10-21	FinAI Data Assistant: LLM-based Financial Database Query Processing with the OpenAI Function Calling API	Juhyeong Kim et.al.	2510.14162	null
2025-10-21	A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning	Qianben Chen et.al.	2510.12838	null
2025-10-20	AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI	Manik Rana et.al.	2510.18170	null
2025-10-20	World-in-World: World Models in a Closed-Loop World	Jiahan Zhang et.al.	2510.18135	null
2025-10-20	SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving	Xiangbo Gao et.al.	2510.18123	null
2025-10-20	Investigating the Impact of Dark Patterns on LLM-Based Web Agents	Devin Ersoy et.al.	2510.18113	null
2025-10-20	Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment	Patricia Delafuente et.al.	2510.18112	null
2025-10-20	CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows	Joong Ho Choi et.al.	2510.18043	null
2025-10-20	OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning	Zhenyu Bi et.al.	2510.18032	null
2025-10-20	FABRIC: Framework for Agent-Based Realistic Intelligence Creation	Abhigya Verma et.al.	2510.17995	null
2025-10-20	PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits	Neeladri Bhuiya et.al.	2510.17947	null
2025-10-20	Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics	Akshara Prabhakar et.al.	2510.17797	null
2025-10-20	Executable Knowledge Graphs for Replicating AI Research	Yujie Luo et.al.	2510.17795	null
2025-10-20	A Mimamsa Inspired Framework For Instruction Sequencing In AI Agents	Bama Srinivasan et.al.	2510.17691	null
2025-10-20	ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling	Shuyuan Zhang et.al.	2510.17603	null
2025-10-20	MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning	Mir Nafis Sharear Shopnil et.al.	2510.17590	null
2025-10-20	Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs	Francesco Balassone et.al.	2510.17521	null
2025-10-20	Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents	Yihong Tang et.al.	2510.17491	null
2025-10-20	Agentic Reinforcement Learning for Search is Unsafe	Yushi Yang et.al.	2510.17431	null
2025-10-20	Diverse Planning with Simulators via Linear Temporal Logic	Mustafa F. Abdelwahed et.al.	2510.17418	null
2025-10-20	Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems	Rishi Jha et.al.	2510.17276	null
2025-10-20	Coinvisor: An RL-Enhanced Chatbot Agent for Interactive Cryptocurrency Investment Analysis	Chong Chen et.al.	2510.17235	null
2025-10-20	ALPINE: A Lightweight and Adaptive Privacy-Decision Agent Framework for Dynamic Edge Crowdsensing	Guanjie Cheng et.al.	2510.17162	null
2025-10-20	Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning	Shantnav Agarwal et.al.	2510.17143	null
2025-10-20	Do LLMs Recognize Your Latent Preferences? A Benchmark for Latent Information Discovery in Personalized Interaction	Ioannis Tsaknakis et.al.	2510.17132	null
2025-10-20	Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents	Wenbing Tang et.al.	2510.17129	null
2025-10-20	Verification-Aware Planning for Multi-Agent Systems	Tianyang Xu et.al.	2510.17109	null
2025-10-20	Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models	Elias Hossain et.al.	2510.17098	null
2025-10-20	A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation	Rongbin Li et.al.	2510.17064	null
2025-10-20	Consistent Zero-Shot Imitation with Contrastive Goal Inference	Kathryn Wantlin et.al.	2510.17059	null
2025-10-20	Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks	Trilok Padhi et.al.	2510.14207	null
2025-10-19	ToolCritic: Detecting and Correcting Tool-Use Errors in Dialogue Systems	Hassan Hamad et.al.	2510.17052	null
2025-10-19	ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI	Eleftherios Tzanis et.al.	2510.17004	null
2025-10-19	EEschematic: Multimodal-LLM Based AI Agent for Schematic Generation of Analog Circuit	Chang Liu et.al.	2510.17002	null
2025-10-19	STARK: Strategic Team of Agents for Refining Kernels	Juncheng Dong et.al.	2510.16996	null
2025-10-19	Towards Interpretable and Trustworthy Time Series Reasoning: A BlueSky Vision	Kanghui Ning et.al.	2510.16980	null
2025-10-19	Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents	Dheeraj Chintapalli et.al.	2510.16978	null
2025-10-19	Learning Ecology with VERA Using Conceptual Models and Simulations	Spencer Rugaber et.al.	2510.16944	null
2025-10-19	VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents	Kangrui Wang et.al.	2510.16907	null
2025-10-19	Agentic Inequality	Matthew Sharp et.al.	2510.16853	null
2025-10-19	FinSight: Towards Real-World Financial Deep Research	Jiajie Jin et.al.	2510.16844	null
2025-10-19	More with Less: An Empirical Study of Turn-Control Strategies for Efficient Coding Agents	Pengfei Gao et.al.	2510.16786	null
2025-10-19	Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI	Jitao Sang et.al.	2510.16720	null
2025-10-19	An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems	Ni Zhang et.al.	2510.16701	null
2025-10-19	Pursuing Minimal Sufficiency in Spatial Reasoning	Yejie Guo et.al.	2510.16688	null
2025-10-19	Agentic Design of Compositional Machines	Wenqian Zhang et.al.	2510.14980	null
2025-10-18	Unleashing Diverse Thinking Modes in LLMs through Multi-Agent Collaboration	Zhixuan He et.al.	2510.16645	null
2025-10-18	Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis	Wonduk Seo et.al.	2510.16635	null
2025-10-18	Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods	Avrim Blum et.al.	2510.16609	null
2025-10-18	Ripple Effect Protocol: Coordinating Agent Populations	Ayush Chopra et.al.	2510.16572	null
2025-10-18	BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction	Tian Xia et.al.	2510.16559	null
2025-10-18	Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety	Vamshi Krishna Bonagiri et.al.	2510.16492	null
2025-10-18	REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting	Changyue Shi et.al.	2510.16410	null
2025-10-18	ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents	David Peer et.al.	2510.16381	null
2025-10-18	Synergizing chemical and AI communities for advancing laboratories of the future	Saejin Oh et.al.	2510.16293	null
2025-10-17	Outraged AI: Large language models prioritise emotion over cost in fairness enforcement	Hao Liu et.al.	2510.17880	null
2025-10-17	WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale	Yuxuan Lu et.al.	2510.16252	null
2025-10-17	Towards Automatic Evaluation and Selection of PHI De-identification Models via Multi-Agent Collaboration	Guanchen Wu et.al.	2510.16194	null
2025-10-17	Agentic AI for Ultra-Modern Networks: Multi-Agent Framework for RAN Autonomy and Assurance	Sukhdeep Singh et.al.	2510.16144	null
2025-10-17	Narrowing Action Choices with AI Improves Human Sequential Decisions	Eleni Straitouri et.al.	2510.16097	null
2025-10-17	TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration	Kerem Delikoyun et.al.	2510.16080	null
2025-10-17	EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle	Rong Wu et.al.	2510.16079	null
2025-10-17	SIADAFIX: issue description response for adaptive program repair	Xin Cao et.al.	2510.16059	null
2025-10-17	PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction	Simon Yu et.al.	2510.15863	null
2025-10-17	Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL	Richard M. Bailey et.al.	2510.15772	null
2025-10-17	AURA: An Agent Autonomy Risk Assessment Framework	Lorenzo Satta Chiris et.al.	2510.15739	null
2025-10-17	Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation	Ed Li et.al.	2510.15624	null
2025-10-17	The Spark Effect: On Engineering Creative Diversity in Multi-Agent AI Systems	Alexander Doudkin et.al.	2510.15568	null
2025-10-17	MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games	Huining Yuan et.al.	2510.15414	null
2025-10-17	SHARE: Scene-Human Aligned Reconstruction	Joshua Li et.al.	2510.15342	null
2025-10-17	VERA-MH Concept Paper	Luca Belli et.al.	2510.15297	null
2025-10-17	Exemplar-Guided Planing: Enhanced LLM Agent for KGQA	Jingao Xu et.al.	2510.15283	null
2025-10-17	Experience-Driven Exploration for Efficient API-Free AI Agents	Chenwei Tang et.al.	2510.15259	null
2025-10-17	Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions	Xi Wang et.al.	2510.15258	null
2025-10-17	Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding	Sensen Gao et.al.	2510.15253	null
2025-10-17	Where to Search: Measure the Prior-Structured Search Space of LLM Agents	Zhuo-Yang Song et.al.	2510.14846	null
2025-10-16	GUIrilla: A Scalable Framework for Automated Desktop UI Exploration	Sofiya Garkot et.al.	2510.16051	null
2025-10-16	MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation	Gurusha Juneja et.al.	2510.15186	null
2025-10-16	Internalizing World Models via Self-Play Finetuning for Agentic RL	Shiqi Chen et.al.	2510.15047	null
2025-10-16	Generalized Dynamics Generation towards Scannable Physical World Model	Yichen Li et.al.	2510.15041	null
2025-10-16	UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos	Mingxuan Liu et.al.	2510.15018	null
2025-10-16	Data-driven Calibration Sample Selection and Forecast Combination in Electricity Price Forecasting: An Application of the ARHNN Method	Tomasz Serafin et.al.	2510.15011	null
2025-10-16	Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents	Guoqing Wang et.al.	2510.14967	null
2025-10-16	VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation	Han Zhao et.al.	2510.14902	null
2025-10-16	The Gatekeeper Knows Enough	Fikresilase Wondmeneh Abebayew et.al.	2510.14881	null
2025-10-16	LabOS: The AI-XR Co-Scientist That Sees and Works With Humans	Le Cong et.al.	2510.14861	null
2025-10-16	RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning	Jinrui Liu et.al.	2510.14828	null
2025-10-16	To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models	Eran Malach et.al.	2510.14826	null
2025-10-16	ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling	Jianghao Lin et.al.	2510.14703	null
2025-10-16	LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?	Bin Liu et.al.	2510.14700	null
2025-10-16	LLM Agents Beyond Utility: An Open-Ended Perspective	Asen Nachkov et.al.	2510.14548	null
2025-10-16	Agentic Entropy-Balanced Policy Optimization	Guanting Dong et.al.	2510.14545	null
2025-10-16	Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration	Haoyuan Li et.al.	2510.14512	null
2025-10-16	LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models	Haolin Li et.al.	2510.14466	null
2025-10-16	Towards Automated Governance: A DSL for Human-Agent Collaboration in Software Projects	Adem Ait et.al.	2510.14465	null
2025-10-16	Why Instant-Runoff Voting Is So Resilient to Coalitional Manipulation: Phase Transitions in the Perturbed Culture	François Durand et.al.	2510.14450	null
2025-10-16	Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents	Rui Wang et.al.	2510.14438	null
2025-10-16	Bounds and asymptotic expansions for the radii of convexity and uniform convexity of normalized Bessel functions	Árpád Baricz et.al.	2510.14323	null
2025-10-16	Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies	Mason Nakamura et.al.	2510.14312	null
2025-10-16	ReUseIt: Synthesizing Reusable AI Agent Workflows for Web Automation	Yimeng Liu et.al.	2510.14308	null
2025-10-16	AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading	Zheye Deng et.al.	2510.14264	null
2025-10-16	MAFA: A Multi-Agent Framework for Enterprise-Scale Annotation with Configurable Task Adaptation	Mahmood Hegazy et.al.	2510.14184	null
2025-10-16	Training LLM Agents to Empower Humans	Evan Ellis et.al.	2510.13709	null
2025-10-16	OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies	Peng Di et.al.	2510.13561	null
2025-10-16	SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding	Tanveer Hannan et.al.	2510.13016	null
2025-10-16	Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics	Marco Del Tredici et.al.	2510.12787	null
2025-10-16	Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning	Xingang Guo et.al.	2510.12712	null
2025-10-16	MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites	Zhenxin Lei et.al.	2510.12126	null
2025-10-15	When “Correct” Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?	Yibo Peng et.al.	2510.17862	null
2025-10-15	CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation	Yee Man Choi et.al.	2510.17853	null
2025-10-15	CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization	Henrique Assumpção et.al.	2510.14150	null
2025-10-15	Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems	Edoardo Allegrini et.al.	2510.14133	null
2025-10-15	Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving	Nikos Pagonas et.al.	2510.14126	null
2025-10-15	STEMS: Spatial-Temporal Enhanced Safe Multi-Agent Coordination for Building Energy Management	Huiliang Zhang et.al.	2510.14112	null
2025-10-15	Three-Dimensional Simulation of the University of Hawai`i FEL Oscillator: Superradiant Emission and Cavity Desynchronization	Amir Weinberg et.al.	2510.14061	null
2025-10-15	Sequential Quantum Measurements and the Instrumental Group Algebra	Christopher S. Jackson et.al.	2510.13980	null
2025-10-15	An LLM-Powered AI Agent Framework for Holistic IoT Traffic Interpretation	Daniel Adu Worae et.al.	2510.13925	null
2025-10-15	FACTS: Table Summarization via Offline Template Generation with Agentic Workflows	Ye Yuan et.al.	2510.13920	null
2025-10-15	Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms	Shrey Pandit et.al.	2510.13913	null
2025-10-15	RECODE: Reasoning Through Code Generation for Visual Question Answering	Junhong Shen et.al.	2510.13756	null
2025-10-15	From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails	Ravi Pandya et.al.	2510.13727	null
2025-10-15	Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module	Ruitao Feng et.al.	2510.13558	null
2025-10-15	Tandem Training for Language Models	Robert West et.al.	2510.13551	null
2025-10-15	In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers	Avihay Cohen et.al.	2510.13543	null
2025-10-15	MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation	Jiin Park et.al.	2510.13371	null
2025-10-15	Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan’s Intelligent Interaction Systems	Xuxin Cheng et.al.	2510.13291	null
2025-10-15	Automated Network Protocol Testing with LLM Agents	Yunze Wei et.al.	2510.13248	null
2025-10-15	EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems	Yufei He et.al.	2510.13220	null
2025-10-15	Addressing the alignment problem in transportation policy making: an LLM approach	Xiaoyu Yan et.al.	2510.13139	null
2025-10-14	Using Kolmogorov-Smirnov Distance for Measuring Distribution Shift in Machine Learning	Ozan K. Tonguz et.al.	2510.15996	null
2025-10-14	MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents	Dongsen Zhang et.al.	2510.15994	null
2025-10-14	Benefits and Limitations of Communication in Multi-Agent Reasoning	Michael Rizvi-Martel et.al.	2510.13903	null
2025-10-14	GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents	Xi Yu et.al.	2510.13896	null
2025-10-14	MultiFoodhat: A potential new paradigm for intelligent food quality inspection	Yue Hu et.al.	2510.13889	null
2025-10-14	Deliberate Lab: A Platform for Real-Time Human-AI Social Experiments	Crystal Qian et.al.	2510.13011	null
2025-10-14	SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents	Simon Sinong Zhan et.al.	2510.12985	null
2025-10-14	From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models	Imran Khan et.al.	2510.12864	null
2025-10-14	Three Lenses on the AI Revolution: Risk, Transformation, Continuity	Masoud Makrehchi et.al.	2510.12859	null
2025-10-14	VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage	A. Alfarano et.al.	2510.12750	null
2025-10-14	SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding	Zhiliu Yang et.al.	2510.12749	null
2025-10-14	Multi-Agent Debate for LLM Judges with Adaptive Stability Detection	Tianyu Hu et.al.	2510.12697	null
2025-10-14	ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning	Hanyang Chen et.al.	2510.12693	null
2025-10-14	Designing Tools with Control Confidence	Ajith Anil Meera et.al.	2510.12630	null
2025-10-14	A Survey of Vibe Coding with Large Language Models	Yuyao Ge et.al.	2510.12399	null
2025-10-14	GOAT: A Training Framework for Goal-Oriented Agent with Tools	Hyunji Min et.al.	2510.12218	null
2025-10-14	Agent-Based Simulation of a Financial Market with Large Language Models	Ryuji Hashimoto et.al.	2510.12189	null
2025-10-14	IL3D: A Large-Scale Indoor Layout Dataset for LLM-Driven 3D Scene Generation	Wenxu Zhou et.al.	2510.12095	null
2025-10-14	ToPolyAgent: AI Agents for Coarse-Grained Topological Polymer Simulations	Lijie Ding et.al.	2510.12091	null
2025-10-14	Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models	Rabimba Karanjai et.al.	2510.12080	null
2025-10-14	EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making	Zixing Lei et.al.	2510.12072	null
2025-10-14	AI Agents as Universal Task Solvers	Alessandro Achille et.al.	2510.12066	null
2025-10-14	Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response	Yiheng Chen et.al.	2510.12061	null
2025-10-14	On the Number of Small Points for Rational Maps	Jit Wu Yap et.al.	2510.12039	null
2025-10-14	ManiAgent: An Agentic Framework for General Robotic Manipulation	Yi Yang et.al.	2510.11660	null
2025-10-14	Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs	Yujie Zhao et.al.	2510.11062	null
2025-10-13	Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation	Sayash Kapoor et.al.	2510.11977	null
2025-10-13	Scaling Long-Horizon LLM Agent via Context-Folding	Weiwei Sun et.al.	2510.11967	null
2025-10-13	DMAS-Forge: A Framework for Transparent Deployment of AI Applications as Distributed Systems	Alessandro Cornacchia et.al.	2510.11872	null
2025-10-13	Demystifying Reinforcement Learning in Agentic Reasoning	Zhaochen Yu et.al.	2510.11701	null
2025-10-13	When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents	Lingfei Qian et.al.	2510.11695	null
2025-10-13	Chronologically Consistent Generative AI	Songrun He et.al.	2510.11677	null
2025-10-13	FinVet: A Collaborative Framework of RAG and External Fact-Checking Agents for Financial Misinformation Detection	Daniel Berhane Araya et.al.	2510.11654	null
2025-10-13	Analyzing and Internalizing Complex Policy Documents for LLM Agents	Jiateng Liu et.al.	2510.11588	null
2025-10-13	Uncertainty-Aware, Risk-Adaptive Access Control for Agentic Systems using an LLM-Judged TBAC Model	Charles Fleming et.al.	2510.11414	null
2025-10-13	DocReward: A Document Reward Model for Structuring and Stylizing	Junpeng Liu et.al.	2510.11391	null
2025-10-13	Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics	Sheng Jin et.al.	2510.11290	null
2025-10-13	PADME: Procedure Aware DynaMic Execution	Deepeka Garg et.al.	2510.11281	null
2025-10-13	A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images	Yuxuan Chen et.al.	2510.11260	null
2025-10-13	Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems	Pengyu Zhu et.al.	2510.11246	null
2025-10-13	Attacks by Content: Automated Fact-checking is an AI Security Issue	Michael Schlichtkrull et.al.	2510.11238	null
2025-10-13	WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent	Tao Li et.al.	2510.11221	null
2025-10-13	Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?	Zhengyu Chen et.al.	2510.11184	null
2025-10-13	$How^{2}$: How to learn from procedural How-to questions	Gautier Dagan et.al.	2510.11144	null
2025-10-13	video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory	Guangzhi Sun et.al.	2510.11129	null
2025-10-13	SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents	Longjie Guo et.al.	2510.11035	null
2025-10-13	A Survey on Agentic Multimodal Large Language Models	Huanjin Yao et.al.	2510.10991	null
2025-10-13	Rethinking Reward Miscalibration of GRPO in Agentic RL	Jingyu Liu et.al.	2509.23870	null
2025-10-13	EvoEmo: Towards Evolved Emotional Policies for Adversarial LLM Agents in Multi-Turn Price Negotiation	Yunbo Long et.al.	2509.04310	null
2025-10-12	Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning	Dongrong Yang et.al.	2510.11754	null
2025-10-12	GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search	Heng Zhang et.al.	2510.10581	null
2025-10-12	MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision	Hongjie Zheng et.al.	2510.10461	null
2025-10-12	Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval	Junwei Lan et.al.	2509.24869	null
2025-10-12	Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting	Saksorn Ruangtanusak et.al.	2509.00482	null
2025-10-11	KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments	Walid Abdela et.al.	2510.10325	null
2025-10-11	Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models	Christopher Chiu et.al.	2510.10278	null
2025-10-11	Don’t Just Fine-tune the Agent, Tune the Environment	Siyuan Lu et.al.	2510.10197	null
2025-10-11	ALLOY: Generating Reusable Agent Workflows from User Demonstration	Jiawen Li et.al.	2510.10049	null
2025-10-11	SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning	Ruohao Li et.al.	2510.10047	null
2025-10-11	Leveraging Large Language Models for Cybersecurity Risk Assessment – A Case from Forestry Cyber-Physical Systems	Fikret Mert Gultekin et.al.	2510.06343	null
2025-10-11	Tree Search for LLM Agent Reinforcement Learning	Yuxiang Ji et.al.	2509.21240	null
2025-10-11	ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy	Alejandro D. Mousist et.al.	2509.13380	null
2025-10-10	Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics	Lianhao Zhou et.al.	2510.09901	null
2025-10-10	How can we assess human-agent interactions? Case studies in software agent design	Valerie Chen et.al.	2510.09801	null
2025-10-10	Building a Foundational Guardrail for General Agentic Systems via Synthetic Data	Yue Huang et.al.	2510.09781	null
2025-10-10	Preference-Aware Memory Update for Long-Term LLM Agents	Haoran Sun et.al.	2510.09720	null
2025-10-10	StreamingVLM: Real-Time Understanding for Infinite Video Streams	Ruyi Xu et.al.	2510.09608	null
2025-10-10	Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols	Mikhail Terekhov et.al.	2510.09462	null
2025-10-10	Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers	Tuan Nguyen et.al.	2510.09330	null
2025-10-10	Fundamentals of Building Autonomous LLM Agents	Victor de Lamo Castrillo et.al.	2510.09244	null
2025-10-10	Leading the Follower: Learning Persuasive Agents in Social Deduction Games	Zhang Zheng et.al.	2510.09087	null
2025-10-10	When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach	Zhihan Zhang et.al.	2510.08952	null
2025-10-10	Reimagining Agent-based Modeling with Large Language Model Agents via Shachi	So Kuroki et.al.	2509.21862	null
2025-10-09	CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization	Debeshee Das et.al.	2510.08829	null
2025-10-09	COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context	Guangya Wan et.al.	2510.08790	null
2025-10-09	Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools	Ha Min Son et.al.	2510.08640	null
2025-10-09	CaRT: Teaching LLM Agents to Know When They Know Enough	Grace Liu et.al.	2510.08517	null
2025-10-09	Opponent Shaping in LLM Agents	Marta Emili Garcia Segura et.al.	2510.08255	null
2025-10-09	Simulating Teams with LLM Agents: Interactive 2D Environments for Studying Human-AI Dynamics	Mohammed Almutairi et.al.	2510.08242	null
2025-10-09	Training-Free Group Relative Policy Optimization	Yuzheng Cai et.al.	2510.08191	null
2025-10-09	AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment	Xiaochong Lan et.al.	2510.08081	null
2025-10-09	Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks	Cheng Yang et.al.	2510.08002	null
2025-10-09	Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception – Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track	Erjia Xiao et.al.	2510.07871	null
2025-10-09	Self-Improving LLM Agents at Test-Time	Emre Can Acikgoz et.al.	2510.07841	null
2025-10-09	Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models	Eric Hanchen Jiang et.al.	2510.07799	null
2025-10-09	Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics	Antonin Sulc et.al.	2509.11943	null
2025-10-08	PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction	Anubhav Shrimal et.al.	2510.08623	null
2025-10-08	L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)	Tianxiang Xu et.al.	2510.07363	null
2025-10-08	LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding	Zhivar Sourati et.al.	2510.07233	null
2025-10-08	Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping	Ziyi Wang et.al.	2510.07230	null
2025-10-08	Exposing LLM User Privacy via Traffic Fingerprint Analysis: A Study of Privacy Risks in LLM Agent Interactions	Yixiang Zhang et.al.	2510.07176	null
2025-10-08	NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents	Tianshi Zheng et.al.	2510.07172	null
2025-10-08	Prompt Optimization Across Multiple Agents for Representing Diverse Human Populations	Manh Hung Nguyen et.al.	2510.07064	null
2025-10-08	COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization	Tian Qin et.al.	2510.07043	null
2025-10-08	LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling	Zecheng Tang et.al.	2510.06915	null
2025-10-08	When Machines Meet Each Other: Network Effects and the Strategic Role of History in Multi-Agent AI	Yu Liu et.al.	2510.06903	null
2025-10-08	SID: Multi-LLM Debate Driven by Self Signals	Xuhang Chen et.al.	2510.06843	null
2025-10-08	Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management	Miao Lu et.al.	2510.06727	null
2025-10-08	WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks	Jingbo Yang et.al.	2510.06587	null
2025-10-08	Spiral of Silence in Large Language Model Agents	Mingze Zhong et.al.	2510.02360	null
2025-10-08	Toward Causal-Visual Programming: Enhancing Agentic Reasoning in Low-Code Environments	Jiexi Xu et.al.	2509.25282	null
2025-10-07	A Survey on Agentic Security: Applications, Threats and Defenses	Asif Shahriar et.al.	2510.06445	null
2025-10-07	Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents	Mingkang Zhu et.al.	2510.06214	null
2025-10-07	RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback	Chunyu Miao et.al.	2510.06186	null
2025-10-07	LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams	Aju Ani Justus et.al.	2510.06151	null
2025-10-07	Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents	Tao Zhe et.al.	2510.06078	null
2025-10-07	Training-Free Time Series Classification via In-Context Reasoning with LLM Agents	Songyuan Sui et.al.	2510.05950	null
2025-10-07	EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models	Zheyue Tan et.al.	2510.05943	null
2025-10-07	LLM-FS-Agent: A Deliberative Role-based Large Language Model Architecture for Transparent Feature Selection	Mohamed Bal-Ghaoui et.al.	2510.05935	null
2025-10-07	Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches	Hachem Madmoun et.al.	2510.05748	null
2025-10-07	AutoPentester: An LLM Agent-based Framework for Automated Pentesting	Yasod Ginige et.al.	2510.05605	null
2025-10-07	AgentDR Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents	Mingdai Yang et.al.	2510.05598	null
2025-10-07	From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions	Changyuan Zhao et.al.	2510.05596	null
2025-10-07	BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks	Sagnik Anupam et.al.	2510.02418	null
2025-10-06	Adversarial Reinforcement Learning for Large Language Model Agent Safety	Zizhao Wang et.al.	2510.05442	null
2025-10-06	A Lightweight Large Language Model-Based Multi-Agent System for 2D Frame Structural Analysis	Ziheng Geng et.al.	2510.05414	null
2025-10-06	Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents	Wenda Xie et.al.	2510.05188	null
2025-10-06	RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection	Yuxin Wen et.al.	2510.04885	null
2025-10-06	Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails	Siwei Han et.al.	2510.04860	null
2025-10-06	Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents	Yiding Wang et.al.	2510.04695	null
2025-10-06	Multi-Agent Tool-Integrated Policy Optimization	Zhanfeng Mo et.al.	2510.04678	null
2025-10-06	Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents	Zeyi Zhang et.al.	2510.04637	null
2025-10-06	Autonomy Matters: A Study on Personalization-Privacy Dilemma in LLM Agents	Zhiping Zhang et.al.	2510.04465	null
2025-10-06	Beyond Manuals and Tasks: Instance-Level Context Learning for LLM Agents	Kuntai Cai et.al.	2510.02369	null
2025-10-05	Internal World Models as Imagination Networks in Cognitive Agents	Saurabh Ranjan et.al.	2510.04391	null
2025-10-05	Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation	Hadi Nekoei et.al.	2510.04373	null
2025-10-05	Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales	Jinyang Jiang et.al.	2510.04272	null
2025-10-05	AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework	Hanchen Zhang et.al.	2510.04206	null
2025-10-05	Constructing coherent spatial memory in LLM agents through graph rectification	Puzhen Zhang et.al.	2510.04195	null
2025-10-05	From Shadow to Light: Toward Safe and Efficient Policy Learning Across MPC, DeePC, RL, and LLM Agents	Amin Vahidi-Moghaddam et.al.	2510.04076	null
2025-10-04	Adversarial Agent Collaboration for C to Rust Translation	Tianyu Li et.al.	2510.03879	null
2025-10-04	InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents	Yaxin Du et.al.	2510.02271	null
2025-10-04	Extracting Conceptual Knowledge to Locate Software Issues	Ying Wang et.al.	2509.21427	null
2025-10-03	VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation	Lesly Miculicich et.al.	2510.05156	null
2025-10-03	LLM Agents for Automated Dependency Upgrades	Vali Tawosi et.al.	2510.03480	null
2025-10-03	ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework	Vali Tawosi et.al.	2510.03463	null
2025-10-03	Improving GUI Grounding with Explicit Position-to-Coordinate Mapping	Suyuchen Wang et.al.	2510.03230	null
2025-10-03	CoDA: Agentic Systems for Collaborative Data Visualization	Zichen Chen et.al.	2510.03194	null
2025-10-03	AudioToolAgent: An Agentic Framework for Audio-Language Models	Gijs Wijngaard et.al.	2510.02995	null
2025-10-03	Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents	Wonjoong Kim et.al.	2510.02837	null
2025-10-02	AgentCaster: Reasoning-Guided Tornado Forecasting	Michael Chen et.al.	2510.03349	null
2025-10-02	Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge	Charlie Masters et.al.	2510.02557	null
2025-10-02	StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?	Yanxu Chen et.al.	2510.02209	null
2025-10-02	TACOS: Task Agnostic COordinator of a multi-drone System	Alessandro Nazzari et.al.	2510.01869	null
2025-10-02	Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets	Yannis Belkhiter et.al.	2510.01842	null
2025-10-02	GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents	Yejin Kim et.al.	2510.01664	null
2025-10-02	SoK: Measuring What Matters for Closed-Loop Security Agents	Mudita Khurana et.al.	2510.01654	null
2025-10-02	Position: Privacy Is Not Just Memorization!	Niloofar Mireshghallah et.al.	2510.01645	null
2025-10-02	GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments	Hanlin Zhu et.al.	2509.21998	null
2025-10-02	Gala: Global LLM Agents for Text-to-Model Translation	Junyang Cai et.al.	2509.08970	null
2025-10-01	Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents	Yang Liu et.al.	2510.01398	null
2025-10-01	Beyond Single LLMs: Enhanced Code Generation via Multi-Stage Performance-Guided LLM Orchestration	Huashan Chen et.al.	2510.01379	null
2025-10-01	Fine-tuning with RAG for Improving LLM Learning of New Skills	Humaid Ibrahim et.al.	2510.01375	null
2025-10-01	Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks	Shoumik Saha et.al.	2510.01359	null
2025-10-01	The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation	Zarreen Reza et.al.	2510.01295	null
2025-10-01	TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments	Zhangchen Xu et.al.	2510.01179	null
2025-10-01	Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare	Zhengliang Shi et.al.	2510.01164	null
2025-10-01	A Practitioner’s Guide to Multi-turn Agentic Reinforcement Learning	Ruiyi Wang et.al.	2510.01132	null
2025-10-01	QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL	Cong Yu et.al.	2510.00967	null
2025-10-01	ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs	Adi Simhi et.al.	2510.00857	null
2025-10-01	ACON: Optimizing Context Compression for Long-horizon LLM Agents	Minki Kang et.al.	2510.00615	null
2025-10-01	JoyAgent-JDGenie: Technical Report on the GAIA	Jiarun Liu et.al.	2510.00510	null
2025-10-01	Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation	Yiyuan Pan et.al.	2510.00441	null
2025-10-01	RELATE-Sim: Leveraging Turning Point Theory and LLM Agents to Predict and Understand Long-Term Relationship Dynamics through Interactive Narrative Simulations	Matthew Yue et.al.	2510.00414	null
2025-10-01	Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs	Siyu Zhu et.al.	2509.25779	null
2025-10-01	Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development	Yuxuan Wan et.al.	2509.25297	null
2025-10-01	Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks	Aaron Xuxiang Tian et.al.	2509.23537	null
2025-10-01	On the Soundness and Consistency of LLM Agents for Executing Test Cases Written in Natural Language	Sébastien Salva et.al.	2509.19136	null
2025-10-01	A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks	S M Asif Hossain et.al.	2509.14285	null
2025-09-30	From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization	Haoran Xi et.al.	2510.02389	null
2025-09-30	CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage	Bowen Wei et.al.	2510.00311	null
2025-09-30	Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents	Zhen Yang et.al.	2509.26539	null
2025-09-30	VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications	Wei He et.al.	2509.26490	null
2025-09-30	ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems	Junsong Pu et.al.	2509.26463	null
2025-09-30	Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents	Shuai Shao et.al.	2509.26354	null
2025-09-30	LLM Agents for Knowledge Discovery in Atomic Layer Processing	Andreas Werbrouck et.al.	2509.26201	null
2025-09-30	RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning	Gang Li et.al.	2509.25958	null
2025-09-30	Mem-α: Learning Memory Construction via Reinforcement Learning	Yu Wang et.al.	2509.25911	null
2025-09-30	SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents	Ruolin Chen et.al.	2509.25885	null
2025-09-30	Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs	Hankun Dai et.al.	2509.25873	null
2025-09-30	STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents	Jing-Jing Li et.al.	2509.25624	null
2025-09-30	MASLegalBench: Benchmarking Multi-Agent Systems in Deductive Legal Reasoning	Huihao Jing et.al.	2509.24922	null
2025-09-30	TENET: Leveraging Tests Beyond Validation for Code Generation	Yiran Hu et.al.	2509.24148	null
2025-09-30	Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems	Minsoo Kim et.al.	2509.24116	null
2025-09-30	InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios	Chenglin Yu et.al.	2509.22502	null
2025-09-30	Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents	Davide Paglieri et.al.	2509.03581	null
2025-09-30	Towards Agentic OS: An LLM Agent Framework for Linux Schedulers	Yusheng Zheng et.al.	2509.01245	null
2025-09-29	A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory	Qianshan Wei et.al.	2510.02373	null
2025-09-29	Causal Autoencoder-like Generation of Feedback Fuzzy Cognitive Maps with an LLM Agent	Akash Kumar Panda et.al.	2509.25593	null
2025-09-29	RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale	Jason Holmes et.al.	2509.25540	null
2025-09-29	Where LLM Agents Fail and How They can Learn From Failures	Kunlun Zhu et.al.	2509.25370	null
2025-09-29	Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents	Boxuan Zhang et.al.	2509.25302	null
2025-09-29	PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion	Yuyang Yin et.al.	2509.24997	null
2025-09-29	When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training	Sanxing Chen et.al.	2509.24923	null
2025-09-29	MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems	Kun Wang et.al.	2509.24323	null
2025-09-29	SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents	Gyuhyeon Seo et.al.	2509.24282	null
2025-09-28	WAREX: Web Agent Reliability Evaluation on Existing Benchmarks	Su Kara et.al.	2510.03285	null
2025-09-28	Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning	Runyu Zhang et.al.	2509.24047	null
2025-09-28	PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features	Lingyao Li et.al.	2509.24046	null
2025-09-28	LLM/Agent-as-Data-Analyst: A Survey	Zirui Tang et.al.	2509.23988	null
2025-09-28	Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation	Pengxiang Li et.al.	2509.23866	null
2025-09-28	AgentGuard: Runtime Verification of AI Agents	Roham Koohestani et.al.	2509.23864	null
2025-09-28	Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules	Chenyu Zhou et.al.	2509.23836	null
2025-09-28	FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents	Pramit Saha et.al.	2509.23803	null
2025-09-28	GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks	Cong Chen et.al.	2509.23738	null
2025-09-28	Improving the Efficiency of LLM Agent Systems through Trajectory Reduction	Yuan-An Xiao et.al.	2509.23586	null
2025-09-28	Agentic Reinforcement Learning with Implicit Step Rewards	Xiaoqian Liu et.al.	2509.19199	null
2025-09-27	Memory Management and Contextual Consistency for Long-Running Low-Code Agents	Jiexi Xu et.al.	2509.25250	null
2025-09-27	BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software	Zehua Zhang et.al.	2509.25248	null
2025-09-27	Situational Awareness for Safe and Robust Multi-Agent Interactions Under Uncertainty	Benjamin Alcorn et.al.	2509.23425	null
2025-09-27	“Shall We Dig Deeper?”: Designing and Evaluating Strategies for LLM Agents to Advance Knowledge Co-Construction in Asynchronous Online Discussions	Yuanhao Zhang et.al.	2509.23327	null
2025-09-27	Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents	Yaorui Shi et.al.	2509.23040	null
2025-09-26	Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents	Heyang Gao et.al.	2510.03253	null
2025-09-26	AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering	Ziqing Wang et.al.	2510.02328	null
2025-09-26	Infusing Theory of Mind into Socially Intelligent LLM Agents	EunJeong Hwang et.al.	2509.22887	null
2025-09-26	ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents	Hwan Chang et.al.	2509.22830	null
2025-09-26	EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning	Wujiang Xu et.al.	2509.22576	null
2025-09-26	The Emergence of Altruism in Large-Language-Model Agents Society	Haoyang Li et.al.	2509.22537	null
2025-09-26	Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents	Jiaqi Shao et.al.	2509.22391	null
2025-09-26	Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach	Ahmet Onur Akman et.al.	2509.22216	null
2025-09-26	Leveraging LLM Agents for Automated Video Game Testing	Chengjia Wang et.al.	2509.22170	null
2025-09-26	CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration	Zhimin Wang et.al.	2509.21981	null
2025-09-26	What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness	Yuxuan Li et.al.	2509.21868	null
2025-09-26	UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios	Haotian Luo et.al.	2509.21766	null
2025-09-26	JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer	Zhichao Shi et.al.	2509.02097	null
2025-09-25	LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?	Lu Sun et.al.	2509.21501	null
2025-09-25	What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns	Stefan Szeider et.al.	2509.21224	null
2025-09-25	CORE: Full-Path Evaluation of LLM Agents Beyond Final State	Panagiotis Michelakis et.al.	2509.20998	null
2025-09-25	LIMI: Less is More for Agency	Yang Xiao et.al.	2509.17567	null
2025-09-24	EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis	Mohammad Hossein Samaei et.al.	2510.00024	null
2025-09-24	Blueprint-Bench: Comparing spatial intelligence of LLMs, agents and image models	Lukas Petersson et.al.	2509.25229	null
2025-09-24	LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet?	Rushil Gupta et.al.	2509.21403	null
2025-09-24	Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning	Hanjiang Hu et.al.	2509.20616	null
2025-09-24	SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection	Yubin Ge et.al.	2509.20562	null
2025-09-24	Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation	Yiren Liu et.al.	2509.20553	null
2025-09-24	Agentic Metacognition: Designing a “Self-Aware” Low-Code Agent for Failure Prediction and Human Handoff	Jiexi Xu et.al.	2509.19783	null
2025-09-23	Structured Cognition for Behavioral Intelligence in Large Language Model Agents: Preliminary Study	Myung Ho Kim et.al.	2510.05107	null
2025-09-23	The Heterogeneous Multi-Agent Challenge	Charles Dansereau et.al.	2509.19512	null
2025-09-23	Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data	Elisa Composta et.al.	2509.18985	null
2025-09-23	MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service	Yizhe Huang et.al.	2509.18713	null
2025-09-23	LCMF: Lightweight Cross-Modality Mambaformer for Embodied Robotics VQA	Zeyi Kang et.al.	2509.18576	null
2025-09-23	LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs	Tom Pawelek et.al.	2509.18557	null
2025-09-23	LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology	Renan Souza et.al.	2509.13978	null
2025-09-22	ARK-V1: An LLM-Agent for Knowledge Graph Question Answering Requiring Commonsense Reasoning	Jan-Felix Klein et.al.	2509.18063	null
2025-09-22	Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration	Bingsheng Yao et.al.	2509.18008	null
2025-09-22	MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents	Yuzhen Lei et.al.	2509.17628	null
2025-09-22	Human vs. Agent in Task-Oriented Conversations	Zhefan Wang et.al.	2509.17619	null
2025-09-22	Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents	Shouju Wang et.al.	2509.17488	null
2025-09-22	Asteria: Semantic-Aware Cross-Region Caching for Agentic LLM Tool Access	Chaoyi Ruan et.al.	2509.17360	null
2025-09-22	UIPro: Unleashing Superior Interaction Capability For GUI Agents	Hongxin Li et.al.	2509.17328	null
2025-09-22	Generalizable End-to-End Tool-Use RL with Synthetic CodeGym	Weihua Du et.al.	2509.17325	null
2025-09-21	SignalLLM: A General-Purpose LLM Agent Framework for Automated Signal Processing	Junlong Ke et.al.	2509.17197	null
2025-09-21	LLMs as Layout Designers: A Spatial Reasoning Perspective	Sha Li et.al.	2509.16891	null
2025-09-20	Towards Transparent and Incentive-Compatible Collaboration in Decentralized LLM Multi-Agent Systems: A Blockchain-Driven Approach	Minfeng Qi et.al.	2509.16736	null
2025-09-20	OPEN-THEATRE: An Open-Source Toolkit for LLM-based Interactive Drama	Tianyang Xu et.al.	2509.16713	null
2025-09-20	Governed By Agents: A Survey On The Role Of Agentic AI In Future Computing Environments	Nauman Ali Murad et.al.	2509.16676	null
2025-09-19	Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans	Deuksin Kwon et.al.	2509.16394	null
2025-09-19	Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap	Andrew Zhu et.al.	2509.16325	null
2025-09-19	Towards Robust Visual Continual Learning with Multi-Prototype Supervision	Xiwei Liu et.al.	2509.16011	null
2025-09-19	How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages	Siyang Wu et.al.	2509.15518	null
2025-09-19	LLM Agents at the Roundtable: A Multi-Perspective and Dialectical Reasoning Framework for Essay Scoring	Jinhee Jang et.al.	2509.14834	null
2025-09-18	SecureFixAgent: A Hybrid LLM Agent for Automated Python Static Vulnerability Repair	Jugal Gajjar et.al.	2509.16275	null
2025-09-18	Diagnostics of cognitive failures in multi-agent expert systems using dynamic evaluation protocols and subsequent mutation of the processing context	Andrejs Sorstkins et.al.	2509.15366	null
2025-09-18	A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making	Xiao Wu et.al.	2509.14998	null
2025-09-18	ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning	Zihao Feng et.al.	2509.14718	null
2025-09-18	SWE-QA: Can Language Models Answer Repository-level Code Questions?	Weihan Peng et.al.	2509.14635	null
2025-09-17	Ticket-Bench: A Kickoff for Multilingual and Regionalized Agent Evaluation	Thales Sales Almeida et.al.	2509.14477	null
2025-09-17	TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits	Ziming Wei et.al.	2509.14169	null
2025-09-17	Understanding the Process of Human-AI Value Alignment	Jack McKinlay et.al.	2509.13854	null
2025-09-17	From Legacy Fortran to Portable Kokkos: An Autonomous Agentic AI Workflow	Sparsh Gupta et.al.	2509.12443	null
2025-09-17	Co-Investigator AI: The Rise of Agentic AI for Smarter, Trustworthy AML Compliance Narratives	Prathamesh Vasudeo Naik et.al.	2509.08380	null
2025-09-17	Emergent Social Dynamics of LLM Agents in the El Farol Bar Problem	Ryosuke Takata et.al.	2509.04537	null
2025-09-17	How Does Cognitive Bias Affect Large Language Models? A Case Study on the Anchoring Effect in Price Negotiation Simulations	Yoshiki Takenami et.al.	2508.21137	null
2025-09-16	Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents	Abhishek Goswami et.al.	2509.13597	null
2025-09-16	AI Agents with Human-Like Collaborative Tools: Adaptive Strategies for Enhanced Problem-Solving	Harper Reed et.al.	2509.13547	null
2025-09-16	An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software	Sina Gogani-Khiabani et.al.	2509.13471	null
2025-09-16	WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning	Kuan Li et.al.	2509.13305	null
2025-09-16	Agentic AI for Financial Crime Compliance	Henrik Axelsen et.al.	2509.13137	null
2025-09-16	Toward PDDL Planning Copilot	Yarin Benyamin et.al.	2509.12987	null
2025-09-16	H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents	Shicheng Ye et.al.	2509.12810	null
2025-09-16	Agentic Lybic: Multi-Agent Execution System with Tiered Reasoning and Orchestration	Liangxuan Guo et.al.	2509.11067	null
2025-09-16	PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance	Mengxiao Wang et.al.	2508.20890	null
2025-09-16	Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning	Antonio Guillen-Perez et.al.	2508.18397	null
2025-09-16	Enhancing LLM-Based Social Bot via an Adversarial Learning Framework	Fanqi Kong et.al.	2508.17711	null
2025-09-15	Emotions are Recognized Patterns of Cognitive Activities	Yue Jin et.al.	2509.16232	null
2025-09-15	Redefining Website Fingerprinting Attacks With Multiagent LLMs	Chuxu Song et.al.	2509.12462	null
2025-09-15	Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm	Alireza Mohamadi et.al.	2509.12190	null
2025-09-15	VisDocSketcher: Towards Scalable Visual Documentation with Agentic Systems	Luís F. Gomes et.al.	2509.11942	null
2025-09-15	$ε$-Optimal Multi-Agent Patrol using Recurrent Strategy	Deepak Mallya et.al.	2509.11640	null
2025-09-15	Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as Tools	Prerna Agarwal et.al.	2509.11626	null
2025-09-15	MedicalOS: An LLM Agent based Operating System for Digital Healthcare	Jared Zhu et.al.	2509.11507	null
2025-09-14	Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning	Anis Koubaa et.al.	2509.13352	null
2025-09-14	Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble	Bingchen Wang et.al.	2509.11311	null
2025-09-14	Free-MAD: Consensus-Free Multi-Agent Debate	Yu Cui et.al.	2509.11035	null
2025-09-12	FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering	Gyubok Lee et.al.	2509.19319	null
2025-09-12	V-Math: An Agentic Approach to the Vietnamese National High School Graduation Mathematics Exams	Duong Q. Nguyen et.al.	2509.12251	null
2025-09-12	Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight	Jingyu Tang et.al.	2509.10723	null
2025-09-12	Self-Supervised Goal-Reaching Results in Multi-Agent Cooperation and Exploration	Chirayu Nimonkar et.al.	2509.10656	null
2025-09-12	SciML Agents: Write the Solver, Not the Solution	Saarth Gaonkar et.al.	2509.09936	null
2025-09-12	Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building	Alexandra Fetsch et.al.	2509.09906	null
2025-09-12	Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining	Crystal Qian et.al.	2509.09071	null
2025-09-11	TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes	Jialiang Huang et.al.	2509.09525	null
2025-09-11	Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning	Abdel Hakim Drid et.al.	2509.09356	null
2025-09-11	Flip Co-op: Cooperative Takeovers in Shared Autonomy	Sandeep Banik et.al.	2509.09281	null
2025-09-11	Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents	Jiawei Wang et.al.	2509.09265	null
2025-09-11	Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions	Qinnan Hu et.al.	2509.09215	null
2025-09-10	HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets	Ying Yuan et.al.	2509.09740	null
2025-09-10	AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning	Zhiheng Xi et.al.	2509.08755	null
2025-09-10	Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations	Ron F. Del Rosario et.al.	2509.08646	null
2025-09-10	AutoODD: Agentic Audits via Bayesian Red Teaming in Black-Box Models	Rebecca Martin et.al.	2509.08638	null
2025-09-09	Multi Robot Coordination in Highly Dynamic Environments: Tackling Asymmetric Obstacles and Limited Communication	Vincenzo Suriani et.al.	2509.08859	null
2025-09-09	EnvX: Agentize Everything with Agentic AI	Linyao Chen et.al.	2509.08088	null
2025-09-09	Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees	Katsuaki Nakano et.al.	2509.07939	null
2025-09-09	Getting In Contract with Large Language Models – An Agency Theory Perspective On Large Language Model Alignment	Sascha Kaltenpoth et.al.	2509.07642	null
2025-09-09	Astra: A Multi-Agent System for GPU Kernel Performance Optimization	Anjiang Wei et.al.	2509.07506	null
2025-09-09	Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents	Sankalp Tattwadarshi Swain et.al.	2509.07389	null
2025-09-09	Autonomous Code Evolution Meets NP-Completeness	Cunxi Yu et.al.	2509.07367	null
2025-09-09	CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation	Alyssa Unell et.al.	2509.07325	null
2025-09-08	AxelSMOTE: An Agent-Based Oversampling Algorithm for Imbalanced Classification	Sukumar Kishanthan et.al.	2509.06875	null
2025-09-08	RAFFLES: Reasoning-based Attribution of Faults for LLM Systems	Chenyang Zhu et.al.	2509.06822	null
2025-09-08	Reinforcement Learning Foundations for Deep Research Systems: A Survey	Wenjun Li et.al.	2509.06733	null
2025-09-08	REMI: A Novel Causal Schema Memory Architecture for Personalized Lifestyle Recommendation Agents	Vishal Raman et.al.	2509.06269	null
2025-09-08	TalkToAgent: A Human-centric Explanation of Reinforcement Learning Agents with Large Language Models	Haechang Kim et.al.	2509.04809	null
2025-09-08	Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent	Chunlong Wu et.al.	2509.03990	null
2025-09-07	From Digital Distrust to Codified Honesty: Experimental Evidence on Generative AI in Credence Goods Markets	Alexander Erlei et.al.	2509.06069	null
2025-09-07	Let’s Roleplay: Examining LLM Alignment in Collaborative Dialogues	Abhijnan Nath et.al.	2509.05882	null
2025-09-06	DRF: LLM-AGENT Dynamic Reputation Filtering Framework	Yuwei Lou et.al.	2509.05764	null
2025-09-05	Internet 3.0: Architecture for a Web-of-Agents with it’s Algorithm for Ranking Agents	Rajesh Tembarai Krishnamachari et.al.	2509.04979	null
2025-09-05	OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration	Jusheng Zhang et.al.	2509.04876	null
2025-09-05	UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning	Haoming Wang et.al.	2509.02544	null
2025-09-04	Maestro: Joint Graph & Config Optimization for Reliable AI Agents	Wenxiao Wang et.al.	2509.04642	null
2025-09-04	Psychologically Enhanced AI Agents	Maciej Besta et.al.	2509.04343	null
2025-09-04	Are LLM Agents the New RPA? A Comparative Study with RPA Across Enterprise Workflows	Petr Průcha et.al.	2509.04198	null
2025-09-04	MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions	Aishik Mandal et.al.	2509.04183	null
2025-09-04	Real-time adaptive quantum error correction by model-free multi-agent learning	Manuel Guatto et.al.	2509.03974	null
2025-09-04	FaMA: LLM-Empowered Agentic Assistant for Consumer-to-Consumer Marketplace	Yineng Yan et.al.	2509.03890	null
2025-09-04	Leveraging LLM-Based Agents for Intelligent Supply Chain Planning	Yongzhi Qi et.al.	2509.03811	null
2025-09-04	AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?	Guibin Zhang et.al.	2509.03312	null
2025-09-03	Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation	James Mooney et.al.	2509.03736	null
2025-09-02	DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence	Pranav Narayanan Venkit et.al.	2509.04499	null
2025-09-02	Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics	Matthew Russo et.al.	2509.02751	null
2025-09-02	The Landscape of Agentic Reinforcement Learning for LLMs: A Survey	Guibin Zhang et.al.	2509.02547	null
2025-09-02	Towards Agents That Know When They Don’t Know: Uncertainty as a Control Signal for Structured Reasoning	Josefa Lia Stoisser et.al.	2509.02401	null
2025-09-02	When Agents go Astray: Course-Correcting SWE Agents with PRMs	Shubham Gandhi et.al.	2509.02360	null
2025-09-01	The Need for Verification in AI-Driven Scientific Discovery	Cristina Cornelio et.al.	2509.01398	null
2025-09-01	Multi-Agent Reinforcement Learning for Task Offloading in Wireless Edge Networks	Andrea Fox et.al.	2509.01257	null
2025-09-01	ORCA: ORchestrating Causal Agent	Joanie Hayoun Chung et.al.	2508.21304	null
2025-09-01	How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench	Venkatesh Mishra et.al.	2508.20931	null
2025-09-01	Instructional Agents: LLM Agents on Automated Course Material Generation for Teaching Faculties	Huaiyuan Yao et.al.	2508.19611	null
2025-08-31	Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First	Shu Liu et.al.	2509.00997	null
2025-08-30	Inducing State Anxiety in LLM Agents Reproduces Human-Like Biases in Consumer Decision-Making	Ziv Ben-Zion et.al.	2510.06222	null
2025-08-30	Exploring Decision-Making Capabilities of LLM Agents: An Experimental Study on Jump-Jump Game	Juwu Li et.al.	2509.00483	null
2025-08-29	COCORELI: Cooperative, Compositional Reconstitution & Execution of Language Instructions	Swarnadeep Bhar et.al.	2509.04470	null
2025-08-29	ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition	Ahmed E. Helal et.al.	2509.00280	null
2025-08-29	HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution	Jinzhou Tang et.al.	2509.00189	null
2025-08-28	A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers	Ming Hu et.al.	2508.21148	null
2025-08-28	Provable Benefits of In-Tool Learning for Large Language Models	Sam Houliston et.al.	2508.20755	null
2025-08-28	rStar2-Agent: Agentic Reasoning Technical Report	Ning Shang et.al.	2508.20722	null
2025-08-28	CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics	Stefano Fumero et.al.	2508.20643	null
2025-08-28	MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers	Zhenting Wang et.al.	2508.20453	null
2025-08-28	MindGuard: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph	Zhiqiang Wang et.al.	2508.20412	null
2025-08-27	CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning	Zeyi Sun et.al.	2508.20096	null
2025-08-27	AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios	Lisa Alazraki et.al.	2508.19988	null
2025-08-27	Evaluating Language Model Reasoning about Confidential Information	Dylan Sam et.al.	2508.19980	null
2025-08-27	Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust: A Survey	Yinqiu Liu et.al.	2508.19870	null
2025-08-27	Survey of Specialized Large Language Model	Chenghan Yang et.al.	2508.19667	null
2025-08-27	CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation	Zhejing Hu et.al.	2508.19603	null
2025-08-27	Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning	Zhiwei Li et.al.	2508.19598	null
2025-08-27	Aegis: Taxonomy and Optimizations for Overcoming Agent-Environment Failures in LLM Agents	Kevin Song et.al.	2508.19504	null
2025-08-27	Interactive Graph Visualization and Teaming Recommendation in an Interdisciplinary Project’s Talent Knowledge Graph	Jiawei Xu et.al.	2508.19489	null
2025-08-26	Reliable Weak-to-Strong Monitoring of LLM Agents	Neil Kale et.al.	2508.19461	null
2025-08-26	Real-Time Model Checking for Closed-Loop Robot Reactive Planning	Christopher Chandler et.al.	2508.19186	null
2025-08-26	MATRIX: Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation	Ernest Lim et.al.	2508.19163	null
2025-08-26	A Concurrent Modular Agent: Framework for Autonomous LLM Agents	Norihiro Maruyama et.al.	2508.19042	null
2025-08-26	CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks	Qi Chai et.al.	2508.18797	null
2025-08-26	Toward Edge General Intelligence with Agentic AI and Agentification: Concepts, Technologies, and Future Directions	Ruichen Zhang et.al.	2508.18725	null
2025-08-26	FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation	Shaswata Mitra et.al.	2508.18684	null
2025-08-26	Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding	Chufan Gao et.al.	2508.18676	null
2025-08-26	Bias-Adjusted LLM Agents for Human-Like Decision-Making via Behavioral Economics	Ayato Kitadai et.al.	2508.18600	null
2025-08-26	Generative Artificial Intelligence and Agents in Research and Teaching	Jussi S. Jauhiainen et.al.	2508.16701	null
2025-08-25	Toward Generalized Autonomous Agents: A Neuro-Symbolic AI Framework for Integrating Social and Technical Support in Education	Ryan Hare et.al.	2508.18406	null
2025-08-25	The AI Data Scientist	Farkhad Akimov et.al.	2508.18113	null
2025-08-25	Memento: Fine-tuning LLM Agents without Fine-tuning LLMs	Huichi Zhou et.al.	2508.16153	null
2025-08-24	FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series	Gunjan Jalori et.al.	2508.19279	null
2025-08-24	Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents	Sameer Komoravolu et.al.	2508.17393	null
2025-08-24	From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users	Sadia Sultana Chowa et.al.	2508.17281	null
2025-08-22	AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications	Dawei Gao et.al.	2508.16279	null
2025-08-22	IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra	Heewoong Noh et.al.	2508.16112	null
2025-08-21	Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making	Yuanjun Feng et.al.	2508.15926	null
2025-08-21	End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning	Qiaoyu Zheng et.al.	2508.15746	null

Large Language Models

Publish Date	Title	Authors	PDF	Code
2025-10-29	OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning	Ziyou Hu et.al.	2510.24636	null
2025-10-28	Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance	Yujie Wei et.al.	2510.24711	null
2025-10-28	ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?	Shuqing Li et.al.	2510.24706	null
2025-10-28	Tongyi DeepResearch Technical Report	Tongyi DeepResearch Team et.al.	2510.24701	null
2025-10-28	Greedy Sampling Is Provably Efficient for RLHF	Di Wu et.al.	2510.24700	null
2025-10-28	WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking	Zhengwei Tao et.al.	2510.24697	null
2025-10-28	AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis	Xuanzhong Chen et.al.	2510.24695	null
2025-10-28	STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence	Zihan Liu et.al.	2510.24693	null
2025-10-28	Dissecting Role Cognition in Medical LLMs via Neuronal Ablation	Xun Liang et.al.	2510.24677	null
2025-10-28	Evolving Diagnostic Agents in a Virtual Clinical Environment	Pengcheng Qiu et.al.	2510.24654	null
2025-10-28	Optimizing Retrieval for RAG via Reinforced Contrastive Learning	Jiawei Zhou et.al.	2510.24652	null
2025-10-28	Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning	Nitin Rai et.al.	2510.24650	null
2025-10-28	FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling	Zengzhuang Xu et.al.	2510.24645	null
2025-10-28	Relative Scaling Laws for LLMs	William Held et.al.	2510.24626	null
2025-10-28	Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation	Snegha A et.al.	2510.24619	null
2025-10-28	Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way	Yicun Yang et.al.	2510.24605	null
2025-10-28	ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization	Guoxin Chen et.al.	2510.24592	null
2025-10-28	ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?	Christine Ye et.al.	2510.24591	null
2025-10-28	Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives	Gang Chen et.al.	2510.24551	null
2025-10-28	Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts	Seyoung Song et.al.	2510.24541	null
2025-10-28	Multi-Agent Evolve: LLM Self-Improve through Co-evolution	Yixing Chen et.al.	2510.23595	null
2025-10-28	PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection	Yusu Qian et.al.	2510.23594	null
2025-10-27	PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity	Yuqian Yuan et.al.	2510.23603	null
2025-10-27	Alita-G: Self-Evolving Generative Agent for Agent Generation	Jiahao Qiu et.al.	2510.23601	null
2025-10-27	Think Twice: Branch-and-Rethink Reasoning Reward Model	Yizhu Jiao et.al.	2510.23596	null
2025-10-27	Lightweight Robust Direct Preference Optimization	Cheol Woo Kim et.al.	2510.23590	null
2025-10-27	FARMER: Flow AutoRegressive Transformer over Pixels	Guangting Zheng et.al.	2510.23588	null
2025-10-27	A Survey of Data Agents: Emerging Paradigm or Overstated Hype?	Yizhang Zhu et.al.	2510.23587	null
2025-10-27	RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation	Yash Jangir et.al.	2510.23571	null
2025-10-27	EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT	Baoqi Pei et.al.	2510.23569	null
2025-10-27	ReCode: Unify Plan and Action for Universal Granularity Control	Zhaoyang Yu et.al.	2510.23564	null
2025-10-27	ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models	Bohan Li et.al.	2510.23558	null
2025-10-27	Minimizing Human Intervention in Online Classification	William Réveillard et.al.	2510.23557	null
2025-10-27	IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering	Jieyong Kim et.al.	2510.23536	null
2025-10-27	Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof	Uijeong Jang et.al.	2510.23513	null
2025-10-27	Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model	Weizheng Wang et.al.	2510.23509	null
2025-10-27	Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier	Hyeongseop Rha et.al.	2510.23506	null
2025-10-27	VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation	Walid Bousselham et.al.	2510.23497	null
2025-10-27	Learning the PTM Code through a Coarse-to-Fine, Mechanism-Aware Framework	Jingjie Zhang et.al.	2510.23492	null
2025-10-27	Learning to Reason Efficiently with Discounted Reinforcement Learning	Alex Ayoub et.al.	2510.23486	null
2025-10-24	A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection	Gaku Morio et.al.	2510.21679	null
2025-10-24	A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance	Yabo Yin et.al.	2510.21671	null
2025-10-24	The Universal Landscape of Human Reasoning	Qiguang Chen et.al.	2510.21623	null
2025-10-24	Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine	Wenyi Wang et.al.	2510.21614	null
2025-10-24	Modest-Align: Data-Efficient Alignment for Vision-Language Models	Jiaxiang Liu et.al.	2510.21606	null
2025-10-24	RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models	Xueyuan Lin et.al.	2510.21604	null
2025-10-24	From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene	Mojca Brglez et.al.	2510.21575	null
2025-10-24	ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem	Fangwen Wu et.al.	2510.21566	null
2025-10-24	Are the LLMs Capable of Maintaining at Least the Language Genus?	Sandra Mitrović et.al.	2510.21561	null
2025-10-24	EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law	Ilija Lichkovski et.al.	2510.21524	null
2025-10-24	Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models	Omer Moussa et.al.	2510.21520	null
2025-10-24	Head Pursuit: Probing Attention Specialization in Multimodal Transformers	Lorenzo Basile et.al.	2510.21518	null
2025-10-24	Wisdom and Delusion of LLM Ensembles for Code Generation and Repair	Fernando Vallecillos Ruiz et.al.	2510.21513	null
2025-10-24	Actionable Cybersecurity Notifications for Smart Homes: A User Study on the Role of Length and Complexity	Victor Jüttner et.al.	2510.21508	null
2025-10-24	MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization	Chenglong Wang et.al.	2510.21473	null
2025-10-24	Risk Management for Mitigating Benchmark Failure Modes: BenchRisk	Sean McGregor et.al.	2510.21460	null
2025-10-24	SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots	Adetayo Adebimpe et.al.	2510.21459	null
2025-10-24	ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models	Federico Danieli et.al.	2510.21450	null
2025-10-24	MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection	Shengtian Yang et.al.	2510.21449	null
2025-10-24	REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring	Thanh Cong Ho et.al.	2510.21445	null
2025-10-23	KL-Regularized Reinforcement Learning is Designed to Mode Collapse	Anthony GX-Chen et.al.	2510.20817	null
2025-10-23	Generative Reasoning Recommendation via LLMs	Minjie Hong et.al.	2510.20815	null
2025-10-23	Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation	Yuhan Liu et.al.	2510.20812	null
2025-10-23	On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?	Mingmeng Geng et.al.	2510.20810	null
2025-10-23	Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers	Dean L Slack et.al.	2510.20807	null
2025-10-23	ARGenSeg: Image Segmentation with Autoregressive Image Generation Model	Xiaolong Wang et.al.	2510.20803	null
2025-10-23	Simple Context Compression: Mean-Pooling and Multi-Ratio Training	Yair Feldman et.al.	2510.20797	null
2025-10-23	A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text	Alicia Sagae et.al.	2510.20782	null
2025-10-23	RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines	Austin Jia et.al.	2510.20768	null
2025-10-23	Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations	Lorenzo Stacchio et.al.	2510.20743	null
2025-10-23	Learning to Triage Taint Flows Reported by Dynamic Program Analysis in Node.js Packages	Ronghao Ni et.al.	2510.20739	null
2025-10-23	Automated Extraction of Fluoropyrimidine Treatment and Treatment-Related Toxicities from Clinical Notes Using Natural Language Processing	Xizhi Wu et.al.	2510.20727	null
2025-10-23	User Perceptions of Privacy and Helpfulness in LLM Responses to Privacy-Sensitive Scenarios	Xiaoyuan Wu et.al.	2510.20721	null
2025-10-23	Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models	Xuyang Liu et.al.	2510.20707	null
2025-10-23	Structure-Conditional Minimum Bayes Risk Decoding	Bryan Eikema et.al.	2510.20700	null
2025-10-23	Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward	Jing Bi et.al.	2510.20696	null
2025-10-23	Exploring Large Language Models for Access Control Policy Synthesis and Summarization	Adarsh Vatsa et.al.	2510.20692	null
2025-10-23	Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs	Yanlin Song et.al.	2510.20691	null
2025-10-23	Neural Diversity Regularizes Hallucinations in Small Models	Kushal Chakrabarti et.al.	2510.20690	null
2025-10-23	Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts	Mariona Jaramillo-Civill et.al.	2510.20666	null
2025-10-23	Zhyper: Factorized Hypernetworks for Conditioned LLM Fine-Tuning	M. H. I. Abdalla et.al.	2510.19733	null
2025-10-23	Fast Inference via Hierarchical Speculative Decoding	Clara Mohri et.al.	2510.19705	null
2025-10-22	Semantic World Models	Jacob Berg et.al.	2510.19818	null
2025-10-22	olmOCR 2: Unit Test Rewards for Document OCR	Jake Poznanski et.al.	2510.19817	null
2025-10-22	Hubble: a Model Suite to Advance the Study of LLM Memorization	Johnny Tian-Zheng Wei et.al.	2510.19811	null
2025-10-22	Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning	Xichen Zhang et.al.	2510.19807	null
2025-10-22	The Art of Asking: Multilingual Prompt Optimization for Synthetic Data	David Mora et.al.	2510.19806	null
2025-10-22	Forbidden Sidon subsets of perfect difference sets, featuring a human-assisted proof	Boris Alexeev et.al.	2510.19804	null
2025-10-22	Class-Aware Prototype Learning with Negative Contrast for Test-Time Adaptation of Vision-Language Models	Xiaozhen Qiao et.al.	2510.19802	null
2025-10-22	The Feasibility of Training Sovereign Language Models in the Global South: A Study of Brazil and Mexico	Sandra Malagon et.al.	2510.19801	null
2025-10-22	Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop: A Case of Nonprofit Program Evaluation	Ji Ma et.al.	2510.19799	null
2025-10-22	Blackbox Model Provenance via Palimpsestic Membership Inference	Rohith Kuditipudi et.al.	2510.19796	null
2025-10-22	On Controlled Change: Generative AI’s Impact on Professional Authority in Journalism	Tomás Dodds et.al.	2510.19792	null
2025-10-22	ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers	Saptarshi Sengupta et.al.	2510.19791	null
2025-10-22	AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders	Yuezhou Hu et.al.	2510.19779	null
2025-10-22	The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models	Euodia Dodd et.al.	2510.19773	null
2025-10-22	SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration	Xichen Zhang et.al.	2510.19767	null
2025-10-22	Top-P Masking for Cross Language Information Retrieval	Joseph Casale et.al.	2510.19758	null
2025-10-22	Review of Tools for Zero-Code LLM Based Application Development	Priyaranjan Pattnayak et.al.	2510.19747	null
2025-10-22	RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models	Yang Yang et.al.	2510.19698	null
2025-10-22	Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs	Haochen Wang et.al.	2510.18876	null
2025-10-21	Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting	Howard Chen et.al.	2510.18874	null
2025-10-21	DSI-Bench: A Benchmark for Dynamic Spatial Intelligence	Ziang Zhang et.al.	2510.18873	null
2025-10-21	How Do LLMs Use Their Depth?	Akshat Gupta et.al.	2510.18871	null
2025-10-21	LightMem: Lightweight and Efficient Memory-Augmented Generation	Jizhan Fang et.al.	2510.18866	null
2025-10-21	EffiReasonTrans: RL-Optimized Reasoning for Code Translation	Yanlin Wang et.al.	2510.18863	null
2025-10-21	Streamlining Acceptance Test Generation for Mobile Applications Through Large Language Models: An Industrial Case Study	Pedro Luís Fonseca et.al.	2510.18861	null
2025-10-21	An Encoder-Decoder Foundation Chemical Language Model for Generative Polymer Design	Harikrishna Sahu et.al.	2510.18860	null
2025-10-21	Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning	Chenghao Zhu et.al.	2510.18849	null
2025-10-21	See the Text: From Tokenization to Visual Reading	Ling Xing et.al.	2510.18840	null
2025-10-21	FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning	Yubin Zheng et.al.	2510.18837	null
2025-10-21	MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training	Wenxuan Li et.al.	2510.18830	null
2025-10-21	Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework	Yujie Xing et.al.	2510.18825	null
2025-10-21	Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring	Shuxin Lin et.al.	2510.18817	null
2025-10-21	Integrating Large Language Models and Evaluating Student Outcomes in an Introductory Computer Science Course	Annapurna Vadaparty et.al.	2510.18806	null
2025-10-21	FeClustRE: Hierarchical Clustering and Semantic Tagging of App Features from User Reviews	Max Tiessler et.al.	2510.18799	null
2025-10-21	ShaRE your Data! Characterizing Datasets for LLM-based Requirements Engineering	Quim Motger et.al.	2510.18787	null
2025-10-21	KAT-Coder Technical Report	Zizheng Zhan et.al.	2510.18779	null
2025-10-21	Seg the HAB: Language-Guided Geospatial Algae Bloom Reasoning and Segmentation	Patterson Hsieh et.al.	2510.18751	null
2025-10-21	Topoformer: brain-like topographic organization in Transformer language models through spatial querying and reweighting	Taha Binhuraib et.al.	2510.18745	null
2025-10-21	Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation	Ming Li et.al.	2510.18731	null
2025-10-21	HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models	Sidhant Narula et.al.	2510.18728	null
2025-10-21	IF-VidCap: Can Video Caption Models Follow Instructions?	Shihao Li et.al.	2510.18726	null
2025-10-21	SemiAdapt and SemiLoRA: Efficient Domain Adaptation for Transformer-based Low-Resource Language Translation with a Case Study on Irish	Josh McGiff et.al.	2510.18725	null
2025-10-21	SSD: Spatial-Semantic Head Decoupling for Efficient Autoregressive Image Generation	Siyong Jian et.al.	2510.18716	null
2025-10-21	Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options	Joongkyu Lee et.al.	2510.18713	null
2025-10-21	Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents	Yiqi Lin et.al.	2510.18703	null
2025-10-21	UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation	Yibin Wang et.al.	2510.18701	null
2025-10-21	MLMA: Towards Multilingual with Mamba Based Architectures	Mohamed Nabih Ali et.al.	2510.18684	null
2025-10-21	Exploring Membership Inference Vulnerabilities in Clinical Large Language Models	Alexander Nemecek et.al.	2510.18674	null
2025-10-21	Reasoning Language Model Inference Serving Unveiled: An Empirical Study	Qi Li et.al.	2510.18672	null
2025-10-21	Hardness of Learning Regular Languages in the Next Symbol Prediction Setting	Satwik Bhattamishra et.al.	2510.18634	null
2025-10-21	Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views	Zhangquan Chen et.al.	2510.18632	null
2025-10-21	VAR: Visual Attention Reasoning via Structured Search and Backtracking	Wei Cai et.al.	2510.18619	null
2025-10-21	Evaluating Large Language Models in detecting Secrets in Android Apps	Marco Alecci et.al.	2510.18601	null
2025-10-21	CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent	Haojia Lin et.al.	2510.18596	null
2025-10-21	Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications	Zhuohang Bian et.al.	2510.18586	null
2025-10-21	CLASP: Cost-Optimized LLM-based Agentic System for Phishing Detection	Fouad Trad et.al.	2510.18585	null
2025-10-21	CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder	Yongmin Lee et.al.	2510.18583	null
2025-10-21	The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability	Zijie Xu et.al.	2510.18563	null
2025-10-21	Large language models for folktale type automation based on motifs: Cinderella case study	Tjaša Arčon et.al.	2510.18561	null
2025-10-21	Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency	Svetlana Maslenkova et.al.	2510.18556	null
2025-10-21	JAUNT: Joint Alignment of User Intent and Network State for QoE-centric LLM Tool Routing	Enhan Li et.al.	2510.18550	null
2025-10-21	EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval	Zebin Yang et.al.	2510.18546	null
2025-10-21	SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices	Pan Zhou et.al.	2510.18544	null
2025-10-21	Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification	Bin Gu et.al.	2510.18533	null
2025-10-21	LLMs as Sparse Retrievers: A Framework for First-Stage Product Search	Hongru Song et.al.	2510.18527	null
2025-10-21	Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models	Hanze Guo et.al.	2510.18526	null
2025-10-21	From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing	Yushu Zhao et.al.	2510.18525	null
2025-10-21	Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models	Sureyya Akin et.al.	2510.18515	null
2025-10-21	Identity-Aware Large Language Models require Cultural Reasoning	Alistair Plum et.al.	2510.18510	null
2025-10-21	Prompting the Priorities: A First Look at Evaluating LLMs for Vulnerability Triage and Prioritization	Osama Al Haddad et.al.	2510.18508	null
2025-10-21	Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation	Wei-Chia Chang et.al.	2510.18502	null
2025-10-21	One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam Detection	Kangzhong Wang et.al.	2510.18493	null
2025-10-21	The Attribution Story of WhisperGate: An Academic Perspective	Oleksandr Adamov et.al.	2510.18484	null
2025-10-21	StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking	Haoran Zhang et.al.	2510.18483	null
2025-10-21	How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices	Han Peng et.al.	2510.18480	null
2025-10-21	LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources	Haichao Ji et.al.	2510.18477	null
2025-10-21	Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents	Feifan Xia et.al.	2510.18476	null
2025-10-21	DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP	Mariano Barone et.al.	2510.18475	null
2025-10-21	CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment	Xue Jiang et.al.	2510.18471	null
2025-10-21	CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs	Shaobo Wang et.al.	2510.18470	null
2025-10-21	IMB: An Italian Medical Benchmark for Question Answering	Antonio Romano et.al.	2510.18468	null
2025-10-21	Simple and Efficient Heterogeneous Temporal Graph Neural Network	Yili Wang et.al.	2510.18467	null
2025-10-21	CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning	Masato Kikuchi et.al.	2510.18466	null
2025-10-21	Large Language Models in Thematic Analysis: Prompt Engineering, Evaluation, and Guidelines for Qualitative Software Engineering Research	Cristina Martinez Montes et.al.	2510.18456	null
2025-10-21	Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models	Atharvan Dogra et.al.	2510.18454	null
2025-10-21	PlanU: Large Language Model Decision Making through Planning under Uncertainty	Ziwei Deng et.al.	2510.18442	null
2025-10-21	Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation	Yasser Hamidullah et.al.	2510.18439	null
2025-10-21	DeepTx: Real-Time Transaction Risk Analysis via Multi-Modal Features and LLM Reasoning	Yixuan Liu et.al.	2510.18438	null
2025-10-21	Chain-of-Conceptual-Thought: Eliciting the Agent to Deeply Think within the Response	Qingqing Gu et.al.	2510.18434	null
2025-10-21	ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization	Yuanhe Guo et.al.	2510.18433	null
2025-10-21	Automated urban waterlogging assessment and early warning through a mixture of foundation models	Chenxu Zhang et.al.	2510.18425	null
2025-10-21	Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents	Guangfu Guo et.al.	2510.18424	null
2025-10-21	SegTune: Structured and Fine-Grained Control for Song Generation	Pengfei Cai et.al.	2510.18416	null
2025-10-21	Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference	Siyuan Yan et.al.	2510.18413	null
2025-10-21	MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models	ChangSu Choi et.al.	2510.18383	null
2025-10-21	Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study	Gangda Deng et.al.	2510.18370	null
2025-10-21	KoSimpleQA: A Korean Factuality Benchmark with an Analysis of Reasoning LLMs	Donghyeon Ko et.al.	2510.18368	null
2025-10-21	Evaluating LLM-Based Mobile App Recommendations: An Empirical Study	Quim Motger et.al.	2510.18364	null
2025-10-21	KrishokBondhu: A Retrieval-Augmented Voice-Based Agricultural Advisory Call Center for Bengali Farmers	Mohd Ruhul Ameen et.al.	2510.18355	null
2025-10-21	GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data	Yudong Li et.al.	2510.18345	null
2025-10-21	Combining Distantly Supervised Models with In Context Learning for Monolingual and Cross-Lingual Relation Extraction	Vipul Rathore et.al.	2510.18344	null
2025-10-21	Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs	Jongmin Lee et.al.	2510.18340	null
2025-10-21	ECG-LLM – training and evaluation of domain-specific large language models for electrocardiography	Lara Ahrens et.al.	2510.18339	null
2025-10-21	Position: LLM Watermarking Should Align Stakeholders’ Incentives for Practical Adoption	Yepeng Liu et.al.	2510.18333	null
2025-10-21	InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration	Yunkun Wang et.al.	2510.18327	null
2025-10-21	Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding	Jinlin Li et.al.	2510.18321	null
2025-10-21	Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming	Zheng Zhang et.al.	2510.18314	null
2025-10-21	ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation	Haowei Lou et.al.	2510.18308	null
2025-10-21	The Impact of Image Resolution on Biomedical Multimodal Large Language Models	Liangyu Chen et.al.	2510.18304	null
2025-10-21	Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models	Lehan Wang et.al.	2510.18303	null
2025-10-21	From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering	Lei Li et.al.	2510.18297	null
2025-10-21	BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks	Tianyuan Huang et.al.	2510.18288	null
2025-10-21	Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs	Yanhong Li et.al.	2510.18279	null
2025-10-21	Enhancing Hotel Recommendations with AI: LLM-Based Review Summarization and Query-Driven Insights	Nikolaos Belibasakis et.al.	2510.18277	null
2025-10-21	StreamingTOM: Streaming Token Compression for Efficient Video Understanding	Xueyi Chen et.al.	2510.18269	null
2025-10-21	UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding	Da Zhang et.al.	2510.18262	null
2025-10-21	DelvePO: Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization	Tao Tao et.al.	2510.18257	null
2025-10-21	Illusions of reflection: open-ended task reveals systematic failures in Large Language Models’ reflective reasoning	Sion Weatherhead et.al.	2510.18254	null

Reinforcement Learning

Publish Date	Title	Authors	PDF	Code
2025-10-29	Prospects for a 95 GeV Higgs Boson at Future Higgs Factories with Transformer Networks	Yabo Dong et.al.	2510.24662	null
2025-10-29	OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning	Ziyou Hu et.al.	2510.24636	null
2025-10-28	Cluster Dose Prediction in Carbon Ion Therapy: Using Transfer Learning from a Pretrained Dose Prediction U-Net	Miriam Schwarze et.al.	2510.24703	null
2025-10-28	Greedy Sampling Is Provably Efficient for RLHF	Di Wu et.al.	2510.24700	null
2025-10-28	How Flat is a Plateau? Evolution of Late-Time TDE Disks	Yael Alush et.al.	2510.24696	null
2025-10-28	SPICE: Self-Play In Corpus Environments Improves Reasoning	Bo Liu et.al.	2510.24684	null
2025-10-28	Fare: Failure Resilience in Learned Visual Navigation Control	Zishuo Wang et.al.	2510.24680	null
2025-10-28	Learning to Drive Safely with Hybrid Options	Bram De Cooman et.al.	2510.24674	null
2025-10-28	Evolving Diagnostic Agents in a Virtual Clinical Environment	Pengcheng Qiu et.al.	2510.24654	null
2025-10-28	Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning	Nitin Rai et.al.	2510.24650	null
2025-10-28	Fast Bayesian Multilevel Quasi-Monte Carlo	Aleksei G. Sorokin et.al.	2510.24604	null
2025-10-28	Low-lying baryon resonances from lattice QCD	Colin Morningstar et.al.	2510.24596	null
2025-10-28	Towards Quadrupedal Jumping and Walking for Dynamic Locomotion using Reinforcement Learning	Jørgen Anker Olsen et.al.	2510.24584	null
2025-10-28	Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks	Lingyi Wang et.al.	2510.24546	null
2025-10-28	Sample-efficient and Scalable Exploration in Continuous-Time RL	Klemens Iten et.al.	2510.24482	null
2025-10-28	Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks	Korneel Van den Berghe et.al.	2510.24461	null
2025-10-28	Pair Approximation Meets Reality: Diffusion of Innovation in Organizational Networks within the biased-independence q-Voter Model	Angelika Abramiuk-Szurlej et.al.	2510.24447	null
2025-10-28	SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space	Viktoriia Zinkovich et.al.	2510.24446	null
2025-10-28	Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings	Seyed Mahdi Basiri Azad et.al.	2510.24432	null
2025-10-28	MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation	Xiaoyu Kong et.al.	2510.24431	null
2025-10-28	Multi-Agent Evolve: LLM Self-Improve through Co-evolution	Yixing Chen et.al.	2510.23595	null
2025-10-28	VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation	Walid Bousselham et.al.	2510.23497	null
2025-10-28	SGFusion: Stochastic Geographic Gradient Fusion in Federated Learning	Khoa Nguyen et.al.	2510.23455	null
2025-10-27	Think Twice: Branch-and-Rethink Reasoning Reward Model	Yizhu Jiao et.al.	2510.23596	null
2025-10-27	Cosmic magnification on multi-catalogue Herschel submillimetre galaxies	R. Fernandez-Fernandez et.al.	2510.23582	null
2025-10-27	Towards Stochastic (N-1)-Secure Redispatch	Oleksii Molodchyk et.al.	2510.23551	null
2025-10-27	Variational Thermal State Preparation on Digital Quantum Processors Assisted by Matrix Product States	Rui-Hao Li et.al.	2510.23546	null
2025-10-27	Approximately optimal distributed controls for high-dimensional stochastic systems with pairwise interaction through controls	Elise Devey et.al.	2510.23537	null
2025-10-27	Sequential Multi-Agent Dynamic Algorithm Configuration	Chen Lu et.al.	2510.23535	null
2025-10-27	Learning to Reason Efficiently with Discounted Reinforcement Learning	Alex Ayoub et.al.	2510.23486	null
2025-10-27	MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding	Xin Jin et.al.	2510.23479	null
2025-10-27	Video-Thinker: Sparking “Thinking with Videos” via Reinforcement Learning	Shijian Wang et.al.	2510.23473	null
2025-10-27	Adaptive Multilevel Splitting: First Application to Rare-Event Derivative Pricing	Riccardo Gozzo et.al.	2510.23461	null
2025-10-27	Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences	Zhuoran Jin et.al.	2510.23451	null
2025-10-27	An Information-Theoretic Analysis of Out-of-Distribution Generalization in Meta-Learning with Applications to Meta-RL	Xingtu Liu et.al.	2510.23448	null
2025-10-27	Causal Deep Q Network	Elouanes Khelifi et.al.	2510.23424	null
2025-10-27	A Sequential Planning Framework for the Operational Reality of Interacting Air Traffic Flow Regulations and Traffic Flow Programs	Thinh Hoang et.al.	2510.23402	null
2025-10-27	VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations	Lu Dong et.al.	2510.23397	null
2025-10-27	The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation	Farid Bagirov et.al.	2510.23393	null
2025-10-27	Ground-state phase diagram of S = 1/2 Heisenberg model on 2D square-hexagon-octagon lattice	Yumeng Luo et.al.	2510.23376	null
2025-10-24	Mechanistic Interpretability for Neural TSP Solvers	Reuben Narad et.al.	2510.21693	null
2025-10-24	Reduced Floating-Point Precision Implicit Monte Carlo	Simon Butson et.al.	2510.21683	null
2025-10-24	Goal-based portfolio selection with fixed transaction costs	Erhan Bayraktar et.al.	2510.21650	null
2025-10-24	Electroweak corrections to $gg \rightarrow \gamma\gamma$	Gabriele Fiore et.al.	2510.21643	null
2025-10-24	Predicted observational effects of rapid rotation for Be stars	Rina G. Rast et.al.	2510.21640	null
2025-10-24	DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection	Tala Aljaafari et.al.	2510.21638	null
2025-10-24	DeepAgent: A General Reasoning Agent with Scalable Toolsets	Xiaoxi Li et.al.	2510.21618	null
2025-10-24	Enhancing Tactile-based Reinforcement Learning for Robotic Control	Elle Miller et.al.	2510.21609	null
2025-10-24	Multilevel Picard scheme for solving high-dimensional drift control problems with state constraints	Yuan Zhong et.al.	2510.21607	null
2025-10-24	RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models	Xueyuan Lin et.al.	2510.21604	null
2025-10-24	Three-nucleon lepton-number-violating potentials in chiral EFT and their matrix elements in light nuclei	Graham Chambers-Wall et.al.	2510.21564	null
2025-10-24	System-Theoretic Analysis of Dynamic Generalized Nash Equilibrium Problems – Turnpikes and Dissipativity	Sophie Hall et.al.	2510.21556	null
2025-10-24	Cost Minimization for Space-Air-Ground Integrated Multi-Access Edge Computing Systems	Weihong Qin et.al.	2510.21541	null
2025-10-24	A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment	Huatian Gong et.al.	2510.21525	null
2025-10-24	Surrogate-based quantification of policy uncertainty in generative flow networks	Ramón Nartallo-Kaluarachchi et.al.	2510.21523	null
2025-10-24	The population of Galactic young massive star clusters in the TeV range	Rowan Batzofin et.al.	2510.21480	null
2025-10-24	MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization	Chenglong Wang et.al.	2510.21473	null
2025-10-24	Constraints on ultra-heavy dark matter from the CDEX-10 experiment at the China Jinping Underground Laboratory	Y. F. Wang et.al.	2510.21458	null
2025-10-24	Unified token representations for sequential decision models	Zhuojing Tian et.al.	2510.21448	null
2025-10-24	Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems	Hao Liang et.al.	2510.21427	null
2025-10-24	Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning	Prakrut Kotecha et.al.	2510.20706	null
2025-10-23	KL-Regularized Reinforcement Learning is Designed to Mode Collapse	Anthony GX-Chen et.al.	2510.20817	null
2025-10-23	GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation	Guangqi Jiang et.al.	2510.20813	null
2025-10-23	A Microphysical Probe of Neutron Star Interiors: Constraining the Equation of State with Glitch Dynamics	Zhonghao Tu et.al.	2510.20791	null
2025-10-23	Consumption-Investment Problem in Rank-Based Models	David Itkin et.al.	2510.20763	null
2025-10-23	Reinforcement Learning and Consumption-Savings Behavior	Brandon Kaplowitz et.al.	2510.20748	null
2025-10-23	No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes	Jasmine Bayrooti et.al.	2510.20725	null
2025-10-23	Measuring cosmic dipole with the GRB luminosity-time relation	Jessica Santiago et.al.	2510.20705	null
2025-10-23	Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs	Yanlin Song et.al.	2510.20691	null
2025-10-23	Downsizing Diffusion Models for Cardinality Estimation	Xinhe Mu et.al.	2510.20681	null
2025-10-23	The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models	Xue Wen Tan et.al.	2510.20665	null
2025-10-23	Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence	Jiahao Meng et.al.	2510.20579	null
2025-10-23	EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence	Ding Zou et.al.	2510.20578	null
2025-10-23	Monte Carlo Sampling for Wave Functions Requiring (Anti)Symmetrization	Koyena Bose et.al.	2510.20577	null
2025-10-23	AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN	Wei Shao et.al.	2510.20566	null
2025-10-23	GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning	Jinchang Luo et.al.	2510.20548	null
2025-10-23	A Unified Framework for Zero-Shot Reinforcement Learning	Jacopo Di Ventura et.al.	2510.20542	null
2025-10-23	Detection of ultra-high-energy cosmic rays in the southern hemisphere with FAST: data acquisition and preliminary results	Jakub Kmec et.al.	2510.20522	null
2025-10-23	Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence	Kun Ouyang et.al.	2510.20470	null
2025-10-23	On Multiple Robustness of Proximal Dynamic Treatment Regimes	Yuanshan Gao et.al.	2510.20451	null
2025-10-23	DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning	Runpeng Xie et.al.	2510.19562	null
2025-10-22	olmOCR 2: Unit Test Rewards for Document OCR	Jake Poznanski et.al.	2510.19817	null
2025-10-22	Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing	Yusu Qian et.al.	2510.19808	null
2025-10-22	Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning	Xichen Zhang et.al.	2510.19807	null
2025-10-22	SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration	Xichen Zhang et.al.	2510.19767	null
2025-10-22	SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas	Hongyu Ding et.al.	2510.19766	null
2025-10-22	Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning	Gunshi Gupta et.al.	2510.19732	null
2025-10-22	Semi-Implicit Approaches for Large-Scale Bayesian Spatial Interpolation	Sébastien Garneau et.al.	2510.19722	null
2025-10-22	MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom	Yifan Li et.al.	2510.19626	null
2025-10-22	Demonstrating Real Advantage of Machine-Learning-Enhanced Monte Carlo for Combinatorial Optimization	Luca Maria Del Bono et.al.	2510.19544	null
2025-10-22	Quantum Monte Carlo study of low-dimensional Fermi fluids of dipolar atoms	Clio Johnson et.al.	2510.19533	null
2025-10-22	The Confusing Instance Principle for Online Linear Quadratic Control	Waris Radji et.al.	2510.19531	null
2025-10-22	Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning	Ruiyao Miao et.al.	2510.19530	null
2025-10-22	Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach	Sebastian Reboul et.al.	2510.19528	null
2025-10-22	Practical algorithm for simulating thermal pure quantum states	Wei-Bo He et.al.	2510.19504	null
2025-10-22	Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning	Kevin Huang et.al.	2510.19495	null
2025-10-22	Quantum Machine Learning methods for Fourier-based distribution estimation with application in option pricing	Fernando Alonso et.al.	2510.19494	null
2025-10-22	Monte Carlo study of the $O(2)$-invariant $\phi^4$ theory with a cubic perturbation in three dimensions	Martin Hasenbusch et.al.	2510.19473	null
2025-10-22	Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis	Xueqi Ma et.al.	2510.19451	null
2025-10-22	Universal Quantitative Abstraction: Categorical Duality and Logical Completeness for Probabilistic Systems	Nivar Anwer et.al.	2510.19444	null
2025-10-21	Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting	Howard Chen et.al.	2510.18874	null
2025-10-21	EffiReasonTrans: RL-Optimized Reasoning for Code Translation	Yanlin Wang et.al.	2510.18863	null
2025-10-21	Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model	Ling Team et.al.	2510.18855	null
2025-10-21	Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study	Nutkritta Kraipatthanapong et.al.	2510.18852	null
2025-10-21	Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning	Chenghao Zhu et.al.	2510.18849	null
2025-10-21	MADR: MPC-guided Adversarial DeepReach	Ryan Teoh et.al.	2510.18845	null
2025-10-21	PCMS: Parallel Coupler For Multimodel Simulations	Jacob S. Merson et.al.	2510.18838	null
2025-10-21	Actor-Free Continuous Control via Structurally Maximizable Q-Functions	Yigit Korkmaz et.al.	2510.18828	null
2025-10-21	Search Self-play: Pushing the Frontier of Agent Capability without Supervision	Hongliang Lu et.al.	2510.18821	null
2025-10-21	Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards	Mengqi Li et.al.	2510.18814	null
2025-10-21	Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity	Vik Pant et.al.	2510.18802	null
2025-10-21	Two-loop QCD corrections for real and off-shell diphoton and triphoton production via quark loops	Dario Kermanschah et.al.	2510.18801	null
2025-10-21	WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection	Guanzhong He et.al.	2510.18798	null
2025-10-21	Beware of the running $n_s$ when producing heavy primordial black holes	Sasha Allegrini et.al.	2510.18791	null
2025-10-21	Analysis note: measurement of thrust and track energy-energy correlator in $e^+e^-$ collisions at 91.2 GeV with DELPHI open data	Jingyu Zhang et.al.	2510.18762	null
2025-10-21	Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation	Ming Li et.al.	2510.18731	null
2025-10-21	Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options	Joongkyu Lee et.al.	2510.18713	null
2025-10-21	Chemistry, Climate, and Transmission Spectra of TRAPPIST-1 e Explored with a Multimodel Sparse Sampled Ensemble	Eric T. Wolf et.al.	2510.18704	null
2025-10-21	Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach	Chenbei Lu et.al.	2510.18687	null
2025-10-21	Sherlock Your Queries: Learning to Ask the Right Questions for Dialogue-Based Retrieval	Dong Yun et.al.	2510.18659	null
2025-10-21	An integrated neural wavefunction solver for spinful Fermi systems	Alexander Avdoshkin et.al.	2510.18621	null
2025-10-21	CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent	Haojia Lin et.al.	2510.18596	null
2025-10-21	Deep Q-Learning Assisted Bandwidth Reservation for Multi-Operator Time-Sensitive Vehicular Networking	Abdullah Al-Khatib et.al.	2510.18553	null
2025-10-21	Improved thermonuclear rate of $^{42}$Ti($p$,$\gamma$)$^{43}$V and its astrophysical implication in rp-process	S. Q. Hou et.al.	2510.18531	null
2025-10-21	Efficient Model-Based Reinforcement Learning for Robot Control via Online Learning	Fang Nan et.al.	2510.18518	null
2025-10-21	Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models	Sureyya Akin et.al.	2510.18515	null
2025-10-21	Learning to Navigate Under Imperfect Perception: Conformalised Segmentation for Safe Reinforcement Learning	Daniel Bethell et.al.	2510.18485	null
2025-10-21	Safe But Not Sorry: Reducing Over-Conservatism in Safety Critics via Uncertainty-Aware Modulation	Daniel Bethell et.al.	2510.18478	null
2025-10-21	CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment	Xue Jiang et.al.	2510.18471	null
2025-10-21	Uncovering critical temperature dependence in Heusler magnets via explicit machine learning	Jean-Baptiste Morée et.al.	2510.18469	null
2025-10-21	DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation	Tong Liu et.al.	2510.18459	null
2025-10-21	Fingerprints of cluster-based Haldane and bound-magnon states in a spin-1 Heisenberg diamond chain	Azam Zoshki et.al.	2510.18447	null
2025-10-21	PlanU: Large Language Model Decision Making through Planning under Uncertainty	Ziwei Deng et.al.	2510.18442	null
2025-10-21	Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents	Guangfu Guo et.al.	2510.18424	null
2025-10-21	On AI Verification in Open RAN	Rahul Soundrarajan et.al.	2510.18417	null
2025-10-21	MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models	ChangSu Choi et.al.	2510.18383	null
2025-10-21	Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback	Yi-Lun Wu et.al.	2510.18353	null
2025-10-21	PGTT: Phase-Guided Terrain Traversal for Perceptive Legged Locomotion	Alexandros Ntagkas et.al.	2510.18348	null
2025-10-21	Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs	Jongmin Lee et.al.	2510.18340	null
2025-10-21	The implications of inflation for the last ACT	Zhi-Chong Qiu et.al.	2510.18320	null
2025-10-21	MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation	Chengshu Li et.al.	2510.18316	null
2025-10-21	Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task	Brady Bhalla et.al.	2510.18315	null
2025-10-21	Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models	Lehan Wang et.al.	2510.18303	null
2025-10-21	Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata	Zhengqing Yuan et.al.	2510.18289	null
2025-10-21	From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation	Ziwei Huang et.al.	2510.18263	null
2025-10-21	NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective	Xiaohan Qin et.al.	2510.18258	null
2025-10-21	The Picard-Lagrange Framework for Higher-Order Langevin Monte Carlo	Jaideep Mahajan et.al.	2510.18242	null
2025-10-21	Nash Policy Gradient: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria	Eason Yu et.al.	2510.18183	null
2025-10-20	Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains	Soumya Rani Samineni et.al.	2510.18176	null
2025-10-20	LLMs Encode How Difficult Problems Are	William Lugoloobi et.al.	2510.18147	null
2025-10-20	Measuring Reasoning in LLMs: a New Dialectical Angle	Soheil Abbasloo et.al.	2510.18134	null
2025-10-20	R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations	Connor Mattson et.al.	2510.18085	null
2025-10-20	RL-Driven Security-Aware Resource Allocation Framework for UAV-Assisted O-RAN	Zaineh Abughazzah et.al.	2510.18084	null
2025-10-20	Provably Optimal Reinforcement Learning under Safety Filtering	Donggeon David Oh et.al.	2510.18082	null
2025-10-20	R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning	Nadir Farhi et.al.	2510.18074	null
2025-10-20	Fine-tuning Flow Matching Generative Models with Intermediate Feedback	Jiajun Fan et.al.	2510.18072	null
2025-10-20	Oxidation State Dynamics and Emerging Patterns in Magnetite	Emre Gürsoy et.al.	2510.18061	null
2025-10-20	SPACeR: Self-Play Anchoring with Centralized Reference Models	Wei-Jer Chang et.al.	2510.18060	null
2025-10-20	Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models	Jiajun Fan et.al.	2510.18053	null
2025-10-20	OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning	Zhenyu Bi et.al.	2510.18032	null
2025-10-20	Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints	Junli Ren et.al.	2510.18002	null
2025-10-20	Collider Searches for Near-Continuum Dark Matter	Steven Ferrante et.al.	2510.17989	null
2025-10-20	Accelerating Bayesian Inference via Multi-Fidelity Transport Map Coupling	Sanjan C. Muchandimath et.al.	2510.17946	null
2025-10-20	An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation	Sandro Andric et.al.	2510.17945	null
2025-10-20	UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts	Fu-Yun Wang et.al.	2510.17937	null
2025-10-20	EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning	He Du et.al.	2510.17928	null
2025-10-20	Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning	Chenwei Tang et.al.	2510.17923	null
2025-10-20	CLAWS:Creativity detection for LLM-generated solutions using Attention Window of Sections	Keuntae Kim et.al.	2510.17921	null
2025-10-20	Functional Distribution Networks (FDN)	Omer Haq et.al.	2510.17794	null
2025-10-20	Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains	Austin Xu et.al.	2510.17793	null
2025-10-20	SoftMimic: Learning Compliant Whole-body Control from Examples	Gabriel B. Margolis et.al.	2510.17792	null
2025-10-20	UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action	Yuhao Yang et.al.	2510.17790	null
2025-10-20	B-Meson Anomalies: Effective Field Theory Meets Machine Learning	Alejandro Mir et.al.	2510.17742	null
2025-10-20	Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations	Tong Chen et.al.	2510.17733	null
2025-10-20	QueST: Incentivizing LLMs to Generate Difficult Problems	Hanxu Hu et.al.	2510.17715	null
2025-10-20	The Marked Edge Walk: A Novel MCMC Algorithm for Sampling of Graph Partitions	Atticus McWhorter et.al.	2510.17714	null
2025-10-20	A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning	Anjie Liu et.al.	2510.17697	null
2025-10-20	Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning	Xihong Su et.al.	2510.17690	null
2025-10-20	CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks	Xu Zhang et.al.	2510.17687	null
2025-10-20	RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation	Yuquan Xue et.al.	2510.17640	null
2025-10-20	Colour coherence in small collision systems	Isobel Kolbé et.al.	2510.17570	null
2025-10-20	An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning	Lindsay Spoor et.al.	2510.17564	null
2025-10-20	Towards Optimal Control and Algorithmic Structure of Decompression Schedules	Benjamin Marsh et.al.	2510.17551	null
2025-10-20	OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction	Raghu Vamshi Hemadri et.al.	2510.17532	null
2025-10-20	Plasma Shape Control via Zero-shot Generative Reinforcement Learning	Niannian Wu et.al.	2510.17531	null
2025-10-20	Toward Autonomous Neural VMC: An Energy-Variance Convergence Criterion for Quantum Systems	Huan-Chen Shi et.al.	2510.17490	null
2025-10-20	Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs	Paula Cordero-Encinar et.al.	2510.17472	null
2025-10-20	Estimating Orbital Parameters of Direct Imaging Exoplanet Using Neural Network	Bo Liang et.al.	2510.17459	null
2025-10-20	Agentic Reinforcement Learning for Search is Unsafe	Yushi Yang et.al.	2510.17431	null
2025-10-20	Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine	Jiacheng Xie et.al.	2510.17402	null
2025-10-20	Finite-Time Bounds for Average-Reward Fitted Q-Iteration	Jongmin Lee et.al.	2510.17391	null
2025-10-20	Inference of Deterministic Finite Automata via Q-Learning	Elaheh Hosseinkhani et.al.	2510.17386	null
2025-10-20	TabR1: Taming GRPO for tabular reasoning LLMs	Pengxiang Cai et.al.	2510.17385	null
2025-10-20	Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks	Julen Cestero et.al.	2510.17380	null
2025-10-20	When 5G NTN Meets GNSS: Tracking GNSS Signals under Overlaid 5G Waveforms	Idir Edjekouane et.al.	2510.17324	null
2025-10-20	Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling	Lipeng Xie et.al.	2510.17314	null
2025-10-20	Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks	Xinkai Wang et.al.	2510.17277	null
2025-10-20	Characterizing expansivity through $C^*$-algebras	S. Bautista et.al.	2510.17255	null
2025-10-20	From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models	Zefan Cai et.al.	2510.17247	null
2025-10-20	Deep Neural Network extraction of Unpolarized Transverse Momentum Distributions	I. P. Fernando et.al.	2510.17243	null
2025-10-20	Coinvisor: An RL-Enhanced Chatbot Agent for Interactive Cryptocurrency Investment Analysis	Chong Chen et.al.	2510.17235	null
2025-10-20	D2C-HRHR: Discrete Actions with Double Distributional Critics for High-Risk-High-Return Tasks	Jundong Zhang et.al.	2510.17212	null
2025-10-20	Trading with the Devil: Risk and Return in Foundation Model Strategies	Jinrui Zhang et.al.	2510.17165	null
2025-10-20	ALPINE: A Lightweight and Adaptive Privacy-Decision Agent Framework for Dynamic Edge Crowdsensing	Guanjie Cheng et.al.	2510.17162	null
2025-10-20	GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image	Yinghui Wang et.al.	2510.17157	null
2025-10-20	Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning	Shantnav Agarwal et.al.	2510.17143	null
2025-10-20	Rethinking On-policy Optimization for Query Augmentation	Zhichao Xu et.al.	2510.17139	null
2025-10-20	Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control	Chengxiu Hua et.al.	2510.17122	null
2025-10-20	Learning to Design Soft Hands using Reward Models	Xueqian Bai et.al.	2510.17086	null
2025-10-20	Consistent Zero-Shot Imitation with Contrastive Goal Inference	Kathryn Wantlin et.al.	2510.17059	null

Notes:

We have modified the sorting rule of the above table to prioritize papers based on the time of their latest update rather than their initial publication date. If an article has been recently modified, it will appear earlier in the list.
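The sorting rule described above can be sketched roughly as follows. This is a minimal illustration, not the script that actually generates the table: the record layout and the field names `published` and `updated` are assumptions, and the entries are placeholders rather than real papers.

```python
from datetime import date

# Hypothetical records: the field names and values are illustrative assumptions.
papers = [
    {"id": "A", "published": date(2025, 10, 20), "updated": date(2025, 10, 20)},
    {"id": "B", "published": date(2025, 10, 1), "updated": date(2025, 10, 28)},
    {"id": "C", "published": date(2025, 10, 21), "updated": None},
]


def sort_key(paper):
    # Prefer the latest update date; fall back to the initial
    # publication date when no update is recorded.
    return paper["updated"] or paper["published"]


def sort_by_latest_update(records):
    # Most recently updated papers come first.
    return sorted(records, key=sort_key, reverse=True)


ordered_ids = [p["id"] for p in sort_by_latest_update(papers)]
print(ordered_ids)
```

Under this rule, a paper first published early in the month but revised recently (entry `B` here) is listed ahead of papers published later but never updated.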

Features added:

Support for a more reliable text parser. Link
Support for rich Markdown format (better at parsing experimental tables). Link