LLM inference Arxiv Daily

Updated on 2026.03.09

inference

Publish Date	Title	Authors	PDF	Code
2026-03-06	LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis	Tao Zhang et.al.	2603.05904	null
2026-03-05	Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks	Burak Topcu et.al.	2603.05692	null
2026-03-05	Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents	Natchanon Pollertlam et.al.	2603.04814	null
2026-03-05	SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference	Luchang Li et.al.	2603.04716	null
2026-03-04	A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality	Arther Tian et.al.	2603.04028	null
2026-03-03	SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment	Priyavanshi Pathania et.al.	2603.02949	null
2026-03-03	Agentic Self-Evolutionary Replanning for Embodied Navigation	Guoliang Li et.al.	2603.02772	null
2026-03-03	Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference	Yiqi Liu et.al.	2603.02737	null
2026-03-02	Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads	Dominik Scheinert et.al.	2603.02057	null
2026-03-02	Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning	Jiebin Zhang et.al.	2603.01639	null
2026-03-02	Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)	Yu Lin et.al.	2603.01499	null
2026-03-02	Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study	Emmanuel Aboah Boateng et.al.	2603.01486	null
2026-03-02	SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment	Chaoran Xiong et.al.	2603.01477	null
2026-03-02	Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification	Guang Huang et.al.	2603.01399	null
2026-02-27	LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding	Alexander Samarin et.al.	2602.23881	null
2026-02-27	SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud	Hariz Yet et.al.	2602.23722	null
2026-02-26	Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems	Siyuan Liu et.al.	2602.23266	null
2026-02-26	Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching	Hiroki Matsutani et.al.	2602.22812	null
2026-02-25	Sustainable LLM Inference using Context-Aware Model Switching	Yuvarani et.al.	2602.22261	null
2026-02-25	Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text	Bitan Majumder et.al.	2602.21933	null
2026-02-26	DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference	Yongtong Wu et.al.	2602.21548	null
2026-02-24	SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks	Elizabeth S. Z. Tan et.al.	2602.21307	null
2026-02-24	ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments	Haley Li et.al.	2602.21140	null
2026-02-24	CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference	Chao Fei et.al.	2602.20732	null
2026-02-24	FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill	Rakshith Jayanth et.al.	2602.20515	null
2026-02-23	KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem	Seongjin Cha et.al.	2602.20217	null
2026-02-21	MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs	Dongwei Wang et.al.	2602.20191	null
2026-02-22	A Power Market Model with Hyperscalers and Modular Datacenters	Yihsu Chen et.al.	2602.19310	null
2026-02-22	Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation	Xiangyu Liu et.al.	2602.19309	null
2026-02-21	WANSpec: Leveraging Global Compute Capacity for LLM Inference	Noah Martin et.al.	2602.18931	null
2026-02-21	BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS	Omar Basit et.al.	2602.18755	null
2026-02-21	HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD	He Sun et.al.	2602.18750	null
2026-02-24	RPU – A Reasoning Processing Unit	Matthew Adiletta et.al.	2602.18568	null
2026-02-20	Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering	Jiayi Wu et.al.	2602.18249	null
2026-02-19	Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs	Arka Pal et.al.	2602.17223	null
2026-02-18	Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks	Michael Cunningham et.al.	2602.16760	null
2026-02-18	LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum	Zijie Su et.al.	2602.16100	null
2026-02-17	CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill	Bradley McDanel et.al.	2602.16054	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation	Shutian Gu et.al.	2602.15724	null
2026-02-16	Efficient Multi-round LLM Inference over Disaggregated Serving	Wenhao He et.al.	2602.14516	null
2026-02-16	WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity	Lei Chen et.al.	2602.14452	null
2026-02-15	HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming	Jiahui Chen et.al.	2602.14214	null
2026-02-14	ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System	Hao Kang et.al.	2602.13692	null
2026-02-13	Characterize LSM-tree Compaction Performance via On-Device LLM Inference	Jiabiao Ding et.al.	2602.12669	null
2026-02-13	Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats	Pengxiang Zhao et.al.	2602.12635	null
2026-02-13	TensorCommitments: A Lightweight Verifiable Inference for Language Models	Oguzhan Baser et.al.	2602.12630	null
2026-02-12	Predicting LLM Output Length via Entropy-Guided Representations	Huanyi Xie et.al.	2602.11812	null
2026-02-12	Deep Kernel Fusion for Transformers	Zixi Zhang et.al.	2602.11808	null
2026-02-12	GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing	Alessio Ricci Toniolo et.al.	2602.11688	null
2026-02-12	Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt	Yujie Gu et.al.	2602.11513	null
2026-02-12	Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache	Ye Yu et.al.	2602.11470	null
2026-02-11	Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise	Abhishek Saini et.al.	2602.11088	null
2026-02-12	S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage	Jie Jiang et.al.	2602.10606	null
2026-02-10	Beyond SMILES: Evaluating Agentic Systems for Drug Discovery	Edward Wijaya et.al.	2602.10163	null
2026-02-12	Efficient Remote Prefix Fetching with GPU-native Media ASICs	Liang Mi et.al.	2602.09725	null
2026-02-10	MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering	Sieun Hyeon et.al.	2602.09642	null
2026-02-10	LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms	Jie Kong et.al.	2602.09323	null
2026-02-09	Benchmarking the Energy Savings with Speculative Decoding Strategies	Rohit Dutta et.al.	2602.09113	null
2026-02-09	FlattenGPT: Depth Compression for Transformer with Layer Flattening	Ruihan Xu et.al.	2602.08858	null
2026-02-09	Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference	Yifei Gao et.al.	2602.08329	null
2026-02-10	Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices	Alejandro Ruiz y Mesa et.al.	2602.08060	null
2026-02-08	Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty	Yumin Kim et.al.	2602.07958	null
2026-02-08	MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation	Yu Zhao et.al.	2602.07905	null
2026-02-08	Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model	Tianyi Wang et.al.	2602.07878	null
2026-02-07	ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs	Yanlin Qi et.al.	2602.07721	null
2026-02-07	Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference	Hoang Anh Duy Le et.al.	2602.07397	null
2026-02-06	SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding	Yikang Yue et.al.	2602.07223	null
2026-02-06	Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making	Khurram Yamin et.al.	2602.06286	null
2026-02-05	Towards Green AI: Decoding the Energy of LLM Inference in Software Development	Lola Solovyeva et.al.	2602.05712	null
2026-02-05	Determining Energy Efficiency Sweet Spots in Production LLM Inference	Hiari Pizzini Cavagna et.al.	2602.05695	null
2026-02-05	Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers	Jingkai Huang et.al.	2602.05395	null
2026-02-05	TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference	Jiyoung Park et.al.	2602.05145	null
2026-02-04	GPU-to-Grid: Voltage Regulation via GPU Utilization Control	Zhirui Liang et.al.	2602.05116	null
2026-02-04	Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference	Xinyu Wang et.al.	2602.04595	null
2026-02-04	LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding	Gang Lin et.al.	2602.04541	null
2026-02-04	BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models	Junyu Chen et.al.	2602.04163	null
2026-02-03	DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference	Jiancai Ye et.al.	2602.03184	null
2026-02-03	NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference	Jiangyong Yu et.al.	2602.02988	null
2026-02-03	Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control	Ruihan Lin et.al.	2602.02987	null
2026-02-02	Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing	Lingkun Long et.al.	2602.02159	null
2026-01-30	Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity	Aayush Gautam et.al.	2602.00397	null
2026-01-30	Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference	Nikhil Gopal et.al.	2602.00328	null
2026-01-30	EigenAI: Deterministic Inference, Verifiable Results	David Ribeiro Alves et.al.	2602.00182	null
2026-01-30	Safer Policy Compliance with Dynamic Epistemic Fallback	Joseph Marvin Imperial et.al.	2601.23094	null
2026-01-30	Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference	Yiding Feng et.al.	2601.22996	null
2026-01-30	Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding	Zhanglu Yan et.al.	2601.22876	null
2026-01-30	OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space	Zhiyuan Cao et.al.	2601.22752	null
2026-01-30	SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation	Ruiqi Zheng et.al.	2601.22543	null
2026-01-29	Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use	Julien Delavande et.al.	2601.22362	null
2026-01-29	EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference	Bronislav Sidik et.al.	2601.21758	null
2026-01-29	Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks	Arther Tian et.al.	2601.21189	null
2026-01-28	ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference	Ketan Thakkar et.al.	2601.21109	null
2026-01-29	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-29	DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning	Yanlin Wang et.al.	2601.20615	null
2026-01-28	TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs	Minjae Lee et.al.	2601.20357	null
2026-01-28	Beyond Speedup – Utilizing KV Cache for Sampling and Reasoning	Zeyu Xing et.al.	2601.20326	null
2026-01-28	SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips	Jiahuan Yu et.al.	2601.20309	null
2026-01-28	LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis	Marcus Emmanuel Barnes et.al.	2601.20148	null
2026-01-27	Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering	Fangan Dong et.al.	2601.19847	null
2026-01-27	DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference	Fuliang Liu et.al.	2601.19278	null
2026-01-26	Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective	Fangzhou Wu et.al.	2601.18999	null
2026-01-26	Flatter Tokens are More Valuable for Speculative Draft Model Training	Jiaming Fan et.al.	2601.18902	null
2026-01-26	Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B	Jaiyoung Park et.al.	2601.18511	null
2026-01-26	FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning	Lin Sun et.al.	2601.18116	null
2026-01-25	LLM-42: Enabling Determinism in LLM Inference with Verified Speculation	Raja Gond et.al.	2601.17768	null
2026-01-25	Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction	Jang-Hyun Kim et.al.	2601.17668	null
2026-01-24	GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference	Thomas Ziller et.al.	2601.17551	null
2026-01-22	FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design	Jiahao Zhang et.al.	2601.15710	null
2026-01-21	MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification	Jingwei Song et.al.	2601.15498	null
2026-01-21	QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design	Nilesh Prasad Pandey et.al.	2601.14549	null
2026-01-20	HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference	Zhiyuan Shi et.al.	2601.13684	null
2026-01-20	PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator	Yue Jiet Chong et.al.	2601.13628	null
2026-01-19	Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models	Héctor Manuel Manzanilla-Granados et.al.	2601.13443	null
2026-01-19	Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference	Zimeng Wu et.al.	2601.13155	null
2026-01-19	From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation	Jiahao Wang et.al.	2601.12904	null
2026-01-18	Power Aware Dynamic Reallocation For Inference	Yiwei Jiang et.al.	2601.12241	null
2026-01-16	RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation	Amna Masood et.al.	2601.11822	null
2026-01-16	HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network	Peirong Zheng et.al.	2601.11676	null
2026-01-15	WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching	Xiangchen Li et.al.	2601.11652	null
2026-01-16	FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning	Zhihan Yang et.al.	2601.11311	null
2026-01-14	Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs	Jonathan Knoop et.al.	2601.09527	null
2026-01-14	LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference	Du Yin et.al.	2601.09258	null
2026-01-13	HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding	Qitan Lv et.al.	2601.08273	null
2026-01-13	Coordinated Cooling and Compute Management for AI Datacenters	Nardos Belay Abera et.al.	2601.08113	null
2026-01-12	Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference	Rei Taniguchi et.al.	2601.07667	null
2026-01-12	ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs	Haoqian Meng et.al.	2601.07475	null
2026-01-12	TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees	Tianyu Liu et.al.	2601.07353	null
2026-01-12	Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition	Tanmay Joshi et.al.	2601.07239	null
2026-01-09	AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving	Tianhao Xu et.al.	2601.06288	null
2026-01-07	AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization	Zhiqiang Wang et.al.	2601.06177	null
2026-01-14	Challenges and Research Directions for Large Language Model Inference Hardware	Xiaoyu Ma et.al.	2601.05047	null
2026-01-08	Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence	Shengyin Sun et.al.	2601.04766	null
2026-01-08	GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models	Maanas Taneja et.al.	2601.04719	null
2026-01-07	XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs	Linzhang Li et.al.	2601.04426	null
2026-01-05	LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference	Hossein Rajabzadeh et.al.	2601.02569	null
2026-01-06	Making MoE-based LLM Inference Resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-08	From Policy to Logic for Efficient and Interpretable Coverage Assessment	Rhitabrat Pokharel et.al.	2601.01266	null
2026-01-01	FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems	Shanli Xing et.al.	2601.00227	null
2025-12-31	FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference	Fen-Yu Hsieh et.al.	2512.24713	null
2026-01-04	Hardware Acceleration for Neural Networks: A Comprehensive Survey	Bin Xu et.al.	2512.23914	null
2025-12-29	Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding	Yue Guan et.al.	2512.23858	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Argus: Token Aware Distributed LLM Inference Optimization	Panlong Wu et.al.	2512.22925	null
2025-12-27	Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving	Rui Li et.al.	2512.22420	null
2025-12-22	Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	Xinhao Cheng et.al.	2512.22219	null
2025-12-20	MatKV: Trading Compute for Flash Storage in LLM Inference	Kun-Woo Shin et.al.	2512.22195	null
2025-12-26	Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling	Hannah Atmer et.al.	2512.22066	null
2025-12-26	Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models	Tingyang Sun et.al.	2512.21884	null
2025-12-26	LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices	Mingyu Sun et.al.	2512.21835	null
2025-12-23	Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs	Yinan Ni et.al.	2512.20210	null
2025-12-23	Concept Generalization in Humans and Large Language Models: Insights from the Number Game	Arghavan Bazigaran et.al.	2512.20162	null
2025-12-20	TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale	Dongha Yoon et.al.	2512.18194	null
2025-12-20	Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference	Rui Xie et.al.	2512.18152	null
2025-12-19	Specification and Detection of LLM Code Smells	Brahim Mahmoudi et.al.	2512.18020	null
2025-12-19	CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs	Gunho Park et.al.	2512.17970	null
2025-12-19	Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing	Lingxiao Zhao et.al.	2512.17574	null
2025-12-22	Learning What to Write: Write-Gated KV for Efficient Long-Context Inference	Yen-Chieh Huang et.al.	2512.17452	null
2025-12-18	Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference	Dhruv Deshmukh et.al.	2512.16391	null
2025-12-18	Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference	Arther Tian et.al.	2512.16317	null
2025-12-18	Fast Collaborative Inference via Distributed Speculative Decoding	Ce Zheng et.al.	2512.16273	null
2025-12-18	Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference	Jian Tian et.al.	2512.16134	null
2025-12-16	EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving	Shaoting Feng et.al.	2512.14946	null
2025-12-16	Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement	Songze Liu et.al.	2512.14151	null
2025-12-14	Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM	Furong Jia et.al.	2512.12868	null
2025-12-14	Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P	Anurag Dutt et.al.	2512.12801	null
2025-12-13	V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval	Donghyuk Kim et.al.	2512.12284	null
2025-12-12	Learning to Extract Context for Context-Aware LLM Inference	Minseon Kim et.al.	2512.11986	null
2025-12-12	PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration	Yifan Zhang et.al.	2512.11550	null
2025-12-12	AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference	Kuan-Wei Lu et.al.	2512.11280	null
2025-12-12	Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference	Adilet Metinov et.al.	2512.11221	null
2025-12-11	LLM-Auction: Generative Auction towards LLM-Native Advertising	Chujie Zhao et.al.	2512.10551	null
2025-12-14	GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference	Phuong Tran et.al.	2512.09963	null
2025-12-10	RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference	Siyuan Ma et.al.	2512.09304	null
2025-12-09	Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging	Yi Pan et.al.	2512.08365	null
2025-12-08	NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models	Feng Liang et.al.	2512.07218	null
2025-12-08	Leveraging KV Similarity for Online Structured Pruning in LLMs	Jungmin Lee et.al.	2512.07090	null
2025-12-07	PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance	Jifar Wakuma Ayana et.al.	2512.06747	null
2025-12-07	KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models	Sourjya Roy et.al.	2512.06727	null
2025-12-06	Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices	Xiangyu Li et.al.	2512.06443	null
2025-12-05	Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads	Boyu Li et.al.	2512.06093	null
2025-12-05	KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity	Damien Lesens et.al.	2512.05916	null
2025-12-05	RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs	Jonathan Geuter et.al.	2512.05542	null
2025-12-05	Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches	Namu Park et.al.	2512.05537	null
2025-12-05	Knowing Your Uncertainty – On the application of LLM in social sciences	Bolun Zhang et.al.	2512.05461	null
2025-12-04	Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge	Brittany Johnson et.al.	2512.05176	null
2025-12-04	Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning	Purbesh Mitra et.al.	2512.05105	null
2025-12-04	David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?	Shashwat Shankar et.al.	2512.05073	null
2025-12-04	MemLoRA: Distilling Expert Adapters for On-Device Memory Systems	Massimo Bini et.al.	2512.04763	null
2025-12-04	EtCon: Edit-then-Consolidate for Reliable Knowledge Editing	Ruilin Li et.al.	2512.04753	null
2025-12-04	RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting	Siqi Wang et.al.	2512.04752	null
2025-12-04	Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild	Yigui Feng et.al.	2512.04728	null
2025-12-04	PBFuzz: Agentic Directed Fuzzing for PoV Generation	Haochen Zeng et.al.	2512.04611	null
2025-12-04	A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution	Huifeng Zhu et.al.	2512.04580	null
2025-12-04	On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference	Yue Yu et.al.	2512.04558	null
2025-12-04	MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection	Yuanshuo Zhang et.al.	2512.04492	null
2025-12-04	LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models	Jiaqi Sun et.al.	2512.04474	null
2025-12-03	AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving	Ying Wang et.al.	2512.04013	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-03	Training and Evaluation of Guideline-Based Medical Reasoning in LLMs	Michael Staniek et.al.	2512.03838	null
2025-12-03	ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers	Feice Huang et.al.	2512.03673	null
2025-12-03	KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing	Lishuo Deng et.al.	2512.03608	null
2025-12-03	EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths	Zhening Li et.al.	2512.03571	null
2025-12-03	A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention	Di Xiu et.al.	2512.03494	null
2025-12-03	From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation	Qingchuan Li et.al.	2512.03360	null
2025-12-03	Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs	Ngoc Bui et.al.	2512.03324	null
2025-12-02	LLM-Guided Material Inference for 3D Point Clouds	Nafiseh Izadyar et.al.	2512.03237	null
2025-12-02	TokenPowerBench: Benchmarking the Power Consumption of LLM Inference	Chenxu Niu et.al.	2512.03024	null
2025-12-02	Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge	Hamid Dadkhahi et.al.	2512.03019	null
2025-12-02	FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization	Feiyu Wang et.al.	2512.02901	null
2025-12-02	OptPO: Optimal Rollout Allocation for Test-time Policy Optimization	Youkang Wang et.al.	2512.02882	null
2025-12-02	Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages	Lechen Zhang et.al.	2512.02841	null
2025-12-02	FiMMIA: scaling semantic perturbation-based membership inference across modalities	Anton Emelyanov et.al.	2512.02786	null
2025-12-02	Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs	Julian Ma et.al.	2512.02719	null
2025-12-02	CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning	Songqiao Su et.al.	2512.02551	null
2025-12-02	In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs	Vishnu Sarukkai et.al.	2512.02543	null
2025-12-02	Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective	Qiyao Xue et.al.	2512.02340	null
2025-12-01	Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling	Jack Cook et.al.	2512.02010	null
2025-12-01	The Art of Scaling Test-Time Compute for Large Language Models	Aradhye Agarwal et.al.	2512.02008	null
2025-12-01	KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference	Sai Gokhale et.al.	2512.01953	null
2025-12-01	Latent Debate: A Surrogate Framework for Interpreting LLM Thinking	Lihu Chen et.al.	2512.01909	null
2025-12-01	DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models	Patrick Kwon et.al.	2512.01686	null
2025-12-01	A Systematic Characterization of LLM Inference on GPUs	Haonan Wang et.al.	2512.01644	null
2025-12-01	LLM2Fx-Tools: Tool Calling For Music Post-Production	Seungheon Doh et.al.	2512.01559	null
2025-12-01	Multi-Path Collaborative Reasoning via Reinforcement Learning	Jindi Lv et.al.	2512.01485	null
2025-12-01	ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation	Rohin Manvi et.al.	2512.01457	null
2025-12-01	Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning	Jiahao Yuan et.al.	2512.01282	null
2025-11-30	Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios	Jianxiang Zang et.al.	2512.00920	null
2025-11-30	AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent	Neeraj Anand et.al.	2512.00846	null
2025-11-30	ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI	Fabrizio Maturo et.al.	2512.00839	null
2025-11-30	SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving	Bohan Zhao et.al.	2512.00719	null
2025-11-29	SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling	Yang Xiao et.al.	2512.00466	null
2025-11-29	Echo-N1: Affective RL Frontier	Naifan Zhang et.al.	2512.00344	null
2025-11-29	Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA	Takuto Ando et.al.	2512.00335	null
2025-11-29	RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs	Ruike Hu et.al.	2512.00319	null
2025-11-29	Evolving Paradigms in Task-Based Search and Learning: A Comparative Analysis of Traditional Search Engine with LLM-Enhanced Conversational Search System	Zhitong Guan et.al.	2512.00313	null
2025-11-28	Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation	Mohammad Abdollahi et.al.	2512.00215	null
2025-11-28	ThetaEvolve: Test-time Learning on Open Problems	Yiping Wang et.al.	2511.23473	null
2025-11-28	Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs	Jiancheng Dong et.al.	2511.23271	null
2025-11-28	Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering	Qiming Li et.al.	2511.23231	null
2025-11-28	HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding	Chen Li et.al.	2511.23178	null
2025-11-28	Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match	Jinze Li et.al.	2511.22972	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework	Kelaiti Xiao et.al.	2511.22943	null
2025-11-28	RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN	Onur Salan et.al.	2511.22933	null
2025-11-28	Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems	Shashwat Jaiswal et.al.	2511.22880	null
2025-11-27	PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration	Junfei Zhan et.al.	2511.22788	null
2025-11-27	CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights	Mohaiminul Al Nahian et.al.	2511.22681	null
2025-11-27	GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents	Xinyu Zhang et.al.	2511.22441	null
2025-11-27	FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators	Shuao Jia et.al.	2511.22348	null
2025-11-27	Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends	Pablo Prieto et.al.	2511.22334	null
2025-11-27	RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems	Mengfan Li et.al.	2511.22275	null
2025-11-27	Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR	Yuyang Zou et.al.	2511.22267	null
2025-11-27	Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information	Lukas Struppek et.al.	2511.22176	null
2025-11-27	Statistical Independence Aware Caching for LLM Workflows	Yihan Dai et.al.	2511.22118	null
2025-11-26	A Comparative Study of LLM Prompting and Fine-Tuning for Cross-genre Authorship Attribution on Chinese Lyrics	Yuxin Li et.al.	2511.21930	null
2025-11-26	Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework	Dong Wang et.al.	2511.21686	null
2025-11-26	DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu et.al.	2511.21669	null
2025-11-26	Auxiliary Metrics Help Decoding Skill Neurons in the Wild	Yixiu Zhao et.al.	2511.21610	null
2025-11-26	Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM	Tim Trappen et.al.	2511.21413	null
2025-11-26	PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark	Robert Belanec et.al.	2511.21285	null
2025-11-26	BRIDGE: Building Representations In Domain Guided Program Verification	Robert Joseph George et.al.	2511.21104	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-26	OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection	Chujie Wang et.al.	2511.21064	null
2025-11-26	LOOM: Personalized Learning Informed by Daily LLM Conversations Toward Long-Term Mastery via a Dynamic Learner Memory Graph	Justin Cui et.al.	2511.21037	null
2025-11-26	CaptionQA: Is Your Caption as Useful as the Image Itself?	Shijia Yang et.al.	2511.21025	null
2025-11-26	A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving	Junhan Liao et.al.	2511.20982	null
2025-11-26	Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows	Yinwei Dai et.al.	2511.20975	null
2025-11-25	Representation Interventions Enable Lifelong Unstructured Knowledge Control	Xuyuan Liu et.al.	2511.20892	null
2025-11-25	Latent Collaboration in Multi-Agent Systems	Jiaru Zou et.al.	2511.20639	null
2025-11-25	DiFR: Inference Verification Despite Nondeterminism	Adam Karvonen et.al.	2511.20621	null
2025-11-25	Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models	Shamima Hossain et.al.	2511.20531	null
2025-11-25	Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios	Luohe Shi et.al.	2511.20340	null
2025-11-25	LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design	Lianzhe Hu et.al.	2511.20276	null
2025-11-25	REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance	Chuyi Kong et.al.	2511.20233	null
2025-11-25	Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management	Xinjun Yang et.al.	2511.20172	null
2025-11-25	SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space	Zhenyi Shen et.al.	2511.20102	null
2025-11-25	More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering	Duc Anh Vu et.al.	2511.20086	null
2025-11-25	Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design	Zixiao Huang et.al.	2511.20048	null
2025-11-25	CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model	Dapeng Zhang et.al.	2511.19914	null
2025-11-25	Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models	Wentao Hu et.al.	2511.19822	null
2025-11-24	Gender Bias in Emotion Recognition by Large Language Models	Maureen Herbert et.al.	2511.19785	null
2025-11-24	Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces	Shaltiel Shmidman et.al.	2511.19333	null
2025-11-24	MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization	Boyuan Wu et.al.	2511.19253	null
2025-11-24	Learning Plug-and-play Memory for Guiding Video Diffusion Models	Selena Song et.al.	2511.19229	null
2025-11-24	From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation	Moazzam Umer Gondal et.al.	2511.19149	null
2025-11-24	SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression	Santhosh G S et.al.	2511.18936	null
2025-11-24	Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations	Ryan Wong et.al.	2511.18933	null
2025-11-24	KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit	Dezhi Ran et.al.	2511.18868	null
2025-11-24	Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models	Yang Xiang et.al.	2511.18864	null
2025-11-24	UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model	Changxin Huang et.al.	2511.18845	null
2025-11-24	Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds	Mohammad Nour Al Awad et.al.	2511.18842	null
2025-11-23	A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs	Dongming Jin et.al.	2511.18343	null
2025-11-23	Skypilot: Fine-Tuning LLM with Physical Grounding for AAV Coverage Search	Zhongkai Chen et.al.	2511.18270	null
2025-11-23	LLM Reasoning for Cold-Start Item Recommendation	Shijun Li et.al.	2511.18261	null
2025-11-22	Towards Harnessing the Power of LLMs for ABAC Policy Mining	More Aayush Babasaheb et.al.	2511.18098	null
2025-11-22	L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention	Yuliang Zhan et.al.	2511.17910	null
2025-11-22	QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents	Jordan Abi Nader et.al.	2511.17855	null
2025-11-21	Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch	Ziyang Zhang et.al.	2511.17826	null
2025-11-21	APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs	Aishwarya Mandyam et.al.	2511.17818	null
2025-11-21	That’s not natural: The Impact of Off-Policy Training Data on Probe Performance	Nathalie Kirch et.al.	2511.17408	null
2025-11-21	SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion	Jiajie Guo et.al.	2511.17308	null
2025-11-21	Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models	Vy Nguyen et.al.	2511.17170	null
2025-11-21	ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better	Yuan Zhang et.al.	2511.17106	null
2025-11-21	Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters	Zhan Su et.al.	2511.17044	null
2025-11-21	Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems	Kirill Nagaitsev et.al.	2511.16964	null
2025-11-20	Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems	Elias Lumer et.al.	2511.16654	null
2025-11-20	Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation	Kexin Zhao et.al.	2511.16577	null
2025-11-20	The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation	Jiaheng Zhang et.al.	2511.16543	null
2025-11-20	Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks	Éloïse Benito-Rodriguez et.al.	2511.16540	null
2025-11-20	Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement	Jiashu Yao et.al.	2511.16331	null
2025-11-20	SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning	Wei Xia et.al.	2511.16324	null
2025-11-20	T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs	Shao-Jun Xia et.al.	2511.16107	null
2025-11-20	Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio	Mohan Shi et.al.	2511.16046	null
2025-11-20	A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference	Michael V. DeBole et.al.	2511.15950	null
2025-11-19	Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization	Rahul Krishna Thomas et.al.	2511.15898	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang et.al.	2511.15503	null
2025-11-19	Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining	Qian’ang Mao et.al.	2511.15456	null
2025-11-19	Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization	Suyu Chen et.al.	2511.15389	null
2025-11-19	HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning	Alexis Correa-Guillén et.al.	2511.15355	null
2025-11-19	OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition	Xinli Tao et.al.	2511.15211	null
2025-11-19	As If We’ve Met Before: LLMs Exhibit Certainty in Recognizing Seen Files	Haodong Li et.al.	2511.15192	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models	Rui Zhu et.al.	2511.14694	null
2025-11-18	Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer	Kallol Mondal et.al.	2511.14691	null
2025-11-18	Bias in, Bias out: Annotation Bias in Multilingual Large Language Models	Xia Cui et.al.	2511.14662	null
2025-11-18	AutoTool: Efficient Tool Selection for Large Language Model Agents	Jingyi Jia et.al.	2511.14650	null
2025-11-18	A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder	Dengyun Huang et.al.	2511.14600	null
2025-11-18	Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language	Minyoung Hwang et.al.	2511.14565	null
2025-11-18	CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design	Jiawei Yi et.al.	2511.14510	null
2025-11-18	Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks	Mulei Ma et.al.	2511.14450	null
2025-11-18	PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models	Yu Liu et.al.	2511.14256	null
2025-11-18	Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation	Yu Zhong et.al.	2511.14131	null
2025-11-18	PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval	Chun Chet Ng et.al.	2511.14130	null
2025-11-18	Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services	Liuyi Jin et.al.	2511.14119	null
2025-11-18	FailSafe: High-performance Resilient Serving	Ziyi Xu et.al.	2511.14116	null
2025-11-17	TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone	Xunjie Wang et.al.	2511.13717	null
2025-11-17	T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization	Hyunwoo Oh et.al.	2511.13676	null
2025-11-17	Tight and Practical Privacy Auditing for Differentially Private In-Context Learning	Yuyang Xia et.al.	2511.13502	null
2025-11-17	Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment	Jea Kwon et.al.	2511.13290	null
2025-11-17	Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms	Patrick Parschan et.al.	2511.13238	null
2025-11-17	TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs	Yuxiang Zhang et.al.	2511.13223	null
2025-11-17	TCM-5CEval: Extended Deep Evaluation Benchmark for LLM’s Comprehensive Clinical Research Competence in Traditional Chinese Medicine	Tianai Huang et.al.	2511.13169	null
2025-11-17	MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity	Vladimír Macko et.al.	2511.13061	null
2025-11-17	RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems	Zhengchao Wang et.al.	2511.12979	null
2025-11-17	MedRule-KG: A Knowledge-Graph–Steered Scaffold for Reliable Mathematical and Biomedical Reasoning	Crystal Su et.al.	2511.12963	null
2025-11-16	ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction	Pengze Li et.al.	2511.12485	null
2025-11-16	Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models	Chenglong Wang et.al.	2511.12464	null
2025-11-15	Optimal Self-Consistency for Efficient Reasoning with Large Language Models	Austin Feng et.al.	2511.12309	null
2025-11-15	Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing	Khyati Kiyawat et.al.	2511.12286	null
2025-11-15	MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues	Liang Xue et.al.	2511.12213	null
2025-11-15	AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing	Qingyu Zhang et.al.	2511.12133	null
2025-11-15	OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description	Quanxing Xu et.al.	2511.12131	null
2025-11-15	BudgetLeak: Membership Inference Attacks on RAG Systems via the Generation Budget Side Channel	Hao Li et.al.	2511.12043	null
2025-11-15	Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding	Arun Ramachandran et.al.	2511.12031	null
2025-11-14	Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models	Siyou Li et.al.	2511.11910	null
2025-11-14	Experience-Guided Adaptation of Inference-Time Reasoning Strategies	Adam Stein et.al.	2511.11519	null
2025-11-14	W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search	Zhenyu Ding et.al.	2511.11518	null
2025-11-14	MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism	Shulin Liu et.al.	2511.11373	null
2025-11-14	iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference	Wei Fan et.al.	2511.11306	null
2025-11-14	T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup	Jianyu Wei et.al.	2511.11248	null
2025-11-14	STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models	Huajian Zhang et.al.	2511.11233	null
2025-11-14	AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization	Zhonghua Jiang et.al.	2511.11106	null
2025-11-14	GraphMASAL: A Graph-based Multi-Agent System for Adaptive Learning	Biqing Zeng et.al.	2511.11035	null
2025-11-14	DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition	HongYu Liu et.al.	2511.11000	null
2025-11-14	DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition	Ren Zhang et.al.	2511.10948	null
2025-11-13	ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference	Yesheng Liang et.al.	2511.10645	null
2025-11-13	Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs	Changhai Man et.al.	2511.10480	null
2025-11-13	FactGuard: Event-Centric and Commonsense-Guided Fake News Detection	Jing He et.al.	2511.10281	null
2025-11-13	Efficient Thought Space Exploration through Strategic Intervention	Ziheng Li et.al.	2511.10038	null
2025-11-13	EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models	Jialin Wu et.al.	2511.09880	null
2025-11-13	HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning	Nikunj Gupta et.al.	2511.09873	null
2025-11-12	From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance	Jeongho Min et.al.	2511.09820	null
2025-11-13	LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication	Prajwal Singhania et.al.	2511.09557	null
2025-11-12	Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling	Shiyu Ji et.al.	2511.09345	null
2025-11-12	Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference	Tong Wu et.al.	2511.09323	null
2025-11-10	Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models	Tianrui Song et.al.	2511.07295	null
2025-11-10	P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats	Yuzong Chen et.al.	2511.06838	null
2025-11-09	Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism	Cong Li et.al.	2511.06247	null
2025-11-09	LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs	Zifan He et.al.	2511.06174	null
2025-11-08	MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference	Myunghyun Rhee et.al.	2511.06010	null
2025-11-08	MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?	Jiayi Fu et.al.	2511.05867	null
2025-11-06	Enabling Dynamic Sparsity in Quantized LLM Inference	Rongxiang Wang et.al.	2511.04477	null
2025-11-06	E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce	Ge Zhang et.al.	2511.04087	null
2025-11-06	PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration	Yue Jiet Chong et.al.	2511.04036	null
2025-11-06	LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis	Shiyin Lin et.al.	2511.04023	null
2025-11-05	RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse	Yinsicheng Jiang et.al.	2511.03475	null
2025-11-07	UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM	Hai Huang et.al.	2511.03293	null
2025-11-04	Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes	Mohammadsajad Alipour et.al.	2511.02681	null
2025-11-04	Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks	Xiumei Deng et.al.	2511.02647	null
2025-11-04	Verifying LLM Inference to Prevent Model Weight Exfiltration	Roy Rinberg et.al.	2511.02620	null
2025-11-03	KV Cache Transform Coding for Compact Storage in LLM Inference	Konrad Staniszewski et.al.	2511.01815	null
2025-11-04	Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding	Jungyeon Koh et.al.	2511.01695	null
2025-11-03	Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving	Chengying Huan et.al.	2511.01633	null
2025-11-03	When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding	Min Fang et.al.	2511.01282	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-04	SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding	Jameson Sandler et.al.	2511.00606	null
2025-11-01	FlashEVA: Accelerating LLM inference via Efficient Attention	Juan Gabriel Kostelec et.al.	2511.00576	null
2025-10-31	AMD MI300X GPU Performance Analysis	Chandrish Ambati et.al.	2510.27583	null
2025-10-31	Glia: A Human-Inspired AI for Automated Systems Design and Optimization	Pouya Hamadanian et.al.	2510.27176	null
2025-10-30	Beyond Benchmarks: The Economics of AI Inference	Boqin Zhuang et.al.	2510.26136	null
2025-10-31	AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache	Dinghong Song et.al.	2510.25979	null
2025-10-31	NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium	Dinghong Song et.al.	2510.25977	null
2025-10-29	Serve Programs, Not Prompts	In Gim et.al.	2510.25412	null
2025-10-26	Batch Speculative Decoding Done Right	Ranran Haoran Zhang et.al.	2510.22876	null
2025-10-26	Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration	Yuval Kainan et.al.	2510.22679	null
2025-10-26	SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size	Jinhan Chen et.al.	2510.22556	null
2025-10-22	Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs	Hongyi Liu et.al.	2510.20064	null
2025-10-22	Are Large Language Models Sensitive to the Motives Behind Communication?	Addison J. Wu et.al.	2510.19687	null
2025-10-30	DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference	Xiang Liu et.al.	2510.19669	null
2025-10-21	SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices	Pan Zhou et.al.	2510.18544	null
2025-10-19	Justitia: Fair and Efficient Scheduling for LLM Applications	Mingyan Yang et.al.	2510.17015	null
2025-10-18	FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference	Jian Ma et.al.	2510.16418	null
2025-10-16	AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization	Mengtao Lv et.al.	2510.16045	null
2025-10-16	Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing	Tianhua Xia et.al.	2510.16040	null
2025-10-28	TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs	Sibo Xiao et.al.	2510.15545	null
2025-10-16	Tail-Optimized Caching for LLM Inference	Wenxin Zhang et.al.	2510.15152	null
2025-10-16	xLLM Technical Report	Tongxuan Liu et.al.	2510.14686	null
2025-10-16	MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving	Jungi Lee et.al.	2510.14557	null
2025-10-16	FairBatching: Fairness-Aware Batch Formation for LLM Inference	Hongtao Lyu et.al.	2510.14392	null
2025-10-16	Qwen3Guard Technical Report	Haiquan Zhao et.al.	2510.14276	null
2025-10-15	Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management	Thanh Son Phung et.al.	2510.14024	null
2025-10-15	Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference	Zhibin Wang et.al.	2510.13668	null
2025-10-15	F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs	Jude Haris et.al.	2510.13401	null
2025-10-15	Taming the Fragility of KV Cache Eviction in LLM Inference	Yuan Feng et.al.	2510.13334	null
2025-10-15	Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference	Nikhil Bhendawade et.al.	2510.13161	null
2025-10-14	Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?	Cedric Richter et.al.	2510.12702	null
2025-10-14	Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models	Donghwan Rho et.al.	2510.12343	null
2025-10-13	Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding	Bingjie Zhu et.al.	2510.11331	null
2025-10-13	Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs	João Paulo Cardoso de Lima et.al.	2510.11192	null
2025-10-11	CacheClip: Accelerating RAG with Effective KV Cache Reuse	Bin Yang et.al.	2510.10129	null
2025-10-10	FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference	Yu-Chen Lu et.al.	2510.09332	null
2025-10-10	Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion	Ruitong Liu et.al.	2510.08966	null
2025-10-13	Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors	Xin Liu et.al.	2510.08907	null
2025-10-09	SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference	Hengrui Zhang et.al.	2510.08544	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models	Zhiqing Cui et.al.	2510.07858	null
2025-10-09	OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference	Yuzhe Gu et.al.	2510.07651	null
2025-10-08	Accelerating Diffusion LLM Inference via Local Determinism Propagation	Fanheng Kong et.al.	2510.07081	null
2025-10-08	Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon	Baraq Lipshitz et.al.	2510.06957	null
2025-10-07	VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization	Dingyu Yao et.al.	2510.06175	null
2025-10-07	lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models	Haoxin Wang et.al.	2510.06126	null
2025-10-07	From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs	Tianhao Zhu et.al.	2510.05632	null
2025-10-06	KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction	Utkarsh Saxena et.al.	2510.05373	null
2025-10-06	A novel hallucination classification framework	Maksym Zavhorodnii et.al.	2510.05189	null
2025-10-06	RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms	Samah Kansab et.al.	2510.04796	null
2025-10-05	Speculative Actions: A Lossless Framework for Faster Agentic Systems	Naimeng Ye et.al.	2510.04371	null
2025-10-03	Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling	Qiwei Di et.al.	2510.03199	null
2025-10-03	Dissecting Transformers: A CLEAR Perspective towards Green AI	Hemang Jain et.al.	2510.02810	null
2025-10-03	HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference	Shubham Negi et.al.	2510.02675	null
2025-10-01	PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference	Hongbo Liu et.al.	2510.02395	null
2025-10-03	Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey	Qiyuan Liu et.al.	2510.01925	null
2025-10-02	SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning	Shicheng Liu et.al.	2510.01832	null
2025-10-01	HiSpec: Hierarchical Speculative Decoding for LLMs	Avinash Kumar et.al.	2510.01336	null
2025-10-01	Generalized Parallel Scaling with Interdependent Generations	Harry Dong et.al.	2510.01143	null
2025-10-01	AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size	Guanxi Lu et.al.	2509.26432	null
2025-09-30	Parallax: Efficient LLM Inference Service over Decentralized Environment	Chris Tong et.al.	2509.26182	null
2025-09-30	Accelerating LLM Inference with Precomputed Query Storage	Jay H. Park et.al.	2509.25919	null
2025-09-30	SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV	Jingyao Zhang et.al.	2509.25853	null
2025-09-29	SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching	Xinye Zhao et.al.	2509.24832	null
2025-09-29	Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding	Sungkyun Kim et.al.	2509.24328	null
2025-09-29	VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference	Ke Wang et.al.	2509.24257	null
2025-09-28	Collaborative Device-Cloud LLM Inference through Reinforcement Learning	Wenzhi Fang et.al.	2509.24050	null
2025-10-01	A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving	Yue Zhang et.al.	2509.23384	null
2025-09-27	Scaling LLM Test-Time Compute with Mobile NPU on Smartphones	Zixu Hao et.al.	2509.23324	null
2025-09-27	Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization	Vage Egiazarian et.al.	2509.23202	null
2025-09-26	Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs	Shirin Alanova et.al.	2509.22166	null
2025-09-26	Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding	Shijing Hu et.al.	2509.22134	null
2025-09-26	SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation	Haotian Tan et.al.	2509.21932	null
2025-09-25	Preemptive Detection and Steering of LLM Misalignment via Latent Reachability	Sathwik Karnik et.al.	2509.21528	null
2025-09-25	Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks	Murat Arda Onsu et.al.	2509.21259	null
2025-09-24	FastEagle: Cascaded Drafting for Accelerating Speculative Decoding	Haiduo Huang et.al.	2509.20416	null
2025-09-24	Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment	Deokjae Lee et.al.	2509.20214	null
2025-09-24	Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference	Haoyu Chen et.al.	2509.19729	null
2025-09-23	Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs	Marcin Chrapek et.al.	2509.18886	null
2025-09-22	Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models	Dingxin Lu et.al.	2509.18221	null
2025-09-28	Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs	Xing Chen et.al.	2509.17542	null
2025-09-22	Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill	Yunzhao Liu et.al.	2509.17357	null
2025-09-22	Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortion Detection	Jun Seo Kim et.al.	2509.17292	null
2025-09-21	MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference	Zheming Yang et.al.	2509.16995	null
2025-09-20	Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads	Mert Hidayetoglu et.al.	2509.16495	null
2025-09-19	LightCode: Compiling LLM Inference for Photonic-Electronic Systems	Ryan Tomich et.al.	2509.16443	null
2025-09-19	LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference	Hantao Yang et.al.	2509.15515	null
2025-09-18	A1: Asynchronous Test-Time Scaling via Conformal Prediction	Jing Xiong et.al.	2509.15148	null
2025-09-18	LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism	Yimin Wang et.al.	2509.14781	null
2025-09-18	LLM Jailbreak Detection for (Almost) Free!	Guorui Chen et.al.	2509.14558	null
2025-09-17	TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge	Zhirui Huang et.al.	2509.13765	null
2025-09-16	Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management	Thanh Son Phung et.al.	2509.13201	null
2025-09-16	HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference	Cenlin Duan et.al.	2509.12993	null
2025-09-15	Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference	Synthia Wang et.al.	2509.12152	null
2025-09-14	Framing AI System Benchmarking as a Learning Task: FlexBench and the Open MLPerf Dataset	Grigori Fursin et.al.	2509.11413	null
2025-09-14	PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits	Loka Li et.al.	2509.11362	null
2025-09-14	AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs	Santhosh G S et.al.	2509.11155	null
2025-09-12	MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness	Huizheng Wang et.al.	2509.10372	null
2025-09-11	LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation	Yiqun Shen et.al.	2509.09754	null
2025-09-11	Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference	Haoran Wu et.al.	2509.09505	null
2025-08-06	Frontier: Simulating the Next Generation of LLM Inference Systems	Yicheng Feng et.al.	2508.03148	null
2025-07-25	Cloud Native System for LLM Inference Serving	Minxian Xu et.al.	2507.18007	null
2025-07-23	BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving	Wanyi Zheng et.al.	2507.17120	null
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-21	Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing	Shibo Yu et.al.	2507.15553	null
2025-07-18	Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need	Michael Davies et.al.	2507.14397	null
2025-07-18	Can LLMs Infer Personality from Real World Conversations?	Jianfeng Zhu et.al.	2507.14355	null
2025-07-23	Photonic Fabric Platform for AI Accelerators	Jing Ding et.al.	2507.14000	null
2025-07-18	LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues	Haoyang Li et.al.	2507.13681	null
2025-07-16	Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage	Junqing Lin et.al.	2507.12205	null
2025-07-15	MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving	Ruihao Li et.al.	2507.11507	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-14	Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference	Jiaming Cheng et.al.	2507.09942	null
2025-07-12	SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding	Weihong Xu et.al.	2507.09201	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-11	Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference	Chun-Ting Chen et.al.	2507.09010	null
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-10	Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions	Quanyan Zhu et.al.	2507.08208	null
2025-07-10	Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing	Junyi Wen et.al.	2507.08045	null
2025-07-15	Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models	Varin Sikka et.al.	2507.07505	null
2025-07-11	QUEST: Query Optimization in Unstructured Document Analysis	Zhaoze Sun et.al.	2507.06515	null
2025-07-08	Voltage Regulation in Distribution Systems with Data Center Loads	Yize Chen et.al.	2507.06416	null
2025-07-07	Cascade: Token-Sharded Private LLM Inference	Rahul Thomas et.al.	2507.05228	null
2025-07-07	Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?	Yun Qu et.al.	2507.04632	null
2025-07-05	Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States	Karine Karine et.al.	2507.03871	null
2025-07-05	OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference	Seungjun Shin et.al.	2507.03865	null
2025-07-04	Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA	Jindong Li et.al.	2507.03308	null
2025-07-03	HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference	Weishu Deng et.al.	2507.03153	null
2025-07-03	On the Convergence of Large Language Model Optimizer for Black-Box Network Management	Hoon Lee et.al.	2507.02689	null
2025-07-03	Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure	Rui Xie et.al.	2507.02654	null
2025-07-03	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency	Zongpu Zhang et.al.	2507.02135	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-02	SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech	Cheng Zhuangfei et.al.	2507.01348	null
2025-07-02	La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation	Kai Liu et.al.	2507.01299	null
2025-07-01	VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator	Zhican Wang et.al.	2507.00797	null
2025-07-01	Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models	Yilun Zhang et.al.	2507.00653	null
2025-07-01	LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference	Chuhao Xu et.al.	2507.00507	null
2025-07-01	Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs	Mohammad Firas Sada et.al.	2507.00418	null
2025-06-30	Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission	Faranaksadat Solat et.al.	2507.00082	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need	Junaid Ahmed Khan et.al.	2506.22267	null
2025-06-27	SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference	Yongchao He et.al.	2506.22033	null
2025-06-30	A Survey of LLM Inference Systems	James Pan et.al.	2506.21901	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-07-02	Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU	He Sun et.al.	2506.20187	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-23	Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation	Ahmadreza Saboor Yaraghi et.al.	2506.19045	null
2025-06-23	WiLLM: An Open Wireless LLM Communication System	Boyi Liu et.al.	2506.19030	null
2025-06-23	CommVQ: Commutative Vector Quantization for KV Cache Compression	Junyan Li et.al.	2506.18879	null
2025-06-22	Mechanistic Interpretability in the Presence of Architectural Obfuscation	Marcos Florencio et.al.	2506.18053	null
2025-06-20	Towards AI Search Paradigm	Yuchen Li et.al.	2506.17188	null
2025-06-17	CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision	Dyah Adila et.al.	2506.14912	null
2025-06-16	Vector Ontologies as an LLM world view extraction method	Kaspar Rothenfusser et.al.	2506.13252	link
2025-06-13	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	link
2025-06-13	GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news	Abdul Haque et.al.	2506.11600	null
2025-06-13	Collaborative LLM Inference via Planning for Efficient Reasoning	Byeongchan Lee et.al.	2506.11578	null
2025-06-13	Efficient Long-Context LLM Inference via KV Cache Clustering	Jie Hu et.al.	2506.11418	null
2025-06-12	TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference	Hongbin Zhang et.al.	2506.10470	null
2025-06-11	A First Look at Bugs in LLM Inference Engines	Mugeng Liu et.al.	2506.09713	link
2025-06-12	Understanding the Performance and Power of LLM Inferencing on Edge Accelerators	Mayank Arya et.al.	2506.09554	null
2025-06-11	Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning	Jiayi Yuan et.al.	2506.09501	null
2025-06-10	Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$	Chihiro Taguchi et.al.	2506.08479	null
2025-06-10	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-07	Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation	Miryeong Kwon et.al.	2506.06769	null
2025-06-06	Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Adarsh Prasad Behera et.al.	2506.06579	null
2025-06-04	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Junyi Chen et.al.	2506.03887	null
2025-06-04	Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis	Avihay Cohen et.al.	2506.03656	null
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-06-07	Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs	Jiakun Fan et.al.	2506.03296	null
2025-06-03	Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs	Shangmin Guo et.al.	2506.02918	null
2025-06-03	HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference	Ping Gong et.al.	2506.02572	link
2025-06-02	Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts	Spencer Banasik et.al.	2506.01827	null
2025-05-30	Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching	Juan Wisznia et.al.	2505.24643	null
2025-05-30	LLM Inference Enhanced by External Knowledge: A Survey	Yu-Hsuan Lin et.al.	2505.24377	link
2025-05-30	SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference	Tian Xia et.al.	2505.24095	null
2025-05-29	Large Language Model Meets Constraint Propagation	Alexandre Bonlarron et.al.	2505.24012	null
2025-05-29	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-29	SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference	Yinghao Tang et.al.	2505.23022	null
2025-05-28	Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference	Donghyeon Joo et.al.	2505.22913	link
2025-05-28	Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference	Yue Zhu et.al.	2505.21919	null
2025-05-28	HoliTom: Holistic Token Merging for Fast Video Large Language Models	Kele Shao et.al.	2505.21334	link
2025-05-28	FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration	Daehyeon Baek et.al.	2505.20839	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-26	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation	Gerasimos Gerogiannis et.al.	2505.19349	null
2025-06-03	A Survey of LLM $\times$ DATA	Xuanhe Zhou et.al.	2505.18458	null
2025-05-23	An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs	Rahul Thomas et.al.	2505.18332	null
2025-05-23	NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache	Donghyun Son et.al.	2505.18231	null
2025-05-23	Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning	Michael Hassid et.al.	2505.17813	null
2025-05-23	DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies	Ning Yang et.al.	2505.17420	null
2025-05-22	RAP: Runtime-Adaptive Pruning for LLM Inference	Huanrong Liu et.al.	2505.17138	null
2025-05-22	CASTILLO: Characterizing Response Length Distributions of Large Language Models	Daniel F. Perez-Ramirez et.al.	2505.16881	link
2025-05-22	Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization	Vera Neplenbroek et.al.	2505.16467	link
2025-05-22	QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	Benjamin Schneider et.al.	2505.16175	link
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-20	Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity	Susav Shrestha et.al.	2505.14884	link
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668	null
2025-05-20	ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs	Yifan Sui et.al.	2505.14468	null
2025-05-16	An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents	Ayesha Amjad et.al.	2505.13504	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-19	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-17	Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning	Yuheng Lu et.al.	2505.11922	null
2025-05-17	Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture	Yu Wu et.al.	2505.11916	null
2025-05-16	TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference	Raja Gond et.al.	2505.11329	link
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-19	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-05-15	ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production	Yuxing Xiang et.al.	2505.09999	link
2025-05-15	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	Statistical Modeling and Uncertainty Estimation of LLM Inference Systems	Kaustabha Ray et.al.	2505.09319	null
2025-05-15	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	null
2025-05-13	LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries	Zekun Wu et.al.	2505.08842	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-12	Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity	Guang Yan et.al.	2505.07239	null
2025-05-12	PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications	Kuntai Du et.al.	2505.07203	null
2025-05-14	I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference	Zibo Gao et.al.	2505.06738	null
2025-05-09	Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference	Haolin Zhang et.al.	2505.06461	null
2025-05-09	Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM	Zehao Fan et.al.	2505.05772	null
2025-05-08	HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow	You Peng et.al.	2505.05286	link
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-05	RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference	Yaoqi Chen et.al.	2505.02922	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-08	A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency	Sihyeong Park et.al.	2505.01658	link
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-28	Taming the Titans: A Survey of Efficient LLM Inference Serving	Ranran Zhen et.al.	2504.19720	link
2025-04-28	R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference	Zhenyu Zhang et.al.	2504.19449	null
2025-05-07	A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification	Junichiro Niimi et.al.	2504.18884	link
2025-04-29	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	PropRAG: Guiding Retrieval with Beam Search over Proposition Paths	Jingjin Wang et.al.	2504.18070	null
2025-04-24	L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference	Qingyuan Liu et.al.	2504.17584	null
2025-04-24	On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration	Maoyang Xiang et.al.	2504.17376	null
2025-04-18	HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing	Myunghyun Rhee et.al.	2504.16112	null
2025-04-22	Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model	Junwei Hu et.al.	2504.15989	null
2025-04-23	KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments	Junyoung Park et.al.	2504.15364	null
2025-04-18	High-Throughput LLM inference on Heterogeneous Clusters	Yi Xiong et.al.	2504.15303	null
2025-04-21	Hardware-based Heterogeneous Memory Management for Large Language Model Inference	Soojin Hwang et.al.	2504.14893	null
2025-04-19	Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator	Akshat Ramachandran et.al.	2504.14365	null
2025-04-19	FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference	Coleman Hooper et.al.	2504.14152	null
2025-04-16	Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim et.al.	2504.11816	link
2025-04-16	Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs	Hyungwoo Lee et.al.	2504.11765	null
2025-04-16	Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy et.al.	2504.11750	null
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-04-14	AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference	Yangshen Deng et.al.	2504.10326	null
2025-04-14	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference	Yuxuan Tian et.al.	2504.09936	null
2025-04-22	Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya et.al.	2504.09775	null
2025-04-13	LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference	Jianing Zheng et.al.	2504.09561	link
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-04-10	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-11	Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash	Fucheng Jia et.al.	2504.08378	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving	Shihong Gao et.al.	2504.07494	link
2025-04-10	UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference	Weikai Xu et.al.	2504.07479	null
2025-04-10	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching	Yanhao Dong et.al.	2504.06319	null
2025-04-09	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	link
2025-04-11	User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems	Jianling Wang et.al.	2504.05522	null
2025-04-07	Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness	Dongzhuoran Zhou et.al.	2504.05163	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-04	Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation	Weitao Li et.al.	2504.03165	link
2025-04-03	Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search	Parsa Ghaffari et.al.	2504.02426	link
2025-04-01	SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching	Yuxuan Zhu et.al.	2504.00970	null
2025-04-03	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-04-06	ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie et.al.	2503.24053	link
2025-03-31	MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo et.al.	2503.23817	null
2025-03-30	Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference	Wei Tao et.al.	2503.23294	null
2025-03-28	Niyama: Breaking the Silos of LLM Inference Serving	Kanishk Goel et.al.	2503.22562	null
2025-03-25	LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation	Han Chen et.al.	2503.19950	link
2025-03-24	xKV: Cross-Layer SVD for KV-Cache Compression	Chi-Chih Chang et.al.	2503.18893	link
2025-03-27	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-03-24	Jenga: Effective Memory Management for Serving LLM with Heterogeneity	Chen Zhang et.al.	2503.18292	null
2025-03-27	WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference	Youhui Zuo et.al.	2503.17922	link
2025-03-22	PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling	Chongpeng Liu et.al.	2503.17707	null
2025-03-21	V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms	Javier J. Poveda Rodrigo et.al.	2503.17422	null
2025-03-21	Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation	Jingzhi Fang et.al.	2503.16893	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study	Jomar Thomas Almonte et.al.	2503.15248	null
2025-03-19	Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks	Kai Zhang et.al.	2503.14882	null
2025-03-18	PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play	Wei Fang et.al.	2503.14432	null
2025-03-17	Mitigating KV Cache Competition to Enhance User Experience in LLM Inference	Haiying Shen et.al.	2503.13773	null
2025-03-17	AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications	Haiying Shen et.al.	2503.13737	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-14	Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce	Jingying Zeng et.al.	2503.13518	null
2025-03-17	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck et.al.	2503.13427	link
2025-03-17	VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding	Zeng Wang et.al.	2503.13116	null
2025-03-15	TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation	Mayank Kumar et.al.	2503.12217	null
2025-03-09	Green Prompting	Marta Adamska et.al.	2503.10666	null
2025-03-13	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-12	Prompt Inference Attack on Distributed Large Language Model Inference Frameworks	Xinjian Luo et.al.	2503.09291	null
2025-03-11	TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems	Feiyang Wu et.al.	2503.08415	link
2025-03-11	Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference	Pol G. Recasens et.al.	2503.08311	null
2025-03-09	Seesaw: High-throughput LLM Inference via Model Re-sharding	Qidong Su et.al.	2503.06433	null
2025-03-07	Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang et.al.	2503.05248	link
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-03-15	Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking	Yijie Xu et.al.	2503.04636	null
2025-03-06	AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services	Xiaoqi Wang et.al.	2503.04418	null
2025-03-06	Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search	Kou Misaki et.al.	2503.04412	null
2025-03-06	Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets	Yiwen Dong et.al.	2503.04076	null
2025-03-04	FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference	Hongchao Du et.al.	2503.03777	null
2025-03-05	MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems	Rui Ye et.al.	2503.03686	null
2025-03-04	VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference	Zihan Liu et.al.	2503.02236	null
2025-02-26	Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng et.al.	2503.01873	null
2025-03-03	SAGE: A Framework of Precise Retrieval for RAG	Jintao Zhang et.al.	2503.01713	null
2025-03-03	DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems	Minoo Hosseinzadeh et.al.	2503.01704	null
2025-03-01	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-02-28	FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference	Xunhao Lai et.al.	2502.20766	link
2025-02-28	SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models	Han-Byul Kim et.al.	2502.20727	null
2025-02-27	ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving	Kai Mei et.al.	2502.20576	link
2025-02-26	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang et.al.	2502.19078	null
2025-02-24	LLM Inference Acceleration via Efficient Operation Fusion	Mahsa Salmani et.al.	2502.17728	null
2025-02-24	CodeSwift: Accelerating LLM Inference for Efficient Code Generation	Qianhui Zhao et.al.	2502.17139	null
2025-02-24	Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM	Lian Liu et.al.	2502.16963	null
2025-02-24	DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance	Xuanfan Ni et.al.	2502.16886	null
2025-03-01	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-23	DISC: Dynamic Decomposition Improves LLM Inference Scaling	Jonathan Light et.al.	2502.16706	null
2025-02-23	TerEffic: Highly Efficient Ternary LLM Inference on FPGA	Chenyang Yin et.al.	2502.16473	null
2025-02-21	KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse	Jingbo Yang et.al.	2502.16002	link
2025-02-21	Towards Swift Serverless LLM Cold Starts with ParaServe	Chiheng Lou et.al.	2502.15524	null
2025-02-24	HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings	Rasmus Aavang et.al.	2502.15411	link
2025-02-24	Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference	Yaohua Tang et.al.	2502.15294	null
2025-02-21	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou et.al.	2502.15233	link
2025-02-19	EvoP: Robust LLM Inference via Evolutionary Pruning	Shangyu Wu et.al.	2502.14910	null
2025-02-20	Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale	Shashwat Jaiswal et.al.	2502.14617	null
2025-02-20	SR-LLM: Rethinking the Structured Representation in Large Language Model	Jiahuan Zhang et.al.	2502.14352	null
2025-02-19	RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression	Payman Behnam et.al.	2502.14051	null
2025-02-19	Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference	Qingfa Xiao et.al.	2502.13542	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-18	BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference	Ahmed Burak Gulhan et.al.	2502.13176	null
2025-02-18	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	link
2025-02-18	HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading	Cheng Luo et.al.	2502.12574	link
2025-02-18	Distributed On-Device LLM Inference With Over-the-Air Computation	Kai Zhang et.al.	2502.12559	null
2025-02-18	SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs	Ahmed F. AbouElhamayed et.al.	2502.12444	link
2025-02-17	Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs	Kan Zhu et.al.	2502.12216	null
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services	Ting Sun et.al.	2502.11417	null
2025-02-17	Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment	Ben Dong et.al.	2502.11347	null
2025-02-16	Diversified Sampling Improves Scaling LLM inference	Tianchun Wang et.al.	2502.11027	null
2025-02-16	Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings	Liangqi Yuan et.al.	2502.11007	link
2025-02-15	Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA	Jindong Li et.al.	2502.10659	null
2025-02-14	λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu et.al.	2502.09922	null
2025-02-14	INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing	Hongsun Jang et.al.	2502.09921	null
2025-02-13	On multi-token prediction for efficient LLM inference	Somesh Mehra et.al.	2502.09419	null
2025-02-13	InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU	Heejun Lee et.al.	2502.08910	null
2025-02-12	Universal Model Routing for Efficient LLM Inference	Wittawat Jitkrittum et.al.	2502.08773	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-02-11	HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang et.al.	2502.07903	null
2025-02-11	SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters	Yiping Wang et.al.	2502.07832	null
2025-02-11	PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu et.al.	2502.07578	link
2025-02-13	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-02-08	Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models	Soham Poddar et.al.	2502.05610	null
2025-02-08	Mechanistic Interpretability of Emotion Inference in Large Language Models	Ala N. Tak et.al.	2502.05489	null
2025-02-07	BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference	Reena Elangovan et.al.	2502.05376	null
2025-02-07	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2025-02-06	WaferLLM: A Wafer-Scale LLM Inference System	Congjie He et.al.	2502.04563	null
2025-02-06	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li et.al.	2502.04420	link
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Qingyue Yang et.al.	2502.04077	link
2025-02-06	Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Yuan Feng et.al.	2502.03805	link
2025-02-06	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-05	HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference	Zeyu Zhang et.al.	2502.03589	null
2025-02-05	Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL	Wenbo Sun et.al.	2502.02818	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-01-30	Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Sazzad Hossain et.al.	2502.01651	null
2025-02-06	An Investigation of FP8 Across Accelerators for LLM Inference	Jiwoo Kim et.al.	2502.01070	null
2025-02-02	Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference	Patrick Yubeaton et.al.	2502.00922	null
2025-02-02	SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models	Jiawen Zhang et.al.	2502.00847	null
2025-02-01	UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs	Yizhe Xiong et.al.	2502.00439	null
2025-02-01	ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Xiang Liu et.al.	2502.00299	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-02-02	RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations	Zunhai Su et.al.	2501.16383	link
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference	Jack Min Ong et.al.	2501.16007	null
2025-01-27	Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference	Tharindu B. Hewage et.al.	2501.15829	link
2025-01-25	Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads	Xingyang He et.al.	2501.15113	null
2025-01-24	Locality-aware Fair Scheduling in LLM Serving	Shiyi Cao et.al.	2501.14312	null
2025-01-20	Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference	Pouya Hamadanian et.al.	2501.11779	link
2025-01-20	Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas	Nishant Balepur et.al.	2501.11549	link
2025-01-19	GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation	Shashikant Ilager et.al.	2501.11006	link
2025-01-17	A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks	Xinzhe Li et.al.	2501.10069	link
2025-01-17	PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks	Huiyou Zhan et.al.	2501.09367	null
2025-01-16	Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition	Takaaki Hori et.al.	2501.09258	null
2025-01-15	Guiding Retrieval using LLM-based Listwise Rankers	Mandeep Rathee et.al.	2501.09186	link
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke et.al.	2501.08090	null
2025-01-12	MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference	Wenxuan Zeng et.al.	2501.06807	null
2025-01-05	TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms	Jovan Stojkovic et.al.	2501.02600	null
2025-01-04	AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference	Zhuomin He et.al.	2501.02336	link
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference	Wonsuk Jang et.al.	2501.01144	link
2025-01-02	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye et.al.	2501.01005	link
2024-12-23	Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs	Dibakar Gope et.al.	2501.00032	link
2024-12-29	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang et.al.	2412.20501	link
2024-12-28	LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System	Hyucksung Kwon et.al.	2412.20166	null
2024-12-19	GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Chengming Zhang et.al.	2412.19829	null
2025-01-02	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li et.al.	2412.19442	link
2024-12-27	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong et.al.	2412.19394	link
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434	null
2024-12-20	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan et.al.	2412.15803	link
2024-12-18	A Survey on LLM Inference-Time Self-Improvement	Xiangjue Dong et.al.	2412.14352	link
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	A System for Microserving of LLMs	Hongyi Jin et.al.	2412.12488	null
2024-12-16	CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation	Hongxuan Zhang et.al.	2412.11741	null
2024-12-15	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei et.al.	2412.11053	link
2024-12-13	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-17	TurboAttention: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths	Naryeong Kim et.al.	2412.08281	null
2024-12-12	TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Xingchen Song et.al.	2412.08237	null
2024-12-09	Asynchronous LLM Function Calling	In Gim et.al.	2412.07017	null
2024-12-09	SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs	James Vo et.al.	2412.06198	null
2024-12-08	XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference	Weizhuo Li et.al.	2412.05896	null
2024-12-06	GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments	Yanyu Chen et.al.	2412.04788	null
2024-12-03	Multi-Bin Batching for Increasing LLM Inference Throughput	Ozgur Guldogan et.al.	2412.04504	null
2024-11-29	BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching	Zhen Zheng et.al.	2412.03594	null
2024-12-03	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity	Da Ma et.al.	2412.02252	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking	Marco Federici et.al.	2412.01380	null
2024-12-05	RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy	Geonho Lee et.al.	2412.01129	link
2024-12-02	TruncFormer: Private LLM Inference Using Only Truncations	Patrick Yubeaton et.al.	2412.01042	null
2024-11-29	A dynamic parallel method for performance optimization on hybrid CPUs	Luo Yu et.al.	2411.19542	null
2024-12-03	Puzzle: Distillation-Based NAS for Inference-Optimized LLMs	Akhiad Bercovich et.al.	2411.19146	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-28	MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache	Akshat Sharma et.al.	2411.18077	null
2024-11-24	Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou et.al.	2411.17741	null
2024-11-26	PIM-AI: A Novel Architecture for High-Efficiency LLM Inference	Cristobal Ortega et.al.	2411.17309	null
2024-11-26	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya et.al.	2411.17116	link
2024-11-26	Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation	Chaoyi Jiang et.al.	2411.17089	link
2024-11-25	MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang et.al.	2411.16158	null
2024-11-24	eFedLLM: Efficient LLM Inference Based on Federated Learning	Shengwen Ding et.al.	2411.16003	null
2024-11-24	Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang et.al.	2411.15982	null
2024-11-24	Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems	Wenxiang Lin et.al.	2411.15715	null
2024-11-22	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-20	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-21	LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts	Zhuohan Gu et.al.	2411.13009	null
2024-11-15	An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2	Pepijn de Reus et.al.	2411.12758	link
2024-11-19	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-14	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-15	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Pie: Pooling CPU Memory for LLM Inference	Yi Xu et.al.	2411.09317	null
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-11-12	The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving	Kyoungmin Kim et.al.	2411.07447	null
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-07	Hardware and Software Platform Inference	Cheng Zhang et.al.	2411.05197	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	link
2024-11-05	CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration	Hongpeng Jin et.al.	2411.02829	null
2024-11-04	RAGViz: Diagnose and Visualize Retrieval-Augmented Generation	Tevin Wang et.al.	2411.01751	link
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-02	RA-WEBs: Remote Attestation for WEB services	Kosei Akama et.al.	2411.01340	null
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142	null
2024-11-01	LLM-Based Misconfiguration Detection for AWS Serverless Computing	Jinfeng Wen et.al.	2411.00642	null
2024-11-04	ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models	Anbang Wang et.al.	2411.00533	null
2024-11-01	Attention Tracker: Detecting Prompt Injection Attacks in LLMs	Kuo-Han Hung et.al.	2411.00348	null
2024-10-31	LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Krishna Teja Chitty-Venkata et.al.	2411.00136	link
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-29	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang et.al.	2410.22480	link
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2024-10-28	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun et.al.	2410.21465	link
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-29	Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management	Tuowei Wang et.al.	2410.19274	null
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching	Peizhuang Cong et.al.	2410.18701	null
2024-10-25	Fast Inference for Augmented Large Language Models	Rana Shahout et.al.	2410.18248	null
2024-10-23	POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference	Aditya K Kamath et.al.	2410.18038	link
2024-10-22	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs	Haoran Lin et.al.	2410.16663	null
2024-10-22	Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency	Prafulla Kumar Choubey et.al.	2410.16597	null
2024-10-20	EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models	Junhao Hu et.al.	2410.15332	null
2024-10-19	IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System	Minseok Seo et.al.	2410.15008	null
2024-10-23	Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching	Jie Peng et.al.	2410.14740	null
2024-10-18	A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference	You Wu et.al.	2410.14442	link
2024-10-18	Revisiting SLO and Goodput Metrics in LLM Serving	Zhibin Wang et.al.	2410.14257	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-10-17	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	Progressive Mixed-Precision Decoding for Efficient LLM Inference	Hao Mark Chen et.al.	2410.13461	null
2024-10-17	Data Defenses Against Large Language Models	William Agnew et.al.	2410.13138	link
2024-10-19	In-context KV-Cache Eviction for LLMs via Attention-Gate	Zihao Zeng et.al.	2410.12876	null
2024-10-10	RecurFormer: Not All Transformer Heads Need Self-Attention	Ruiqing Yan et.al.	2410.12850	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	Yingyu Liang et.al.	2410.11261	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-16	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-06	RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference	Yige Xu et.al.	2410.04519	link
2024-10-14	Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective	Jinhao Li et.al.	2410.04466	link
2024-10-04	SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation	Aurick Qiao et.al.	2410.03960	null
2024-10-04	EXAQ: Exponent Aware Quantization For LLMs Acceleration	Moran Shkolnik et.al.	2410.03185	link
2024-10-03	LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences	Zhenxiao Fu et.al.	2410.02950	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-04	LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Małgorzata Łazuka et.al.	2410.02425	null
2024-10-04	Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation	Xiaoqun Liu et.al.	2410.02220	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-02	TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices	Zonghang Li et.al.	2410.00531	null
2024-09-30	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems	Linke Song et.al.	2409.20002	null
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-09-26	Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores	Shaobo Ma et.al.	2409.17870	null
2024-09-25	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction	Zhenmei Shi et.al.	2409.17422	link
2024-09-25	Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	Amey Agrawal et.al.	2409.17264	null
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-09-25	AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization	Yifan Tan et.al.	2409.16546	link
2024-09-23	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Zesen Zhao et.al.	2409.15518	null
2024-09-24	UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He et.al.	2409.14961	null
2024-09-22	RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph	Linxi Wei et.al.	2409.14556	null
2024-09-16	Do Large Language Models Need a Content Delivery Network?	Yihua Cheng et.al.	2409.13761	link
2024-09-19	PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)	Mahmoud Nazzal et.al.	2409.12699	link
2024-09-12	LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs	Han Xu et.al.	2409.11424	null
2024-09-04	ISO: Overlap of Computation and Communication within Seqenence For LLM Inference	Bin Xiao et.al.	2409.11155	null
2024-09-18	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-08	InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference	Xiurui Pan et.al.	2409.04992	null
2024-09-07	Achieving Peak Performance for Large Language Models: A Systematic Review	Zhyar Rzgar K Rostam et.al.	2409.04833	null
2024-09-06	A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage	Huan Yang et.al.	2409.04040	null
2024-09-13	Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study	Jianwei Zhu et.al.	2409.03992	null
2024-09-05	Sirius: Contextual Sparsity with Correction for Efficient LLMs	Yang Zhou et.al.	2409.03856	link
2024-08-31	HSF: Defending against Jailbreak Attacks with Hidden State Filtering	Cheng Qian et.al.	2409.03788	null
2024-09-03	Contemporary Model Compression on Large Language Models Inference	Dong Liu et.al.	2409.01990	link
2024-09-02	CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification	Junhui He et.al.	2409.01366	link
2024-09-04	Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference	Barys Liskavets et.al.	2409.01227	link
2024-09-01	Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)	Xu-Hao Chen et.al.	2409.00661	null
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-22	NanoFlow: Towards Optimal Large Language Model Serving Throughput	Kan Zhu et.al.	2408.12757	link
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-23	ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Chao Zeng et.al.	2408.08554	link
2024-08-14	LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference	Seungjae Moon et.al.	2408.07326	null
2024-08-12	LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration	Zhiwen Mo et.al.	2408.06003	null
2024-08-10	LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Jaehong Cho et.al.	2408.05499	link
2024-08-05	SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris et.al.	2408.05235	null
2024-08-08	Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning	Ke Cheng et.al.	2408.04323	null
2024-08-07	Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference	Zeyu Zhang et.al.	2408.04107	null
2024-08-07	MPC-Minimized Secure LLM Inference	Deevashwer Rathee et.al.	2408.03561	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-02	The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines	Matias Martinez et.al.	2408.01050	null
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Designing Efficient LLM Accelerators for Edge Devices	Jude Haris et.al.	2408.00462	null
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-07-23	ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency	Yuhang Yao et.al.	2408.00008	null
2024-08-01	Responsive ML inference in multi-tenanted environments using AQUA	Abhishek Vijaya Kumar et.al.	2407.21255	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-29	Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Sania Nayab et.al.	2407.19825	null
2024-07-29	Teaching LLMs at Charles University: Assignments and Activities	Jindřich Helcl et.al.	2407.19798	null
2024-07-22	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads	Hanlin Tang et.al.	2407.15891	null
2024-07-22	vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving	Jiale Xu et.al.	2407.15309	link
2024-07-19	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	Qichen Fu et.al.	2407.14057	null
2024-07-17	Struct-X: Enhancing Large Language Models Reasoning with Structured Data	Xiaoyu Tan et.al.	2407.12522	null
2024-07-17	LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li et.al.	2407.12391	null
2024-07-17	Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models	Ayush Kaushal et.al.	2407.12327	link
2024-07-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-07-21	Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference	Yuan Feng et.al.	2407.11550	link
2024-07-15	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-07-12	Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference	Zongyue Qin et.al.	2407.09722	null
2024-09-02	Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	null
2024-07-08	Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU	Daliang Xu et.al.	2407.05858	link
2024-07-07	A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length	Yuqing Yang et.al.	2407.05347	null
2024-07-05	Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design	Yiyang Huang et.al.	2407.04292	link
2024-07-04	Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems	Grant Wilkins et.al.	2407.04014	null
2024-07-02	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-06-29	Teola: Towards End-to-End Optimization of LLM-based Applications	Xin Tan et.al.	2407.00326	link
2024-06-25	T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jianyu Wei et.al.	2407.00088	link
2024-06-28	InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Wonbeom Lee et.al.	2406.19707	null
2024-06-24	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link
2024-06-28	SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention	Qianchao Zhu et.al.	2406.15486	null
2024-06-21	Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models	Qi Liu et.al.	2406.14848	link
2024-06-20	Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data	Johannes Treutlein et.al.	2406.14546	link
2024-06-20	LiveMind: Low-latency Large Language Models with Simultaneous Inference	Chuangtao Chen et.al.	2406.14319	link
2024-06-19	SDQ: Sparse Decomposed Quantization for LLM Inference	Geonhwa Jeong et.al.	2406.13868	null
2024-06-19	Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style	Zeping Li et.al.	2406.13170	null
2024-06-16	Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization	Jungi Lee et.al.	2406.12930	null
2024-06-18	LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization	Masafumi Enomoto et.al.	2406.12494	null
2024-06-18	LLMs Are Prone to Fallacies in Causal Inference	Nitish Joshi et.al.	2406.12158	null
2024-06-14	Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning	Hui Liu et.al.	2406.11890	null
2024-06-17	Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference	Donghyeon Joo et.al.	2406.11674	null
2024-06-17	QTIP: Quantization with Trellises and Incoherence Processing	Albert Tseng et.al.	2406.11235	link
2024-06-16	New Solutions on LLM Acceleration, Optimization, and Application	Yingbing Huang et.al.	2406.10903	null
2024-06-16	Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference	Jiaming Tang et.al.	2406.10774	link
2024-06-15	Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study	Hao Hao et.al.	2406.10675	link
2024-06-08	QCQA: Quality and Capacity-aware grouped Query Attention	Vinay Joshi et.al.	2406.10247	null
2024-06-12	Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference	Christopher Wolters et.al.	2406.08413	null
2024-06-12	PowerInfer-2: Fast Large Language Model Inference on a Smartphone	Zhenliang Xue et.al.	2406.06282	null
2024-06-09	A Superalignment Framework in Autonomous Driving with Large Language Models	Xiangrui Kong et.al.	2406.05651	null
2024-06-06	Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism	Jiahao Liu et.al.	2406.03853	null
2024-06-04	Language Models can Infer Action Semantics for Classical Planners from Environment Feedback	Wang Zhu et.al.	2406.02791	null
2024-06-08	Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach	Yuxuan Chen et.al.	2406.02616	null
2024-06-04	SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski et.al.	2406.02532	link
2024-06-03	Demystifying Platform Requirements for Diverse LLM Inference Use Cases	Abhimanyu Bambhaniya et.al.	2406.01698	link
2024-06-03	PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration	Ziqian Zeng et.al.	2406.01394	null
2024-06-01	A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation	Dugang Liu et.al.	2406.00333	null
2024-05-31	No Free Lunch Theorem for Privacy-Preserving LLM Inference	Xiaojin Zhang et.al.	2405.20681	null
2024-05-30	Decentralized AI: Permissionless LLM Inference on POKT Network	Daniel Olshansky et.al.	2405.20450	null
2024-06-01	S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs	Wei Zhong et.al.	2405.20314	null
2024-05-30	Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models	Yuxiao Luo et.al.	2405.19850	null
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN	Fei Zheng et.al.	2405.18744	null
2024-06-02	Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference	Hao Mark Chen et.al.	2405.18628	link
2024-05-25	FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference	Chenqi Lin et.al.	2405.16241	null
2024-05-23	EdgeShard: Efficient LLM Inference via Collaborative Edge Computing	Mingjin Zhang et.al.	2405.14371	null
2024-05-23	MiniCache: KV Cache Compression in Depth Dimension for Large Language Models	Akide Liu et.al.	2405.14366	null
2024-05-21	PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference	Dongjie Yang et.al.	2405.12532	null
2024-05-12	Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization	Xinyuan Zhang et.al.	2405.07140	null
2024-05-11	Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving	Chengyi Nie et.al.	2405.06856	null
2024-05-21	Vidur: A Large-Scale Simulation Framework For LLM Inference	Amey Agrawal et.al.	2405.05465	link
2024-05-13	KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation	Minsik Cho et.al.	2405.05329	null
2024-05-12	DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature	Dawei Li et.al.	2405.04819	link
2024-05-10	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-05-07	vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention	Ramya Prabhu et.al.	2405.04437	link
2024-05-07	Optimizing Language Model’s Reasoning Abilities with Weak Supervision	Yongqi Tong et.al.	2405.04086	null
2024-05-06	AlphaMath Almost Zero: process Supervision without process	Guoxin Chen et.al.	2405.03553	link
2024-05-03	Efficient and Economic Large Language Model Inference with Attention Offloading	Shaoyuan Chen et.al.	2405.01814	null

MoE

Publish Date	Title	Authors	PDF	Code
2026-03-06	RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering	Gaia A. Bertolino et.al.	2603.06542	null
2026-03-06	A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection	Rodrigo Chaves et.al.	2603.06473	null
2026-03-06	MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis	Dongqing Xie et.al.	2603.06378	null
2026-03-06	MoEless: Efficient MoE LLM Serving via Serverless Computing	Hanfei Yu et.al.	2603.06350	null
2026-03-06	WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection	Peng Chen et.al.	2603.06313	null
2026-03-06	GazeMoE: Perception of Gaze Target with Mixture-of-Experts	Zhuangzhuang Dai et.al.	2603.06256	null
2026-03-06	EvoESAP: Non-Uniform Expert Pruning for Sparse MoE	Zongfang Liu et.al.	2603.06003	null
2026-03-06	MoE Lens – An Expert Is All You Need	Marmik Chaudhari et.al.	2603.05806	null
2026-03-06	Sparse Crosscoders for diffing MoEs and Dense models	Marmik Chaudhari et.al.	2603.05805	null
2026-03-05	Change Point Detection for Cell Populations Measured via Flow Cytometry	Yik Lun Kei et.al.	2603.05700	null
2026-03-05	NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension	Rongzhi Li et.al.	2603.05046	null
2026-03-05	Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation	Yilong Chen et.al.	2603.04971	null
2026-03-05	Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling	Yong Liu et.al.	2603.04791	null
2026-03-05	TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings	Yebo Wu et.al.	2603.04772	null
2026-03-04	ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model	Yuhao Xu et.al.	2603.04589	null
2026-03-04	Augmenting representations with scientific papers	Nicolò Oreste Pinciroli Vago et.al.	2603.04516	null
2026-03-04	Benchmarking Quantum Computers via Protocols, Comparing IBM’s Heron vs IBM’s Eagle	Nitay Mayo et.al.	2603.04377	null
2026-03-04	RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation	Yixin Chen et.al.	2603.04348	null
2026-03-04	CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation	Jinfeng Xu et.al.	2603.04320	null
2026-03-04	UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization	Qianfeng Yang et.al.	2603.03967	null
2026-03-03	Modeling Cross-vision Synergy for Unified Large Vision Model	Shengqiong Wu et.al.	2603.03564	null
2026-03-03	Beyond Language Modeling: An Exploration of Multimodal Pretraining	Shengbang Tong et.al.	2603.03276	null
2026-03-04	MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection	Jun Yeong Park et.al.	2603.03101	null
2026-03-03	CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots	Shihao Ma et.al.	2603.03067	null
2026-03-03	EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education	Baoliang Chen et.al.	2603.03066	null
2026-03-03	Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs	Wuyue Zhang et.al.	2603.02731	null
2026-03-03	TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework	Ting-Wei Zhou et.al.	2603.02720	null
2026-03-03	MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration	Lingshun Kong et.al.	2603.02710	null
2026-03-03	Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data	Sijie Mai et.al.	2603.02695	null
2026-03-03	Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees	Mohammed Nowaz Rabbani Chowdhury et.al.	2603.02633	null
2026-03-02	DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks	Gökdeniz Gülmez et.al.	2603.01697	null
2026-03-02	PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification	Jian Yu et.al.	2603.01547	null
2026-03-02	Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification	Jiayang Wu et.al.	2603.01511	null
2026-03-02	UETrack: A Unified and Efficient Framework for Single Object Tracking	Ben Kang et.al.	2603.01412	null
2026-03-02	Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting	Yi Li et.al.	2603.01363	null
2026-03-01	Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning	Hamed Damirchi et.al.	2603.01326	null
2026-03-01	TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading	Yudong Pan et.al.	2603.01058	null
2026-03-01	Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving	Xubo Zhu et.al.	2603.01007	null
2026-02-28	MME: Mixture of Mesh Experts with Random Walk Transformer Gating	Amir Belder et.al.	2603.00828	null
2026-02-27	Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization	Chenwei Jia et.al.	2602.24059	null
2026-02-26	Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG	Hanning Guo et.al.	2602.23410	null
2026-02-26	A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations	Soumya Dutta et.al.	2602.23300	null
2026-02-26	Learning Physical Operators using Neural Operators	Vignesh Gopakumar et.al.	2602.23113	null
2026-02-26	pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation	Shentong Mo et.al.	2602.22938	null
2026-02-26	Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting	Fabian Muşat et.al.	2602.22685	null
2026-02-26	Predictive variational inference for flexible regression models	Lucas Kock et.al.	2602.22582	null
2026-02-25	NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training	Dengdi Sun et.al.	2602.22059	null
2026-02-25	Excitation: Momentum For Experts	Sagi Shaier et.al.	2602.21798	null
2026-02-25	Learning from Yesterday’s Error: An Efficient Online Learning Method for Traffic Demand Prediction	Xiannan Huang et.al.	2602.21757	null
2026-02-25	TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts	Jiafeng Lin et.al.	2602.21693	null
2026-02-25	Multi-Layer Scheduling for MoE-Based LLM Reasoning	Yifan Sun et.al.	2602.21626	null
2026-02-24	Dual-Branch INS/GNSS Fusion with Inequality and Equality Constraints	Mor Levenhar et.al.	2602.21266	null
2026-02-25	GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer	Wenbo Yu et.al.	2602.20871	null
2026-02-24	Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA	Nuocheng Yang et.al.	2602.20492	null
2026-02-23	The Universal Eccentricity Distribution for Dynamical Gravitational-Wave Merger Channels	Mor Rozner et.al.	2602.20110	null
2026-02-23	Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction	Ha-Anh Hoang Nguyen et.al.	2602.19987	null
2026-02-23	A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs	Zijie Liu et.al.	2602.19938	null
2026-02-23	Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling	Yirui Sun et.al.	2602.19764	null
2026-02-23	RAID: Retrieval-Augmented Anomaly Detection	Mingxiu Cai et.al.	2602.19611	null
2026-02-23	Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles	David Fraile Navarro et.al.	2602.19507	null
2026-02-23	EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting	Angzi Xu et.al.	2602.19485	null
2026-02-22	Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts	Toshihide Ubukata et.al.	2602.19244	null
2026-02-22	SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation	Yujie Lu et.al.	2602.19213	null
2026-02-22	JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation	Kai Liu et.al.	2602.19163	null
2026-02-22	Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection	Hossein Shokouhinejad et.al.	2602.19025	null
2026-02-21	Give Users the Wheel: Towards Promptable Recommendation Paradigm	Fuyuan Lyu et.al.	2602.18929	null
2026-02-20	Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory	Vatsal Agarwal et.al.	2602.18434	null
2026-02-19	Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds	Ibne Farabi Shihab et.al.	2602.17798	null
2026-02-19	Phase-Aware Mixture of Experts for Agentic Reinforcement Learning	Shengtian Yang et.al.	2602.17038	null
2026-02-19	Arcee Trinity Large Technical Report	Varun Singh et.al.	2602.17004	null
2026-02-18	Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes	Srikumar Nayak et.al.	2602.16109	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns	Ziyu Zhao et.al.	2602.15521	null
2026-02-16	Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs	Ali Khalesi et.al.	2602.15091	null
2026-02-15	DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices	Songyuan Li et.al.	2602.14301	null
2026-02-15	MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking	Md. Kamrul Hossain et.al.	2602.14283	null
2026-02-15	Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection	Pinqiao Wang et.al.	2602.14251	null
2026-02-15	Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization	Rizhen Hu et.al.	2602.14159	null
2026-02-15	LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts	Yang Liu et.al.	2602.14060	null
2026-02-15	Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models	Sajjad Kachuee et.al.	2602.14039	null
2026-02-15	Eureka-Audio: Triggering Audio Intelligence in Compact Language Models	Dan Zhang et.al.	2602.13954	null
2026-02-14	Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening	The Tien Mai et.al.	2602.13888	null
2026-02-13	Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning	Jon Irureta et.al.	2602.12708	null
2026-02-13	Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers	Anrui Chen et.al.	2602.12587	null
2026-02-13	SD-MoE: Spectral Decomposition for Effective Expert Specialization	Ruijun Huang et.al.	2602.12556	null
2026-02-13	Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR	Jaeyoung Lee et.al.	2602.12546	null
2026-02-12	Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration	Akhiad Bercovich et.al.	2602.11937	null
2026-02-12	LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training	Xinyi Liu et.al.	2602.11686	null
2026-02-12	Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts	Haiyang Jiang et.al.	2602.11622	null
2026-02-12	Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm	Jinrui Zhang et.al.	2602.11543	null
2026-02-11	Demonstration and performance of an online data selection algorithm for liquid argon time projection chambers using MicroBooNE	MicroBooNE collaboration et.al.	2602.11138	null
2026-02-11	MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs	Yupu Gu et.al.	2602.10965	null
2026-02-11	CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control	Riccardo Barbano et.al.	2602.10933	null
2026-02-11	VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training	Guobin Shen et.al.	2602.10693	null
2026-02-11	Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation	Yin Wang et.al.	2602.10659	null
2026-02-11	Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters	Ailin Huang et.al.	2602.10604	null
2026-02-11	Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity	Guangzhi Xiong et.al.	2602.10585	null
2026-02-10	Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching	Hanyuan Gao et.al.	2602.10254	null
2026-02-10	Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning	Ruopeng Cui et.al.	2602.09767	null
2026-02-10	DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment	Bohan Fu et.al.	2602.09531	null
2026-02-10	SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity	Yukun Zhang et.al.	2602.09386	null
2026-02-10	Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density	Zhendong Mi et.al.	2602.09316	null
2026-02-09	Generalizing GNNs with Tokenized Mixture of Experts	Xiaoguang Guo et.al.	2602.09258	null
2026-02-09	UI-Venus-1.5 Technical Report	Venus-Team et.al.	2602.09082	null
2026-02-09	DirMoE: Dirichlet-routed Mixture of Experts	Amirhossein Vahidi et.al.	2602.09001	null
2026-02-09	OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation	Yehua Huang et.al.	2602.08896	null
2026-02-09	FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models	Annemette Brok Pirchert et.al.	2602.08818	null
2026-02-10	MOVA: Towards Scalable and Synchronized Video-Audio Generation	SII-OpenMOSS Team et.al.	2602.08794	null
2026-02-09	Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views	Duc-Anh Nguyen et.al.	2602.08755	null
2026-02-09	Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing	Jona te Lintelo et.al.	2602.08741	null
2026-02-09	6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks	Mohamed Amine Ferrag et.al.	2602.08675	null
2026-02-09	Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models	Mingzi Cao et.al.	2602.08658	null
2026-02-09	Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs	Yukun Jiang et.al.	2602.08621	null
2026-02-09	TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration	Linye Wei et.al.	2602.08404	null
2026-02-06	Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing	Meng Lou et.al.	2602.06862	null
2026-02-06	POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models	Yi Chen et.al.	2602.06822	null
2026-02-06	HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction	Shengxuan Qiu et.al.	2602.06527	null
2026-02-05	To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training	Meghana Madhyastha et.al.	2602.06183	null
2026-02-05	MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models	Nurbek Tastan et.al.	2602.06154	null
2026-02-05	OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale	Jingze Shi et.al.	2602.05711	null
2026-02-04	Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection	Bharadwaj Dogga et.al.	2602.05100	null
2026-02-04	Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism	Chenwei Cui et.al.	2602.04870	null
2026-02-04	ERNIE 5.0 Technical Report	Haifeng Wang et.al.	2602.04705	null
2026-02-04	Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting	Zhen Zhou et.al.	2602.04678	null
2026-02-04	RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models	Jiacheng Liang et.al.	2602.04448	null
2026-02-04	Mixture of Masters: Sparse Chess Language Models with Player Routing	Giacomo Frisoni et.al.	2602.04447	null
2026-02-04	Expert Selections In MoE Models Reveal (Almost) As Much As Text	Amir Nuriyev et.al.	2602.04105	null
2026-02-03	SpecMD: A Comprehensive Study On Speculative Expert Prefetching	Duc Hoang et.al.	2602.03921	null
2026-02-03	DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs	Zeyu Zhu et.al.	2602.03495	null
2026-02-03	Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts	Meng Lou et.al.	2602.03473	null
2026-02-03	VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers	Zhiwen Li et.al.	2602.03210	null
2026-02-03	Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry	Ye Su et.al.	2602.03204	null
2026-02-02	SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning	Qifan Yu et.al.	2602.02472	null
2026-02-02	Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models	Noam Steinmetz Yalon et.al.	2602.02467	null
2026-02-02	From Directions to Regions: Decomposing Activations in Language Models via Local Geometry	Or Shafran et.al.	2602.02464	null
2026-02-02	DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild	Arnab Das et.al.	2602.02286	null
2026-02-02	MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology	Susu Hu et.al.	2602.02282	null
2026-02-02	Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts	Martin Determann et.al.	2602.02031	null
2026-02-02	SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning	Zhen-Hao Xie et.al.	2602.01990	null
2026-02-02	Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition	Wonjun Lee et.al.	2602.01967	null
2026-02-02	SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures	Liangtao Lin et.al.	2602.01858	null
2026-02-02	Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification	Zhi Zhang et.al.	2602.01728	null
2026-01-31	Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity	Fanqi Yan et.al.	2602.00939	null
2026-01-31	Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs	Hao Mark Chen et.al.	2602.00879	null
2026-01-31	Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion	Tianyang Wu et.al.	2602.00678	null
2026-01-31	SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement	Xiangfei Qiu et.al.	2602.00589	null
2026-01-31	PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching	Qianchao Zhu et.al.	2602.00509	null
2026-01-30	UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling	Pingping Liu et.al.	2601.22746	null
2026-01-30	A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization	Shiye Lei et.al.	2601.22718	null
2026-01-30	A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation	Haonan He et.al.	2601.22708	null
2026-01-30	Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments	Jinwoo Jang et.al.	2601.22647	null
2026-01-30	SpanNorm: Reconciling Training Stability and Performance in Deep Transformers	Chao Wang et.al.	2601.22580	null
2026-01-30	Continual Policy Distillation from Distributed Reinforcement Learning Teachers	Yuxuan Li et.al.	2601.22475	null
2026-01-29	ECO: Quantized Training without Full-Precision Master Weights	Mahdi Nikdan et.al.	2601.22101	null
2026-01-29	MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts	Lorenzo Mazza et.al.	2601.21971	null
2026-01-29	MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts	Evandro S. Ortigossa et.al.	2601.21866	null
2026-01-29	Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers	Evandro S. Ortigossa et.al.	2601.21641	null
2026-01-29	Multi-Modal Time Series Prediction via Mixture of Modulated Experts	Lige Zhang et.al.	2601.21547	null
2026-01-29	ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory	Yang Zhao et.al.	2601.21545	null
2026-01-29	L$^3$: Large Lookup Layers	Albert Tseng et.al.	2601.21461	null
2026-01-29	L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts	Minghao Yang et.al.	2601.21349	null
2026-01-29	Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies	Ce Hao et.al.	2601.21251	null
2026-01-29	Scaling Embeddings Outperforms Scaling Experts in Language Models	Hong Liu et.al.	2601.21204	null
2026-01-29	ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling	Yuchen Yang et.al.	2601.21198	null
2026-01-29	BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding	Ziyi Zhao et.al.	2601.21148	null
2026-01-29	TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning	Shicheng Fan et.al.	2601.21135	null
2026-01-28	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-28	Unsupervised Ensemble Learning Through Deep Energy-based Models	Ariel Maymon et.al.	2601.20556	null
2026-01-28	OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution	Le Zhang et.al.	2601.20380	null
2026-01-28	OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion	Shuoyan Wei et.al.	2601.20308	null
2026-01-28	MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting	Jing Xu et.al.	2601.20300	null
2026-01-28	HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH	Yueyang Wang et.al.	2601.20255	null
2026-01-28	Control Models for In-IDE Code Completion	Aral de Moor et.al.	2601.20223	null
2026-01-28	Hyperparameter Transfer with Mixture-of-Expert Layers	Tianze Jiang et.al.	2601.20205	null
2026-01-27	Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts	TrungKhang Tran et.al.	2601.19811	null
2026-01-27	Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes	Yifan Wang et.al.	2601.19723	null
2026-01-27	Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition	Isha Pandey et.al.	2601.19451	null
2026-01-26	Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot	Fauna Robotics et.al.	2601.18963	null
2026-01-26	OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion	Zhichao Wang et.al.	2601.18094	null
2026-01-26	LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts	Venmugil Elango et.al.	2601.18089	null
2026-01-25	Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors	Jinchen Gu et.al.	2601.17977	null
2026-01-25	$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts	Shota Takashiro et.al.	2601.17680	null
2026-01-24	PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes	Xinru Cui et.al.	2601.17440	null
2026-01-23	Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts	Xuan-Phi Nguyen et.al.	2601.17111	null
2026-01-22	FlashMoE: Reducing SSD I/O Bottlenecks via ML-Based Cache Replacement for Mixture-of-Experts Inference on Edge Devices	Byeongju Kim et.al.	2601.17063	null
2026-01-23	GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints	Andy Zhu et.al.	2601.16905	null
2026-01-23	Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation	Tims Pecerskis et.al.	2601.16863	null
2026-01-23	LongCat-Flash-Thinking-2601 Technical Report	Meituan LongCat Team et.al.	2601.16725	null
2026-01-22	LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting	Yuhan Chen et.al.	2601.15772	null
2026-01-21	Improving MoE Compute Efficiency by Composing Weight and Data Sparsity	Maciej Kilian et.al.	2601.15370	null
2026-01-21	Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization	Adam Rokah et.al.	2601.15021	null
2026-01-21	Modeling the Thermal Behavior of Photopolymers for In-Space Fabrication	Jonathan Ericson et.al.	2601.14897	null
2026-01-21	UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection	Qingling Shu et.al.	2601.14797	null
2026-01-21	Robustness of Mixtures of Experts to Feature Noise	Dong Sun et.al.	2601.14792	null
2026-01-20	Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models	YuanLab.ai et.al.	2601.14327	null
2026-01-20	Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering	Yuxin Chen et.al.	2601.14050	null
2026-01-20	DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging	Adrien Meyer et.al.	2601.13954	null
2026-01-20	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) II. The radial structure of debris discs	Yinuo Han et.al.	2601.13670	null
2026-01-20	MN-TSG: Continuous Time Series Generation with Irregular Observations	Xu Zhang et.al.	2601.13534	null
2026-01-19	CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks	Mingshuang Luo et.al.	2601.13133	null
2026-01-19	Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks	Natalia G. Berloff et.al.	2601.13079	null
2026-01-19	PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning	Zhiyan Hou et.al.	2601.13020	null
2026-01-19	HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads	Xiaohui Zhao et.al.	2601.13013	null
2026-01-19	OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models	Shiyuan Li et.al.	2601.12996	null
2026-01-19	PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition	Zhihan Zeng et.al.	2601.12798	null
2026-01-18	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) V: Comparison between scattered light and thermal emission	J. Milli et.al.	2601.12586	null
2026-01-18	A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding	Hoang Viet Nguyen et.al.	2601.12483	null
2026-01-18	Learning Diverse Skills for Behavior Models with Mixture of Experts	Wangtian Shen et.al.	2601.12397	null
2026-01-18	NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages	Lakshya Tomar et.al.	2601.12389	null
2026-01-18	GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer	Xinyuan Zhao et.al.	2601.12316	null
2026-01-18	Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation	Mingrui Liu et.al.	2601.12301	null
2026-01-17	EMoE: Eigenbasis-Guided Routing for Mixture-of-Experts	Anzhe Cheng et.al.	2601.12137	null
2026-01-17	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) III: The vertical structure of debris disks	Brianna Zawadzki et.al.	2601.12128	null
2026-01-17	One-Shot Price Forecasting with Covariate-Guided Experts under Privacy Constraints	Ren He et.al.	2601.11977	null
2026-01-16	The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) VII: Optically thick gas with broad CO gaussian local line profiles in the HD 121617 disc	A. Brennan et.al.	2601.11824	null
2026-01-16	Self-Augmented Mixture-of-Experts for QoS Prediction	Kecheng Cai et.al.	2601.11036	null
2026-01-16	RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions	Tasneem Shaffee et.al.	2601.10921	null
2026-01-15	MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts	Yuxuan Lou et.al.	2601.10272	null
2026-01-15	MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning	Yusong Wang et.al.	2601.10157	null
2026-01-14	Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling	Haoyu Ji et.al.	2601.09305	null
2026-01-15	A.X K1 Technical Report	Sung Jun Cheon et.al.	2601.09200	null
2026-01-14	WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks	Weibo Wen et.al.	2601.09186	null
2026-01-14	Horseshoe Mixtures-of-Experts (HS-MoE)	Nick Polson et.al.	2601.09043	null
2026-01-13	LookAhead: The Optimal Non-decreasing Index Policy for a Time-Varying Holding Cost problem	Keerthana Gurushankar et.al.	2601.08960	null
2026-01-13	MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm	Bowen Zhou et.al.	2601.08800	null
2026-01-13	LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms	Namhyun Kim et.al.	2601.08780	null
2026-01-13	M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting	Yaohui Huang et.al.	2601.08631	null
2026-01-13	Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance	Jihang Li et.al.	2601.08418	null
2026-01-13	Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models	Bo Wang et.al.	2601.08383	null
2026-01-13	Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints	Seng Pei Liew et.al.	2601.08215	null
2026-01-12	Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation	Yuxin Yang et.al.	2601.07935	null
2026-01-12	Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator	Chaewon Heo et.al.	2601.07698	null
2026-01-12	Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models	Xin Cheng et.al.	2601.07372	null
2026-01-11	Solar Open Technical Report	Sungrae Park et.al.	2601.07022	null
2026-01-11	Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems	Qikai Xiao et.al.	2601.06858	null
2026-01-11	MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models	Xin Ye et.al.	2601.06857	null
2026-01-11	MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation	Bochao Sun et.al.	2601.06829	null
2026-01-11	SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute	Bowen Shen et.al.	2601.06790	null
2026-01-10	Hellinger Multimodal Variational Autoencoders	Huyen Khanh Vo et.al.	2601.06572	null
2026-01-10	Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging	Xianrui Zeng et.al.	2601.06448	null
2026-01-09	Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning	Nusrat Jahan Prottasha et.al.	2601.06356	null
2026-01-09	Reconstruction of atmospheric neutrinos in DUNE’s horizontal-drift far-detector module	DUNE Collaboration et.al.	2601.05697	null
2026-01-09	Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts	Wei Zhou et.al.	2601.05537	null
2026-01-08	MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs	Jiyuan Zhang et.al.	2601.05296	null
2026-01-08	MoE3D: A Mixture-of-Experts Module for 3D Reconstruction	Zichen Wang et.al.	2601.05208	null
2026-01-08	FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts	Yiji Zhao et.al.	2601.05174	null
2026-01-08	How to Set the Learning Rate for Large-Scale Pre-training?	Yunhua Zhou et.al.	2601.05049	null
2026-01-08	DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation	Guanzhi Deng et.al.	2601.04823	null
2026-01-07	A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems	Qi Wu et.al.	2601.03992	null
2026-01-07	Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures	Ibrahim Delibasoglu et.al.	2601.03889	null
2026-01-07	Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts	Ye Su et.al.	2601.03577	null
2026-01-07	CALM: Culturally Self-Aware Language Models	Lingzhi Shen et.al.	2601.03483	null
2026-01-06	The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models	Yan Wang et.al.	2601.03425	null
2026-01-06	ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios	Yihan Wei et.al.	2601.03011	null
2026-01-06	MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free	Yishu Lei et.al.	2601.02967	null
2026-01-06	MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation	Wenzhao Jiang et.al.	2601.02943	null
2026-01-06	MiMo-V2-Flash Technical Report	Bangjun Xiao et.al.	2601.02780	null
2026-01-05	Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts	Boxuan Lyu et.al.	2601.02144	null
2026-01-05	GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection	Joongwon Chae et.al.	2601.01856	null
2026-01-05	K-EXAONE Technical Report	Eunbi Choi et.al.	2601.01739	null
2026-01-05	Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications	YuanLab.ai et.al.	2601.01718	null
2026-01-05	Varying-Coefficient Mixture of Experts Model	Qicheng Zhao et.al.	2601.01699	null
2026-01-04	Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts	Ruofeng Yang et.al.	2601.01475	null
2026-01-04	Making MoE based LLM inference resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-03	MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance	Hamad Khan et.al.	2601.01260	null
2026-01-02	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures	Kabir Grover et.al.	2601.00942	null
2026-01-02	HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts	Zihan Fang et.al.	2601.00583	null
2026-01-01	Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations	Hyunjun Kim et.al.	2601.00457	null
2026-01-01	Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach	Kohei Yoshikawa et.al.	2601.00287	null
2025-12-31	Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models	Ákos Prucs et.al.	2512.24776	null
2026-01-01	Sufficient and Necessary Conditions for Eckart-Young like Result for Tubal Tensors	Uria Mor et.al.	2512.24405	null
2025-12-30	Quantum Computing, Ising Formulation, and the Traveling Salesman Problem	Omer Gurevich et.al.	2512.24308	null
2025-12-30	Training Report of TeleChat3-MoE	Xinzhang Liu et.al.	2512.24157	null
2025-12-30	RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress	Ruixuan Huang et.al.	2512.23995	null
2025-12-30	Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation	Hualin Ye et.al.	2512.23938	null
2025-12-29	Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion	Vladimer Khasia et.al.	2512.23448	null
2025-12-29	Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss	Ang Lv et.al.	2512.23447	null
2025-12-30	YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection	Xu Lin et.al.	2512.23273	null
2025-12-28	Trust Region Masking for Long-Horizon LLM Reinforcement Learning	Yingru Li et.al.	2512.23075	null
2025-12-28	FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment	Boyang Zhang et.al.	2512.23070	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis	Dongning Rao et.al.	2512.22741	null
2025-12-27	Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy	Amil Khan et.al.	2512.22423	null
2025-12-26	FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion	Zhuoran Zhu et.al.	2512.22036	null
2025-12-26	SWE-RM: Execution-free Feedback For Software Engineering Agents	KaShun Shum et.al.	2512.21919	null
2025-12-26	Accelerate Speculative Decoding with Sparse Computation in Verification	Jikai Wang et.al.	2512.21911	null
2025-12-26	MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction	Carolina Aparício et.al.	2512.21897	null
2025-12-25	Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction	Zheng Yin et.al.	2512.21707	null
2025-12-25	Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism	Xinglin Pan et.al.	2512.21487	null
2025-12-24	DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction	Khondoker Mirazul Mumenin et.al.	2512.21433	null
2025-12-25	GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs	Lichao Wu et.al.	2512.21008	null
2025-12-24	RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks	Ningyuan Liu et.al.	2512.20920	null
2025-12-24	NVIDIA Nemotron 3: Efficient and Open Intelligence	NVIDIA et.al.	2512.20856	null
2025-12-23	Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning	NVIDIA et.al.	2512.20848	null
2025-12-23	Defending against adversarial attacks using mixture of experts	Mohammad Meymani et.al.	2512.20821	null
2025-12-23	MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts	Alexandros Christoforos et.al.	2512.20604	null
2025-12-23	Branch Learning in MRI: More Data, More Models, More Training	Yuyang Li et.al.	2512.20330	null
2025-12-23	Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity	Yuxing Gan et.al.	2512.20291	null
2025-12-23	Degradation-Aware Metric Prompting for Hyperspectral Image Restoration	Binfeng Wang et.al.	2512.20251	null
2025-12-23	AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model	Sofian Chaybouti et.al.	2512.20157	null
2025-12-22	UCCL-EP: Portable Expert-Parallel Communication	Ziming Mao et.al.	2512.19849	null
2025-12-22	Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios	Jiawen Wang et.al.	2512.19551	null
2025-12-22	EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control	Chao Yang et.al.	2512.19043	null
2025-12-21	Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation	Guangtao Lyu et.al.	2512.18804	null
2025-12-21	Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts	Linwei Qiu et.al.	2512.18718	null
2025-12-21	Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing	Wentao Liu et.al.	2512.18674	null
2025-12-20	Secret mixtures of experts inside your LLM	Enric Boix-Adsera et.al.	2512.18452	null
2025-12-20	MoE Pathfinder: Trajectory-driven Expert Pruning	Xican Yang et.al.	2512.18425	null
2025-12-20	MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation	Kaixing Yang et.al.	2512.18181	null
2025-12-19	MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements	Ruichen Tan et.al.	2512.17985	null
2025-12-22	SCOPE: Sequential Causal Optimization of Process Interventions	Jakob De Moor et.al.	2512.17629	null
2025-12-18	Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation	Zhenyu Liu et.al.	2512.17073	null
2025-12-18	Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models	Zhongpan Tang et.al.	2512.16963	null
2025-12-18	An Upper Bound on the M/M/k Queue With Deterministic Setup Times	Jalani Williams et.al.	2512.16854	null
2025-12-18	Meta-RL Induces Exploration in Language Agents	Yulun Jiang et.al.	2512.16848	null
2025-12-18	PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation	Mengyuan Liu et.al.	2512.16494	null
2025-12-18	Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems	En-Ming Huang et.al.	2512.16473	null
2025-12-18	Pretrained Battery Transformer (PBT): A battery life prediction foundation model	Ruifeng Tan et.al.	2512.16334	null
2025-12-19	Sigma-MoE-Tiny Technical Report	Qingguo Hu et.al.	2512.16248	null
2025-12-18	INTELLECT-3: Technical Report	Prime Intellect Team et.al.	2512.16144	null
2025-12-18	Let the Barbarians In: How AI Can Accelerate Systems Performance Research	Audrey Cheng et.al.	2512.14806	null
2025-12-15	SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning	Tomohito Kawabata et.al.	2512.14757	null
2025-12-16	SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing	Han Zou et.al.	2512.14140	null
2025-12-16	SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations	Wentao Guo et.al.	2512.14080	null
2025-12-16	Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training	Can Jin et.al.	2512.13996	null
2025-12-13	RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing	Yuhan Tang et.al.	2512.13727	null
2025-12-15	StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion	Guransh Singh et.al.	2512.13632	null
2025-12-16	Janus: Disaggregating Attention and Experts for Scalable MoE Inference	Zhexiang Zhang et.al.	2512.13525	null
2025-12-15	Automated Information Flow Selection for Multi-scenario Multi-task Recommendation	Chaohua Yang et.al.	2512.13396	null
2025-12-13	Fine-Grained Zero-Shot Learning with Attribute-Centric Representations	Zhi Chen et.al.	2512.12219	null
2025-12-13	MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models	Ahmad Chamma et.al.	2512.12121	null
2025-12-11	Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning	Benjamin Gundersen et.al.	2512.10691	null
2025-12-11	Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration	Wenlong Jiao et.al.	2512.10581	null
2025-12-11	Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment	Han Li et.al.	2512.10450	null
2025-12-10	Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach	Salvador Carrión et.al.	2512.09910	null
2025-12-10	DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation	Zhizhong Wang et.al.	2512.09814	null
2025-12-10	M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks	Blessed Guda et.al.	2512.09797	null
2025-12-10	FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model	Xiang Chen et.al.	2512.09282	null
2025-12-10	Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens	Yanpeng Yu et.al.	2512.09277	null
2025-12-09	Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts	Yifan Lyu et.al.	2512.08814	null
2025-12-09	What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance	Athena Psalta et.al.	2512.08697	null
2025-12-09	Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems	Mingwei Li et.al.	2512.08411	null
2025-12-08	LongCat-Image Technical Report	Meituan LongCat Team et.al.	2512.07584	null
2025-12-08	Search for Light Sterile Neutrinos With Two Neutrino Beams at MicroBooNE	MicroBooNE collaboration et.al.	2512.07159	null
2025-12-09	TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning	Zebin Xing et.al.	2512.07135	null
2025-12-08	PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes	Kepeng Lin et.al.	2512.07113	null
2025-12-07	Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding	MinCheol Jeon et.al.	2512.06929	null
2025-12-07	Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks	Long Shi et.al.	2512.06784	null
2025-12-07	Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving	Wei-Bin Kou et.al.	2512.06664	null
2025-12-06	Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion	Jaewon Ahn et.al.	2512.06449	null
2025-12-04	The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation	Ranjan Sapkota et.al.	2512.06032	null
2025-12-05	HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies	Zhiying Du et.al.	2512.05693	null
2025-12-05	ProPhy: Progressive Physical Alignment for Dynamic World Simulation	Zijun Wang et.al.	2512.05564	null
2025-12-05	EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture	Xin He et.al.	2512.04810	null
2025-12-04	Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space	Joey Hong et.al.	2512.04601	null
2025-12-04	Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems	Zehao Fan et.al.	2512.04476	null
2025-12-03	Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research	Zia Qi et.al.	2512.04261	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-04	A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models	X. Y. Han et.al.	2512.03915	null
2025-12-03	Parsimonious Clustering of Covariance Matrices	Yixi Xu et.al.	2512.03912	null
2025-12-03	CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery	Rui Sheng et.al.	2512.03485	null
2025-12-03	SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models	Geoffrey J. McLachlan et.al.	2512.03322	null
2025-12-02	SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts	Jiaqi Liu et.al.	2512.02517	null
2025-12-02	Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention	Wenyi Xiong et.al.	2512.02368	null
2025-12-02	Understanding and Harnessing Sparsity in Unified Multimodal Models	Shwai He et.al.	2512.02351	null
2025-12-01	Towards Unified Video Quality Assessment	Chen Feng et.al.	2512.02224	null
2025-12-01	ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation	Chenyang Gu et.al.	2512.02013	null
2025-12-01	Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks	Kai Zhang et.al.	2512.01750	null
2025-12-01	GRASP: Guided Residual Adapters with Sample-wise Partitioning	Felix Nützel et.al.	2512.01675	null
2025-12-01	Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery	Zhicheng Zhao et.al.	2512.01665	null
2025-12-01	Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios	Yiqiao Chen et.al.	2512.01653	null
2025-12-02	Stabilizing Reinforcement Learning with LLMs: Formulation and Practices	Chujie Zheng et.al.	2512.01374	null
2025-12-01	Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe	Yahui Liu et.al.	2512.01252	null
2025-11-30	Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning	Yebo Wu et.al.	2512.00902	null
2025-11-30	Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking	Lingling Fu et.al.	2512.00724	null
2025-11-29	GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding	Yiqiao Chen et.al.	2512.00574	null
2025-11-28	Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model	Junshu Tang et.al.	2511.23429	null
2025-11-28	LFM2 Technical Report	Alexander Amini et.al.	2511.23404	null
2025-11-28	Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing	Yifei Wang et.al.	2511.23321	null
2025-11-28	Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering	Zijian Fu et.al.	2511.23304	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model	Yuhao Xu et.al.	2511.22935	null
2025-11-27	OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency	Jun Wang et.al.	2511.22481	null
2025-11-27	Foundation Model for Intelligent Wireless Communications	Boxun Liu et.al.	2511.22222	null
2025-11-27	MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding	Yu Li et.al.	2511.22103	null
2025-11-27	Qwen3-VL Technical Report	Shuai Bai et.al.	2511.21631	null
2025-11-26	MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training	Lu Zhao et.al.	2511.21431	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-25	HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation	Xiang Wang et.al.	2511.20520	null
2025-11-25	MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology	Kiril Vasilev et.al.	2511.20490	null
2025-11-25	Soft Adaptive Policy Optimization	Chang Gao et.al.	2511.20347	null
2025-11-25	ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories	Hai Ling et.al.	2511.20169	null
2025-11-25	Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing	Yulong Deng et.al.	2511.20009	null
2025-11-25	Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models	Wentao Hu et.al.	2511.19822	null
2025-11-22	Exploiting the Experts: Unauthorized Compression in MoE-LLMs	Pinaki Prasad Guha Neogi et.al.	2511.19480	null
2025-11-24	OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs	Yuting Gao et.al.	2511.19023	null
2025-11-24	Dynamic Mixture of Experts Against Severe Distribution Shifts	Donghu Kim et.al.	2511.18987	null
2025-11-23	HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction	Pengcheng Fang et.al.	2511.18534	null
2025-11-23	Attosecond-resolved quantum fluctuations of light and matter	Matan Even Tzur et.al.	2511.18362	null
2025-11-23	AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert	Yuting Gao et.al.	2511.18314	null
2025-11-22	PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures	Yuheng Shao et.al.	2511.18116	null
2025-11-22	CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking	Hao Li et.al.	2511.17967	null
2025-11-22	FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning	Guoyang Xia et.al.	2511.17885	null
2025-11-22	Equivalence of Context and Parameter Updates in Modern Transformer Blocks	Adrian Goldwaser et.al.	2511.17864	null
2025-11-21	Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization	Akhil Singampalli et.al.	2511.17829	null
2025-11-21	Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?	Sukwon Yun et.al.	2511.17400	null
2025-11-21	MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment	Huangbiao Xu et.al.	2511.17397	null
2025-11-21	Measurements of differential charged-current cross sections on argon for electron neutrinos with final-state protons in MicroBooNE	MicroBooNE collaboration et.al.	2511.17342	null
2025-11-21	Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design	Quentin Anthony et.al.	2511.17127	null
2025-11-21	VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions	Qianyi Shao et.al.	2511.16998	null
2025-11-21	RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts	Fupei Guo et.al.	2511.16986	null
2025-11-21	MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling	Chenqi Zhao et.al.	2511.16947	null
2025-11-20	Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution	Xiao He et.al.	2511.16024	null
2025-11-19	AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture	Qiming Guo et.al.	2511.15870	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation	Tairan He et.al.	2511.15200	null
2025-11-19	GPU-Initiated Networking for NCCL	Khaled Hamidouche et.al.	2511.15076	null
2025-11-19	WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines	Mingran Sun et.al.	2511.15030	null
2025-11-19	WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines	Zengrui Han et.al.	2511.15026	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation	Lai Wei et.al.	2511.14756	null
2025-11-18	Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching	Jintao Zhang et.al.	2511.14488	null
2025-11-18	MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts	Wenfeng Wang et.al.	2511.14102	null
2025-11-18	FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration	Jingren Liu et.al.	2511.14099	null
2025-11-18	SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts	Fan Zhang et.al.	2511.14093	null
2025-11-17	MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis	Peng Shu et.al.	2511.13983	null
2025-11-17	Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes	Isabel Corpus et.al.	2511.13949	null
2025-11-17	InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE	Lipeng Wang et.al.	2511.13488	null
2025-11-17	Measurement of Exclusive $π^+$–argon Interactions Using ProtoDUNE-SP	DUNE Collaboration et.al.	2511.13462	null
2025-11-18	YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection	Ori Meiraz et.al.	2511.13344	null
2025-11-17	Self-Adaptive Graph Mixture of Models	Mohit Meena et.al.	2511.13062	null
2025-11-17	Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation	Yu Hou et.al.	2511.12922	null
2025-11-16	Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data	Yunxin Li et.al.	2511.12609	null
2025-11-16	SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition	Qing Cai et.al.	2511.12559	null
2025-11-16	MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics	Jing Li et.al.	2511.12525	null
2025-11-16	MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding	Zhanheng Nie et.al.	2511.12449	null
2025-11-15	SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty	Leroy D’Souza et.al.	2511.12361	null
2025-11-15	AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms	Anshul Bagaria et.al.	2511.12223	null
2025-11-15	ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction	Ruochen Li et.al.	2511.12214	null
2025-11-14	First Measurement of $π^+$-Ar and $p$-Ar Total Inelastic Cross Sections in the Sub-GeV Energy Regime with ProtoDUNE-SP Data	DUNE Collaboration et.al.	2511.11925	null
2025-11-14	FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models	Yonatan Dukler et.al.	2511.11505	null
2025-11-14	Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification	Qinghao Gao et.al.	2511.11460	null
2025-11-14	Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing	Cong Cao et.al.	2511.11236	null
2025-11-14	DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding	Mingwei Xing et.al.	2511.11232	null
2025-11-14	ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization	Anzhe Cheng et.al.	2511.10971	null
2025-11-14	Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go	Yashshi Pipalani et.al.	2511.10868	null
2025-11-13	Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts	Sumin Lee et.al.	2511.10300	null
2025-11-13	RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo	Jueun Ko et.al.	2511.10107	null
2025-11-13	BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference	Yun Wang et.al.	2511.10054	null
2025-11-13	ConSurv: Multimodal Continual Learning for Survival Analysis	Dianzhi Yu et.al.	2511.09853	null
2025-11-12	UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving	Ziyi Song et.al.	2511.09013	null
2025-11-12	Selective Sinkhorn Routing for Improved Sparse Mixture of Experts	Duc Anh Nguyen et.al.	2511.08972	null
2025-11-12	Bayesian Mixture of Experts For Large Language Models	Maryam Dialameh et.al.	2511.08968	null
2025-11-11	OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild	Yuncheng Guo et.al.	2511.08423	null
2025-11-11	Text-based Aerial-Ground Person Retrieval	Xinyu Zhou et.al.	2511.08369	null
2025-11-13	National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech – The SpeechCARE Solution	Maryam Zolnoori et.al.	2511.08132	null
2025-11-10	Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs	Zhongyang Li et.al.	2511.07419	null
2025-11-10	AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning	Qile Jiang et.al.	2511.07262	null
2025-11-10	S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning	Jiangwen Dong et.al.	2511.06727	null
2025-11-10	Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation	Evelyn Chee et.al.	2511.06723	null
2025-11-09	Route Experts by Sequence, not by Token	Tiansheng Wen et.al.	2511.06494	null
2025-11-09	HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation	Kunrong Li et.al.	2511.06388	null
2025-11-09	A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images	Ardhendu Sekhar et.al.	2511.06266	null
2025-11-08	DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities	Nagur Shareef Shaik et.al.	2511.05968	null
2025-11-08	MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering	Jian Zhu et.al.	2511.05876	null
2025-11-08	In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading	Shuning Lin et.al.	2511.05814	null
2025-11-07	MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery	Baiye Cheng et.al.	2511.05007	null
2025-11-06	PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference	Yushu Zhao et.al.	2511.04805	null
2025-11-06	GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization	Mahmoud Soliman et.al.	2511.04008	null
2025-11-05	GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models	Zhibin Wang et.al.	2511.03251	null
2025-11-04	RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains	Tianle Pu et.al.	2511.02331	null
2025-11-04	FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error	Fengjuan Wang et.al.	2511.02302	null
2025-11-04	Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining	Costin-Andrei Oncescu et.al.	2511.02237	null
2025-11-03	Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing	Song Gao et.al.	2511.01743	null
2025-11-03	HMVLM: Human Motion-Vision-Language Model via MoE LoRA	Lei Hu et.al.	2511.01463	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-03	DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection	Guoxin Ma et.al.	2511.01192	null
2025-11-01	OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback	Kai Luo et.al.	2511.00510	null
2025-10-31	LongCat-Flash-Omni Technical Report	Meituan LongCat Team et.al.	2511.00279	null
2025-10-31	Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals	Xiangyu Fan et.al.	2510.27684	null
2025-10-31	RDMA Point-to-Point Communication for LLM Systems	Nandor Licker et.al.	2510.27656	null
2025-10-31	MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts	Jingnan Gao et.al.	2510.27234	null
2025-10-31	AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification	Yuanhao Tang et.al.	2510.27155	null
2025-10-30	Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement	Aaditya Shukla et.al.	2510.27051	null
2025-10-30	Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems	Hongbo Li et.al.	2510.27004	null
2025-10-30	MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation	Arghavan Rezvani et.al.	2510.26996	null
2025-10-30	ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference	Zixu Shen et.al.	2510.26730	null
2025-10-30	Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications	Chuang Zhang et.al.	2510.26628	null
2025-10-30	MossNet: Mixture of State-Space Experts is a Multi-Head Attention	Shikhar Tuli et.al.	2510.26182	null
2025-10-29	Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis	Hyeonjun Lee et.al.	2510.26014	null
2025-10-31	Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training	Hong Wang et.al.	2510.25803	null
2025-10-29	Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts	Qiushi Pan et.al.	2510.25285	null
2025-10-29	MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference	Xinru Tang et.al.	2510.25258	null
2025-10-29	H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts	Peilin Tan et.al.	2510.25091	null
2025-10-28	Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation	Inclusion AI et.al.	2510.24821	null
2025-10-28	Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance	Yujie Wei et.al.	2510.24711	null
2025-10-28	Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation	Xiucheng Zhang et.al.	2510.24055	null
2025-10-26	Sparsity and Superposition in Mixture of Experts	Marmik Chaudhari et.al.	2510.23671	null
2025-10-27	EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting	Musleh Alharthi et.al.	2510.23396	null
2025-10-27	Rethinking GSPO: The Perplexity-Entropy Equivalence	Chi Liu et.al.	2510.23142	null
2025-10-27	Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts	Di Zhang et.al.	2510.23027	null
2025-10-27	MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning	Han Wu et.al.	2510.23013	null
2025-10-25	Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation	Ling-Team et.al.	2510.22115	null
2025-10-24	PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling	Andrea Bonfanti et.al.	2510.21262	null
2025-10-24	Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization	Yunlong Chu et.al.	2510.21207	null
2025-10-24	Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts	Yanguang Sun et.al.	2510.21114	null
2025-10-24	MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning	Siyong Chen et.al.	2510.21093	null
2025-10-23	Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts	Mariona Jaramillo-Civill et.al.	2510.20666	null
2025-10-23	xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion	Quan Li et.al.	2510.20651	null
2025-10-23	Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning	Xiaohan Lan et.al.	2510.20519	null
2025-10-23	A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization	LinFeng Li et.al.	2510.20291	null
2025-10-23	AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training	Huawei Bai et.al.	2510.20111	null
2025-10-22	HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission	Weihao Yang et.al.	2510.19470	null
2025-10-22	MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs	Xinfeng Xia et.al.	2510.19366	null
2025-10-22	Modeling Turn-Taking with Semantically Informed Gestures	Varsha Suresh et.al.	2510.19350	null
2025-10-23	RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training	Heng Xu et.al.	2510.19262	null
2025-10-22	A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision	Teo Susnjak et.al.	2510.19227	null
2025-10-22	MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting	In-Hwan Jin et.al.	2510.19210	null
2025-10-21	Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework	Yujie Xing et.al.	2510.18825	null
2025-10-21	Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification	Bin Gu et.al.	2510.18533	null
2025-10-21	Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study	Gangda Deng et.al.	2510.18370	null
2025-10-19	L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts	Shihao Ji et.al.	2510.17898	null
2025-10-20	Towards 3D Objectness Learning in an Open World	Taichi Liu et.al.	2510.17686	null
2025-10-20	Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model	Xinwei Zhang et.al.	2510.17684	null
2025-10-20	Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm	Hao Qiao et.al.	2510.17604	null
2025-10-20	ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts	Zheyue Tan et.al.	2510.17483	null
2025-10-19	End-to-end Listen, Look, Speak and Act	Siyin Wang et.al.	2510.16756	null
2025-10-18	NeurIPT: Foundation Model for Neural Interfaces	Zitao Fang et.al.	2510.16548	null
2025-10-18	Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts	Yongxiang Hua et.al.	2510.16448	null
2025-10-18	Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures	Minh-Khoi Nguyen-Nhat et.al.	2510.16411	null
2025-10-17	Expert Merging in Sparse Mixture of Experts with Nash Bargaining	Dung V. Nguyen et.al.	2510.16138	null
2025-10-17	Mixture of Experts Approaches in Dense Retrieval Tasks	Effrosyni Sokli et.al.	2510.15683	null
2025-10-17	FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification	Zhen Sun et.al.	2510.15595	null
2025-10-17	Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks	Yuyuan Feng et.al.	2510.15333	null
2025-10-17	MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation	Xianyang Qi et.al.	2510.15286	null
2025-10-17	Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction	Amitesh Badkul et.al.	2510.15233	null
2025-10-16	Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models	Guinan Su et.al.	2510.14853	null
2025-10-16	MergeMoE: Efficient Compression of MoE Models via Expert Output Merging	Ruijie Miao et.al.	2510.14436	null
2025-10-16	Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning	Weijie Shen et.al.	2510.14300	null
2025-10-16	MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering	Mingkai Liu et.al.	2510.14251	null
2025-10-15	REAP the Experts: Why Pruning Prevails for One-Shot MoE compression	Mike Lasby et.al.	2510.13999	null
2025-10-15	Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module	Ruitao Feng et.al.	2510.13558	null
2025-10-15	ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition	Deeptimaan Banerjee et.al.	2510.13493	null
2025-10-15	Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers	Xin Zhao et.al.	2510.13462	null
2025-10-15	Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts	Li Bai et.al.	2510.13451	null
2025-10-15	UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE	Zhenyu Liu et.al.	2510.13344	null
2025-10-15	GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models	Chen Zheng et.al.	2510.13079	null
2025-10-14	Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps	Do Tien Hai et.al.	2510.12744	null
2025-10-14	MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts	Yushu Zhao et.al.	2510.12357	null
2025-10-14	DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification	Tao Xie et.al.	2510.12214	null
2025-10-13	Beyond ‘Templates’: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View	Jinyu Zhang et.al.	2510.11687	null
2025-10-13	Robust Ego-Exo Correspondence with Long-Term Memory	Yijun Hu et.al.	2510.11417	null
2025-10-13	Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers	Wenhan Ma et.al.	2510.11370	null
2025-10-13	What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations?	D. Rochman et.al.	2510.11256	null
2025-10-13	MC#: Mixture Compressor for Mixture-of-Experts Large Models	Wei Huang et.al.	2510.10962	null
2025-10-12	Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation	Ali Atiah Alzahrani et.al.	2510.10807	null
2025-10-12	Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection	Shizhen Zhao et.al.	2510.10584	null
2025-10-12	Hierarchical LoRA MoE for Efficient CTR Model Scaling	Zhichen Zeng et.al.	2510.10432	null
2025-10-11	SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference	Liangkun Chen et.al.	2510.10302	null
2025-10-10	MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest	Xiao Yang et.al.	2510.09857	null
2025-10-10	Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation	Youwei Zheng et.al.	2510.09094	null
2025-10-09	LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution	Xiaohui Li et.al.	2510.08771	null
2025-10-09	FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts	Heming Zou et.al.	2510.08396	null
2025-10-09	Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization	Jason Bohne et.al.	2510.08256	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training	Ruizhe Wang et.al.	2510.08008	null
2025-10-09	Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing	Cunli Mao et.al.	2510.07736	null
2025-10-09	Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision	Xiaoxu Ma et.al.	2510.07703	null
2025-10-09	LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning	Yuhan Sun et.al.	2510.07685	null
2025-10-08	MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting	Yoli Shavit et.al.	2510.07459	null
2025-10-08	Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting	Walid Guettala et.al.	2510.07426	null
2025-10-08	Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts	Fangshuo Liao et.al.	2510.07205	null
2025-10-08	A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages	Zibo Su et.al.	2510.06612	null
2025-10-09	SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation	Shuang Cheng et.al.	2510.06303	null
2025-10-06	Reproducibility Study of “XRec: Large Language Models for Explainable Recommendation”	Ranjan Mishra et.al.	2510.06275	null
2025-10-08	Barbarians at the Gate: How AI is Upending Systems Research	Audrey Cheng et.al.	2510.06189	null
2025-10-07	Rasterized Steered Mixture of Experts for Efficient 2D Image Regression	Yi-Hsin Li et.al.	2510.05814	null
2025-10-07	MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition	Haoxun Li et.al.	2510.05749	null
2025-10-07	Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting	Zhongkai Yu et.al.	2510.05497	null
2025-10-06	Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving	Yue Pan et.al.	2510.05245	null
2025-10-06	REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis	Alec K. Peltekian et.al.	2510.04923	null
2025-10-06	LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0	Jinbo Wen et.al.	2510.04765	null
2025-10-06	Multilingual Routing in Mixture-of-Experts	Lucas Bandarkar et.al.	2510.04694	null
2025-10-06	Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing	Xuanhua Yin et.al.	2510.04670	null
2025-10-05	HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks	Nghiem T. Diep et.al.	2510.04295	null
2025-10-05	SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling	Harshil Vejendla et.al.	2510.04286	null
2025-10-05	MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition	Umberto Cappellazzo et.al.	2510.04136	null
2025-10-03	Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective	Yehuda Dar et.al.	2510.03151	null
2025-10-02	ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models	Gursimran Singh et.al.	2510.02613	null
2025-10-02	UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models	Yuhao Sun et.al.	2510.02194	null
2025-10-02	LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition	Rixin Zhou et.al.	2510.01651	null
2025-10-01	Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs	Leyla Mirvakhabova et.al.	2510.01185	null
2025-10-01	Learning Compact Representations of LLM Abilities via Item Response Theory	Jianhao Chen et.al.	2510.00844	null
2025-10-01	Graph Integrated Multimodal Concept Bottleneck Model	Jiakai Lin et.al.	2510.00701	null
2025-10-01	FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression	Yifei Gao et.al.	2510.00621	null
2025-10-01	Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning	Minghao Yang et.al.	2510.00570	null
2025-09-30	FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training	Yunqi Gao et.al.	2510.00207	null
2025-09-30	Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization	Yaoxiang Wang et.al.	2509.26520	null
2025-09-30	Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology	Chenyu Li et.al.	2509.26223	null
2025-09-30	Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline	Haiyang Li et.al.	2509.25991	null
2025-09-30	UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression	Yuan Zhao et.al.	2509.25934	null
2025-09-30	Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel	Chuanyang Zheng et.al.	2509.25913	null
2025-10-01	A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI	Arvind Murari Vepa et.al.	2509.25889	null
2025-09-30	Collaborative Compression for Large-Scale MoE Deployment on Edge	Yixiao Chen et.al.	2509.25689	null
2025-09-30	LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts	Yuan Zhuang et.al.	2509.25684	null
2025-09-30	Guiding Mixture-of-Experts with Temporal Multimodal Interactions	Xing Han et.al.	2509.25678	null
2025-09-29	K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model	Bangwei Guo et.al.	2509.25594	null
2025-09-29	GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference	Yu Han et.al.	2509.25041	null
2025-09-29	LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection	Bao-Ngoc Dao et.al.	2509.24547	null
2025-09-29	One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning	Minh Le et.al.	2509.24483	null
2025-09-29	Muon: Training and Trade-offs with Latent Attention and MoE	Sushant Mehta et.al.	2509.24406	null
2025-09-29	LLaDA-MoE: A Sparse MoE Diffusion Language Model	Fengqi Zhu et.al.	2509.24389	null
2025-09-29	Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning	Zhisheng Chen et.al.	2509.24222	null
2025-09-28	HunyuanImage 3.0 Technical Report	Siyu Cao et.al.	2509.23951	null
2025-09-28	Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms	Jiahao Ying et.al.	2509.23933	null
2025-09-28	Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don’t Know	Albus Yizhuo Li et.al.	2509.23830	null
2025-09-28	A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning	Yaya Zhao et.al.	2509.23772	null
2025-09-26	Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time	Yixuan Han et.al.	2509.22572	null
2025-09-26	Learning to Ball: Composing Policies for Long-Horizon Basketball Moves	Pei Xu et.al.	2509.22442	null
2025-09-26	Role-Aware Multi-modal federated learning system for detecting phishing webpages	Bo Wang et.al.	2509.22369	null
2025-09-26	HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space	Ke Li et.al.	2509.22299	null
2025-09-26	Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics	Xingjian Wu et.al.	2509.22279	null
2025-09-26	MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning	Tao Wu et.al.	2509.21953	null
2025-09-26	Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts	Naibin Gu et.al.	2509.21892	null
2025-09-26	ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations	Chang Liu et.al.	2509.21802	null
2025-09-26	LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE	Yu Shang et.al.	2509.21790	null
2025-09-25	Distributed Specialization: Rare-Token Neurons in Large Language Models	Jing Liu et.al.	2509.21163	null
2025-09-26	Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns	Xuemiao Zhang et.al.	2509.21124	null
2025-09-25	Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders	Alessandro Bombini et.al.	2509.21123	null
2025-09-24	Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures	Sampurna Roy et.al.	2509.20577	null
2025-09-24	SHMoAReg: Spark Deformable Image Registration via Spatial Heterogeneous Mixture of Experts and Attention Heads	Yuxi Zheng et.al.	2509.20073	null
2025-09-24	Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference	Ziyi Han et.al.	2509.19781	null
2025-09-23	DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces	Tianshuo Zhang et.al.	2509.19230	null
2025-09-23	Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation	Yunzhe Shen et.al.	2509.18912	null
2025-09-23	LongCat-Flash-Thinking Technical Report	Meituan LongCat Team et.al.	2509.18883	null
2025-09-23	PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving	Chengran Yuan et.al.	2509.18609	null
2025-09-23	Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts	Qi Wang et.al.	2509.18542	null
2025-09-23	StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models	Haoxin Yang et.al.	2509.17993	null
2025-09-23	Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark	Siu Hang Ho et.al.	2509.17894	null
2025-09-22	Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving	Ziming Liu et.al.	2509.17863	null
2025-09-22	Attention-based Mixture of Experts for Robust Speech Deepfake Detection	Viola Negroni et.al.	2509.17585	null
2025-09-22	Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity	Siqi Li et.al.	2509.17411	null
2025-09-21	MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE	Soheil Zibakhsh et.al.	2509.17238	null
2025-09-21	CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception	Lingzhao Kong et.al.	2509.17107	null
2025-09-21	Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation	Junzhuo Li et.al.	2509.16882	null
2025-09-20	KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control	Jinrui Han et.al.	2509.16638	null
2025-09-19	DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning	Sikai Bai et.al.	2509.16105	null
2025-09-19	MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework	Tianyu Li et.al.	2509.15964	null
2025-09-19	pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation	Tong Wang et.al.	2509.15638	null
2025-09-19	MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training	Junbiao Pang et.al.	2509.15514	null
2025-09-18	Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing	Zichen Wu et.al.	2509.15361	null
2025-09-18	Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting	Liran Nochumsohn et.al.	2509.15105	null
2025-09-18	Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning	Lei Wang et.al.	2509.15087	null
2025-09-18	EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence	Chaoyin She et.al.	2509.14977	null
2025-09-18	FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts	Jiayi Han et.al.	2509.14900	null
2025-09-18	CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human	Nan Sun et.al.	2509.14889	null
2025-09-17	CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts	Leonard Hackel et.al.	2509.14104	null
2025-09-18	SAIL-VL2 Technical Report	Weijie Yin et.al.	2509.14033	null
2025-09-17	Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation	Nguyen Lan Vi Vu et.al.	2509.13834	null
2025-09-18	Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers	Manan Mittal et.al.	2509.13548	null
2025-09-18	GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR	Yujie Guo et.al.	2509.13093	null
2025-09-16	Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection	Boyu Han et.al.	2509.12990	null
2025-09-16	Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks	Bowen Ye et.al.	2509.12813	null
2025-09-16	MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos	Damola Agbelese et.al.	2509.12772	null
2025-09-17	NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts	Botao He et.al.	2509.12747	null
2025-09-16	AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models	Heng Zhang et.al.	2509.12715	null
2025-10-24	Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts	Yunke Qu et.al.	2508.05993	null
2025-07-23	Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models	Changxin Tian et.al.	2507.17702	null
2025-07-23	Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography	Farnoush Bayatmakou et.al.	2507.17662	null
2025-07-23	InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation	Shuai Yang et.al.	2507.17520	null
2025-07-23	Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection	Yehao Lu et.al.	2507.17436	null
2025-07-23	A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model	Zhe Xu et.al.	2507.17303	null
2025-07-23	BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs	Jianmin Hu et.al.	2507.17133	null
2025-07-22	GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI	Joshua Kalyanapu et.al.	2507.17033	null
2025-07-22	Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data	Yunyi Shen et.al.	2507.16817	null
2025-07-22	Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training	Zixiao Huang et.al.	2507.16274	null
2025-07-21	Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure	Alexandra Junell et.al.	2507.16088	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts	Sungmin Yun et.al.	2507.15465	null
2025-07-21	Universal crystal material property prediction via multi-view geometric fusion in graph transformers	Liang Zhang et.al.	2507.15303	null
2025-07-20	CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning	Pan Hu et.al.	2507.14903	null
2025-07-23	GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving	Chi Wan et.al.	2507.14456	null
2025-07-18	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	Yingying Zhang et.al.	2507.13812	null
2025-07-17	Apple Intelligence Foundation Language Models: Tech Report 2025	Hanzhi Zhou et.al.	2507.13575	null
2025-07-17	R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning	Xiaohan Guo et.al.	2507.13107	null
2025-07-16	Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series	Martina Cádiz-Leyton et.al.	2507.12611	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-17	Mixture of Raytraced Experts	Andrea Perin et.al.	2507.12419	null
2025-07-16	CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning	Peiwen Xia et.al.	2507.11834	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	Atmos-Bench: 3D Atmospheric Structures for Climate Insight	Tianchi Xu et.al.	2507.11085	null
2025-07-14	DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models	Luolin Xiong et.al.	2507.09955	null
2025-07-14	ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization	Huilai Li et.al.	2507.09945	null
2025-07-14	Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems	Vindula Jayawardana et.al.	2507.09836	null
2025-07-13	Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts	Aakash Tripathi et.al.	2507.09754	null
2025-07-13	Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive	You Huang et.al.	2507.09612	null
2025-07-12	PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process	Shiqi Jiang et.al.	2507.09242	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes	Tianyou Jiang et.al.	2507.08542	null
2025-07-11	White-Basilisk: A Hybrid Model for Code Vulnerability Detection	Ioannis Lamprou et.al.	2507.08540	null
2025-07-15	KAT-V1: Kwai-AutoThink Technical Report	Zizheng Zhan et.al.	2507.08297	null
2025-07-11	Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization	Woon Ryong Kim et.al.	2507.08269	null
2025-07-10	MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving	Lu Xu et.al.	2507.07818	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-09	Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning	Ankit Jyothish et.al.	2507.07335	null
2025-07-08	Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate	A. Bochkov et.al.	2507.07129	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105	null
2025-07-11	FlexOlmo: Open Language Models for Flexible Data Use	Weijia Shi et.al.	2507.07024	null
2025-07-09	Deep Disentangled Representation Network for Treatment Effect Estimation	Hui Meng et.al.	2507.06650	null
2025-07-09	SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference	Qian Chen et.al.	2507.06567	null
2025-07-09	MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models	Yiwen Liu et.al.	2507.06502	null
2025-07-08	Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation	Szymon Płotka et.al.	2507.06363	null
2025-07-08	Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Xintong Hu et.al.	2507.06116	null
2025-07-09	A Survey on Prompt Tuning	Zongqian Li et.al.	2507.06085	null
2025-07-08	Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors	Bing Wang et.al.	2507.05939	null
2025-07-08	What You Have is What You Track: Adaptive and Robust Multimodal Tracking	Yuedong Tan et.al.	2507.05899	null
2025-07-08	Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition	Zijin Gu et.al.	2507.05724	null
2025-07-08	Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach	Xiaobing Chen et.al.	2507.05685	null
2025-07-08	City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data	Tianxing Wu et.al.	2507.05651	null
2025-07-07	QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks	Hoang-Quan Nguyen et.al.	2507.05190	null
2025-07-07	NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification	Jun Hu et.al.	2507.04870	null
2025-07-07	DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics	Yayu Long et.al.	2507.04661	null
2025-07-08	UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification	Xixi Wan et.al.	2507.04638	null
2025-07-07	Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts	Yun Wang et.al.	2507.04631	null
2025-07-05	Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge	Linshen Liu et.al.	2507.04123	null
2025-07-05	From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM	Xinyi Wu et.al.	2507.03868	null
2025-07-04	Decoupled Relative Learning Rate Schedules	Jan Ludziejewski et.al.	2507.03526	null
2025-07-03	Neural Inhibition Improves Dynamic Routing and Mixture of Experts	Will Y. Zou et.al.	2507.03221	null
2025-07-03	System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo et.al.	2507.02456	null
2025-07-03	NLP4Neuro: Sequence-to-sequence learning for neural population decoding	Jacob J. Morra et.al.	2507.02264	null
2025-07-02	MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics	Dmytro Kuzmenko et.al.	2507.01843	null
2025-07-02	Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data	Ethan Pawl et.al.	2507.01375	null
2025-07-02	Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model	Chaoxiang Cai et.al.	2507.01351	null
2025-07-02	Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations	Bohao Wang et.al.	2507.01337	null
2025-07-02	ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation	JianChao Zhao et.al.	2507.00502	null
2025-07-01	MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE	Geng Zhang et.al.	2507.00390	null
2025-06-30	MotionGPT3: Human Motion as a Second Modality	Bingfan Zhu et.al.	2506.24086	null
2025-06-30	MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis	Zhe Liu et.al.	2506.23648	null
2025-06-30	Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model	Mu-Chi Chen et.al.	2506.23635	null
2025-06-29	Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging	Lujun Li et.al.	2506.23266	null
2025-06-29	External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting	Haoran Li et.al.	2506.23201	null
2025-06-29	Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound	Zhiyuan Zhu et.al.	2506.23108	null
2025-07-01	Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning	Sanskar Pandey et.al.	2506.22919	null
2025-06-27	Towards Distributed Neural Architectures	Aditya Cowsik et.al.	2506.22389	null
2025-06-27	MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism	Zheng Zhang et.al.	2506.22175	null
2025-06-27	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-26	Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts	Jiajie Yang et.al.	2506.21328	null
2025-06-26	Learning to Skip the Middle Layers of Transformers	Tim Lawson et.al.	2506.21103	null
2025-06-26	Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning	Haodong Lu et.al.	2506.21035	null
2025-06-26	EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning	Xiao Zhang et.al.	2506.20986	null
2025-06-25	Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration	Jiaxing Huang et.al.	2506.20282	null
2025-06-23	Multimodal Anomaly Detection with a Mixture-of-Experts	Christoph Willibald et.al.	2506.19077	null
2025-06-23	Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models	Zihan Wang et.al.	2506.18945	null
2025-06-23	Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning	Rahul Atul Bhope et.al.	2506.18789	null
2025-06-23	An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify	Shivam Verma et.al.	2506.18735	null
2025-06-23	Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks	Xiaodong Wu et.al.	2506.18543	null
2025-06-23	SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation	Zichong Li et.al.	2506.18349	null
2025-06-23	Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies	Junchao Fan et.al.	2506.18304	null
2025-06-22	Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection	Zheng Zhan et.al.	2506.18145	null
2025-06-21	Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert	Gelei Xu et.al.	2506.17787	null
2025-06-21	Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities	Xinghao Huang et.al.	2506.17755	null
2025-06-21	PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation	Xinyu Xiong et.al.	2506.17712	null
2025-06-20	SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification	Zhenglin Lai et.al.	2506.17368	null
2025-06-19	FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE	Khiem Le et.al.	2506.16600	null
2025-06-19	Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models	Daniel Fidel Harvey et.al.	2506.16419	null
2025-06-17	Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi et.al.	2506.15006	null
2025-06-17	NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification	Wajih Hassan Raza et.al.	2506.14970	null
2025-06-17	GMT: General Motion Tracking for Humanoid Whole-Body Control	Zixuan Chen et.al.	2506.14770	null
2025-06-17	Exploring Speaker Diarization with Mixture of Experts	Gaobin Yang et.al.	2506.14750	null
2025-06-18	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ling Team et.al.	2506.14731	null
2025-06-17	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	link
2025-06-17	Single-Example Learning in a Mixture of GPDMs with Latent Geometries	Jesse St. Amand et.al.	2506.14563	null
2025-06-17	MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models	Hongyu Wang et.al.	2506.14435	null
2025-06-16	Load Balancing Mixture of Experts with Similarity Preserving Routers	Nabil Omi et.al.	2506.14038	null
2025-06-16	GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics	Qianzhong Chen et.al.	2506.14009	null
2025-06-16	MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	MiniMax et.al.	2506.13585	link
2025-06-16	Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization	Guanghui Song et.al.	2506.13541	null
2025-06-16	EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization	Zhongqian Fu et.al.	2506.13329	link
2025-06-16	Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs	Xintong Tang et.al.	2506.13192	null
2025-06-15	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-14	Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts	Shengzhuang Chen et.al.	2506.12597	null
2025-06-14	Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control	Rongpeng Li et.al.	2506.12453	null
2025-06-17	HarMoEny: Efficient Multi-GPU Inference of MoE Models	Zachary Doucet et.al.	2506.12417	null
2025-06-14	Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model	Chong Li et.al.	2506.12388	null
2025-06-13	Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?	Houyi Li et.al.	2506.12119	null
2025-06-13	Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution	Zhangkai Ni et.al.	2506.11823	link
2025-06-12	Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts	Zaijing Li et.al.	2506.10357	null
2025-06-11	GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture	GigaChat team et.al.	2506.09440	null
2025-06-11	DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts	Yuchen Feng et.al.	2506.09351	null
2025-06-10	CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA	Jiale Dong et.al.	2506.08496	link
2025-06-11	MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding	Shivang Chopra et.al.	2506.08356	null
2025-06-11	STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation	Yiming Wang et.al.	2506.08054	link
2025-06-09	A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling	Jacob Helwig et.al.	2506.07969	link
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-06-11	MIRA: Medical Time Series Foundation Model for Real-World Health Data	Hao Li et.al.	2506.07584	null
2025-06-11	MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization	Ken Yaggel et.al.	2506.07563	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-09	MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing	Haiyue Ma et.al.	2506.07366	null
2025-06-08	UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment	Wentao Zhao et.al.	2506.07013	null
2025-06-07	High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations	Ziwei Li et.al.	2506.06858	null
2025-06-07	Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning	Yuan Yuan et.al.	2506.06694	null
2025-06-06	Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization	Jonathan Yang et.al.	2506.06196	null
2025-06-06	MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models	Jie Cao et.al.	2506.05928	null
2025-06-06	dots.llm1 Technical Report	Bi Huo et.al.	2506.05767	null
2025-06-05	Mixture-of-Experts Meets In-Context Reinforcement Learning	Wenhao Wu et.al.	2506.05426	null
2025-06-05	Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection	Ziyi Zhou et.al.	2506.04739	null
2025-06-05	FlashDMoE: Fast Distributed MoE in a Single Kernel	Osayamen Jonathan Aimuyo et.al.	2506.04667	link
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs	Ze Yu Zhang et.al.	2506.02965	null
2025-06-03	Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights	Jakub Krajewski et.al.	2506.02890	null
2025-06-03	Brain-Like Processing Pathways Form in Models With Heterogeneous Experts	Jack Cook et.al.	2506.02813	null
2025-06-04	MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection	Juntong Li et.al.	2506.02535	null
2025-06-03	MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework	Yupeng Qi et.al.	2506.02460	null
2025-05-31	Enhancing Multimodal Continual Instruction Tuning with BranchLoRA	Duzhen Zhang et.al.	2506.02041	null
2025-06-02	SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model	Zhao Yang et.al.	2506.01833	link
2025-06-02	Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning	Ryotaro Kawata et.al.	2506.01656	null
2025-06-02	DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models	Jiancheng Ye et.al.	2506.01257	null
2025-06-01	Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts	Fan Liu et.al.	2506.00965	null
2025-05-30	Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction	Shuai Liu et.al.	2505.24597	null
2025-05-30	Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis	Junzhuo Li et.al.	2505.24593	null
2025-05-30	Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer	Yilun Kong et.al.	2505.24378	link
2025-05-30	GradPower: Powering Gradients for Faster Language Model Pre-Training	Mingze Wang et.al.	2505.24275	null
2025-05-30	On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks	Mingze Wang et.al.	2505.24205	null
2025-05-29	Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts	Xuweiyi Chen et.al.	2505.23926	null
2025-06-03	Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert	Zhaokun Wang et.al.	2505.23868	null
2025-05-29	From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents	Tobias Lindenbauer et.al.	2505.23422	link
2025-05-29	Context-Aware Semantic Communication for the Wireless Networks	Guangyuan Liu et.al.	2505.23249	null
2025-05-29	Two Is Better Than One: Rotations Scale LoRAs	Hongcan Guo et.al.	2505.23184	null
2025-05-28	HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	Qi Cai et.al.	2505.22705	link
2025-05-28	Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts	Xue Zhang et.al.	2505.22582	null
2025-05-28	A Human-Centric Approach to Explainable AI for Personalized Education	Vinitra Swamy et.al.	2505.22541	link
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	null
2025-05-28	Advancing Expert Specialization for Better MoE	Hongcan Guo et.al.	2505.22323	null
2025-05-28	ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation	Jiawen Yu et.al.	2505.22159	null
2025-05-28	AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation	Yan Rong et.al.	2505.22053	null
2025-05-28	Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge	Zhongyi Zhou et.al.	2505.21906	null
2025-05-27	MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis	Yitong Li et.al.	2505.21698	null
2025-05-28	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities	Junyan Zhang et.al.	2505.21191	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	Multi-objective Large Language Model Alignment with Hierarchical Experts	Zhuo Li et.al.	2505.20925	null
2025-05-26	FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models	Hao Kang et.al.	2505.20225	link
2025-05-26	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-26	Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments	Junming Liu et.al.	2505.19699	null
2025-05-26	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate	Liangwei Nathan Zheng et.al.	2505.19525	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	RankLLM: A Python Package for Reranking with LLMs	Sahel Sharifymoghaddam et.al.	2505.19284	null
2025-05-25	I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts	Jiayi Xin et.al.	2505.19190	link
2025-05-24	TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling	Chonghua Han et.al.	2505.18670	null
2025-05-24	ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation	Jian Liang et.al.	2505.18640	link
2025-05-24	Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter	Weizhi Zhong et.al.	2505.18612	null
2025-05-23	Enhancing CTR Prediction with De-correlated Expert Networks	Jiancheng Wang et.al.	2505.17925	null
2025-05-23	PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval	Zehua Pei et.al.	2505.17639	null
2025-05-23	CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning	Jinyuan Feng et.al.	2505.17553	null
2025-05-23	MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation	Kaixing Yang et.al.	2505.17543	null
2025-05-22	JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model	Qihao Duan et.al.	2505.17257	null
2025-05-22	DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving	Zhenjie Yang et.al.	2505.16278	null
2025-05-22	DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor	Yan Zhao et.al.	2505.16256	null
2025-05-21	Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models	Jingcong Liang et.al.	2505.16056	link
2025-05-21	MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding	Yuxiang Wei et.al.	2505.15946	null
2025-05-21	CoLA: Collaborative Low-Rank Adaptation	Yiyun Zhou et.al.	2505.15471	link
2025-05-22	Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought	Tencent Hunyuan Team et.al.	2505.15431	null
2025-05-21	Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks	Uranik Berisha et.al.	2505.15414	null
2025-05-21	Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines	Xiaohou Shi et.al.	2505.15151	null
2025-05-20	Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies	Haoyi Qiu et.al.	2505.14972	link
2025-05-20	Balanced and Elastic End-to-end Training of Dynamic LLMs	Mohamed Wahib et.al.	2505.14864	null
2025-05-20	Solving MNIST with a globally trained Mixture of Quantum Experts	Paolo Alessandro Xavier Tognini et.al.	2505.14789	null
2025-05-20	Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training	Mengru Wang et.al.	2505.14681	null
2025-05-21	Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach	Umberto Cappellazzo et.al.	2505.14336	null
2025-05-20	FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation	Shaolin Zhu et.al.	2505.14256	null
2025-05-20	THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation	Yunlong Liang et.al.	2505.14173	null
2025-05-20	Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition	Shuo Zhang et.al.	2505.14143	null
2025-05-20	Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging	Ryo Bertolissi et.al.	2505.14136	null
2025-05-20	StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning	Huaijie Wang et.al.	2505.13997	null
2025-05-20	Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting	Bao-Ngoc Dao et.al.	2505.13944	link
2025-05-20	U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding	Ziqian Wang et.al.	2505.13880	link
2025-05-20	EfficientLLM: Efficiency in Large Language Models	Zhengqing Yuan et.al.	2505.13840	null
2025-05-19	CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition	Nam V. Nguyen et.al.	2505.13380	link
2025-05-19	Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference	Shuqing Luo et.al.	2505.13345	link
2025-05-19	Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models	Lucas Berry et.al.	2505.13273	null
2025-05-19	True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics	Christoph Jürgen Hemmer et.al.	2505.13192	null
2025-05-19	Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures	Tuan Thai et.al.	2505.13052	null
2025-05-18	Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization	Hongbiao Zhu et.al.	2505.12311	null
2025-05-20	Model Merging in Pre-training of Large Language Models	Yunshui Li et.al.	2505.12082	null
2025-05-20	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-17	MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging	Zihuan Qiu et.al.	2505.11883	null
2025-05-17	Improving Coverage in Combined Prediction Sets with Weighted p-values	Gina Wong et.al.	2505.11785	null
2025-05-16	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin et.al.	2505.11432	null
2025-05-16	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang et.al.	2505.11415	null
2025-05-16	A Fast Kernel-based Conditional Independence test with Application to Causal Discovery	Oliver Schacht et.al.	2505.11085	null
2025-05-16	On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating	Huy Nguyen et.al.	2505.10860	null
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-14	Qwen3 Technical Report	An Yang et.al.	2505.09388	link
2025-05-14	Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures	Chenggang Zhao et.al.	2505.09343	null
2025-05-13	Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony	Shaoyu Wang et.al.	2505.08944	null
2025-05-13	PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts	Yang Su et.al.	2505.08719	null
2025-05-13	AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale	Yunjie Ji et.al.	2505.08311	null
2025-05-12	UMoE: Unifying Attention and FFN with Shared Experts	Yuanhang Yang et.al.	2505.07260	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-11	FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers	Tianyu Chen et.al.	2505.06858	null
2025-05-11	The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts	Enric Boix-Adsera et.al.	2505.06839	null
2025-05-10	Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	Zihan Qiu et.al.	2505.06708	link
2025-05-10	Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding	Dawei Huang et.al.	2505.06685	link
2025-05-10	QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration	HamidReza Imani et.al.	2505.06481	null
2025-05-12	FloE: On-the-Fly MoE Inference on Memory-constrained GPU	Yuxin Zhou et.al.	2505.05950	null
2025-05-09	MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design	Haojie Duanmu et.al.	2505.05799	link
2025-05-08	Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts	Ming Li et.al.	2505.05035	null
2025-05-07	Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs	Yehui Tang et.al.	2505.04519	null
2025-05-07	SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios	Ning Cheng et.al.	2505.04201	null
2025-05-07	LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?	Teddy Foley et.al.	2505.04075	link
2025-05-07	Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications	Yuanai Xie et.al.	2505.04068	null
2025-05-06	Towards Smart Point-and-Shoot Photography	Jiawan Li et.al.	2505.03638	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-06	STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation	Maolin Wang et.al.	2505.03484	null
2025-05-06	3D Gaussian Splatting Data Compression with Mixture of Priors	Lei Liu et.al.	2505.03310	null
2025-05-05	Finger Pose Estimation for Under-screen Fingerprint Sensor	Xiongjun Guan et.al.	2505.02481	link
2025-05-05	Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems	Kai Zhang et.al.	2505.02381	null
2025-05-05	Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques	Sanjay Surendranath Girija et.al.	2505.02309	null
2025-05-04	Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	Zhenxing Mi et.al.	2505.02005	link
2025-05-03	Backdoor Attacks Against Patch-based Mixture of Experts	Cedric Chan et.al.	2505.01811	link
2025-05-01	MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling	Abdoul Majid O. Thiombiano et.al.	2505.01459	null
2025-05-02	Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders	Rogelio A Mancisidor et.al.	2505.01134	null
2025-05-02	CoCoAFusE: Beyond Mixtures of Experts via Model Fusion	Aurelio Raffa Ugolini et.al.	2505.01105	null
2025-05-01	Improving Routing in Sparse Mixture of Experts with Graph of Tokens	Tam Nguyen et.al.	2505.00792	null
2025-05-01	CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series	Tian Lan et.al.	2505.00415	null
2025-05-01	Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing	Piotr Piękos et.al.	2505.00315	link
2025-04-30	Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders	Xuwei Yang et.al.	2505.00216	null
2025-04-29	TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts	Pradip Kunwar et.al.	2504.21190	null
2025-04-29	Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization	Shuai Gong et.al.	2504.21063	null
2025-04-26	PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Ben Goertzel et.al.	2504.21029	null
2025-04-29	MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification	Yichu Xu et.al.	2504.20509	null
2025-04-29	FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks	Wenjing Xiao et.al.	2504.20446	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	Accelerating Mixture-of-Experts Training with Adaptive Expert Replication	Athinagoras Skiadopoulos et.al.	2504.19925	null
2025-04-28	Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey	Yunting Xu et.al.	2504.19660	null
2025-04-28	ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving	Renju Feng et.al.	2504.19580	link
2025-04-29	BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts	Qingyue Wang et.al.	2504.18598	null
2025-04-25	NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation	Rob Romijnders et.al.	2504.18147	null
2025-04-28	Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection	Haokai Zhang et.al.	2504.17834	link
2025-04-22	Compass-V2 Technical Report	Sophia Maria et.al.	2504.15527	null
2025-04-21	Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images	Jonathan Brokman et.al.	2504.15470	link
2025-04-17	D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-04-23	MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core	Dennis Liu et.al.	2504.14960	null
2025-04-18	Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts	Jie Zou et.al.	2504.13655	null
2025-04-18	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Alexander Rusnak et.al.	2504.13590	null
2025-04-18	Dense Backpropagation Improves Training for Sparse Mixture-of-Experts	Ashwinee Panda et.al.	2504.12463	link
2025-04-16	Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models	Yuanbo Tang et.al.	2504.12359	null
2025-04-16	Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data	Sangwon Hyun et.al.	2504.12287	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-15	Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology	Henrik Häggström et.al.	2504.11279	link
2025-04-14	Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning	LeiLei Ma et.al.	2504.09990	null
2025-04-14	Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications	Nathalie Bartoli et.al.	2504.09930	null
2025-04-14	Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming	Zhiqiang He et.al.	2504.09906	null
2025-04-13	Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation	Jia Wei et.al.	2504.09601	null
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-12	Mixture of Group Experts for Learning Invariant Representations	Lei Kang et.al.	2504.09265	null
2025-04-11	RouterKT: Mixture-of-Experts for Knowledge Tracing	Han Liao et.al.	2504.08989	link
2025-04-11	Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design	Robin Grapin et.al.	2504.08671	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-11	Scaling Laws for Native Multimodal Models	Mustafa Shukor et.al.	2504.07951	null
2025-04-10	Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models	Hongcheng Guo et.al.	2504.07807	link
2025-04-10	Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network	Peng Jia et.al.	2504.07777	null
2025-04-10	Kimi-VL Technical Report	Kimi Team et.al.	2504.07491	link
2025-04-09	MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution	Zhe Wang et.al.	2504.07308	link
2025-04-11	Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Ling Team et.al.	2504.07158	null
2025-04-09	Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations	Zican Dong et.al.	2504.06792	null
2025-04-09	FedMerge: Federated Personalization via Model Merging	Shutong Chen et.al.	2504.06768	null
2025-04-08	S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2504.05897	link
2025-04-08	Adaptive Substructure-Aware Expert Model for Molecular Property Prediction	Tianyi Jiang et.al.	2504.05844	null
2025-04-10	Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Ajay Jaiswal et.al.	2504.05586	null
2025-04-07	SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement	Zuying Xie et.al.	2504.04818	null
2025-04-06	On the Spatial Structure of Mixture-of-Experts in Transformers	Daniel Bershatsky et.al.	2504.04444	null
2025-04-05	Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator	Bing Wang et.al.	2504.04076	link
2025-04-04	HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs	Yongji Wu et.al.	2504.03871	null
2025-04-01	Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns	Diego Vallarino et.al.	2504.03750	null
2025-04-04	RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation	Hanbo Bi et.al.	2504.03166	null
2025-04-03	TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models	Xinquan Wang et.al.	2504.02712	null
2025-04-07	MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators	Beichen Huang et.al.	2504.02658	link
2025-04-07	MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism	Ruidong Zhu et.al.	2504.02263	null
2025-04-02	Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design	Mohan Zhang et.al.	2504.01337	null
2025-04-01	Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function	Qiuchen Song et.al.	2504.00819	null
2025-04-01	DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism	Dengchun Li et.al.	2504.00661	link
2025-04-01	Continual Cross-Modal Generalization	Yan Xia et.al.	2504.00561	null
2025-04-01	Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection	Shunxin Chen et.al.	2504.00458	null
2025-03-31	Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion	Jiagen Li et.al.	2503.23721	null
2025-03-30	Mixture of Routers	Jia-Chen Zhang et.al.	2503.23362	null
2025-03-29	Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models	Zehua Liu et.al.	2503.23100	null
2025-03-29	S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning	Giang Do et.al.	2503.23007	null
2025-03-29	Sparse Mixture of Experts as Unified Competitive Learning	Giang Do et.al.	2503.22996	null
2025-04-01	Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities	Raman Dutt et.al.	2503.22517	null
2025-03-27	RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction	Armin Abdollahi et.al.	2503.21971	null
2025-03-27	iMedImage Technical Report	Ran Wei et.al.	2503.21836	null
2025-03-27	LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models	Hengyuan Zhao et.al.	2503.21227	null
2025-03-26	Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework	Soham Sane et.al.	2503.20750	null
2025-03-26	UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines	Chen Tang et.al.	2503.20748	null
2025-03-26	Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning	Sashuai Zhou et.al.	2503.20633	null
2025-03-26	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning	Yousef Sadegheih et.al.	2503.20326	link
2025-03-25	Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion	Konyul Park et.al.	2503.19776	null
2025-03-25	BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts	Suzhe Xu et.al.	2503.19769	null
2025-03-25	M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation	Ziyuan Liu et.al.	2503.19406	null
2025-03-27	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-03-24	Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding	Tianyu Chen et.al.	2503.18578	null
2025-03-24	SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking	Wenrui Cai et.al.	2503.18338	link
2025-03-23	Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding	Ze Zhang et.al.	2503.18104	link
2025-03-22	Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM	Codefuse et.al.	2503.17793	null
2025-03-25	Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts	Yike Yuan et.al.	2503.16057	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-05-27	Mixture of Lookup Experts	Shibo Jie et.al.	2503.15798	null
2025-03-21	Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication	Sin-Yu Huang et.al.	2503.15722	null
2025-03-19	SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation	Thomas Pickard et.al.	2503.15358	null
2025-03-21	Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition	Seungyeon Cho et.al.	2503.14960	null
2025-03-18	Core-Periphery Principle Guided State Space Model for Functional Connectome Classification	Minheng Chen et.al.	2503.14655	null
2025-03-18	MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts	Runqi Meng et.al.	2503.14355	null
2025-03-18	SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture	Tian Qin et.al.	2503.13808	null
2025-03-17	Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge	Shengling Qin et.al.	2503.13421	null
2025-03-17	Channel Estimation for Pinching-Antenna Systems (PASS)	Jian Xiao et.al.	2503.13268	null
2025-03-17	Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation	Yu Liu et.al.	2503.13254	null
2025-03-16	Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps	Mohammad Al-Jarrah et.al.	2503.12633	link
2025-03-16	MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts	Harshit et.al.	2503.12592	null
2025-03-16	MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification	Jianwei Zhao et.al.	2503.12401	null
2025-03-15	Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection	Qixian Chen et.al.	2503.12010	null
2025-03-14	FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA	Jieming Bian et.al.	2503.11880	null
2025-03-14	A Review of DeepSeek Models’ Key Innovative Techniques	Chengen Wang et.al.	2503.11486	null
2025-03-14	MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling	Rachel S. Y. Teo et.al.	2503.11144	link
2025-03-13	Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores	Chenpeng Wu et.al.	2503.10725	link
2025-03-14	dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis	Luyuan Xie et.al.	2503.10412	null
2025-03-13	StableFusion: Continual Video Retrieval via Frame Adaptation	Zecheng Zhao et.al.	2503.10111	link
2025-03-12	Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework	Bakary Badjie et.al.	2503.09504	null
2025-03-12	Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment	Nazanin Moradinasab et.al.	2503.09498	link
2025-03-12	Astrea: A MOE-based Visual Understanding Model with Progressive Alignment	Xiaoda Yang et.al.	2503.09445	null
2025-03-12	Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach	Ruifeng She et.al.	2503.09357	null
2025-03-12	Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference	Mohammad Siavashi et.al.	2503.09304	null
2025-03-13	FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Fufangchen Zhao et.al.	2503.09158	null
2025-03-11	MoE-Loco: Mixture of Experts for Multitask Locomotion	Runhan Huang et.al.	2503.08564	null
2025-03-11	Accelerating MoE Model Inference with Expert Sharding	Oana Balmau et.al.	2503.08467	null
2025-03-11	Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models	Junzhe Li et.al.	2503.08120	null
2025-03-11	MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models	Han Zhao et.al.	2503.08007	null
2025-03-10	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-03-10	A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Siyuan Mu et.al.	2503.07137	link
2025-03-10	VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots	Fu Chen et.al.	2503.07049	link
2025-03-10	ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration	Mengting Ai et.al.	2503.06881	link
2025-03-10	eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference	Suraiya Tairin et.al.	2503.06823	null
2025-03-09	MoFE: Mixture of Frozen Experts Architecture	Jean Seo et.al.	2503.06491	null
2025-03-09	Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models	Nguyen Do et.al.	2503.06413	link
2025-03-08	MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering	Vinay Kumar Verma et.al.	2503.06296	null
2025-03-08	A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts	Wenzhuo Du et.al.	2503.06064	null
2025-03-08	MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model	Miguel Contreras et.al.	2503.06059	null
2025-03-07	Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning	Justin Chih-Yao Chen et.al.	2503.05641	null
2025-03-07	FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework	Jingyu Xu et.al.	2503.05626	null
2025-03-07	Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts	Weigao Sun et.al.	2503.05447	link
2025-03-07	Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs	Ling Team et.al.	2503.05139	null
2025-03-07	Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts	Shwai He et.al.	2503.05066	null
2025-03-06	Continual Pre-training of MoEs: How robust is your router?	Benjamin Thérien et.al.	2503.05029	null
2025-03-06	Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li et.al.	2503.04715	null
2025-03-07	Question-Aware Gaussian Experts for Audio-Visual Question Answering	Hongyeob Kim et.al.	2503.04459	link
2025-03-07	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery	Yiheng Zhu et.al.	2503.04362	null
2025-03-06	DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval	Yating Liu et.al.	2503.04144	null
2025-03-05	VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection	Enkhtogtokh Togootogtokh et.al.	2503.03797	link
2025-03-05	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-06	Convergence Rates for Softmax Gating Mixture of Experts	Huy Nguyen et.al.	2503.03213	null
2025-03-04	MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation	Weihang Wang et.al.	2503.02799	link
2025-03-04	FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting	Congluo Xu et.al.	2503.02692	null
2025-03-04	Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer	Yujiao Yang et.al.	2503.02495	link
2025-03-04	Tabby: Tabular Data Synthesis with Language Models	Sonia Cromp et.al.	2503.02152	null
2025-03-03	ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition	Nastaran Mansourian et.al.	2503.01750	null
2025-03-03	Effective High-order Graph Representation Learning for Credit Card Fraud Detection	Yao Zou et.al.	2503.01556	null
2025-03-03	DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models	Yongqi Huang et.al.	2503.01359	null
2025-03-03	PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation	Linhai Zhang et.al.	2503.01303	null
2025-03-03	Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting	Xiaobin Hong et.al.	2503.01157	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems	Zeyi Ren et.al.	2502.20183	null
2025-02-27	UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook	Yidi Jiang et.al.	2502.20067	null
2025-03-01	Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang et.al.	2502.19811	link
2025-02-26	Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization	Taishi Nakamura et.al.	2502.19261	null
2025-02-26	OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment	Jiaxin Deng et.al.	2502.18965	null
2025-02-25	Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking	Changyuan Zhao et.al.	2502.18118	null
2025-02-24	The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE	Andrei Chernov et.al.	2502.17391	null
2025-02-24	Delta Decompression for MoE-based LLMs Compression	Hao Gu et.al.	2502.17298	link
2025-02-24	Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks	Andrei Chernov et.al.	2502.17187	null
2025-02-24	Muon is Scalable for LLM Training	Jingyuan Liu et.al.	2502.16982	link
2025-02-24	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference	Zewen Jin et.al.	2502.16927	null
2025-02-24	ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds	Jiho Han et.al.	2502.16914	null
2025-02-26	Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment	Chenghao Fan et.al.	2502.16894	link
2025-02-22	An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning	Masoud Shokrnezhad et.al.	2502.16198	null
2025-02-21	A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs	Yuan Sun et.al.	2502.15451	link
2025-02-21	Tight Clusters Make Specialized Experts	Stefan K. Nielsen et.al.	2502.15315	link
2025-02-21	Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction	Baohang Zhou et.al.	2502.15290	link
2025-02-20	Ray-Tracing for Conditionally Activated Neural Networks	Claudio Gallicchio et.al.	2502.14788	null
2025-02-21	ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Zhongyi Zhou et.al.	2502.14420	link
2025-02-19	Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts	Xin Li et.al.	2502.13577	null
2025-02-18	MoBA: Mixture of Block Attention for Long-Context LLMs	Enzhe Lu et.al.	2502.13189	link
2025-02-18	Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models	Gyeongman Kim et.al.	2502.12947	null
2025-02-18	DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs	Minxuan Lv et.al.	2502.12455	null
2025-02-17	From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs	Kumari Nishu et.al.	2502.12325	null
2025-02-17	Accurate Expert Predictions in MoE Inference via Cross-Layer Gate	Zhiyuan Fang et.al.	2502.12224	null
2025-02-17	How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines	Ayan Sengupta et.al.	2502.12051	null
2025-02-17	Connector-S: A Survey of Connectors in Multi-modal Large Language Models	Xun Zhu et.al.	2502.11453	null
2025-02-16	Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time	Robert Dahlke et.al.	2502.11096	null
2025-02-16	ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models	Shixuan Li et.al.	2502.11059	null
2025-02-15	Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization	Matthew Lyle Olson et.al.	2502.10928	null
2025-02-12	Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution	Bowen Chen et.al.	2502.09654	link
2025-02-14	Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting	Nicholas Dronen et.al.	2502.09500	link
2025-02-12	The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities	Ning Li et.al.	2502.08381	null
2025-02-12	Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification	Xuanze Chen et.al.	2502.08083	null
2025-02-13	Training Sparse Mixture Of Experts Text Embedding Models	Zach Nussbaum et.al.	2502.07972	link
2025-02-11	Memory Analysis on the Training Course of DeepSeek Models	Ping Zhang et.al.	2502.07846	null
2025-02-11	MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks	Lotfi Abdelkrim Mecharbat et.al.	2502.07422	null
2025-02-11	Online Aggregation of Trajectory Predictors	Alex Tong et.al.	2502.07178	null
2025-02-09	Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline	Zhiyuan Fang et.al.	2502.06888	null
2025-02-10	MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing	Seokjin Go et.al.	2502.06643	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-10	Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models	Peiran Wang et.al.	2502.06094	null
2025-02-08	Mol-MoE: Training Preference-Guided Routers for Molecule Generation	Diego Calanzone et.al.	2502.05633	link
2025-02-08	UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA	Jiale Dong et.al.	2502.05602	link
2025-02-07	fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving	Hanfei Yu et.al.	2502.05370	null
2025-02-07	Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts	Roussel Desmond Nzoyem et.al.	2502.05335	null
2025-02-07	Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient	Jan Ludziejewski et.al.	2502.05172	null
2025-02-06	Mixture of neural operator experts for learning boundary conditions and model selection	Dwyer Deighan et.al.	2502.04562	null
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning	Peizhuang Cong et.al.	2502.03884	null
2025-02-05	(GG) MoE vs. MLP on Tabular Data	Andrei Chernov et.al.	2502.03608	null
2025-02-05	RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts	Tuan Truong et.al.	2502.03044	null
2025-02-05	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-05	Scaling Laws for Upcycling Mixture-of-Experts Language Models	Seng Pei Liew et.al.	2502.03009	null
2025-02-04	ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals	Jianan Nie et.al.	2502.02748	null
2025-02-04	Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism	Yuhao Qing et.al.	2502.02581	null
2025-02-05	Brief analysis of DeepSeek R1 and its implications for Generative AI	Sarah Mercer et.al.	2502.02523	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-05	MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Haibo Tong et.al.	2502.01719	null
2025-02-04	MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs	Yuhang Zhou et.al.	2502.00997	null
2025-02-03	CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling	Xinze Wang et.al.	2502.00965	null
2025-02-02	UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs	Yufei He et.al.	2502.00806	link
2025-02-02	Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective	Yujin Oh et.al.	2502.00619	link
2025-02-01	PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning	Yu Feng et.al.	2502.00354	link
2025-02-01	Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective	Fanqi Yan et.al.	2502.00281	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-31	Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning	Minh Le et.al.	2501.18936	null
2025-01-30	MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability	Yan Sun et.al.	2501.18439	null
2025-01-29	Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework	Jung-Hua Liu et.al.	2501.17903	null
2025-01-29	Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks	Lucio La Cava et.al.	2501.17557	null
2025-01-28	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	Yueen Ma et.al.	2501.16698	null
2025-01-27	MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)	Xu Yang et.al.	2501.16454	null
2025-01-27	Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference	Yinghan Li et.al.	2501.16103	null
2025-01-25	ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning	Shangqian Gao et.al.	2501.15316	null
2025-01-25	FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts	Ziqi Liu et.al.	2501.15125	link
2025-01-25	Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning	Ziyu Zhao et.al.	2501.15103	null
2025-01-24	Mean-field limit from general mixtures of experts to quantum neural networks	Anderson Melchor Hernandez et.al.	2501.14660	null
2025-01-24	Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation	Shengzhe Zhang et.al.	2501.14269	link
2025-01-24	Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images	Zeyun Deng et.al.	2501.14198	null
2025-01-23	CSAOT: Cooperative Multi-Agent System for Active Object Tracking	Hy Nguyen et.al.	2501.13994	null
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR	Guodong Ma et.al.	2501.12602	null
2025-01-21	Modality Interactive Mixture-of-Experts for Fake News Detection	Yifan Liu et.al.	2501.12431	link
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-21	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-01-21	Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models	Zihan Qiu et.al.	2501.11873	null
2025-01-18	FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models	Xinglin Pan et.al.	2501.10714	null
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism	Chen Tang et.al.	2501.07890	null
2025-01-18	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	Xiaoshui Huang et.al.	2501.07762	null
2025-01-13	A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis	Binyu Zhang et.al.	2501.07016	link
2025-01-12	Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning	Hanwen Zhong et.al.	2501.06884	link
2025-01-10	TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning	Yinghao Zhu et.al.	2501.05661	link
2025-01-09	Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu et.al.	2501.05313	null
2025-01-07	LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes	Xiang Xu et.al.	2501.04004	link
2025-01-07	mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training	Xudong Liao et.al.	2501.03905	null
2025-01-08	Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection	Donatella Genovese et.al.	2501.03432	null
2025-01-12	Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning	Zhongyi Zhou et.al.	2501.02198	null
2025-01-03	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-01	REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization	Huyen Nguyen et.al.	2501.00779	null
2025-01-06	Superposition in Transformers: A Novel Way of Building Mixture of Experts	Ayoub Ben Chaliah et.al.	2501.00530	link
2024-12-31	CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection	Xiaolei Wang et.al.	2501.00346	null
2024-12-29	Multimodal Variational Autoencoder: a Barycentric View	Peijie Qiu et.al.	2412.20487	null
2024-12-29	A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement	Sidra Nasir et.al.	2412.20468	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection	Yaning Zhang et.al.	2412.20156	null
2024-12-27	DeepSeek-V3 Technical Report	DeepSeek-AI et.al.	2412.19437	link
2024-12-26	AskChart: Universal Chart Understanding through Textual Enhancement	Xudong Yang et.al.	2412.19146	link
2024-12-30	Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection	Xiaoyu Huang et.al.	2412.19108	null
2024-12-24	Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making	David Shoresh et.al.	2412.18593	link
2024-12-24	BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing	Yingjie Ma et.al.	2412.18065	link
2024-12-23	UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition	Li Fu et.al.	2412.17507	null
2024-12-23	BrainMAP: Learning Multiple Activation Pathways in Brain Networks	Song Wang et.al.	2412.17404	link
2024-12-22	Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models	Elie Antoine et.al.	2412.16971	null
2024-12-20	Theory of Mixture-of-Experts for Mobile Edge Computing	Hongbo Li et.al.	2412.15690	null
2024-12-19	MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale	Swapnil Gandhi et.al.	2412.15411	null
2024-12-19	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2024-12-19	ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing	Ziteng Wang et.al.	2412.14711	link
2024-12-18	A Survey on Inference Optimization Techniques for Mixture of Experts Models	Jiacheng Liu et.al.	2412.14219	link
2024-12-18	SEKE: Specialised Experts for Keyword Extraction	Matej Martinc et.al.	2412.14087	link
2024-12-18	MedCoT: Medical Chain of Thought via Hierarchical Expert	Jiaxiang Liu et.al.	2412.13736	link
2024-12-17	SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks	Mátyás Vincze et.al.	2412.13053	link
2024-12-17	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Moritz Reuss et.al.	2412.12953	null
2024-12-17	CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition	He Wang et.al.	2412.12760	null
2024-12-16	Investigating Mixture of Experts in Dense Retrieval	Effrosyni Sokli et.al.	2412.11864	null
2024-12-18	Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture	Jingze Shi et.al.	2412.11834	link
2024-12-16	Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation	Svetlana Pavlitska et.al.	2412.11608	link
2024-12-16	Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture	Jingyu Xu et.al.	2412.11557	null
2024-12-14	DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification	Yuhao Wang et.al.	2412.10650	link
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Llama 3 Meets MoE: Efficient Upcycling	Aditya Vavre et.al.	2412.09952	link
2024-12-12	Memory Layers at Scale	Vincent-Pierre Berges et.al.	2412.09764	link
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-12	Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective	Minh Le et.al.	2412.08285	null
2024-12-11	Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification	Xuanze Chen et.al.	2412.08193	link
2024-12-10	MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems	Yao Fu et.al.	2412.07067	null
2024-12-07	Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts	Arturo Rodriguez et.al.	2412.06842	null
2024-12-09	Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset	Xiao Wang et.al.	2412.06647	link
2024-12-09	UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts	Zhen Wan et.al.	2412.06340	null
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-10	An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism	Qing Zhang et.al.	2412.05821	null
2024-12-10	RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Xu Liu et.al.	2412.05679	link
2024-12-07	SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts	Gengze Zhou et.al.	2412.05552	link
2024-12-07	Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers	Boxun Xu et.al.	2412.05540	null
2024-12-06	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-09	Monet: Mixture of Monosemantic Experts for Transformers	Jungwoo Park et.al.	2412.04139	link
2024-12-05	Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks	Zhaoyang Liu et.al.	2412.03850	null
2024-12-04	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond	Loukas Ilias et.al.	2412.03483	null
2024-12-05	MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption	Siddhant Dutta et.al.	2412.01858	null
2024-12-05	Yi-Lightning Technical Report	01.AI et.al.	2412.01253	null
2024-11-30	Mixture of Experts for Node Classification	Yu Shi et.al.	2412.00418	null
2024-11-30	HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting	Shaohan Yu et.al.	2412.00316	null
2024-11-27	Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Andrii Skliar et.al.	2412.00099	null
2024-11-29	LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References	Shuguo Jiang et.al.	2411.19758	null
2024-11-28	On the effectiveness of discrete representations in sparse mixture of experts	Giang Do et.al.	2411.19402	null
2024-11-28	Bayesian Cluster Weighted Gaussian Models	Panagiotis Papastamoulis et.al.	2411.18957	link
2024-11-27	UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS	Haomin Zhuang et.al.	2411.18797	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Mixture of Experts in Image Classification: What’s the Sweet Spot?	Mathurin Videau et.al.	2411.18322	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-25	Staleness-Centric Optimizations for Efficient Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-11-29	MH-MoE: Multi-Head Mixture-of-Experts	Shaohan Huang et.al.	2411.16205	null
2024-11-25	LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy	Peng Cui et.al.	2411.16095	null
2024-11-24	Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang et.al.	2411.15871	null
2024-11-24	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Xiaoye Qu et.al.	2411.15708	link
2024-11-23	Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts	Qizhou Chen et.al.	2411.15432	null
2024-11-23	Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation	Fahao Chen et.al.	2411.15419	null
2024-11-20	MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification	Yuxuan Chen et.al.	2411.13004	null
2024-11-23	KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning	Ming Yin et.al.	2411.12950	null
2024-11-19	Ultra-Sparse Memory Network	Zihao Huang et.al.	2411.12364	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et.al.	2411.10356	link
2024-11-21	Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models	Wei Wang et.al.	2411.10003	null
2024-11-13	Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection	Vima Gupta et.al.	2411.08982	null
2024-11-13	Sparse Upcycling: Inference Inefficient Finetuning	Sasha Doubov et.al.	2411.08968	null
2024-11-13	LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing	Xiaonan Nie et.al.	2411.08446	null
2024-11-12	Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach	Renzi Wang et.al.	2411.08232	null
2024-11-12	PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model	Yilun Liu et.al.	2411.08212	null
2024-11-12	Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge	Emmanuel Azuh Mensah et.al.	2411.07834	null
2024-11-11	Adaptive Conditional Expert Selection Network for Multi-domain Recommendation	Kuiyao Dong et.al.	2411.06826	null
2024-11-11	WDMoE: Wireless Distributed Mixture of Experts for Large Language Models	Nan Xue et.al.	2411.06681	null
2024-11-09	Learning Mixtures of Experts with EM	Quentin Fruytier et.al.	2411.06056	null
2024-11-08	NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts	Yen-Ting Lin et.al.	2411.05945	null
2024-11-05	DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts	Zelin Yao et.al.	2411.03025	link
2024-11-05	Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts	Yuan Xie et.al.	2411.02787	null
2024-11-06	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Xingwu Sun et.al.	2411.02265	null
2024-11-04	FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation	Ziwei Zhan et.al.	2411.02115	null
2024-11-03	RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering	Hui Lin et.al.	2411.01595	null
2024-11-03	Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation	Mingrui Liu et.al.	2411.01457	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-07	HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy	Shuqing Luo et.al.	2411.01288	link
2024-11-02	PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment	Dongxu Liu et.al.	2411.01245	null
2024-11-01	MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition	Cheng Yang et.al.	2411.01016	null
2024-11-01	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen et.al.	2411.00918	link
2024-11-01	MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization	Jingming Guo et.al.	2411.00662	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-30	Efficient and Interpretable Grammatical Error Correction with Mixture of Experts	Muhammad Reza Qorib et.al.	2410.23507	link
2024-10-30	Stealing User Prompts from Mixture of Experts	Itay Yona et.al.	2410.22884	null
2024-10-30	MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning	Xujia Wang et.al.	2410.22782	null
2024-10-29	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2024-10-29	Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	Li Shen et.al.	2410.21804	null
2024-10-29	Neural Experts: Mixture of Experts for Implicit Neural Representations	Yizhak Ben-Shabat et.al.	2410.21643	null
2024-10-28	FinTeamExperts: Role Specialized MOEs For Financial Analysis	Yue Yu et.al.	2410.21338	null
2024-10-28	Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving	Jiyao Wang et.al.	2410.21086	null
2024-10-27	Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Maohao Shen et.al.	2410.20336	null
2024-10-27	GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields	Yusuke Sekikawa et.al.	2410.20306	null
2024-10-25	DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unsupervised Dimensionality Reduction	Zelin Zang et.al.	2410.19504	link
2024-10-25	Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis	Weikai Li et.al.	2410.19225	link
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-24	Mixture of Parrots: Experts improve memorization more than reasoning	Samy Jelassi et.al.	2410.19034	null
2024-10-24	MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases	Zhisheng Lin et.al.	2410.18406	null
2024-10-23	Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches	Kexin Feng et.al.	2410.18298	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-24	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling	Jialong Li et.al.	2410.17043	null
2024-10-21	LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Ruikun Zhang et.al.	2410.16095	link
2024-10-22	CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts	Zhenpeng Su et.al.	2410.16077	link
2024-10-21	Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Qiao Sun et.al.	2410.15774	link
2024-10-21	ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts	Xumeng Han et.al.	2410.15732	null
2024-10-20	Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs	Xin Zhou et.al.	2410.15438	null
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning	Suning Huang et.al.	2410.14972	null
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-18	ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction	Haoyu He et.al.	2410.14099	link
2024-10-17	Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks	Jinze Zhao et.al.	2410.13964	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts	Fanqi Yan et.al.	2410.12258	null
2024-10-16	EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian et.al.	2410.12247	null
2024-10-15	MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router	Yanyue Xie et.al.	2410.12013	null
2024-10-15	MoH: Multi-Head Attention as Mixture-of-Head Attention	Peng Jin et.al.	2410.11842	link
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models	James Vo et.al.	2410.11654	null
2024-10-16	Quadratic Gating Functions in Mixture of Experts: A Statistical Insight	Pedram Akbarian et.al.	2410.11222	null
2024-10-16	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469	null
2024-10-15	Ada-K Routing: Boosting the Efficiency of MoE-based LLMs	Tongtian Yue et.al.	2410.10456	null
2024-10-14	Tighter Risk Bounds for Mixtures of Experts	Wissam Akretche et.al.	2410.10397	null
2024-10-14	Scalable Multi-Domain Adaptation of Language Models using Modular Experts	Peter Schafhalter et.al.	2410.10181	null
2024-10-14	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	link
2024-10-14	AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality	Peijun Qing et.al.	2410.10054	link
2024-10-13	ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL	Zhanqiu Guo et.al.	2410.09781	null
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	I-Chun Chen et.al.	2410.08589	link
2024-10-10	Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts	Sukwon Yun et.al.	2410.08245	link
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Efficient Dictionary Learning with Switch Sparse Autoencoders	Anish Mudide et.al.	2410.08201	link
2024-10-10	More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing	Sagi Shaier et.al.	2410.08003	link
2024-10-10	SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture	Jiayi Han et.al.	2410.07739	null
2024-10-10	Upcycling Large Language Models into Mixture of Experts	Ethan He et.al.	2410.07524	null
2024-10-09	MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Peng Jin et.al.	2410.07348	link
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Probing the Robustness of Theory of Mind in Large Language Models	Christian Nickel et.al.	2410.06271	null
2024-10-08	MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More	Wei Huang et.al.	2410.06270	link
2024-10-08	Aria: An Open Multimodal Native Mixture-of-Experts Model	Dongxu Li et.al.	2410.05993	link
2024-10-08	Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models	Siqi Wang et.al.	2410.05661	null
2024-10-07	Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Xinyu Zhao et.al.	2410.05357	link
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-09	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Exploring the Benefit of Activation Sparsity in Pre-training	Zhengyan Zhang et.al.	2410.03440	link
2024-10-03	MLP-KAN: Unifying Deep Representation and Function Learning	Yunhong He et.al.	2410.03027	link
2024-10-03	On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions	Huy Nguyen et.al.	2410.02935	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping	Ziye Huang et.al.	2410.02475	null
2024-10-03	MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction	Zhaojian Yu et.al.	2410.02241	null
2024-10-03	Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts	Minh Le et.al.	2410.02200	link
2024-10-04	Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices	Andres Potapczynski et.al.	2410.02117	link
2024-10-04	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL	Ghada Sokar et.al.	2410.01930	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging	Tingfeng Hui et.al.	2410.01610	null
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-01	MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards	Sheng Wang et.al.	2410.00938	null
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Robust Traffic Forecasting against Spatial Shift over Years	Hongjun Wang et.al.	2410.00373	link
2024-09-29	IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method	Chaohui Xu et.al.	2410.00059	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-10-02	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Jihai Zhang et.al.	2409.19291	link
2024-09-27	SciDFM: A Large Language Model with Mixture-of-Experts for Science	Liangtai Sun et.al.	2409.18412	null
2024-09-26	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-26	A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction	Guangyu Wang et.al.	2409.17440	link
2024-09-24	Leveraging Mixture of Experts for Improved Speech Deepfake Detection	Viola Negroni et.al.	2409.16077	null
2024-10-02	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Xiaoming Shi et.al.	2409.16040	link
2024-09-24	Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM	Fengrun Zhang et.al.	2409.15905	null
2024-09-24	Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks	Jiayi He et.al.	2409.15695	null
2024-09-23	A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts	Hugo Inzirillo et.al.	2409.15161	link
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-21	Routing in Sparsely-gated Language Models responds to Context	Stefan Arnold et.al.	2409.14107	null
2024-09-20	On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists	Dongyang Fan et.al.	2409.13931	link
2024-09-20	Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning	Annette Spooner et.al.	2409.13791	null
2024-09-19	Robust Audiovisual Speech Recognition Models with Mixture-of-Experts	Yihan Wu et.al.	2409.12370	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0	Zhiyong Wang et.al.	2409.11909	link
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-09-19	LOLA – An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-16	Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression	Yi-Hsin Li et.al.	2409.10101	null
2024-09-14	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-10	DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models	Maryam Akhavan Aghdam et.al.	2409.06669	null
2024-09-10	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Jaeseong Lee et.al.	2409.06211	null
2024-09-10	VE: Modeling Multivariate Time Series Correlation with Variate Embedding	Shangjiong Wang et.al.	2409.06169	link
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	link
2024-09-09	Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks	Bo Xu et.al.	2409.05726	null
2024-09-09	Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection	Tianwu Lei et.al.	2409.05611	null
2024-09-05	Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions	Zemian Ke et.al.	2409.03282	null
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	xLAM: A Family of Large Action Models to Empower AI Agent Systems	Jianguo Zhang et.al.	2409.03215	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Pluralistic Salient Object Detection	Xuelu Feng et.al.	2409.02368	null
2024-09-03	OLMoE: Open Mixture-of-Experts Language Models	Niklas Muennighoff et.al.	2409.02060	link
2024-09-05	Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Hukai Huang et.al.	2409.02050	null
2024-09-02	Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning	Soumajyoti Sarkar et.al.	2409.01483	null
2024-09-02	Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching	Sungmin Yun et.al.	2409.01141	null
2024-09-04	Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack	Guanzhong Chen et.al.	2409.00960	link
2024-09-02	Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts	Youngseog Chung et.al.	2409.00879	null
2024-08-29	Gradient-free variational learning with conditional mixture networks	Conor Heins et.al.	2408.16429	link
2024-08-28	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-08-28	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts	Lean Wang et.al.	2408.15664	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-27	MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce	Hao Jiang et.al.	2408.14968	null
2024-08-24	Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings	Sagar Srinivas Sakhinana et.al.	2408.13622	null
2024-08-23	The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities	Venkatesh Balavadhani Parthasarathy et.al.	2408.13296	null
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation	Xiaowei Mao et.al.	2408.12809	link
2024-08-23	Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth	Yuxiang Wei et.al.	2408.12803	null
2024-08-23	La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	Hang Zou et.al.	2408.12793	null
2024-08-22	SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	Mohammadreza Pourreza et.al.	2408.12733	null
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-08-22	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	link
2024-08-21	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou et.al.	2408.11396	link
2024-08-21	KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?	Xiao Han et.al.	2408.11306	link
2024-08-21	FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts	Hanzi Mei et.al.	2408.11304	null
2024-08-20	Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data	Atmika Gorti et.al.	2408.11247	null
2024-08-20	Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting	Jianxiang Zhou et.al.	2408.10822	link
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700	link
2024-08-20	HMoE: Heterogeneous Mixture of Experts for Language Modeling	An Wang et.al.	2408.10681	null
2024-08-19	AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2408.10284	link
2024-08-17	FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models	Xiaochen Wang et.al.	2408.10276	link
2024-08-19	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method	Hang Zou et.al.	2408.09752	null
2024-08-16	Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection	Haohao Zhu et.al.	2408.08551	link
2024-08-17	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null
2024-08-14	Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation	CanYi Liu et.al.	2408.07427	null
2024-08-13	A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning	Prateek Yadav et.al.	2408.07057	null
2024-08-13	Layerwise Recurrent Router for Mixture-of-Experts	Zihan Qiu et.al.	2408.06793	link
2024-08-13	AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies	Bo-Wen Zhang et.al.	2408.06567	null
2024-08-10	HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou	Xu Wang et.al.	2408.05430	null
2024-08-08	Understanding the Performance and Estimating the Cost of LLM Fine-Tuning	Yuchen Xia et.al.	2408.04693	link
2024-08-08	Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training	Weilin Cai et.al.	2408.04307	null
2024-08-08	LaDiMo: Layer-wise Distillation Inspired MoEfier	Sungyoon Kim et.al.	2408.04278	null
2024-08-07	MoExtend: Tuning New Experts for Modality and Task Extension	Shanshan Zhong et.al.	2408.03511	link
2024-08-05	Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization	Changtao Miao et.al.	2408.02306	null
2024-08-02	HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction	Xingyu Lou et.al.	2408.01332	null
2024-08-01	Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Hai Yu et.al.	2408.00365	null
2024-08-12	MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts	Xi Victoria Lin et.al.	2407.21770	null
2024-07-31	PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning	Min Jae Jung et.al.	2407.21571	null
2024-07-30	Distribution Learning for Molecular Regression	Nima Shoghi et.al.	2407.20475	null
2024-07-29	Time series forecasting with high stakes: A field study of the air cargo industry	Abhinav Garg et.al.	2407.20192	null
2024-07-30	Mixture of Nested Experts: Adaptive Processing of Visual Tokens	Gagan Jain et.al.	2407.19985	null
2024-07-28	Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models	Mohammed Al-Maamari et.al.	2407.19610	link
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-26	MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition	Chang Liu et.al.	2407.18616	link
2024-07-26	Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition	Hukai Huang et.al.	2407.18581	link
2024-07-25	How Lightweight Can A Vision Transformer Be	Jen Hong Tan et.al.	2407.17783	null
2024-07-24	Exploring Domain Robust Lightweight Reward Models based on Router Mechanism	Hyuk Namgoong et.al.	2407.17546	null
2024-07-24	M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis	Junyu Li et.al.	2407.17267	link
2024-07-25	Cheems: Wonderful Matrices More Efficient and More Effective Architecture	Jingze Shi et.al.	2407.16958	null
2024-07-22	Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget	Vikash Sehwag et.al.	2407.15811	link
2024-07-22	Norface: Improving Facial Expression Analysis by Identity Normalization	Hanwei Liu et.al.	2407.15617	link
2024-07-19	Mixture of Experts with Mixture of Precisions for Tuning Quality of Service	HamidReza Imani et.al.	2407.14417	null
2024-07-19	EVLM: An Efficient Vision-Language Model for Visual Understanding	Kaibing Chen et.al.	2407.14177	null
2024-07-19	Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models	Qiong Wu et.al.	2407.14093	null
2024-07-18	Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts	Francesco Folino et.al.	2407.13526	null
2024-07-18	Mixture of Experts based Multi-task Supervise Learning from Crowds	Tao Han et.al.	2407.13268	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-18	Qwen2 Technical Report	An Yang et.al.	2407.10671	link
2024-07-15	Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering	Francesco Di Sario et.al.	2407.10389	null
2024-07-13	Low-Rank Interconnected Adaptation Across Layers	Yibo Zhong et.al.	2407.09946	link
2024-07-13	MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts	Zhenpeng Su et.al.	2407.09816	link
2024-07-12	Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts	Zeliang Zhang et.al.	2407.09590	null
2024-07-11	An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio	Siding Zeng et.al.	2407.08239	null
2024-07-10	MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations	Vignesh Prasad et.al.	2407.07636	link
2024-07-10	Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation	Szymon Płotka et.al.	2407.07514	link
2024-07-09	A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts	Atilla Özgür et.al.	2407.06718	null
2024-07-06	SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation	Guoan Wang et.al.	2407.04938	null
2024-07-06	Completed Feature Disentanglement Learning for Multimodal MRIs Analysis	Tianling Liu et.al.	2407.04916	link
2024-07-05	YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation	Sungkyun Chang et.al.	2407.04822	link
2024-07-05	Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement	Yongji Wu et.al.	2407.04656	null
2024-07-05	MobileFlow: A Multimodal LLM For Mobile GUI Agent	Songqin Nong et.al.	2407.04346	null
2024-07-04	Mixture of A Million Experts	Xu Owen He et.al.	2407.04153	null
2024-07-02	Terminating Differentiable Tree Experts	Jonathan Thomm et.al.	2407.02060	null
2024-07-05	Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models	Zihan Wang et.al.	2407.01906	link
2024-07-01	Uncertainty Quantification in Table Structure Recognition	Kehinde Ajayi et.al.	2407.01731	link
2024-07-01	Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning	Yixiao Wang et.al.	2407.01531	null
2024-07-01	Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation	Nadezhda Chirkova et.al.	2407.01126	null
2024-07-01	Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs	Enshu Liu et.al.	2407.00945	link
2024-07-03	Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules	Xinglin Pan et.al.	2407.00599	link
2024-07-02	One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts	Ruochen Wang et.al.	2407.00256	null
2024-06-28	LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models	Renzhi Wang et.al.	2406.20030	null
2024-06-28	Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model	Longrong Yang et.al.	2406.19905	link
2024-06-28	SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR	Qiuming Zhao et.al.	2406.19706	link
2024-06-27	A Teacher Is Worth A Million Instructions	Nikhil Kothari et.al.	2406.19112	link
2024-06-27	Towards Personalized Federated Multi-scenario Multi-task Recommendation	Yue Ding et.al.	2406.18938	null
2024-06-26	Mixture of Experts in a Mixture of RL settings	Timon Willi et.al.	2406.18420	null
2024-06-26	A Closer Look into Mixture-of-Experts in Large Language Models	Ka Man Lo et.al.	2406.18219	link
2024-06-26	SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR	Shuaishuai Ye et.al.	2406.18021	null
2024-06-24	Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction	Bruce Rushing et.al.	2406.17150	link
2024-06-24	LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training	Tong Zhu et.al.	2406.16554	link
2024-06-25	OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser	Jingze Shi et.al.	2406.16495	link
2024-06-24	Theory on Mixture-of-Experts in Continual Learning	Hongbo Li et.al.	2406.16437	null
2024-06-22	SimSMoE: Solving Representational Collapse via Similarity Measure	Giang Do et.al.	2406.15883	null
2024-06-20	Voice Disorder Analysis: a Transformer-based Approach	Alkis Koudounas et.al.	2406.14693	link
2024-06-19	Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation	Qian Chen et.al.	2406.13583	null
2024-06-19	AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models	Zihao Zeng et.al.	2406.13233	link
2024-06-18	Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Haoxiang Wang et.al.	2406.12845	link
2024-06-18	P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts	Yuhao Dan et.al.	2406.12548	null
2024-06-18	Variational Distillation of Diffusion Policies into Mixture of Experts	Hongyi Zhou et.al.	2406.12538	null
2024-06-18	GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory	Haoze Wu et.al.	2406.12375	link
2024-06-17	Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding	Ukyo Honda et.al.	2406.12060	link
2024-06-17	DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	DeepSeek-AI et.al.	2406.11931	link
2024-06-17	Graph Knowledge Distillation to Mixture of Experts	Pavel Rumiantsev et.al.	2406.11919	link
2024-06-17	MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts	Guanjie Chen et.al.	2406.11353	link
2024-06-17	Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts	Tong Zhu et.al.	2406.11256	link
2024-06-14	Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	Anke Tang et.al.	2406.09770	link
2024-06-13	DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts	Joel Ong et.al.	2406.08742	link
2024-06-12	Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark	Pingzhi Li et.al.	2406.08155	link
2024-06-11	Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters	Yixin Song et.al.	2406.05955	null
2024-06-08	Flexible and Adaptable Summarization via Expertise Separation	Xiuying Chen et.al.	2406.05360	link
2024-06-07	MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter	Jitai Hao et.al.	2406.04984	link
2024-06-07	MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks	Xingkui Zhu et.al.	2406.04801	link
2024-06-05	Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Ahad Jawaid et.al.	2406.03637	null
2024-06-05	Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach	Haoyu Han et.al.	2406.03464	null
2024-06-05	Continual Traffic Forecasting via Mixture of Experts	Sanghyun Lee et.al.	2406.03140	null
2024-06-05	Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models	Raeid Saqur et.al.	2406.02969	null
2024-06-04	Parrot: Multilingual Visual Instruction Tuning	Hai-Long Sun et.al.	2406.02539	link
2024-06-04	Demystifying the Compression of Mixture-of-Experts Through a Unified Framework	Shwai He et.al.	2406.02500	link
2024-06-02	Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts – Physics Informed Neural Operator Forward Model	Clement Etienam et.al.	2406.00889	link
2024-06-01	A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers	Daniel Waxman et.al.	2406.00570	link
2024-06-01	Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks	Jiacheng Wang et.al.	2406.00408	null
2024-05-30	Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach	Reza Arabpour et.al.	2405.20094	null
2024-06-02	MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors	Renzhi Wang et.al.	2405.19086	null
2024-06-02	Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	Markus J. Buehler et.al.	2405.19076	link
2024-05-29	Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization	Shengcai Liu et.al.	2405.18884	link
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	Yuan 2.0-M32: Mixture of Experts with Attention Router	Shaohua Wu et.al.	2405.17976	link
2024-05-28	LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design	Rui Kong et.al.	2405.17741	null
2024-05-27	Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node	Andreas Charalampopoulos et.al.	2405.16836	link
2024-05-26	Mixture of Experts Using Tensor Products	Zhan Su et.al.	2405.16671	link
2024-05-30	A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts	Mohammed Nowaz Rabbani Chowdhury et.al.	2405.16646	null
2024-05-26	Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation	Rongyu Zhang et.al.	2405.16486	link
2024-05-25	MoEUT: Mixture-of-Experts Universal Transformers	Róbert Csordás et.al.	2405.16039	link
2024-05-23	Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	Xianzhi Du et.al.	2405.15052	link
2024-05-23	Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast	Chufan Shi et.al.	2405.14507	link
2024-05-23	Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models	Yongxin Guo et.al.	2405.14297	link
2024-05-23	Graph Sparsification via Mixture of Graphs	Guibin Zhang et.al.	2405.14260	link
2024-05-23	Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts	Huy Nguyen et.al.	2405.14131	null
2024-05-23	Mixture of Experts Meets Prompt-Based Continual Learning	Minh Le et.al.	2405.14124	link
2024-05-22	Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts	Huy Nguyen et.al.	2405.13997	null
2024-05-22	xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token	Xin Cheng et.al.	2405.13792	link
2024-05-24	MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models	Jingwei Xu et.al.	2405.13053	link
2024-05-21	Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts	Ruichen Zhang et.al.	2405.12472	null
2024-05-21	Ensemble and Mixture-of-Experts DeepONets For Operator Learning	Ramansh Sharma et.al.	2405.11907	link
2024-05-19	Learning More Generalized Experts by Merging Experts in Mixture-of-Experts	Sejik Park et.al.	2405.11530	null
2024-05-18	Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts	Yunxin Li et.al.	2405.11273	link
2024-05-16	Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts	Ruolin Su et.al.	2405.09744	null
2024-05-15	M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts	Yufeng Jiang et.al.	2405.09446	link
2024-05-13	Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition	Zhiyong Yang et.al.	2405.07780	link
2024-05-07	SUTRA: Scalable Multilingual Language Model Architecture	Abhijit Bendale et.al.	2405.06694	null
2024-05-09	A Mixture of Experts Approach to 3D Human Motion Prediction	Edmund Shieh et.al.	2405.06088	link
2024-05-09	A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds	Christopher Z. Cui et.al.	2405.06059	null
2024-05-09	EWMoE: An effective model for global weather forecasting with mixture-of-experts	Lihao Gan et.al.	2405.06004	link
2024-05-09	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Jiachen Li et.al.	2405.05949	link
2024-05-16	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434	link
2024-05-07	Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts	Changyuan Zhao et.al.	2405.04198	null
2024-05-06	Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training	Zexuan Zhong et.al.	2405.03133	null
2024-05-06	WDMoE: Wireless Distributed Large Language Models with Mixture of Experts	Nan Xue et.al.	2405.03131	null
2024-05-31	Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	Xudong Lu et.al.	2402.14800	null
2024-10-29	GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts	Shirley Wu et.al.	2312.04693	null
2021-05-25	Tensor-variate Mixture of Experts for Proportional Myographic Control of a Robotic Hand	Noémie Jaquier et.al.	1902.11104	null
2018-06-22	Mixtures of Experts Models	Isobel Claire Gormley et.al.	1806.08200	null