
Updated on 2025.10.22

Publish Date Title Authors PDF Code
2025-10-21 SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices Pan Zhou et.al. 2510.18544 null
2025-10-19 Justitia: Fair and Efficient Scheduling for LLM Applications Mingyan Yang et.al. 2510.17015 null
2025-10-18 FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference Jian Ma et.al. 2510.16418 null
2025-10-16 AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization Mengtao Lv et.al. 2510.16045 null
2025-10-16 Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing Tianhua Xia et.al. 2510.16040 null
2025-10-17 TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs Sibo Xiao et.al. 2510.15545 null
2025-10-16 Tail-Optimized Caching for LLM Inference Wenxin Zhang et.al. 2510.15152 null
2025-10-16 xLLM Technical Report Tongxuan Liu et.al. 2510.14686 null
2025-10-16 MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving Jungi Lee et.al. 2510.14557 null
2025-10-16 FairBatching: Fairness-Aware Batch Formation for LLM Inference Hongtao Lyu et.al. 2510.14392 null
2025-10-16 Qwen3Guard Technical Report Haiquan Zhao et.al. 2510.14276 null
2025-10-15 Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management Thanh Son Phung et.al. 2510.14024 null
2025-10-15 Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference Zhibin Wang et.al. 2510.13668 null
2025-10-15 F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs Jude Haris et.al. 2510.13401 null
2025-10-15 Taming the Fragility of KV Cache Eviction in LLM Inference Yuan Feng et.al. 2510.13334 null
2025-10-15 Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference Nikhil Bhendawade et.al. 2510.13161 null
2025-10-14 Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? Cedric Richter et.al. 2510.12702 null
2025-10-14 Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models Donghwan Rho et.al. 2510.12343 null
2025-10-13 Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding Bingjie Zhu et.al. 2510.11331 null
2025-10-13 Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs João Paulo Cardoso de Lima et.al. 2510.11192 null
2025-10-11 CacheClip: Accelerating RAG with Effective KV Cache Reuse Bin Yang et.al. 2510.10129 null
2025-10-10 FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference Yu-Chen Lu et.al. 2510.09332 null
2025-10-10 Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion Ruitong Liu et.al. 2510.08966 null
2025-10-13 Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors Xin Liu et.al. 2510.08907 null
2025-10-09 SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference Hengrui Zhang et.al. 2510.08544 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models Zhiqing Cui et.al. 2510.07858 null
2025-10-09 OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference Yuzhe Gu et.al. 2510.07651 null
2025-10-08 Accelerating Diffusion LLM Inference via Local Determinism Propagation Fanheng Kong et.al. 2510.07081 null
2025-10-08 Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon Baraq Lipshitz et.al. 2510.06957 null
2025-10-07 VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization Dingyu Yao et.al. 2510.06175 null
2025-10-07 lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models Haoxin Wang et.al. 2510.06126 null
2025-10-07 From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs Tianhao Zhu et.al. 2510.05632 null
2025-10-06 KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction Utkarsh Saxena et.al. 2510.05373 null
2025-10-06 A novel hallucination classification framework Maksym Zavhorodnii et.al. 2510.05189 null
2025-10-06 RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms Samah Kansab et.al. 2510.04796 null
2025-10-05 Speculative Actions: A Lossless Framework for Faster Agentic Systems Naimeng Ye et.al. 2510.04371 null
2025-10-03 Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling Qiwei Di et.al. 2510.03199 null
2025-10-03 Dissecting Transformers: A CLEAR Perspective towards Green AI Hemang Jain et.al. 2510.02810 null
2025-10-03 HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference Shubham Negi et.al. 2510.02675 null
2025-10-01 PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference Hongbo Liu et.al. 2510.02395 null
2025-10-03 Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey Qiyuan Liu et.al. 2510.01925 null
2025-10-02 SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning Shicheng Liu et.al. 2510.01832 null
2025-10-01 HiSpec: Hierarchical Speculative Decoding for LLMs Avinash Kumar et.al. 2510.01336 null
2025-10-01 Generalized Parallel Scaling with Interdependent Generations Harry Dong et.al. 2510.01143 null
2025-10-01 AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size Guanxi Lu et.al. 2509.26432 null
2025-09-30 Parallax: Efficient LLM Inference Service over Decentralized Environment Chris Tong et.al. 2509.26182 null
2025-09-30 Accelerating LLM Inference with Precomputed Query Storage Jay H. Park et.al. 2509.25919 null
2025-09-30 SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV Jingyao Zhang et.al. 2509.25853 null
2025-09-29 SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching Xinye Zhao et.al. 2509.24832 null
2025-09-29 Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding Sungkyun Kim et.al. 2509.24328 null
2025-09-29 VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference Ke Wang et.al. 2509.24257 null
2025-09-28 Collaborative Device-Cloud LLM Inference through Reinforcement Learning Wenzhi Fang et.al. 2509.24050 null
2025-10-01 A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving Yue Zhang et.al. 2509.23384 null
2025-09-27 Scaling LLM Test-Time Compute with Mobile NPU on Smartphones Zixu Hao et.al. 2509.23324 null
2025-09-27 Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Vage Egiazarian et.al. 2509.23202 null
2025-09-26 Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs Shirin Alanova et.al. 2509.22166 null
2025-09-26 Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding Shijing Hu et.al. 2509.22134 null
2025-09-26 SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation Haotian Tan et.al. 2509.21932 null
2025-09-25 Preemptive Detection and Steering of LLM Misalignment via Latent Reachability Sathwik Karnik et.al. 2509.21528 null
2025-09-25 Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks Murat Arda Onsu et.al. 2509.21259 null
2025-09-24 FastEagle: Cascaded Drafting for Accelerating Speculative Decoding Haiduo Huang et.al. 2509.20416 null
2025-09-24 Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment Deokjae Lee et.al. 2509.20214 null
2025-09-24 Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference Haoyu Chen et.al. 2509.19729 null
2025-09-23 Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs Marcin Chrapek et.al. 2509.18886 null
2025-09-22 Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models Dingxin Lu et.al. 2509.18221 null
2025-09-28 Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs Xing Chen et.al. 2509.17542 null
2025-09-22 Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill Yunzhao Liu et.al. 2509.17357 null
2025-09-22 Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortion Detection Jun Seo Kim et.al. 2509.17292 null
2025-09-21 MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference Zheming Yang et.al. 2509.16995 null
2025-09-20 Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads Mert Hidayetoglu et.al. 2509.16495 null
2025-09-19 LightCode: Compiling LLM Inference for Photonic-Electronic Systems Ryan Tomich et.al. 2509.16443 null
2025-09-19 LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference Hantao Yang et.al. 2509.15515 null
2025-09-18 A1: Asynchronous Test-Time Scaling via Conformal Prediction Jing Xiong et.al. 2509.15148 null
2025-09-18 LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism Yimin Wang et.al. 2509.14781 null
2025-09-18 LLM Jailbreak Detection for (Almost) Free! Guorui Chen et.al. 2509.14558 null
2025-09-17 TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge Zhirui Huang et.al. 2509.13765 null
2025-09-16 Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management Thanh Son Phung et.al. 2509.13201 null
2025-09-16 HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference Cenlin Duan et.al. 2509.12993 null
2025-09-15 Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference Synthia Wang et.al. 2509.12152 null
2025-09-14 Framing AI System Benchmarking as a Learning Task: FlexBench and the Open MLPerf Dataset Grigori Fursin et.al. 2509.11413 null
2025-09-14 PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits Loka Li et.al. 2509.11362 null
2025-09-14 AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs Santhosh G S et.al. 2509.11155 null
2025-09-12 MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness Huizheng Wang et.al. 2509.10372 null
2025-09-11 LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation Yiqun Shen et.al. 2509.09754 null
2025-09-11 Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference Haoran Wu et.al. 2509.09505 null
2025-07-23 BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving Wanyi Zheng et.al. 2507.17120 null
2025-07-22 Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework Hongyi Tang et.al. 2507.16414 null
2025-07-21 Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing Shibo Yu et.al. 2507.15553 null
2025-07-18 Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need Michael Davies et.al. 2507.14397 null
2025-07-18 Can LLMs Infer Personality from Real World Conversations? Jianfeng Zhu et.al. 2507.14355 null
2025-07-23 Photonic Fabric Platform for AI Accelerators Jing Ding et.al. 2507.14000 null
2025-07-18 LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Haoyang Li et.al. 2507.13681 null
2025-07-16 Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage Junqing Lin et.al. 2507.12205 null
2025-07-15 MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving Ruihao Li et.al. 2507.11507 null
2025-07-15 Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations Miray Özcan et.al. 2507.11417 null
2025-07-14 Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference Jiaming Cheng et.al. 2507.09942 null
2025-07-12 SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding Weihong Xu et.al. 2507.09201 null
2025-07-11 On Evaluating Performance of LLM Inference Serving Systems Amey Agrawal et.al. 2507.09019 null
2025-07-11 Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference Chun-Ting Chen et.al. 2507.09010 null
2025-07-11 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-10 Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions Quanyan Zhu et.al. 2507.08208 null
2025-07-10 Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing Junyi Wen et.al. 2507.08045 null
2025-07-15 Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models Varin Sikka et.al. 2507.07505 null
2025-07-11 QUEST: Query Optimization in Unstructured Document Analysis Zhaoze Sun et.al. 2507.06515 null
2025-07-08 Voltage Regulation in Distribution Systems with Data Center Loads Yize Chen et.al. 2507.06416 null
2025-07-07 Cascade: Token-Sharded Private LLM Inference Rahul Thomas et.al. 2507.05228 null
2025-07-07 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Yun Qu et.al. 2507.04632 null
2025-07-05 Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States Karine Karine et.al. 2507.03871 null
2025-07-05 OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference Seungjun Shin et.al. 2507.03865 null
2025-07-04 Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA Jindong Li et.al. 2507.03308 null
2025-07-03 HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference Weishu Deng et.al. 2507.03153 null
2025-07-03 On the Convergence of Large Language Model Optimizer for Black-Box Network Management Hoon Lee et.al. 2507.02689 null
2025-07-03 Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure Rui Xie et.al. 2507.02654 null
2025-07-03 FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference Xing Liu et.al. 2507.02620 null
2025-07-02 Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency Zongpu Zhang et.al. 2507.02135 null
2025-07-02 LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation Tianyu Liu et.al. 2507.01449 null
2025-07-02 SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech Cheng Zhuangfei et.al. 2507.01348 null
2025-07-02 La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation Kai Liu et.al. 2507.01299 null
2025-07-01 VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator Zhican Wang et.al. 2507.00797 null
2025-07-01 Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Yilun Zhang et.al. 2507.00653 null
2025-07-01 LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference Chuhao Xu et.al. 2507.00507 null
2025-07-01 Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs Mohammad Firas Sada et.al. 2507.00418 null
2025-06-30 Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission Faranaksadat Solat et.al. 2507.00082 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need Junaid Ahmed Khan et.al. 2506.22267 null
2025-06-27 SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference Yongchao He et.al. 2506.22033 null
2025-06-27 A Survey of LLM Inference Systems James Pan et.al. 2506.21901 null
2025-06-17 Utility-Driven Speculative Decoding for Mixture-of-Experts Anish Saxena et.al. 2506.20675 null
2025-07-02 Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU He Sun et.al. 2506.20187 null
2025-06-24 MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection Zhengxiang Huang et.al. 2506.19884 null
2025-06-23 Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation Ahmadreza Saboor Yaraghi et.al. 2506.19045 null
2025-06-23 WiLLM: An Open Wireless LLM Communication System Boyi Liu et.al. 2506.19030 null
2025-06-23 CommVQ: Commutative Vector Quantization for KV Cache Compression Junyan Li et.al. 2506.18879 null
2025-06-22 Mechanistic Interpretability in the Presence of Architectural Obfuscation Marcos Florencio et.al. 2506.18053 null
2025-06-20 Towards AI Search Paradigm Yuchen Li et.al. 2506.17188 null
2025-06-17 CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision Dyah Adila et.al. 2506.14912 null
2025-06-16 Vector Ontologies as an LLM world view extraction method Kaspar Rothenfusser et.al. 2506.13252 link
2025-06-13 Semantic Scheduling for LLM Inference Wenyue Hua et.al. 2506.12204 link
2025-06-13 GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news Abdul Haque et.al. 2506.11600 null
2025-06-13 Collaborative LLM Inference via Planning for Efficient Reasoning Byeongchan Lee et.al. 2506.11578 null
2025-06-13 Efficient Long-Context LLM Inference via KV Cache Clustering Jie Hu et.al. 2506.11418 null
2025-06-12 TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference Hongbin Zhang et.al. 2506.10470 null
2025-06-11 A First Look at Bugs in LLM Inference Engines Mugeng Liu et.al. 2506.09713 link
2025-06-12 Understanding the Performance and Power of LLM Inferencing on Edge Accelerators Mayank Arya et.al. 2506.09554 null
2025-06-11 Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Jiayi Yuan et.al. 2506.09501 null
2025-06-10 Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ Chihiro Taguchi et.al. 2506.08479 null
2025-06-10 Draft-based Approximate Inference for LLMs Kevin Galim et.al. 2506.08373 link
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-07 Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation Miryeong Kwon et.al. 2506.06769 null
2025-06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Adarsh Prasad Behera et.al. 2506.06579 null
2025-06-04 On the Fundamental Impossibility of Hallucination Control in Large Language Models Michał P. Karpowicz et.al. 2506.06382 null
2025-06-04 SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling Anhao Zhao et.al. 2506.04179 null
2025-06-04 Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation Junyi Chen et.al. 2506.03887 null
2025-06-04 Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis Avihay Cohen et.al. 2506.03656 null
2025-06-04 POSS: Position Specialist Generates Better Draft for Speculative Decoding Langlin Huang et.al. 2506.03566 link
2025-06-07 Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs Jiakun Fan et.al. 2506.03296 null
2025-06-03 Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs Shangmin Guo et.al. 2506.02918 null
2025-06-03 HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference Ping Gong et.al. 2506.02572 link
2025-06-02 Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts Spencer Banasik et.al. 2506.01827 null
2025-05-30 Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching Juan Wisznia et.al. 2505.24643 null
2025-05-30 LLM Inference Enhanced by External Knowledge: A Survey Yu-Hsuan Lin et.al. 2505.24377 link
2025-05-30 SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference Tian Xia et.al. 2505.24095 null
2025-05-29 Large Language Model Meets Constraint Propagation Alexandre Bonlarron et.al. 2505.24012 null
2025-05-29 Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism Jinhui Wei et.al. 2505.23219 null
2025-05-29 SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference Yinghao Tang et.al. 2505.23022 null
2025-05-28 Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Donghyeon Joo et.al. 2505.22913 link
2025-05-28 Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference Yue Zhu et.al. 2505.21919 null
2025-05-28 HoliTom: Holistic Token Merging for Fast Video Large Language Models Kele Shao et.al. 2505.21334 link
2025-05-28 FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration Daehyeon Baek et.al. 2505.20839 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-26 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation Gerasimos Gerogiannis et.al. 2505.19349 null
2025-05-27 A Survey of LLM $\times$ DATA Xuanhe Zhou et.al. 2505.18458 link
2025-05-23 An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs Rahul Thomas et.al. 2505.18332 null
2025-05-23 NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache Donghyun Son et.al. 2505.18231 null
2025-05-23 Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Michael Hassid et.al. 2505.17813 null
2025-05-23 DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies Ning Yang et.al. 2505.17420 null
2025-05-22 RAP: Runtime-Adaptive Pruning for LLM Inference Huanrong Liu et.al. 2505.17138 null
2025-05-22 CASTILLO: Characterizing Response Length Distributions of Large Language Models Daniel F. Perez-Ramirez et.al. 2505.16881 link
2025-05-22 Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization Vera Neplenbroek et.al. 2505.16467 link
2025-05-22 QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Benjamin Schneider et.al. 2505.16175 link
2025-05-22 KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization Mingbo Song et.al. 2505.16162 null
2025-05-20 Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity Susav Shrestha et.al. 2505.14884 link
2025-05-20 ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions Bufang Yang et.al. 2505.14668 null
2025-05-20 ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs Yifan Sui et.al. 2505.14468 null
2025-05-16 An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents Ayesha Amjad et.al. 2505.13504 null
2025-05-19 HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding Siran Liu et.al. 2505.13254 null
2025-05-19 FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference Guangda Liu et.al. 2505.13109 null
2025-05-19 FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks Zihua Wang et.al. 2505.12728 link
2025-05-17 Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning Yuheng Lu et.al. 2505.11922 null
2025-05-17 Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture Yu Wu et.al. 2505.11916 null
2025-05-16 TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference Raja Gond et.al. 2505.11329 link
2025-05-16 Vaiage: A Multi-Agent Solution to Personalized Travel Planning Binwen Liu et.al. 2505.10922 null
2025-05-19 SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices Xiangwen Zhuge et.al. 2505.10259 link
2025-05-15 ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production Yuxing Xiang et.al. 2505.09999 link
2025-05-15 How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference Nidhal Jegham et.al. 2505.09598 null
2025-05-14 Statistical Modeling and Uncertainty Estimation of LLM Inference Systems Kaustabha Ray et.al. 2505.09319 null
2025-05-14 ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor Seungbeom Choi et.al. 2505.09142 null
2025-05-13 LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries Zekun Wu et.al. 2505.08842 null
2025-05-13 Automatic Task Detection and Heterogeneous LLM Speculative Decoding Danying Ge et.al. 2505.08600 null
2025-05-08 Scaling Laws for Speculative Decoding Siyuan Yan et.al. 2505.07858 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-12 Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity Guang Yan et.al. 2505.07239 null
2025-05-12 PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications Kuntai Du et.al. 2505.07203 null
2025-05-14 I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference Zibo Gao et.al. 2505.06738 null
2025-05-09 Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference Haolin Zhang et.al. 2505.06461 null
2025-05-09 Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM Zehao Fan et.al. 2505.05772 null
2025-05-08 HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow You Peng et.al. 2505.05286 link
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-05 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Yaoqi Chen et.al. 2505.02922 null
2025-05-03 High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers Brian Wong et.al. 2505.01693 null
2025-05-08 A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Sihyeong Park et.al. 2505.01658 link
2025-05-02 PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding Bradley McDanel et.al. 2505.01572 null
2025-04-28 AutoJudge: Judge Decoding Without Manual Annotation Roman Garipov et.al. 2504.20039 null
2025-04-28 Taming the Titans: A Survey of Efficient LLM Inference Serving Ranran Zhen et.al. 2504.19720 link
2025-04-28 R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference Zhenyu Zhang et.al. 2504.19449 null
2025-05-07 A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification Junichiro Niimi et.al. 2504.18884 link
2025-04-29 PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation Zihao An et.al. 2504.18583 null
2025-04-25 PropRAG: Guiding Retrieval with Beam Search over Proposition Paths Jingjin Wang et.al. 2504.18070 null
2025-04-24 L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference Qingyuan Liu et.al. 2504.17584 null
2025-04-24 On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration Maoyang Xiang et.al. 2504.17376 null
2025-04-18 HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing Myunghyun Rhee et.al. 2504.16112 null
2025-04-22 Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model Junwei Hu et.al. 2504.15989 null
2025-04-23 KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments Junyoung Park et.al. 2504.15364 null
2025-04-18 High-Throughput LLM inference on Heterogeneous Clusters Yi Xiong et.al. 2504.15303 null
2025-04-21 Hardware-based Heterogeneous Memory Management for Large Language Model Inference Soojin Hwang et.al. 2504.14893 null
2025-04-19 Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator Akshat Ramachandran et.al. 2504.14365 null
2025-04-19 FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference Coleman Hooper et.al. 2504.14152 null
2025-04-16 Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading Kihyun Kim et.al. 2504.11816 link
2025-04-16 Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs Hyungwoo Lee et.al. 2504.11765 null
2025-04-16 Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures Prabhu Vellaisamy et.al. 2504.11750 null
2025-04-15 Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints Ruicheng Ao et.al. 2504.11320 link
2025-04-14 HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar et.al. 2504.10724 null
2025-04-14 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Yangshen Deng et.al. 2504.10326 null
2025-04-14 KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference Yuxuan Tian et.al. 2504.09936 null
2025-04-16 Understanding and Optimizing Multi-Stage AI Inference Pipelines Abhimanyu Rajeshkumar Bambhaniya et.al. 2504.09775 null
2025-04-13 LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference Jianing Zheng et.al. 2504.09561 link
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-04-11 SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting Jiaming Xu et.al. 2504.08850 null
2025-04-10 SD$^2$: Self-Distilled Sparse Drafters Mike Lasby et.al. 2504.08838 null
2025-04-11 Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash Fucheng Jia et.al. 2504.08378 null
2025-04-11 Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices Shengyuan Ye et.al. 2504.08242 null
2025-04-10 Token Level Routing Inference System for Edge Devices Jianshu She et.al. 2504.07878 null
2025-04-10 Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving Shihong Gao et.al. 2504.07494 link
2025-04-10 UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference Weikai Xu et.al. 2504.07479 null
2025-04-10 Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents Yueying Li et.al. 2504.07347 null
2025-04-08 SPIRe: Boosting LLM Inference Throughput with Speculative Decoding Sanjit Neelam et.al. 2504.06419 null
2025-04-08 Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching Yanhao Dong et.al. 2504.06319 null
2025-04-09 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 link
2025-04-11 User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems Jianling Wang et.al. 2504.05522 null
2025-04-07 Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness Dongzhuoran Zhou et.al. 2504.05163 null
2025-04-04 Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency Erik Johannes Husom et.al. 2504.03360 null
2025-04-04 Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation Weitao Li et.al. 2504.03165 link
2025-04-03 Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search Parsa Ghaffari et.al. 2504.02426 link
2025-04-01 SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching Yuxuan Zhu et.al. 2504.00970 null
2025-04-03 Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding Aayush Gautam et.al. 2504.00030 null
2025-04-06 ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance Tong Xie et.al. 2503.24053 link
2025-03-31 MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration Tatsuya Kubo et.al. 2503.23817 null
2025-03-30 Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference Wei Tao et.al. 2503.23294 null
2025-03-28 Niyama: Breaking the Silos of LLM Inference Serving Kanishk Goel et.al. 2503.22562 null
2025-03-25 LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Han Chen et.al. 2503.19950 link
2025-03-24 xKV: Cross-Layer SVD for KV-Cache Compression Chi-Chih Chang et.al. 2503.18893 link
2025-03-27 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-03-24 Jenga: Effective Memory Management for Serving LLM with Heterogeneity Chen Zhang et.al. 2503.18292 null
2025-03-27 WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference Youhui Zuo et.al. 2503.17922 link
2025-03-22 PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling Chongpeng Liu et.al. 2503.17707 null
2025-03-21 V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms Javier J. Poveda Rodrigo et.al. 2503.17422 null
2025-03-21 Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation Jingzhi Fang et.al. 2503.16893 null
2025-03-20 SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models Fahao Chen et.al. 2503.15921 null
2025-03-19 Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study Jomar Thomas Almonte et.al. 2503.15248 null
2025-03-19 Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks Kai Zhang et.al. 2503.14882 null
2025-03-18 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play Wei Fang et.al. 2503.14432 null
2025-03-17 Mitigating KV Cache Competition to Enhance User Experience in LLM Inference Haiying Shen et.al. 2503.13773 null
2025-03-17 AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications Haiying Shen et.al. 2503.13737 null
2025-03-17 ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts Evangelos Georganas et.al. 2503.13565 null
2025-03-14 Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce Jingying Zeng et.al. 2503.13518 null
2025-03-17 xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Maximilian Beck et.al. 2503.13427 link
2025-03-17 VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding Zeng Wang et.al. 2503.13116 null
2025-03-15 TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation Mayank Kumar et.al. 2503.12217 null
2025-03-09 Green Prompting Marta Adamska et.al. 2503.10666 null
2025-03-13 Collaborative Speculative Inference for Efficient LLM Inference Serving Luyao Gao et.al. 2503.10325 null
2025-03-12 Prompt Inference Attack on Distributed Large Language Model Inference Frameworks Xinjian Luo et.al. 2503.09291 null
2025-03-11 TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems Feiyang Wu et.al. 2503.08415 link
2025-03-11 Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference Pol G. Recasens et.al. 2503.08311 null
2025-03-09 Seesaw: High-throughput LLM Inference via Model Re-sharding Qidong Su et.al. 2503.06433 null
2025-03-07 Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching Bowen Pang et.al. 2503.05248 link
2025-03-07 SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding Kaiyu Huang et.al. 2503.05096 null
2025-03-15 Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking Yijie Xu et.al. 2503.04636 null
2025-03-06 AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services Xiaoqi Wang et.al. 2503.04418 null
2025-03-06 Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Kou Misaki et.al. 2503.04412 null
2025-03-06 Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets Yiwen Dong et.al. 2503.04076 null
2025-03-04 FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference Hongchao Du et.al. 2503.03777 null
2025-03-05 MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems Rui Ye et.al. 2503.03686 null
2025-03-04 VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference Zihan Liu et.al. 2503.02236 null
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-03-03 SAGE: A Framework of Precise Retrieval for RAG Jintao Zhang et.al. 2503.01713 null
2025-03-03 DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems Minoo Hosseinzadeh et.al. 2503.01704 null
2025-03-01 Tutorial Proposal: Speculative Decoding for Efficient LLM Inference Heming Xia et.al. 2503.00491 null
2025-02-28 FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference Xunhao Lai et.al. 2502.20766 link
2025-02-28 SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models Han-Byul Kim et.al. 2502.20727 null
2025-02-27 ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving Kai Mei et.al. 2502.20576 link
2025-02-26 Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs Yiheng Yang et.al. 2502.19078 null
2025-02-24 LLM Inference Acceleration via Efficient Operation Fusion Mahsa Salmani et.al. 2502.17728 null
2025-02-24 CodeSwift: Accelerating LLM Inference for Efficient Code Generation Qianhui Zhao et.al. 2502.17139 null
2025-02-24 Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM Lian Liu et.al. 2502.16963 null
2025-02-24 DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance Xuanfan Ni et.al. 2502.16886 null
2025-03-01 CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter Yepeng Weng et.al. 2502.16880 null
2025-02-23 DISC: Dynamic Decomposition Improves LLM Inference Scaling Jonathan Light et.al. 2502.16706 null
2025-02-23 TerEffic: Highly Efficient Ternary LLM Inference on FPGA Chenyang Yin et.al. 2502.16473 null
2025-02-21 KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse Jingbo Yang et.al. 2502.16002 link
2025-02-21 Towards Swift Serverless LLM Cold Starts with ParaServe Chiheng Lou et.al. 2502.15524 null
2025-02-24 HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings Rasmus Aavang et.al. 2502.15411 link
2025-02-24 Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference Yaohua Tang et.al. 2502.15294 null
2025-02-21 A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation Shilong Hou et.al. 2502.15233 link
2025-02-19 EvoP: Robust LLM Inference via Evolutionary Pruning Shangyu Wu et.al. 2502.14910 null
2025-02-20 Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale Shashwat Jaiswal et.al. 2502.14617 null
2025-02-20 SR-LLM: Rethinking the Structured Representation in Large Language Model Jiahuan Zhang et.al. 2502.14352 null
2025-02-19 RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression Payman Behnam et.al. 2502.14051 null
2025-02-19 Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference Qingfa Xiao et.al. 2502.13542 null
2025-02-19 What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis Peiran Wang et.al. 2502.13490 null
2025-02-18 BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference Ahmed Burak Gulhan et.al. 2502.13176 null
2025-02-18 R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs Sumin Jo et.al. 2502.12767 link
2025-02-18 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Cheng Luo et.al. 2502.12574 link
2025-02-18 Distributed On-Device LLM Inference With Over-the-Air Computation Kai Zhang et.al. 2502.12559 null
2025-02-18 SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs Ahmed F. AbouElhamayed et.al. 2502.12444 link
2025-02-17 Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu et.al. 2502.12216 null
2025-02-17 Designing Role Vectors to Improve LLM Inference Behaviour Daniele Potertì et.al. 2502.12055 null
2025-02-17 DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services Ting Sun et.al. 2502.11417 null
2025-02-17 Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment Ben Dong et.al. 2502.11347 null
2025-02-16 Diversified Sampling Improves Scaling LLM inference Tianchun Wang et.al. 2502.11027 null
2025-02-16 Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings Liangqi Yuan et.al. 2502.11007 link
2025-02-15 Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA Jindong Li et.al. 2502.10659 null
2025-02-14 λScale: Enabling Fast Scaling for Serverless Large Language Model Inference Minchen Yu et.al. 2502.09922 null
2025-02-14 INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing Hongsun Jang et.al. 2502.09921 null
2025-02-13 On multi-token prediction for efficient LLM inference Somesh Mehra et.al. 2502.09419 null
2025-02-13 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Heejun Lee et.al. 2502.08910 null
2025-02-12 Universal Model Routing for Efficient LLM Inference Wittawat Jitkrittum et.al. 2502.08773 null
2025-02-12 Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences Shanshan Han et.al. 2502.08142 null
2025-02-11 HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment Youhe Jiang et.al. 2502.07903 null
2025-02-11 SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters Yiping Wang et.al. 2502.07832 null
2025-02-11 PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference Yufeng Gu et.al. 2502.07578 link
2025-02-13 Online Scheduling for LLM Inference with KV Cache Constraints Patrick Jaillet et.al. 2502.07115 null
2025-02-08 Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models Soham Poddar et.al. 2502.05610 null
2025-02-08 Mechanistic Interpretability of Emotion Inference in Large Language Models Ala N. Tak et.al. 2502.05489 null
2025-02-07 BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference Reena Elangovan et.al. 2502.05376 null
2025-02-07 LLM Query Scheduling with Prefix Reuse and Latency Constraints Gregory Dexter et.al. 2502.04677 null
2025-02-06 WaferLLM: A Wafer-Scale LLM Inference System Congjie He et.al. 2502.04563 null
2025-02-06 KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li et.al. 2502.04420 link
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-06 AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference Qingyue Yang et.al. 2502.04077 link
2025-02-06 Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective Yuan Feng et.al. 2502.03805 link
2025-02-06 Adaptive Semantic Prompt Caching with VectorQ Luis Gaspar Schroeder et.al. 2502.03771 null
2025-02-05 HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference Zeyu Zhang et.al. 2502.03589 null
2025-02-05 Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL Wenbo Sun et.al. 2502.02818 null
2025-02-05 Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation Jingyu Liu et.al. 2502.02789 link
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-01-30 Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency Sazzad Hossain et.al. 2502.01651 null
2025-02-06 An Investigation of FP8 Across Accelerators for LLM Inference Jiwoo Kim et.al. 2502.01070 null
2025-02-02 Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference Patrick Yubeaton et.al. 2502.00922 null
2025-02-02 SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models Jiawen Zhang et.al. 2502.00847 null
2025-02-01 UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs Yizhe Xiong et.al. 2502.00439 null
2025-02-01 ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Xiang Liu et.al. 2502.00299 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-02-02 RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations Zunhai Su et.al. 2501.16383 link
2025-01-27 Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs Antony Bartlett et.al. 2501.16191 null
2025-01-27 TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference Jack Min Ong et.al. 2501.16007 null
2025-01-27 Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference Tharindu B. Hewage et.al. 2501.15829 link
2025-01-25 Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads Xingyang He et.al. 2501.15113 null
2025-01-24 Locality-aware Fair Scheduling in LLM Serving Shiyi Cao et.al. 2501.14312 null
2025-01-20 Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference Pouya Hamadanian et.al. 2501.11779 link
2025-01-20 Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas Nishant Balepur et.al. 2501.11549 link
2025-01-19 GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation Shashikant Ilager et.al. 2501.11006 link
2025-01-17 A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks Xinzhe Li et.al. 2501.10069 link
2025-01-16 Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition Takaaki Hori et.al. 2501.09258 null
2025-01-15 Guiding Retrieval using LLM-based Listwise Rankers Mandeep Rathee et.al. 2501.09186 link
2025-01-14 Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings Paul Joe Maliakel et.al. 2501.08219 null
2025-01-14 PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving Ahmet Caner Yüzügüler et.al. 2501.08192 null
2025-01-14 Hierarchical Autoscaling for Large Language Model Serving with Chiron Archit Patke et.al. 2501.08090 null
2025-01-12 MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference Wenxuan Zeng et.al. 2501.06807 null
2025-01-05 TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms Jovan Stojkovic et.al. 2501.02600 null
2025-01-04 AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference Zhuomin He et.al. 2501.02336 link
2025-01-03 Efficient LLM Inference with Activation Checkpointing and Hybrid Caching Sanghyeon Lee et.al. 2501.01792 null
2025-01-03 BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference Wonsuk Jang et.al. 2501.01144 link
2025-01-02 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye et.al. 2501.01005 link
2024-12-23 Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs Dibakar Gope et.al. 2501.00032 link
2024-12-29 TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication Zongwu Wang et.al. 2412.20501 link
2024-12-28 LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System Hyucksung Kwon et.al. 2412.20166 null
2024-12-19 GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors Chengming Zhang et.al. 2412.19829 null
2025-01-02 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li et.al. 2412.19442 link
2024-12-27 An Engorgio Prompt Makes Large Language Model Babble on Jianshuo Dong et.al. 2412.19394 link
2024-12-25 Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference Libo Zhang et.al. 2412.18934 null
2024-12-21 SYMPHONY: Improving Memory Management for LLM Inference Workloads Saurabh Agarwal et.al. 2412.16434 null
2024-12-20 WebLLM: A High-Performance In-Browser LLM Inference Engine Charlie F. Ruan et.al. 2412.15803 link
2024-12-18 A Survey on LLM Inference-Time Self-Improvement Xiangjue Dong et.al. 2412.14352 link
2024-12-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-17 A System for Microserving of LLMs Hongyi Jin et.al. 2412.12488 null
2024-12-16 CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation Hongxuan Zhang et.al. 2412.11741 null
2024-12-15 Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning Yun Qu et.al. 2412.11120 link
2024-12-15 NITRO: LLM Inference on Intel Laptop NPUs Anthony Fei et.al. 2412.11053 link
2024-12-13 SCBench: A KV Cache-Centric Analysis of Long-Context Methods Yucheng Li et.al. 2412.10319 null
2024-12-17 TurboAttention: Efficient Attention Approximation For High Throughputs LLMs Hao Kang et.al. 2412.08585 null
2024-12-11 Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths Naryeong Kim et.al. 2412.08281 null
2024-12-12 TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch Xingchen Song et.al. 2412.08237 null
2024-12-09 Asynchronous LLM Function Calling In Gim et.al. 2412.07017 null
2024-12-09 SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs James Vo et.al. 2412.06198 null
2024-12-08 XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference Weizhuo Li et.al. 2412.05896 null
2024-12-06 GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments Yanyu Chen et.al. 2412.04788 null
2024-12-03 Multi-Bin Batching for Increasing LLM Inference Throughput Ozgur Guldogan et.al. 2412.04504 null
2024-11-29 BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Zhen Zheng et.al. 2412.03594 null
2024-12-03 Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Da Ma et.al. 2412.02252 null
2024-12-02 PLD+: Accelerating LLM inference by leveraging Language Model Artifacts Shwetha Somasundaram et.al. 2412.01447 null
2024-12-02 Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking Marco Federici et.al. 2412.01380 null
2024-12-05 RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy Geonho Lee et.al. 2412.01129 link
2024-12-02 TruncFormer: Private LLM Inference Using Only Truncations Patrick Yubeaton et.al. 2412.01042 null
2024-11-29 A dynamic parallel method for performance optimization on hybrid CPUs Luo Yu et.al. 2411.19542 null
2024-12-03 Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Akhiad Bercovich et.al. 2411.19146 null
2024-11-29 InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks Xinyao Zheng et.al. 2411.18191 null
2024-11-28 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache Akshat Sharma et.al. 2411.18077 null
2024-11-24 Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments Nikoleta Iliakopoulou et.al. 2411.17741 null
2024-11-26 PIM-AI: A Novel Architecture for High-Efficiency LLM Inference Cristobal Ortega et.al. 2411.17309 null
2024-11-26 Star Attention: Efficient LLM Inference over Long Sequences Shantanu Acharya et.al. 2411.17116 link
2024-11-26 Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation Chaoyi Jiang et.al. 2411.17089 link
2024-11-25 MixPE: Quantization and Hardware Co-design for Efficient LLM Inference Yu Zhang et.al. 2411.16158 null
2024-11-24 eFedLLM: Efficient LLM Inference Based on Federated Learning Shengwen Ding et.al. 2411.16003 null
2024-11-24 Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format Chao Fang et.al. 2411.15982 null
2024-11-24 Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems Wenxiang Lin et.al. 2411.15715 null
2024-11-22 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Yixin Dong et.al. 2411.15100 null
2024-11-21 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-20 Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding Hyun Ryu et.al. 2411.13157 null
2024-11-21 LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts Zhuohan Gu et.al. 2411.13009 null
2024-11-15 An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 Pepijn de Reus et.al. 2411.12758 link
2024-11-19 SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference Jiho Shin et.al. 2411.12692 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-15 AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference Janghwan Lee et.al. 2411.09909 null
2024-11-14 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-15 Communication Compression for Tensor Parallel LLM Inference Jan Hansen-Palmus et.al. 2411.09510 null
2024-11-14 Pie: Pooling CPU Memory for LLM Inference Yi Xu et.al. 2411.09317 null
2024-11-12 Towards Low-bit Communication for Tensor Parallel LLM Inference Harry Dong et.al. 2411.07942 null
2024-11-12 The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving Kyoungmin Kim et.al. 2411.07447 null
2024-11-08 AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality Ilias Bournias et.al. 2411.05555 null
2024-11-07 Hardware and Software Platform Inference Cheng Zhang et.al. 2411.05197 null
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 link
2024-11-05 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration Hongpeng Jin et.al. 2411.02829 null
2024-11-04 RAGViz: Diagnose and Visualize Retrieval-Augmented Generation Tevin Wang et.al. 2411.01751 link
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-02 RA-WEBs: Remote Attestation for WEB services Kosei Akama et.al. 2411.01340 null
2024-11-02 NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Xuanlin Jiang et.al. 2411.01142 null
2024-11-01 LLM-Based Misconfiguration Detection for AWS Serverless Computing Jinfeng Wen et.al. 2411.00642 null
2024-11-04 ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models Anbang Wang et.al. 2411.00533 null
2024-11-01 Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung et.al. 2411.00348 null
2024-10-31 LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators Krishna Teja Chitty-Venkata et.al. 2411.00136 link
2024-10-31 Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim et.al. 2411.00066 link
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-30 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao et.al. 2410.23079 link
2024-10-29 Scaling LLM Inference with Optimized Sample Compute Allocation Kexun Zhang et.al. 2410.22480 link
2024-10-29 SVIP: Towards Verifiable Inference of Open-source Large Language Models Yifan Sun et.al. 2410.22307 null
2024-10-28 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun et.al. 2410.21465 link
2024-10-27 FIRP: Faster LLM inference via future intermediate representation prediction Pengfei Wu et.al. 2410.20488 null
2024-10-29 Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management Tuowei Wang et.al. 2410.19274 null
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-30 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-25 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-24 BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching Peizhuang Cong et.al. 2410.18701 null
2024-10-25 Fast Inference for Augmented Large Language Models Rana Shahout et.al. 2410.18248 null
2024-10-23 POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference Aditya K Kamath et.al. 2410.18038 link
2024-10-22 FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs Haoran Lin et.al. 2410.16663 null
2024-10-22 Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency Prafulla Kumar Choubey et.al. 2410.16597 null
2024-10-20 EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models Junhao Hu et.al. 2410.15332 null
2024-10-19 IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System Minseok Seo et.al. 2410.15008 null
2024-10-23 Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching Jie Peng et.al. 2410.14740 null
2024-10-18 A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference You Wu et.al. 2410.14442 link
2024-10-18 Revisiting SLO and Goodput Metrics in LLM Serving Zhibin Wang et.al. 2410.14257 null
2024-10-17 RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs Jiatan Huang et.al. 2410.13987 null
2024-10-17 Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo et.al. 2410.13835 link
2024-10-17 Progressive Mixed-Precision Decoding for Efficient LLM Inference Hao Mark Chen et.al. 2410.13461 null
2024-10-17 Data Defenses Against Large Language Models William Agnew et.al. 2410.13138 link
2024-10-19 In-context KV-Cache Eviction for LLMs via Attention-Gate Zihao Zeng et.al. 2410.12876 null
2024-10-10 RecurFormer: Not All Transformer Heads Need Self-Attention Ruiqing Yan et.al. 2410.12850 null
2024-10-16 Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning Huiwen Wu et.al. 2410.12130 null
2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix Yingyu Liang et.al. 2410.11261 null
2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Guangxuan Xiao et.al. 2410.10819 link
2024-10-16 SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization Akrit Mudvari et.al. 2410.10759 null
2024-10-12 Power-Softmax: Towards Secure LLM Inference over Encrypted Data Itamar Zimerman et.al. 2410.09457 null
2024-10-09 SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Heming Xia et.al. 2410.06916 link
2024-10-08 ParallelSpec: Parallel Drafter for Efficient Speculative Decoding Zilin Xiao et.al. 2410.05589 null
2024-10-06 RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference Yige Xu et.al. 2410.04519 link
2024-10-14 Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective Jinhao Li et.al. 2410.04466 link
2024-10-04 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Aurick Qiao et.al. 2410.03960 null
2024-10-04 EXAQ: Exponent Aware Quantization For LLMs Acceleration Moran Shkolnik et.al. 2410.03185 link
2024-10-03 LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences Zhenxiao Fu et.al. 2410.02950 null
2024-10-03 Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration Yun Qu et.al. 2410.02511 link
2024-10-03 LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services Małgorzata Łazuka et.al. 2410.02425 link
2024-10-04 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation Xiaoqun Liu et.al. 2410.02220 null
2024-10-02 Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Yuxiang Huang et.al. 2410.01805 link
2024-10-02 ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving Yifan Qiao et.al. 2410.01228 null
2024-10-01 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Zonghang Li et.al. 2410.00531 link
2024-09-30 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems Linke Song et.al. 2409.20002 null
2024-09-26 Control Industrial Automation System with Large Language Models Yuchen Xia et.al. 2409.18009 link
2024-09-26 Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores Shaobo Ma et.al. 2409.17870 null
2024-09-25 Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Zhenmei Shi et.al. 2409.17422 link
2024-09-25 Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations Amey Agrawal et.al. 2409.17264 null
2024-09-25 Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference Zongyue Qin et.al. 2409.16560 null
2024-09-25 AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization Yifan Tan et.al. 2409.16546 link
2024-09-23 Eagle: Efficient Training-Free Router for Multi-LLM Inference Zesen Zhao et.al. 2409.15518 null
2024-09-24 UELLM: A Unified and Efficient Approach for LLM Inference Serving Yiyuan He et.al. 2409.14961 null
2024-09-22 RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph Linxi Wei et.al. 2409.14556 null
2024-09-16 Do Large Language Models Need a Content Delivery Network? Yihua Cheng et.al. 2409.13761 link
2024-09-19 PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) Mahmoud Nazzal et.al. 2409.12699 link
2024-09-12 LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs Han Xu et.al. 2409.11424 null
2024-09-04 ISO: Overlap of Computation and Communication within Seqenence For LLM Inference Bin Xiao et.al. 2409.11155 null
2024-09-18 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 link
2024-09-08 InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference Xiurui Pan et.al. 2409.04992 null
2024-09-07 Achieving Peak Performance for Large Language Models: A Systematic Review Zhyar Rzgar K Rostam et.al. 2409.04833 null
2024-09-06 A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage Huan Yang et.al. 2409.04040 null
2024-09-13 Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study Jianwei Zhu et.al. 2409.03992 null
2024-09-05 Sirius: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou et.al. 2409.03856 link
2024-08-31 HSF: Defending against Jailbreak Attacks with Hidden State Filtering Cheng Qian et.al. 2409.03788 null
2024-09-03 Contemporary Model Compression on Large Language Models Inference Dong Liu et.al. 2409.01990 link
2024-09-02 CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification Junhui He et.al. 2409.01366 link
2024-09-04 Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference Barys Liskavets et.al. 2409.01227 link
2024-09-01 Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) Xu-Hao Chen et.al. 2409.00661 null
2024-08-28 Decentralized LLM Inference over Edge Networks with Energy Harvesting Aria Khoshsirat et.al. 2408.15907 null
2024-08-28 Efficient LLM Scheduling by Learning to Rank Yichao Fu et.al. 2408.15792 link
2024-08-28 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-08-22 NanoFlow: Towards Optimal Large Language Model Serving Throughput Kan Zhu et.al. 2408.12757 link
2024-09-04 Parallel Speculative Decoding with Adaptive Draft Length Tianyu Liu et.al. 2408.11850 link
2024-08-21 MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models Elias Frantar et.al. 2408.11743 link
2024-08-20 Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models Artem Vazhentsev et.al. 2408.10692 null
2024-08-19 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-23 ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models Chao Zeng et.al. 2408.08554 link
2024-08-14 LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference Seungjae Moon et.al. 2408.07326 null
2024-08-12 LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration Zhiwen Mo et.al. 2408.06003 null
2024-08-10 LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale Jaehong Cho et.al. 2408.05499 link
2024-08-05 SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving Andreas Kosmas Kakolyris et.al. 2408.05235 null
2024-08-08 Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning Ke Cheng et.al. 2408.04323 null
2024-08-07 Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference Zeyu Zhang et.al. 2408.04107 null
2024-08-07 MPC-Minimized Secure LLM Inference Deevashwer Rathee et.al. 2408.03561 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-02 The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Matias Martinez et.al. 2408.01050 null
2024-08-01 DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency Jovan Stojkovic et.al. 2408.00741 null
2024-08-01 Designing Efficient LLM Accelerators for Edge Devices Jude Haris et.al. 2408.00462 null
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-07-23 ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency Yuhang Yao et.al. 2408.00008 null
2024-08-01 Responsive ML inference in multi-tenanted environments using AQUA Abhishek Vijaya Kumar et.al. 2407.21255 null
2024-07-25 An Efficient Inference Framework for Early-exit Large Language Models Ruijie Miao et.al. 2407.20272 null
2024-07-29 Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost Sania Nayab et.al. 2407.19825 null
2024-07-29 Teaching LLMs at Charles University: Assignments and Activities Jindřich Helcl et.al. 2407.19798 null
2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Hanlin Tang et.al. 2407.15891 null
2024-07-22 vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving Jiale Xu et.al. 2407.15309 link
2024-07-19 LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Qichen Fu et.al. 2407.14057 null
2024-07-17 Struct-X: Enhancing Large Language Models Reasoning with Structured Data Xiaoyu Tan et.al. 2407.12522 null
2024-07-17 LLM Inference Serving: Survey of Recent Advances and Opportunities Baolin Li et.al. 2407.12391 null
2024-07-17 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Ayush Kaushal et.al. 2407.12327 link
2024-07-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-07-21 Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference Yuan Feng et.al. 2407.11550 link
2024-07-15 Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo et.al. 2407.10960 link
2024-07-12 Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference Zongyue Qin et.al. 2407.09722 null
2024-07-09 Metron: Holistic Performance Evaluation Framework for LLM Inference Systems Amey Agrawal et.al. 2407.07000 link
2024-07-08 Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU Daliang Xu et.al. 2407.05858 link
2024-07-07 A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length Yuqing Yang et.al. 2407.05347 null
2024-07-05 Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design Yiyang Huang et.al. 2407.04292 link
2024-07-04 Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems Grant Wilkins et.al. 2407.04014 null
2024-07-02 MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang et.al. 2407.02490 link
2024-06-29 Teola: Towards End-to-End Optimization of LLM-based Applications Xin Tan et.al. 2407.00326 link
2024-06-25 T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Jianyu Wei et.al. 2407.00088 link
2024-06-28 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management Wonbeom Lee et.al. 2406.19707 null
2024-06-24 Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Euiin Yi et.al. 2406.16758 link
2024-06-28 SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention Qianchao Zhu et.al. 2406.15486 null
2024-06-21 Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models Qi Liu et.al. 2406.14848 link
2024-06-20 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein et.al. 2406.14546 link
2024-06-20 LiveMind: Low-latency Large Language Models with Simultaneous Inference Chuangtao Chen et.al. 2406.14319 link
2024-06-19 SDQ: Sparse Decomposed Quantization for LLM Inference Geonhwa Jeong et.al. 2406.13868 null
2024-06-19 Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style Zeping Li et.al. 2406.13170 null
2024-06-16 Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization Jungi Lee et.al. 2406.12930 null
2024-06-18 LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization Masafumi Enomoto et.al. 2406.12494 null
2024-06-18 LLMs Are Prone to Fallacies in Causal Inference Nitish Joshi et.al. 2406.12158 null
2024-06-14 Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning Hui Liu et.al. 2406.11890 null
2024-06-17 Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference Donghyeon Joo et.al. 2406.11674 null
2024-06-17 QTIP: Quantization with Trellises and Incoherence Processing Albert Tseng et.al. 2406.11235 link
2024-06-16 New Solutions on LLM Acceleration, Optimization, and Application Yingbing Huang et.al. 2406.10903 null
2024-06-16 Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Jiaming Tang et.al. 2406.10774 link
2024-06-15 Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study Hao Hao et.al. 2406.10675 link
2024-06-08 QCQA: Quality and Capacity-aware grouped Query Attention Vinay Joshi et.al. 2406.10247 null
2024-06-12 Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference Christopher Wolters et.al. 2406.08413 null
2024-06-12 PowerInfer-2: Fast Large Language Model Inference on a Smartphone Zhenliang Xue et.al. 2406.06282 null
2024-06-09 A Superalignment Framework in Autonomous Driving with Large Language Models Xiangrui Kong et.al. 2406.05651 null
2024-06-06 Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism Jiahao Liu et.al. 2406.03853 null
2024-06-04 Language Models can Infer Action Semantics for Classical Planners from Environment Feedback Wang Zhu et.al. 2406.02791 null
2024-06-08 Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach Yuxuan Chen et.al. 2406.02616 null
2024-06-04 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski et.al. 2406.02532 link
2024-06-03 Demystifying Platform Requirements for Diverse LLM Inference Use Cases Abhimanyu Bambhaniya et.al. 2406.01698 link
2024-06-03 PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration Ziqian Zeng et.al. 2406.01394 null
2024-06-01 A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation Dugang Liu et.al. 2406.00333 null
2024-05-31 No Free Lunch Theorem for Privacy-Preserving LLM Inference Xiaojin Zhang et.al. 2405.20681 null
2024-05-30 Decentralized AI: Permissionless LLM Inference on POKT Network Daniel Olshansky et.al. 2405.20450 null
2024-06-01 S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs Wei Zhong et.al. 2405.20314 null
2024-05-30 Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models Yuxiao Luo et.al. 2405.19850 null
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN Fei Zheng et.al. 2405.18744 null
2024-06-02 Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Hao Mark Chen et.al. 2405.18628 link
2024-05-25 FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference Chenqi Lin et.al. 2405.16241 null
2024-05-23 EdgeShard: Efficient LLM Inference via Collaborative Edge Computing Mingjin Zhang et.al. 2405.14371 null
2024-05-23 MiniCache: KV Cache Compression in Depth Dimension for Large Language Models Akide Liu et.al. 2405.14366 null
2024-05-21 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference Dongjie Yang et.al. 2405.12532 null
2024-05-12 Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization Xinyuan Zhang et.al. 2405.07140 null
2024-05-11 Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving Chengyi Nie et.al. 2405.06856 null
2024-05-21 Vidur: A Large-Scale Simulation Framework For LLM Inference Amey Agrawal et.al. 2405.05465 link
2024-05-13 KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation Minsik Cho et.al. 2405.05329 null
2024-05-12 DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature Dawei Li et.al. 2405.04819 link
2024-05-10 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2024-05-07 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Ramya Prabhu et.al. 2405.04437 link
2024-05-07 Optimizing Language Model’s Reasoning Abilities with Weak Supervision Yongqi Tong et.al. 2405.04086 null
2024-05-06 AlphaMath Almost Zero: process Supervision without process Guoxin Chen et.al. 2405.03553 link
2024-05-03 Efficient and Economic Large Language Model Inference with Attention Offloading Shaoyuan Chen et.al. 2405.01814 null


MoE

Publish Date Title Authors PDF Code
2025-10-21 Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework Yujie Xing et.al. 2510.18825 null
2025-10-21 Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification Bin Gu et.al. 2510.18533 null
2025-10-21 Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study Gangda Deng et.al. 2510.18370 null
2025-10-19 L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts Shihao Ji et.al. 2510.17898 null
2025-10-20 Towards 3D Objectness Learning in an Open World Taichi Liu et.al. 2510.17686 null
2025-10-20 Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model Xinwei Zhang et.al. 2510.17684 null
2025-10-20 Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm Hao Qiao et.al. 2510.17604 null
2025-10-20 ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts Zheyue Tan et.al. 2510.17483 null
2025-10-19 End-to-end Listen, Look, Speak and Act Siyin Wang et.al. 2510.16756 null
2025-10-18 NeurIPT: Foundation Model for Neural Interfaces Zitao Fang et.al. 2510.16548 null
2025-10-18 Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts Yongxiang Hua et.al. 2510.16448 null
2025-10-18 Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures Minh-Khoi Nguyen-Nhat et.al. 2510.16411 null
2025-10-17 Expert Merging in Sparse Mixture of Experts with Nash Bargaining Dung V. Nguyen et.al. 2510.16138 null
2025-10-17 Mixture of Experts Approaches in Dense Retrieval Tasks Effrosyni Sokli et.al. 2510.15683 null
2025-10-17 FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification Zhen Sun et.al. 2510.15595 null
2025-10-17 Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks Yuyuan Feng et.al. 2510.15333 null
2025-10-17 MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation Xianyang Qi et.al. 2510.15286 null
2025-10-17 Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction Amitesh Badkul et.al. 2510.15233 null
2025-10-16 Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models Guinan Su et.al. 2510.14853 null
2025-10-16 MergeMoE: Efficient Compression of MoE Models via Expert Output Merging Ruijie Miao et.al. 2510.14436 null
2025-10-16 Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning Weijie Shen et.al. 2510.14300 null
2025-10-16 MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering Mingkai Liu et.al. 2510.14251 null
2025-10-15 REAP the Experts: Why Pruning Prevails for One-Shot MoE compression Mike Lasby et.al. 2510.13999 null
2025-10-15 Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module Ruitao Feng et.al. 2510.13558 null
2025-10-15 ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition Deeptimaan Banerjee et.al. 2510.13493 null
2025-10-15 Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers Xin Zhao et.al. 2510.13462 null
2025-10-15 Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts Li Bai et.al. 2510.13451 null
2025-10-15 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Zhenyu Liu et.al. 2510.13344 null
2025-10-15 GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models Chen Zheng et.al. 2510.13079 null
2025-10-14 Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps Do Tien Hai et.al. 2510.12744 null
2025-10-14 MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts Yushu Zhao et.al. 2510.12357 null
2025-10-14 DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification Tao Xie et.al. 2510.12214 null
2025-10-13 Beyond ‘Templates’: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View Jinyu Zhang et.al. 2510.11687 null
2025-10-13 Robust Ego-Exo Correspondence with Long-Term Memory Yijun Hu et.al. 2510.11417 null
2025-10-13 Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers Wenhan Ma et.al. 2510.11370 null
2025-10-13 What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations? D. Rochman et.al. 2510.11256 null
2025-10-13 MC#: Mixture Compressor for Mixture-of-Experts Large Models Wei Huang et.al. 2510.10962 null
2025-10-12 Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation Ali Atiah Alzahrani et.al. 2510.10807 null
2025-10-12 Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection Shizhen Zhao et.al. 2510.10584 null
2025-10-12 Hierarchical LoRA MoE for Efficient CTR Model Scaling Zhichen Zeng et.al. 2510.10432 null
2025-10-11 SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference Liangkun Chen et.al. 2510.10302 null
2025-10-10 MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest Xiao Yang et.al. 2510.09857 null
2025-10-10 Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation Youwei Zheng et.al. 2510.09094 null
2025-10-09 LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution Xiaohui Li et.al. 2510.08771 null
2025-10-09 FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts Heming Zou et.al. 2510.08396 null
2025-10-09 Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization Jason Bohne et.al. 2510.08256 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training Ruizhe Wang et.al. 2510.08008 null
2025-10-09 Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing Cunli Mao et.al. 2510.07736 null
2025-10-09 Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision Xiaoxu Ma et.al. 2510.07703 null
2025-10-09 LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning Yuhan Sun et.al. 2510.07685 null
2025-10-08 MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting Yoli Shavit et.al. 2510.07459 null
2025-10-08 Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting Walid Guettala et.al. 2510.07426 null
2025-10-08 Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts Fangshuo Liao et.al. 2510.07205 null
2025-10-08 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages Zibo Su et.al. 2510.06612 null
2025-10-09 SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Shuang Cheng et.al. 2510.06303 null
2025-10-06 Reproducibility Study of “XRec: Large Language Models for Explainable Recommendation” Ranjan Mishra et.al. 2510.06275 null
2025-10-08 Barbarians at the Gate: How AI is Upending Systems Research Audrey Cheng et.al. 2510.06189 null
2025-10-07 Rasterized Steered Mixture of Experts for Efficient 2D Image Regression Yi-Hsin Li et.al. 2510.05814 null
2025-10-07 MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition Haoxun Li et.al. 2510.05749 null
2025-10-07 Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting Zhongkai Yu et.al. 2510.05497 null
2025-10-06 Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving Yue Pan et.al. 2510.05245 null
2025-10-06 REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis Alec K. Peltekian et.al. 2510.04923 null
2025-10-06 LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0 Jinbo Wen et.al. 2510.04765 null
2025-10-06 Multilingual Routing in Mixture-of-Experts Lucas Bandarkar et.al. 2510.04694 null
2025-10-06 Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing Xuanhua Yin et.al. 2510.04670 null
2025-10-05 HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks Nghiem T. Diep et.al. 2510.04295 null
2025-10-05 SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling Harshil Vejendla et.al. 2510.04286 null
2025-10-05 MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition Umberto Cappellazzo et.al. 2510.04136 null
2025-10-03 Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective Yehuda Dar et.al. 2510.03151 null
2025-10-02 ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models Gursimran Singh et.al. 2510.02613 null
2025-10-02 UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models Yuhao Sun et.al. 2510.02194 null
2025-10-02 LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition Rixin Zhou et.al. 2510.01651 null
2025-10-01 Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs Leyla Mirvakhabova et.al. 2510.01185 null
2025-10-01 Learning Compact Representations of LLM Abilities via Item Response Theory Jianhao Chen et.al. 2510.00844 null
2025-10-01 Graph Integrated Multimodal Concept Bottleneck Model Jiakai Lin et.al. 2510.00701 null
2025-10-01 FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression Yifei Gao et.al. 2510.00621 null
2025-10-01 Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning Minghao Yang et.al. 2510.00570 null
2025-09-30 FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training Yunqi Gao et.al. 2510.00207 null
2025-09-30 Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization Yaoxiang Wang et.al. 2509.26520 null
2025-09-30 Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology Chenyu Li et.al. 2509.26223 null
2025-09-30 Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline Haiyang Li et.al. 2509.25991 null
2025-09-30 UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression Yuan Zhao et.al. 2509.25934 null
2025-09-30 Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel Chuanyang Zheng et.al. 2509.25913 null
2025-10-01 A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI Arvind Murari Vepa et.al. 2509.25889 null
2025-09-30 Collaborative Compression for Large-Scale MoE Deployment on Edge Yixiao Chen et.al. 2509.25689 null
2025-09-30 LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts Yuan Zhuang et.al. 2509.25684 null
2025-09-30 Guiding Mixture-of-Experts with Temporal Multimodal Interactions Xing Han et.al. 2509.25678 null
2025-09-29 K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model Bangwei Guo et.al. 2509.25594 null
2025-09-29 GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference Yu Han et.al. 2509.25041 null
2025-09-29 LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection Bao-Ngoc Dao et.al. 2509.24547 null
2025-09-29 One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning Minh Le et.al. 2509.24483 null
2025-09-29 Muon: Training and Trade-offs with Latent Attention and MoE Sushant Mehta et.al. 2509.24406 null
2025-09-29 LLaDA-MoE: A Sparse MoE Diffusion Language Model Fengqi Zhu et.al. 2509.24389 null
2025-09-29 Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning Zhisheng Chen et.al. 2509.24222 null
2025-09-28 HunyuanImage 3.0 Technical Report Siyu Cao et.al. 2509.23951 null
2025-09-28 Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms Jiahao Ying et.al. 2509.23933 null
2025-09-28 Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don’t Know Albus Yizhuo Li et.al. 2509.23830 null
2025-09-28 A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning Yaya Zhao et.al. 2509.23772 null
2025-09-26 Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time Yixuan Han et.al. 2509.22572 null
2025-09-26 Learning to Ball: Composing Policies for Long-Horizon Basketball Moves Pei Xu et.al. 2509.22442 null
2025-09-26 Role-Aware Multi-modal federated learning system for detecting phishing webpages Bo Wang et.al. 2509.22369 null
2025-09-26 HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space Ke Li et.al. 2509.22299 null
2025-09-26 Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics Xingjian Wu et.al. 2509.22279 null
2025-09-26 MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning Tao Wu et.al. 2509.21953 null
2025-09-26 Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts Naibin Gu et.al. 2509.21892 null
2025-09-26 ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations Chang Liu et.al. 2509.21802 null
2025-09-26 LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE Yu Shang et.al. 2509.21790 null
2025-09-25 Distributed Specialization: Rare-Token Neurons in Large Language Models Jing Liu et.al. 2509.21163 null
2025-09-26 Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns Xuemiao Zhang et.al. 2509.21124 null
2025-09-25 Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders Alessandro Bombini et.al. 2509.21123 null
2025-09-24 Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures Sampurna Roy et.al. 2509.20577 null
2025-09-24 SHMoAReg: Spark Deformable Image Registration via Spatial Heterogeneous Mixture of Experts and Attention Heads Yuxi Zheng et.al. 2509.20073 null
2025-09-24 Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference Ziyi Han et.al. 2509.19781 null
2025-09-23 DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces Tianshuo Zhang et.al. 2509.19230 null
2025-09-23 Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation Yunzhe Shen et.al. 2509.18912 null
2025-09-23 LongCat-Flash-Thinking Technical Report Meituan LongCat Team et.al. 2509.18883 null
2025-09-23 PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving Chengran Yuan et.al. 2509.18609 null
2025-09-23 Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts Qi Wang et.al. 2509.18542 null
2025-09-23 StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models Haoxin Yang et.al. 2509.17993 null
2025-09-23 Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark Siu Hang Ho et.al. 2509.17894 null
2025-09-22 Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving Ziming Liu et.al. 2509.17863 null
2025-09-22 Attention-based Mixture of Experts for Robust Speech Deepfake Detection Viola Negroni et.al. 2509.17585 null
2025-09-22 Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity Siqi Li et.al. 2509.17411 null
2025-09-21 MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE Soheil Zibakhsh et.al. 2509.17238 null
2025-09-21 CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception Lingzhao Kong et.al. 2509.17107 null
2025-09-21 Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation Junzhuo Li et.al. 2509.16882 null
2025-09-20 KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control Jinrui Han et.al. 2509.16638 null
2025-09-19 DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning Sikai Bai et.al. 2509.16105 null
2025-09-19 MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework Tianyu Li et.al. 2509.15964 null
2025-09-19 pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation Tong Wang et.al. 2509.15638 null
2025-09-19 MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training Junbiao Pang et.al. 2509.15514 null
2025-09-18 Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing Zichen Wu et.al. 2509.15361 null
2025-09-18 Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting Liran Nochumsohn et.al. 2509.15105 null
2025-09-18 Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning Lei Wang et.al. 2509.15087 null
2025-09-18 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence Chaoyin She et.al. 2509.14977 null
2025-09-18 FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts Jiayi Han et.al. 2509.14900 null
2025-09-18 CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human Nan Sun et.al. 2509.14889 null
2025-09-17 CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts Leonard Hackel et.al. 2509.14104 null
2025-09-18 SAIL-VL2 Technical Report Weijie Yin et.al. 2509.14033 null
2025-09-17 Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation Nguyen Lan Vi Vu et.al. 2509.13834 null
2025-09-18 Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers Manan Mittal et.al. 2509.13548 null
2025-09-18 GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR Yujie Guo et.al. 2509.13093 null
2025-09-16 Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection Boyu Han et.al. 2509.12990 null
2025-09-16 Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks Bowen Ye et.al. 2509.12813 null
2025-09-16 MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos Damola Agbelese et.al. 2509.12772 null
2025-09-17 NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts Botao He et.al. 2509.12747 null
2025-09-16 AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models Heng Zhang et.al. 2509.12715 null
2025-07-23 Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models Changxin Tian et.al. 2507.17702 null
2025-07-23 Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography Farnoush Bayatmakou et.al. 2507.17662 null
2025-07-23 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Shuai Yang et.al. 2507.17520 null
2025-07-23 Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection Yehao Lu et.al. 2507.17436 null
2025-07-23 A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model Zhe Xu et.al. 2507.17303 null
2025-07-23 BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs Jianmin Hu et.al. 2507.17133 null
2025-07-22 GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI Joshua Kalyanapu et.al. 2507.17033 null
2025-07-22 Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data Yunyi Shen et.al. 2507.16817 null
2025-07-22 Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training Zixiao Huang et.al. 2507.16274 null
2025-07-21 Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure Alexandra Junell et.al. 2507.16088 null
2025-07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Alessandro B. Melchiorre et.al. 2507.15826 null
2025-07-21 The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts Sungmin Yun et.al. 2507.15465 null
2025-07-21 Universal crystal material property prediction via multi-view geometric fusion in graph transformers Liang Zhang et.al. 2507.15303 null
2025-07-20 CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning Pan Hu et.al. 2507.14903 null
2025-07-23 GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving Chi Wan et.al. 2507.14456 null
2025-07-18 SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing Yingying Zhang et.al. 2507.13812 null
2025-07-17 Apple Intelligence Foundation Language Models: Tech Report 2025 Hanzhi Zhou et.al. 2507.13575 null
2025-07-17 R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning Xiaohan Guo et.al. 2507.13107 null
2025-07-16 Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series Martina Cádiz-Leyton et.al. 2507.12611 null
2025-07-16 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Gen Luo et.al. 2507.12566 null
2025-07-16 Mixture of Raytraced Experts Andrea Perin et.al. 2507.12419 null
2025-07-16 CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning Peiwen Xia et.al. 2507.11834 null
2025-07-15 Mixture of Experts in Large Language Models Danyang Zhang et.al. 2507.11181 null
2025-07-15 Atmos-Bench: 3D Atmospheric Structures for Climate Insight Tianchi Xu et.al. 2507.11085 null
2025-07-14 DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Luolin Xiong et.al. 2507.09955 null
2025-07-14 ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization Huilai Li et.al. 2507.09945 null
2025-07-14 Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems Vindula Jayawardana et.al. 2507.09836 null
2025-07-13 Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts Aakash Tripathi et.al. 2507.09754 null
2025-07-13 Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive You Huang et.al. 2507.09612 null
2025-07-12 PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process Shiqi Jiang et.al. 2507.09242 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes Tianyou Jiang et.al. 2507.08542 null
2025-07-11 White-Basilisk: A Hybrid Model for Code Vulnerability Detection Ioannis Lamprou et.al. 2507.08540 null
2025-07-15 KAT-V1: Kwai-AutoThink Technical Report Zizheng Zhan et.al. 2507.08297 null
2025-07-11 Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization Woon Ryong Kim et.al. 2507.08269 null
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Lu Xu et.al. 2507.07818 null
2025-07-10 When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance Peizhang Shao et.al. 2507.07748 null
2025-07-09 Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Ankit Jyothish et.al. 2507.07335 null
2025-07-08 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate A. Bochkov et.al. 2507.07129 null
2025-07-09 4KAgent: Agentic Any Image to 4K Super-Resolution Yushen Zuo et.al. 2507.07105 null
2025-07-11 FlexOlmo: Open Language Models for Flexible Data Use Weijia Shi et.al. 2507.07024 null
2025-07-09 Deep Disentangled Representation Network for Treatment Effect Estimation Hui Meng et.al. 2507.06650 null
2025-07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference Qian Chen et.al. 2507.06567 null
2025-07-09 MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models Yiwen Liu et.al. 2507.06502 null
2025-07-08 Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation Szymon Płotka et.al. 2507.06363 null
2025-07-08 Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis Xintong Hu et.al. 2507.06116 null
2025-07-09 A Survey on Prompt Tuning Zongqian Li et.al. 2507.06085 null
2025-07-08 Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors Bing Wang et.al. 2507.05939 null
2025-07-08 What You Have is What You Track: Adaptive and Robust Multimodal Tracking Yuedong Tan et.al. 2507.05899 null
2025-07-08 Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Zijin Gu et.al. 2507.05724 null
2025-07-08 Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach Xiaobing Chen et.al. 2507.05685 null
2025-07-08 City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data Tianxing Wu et.al. 2507.05651 null
2025-07-07 QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks Hoang-Quan Nguyen et.al. 2507.05190 null
2025-07-07 NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification Jun Hu et.al. 2507.04870 null
2025-07-07 DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics Yayu Long et.al. 2507.04661 null
2025-07-08 UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification Xixi Wan et.al. 2507.04638 null
2025-07-07 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts Yun Wang et.al. 2507.04631 null
2025-07-05 Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge Linshen Liu et.al. 2507.04123 null
2025-07-05 From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM Xinyi Wu et.al. 2507.03868 null
2025-07-04 Decoupled Relative Learning Rate Schedules Jan Ludziejewski et.al. 2507.03526 null
2025-07-03 Neural Inhibition Improves Dynamic Routing and Mixture of Experts Will Y. Zou et.al. 2507.03221 null
2025-07-03 System-performance and cost modeling of Large Language Model training and inference Wenzhe Guo et.al. 2507.02456 null
2025-07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding Jacob J. Morra et.al. 2507.02264 null
2025-07-02 MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics Dmytro Kuzmenko et.al. 2507.01843 null
2025-07-02 Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data Ethan Pawl et.al. 2507.01375 null
2025-07-02 Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model Chaoxiang Cai et.al. 2507.01351 null
2025-07-02 Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations Bohao Wang et.al. 2507.01337 null
2025-07-02 ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation JianChao Zhao et.al. 2507.00502 null
2025-07-01 MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE Geng Zhang et.al. 2507.00390 null
2025-06-30 MotionGPT3: Human Motion as a Second Modality Bingfan Zhu et.al. 2506.24086 null
2025-06-30 MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis Zhe Liu et.al. 2506.23648 null
2025-06-30 Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model Mu-Chi Chen et.al. 2506.23635 null
2025-06-29 Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Lujun Li et.al. 2506.23266 null
2025-06-29 External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting Haoran Li et.al. 2506.23201 null
2025-06-29 Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound Zhiyuan Zhu et.al. 2506.23108 null
2025-07-01 Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning Sanskar Pandey et.al. 2506.22919 null
2025-06-27 Towards Distributed Neural Architectures Aditya Cowsik et.al. 2506.22389 null
2025-06-27 MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism Zheng Zhang et.al. 2506.22175 null
2025-06-27 DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE Hang Shao et.al. 2506.21864 null
2025-06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Jiajie Yang et.al. 2506.21328 null
2025-06-26 Learning to Skip the Middle Layers of Transformers Tim Lawson et.al. 2506.21103 null
2025-06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Haodong Lu et.al. 2506.21035 null
2025-06-26 EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning Xiao Zhang et.al. 2506.20986 null
2025-06-25 Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration Jiaxing Huang et.al. 2506.20282 null
2025-06-23 Multimodal Anomaly Detection with a Mixture-of-Experts Christoph Willibald et.al. 2506.19077 null
2025-06-23 Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Zihan Wang et.al. 2506.18945 null
2025-06-23 Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning Rahul Atul Bhope et.al. 2506.18789 null
2025-06-23 An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify Shivam Verma et.al. 2506.18735 null
2025-06-23 Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks Xiaodong Wu et.al. 2506.18543 null
2025-06-23 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Zichong Li et.al. 2506.18349 null
2025-06-23 Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies Junchao Fan et.al. 2506.18304 null
2025-06-22 Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection Zheng Zhan et.al. 2506.18145 null
2025-06-21 Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert Gelei Xu et.al. 2506.17787 null
2025-06-21 Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities Xinghao Huang et.al. 2506.17755 null
2025-06-21 PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation Xinyu Xiong et.al. 2506.17712 null
2025-06-20 SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification Zhenglin Lai et.al. 2506.17368 null
2025-06-19 FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE Khiem Le et.al. 2506.16600 null
2025-06-19 Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models Daniel Fidel Harvey et.al. 2506.16419 null
2025-06-17 Scaling Intelligence: Designing Data Centers for Next-Gen Language Models Jesmin Jahan Tithi et.al. 2506.15006 null
2025-06-17 NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification Wajih Hassan Raza et.al. 2506.14970 null
2025-06-17 GMT: General Motion Tracking for Humanoid Whole-Body Control Zixuan Chen et.al. 2506.14770 null
2025-06-17 Exploring Speaker Diarization with Mixture of Experts Gaobin Yang et.al. 2506.14750 null
2025-06-18 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ling Team et.al. 2506.14731 null
2025-06-17 GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors Hengyuan Zhang et.al. 2506.14646 link
2025-06-17 Single-Example Learning in a Mixture of GPDMs with Latent Geometries Jesse St. Amand et.al. 2506.14563 null
2025-06-17 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models Hongyu Wang et.al. 2506.14435 null
2025-06-16 Load Balancing Mixture of Experts with Similarity Preserving Routers Nabil Omi et.al. 2506.14038 null
2025-06-16 GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics Qianzhong Chen et.al. 2506.14009 null
2025-06-16 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention MiniMax et.al. 2506.13585 link
2025-06-16 Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization Guanghui Song et.al. 2506.13541 null
2025-06-16 EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization Zhongqian Fu et.al. 2506.13329 link
2025-06-16 Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs Xintong Tang et.al. 2506.13192 null
2025-06-15 Serving Large Language Models on Huawei CloudMatrix384 Pengfei Zuo et.al. 2506.12708 null
2025-06-14 Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts Shengzhuang Chen et.al. 2506.12597 null
2025-06-14 Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control Rongpeng Li et.al. 2506.12453 null
2025-06-17 HarMoEny: Efficient Multi-GPU Inference of MoE Models Zachary Doucet et.al. 2506.12417 null
2025-06-14 Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model Chong Li et.al. 2506.12388 null
2025-06-13 Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? Houyi Li et.al. 2506.12119 null
2025-06-13 Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution Zhangkai Ni et.al. 2506.11823 link
2025-06-12 Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Zaijing Li et.al. 2506.10357 null
2025-06-11 GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture GigaChat team et.al. 2506.09440 null
2025-06-11 DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts Yuchen Feng et.al. 2506.09351 null
2025-06-10 CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA Jiale Dong et.al. 2506.08496 link
2025-06-11 MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding Shivang Chopra et.al. 2506.08356 null
2025-06-11 STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation Yiming Wang et.al. 2506.08054 link
2025-06-09 A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Jacob Helwig et.al. 2506.07969 link
2025-06-09 M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration Yongzhen Wang et.al. 2506.07814 null
2025-06-11 MIRA: Medical Time Series Foundation Model for Real-World Health Data Hao Li et.al. 2506.07584 null
2025-06-11 MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization Ken Yaggel et.al. 2506.07563 link
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-09 MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing Haiyue Ma et.al. 2506.07366 null
2025-06-08 UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment Wentao Zhao et.al. 2506.07013 null
2025-06-07 High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations Ziwei Li et.al. 2506.06858 null
2025-06-07 Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Yuan Yuan et.al. 2506.06694 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 null
2025-06-06 MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Jie Cao et.al. 2506.05928 null
2025-06-06 dots.llm1 Technical Report Bi Huo et.al. 2506.05767 null
2025-06-05 Mixture-of-Experts Meets In-Context Reinforcement Learning Wenhao Wu et.al. 2506.05426 null
2025-06-05 Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection Ziyi Zhou et.al. 2506.04739 null
2025-06-05 FlashDMoE: Fast Distributed MoE in a Single Kernel Osayamen Jonathan Aimuyo et.al. 2506.04667 link
2025-06-04 Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts Jiaxing Zhang et.al. 2506.03591 null
2025-06-04 PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs Ze Yu Zhang et.al. 2506.02965 null
2025-06-03 Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights Jakub Krajewski et.al. 2506.02890 null
2025-06-03 Brain-Like Processing Pathways Form in Models With Heterogeneous Experts Jack Cook et.al. 2506.02813 null
2025-06-04 MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection Juntong Li et.al. 2506.02535 null
2025-06-03 MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework Yupeng Qi et.al. 2506.02460 null
2025-05-31 Enhancing Multimodal Continual Instruction Tuning with BranchLoRA Duzhen Zhang et.al. 2506.02041 null
2025-06-02 SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model Zhao Yang et.al. 2506.01833 link
2025-06-02 Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning Ryotaro Kawata et.al. 2506.01656 null
2025-06-02 DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models Jiancheng Ye et.al. 2506.01257 null
2025-06-01 Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts Fan Liu et.al. 2506.00965 null
2025-05-30 Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction Shuai Liu et.al. 2505.24597 null
2025-05-30 Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Junzhuo Li et.al. 2505.24593 null
2025-05-30 Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer Yilun Kong et.al. 2505.24378 link
2025-05-30 GradPower: Powering Gradients for Faster Language Model Pre-Training Mingze Wang et.al. 2505.24275 null
2025-05-30 On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks Mingze Wang et.al. 2505.24205 null
2025-05-29 Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts Xuweiyi Chen et.al. 2505.23926 null
2025-06-03 Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Zhaokun Wang et.al. 2505.23868 null
2025-05-29 From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents Tobias Lindenbauer et.al. 2505.23422 link
2025-05-29 Context-Aware Semantic Communication for the Wireless Networks Guangyuan Liu et.al. 2505.23249 null
2025-05-29 Two Is Better Than One: Rotations Scale LoRAs Hongcan Guo et.al. 2505.23184 null
2025-05-28 HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer Qi Cai et.al. 2505.22705 link
2025-05-28 Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts Xue Zhang et.al. 2505.22582 null
2025-05-28 A Human-Centric Approach to Explainable AI for Personalized Education Vinitra Swamy et.al. 2505.22541 link
2025-05-28 Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion Kewen Chen et.al. 2505.22360 null
2025-05-28 Advancing Expert Specialization for Better MoE Hongcan Guo et.al. 2505.22323 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Jiawen Yu et.al. 2505.22159 null
2025-05-28 AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation Yan Rong et.al. 2505.22053 null
2025-05-28 Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Zhongyi Zhou et.al. 2505.21906 null
2025-05-27 MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis Yitong Li et.al. 2505.21698 null
2025-05-28 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Yehui Tang et.al. 2505.21411 null
2025-05-27 Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities Junyan Zhang et.al. 2505.21191 null
2025-05-27 Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts Yue Zhang et.al. 2505.21079 null
2025-05-27 Multi-objective Large Language Model Alignment with Hierarchical Experts Zhuo Li et.al. 2505.20925 null
2025-05-26 FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models Hao Kang et.al. 2505.20225 link
2025-05-26 NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID Shihao Li et.al. 2505.20001 null
2025-05-26 Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments Junming Liu et.al. 2505.19699 null
2025-05-26 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate Liangwei Nathan Zheng et.al. 2505.19525 link
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 RankLLM: A Python Package for Reranking with LLMs Sahel Sharifymoghaddam et.al. 2505.19284 null
2025-05-25 I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts Jiayi Xin et.al. 2505.19190 link
2025-05-24 TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling Chonghua Han et.al. 2505.18670 null
2025-05-24 ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation Jian Liang et.al. 2505.18640 link
2025-05-24 Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter Weizhi Zhong et.al. 2505.18612 null
2025-05-23 Enhancing CTR Prediction with De-correlated Expert Networks Jiancheng Wang et.al. 2505.17925 null
2025-05-23 PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval Zehua Pei et.al. 2505.17639 null
2025-05-23 CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning Jinyuan Feng et.al. 2505.17553 null
2025-05-23 MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation Kaixing Yang et.al. 2505.17543 null
2025-05-22 JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model Qihao Duan et.al. 2505.17257 null
2025-05-22 DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving Zhenjie Yang et.al. 2505.16278 null
2025-05-22 DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor Yan Zhao et.al. 2505.16256 null
2025-05-21 Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models Jingcong Liang et.al. 2505.16056 link
2025-05-21 MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding Yuxiang Wei et.al. 2505.15946 null
2025-05-21 CoLA: Collaborative Low-Rank Adaptation Yiyun Zhou et.al. 2505.15471 link
2025-05-22 Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Tencent Hunyuan Team et.al. 2505.15431 null
2025-05-21 Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks Uranik Berisha et.al. 2505.15414 null
2025-05-21 Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines Xiaohou Shi et.al. 2505.15151 null
2025-05-20 Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies Haoyi Qiu et.al. 2505.14972 link
2025-05-20 Balanced and Elastic End-to-end Training of Dynamic LLMs Mohamed Wahib et.al. 2505.14864 null
2025-05-20 Solving MNIST with a globally trained Mixture of Quantum Experts Paolo Alessandro Xavier Tognini et.al. 2505.14789 null
2025-05-20 Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Mengru Wang et.al. 2505.14681 null
2025-05-21 Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach Umberto Cappellazzo et.al. 2505.14336 null
2025-05-20 FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation Shaolin Zhu et.al. 2505.14256 null
2025-05-20 THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation Yunlong Liang et.al. 2505.14173 null
2025-05-20 Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition Shuo Zhang et.al. 2505.14143 null
2025-05-20 Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging Ryo Bertolissi et.al. 2505.14136 null
2025-05-20 StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning Huaijie Wang et.al. 2505.13997 null
2025-05-20 Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting Bao-Ngoc Dao et.al. 2505.13944 link
2025-05-20 U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding Ziqian Wang et.al. 2505.13880 link
2025-05-20 EfficientLLM: Efficiency in Large Language Models Zhengqing Yuan et.al. 2505.13840 null
2025-05-19 CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition Nam V. Nguyen et.al. 2505.13380 link
2025-05-19 Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference Shuqing Luo et.al. 2505.13345 link
2025-05-19 Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models Lucas Berry et.al. 2505.13273 null
2025-05-19 True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics Christoph Jürgen Hemmer et.al. 2505.13192 null
2025-05-19 Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures Tuan Thai et.al. 2505.13052 null
2025-05-18 Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization Hongbiao Zhu et.al. 2505.12311 null
2025-05-20 Model Merging in Pre-training of Large Language Models Yunshui Li et.al. 2505.12082 null
2025-05-20 Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition Runduo Han et.al. 2505.12007 link
2025-05-17 MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging Zihuan Qiu et.al. 2505.11883 null
2025-05-17 Improving Coverage in Combined Prediction Sets with Weighted p-values Gina Wong et.al. 2505.11785 null
2025-05-16 MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production Chao Jin et.al. 2505.11432 null
2025-05-16 MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Yinsicheng Jiang et.al. 2505.11415 null
2025-05-16 A Fast Kernel-based Conditional Independence test with Application to Causal Discovery Oliver Schacht et.al. 2505.11085 null
2025-05-16 On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating Huy Nguyen et.al. 2505.10860 null
2025-05-14 PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning Zongqian Li et.al. 2505.09519 link
2025-05-14 Qwen3 Technical Report An Yang et.al. 2505.09388 link
2025-05-14 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Chenggang Zhao et.al. 2505.09343 null
2025-05-13 Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony Shaoyu Wang et.al. 2505.08944 null
2025-05-13 PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts Yang Su et.al. 2505.08719 null
2025-05-13 AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale Yunjie Ji et.al. 2505.08311 null
2025-05-12 UMoE: Unifying Attention and FFN with Shared Experts Yuanhang Yang et.al. 2505.07260 null
2025-05-11 Seed1.5-VL Technical Report Dong Guo et.al. 2505.07062 null
2025-05-11 FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers Tianyu Chen et.al. 2505.06858 null
2025-05-11 The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts Enric Boix-Adsera et.al. 2505.06839 null
2025-05-10 Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Zihan Qiu et.al. 2505.06708 link
2025-05-10 Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding Dawei Huang et.al. 2505.06685 link
2025-05-10 QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration HamidReza Imani et.al. 2505.06481 null
2025-05-12 FloE: On-the-Fly MoE Inference on Memory-constrained GPU Yuxin Zhou et.al. 2505.05950 null
2025-05-09 MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design Haojie Duanmu et.al. 2505.05799 link
2025-05-08 Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts Ming Li et.al. 2505.05035 null
2025-05-07 Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs Yehui Tang et.al. 2505.04519 null
2025-05-07 SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios Ning Cheng et.al. 2505.04201 null
2025-05-07 LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? Teddy Foley et.al. 2505.04075 link
2025-05-07 Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications Yuanai Xie et.al. 2505.04068 null
2025-05-06 Towards Smart Point-and-Shoot Photography Jiawan Li et.al. 2505.03638 null
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-06 STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation Maolin Wang et.al. 2505.03484 null
2025-05-06 3D Gaussian Splatting Data Compression with Mixture of Priors Lei Liu et.al. 2505.03310 null
2025-05-05 Finger Pose Estimation for Under-screen Fingerprint Sensor Xiongjun Guan et.al. 2505.02481 link
2025-05-05 Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems Kai Zhang et.al. 2505.02381 null
2025-05-05 Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques Sanjay Surendranath Girija et.al. 2505.02309 null
2025-05-04 Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields Zhenxing Mi et.al. 2505.02005 link
2025-05-03 Backdoor Attacks Against Patch-based Mixture of Experts Cedric Chan et.al. 2505.01811 link
2025-05-01 MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling Abdoul Majid O. Thiombiano et.al. 2505.01459 null
2025-05-02 Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders Rogelio A Mancisidor et.al. 2505.01134 null
2025-05-02 CoCoAFusE: Beyond Mixtures of Experts via Model Fusion Aurelio Raffa Ugolini et.al. 2505.01105 null
2025-05-01 Improving Routing in Sparse Mixture of Experts with Graph of Tokens Tam Nguyen et.al. 2505.00792 null
2025-05-01 CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series Tian Lan et.al. 2505.00415 null
2025-05-01 Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing Piotr Piękos et.al. 2505.00315 link
2025-04-30 Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders Xuwei Yang et.al. 2505.00216 null
2025-04-29 TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts Pradip Kunwar et.al. 2504.21190 null
2025-04-29 Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization Shuai Gong et.al. 2504.21063 null
2025-04-26 PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight Ben Goertzel et.al. 2504.21029 null
2025-04-29 MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification Yichu Xu et.al. 2504.20509 null
2025-04-29 FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks Wenjing Xiao et.al. 2504.20446 null
2025-04-29 MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation Amaan Izhar et.al. 2504.20343 link
2025-04-28 Accelerating Mixture-of-Experts Training with Adaptive Expert Replication Athinagoras Skiadopoulos et.al. 2504.19925 null
2025-04-28 Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey Yunting Xu et.al. 2504.19660 null
2025-04-28 ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving Renju Feng et.al. 2504.19580 link
2025-04-29 BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts Qingyue Wang et.al. 2504.18598 null
2025-04-25 NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation Rob Romijnders et.al. 2504.18147 null
2025-04-28 Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection Haokai Zhang et.al. 2504.17834 link
2025-04-22 Compass-V2 Technical Report Sophia Maria et.al. 2504.15527 null
2025-04-21 Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images Jonathan Brokman et.al. 2504.15470 link
2025-04-17 D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Haodong Wang et.al. 2504.15299 null
2025-04-23 MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core Dennis Liu et.al. 2504.14960 null
2025-04-18 Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts Jie Zou et.al. 2504.13655 null
2025-04-18 HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering Alexander Rusnak et.al. 2504.13590 null
2025-04-18 Dense Backpropagation Improves Training for Sparse Mixture-of-Experts Ashwinee Panda et.al. 2504.12463 link
2025-04-16 Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models Yuanbo Tang et.al. 2504.12359 null
2025-04-16 Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data Sangwon Hyun et.al. 2504.12287 null
2025-04-16 MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models Hang Yuan et.al. 2504.12234 null
2025-04-15 Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology Henrik Häggström et.al. 2504.11279 link
2025-04-14 Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning LeiLei Ma et.al. 2504.09990 null
2025-04-14 Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications Nathalie Bartoli et.al. 2504.09930 null
2025-04-14 Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming Zhiqiang He et.al. 2504.09906 null
2025-04-13 Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation Jia Wei et.al. 2504.09601 null
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-04-12 Mixture of Group Experts for Learning Invariant Representations Lei Kang et.al. 2504.09265 null
2025-04-11 RouterKT: Mixture-of-Experts for Knowledge Tracing Han Liao et.al. 2504.08989 link
2025-04-11 Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design Robin Grapin et.al. 2504.08671 null
2025-04-10 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Zhongyang Li et.al. 2504.07964 link
2025-04-11 Scaling Laws for Native Multimodal Models Mustafa Shukor et.al. 2504.07951 null
2025-04-10 Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models Hongcheng Guo et.al. 2504.07807 link
2025-04-10 Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network Peng Jia et.al. 2504.07777 null
2025-04-10 Kimi-VL Technical Report Kimi Team et.al. 2504.07491 link
2025-04-09 MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution Zhe Wang et.al. 2504.07308 link
2025-04-11 Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models Ling Team et.al. 2504.07158 null
2025-04-09 Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations Zican Dong et.al. 2504.06792 null
2025-04-09 FedMerge: Federated Personalization via Model Merging Shutong Chen et.al. 2504.06768 null
2025-04-08 S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning Hanqing Zeng et.al. 2504.06426 null
2025-04-08 HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Shuzhang Zhong et.al. 2504.05897 link
2025-04-08 Adaptive Substructure-Aware Expert Model for Molecular Property Prediction Tianyi Jiang et.al. 2504.05844 null
2025-04-10 Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations Ajay Jaiswal et.al. 2504.05586 null
2025-04-07 SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement Zuying Xie et.al. 2504.04818 null
2025-04-06 On the Spatial Structure of Mixture-of-Experts in Transformers Daniel Bershatsky et.al. 2504.04444 null
2025-04-05 Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator Bing Wang et.al. 2504.04076 link
2025-04-04 HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Yongji Wu et.al. 2504.03871 null
2025-04-01 Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns Diego Vallarino et.al. 2504.03750 null
2025-04-04 RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation Hanbo Bi et.al. 2504.03166 null
2025-04-03 TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models Xinquan Wang et.al. 2504.02712 null
2025-04-07 MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Beichen Huang et.al. 2504.02658 link
2025-04-07 MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism Ruidong Zhu et.al. 2504.02263 null
2025-04-02 Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Mohan Zhang et.al. 2504.01337 null
2025-04-01 Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function Qiuchen Song et.al. 2504.00819 null
2025-04-01 DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism Dengchun Li et.al. 2504.00661 link
2025-04-01 Continual Cross-Modal Generalization Yan Xia et.al. 2504.00561 null
2025-04-01 Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection Shunxin Chen et.al. 2504.00458 null
2025-03-31 Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion Jiagen Li et.al. 2503.23721 null
2025-03-30 Mixture of Routers Jia-Chen Zhang et.al. 2503.23362 null
2025-03-29 Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models Zehua Liu et.al. 2503.23100 null
2025-03-29 S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning Giang Do et.al. 2503.23007 null
2025-03-29 Sparse Mixture of Experts as Unified Competitive Learning Giang Do et.al. 2503.22996 null
2025-04-01 Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities Raman Dutt et.al. 2503.22517 null
2025-03-27 RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction Armin Abdollahi et.al. 2503.21971 null
2025-03-27 iMedImage Technical Report Ran Wei et.al. 2503.21836 null
2025-03-27 LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models Hengyuan Zhao et.al. 2503.21227 null
2025-03-26 Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework Soham Sane et.al. 2503.20750 null
2025-03-26 UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines Chen Tang et.al. 2503.20748 null
2025-03-26 Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning Sashuai Zhou et.al. 2503.20633 null
2025-03-26 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Rongyu Zhang et.al. 2503.20384 null
2025-03-26 Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning Yousef Sadegheih et.al. 2503.20326 link
2025-03-25 Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion Konyul Park et.al. 2503.19776 null
2025-03-25 BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts Suzhe Xu et.al. 2503.19769 null
2025-03-25 M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation Ziyuan Liu et.al. 2503.19406 null
2025-03-27 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-03-24 Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding Tianyu Chen et.al. 2503.18578 null
2025-03-24 SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking Wenrui Cai et.al. 2503.18338 link
2025-03-23 Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding Ze Zhang et.al. 2503.18104 link
2025-03-22 Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM Codefuse et.al. 2503.17793 null
2025-03-25 Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Yike Yuan et.al. 2503.16057 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-03-20 Mixture of Lookup Experts Shibo Jie et.al. 2503.15798 link
2025-03-21 Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication Sin-Yu Huang et.al. 2503.15722 null
2025-03-19 SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation Thomas Pickard et.al. 2503.15358 null
2025-03-21 Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition Seungyeon Cho et.al. 2503.14960 null
2025-03-18 Core-Periphery Principle Guided State Space Model for Functional Connectome Classification Minheng Chen et.al. 2503.14655 null
2025-03-18 MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts Runqi Meng et.al. 2503.14355 null
2025-03-18 SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture Tian Qin et.al. 2503.13808 null
2025-03-17 Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge Shengling Qin et.al. 2503.13421 null
2025-03-17 Channel Estimation for Pinching-Antenna Systems (PASS) Jian Xiao et.al. 2503.13268 null
2025-03-17 Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation Yu Liu et.al. 2503.13254 null
2025-03-16 Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps Mohammad Al-Jarrah et.al. 2503.12633 link
2025-03-16 MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts Harshit et.al. 2503.12592 null
2025-03-16 MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification Jianwei Zhao et.al. 2503.12401 null
2025-03-15 Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection Qixian Chen et.al. 2503.12010 null
2025-03-14 FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA Jieming Bian et.al. 2503.11880 null
2025-03-14 A Review of DeepSeek Models’ Key Innovative Techniques Chengen Wang et.al. 2503.11486 null
2025-03-14 MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling Rachel S. Y. Teo et.al. 2503.11144 link
2025-03-13 Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores Chenpeng Wu et.al. 2503.10725 link
2025-03-14 dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis Luyuan Xie et.al. 2503.10412 null
2025-03-13 StableFusion: Continual Video Retrieval via Frame Adaptation Zecheng Zhao et.al. 2503.10111 link
2025-03-12 Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework Bakary Badjie et.al. 2503.09504 null
2025-03-12 Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment Nazanin Moradinasab et.al. 2503.09498 link
2025-03-12 Astrea: A MOE-based Visual Understanding Model with Progressive Alignment Xiaoda Yang et.al. 2503.09445 null
2025-03-12 Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach Ruifeng She et.al. 2503.09357 null
2025-03-12 Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference Mohammad Siavashi et.al. 2503.09304 null
2025-03-13 FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models Fufangchen Zhao et.al. 2503.09158 null
2025-03-11 MoE-Loco: Mixture of Experts for Multitask Locomotion Runhan Huang et.al. 2503.08564 null
2025-03-11 Accelerating MoE Model Inference with Expert Sharding Oana Balmau et.al. 2503.08467 null
2025-03-11 Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Junzhe Li et.al. 2503.08120 null
2025-03-11 MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Han Zhao et.al. 2503.08007 null
2025-03-10 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-03-10 A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications Siyuan Mu et.al. 2503.07137 link
2025-03-10 VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots Fu Chen et.al. 2503.07049 link
2025-03-10 ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration Mengting Ai et.al. 2503.06881 link
2025-03-10 eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference Suraiya Tairin et.al. 2503.06823 null
2025-03-09 MoFE: Mixture of Frozen Experts Architecture Jean Seo et.al. 2503.06491 null
2025-03-09 Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models Nguyen Do et.al. 2503.06413 link
2025-03-08 MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering Vinay Kumar Verma et.al. 2503.06296 null
2025-03-08 A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts Wenzhuo Du et.al. 2503.06064 null
2025-03-08 MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model Miguel Contreras et.al. 2503.06059 null
2025-03-07 Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning Justin Chih-Yao Chen et.al. 2503.05641 null
2025-03-07 FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework Jingyu Xu et.al. 2503.05626 null
2025-03-07 Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Weigao Sun et.al. 2503.05447 link
2025-03-07 Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs Ling Team et.al. 2503.05139 null
2025-03-07 Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Shwai He et.al. 2503.05066 null
2025-03-06 Continual Pre-training of MoEs: How robust is your router? Benjamin Thérien et.al. 2503.05029 null
2025-03-06 Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Houyi Li et.al. 2503.04715 null
2025-03-07 Question-Aware Gaussian Experts for Audio-Visual Question Answering Hongyeob Kim et.al. 2503.04459 link
2025-03-07 Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling Yan Li et.al. 2503.04398 null
2025-03-06 A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery Yiheng Zhu et.al. 2503.04362 null
2025-03-06 DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval Yating Liu et.al. 2503.04144 null
2025-03-05 VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection Enkhtogtokh Togootogtokh et.al. 2503.03797 link
2025-03-05 Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs Haoran Fan et.al. 2503.03594 link
2025-03-05 Convergence Rates for Softmax Gating Mixture of Experts Huy Nguyen et.al. 2503.03213 null
2025-03-04 MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation Weihang Wang et.al. 2503.02799 link
2025-03-04 FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting Congluo Xu et.al. 2503.02692 null
2025-03-04 Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Yujiao Yang et.al. 2503.02495 link
2025-03-04 Tabby: Tabular Data Synthesis with Language Models Sonia Cromp et.al. 2503.02152 null
2025-03-03 ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition Nastaran Mansourian et.al. 2503.01750 null
2025-03-03 Effective High-order Graph Representation Learning for Credit Card Fraud Detection Yao Zou et.al. 2503.01556 null
2025-03-03 DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models Yongqi Huang et.al. 2503.01359 null
2025-03-03 PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation Linhai Zhang et.al. 2503.01303 null
2025-03-03 Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting Xiaobin Hong et.al. 2503.01157 null
2025-03-02 Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion Daiki Nishiyama et.al. 2503.00925 null
2025-03-01 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Zhongyang Li et.al. 2502.20395 link
2025-02-27 Mixture of Experts for Recognizing Depression from Interview and Reading Tasks Loukas Ilias et.al. 2502.20213 null
2025-02-27 Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Zeyi Ren et.al. 2502.20183 null
2025-02-27 UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook Yidi Jiang et.al. 2502.20067 null
2025-03-01 Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts Shulai Zhang et.al. 2502.19811 link
2025-02-26 Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Taishi Nakamura et.al. 2502.19261 null
2025-02-26 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment Jiaxin Deng et.al. 2502.18965 null
2025-02-25 Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking Changyuan Zhao et.al. 2502.18118 null
2025-02-24 The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE Andrei Chernov et.al. 2502.17391 null
2025-02-24 Delta Decompression for MoE-based LLMs Compression Hao Gu et.al. 2502.17298 link
2025-02-24 Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks Andrei Chernov et.al. 2502.17187 null
2025-02-24 Muon is Scalable for LLM Training Jingyuan Liu et.al. 2502.16982 link
2025-02-24 BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference Zewen Jin et.al. 2502.16927 null
2025-02-24 ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds Jiho Han et.al. 2502.16914 null
2025-02-26 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Chenghao Fan et.al. 2502.16894 link
2025-02-22 An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning Masoud Shokrnezhad et.al. 2502.16198 null
2025-02-21 A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs Yuan Sun et.al. 2502.15451 link
2025-02-21 Tight Clusters Make Specialized Experts Stefan K. Nielsen et.al. 2502.15315 link
2025-02-21 Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction Baohang Zhou et.al. 2502.15290 link
2025-02-20 Ray-Tracing for Conditionally Activated Neural Networks Claudio Gallicchio et.al. 2502.14788 null
2025-02-21 ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model Zhongyi Zhou et.al. 2502.14420 link
2025-02-19 Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts Xin Li et.al. 2502.13577 null
2025-02-18 MoBA: Mixture of Block Attention for Long-Context LLMs Enzhe Lu et.al. 2502.13189 link
2025-02-18 Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models Gyeongman Kim et.al. 2502.12947 null
2025-02-18 DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs Minxuan Lv et.al. 2502.12455 null
2025-02-17 From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs Kumari Nishu et.al. 2502.12325 null
2025-02-17 Accurate Expert Predictions in MoE Inference via Cross-Layer Gate Zhiyuan Fang et.al. 2502.12224 null
2025-02-17 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Ayan Sengupta et.al. 2502.12051 null
2025-02-17 Connector-S: A Survey of Connectors in Multi-modal Large Language Models Xun Zhu et.al. 2502.11453 null
2025-02-16 Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time Robert Dahlke et.al. 2502.11096 null
2025-02-16 ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models Shixuan Li et.al. 2502.11059 null
2025-02-15 Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization Matthew Lyle Olson et.al. 2502.10928 null
2025-02-12 Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution Bowen Chen et.al. 2502.09654 link
2025-02-14 Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting Nicholas Dronen et.al. 2502.09500 link
2025-02-12 The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities Ning Li et.al. 2502.08381 null
2025-02-12 Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification Xuanze Chen et.al. 2502.08083 null
2025-02-13 Training Sparse Mixture Of Experts Text Embedding Models Zach Nussbaum et.al. 2502.07972 link
2025-02-11 Memory Analysis on the Training Course of DeepSeek Models Ping Zhang et.al. 2502.07846 null
2025-02-11 MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks Lotfi Abdelkrim Mecharbat et.al. 2502.07422 null
2025-02-11 Online Aggregation of Trajectory Predictors Alex Tong et.al. 2502.07178 null
2025-02-09 Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline Zhiyuan Fang et.al. 2502.06888 null
2025-02-10 MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing Seokjin Go et.al. 2502.06643 null
2025-02-10 Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Haiduo Huang et.al. 2502.06282 link
2025-02-10 Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models Peiran Wang et.al. 2502.06094 null
2025-02-08 Mol-MoE: Training Preference-Guided Routers for Molecule Generation Diego Calanzone et.al. 2502.05633 link
2025-02-08 UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA Jiale Dong et.al. 2502.05602 link
2025-02-07 fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving Hanfei Yu et.al. 2502.05370 null
2025-02-07 Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Roussel Desmond Nzoyem et.al. 2502.05335 null
2025-02-07 Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient Jan Ludziejewski et.al. 2502.05172 null
2025-02-06 Mixture of neural operator experts for learning boundary conditions and model selection Dwyer Deighan et.al. 2502.04562 null
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-06 Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning Peizhuang Cong et.al. 2502.03884 null
2025-02-05 (GG) MoE vs. MLP on Tabular Data Andrei Chernov et.al. 2502.03608 null
2025-02-05 RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts Tuan Truong et.al. 2502.03044 null
2025-02-05 On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation Nghiem T. Diep et.al. 2502.03029 null
2025-02-05 Scaling Laws for Upcycling Mixture-of-Experts Language Models Seng Pei Liew et.al. 2502.03009 null
2025-02-04 ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals Jianan Nie et.al. 2502.02748 null
2025-02-04 Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism Yuhao Qing et.al. 2502.02581 null
2025-02-05 Brief analysis of DeepSeek R1 and its implications for Generative AI Sarah Mercer et.al. 2502.02523 null
2025-02-04 M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference Nikhil Bhendawade et.al. 2502.02040 null
2025-02-05 MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Haibo Tong et.al. 2502.01719 null
2025-02-04 MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs Yuhang Zhou et.al. 2502.00997 null
2025-02-03 CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling Xinze Wang et.al. 2502.00965 null
2025-02-02 UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs Yufei He et.al. 2502.00806 link
2025-02-02 Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective Yujin Oh et.al. 2502.00619 link
2025-02-01 PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning Yu Feng et.al. 2502.00354 link
2025-02-01 Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective Fanqi Yan et.al. 2502.00281 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-01-31 Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning Minh Le et.al. 2501.18936 null
2025-01-30 MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability Yan Sun et.al. 2501.18439 null
2025-01-29 Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework Jung-Hua Liu et.al. 2501.17903 null
2025-01-29 Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks Lucio La Cava et.al. 2501.17557 null
2025-01-28 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow Yueen Ma et.al. 2501.16698 null
2025-01-27 MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) Xu Yang et.al. 2501.16454 null
2025-01-27 Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference Yinghan Li et.al. 2501.16103 null
2025-01-25 ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning Shangqian Gao et.al. 2501.15316 null
2025-01-25 FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts Ziqi Liu et.al. 2501.15125 link
2025-01-25 Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning Ziyu Zhao et.al. 2501.15103 null
2025-01-24 Mean-field limit from general mixtures of experts to quantum neural networks Anderson Melchor Hernandez et.al. 2501.14660 null
2025-01-24 Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation Shengzhe Zhang et.al. 2501.14269 link
2025-01-24 Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images Zeyun Deng et.al. 2501.14198 null
2025-01-23 CSAOT: Cooperative Multi-Agent System for Active Object Tracking Hy Nguyen et.al. 2501.13994 null
2025-01-22 Autonomy-of-Experts Models Ang Lv et.al. 2501.13074 null
2025-01-22 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR Guodong Ma et.al. 2501.12602 null
2025-01-21 Modality Interactive Mixture-of-Experts for Fake News Detection Yifan Liu et.al. 2501.12431 link
2025-01-21 SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection Xiaocheng Zhang et.al. 2501.12430 null
2025-01-21 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Samira Abnar et.al. 2501.12370 null
2025-01-21 MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks Qishen Zhou et.al. 2501.12281 link
2025-01-21 Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Zihan Qiu et.al. 2501.11873 null
2025-01-18 FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models Xinglin Pan et.al. 2501.10714 null
2025-01-17 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning Jinyuan Feng et.al. 2501.10062 null
2025-01-17 LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading Kuan-Ming Liu et.al. 2501.09636 null
2025-01-14 MiniMax-01: Scaling Foundation Models with Lightning Attention MiniMax et.al. 2501.08313 null
2025-01-14 GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism Chen Tang et.al. 2501.07890 null
2025-01-18 PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration Xiaoshui Huang et.al. 2501.07762 null
2025-01-13 A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis Binyu Zhang et.al. 2501.07016 link
2025-01-12 Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning Hanwen Zhong et.al. 2501.06884 link
2025-01-10 TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning Yinghao Zhu et.al. 2501.05661 link
2025-01-09 Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing Mengfan Liu et.al. 2501.05313 null
2025-01-07 LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Xiang Xu et.al. 2501.04004 link
2025-01-07 mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training Xudong Liao et.al. 2501.03905 null
2025-01-08 Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection Donatella Genovese et.al. 2501.03432 null
2025-01-12 Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning Zhongyi Zhou et.al. 2501.02198 null
2025-01-03 MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao et.al. 2501.01709 null
2025-01-01 REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization Huyen Nguyen et.al. 2501.00779 null
2025-01-06 Superposition in Transformers: A Novel Way of Building Mixture of Experts Ayoub Ben Chaliah et.al. 2501.00530 link
2024-12-31 CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Xiaolei Wang et.al. 2501.00346 null
2024-12-29 Multimodal Variational Autoencoder: a Barycentric View Peijie Qiu et.al. 2412.20487 null
2024-12-29 A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement Sidra Nasir et.al. 2412.20468 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection Yaning Zhang et.al. 2412.20156 null
2024-12-27 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-26 AskChart: Universal Chart Understanding through Textual Enhancement Xudong Yang et.al. 2412.19146 link
2024-12-30 Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Xiaoyu Huang et.al. 2412.19108 null
2024-12-24 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 link
2024-12-24 BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing Yingjie Ma et.al. 2412.18065 link
2024-12-23 UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition Li Fu et.al. 2412.17507 null
2024-12-23 BrainMAP: Learning Multiple Activation Pathways in Brain Networks Song Wang et.al. 2412.17404 link
2024-12-22 Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models Elie Antoine et.al. 2412.16971 null
2024-12-20 Theory of Mixture-of-Experts for Mobile Edge Computing Hongbo Li et.al. 2412.15690 null
2024-12-19 MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale Swapnil Gandhi et.al. 2412.15411 null
2024-12-19 Qwen2.5 Technical Report Qwen et.al. 2412.15115 link
2024-12-19 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Ziteng Wang et.al. 2412.14711 link
2024-12-18 A Survey on Inference Optimization Techniques for Mixture of Experts Models Jiacheng Liu et.al. 2412.14219 link
2024-12-18 SEKE: Specialised Experts for Keyword Extraction Matej Martinc et.al. 2412.14087 link
2024-12-18 MedCoT: Medical Chain of Thought via Hierarchical Expert Jiaxiang Liu et.al. 2412.13736 link
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 link
2024-12-17 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Moritz Reuss et.al. 2412.12953 null
2024-12-17 CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition He Wang et.al. 2412.12760 null
2024-12-16 Investigating Mixture of Experts in Dense Retrieval Effrosyni Sokli et.al. 2412.11864 null
2024-12-18 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Jingze Shi et.al. 2412.11834 link
2024-12-16 Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation Svetlana Pavlitska et.al. 2412.11608 link
2024-12-16 Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture Jingyu Xu et.al. 2412.11557 null
2024-12-14 DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification Yuhao Wang et.al. 2412.10650 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Llama 3 Meets MoE: Efficient Upcycling Aditya Vavre et.al. 2412.09952 link
2024-12-12 Memory Layers at Scale Vincent-Pierre Berges et.al. 2412.09764 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-12 Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Minh Le et.al. 2412.08285 null
2024-12-11 Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification Xuanze Chen et.al. 2412.08193 link
2024-12-10 MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems Yao Fu et.al. 2412.07067 null
2024-12-07 Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts Arturo Rodriguez et.al. 2412.06842 null
2024-12-09 Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset Xiao Wang et.al. 2412.06647 link
2024-12-09 UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts Zhen Wan et.al. 2412.06340 null
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-10 An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism Qing Zhang et.al. 2412.05821 null
2024-12-10 RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Xu Liu et.al. 2412.05679 link
2024-12-07 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Gengze Zhou et.al. 2412.05552 link
2024-12-07 Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers Boxun Xu et.al. 2412.05540 null
2024-12-06 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-09 Monet: Mixture of Monosemantic Experts for Transformers Jungwoo Park et.al. 2412.04139 link
2024-12-05 Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks Zhaoyang Liu et.al. 2412.03850 null
2024-12-04 Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond Loukas Ilias et.al. 2412.03483 null
2024-12-05 MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption Siddhant Dutta et.al. 2412.01858 null
2024-12-05 Yi-Lightning Technical Report 01.AI et.al. 2412.01253 null
2024-11-30 Mixture of Experts for Node Classification Yu Shi et.al. 2412.00418 null
2024-11-30 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting Shaohan Yu et.al. 2412.00316 null
2024-11-27 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar et.al. 2412.00099 null
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References Shuguo Jiang et.al. 2411.19758 null
2024-11-28 On the effectiveness of discrete representations in sparse mixture of experts Giang Do et.al. 2411.19402 null
2024-11-28 Bayesian Cluster Weighted Gaussian Models Panagiotis Papastamoulis et.al. 2411.18957 link
2024-11-27 UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS Haomin Zhuang et.al. 2411.18797 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Mixture of Experts in Image Classification: What’s the Sweet Spot? Mathurin Videau et.al. 2411.18322 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-25 Staleness-Centric Optimizations for Efficient Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-11-29 MH-MoE: Multi-Head Mixture-of-Experts Shaohan Huang et.al. 2411.16205 null
2024-11-25 LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy Peng Cui et.al. 2411.16095 null
2024-11-24 Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution Haiquan Wang et.al. 2411.15871 null
2024-11-24 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Xiaoye Qu et.al. 2411.15708 link
2024-11-23 Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen et.al. 2411.15432 null
2024-11-23 Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen et.al. 2411.15419 null
2024-11-20 MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification Yuxuan Chen et.al. 2411.13004 null
2024-11-23 KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning Ming Yin et.al. 2411.12950 null
2024-11-19 Ultra-Sparse Memory Network Zihao Huang et.al. 2411.12364 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 link
2024-11-21 Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models Wei Wang et.al. 2411.10003 null
2024-11-13 Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection Vima Gupta et.al. 2411.08982 null
2024-11-13 Sparse Upcycling: Inference Inefficient Finetuning Sasha Doubov et.al. 2411.08968 null
2024-11-13 LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Xiaonan Nie et.al. 2411.08446 null
2024-11-12 Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach Renzi Wang et.al. 2411.08232 null
2024-11-12 PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model Yilun Liu et.al. 2411.08212 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge Emmanuel Azuh Mensah et.al. 2411.07834 null
2024-11-11 Adaptive Conditional Expert Selection Network for Multi-domain Recommendation Kuiyao Dong et.al. 2411.06826 null
2024-11-11 WDMoE: Wireless Distributed Mixture of Experts for Large Language Models Nan Xue et.al. 2411.06681 null
2024-11-09 Learning Mixtures of Experts with EM Quentin Fruytier et.al. 2411.06056 null
2024-11-08 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts Yen-Ting Lin et.al. 2411.05945 null
2024-11-05 DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts Zelin Yao et.al. 2411.03025 link
2024-11-05 Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts Yuan Xie et.al. 2411.02787 null
2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Xingwu Sun et.al. 2411.02265 null
2024-11-04 FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation Ziwei Zhan et.al. 2411.02115 null
2024-11-03 RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin et.al. 2411.01595 null
2024-11-03 Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation Mingrui Liu et.al. 2411.01457 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-07 HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy Shuqing Luo et.al. 2411.01288 link
2024-11-02 PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment Dongxu Liu et.al. 2411.01245 null
2024-11-01 MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition Cheng Yang et.al. 2411.01016 null
2024-11-01 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen et.al. 2411.00918 link
2024-11-01 MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization Jingming Guo et.al. 2411.00662 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-30 Efficient and Interpretable Grammatical Error Correction with Mixture of Experts Muhammad Reza Qorib et.al. 2410.23507 link
2024-10-30 Stealing User Prompts from Mixture of Experts Itay Yona et.al. 2410.22884 null
2024-10-30 MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning Xujia Wang et.al. 2410.22782 null
2024-10-29 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2024-10-29 Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging Li Shen et.al. 2410.21804 null
2024-10-29 Neural Experts: Mixture of Experts for Implicit Neural Representations Yizhak Ben-Shabat et.al. 2410.21643 null
2024-10-28 FinTeamExperts: Role Specialized MOEs For Financial Analysis Yue Yu et.al. 2410.21338 null
2024-10-28 Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving Jiyao Wang et.al. 2410.21086 null
2024-10-27 Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen et.al. 2410.20336 null
2024-10-27 GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields Yusuke Sekikawa et.al. 2410.20306 null
2024-10-25 DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction Zelin Zang et.al. 2410.19504 link
2024-10-25 Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis Weikai Li et.al. 2410.19225 link
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-24 Mixture of Parrots: Experts improve memorization more than reasoning Samy Jelassi et.al. 2410.19034 null
2024-10-24 MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases Zhisheng Lin et.al. 2410.18406 null
2024-10-23 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches Kexin Feng et.al. 2410.18298 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-23 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling Jialong Li et.al. 2410.17043 null
2024-10-21 LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang et.al. 2410.16095 link
2024-10-22 CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Zhenpeng Su et.al. 2410.16077 link
2024-10-21 Generalizing Motion Planners with Mixture of Experts for Autonomous Driving Qiao Sun et.al. 2410.15774 link
2024-10-21 ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Xumeng Han et.al. 2410.15732 null
2024-10-20 Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Xin Zhou et.al. 2410.15438 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Suning Huang et.al. 2410.14972 null
2024-10-18 MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts Rachel S. Y. Teo et.al. 2410.14574 link
2024-10-18 ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction Haoyu He et.al. 2410.14099 link
2024-10-17 Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks Jinze Zhao et.al. 2410.13964 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts Fanqi Yan et.al. 2410.12258 null
2024-10-16 EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference Yulei Qian et.al. 2410.12247 null
2024-10-15 MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie et.al. 2410.12013 null
2024-10-15 MoH: Multi-Head Attention as Mixture-of-Head Attention Peng Jin et.al. 2410.11842 link
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 link
2024-10-15 Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models James Vo et.al. 2410.11654 null
2024-10-16 Quadratic Gating Functions in Mixture of Experts: A Statistical Insight Pedram Akbarian et.al. 2410.11222 null
2024-10-16 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469 null
2024-10-15 Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue et.al. 2410.10456 null
2024-10-14 Tighter Risk Bounds for Mixtures of Experts Wissam Akretche et.al. 2410.10397 null
2024-10-14 Scalable Multi-Domain Adaptation of Language Models using Modular Experts Peter Schafhalter et.al. 2410.10181 null
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 link
2024-10-14 AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Peijun Qing et.al. 2410.10054 link
2024-10-13 ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL Zhanqiu Guo et.al. 2410.09781 null
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering I-Chun Chen et.al. 2410.08589 link
2024-10-10 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Efficient Dictionary Learning with Switch Sparse Autoencoders Anish Mudide et.al. 2410.08201 link
2024-10-10 More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing Sagi Shaier et.al. 2410.08003 link
2024-10-10 SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture Jiayi Han et.al. 2410.07739 null
2024-10-10 Upcycling Large Language Models into Mixture of Experts Ethan He et.al. 2410.07524 null
2024-10-09 MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts Peng Jin et.al. 2410.07348 link
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Probing the Robustness of Theory of Mind in Large Language Models Christian Nickel et.al. 2410.06271 null
2024-10-08 MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More Wei Huang et.al. 2410.06270 link
2024-10-08 Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li et.al. 2410.05993 link
2024-10-08 Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models Siqi Wang et.al. 2410.05661 null
2024-10-07 Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild Xinyu Zhao et.al. 2410.05357 link
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-09 Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding Wei Wu et.al. 2410.03553 null
2024-10-04 Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang et.al. 2410.03440 link
2024-10-03 MLP-KAN: Unifying Deep Representation and Function Learning Yunhong He et.al. 2410.03027 link
2024-10-03 On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions Huy Nguyen et.al. 2410.02935 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping Ziye Huang et.al. 2410.02475 null
2024-10-03 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Zhaojian Yu et.al. 2410.02241 null
2024-10-03 Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts Minh Le et.al. 2410.02200 link
2024-10-04 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices Andres Potapczynski et.al. 2410.02117 link
2024-10-04 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL Ghada Sokar et.al. 2410.01930 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Tingfeng Hui et.al. 2410.01610 null
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-01 MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Sheng Wang et.al. 2410.00938 null
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Robust Traffic Forecasting against Spatial Shift over Years Hongjun Wang et.al. 2410.00373 link
2024-09-29 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method Chaohui Xu et.al. 2410.00059 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-09-27 SciDFM: A Large Language Model with Mixture-of-Experts for Science Liangtai Sun et.al. 2409.18412 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-26 A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction Guangyu Wang et.al. 2409.17440 link
2024-09-24 Leveraging Mixture of Experts for Improved Speech Deepfake Detection Viola Negroni et.al. 2409.16077 null
2024-10-02 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Xiaoming Shi et.al. 2409.16040 link
2024-09-24 Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Fengrun Zhang et.al. 2409.15905 null
2024-09-24 Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks Jiayi He et.al. 2409.15695 null
2024-09-23 A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts Hugo Inzirillo et.al. 2409.15161 link
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-21 Routing in Sparsely-gated Language Models responds to Context Stefan Arnold et.al. 2409.14107 null
2024-09-20 On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan et.al. 2409.13931 link
2024-09-20 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning Annette Spooner et.al. 2409.13791 null
2024-09-19 Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu et.al. 2409.12370 null
2024-09-18 GRIN: GRadient-INformed MoE Liyuan Liu et.al. 2409.12136 null
2024-09-18 Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 Zhiyong Wang et.al. 2409.11909 link
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-09-19 LOLA – An Open-Source Massively Multilingual Large Language Model Nikit Srivastava et.al. 2409.11272 link
2024-09-16 Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression Yi-Hsin Li et.al. 2409.10101 null
2024-09-14 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-10 DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models Maryam Akhavan Aghdam et.al. 2409.06669 null
2024-09-10 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Jaeseong Lee et.al. 2409.06211 null
2024-09-10 VE: Modeling Multivariate Time Series Correlation with Variate Embedding Shangjiong Wang et.al. 2409.06169 link
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 link
2024-09-09 Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks Bo Xu et.al. 2409.05726 null
2024-09-09 Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection Tianwu Lei et.al. 2409.05611 null
2024-09-05 Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions Zemian Ke et.al. 2409.03282 null
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 xLAM: A Family of Large Action Models to Empower AI Agent Systems Jianguo Zhang et.al. 2409.03215 link
2024-09-04 Configurable Foundation Models: Building LLMs from a Modular Perspective Chaojun Xiao et.al. 2409.02877 null
2024-09-04 Pluralistic Salient Object Detection Xuelu Feng et.al. 2409.02368 null
2024-09-03 OLMoE: Open Mixture-of-Experts Language Models Niklas Muennighoff et.al. 2409.02060 link
2024-09-05 Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model Hukai Huang et.al. 2409.02050 null
2024-09-02 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning Soumajyoti Sarkar et.al. 2409.01483 null
2024-09-02 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching Sungmin Yun et.al. 2409.01141 null
2024-09-04 Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack Guanzhong Chen et.al. 2409.00960 link
2024-09-02 Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts Youngseog Chung et.al. 2409.00879 null
2024-08-29 Gradient-free variational learning with conditional mixture networks Conor Heins et.al. 2408.16429 link
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-08-28 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Lean Wang et.al. 2408.15664 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-27 MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce Hao Jiang et.al. 2408.14968 null
2024-08-24 Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings Sagar Srinivas Sakhinana et.al. 2408.13622 null
2024-08-23 The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities Venkatesh Balavadhani Parthasarathy et.al. 2408.13296 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 link
2024-08-23 Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth Yuxiang Wei et.al. 2408.12803 null
2024-08-23 La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection Hang Zou et.al. 2408.12793 null
2024-08-22 SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging Mohammadreza Pourreza et.al. 2408.12733 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 null
2024-08-22 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 link
2024-08-21 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou et.al. 2408.11396 link
2024-08-21 KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? Xiao Han et.al. 2408.11306 link
2024-08-21 FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts Hanzi Mei et.al. 2408.11304 null
2024-08-20 Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data Atmika Gorti et.al. 2408.11247 null
2024-08-20 Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting Jianxiang Zhou et.al. 2408.10822 link
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700 link
2024-08-20 HMoE: Heterogeneous Mixture of Experts for Language Modeling An Wang et.al. 2408.10681 null
2024-08-19 AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference Shuzhang Zhong et.al. 2408.10284 link
2024-08-17 FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models Xiaochen Wang et.al. 2408.10276 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method Hang Zou et.al. 2408.09752 null
2024-08-16 Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection Haohao Zhu et.al. 2408.08551 link
2024-08-17 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-08-14 Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation CanYi Liu et.al. 2408.07427 null
2024-08-13 A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning Prateek Yadav et.al. 2408.07057 null
2024-08-13 Layerwise Recurrent Router for Mixture-of-Experts Zihan Qiu et.al. 2408.06793 link
2024-08-13 AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies Bo-Wen Zhang et.al. 2408.06567 null
2024-08-10 HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou Xu Wang et.al. 2408.05430 null
2024-08-08 Understanding the Performance and Estimating the Cost of LLM Fine-Tuning Yuchen Xia et.al. 2408.04693 link
2024-08-08 Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training Weilin Cai et.al. 2408.04307 null
2024-08-08 LaDiMo: Layer-wise Distillation Inspired MoEfier Sungyoon Kim et.al. 2408.04278 null
2024-08-07 MoExtend: Tuning New Experts for Modality and Task Extension Shanshan Zhong et.al. 2408.03511 link
2024-08-05 Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization Changtao Miao et.al. 2408.02306 null
2024-08-02 HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction Xingyu Lou et.al. 2408.01332 null
2024-08-01 Multimodal Fusion and Coherence Modeling for Video Topic Segmentation Hai Yu et.al. 2408.00365 null
2024-08-12 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Xi Victoria Lin et.al. 2407.21770 null
2024-07-31 PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning Min Jae Jung et.al. 2407.21571 null
2024-07-30 Distribution Learning for Molecular Regression Nima Shoghi et.al. 2407.20475 null
2024-07-29 Time series forecasting with high stakes: A field study of the air cargo industry Abhinav Garg et.al. 2407.20192 null
2024-07-30 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain et.al. 2407.19985 null
2024-07-28 Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models Mohammed Al-Maamari et.al. 2407.19610 link
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908 null
2024-07-26 MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition Chang Liu et.al. 2407.18616 link
2024-07-26 Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition Hukai Huang et.al. 2407.18581 link
2024-07-25 How Lightweight Can A Vision Transformer Be Jen Hong Tan et.al. 2407.17783 null
2024-07-24 Exploring Domain Robust Lightweight Reward Models based on Router Mechanism Hyuk Namgoong et.al. 2407.17546 null
2024-07-24 M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis Junyu Li et.al. 2407.17267 link
2024-07-25 Cheems: Wonderful Matrices More Efficient and More Effective Architecture Jingze Shi et.al. 2407.16958 null
2024-07-22 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag et.al. 2407.15811 link
2024-07-22 Norface: Improving Facial Expression Analysis by Identity Normalization Hanwei Liu et.al. 2407.15617 link
2024-07-19 Mixture of Experts with Mixture of Precisions for Tuning Quality of Service HamidReza Imani et.al. 2407.14417 null
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-19 Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models Qiong Wu et.al. 2407.14093 null
2024-07-18 Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts Francesco Folino et.al. 2407.13526 null
2024-07-18 Mixture of Experts based Multi-task Supervise Learning from Crowds Tao Han et.al. 2407.13268 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-18 Qwen2 Technical Report An Yang et.al. 2407.10671 link
2024-07-15 Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering Francesco Di Sario et.al. 2407.10389 null
2024-07-13 Low-Rank Interconnected Adaptation Across Layers Yibo Zhong et.al. 2407.09946 link
2024-07-13 MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Zhenpeng Su et.al. 2407.09816 link
2024-07-12 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Zeliang Zhang et.al. 2407.09590 null
2024-07-11 An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio Siding Zeng et.al. 2407.08239 null
2024-07-10 MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations Vignesh Prasad et.al. 2407.07636 link
2024-07-10 Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation Szymon Płotka et.al. 2407.07514 link
2024-07-09 A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts Atilla Özgür et.al. 2407.06718 null
2024-07-06 SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation Guoan Wang et.al. 2407.04938 null
2024-07-06 Completed Feature Disentanglement Learning for Multimodal MRIs Analysis Tianling Liu et.al. 2407.04916 link
2024-07-05 YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation Sungkyun Chang et.al. 2407.04822 link
2024-07-05 Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement Yongji Wu et.al. 2407.04656 null
2024-07-05 MobileFlow: A Multimodal LLM For Mobile GUI Agent Songqin Nong et.al. 2407.04346 null
2024-07-04 Mixture of A Million Experts Xu Owen He et.al. 2407.04153 null
2024-07-02 Terminating Differentiable Tree Experts Jonathan Thomm et.al. 2407.02060 null
2024-07-05 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Zihan Wang et.al. 2407.01906 link
2024-07-01 Uncertainty Quantification in Table Structure Recognition Kehinde Ajayi et.al. 2407.01731 link
2024-07-01 Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning Yixiao Wang et.al. 2407.01531 null
2024-07-01 Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation Nadezhda Chirkova et.al. 2407.01126 null
2024-07-01 Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs Enshu Liu et.al. 2407.00945 link
2024-07-03 Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules Xinglin Pan et.al. 2407.00599 link
2024-06-28 One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Ruochen Wang et.al. 2407.00256 link
2024-06-28 LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models Renzhi Wang et.al. 2406.20030 null
2024-06-28 Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang et.al. 2406.19905 link
2024-06-28 SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR Qiuming Zhao et.al. 2406.19706 link
2024-06-27 A Teacher Is Worth A Million Instructions Nikhil Kothari et.al. 2406.19112 link
2024-06-27 Towards Personalized Federated Multi-scenario Multi-task Recommendation Yue Ding et.al. 2406.18938 null
2024-06-26 Mixture of Experts in a Mixture of RL settings Timon Willi et.al. 2406.18420 null
2024-06-26 A Closer Look into Mixture-of-Experts in Large Language Models Ka Man Lo et.al. 2406.18219 link
2024-06-26 SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Shuaishuai Ye et.al. 2406.18021 null
2024-06-24 Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction Bruce Rushing et.al. 2406.17150 link
2024-06-24 LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Tong Zhu et.al. 2406.16554 link
2024-06-25 OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser Jingze Shi et.al. 2406.16495 link
2024-06-24 Theory on Mixture-of-Experts in Continual Learning Hongbo Li et.al. 2406.16437 null
2024-06-22 SimSMoE: Solving Representational Collapse via Similarity Measure Giang Do et.al. 2406.15883 null
2024-06-20 Voice Disorder Analysis: a Transformer-based Approach Alkis Koudounas et.al. 2406.14693 link
2024-06-19 Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation Qian Chen et.al. 2406.13583 null
2024-06-19 AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Zihao Zeng et.al. 2406.13233 link
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts Yuhao Dan et.al. 2406.12548 null
2024-06-18 Variational Distillation of Diffusion Policies into Mixture of Experts Hongyi Zhou et.al. 2406.12538 null
2024-06-18 GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory Haoze Wu et.al. 2406.12375 link
2024-06-17 Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding Ukyo Honda et.al. 2406.12060 link
2024-06-17 DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence DeepSeek-AI et.al. 2406.11931 link
2024-06-17 Graph Knowledge Distillation to Mixture of Experts Pavel Rumiantsev et.al. 2406.11919 link
2024-06-17 MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts Guanjie Chen et.al. 2406.11353 link
2024-06-17 Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Tong Zhu et.al. 2406.11256 link
2024-06-14 Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion Anke Tang et.al. 2406.09770 link
2024-06-13 DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts Joel Ong et.al. 2406.08742 link
2024-06-12 Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark Pingzhi Li et.al. 2406.08155 link
2024-06-11 Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Yixin Song et.al. 2406.05955 null
2024-06-08 Flexible and Adaptable Summarization via Expertise Separation Xiuying Chen et.al. 2406.05360 link
2024-06-07 MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter Jitai Hao et.al. 2406.04984 link
2024-06-07 MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks Xingkui Zhu et.al. 2406.04801 link
2024-06-05 Style Mixture of Experts for Expressive Text-To-Speech Synthesis Ahad Jawaid et.al. 2406.03637 null
2024-06-05 Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach Haoyu Han et.al. 2406.03464 null
2024-06-05 Continual Traffic Forecasting via Mixture of Experts Sanghyun Lee et.al. 2406.03140 null
2024-06-05 Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models Raeid Saqur et.al. 2406.02969 null
2024-06-04 Parrot: Multilingual Visual Instruction Tuning Hai-Long Sun et.al. 2406.02539 link
2024-06-04 Demystifying the Compression of Mixture-of-Experts Through a Unified Framework Shwai He et.al. 2406.02500 link
2024-06-02 Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts – Physics Informed Neural Operator Forward Model Clement Etienam et.al. 2406.00889 link
2024-06-01 A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers Daniel Waxman et.al. 2406.00570 link
2024-06-01 Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks Jiacheng Wang et.al. 2406.00408 null
2024-05-30 Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach Reza Arabpour et.al. 2405.20094 null
2024-06-02 MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors Renzhi Wang et.al. 2405.19086 null
2024-06-02 Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design Markus J. Buehler et.al. 2405.19076 link
2024-05-29 Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization Shengcai Liu et.al. 2405.18884 link
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 Yuan 2.0-M32: Mixture of Experts with Attention Router Shaohua Wu et.al. 2405.17976 link
2024-05-28 LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design Rui Kong et.al. 2405.17741 null
2024-05-27 Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node Andreas Charalampopoulos et.al. 2405.16836 link
2024-05-26 Mixture of Experts Using Tensor Products Zhan Su et.al. 2405.16671 link
2024-05-30 A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts Mohammed Nowaz Rabbani Chowdhury et.al. 2405.16646 null
2024-05-26 Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation Rongyu Zhang et.al. 2405.16486 link
2024-05-25 MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás et.al. 2405.16039 link
2024-05-23 Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training Xianzhi Du et.al. 2405.15052 link
2024-05-23 Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast Chufan Shi et.al. 2405.14507 link
2024-05-23 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Yongxin Guo et.al. 2405.14297 link
2024-05-23 Graph Sparsification via Mixture of Graphs Guibin Zhang et.al. 2405.14260 link
2024-05-23 Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts Huy Nguyen et.al. 2405.14131 null
2024-05-23 Mixture of Experts Meets Prompt-Based Continual Learning Minh Le et.al. 2405.14124 link
2024-05-22 Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts Huy Nguyen et.al. 2405.13997 null
2024-05-22 xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token Xin Cheng et.al. 2405.13792 link
2024-05-24 MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models Jingwei Xu et.al. 2405.13053 link
2024-05-21 Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts Ruichen Zhang et.al. 2405.12472 null
2024-05-21 Ensemble and Mixture-of-Experts DeepONets For Operator Learning Ramansh Sharma et.al. 2405.11907 link
2024-05-19 Learning More Generalized Experts by Merging Experts in Mixture-of-Experts Sejik Park et.al. 2405.11530 null
2024-05-18 Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Yunxin Li et.al. 2405.11273 link
2024-05-16 Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts Ruolin Su et.al. 2405.09744 null
2024-05-15 M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts Yufeng Jiang et.al. 2405.09446 link
2024-05-13 Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition Zhiyong Yang et.al. 2405.07780 link
2024-05-07 SUTRA: Scalable Multilingual Language Model Architecture Abhijit Bendale et.al. 2405.06694 null
2024-05-09 A Mixture of Experts Approach to 3D Human Motion Prediction Edmund Shieh et.al. 2405.06088 link
2024-05-09 A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds Christopher Z. Cui et.al. 2405.06059 null
2024-05-09 EWMoE: An effective model for global weather forecasting with mixture-of-experts Lihao Gan et.al. 2405.06004 link
2024-05-09 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Jiachen Li et.al. 2405.05949 link
2024-05-16 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI et.al. 2405.04434 link
2024-05-07 Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts Changyuan Zhao et.al. 2405.04198 null
2024-05-06 Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training Zexuan Zhong et.al. 2405.03133 null
2024-05-06 WDMoE: Wireless Distributed Large Language Models with Mixture of Experts Nan Xue et.al. 2405.03131 null

<a href=#updated-on-20251022>(back to top)</a>