Updated on 2025.06.28
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-17 | Utility-Driven Speculative Decoding for Mixture-of-Experts | Anish Saxena et.al. | 2506.20675 | null |
2025-06-25 | Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU | He Sun et.al. | 2506.20187 | null |
2025-06-24 | MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection | Zhengxiang Huang et.al. | 2506.19884 | null |
2025-06-23 | Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation | Ahmadreza Saboor Yaraghi et.al. | 2506.19045 | null |
2025-06-23 | WiLLM: An Open Wireless LLM Communication System | Boyi Liu et.al. | 2506.19030 | null |
2025-06-23 | CommVQ: Commutative Vector Quantization for KV Cache Compression | Junyan Li et.al. | 2506.18879 | null |
2025-06-22 | Mechanistic Interpretability in the Presence of Architectural Obfuscation | Marcos Florencio et.al. | 2506.18053 | null |
2025-06-20 | Towards AI Search Paradigm | Yuchen Li et.al. | 2506.17188 | null |
2025-06-17 | CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision | Dyah Adila et.al. | 2506.14912 | null |
2025-06-16 | Vector Ontologies as an LLM world view extraction method | Kaspar Rothenfusser et.al. | 2506.13252 | link |
2025-06-13 | Semantic Scheduling for LLM Inference | Wenyue Hua et.al. | 2506.12204 | link |
2025-06-13 | GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news | Abdul Haque et.al. | 2506.11600 | null |
2025-06-13 | Collaborative LLM Inference via Planning for Efficient Reasoning | Byeongchan Lee et.al. | 2506.11578 | null |
2025-06-13 | Efficient Long-Context LLM Inference via KV Cache Clustering | Jie Hu et.al. | 2506.11418 | null |
2025-06-12 | TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference | Hongbin Zhang et.al. | 2506.10470 | null |
2025-06-11 | A First Look at Bugs in LLM Inference Engines | Mugeng Liu et.al. | 2506.09713 | link |
2025-06-12 | Understanding the Performance and Power of LLM Inferencing on Edge Accelerators | Mayank Arya et.al. | 2506.09554 | null |
2025-06-11 | Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning | Jiayi Yuan et.al. | 2506.09501 | null |
2025-06-10 | Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ | Chihiro Taguchi et.al. | 2506.08479 | null |
2025-06-10 | Draft-based Approximate Inference for LLMs | Kevin Galim et.al. | 2506.08373 | link |
2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
2025-06-07 | Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation | Miryeong Kwon et.al. | 2506.06769 | null |
2025-06-06 | Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Adarsh Prasad Behera et.al. | 2506.06579 | null |
2025-06-04 | On the Fundamental Impossibility of Hallucination Control in Large Language Models | Michał P. Karpowicz et.al. | 2506.06382 | null |
2025-06-04 | SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling | Anhao Zhao et.al. | 2506.04179 | null |
2025-06-04 | Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation | Junyi Chen et.al. | 2506.03887 | null |
2025-06-04 | Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis | Avihay Cohen et.al. | 2506.03656 | null |
2025-06-04 | POSS: Position Specialist Generates Better Draft for Speculative Decoding | Langlin Huang et.al. | 2506.03566 | link |
2025-06-07 | Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs | Jiakun Fan et.al. | 2506.03296 | null |
2025-06-03 | Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs | Shangmin Guo et.al. | 2506.02918 | null |
2025-06-03 | HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference | Ping Gong et.al. | 2506.02572 | link |
2025-06-02 | Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts | Spencer Banasik et.al. | 2506.01827 | null |
2025-05-30 | Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching | Juan Wisznia et.al. | 2505.24643 | null |
2025-05-30 | LLM Inference Enhanced by External Knowledge: A Survey | Yu-Hsuan Lin et.al. | 2505.24377 | link |
2025-05-30 | SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference | Tian Xia et.al. | 2505.24095 | null |
2025-05-29 | Large Language Model Meets Constraint Propagation | Alexandre Bonlarron et.al. | 2505.24012 | null |
2025-05-29 | Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism | Jinhui Wei et.al. | 2505.23219 | null |
2025-05-29 | SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference | Yinghao Tang et.al. | 2505.23022 | null |
2025-05-28 | Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference | Donghyeon Joo et.al. | 2505.22913 | link |
2025-05-28 | Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference | Yue Zhu et.al. | 2505.21919 | null |
2025-05-28 | HoliTom: Holistic Token Merging for Fast Video Large Language Models | Kele Shao et.al. | 2505.21334 | link |
2025-05-28 | FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration | Daehyeon Baek et.al. | 2505.20839 | null |
2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
2025-05-26 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
2025-05-25 | DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation | Gerasimos Gerogiannis et.al. | 2505.19349 | null |
2025-05-27 | A Survey of LLM $\times$ DATA | Xuanhe Zhou et.al. | 2505.18458 | link |
2025-05-23 | An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs | Rahul Thomas et.al. | 2505.18332 | null |
2025-05-23 | NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | Donghyun Son et.al. | 2505.18231 | null |
2025-05-23 | Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning | Michael Hassid et.al. | 2505.17813 | null |
2025-05-23 | DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies | Ning Yang et.al. | 2505.17420 | null |
2025-05-22 | RAP: Runtime-Adaptive Pruning for LLM Inference | Huanrong Liu et.al. | 2505.17138 | null |
2025-05-22 | CASTILLO: Characterizing Response Length Distributions of Large Language Models | Daniel F. Perez-Ramirez et.al. | 2505.16881 | link |
2025-05-22 | Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization | Vera Neplenbroek et.al. | 2505.16467 | link |
2025-05-22 | QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Benjamin Schneider et.al. | 2505.16175 | link |
2025-05-22 | KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization | Mingbo Song et.al. | 2505.16162 | null |
2025-05-20 | Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity | Susav Shrestha et.al. | 2505.14884 | link |
2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668 | null |
2025-05-20 | ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs | Yifan Sui et.al. | 2505.14468 | null |
2025-05-16 | An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Ayesha Amjad et.al. | 2505.13504 | null |
2025-05-19 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Siran Liu et.al. | 2505.13254 | null |
2025-05-19 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | Guangda Liu et.al. | 2505.13109 | null |
2025-05-19 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Zihua Wang et.al. | 2505.12728 | link |
2025-05-17 | Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning | Yuheng Lu et.al. | 2505.11922 | null |
2025-05-17 | Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture | Yu Wu et.al. | 2505.11916 | null |
2025-05-16 | TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | Raja Gond et.al. | 2505.11329 | null |
2025-05-16 | Vaiage: A Multi-Agent Solution to Personalized Travel Planning | Binwen Liu et.al. | 2505.10922 | null |
2025-05-19 | SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices | Xiangwen Zhuge et.al. | 2505.10259 | link |
2025-05-15 | ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production | Yuxing Xiang et.al. | 2505.09999 | link |
2025-05-15 | How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference | Nidhal Jegham et.al. | 2505.09598 | null |
2025-05-14 | Statistical Modeling and Uncertainty Estimation of LLM Inference Systems | Kaustabha Ray et.al. | 2505.09319 | null |
2025-05-14 | ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi et.al. | 2505.09142 | null |
2025-05-13 | LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries | Zekun Wu et.al. | 2505.08842 | null |
2025-05-13 | Automatic Task Detection and Heterogeneous LLM Speculative Decoding | Danying Ge et.al. | 2505.08600 | null |
2025-05-08 | Scaling Laws for Speculative Decoding | Siyuan Yan et.al. | 2505.07858 | null |
2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
2025-05-12 | Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity | Guang Yan et.al. | 2505.07239 | null |
2025-05-12 | PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications | Kuntai Du et.al. | 2505.07203 | null |
2025-05-14 | I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference | Zibo Gao et.al. | 2505.06738 | null |
2025-05-09 | Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | Haolin Zhang et.al. | 2505.06461 | null |
2025-05-09 | Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM | Zehao Fan et.al. | 2505.05772 | null |
2025-05-08 | HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow | You Peng et.al. | 2505.05286 | link |
2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
2025-05-05 | RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference | Yaoqi Chen et.al. | 2505.02922 | null |
2025-05-03 | High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers | Brian Wong et.al. | 2505.01693 | null |
2025-05-08 | A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Sihyeong Park et.al. | 2505.01658 | link |
2025-05-02 | PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding | Bradley McDanel et.al. | 2505.01572 | null |
2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039 | null |
2025-04-28 | Taming the Titans: A Survey of Efficient LLM Inference Serving | Ranran Zhen et.al. | 2504.19720 | link |
2025-04-28 | R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference | Zhenyu Zhang et.al. | 2504.19449 | null |
2025-05-07 | A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification | Junichiro Niimi et.al. | 2504.18884 | link |
2025-04-29 | PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation | Zihao An et.al. | 2504.18583 | null |
2025-04-25 | PropRAG: Guiding Retrieval with Beam Search over Proposition Paths | Jingjin Wang et.al. | 2504.18070 | null |
2025-04-24 | L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Qingyuan Liu et.al. | 2504.17584 | null |
2025-04-24 | On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Maoyang Xiang et.al. | 2504.17376 | null |
2025-04-18 | HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Myunghyun Rhee et.al. | 2504.16112 | null |
2025-04-22 | Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model | Junwei Hu et.al. | 2504.15989 | null |
2025-04-23 | KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments | Junyoung Park et.al. | 2504.15364 | null |
2025-04-18 | High-Throughput LLM inference on Heterogeneous Clusters | Yi Xiong et.al. | 2504.15303 | null |
2025-04-21 | Hardware-based Heterogeneous Memory Management for Large Language Model Inference | Soojin Hwang et.al. | 2504.14893 | null |
2025-04-19 | Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran et.al. | 2504.14365 | null |
2025-04-19 | FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Coleman Hooper et.al. | 2504.14152 | null |
2025-04-16 | Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim et.al. | 2504.11816 | link |
2025-04-16 | Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs | Hyungwoo Lee et.al. | 2504.11765 | null |
2025-04-16 | Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy et.al. | 2504.11750 | null |
2025-04-15 | Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Ruicheng Ao et.al. | 2504.11320 | link |
2025-04-14 | HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving | Avinash Kumar et.al. | 2504.10724 | null |
2025-04-14 | AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | Yangshen Deng et.al. | 2504.10326 | null |
2025-04-14 | KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | Yuxuan Tian et.al. | 2504.09936 | null |
2025-04-16 | Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya et.al. | 2504.09775 | null |
2025-04-13 | LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference | Jianing Zheng et.al. | 2504.09561 | link |
2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
2025-04-11 | SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu et.al. | 2504.08850 | null |
2025-04-10 | SD$^2$: Self-Distilled Sparse Drafters | Mike Lasby et.al. | 2504.08838 | null |
2025-04-11 | Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash | Fucheng Jia et.al. | 2504.08378 | null |
2025-04-11 | Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye et.al. | 2504.08242 | null |
2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
2025-04-10 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving | Shihong Gao et.al. | 2504.07494 | link |
2025-04-10 | UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference | Weikai Xu et.al. | 2504.07479 | null |
2025-04-10 | Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents | Yueying Li et.al. | 2504.07347 | null |
2025-04-08 | SPIRe: Boosting LLM Inference Throughput with Speculative Decoding | Sanjit Neelam et.al. | 2504.06419 | null |
2025-04-08 | Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching | Yanhao Dong et.al. | 2504.06319 | null |
2025-04-09 | Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Gleb Rodionov et.al. | 2504.06261 | link |
2025-04-11 | User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems | Jianling Wang et.al. | 2504.05522 | null |
2025-04-07 | Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness | Dongzhuoran Zhou et.al. | 2504.05163 | null |
2025-04-04 | Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Erik Johannes Husom et.al. | 2504.03360 | null |
2025-04-04 | Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation | Weitao Li et.al. | 2504.03165 | link |
2025-04-03 | Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search | Parsa Ghaffari et.al. | 2504.02426 | link |
2025-04-01 | SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching | Yuxuan Zhu et.al. | 2504.00970 | null |
2025-04-03 | Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding | Aayush Gautam et.al. | 2504.00030 | null |
2025-04-06 | ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance | Tong Xie et.al. | 2503.24053 | link |
2025-03-31 | MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration | Tatsuya Kubo et.al. | 2503.23817 | null |
2025-03-30 | Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Wei Tao et.al. | 2503.23294 | null |
2025-03-28 | Niyama: Breaking the Silos of LLM Inference Serving | Kanishk Goel et.al. | 2503.22562 | null |
2025-03-25 | LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Han Chen et.al. | 2503.19950 | link |
2025-03-24 | xKV: Cross-Layer SVD for KV-Cache Compression | Chi-Chih Chang et.al. | 2503.18893 | link |
2025-03-27 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
2025-03-24 | Jenga: Effective Memory Management for Serving LLM with Heterogeneity | Chen Zhang et.al. | 2503.18292 | null |
2025-03-27 | WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Youhui Zuo et.al. | 2503.17922 | link |
2025-03-22 | PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling | Chongpeng Liu et.al. | 2503.17707 | null |
2025-03-21 | V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Javier J. Poveda Rodrigo et.al. | 2503.17422 | null |
2025-03-21 | Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | Jingzhi Fang et.al. | 2503.16893 | null |
2025-03-20 | SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen et.al. | 2503.15921 | null |
2025-03-19 | Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study | Jomar Thomas Almonte et.al. | 2503.15248 | null |
2025-03-19 | Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks | Kai Zhang et.al. | 2503.14882 | null |
2025-03-18 | PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | Wei Fang et.al. | 2503.14432 | null |
2025-03-17 | Mitigating KV Cache Competition to Enhance User Experience in LLM Inference | Haiying Shen et.al. | 2503.13773 | null |
2025-03-17 | AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Haiying Shen et.al. | 2503.13737 | null |
2025-03-17 | ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts | Evangelos Georganas et.al. | 2503.13565 | null |
2025-03-14 | Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce | Jingying Zeng et.al. | 2503.13518 | null |
2025-03-17 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck et.al. | 2503.13427 | link |
2025-03-17 | VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding | Zeng Wang et.al. | 2503.13116 | null |
2025-03-15 | TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation | Mayank Kumar et.al. | 2503.12217 | null |
2025-03-09 | Green Prompting | Marta Adamska et.al. | 2503.10666 | null |
2025-03-13 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
2025-03-12 | Prompt Inference Attack on Distributed Large Language Model Inference Frameworks | Xinjian Luo et.al. | 2503.09291 | null |
2025-03-11 | TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems | Feiyang Wu et.al. | 2503.08415 | link |
2025-03-11 | Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Pol G. Recasens et.al. | 2503.08311 | null |
2025-03-09 | Seesaw: High-throughput LLM Inference via Model Re-sharding | Qidong Su et.al. | 2503.06433 | null |
2025-03-07 | Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching | Bowen Pang et.al. | 2503.05248 | link |
2025-03-07 | SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding | Kaiyu Huang et.al. | 2503.05096 | null |
2025-03-15 | Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking | Yijie Xu et.al. | 2503.04636 | null |
2025-03-06 | AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services | Xiaoqi Wang et.al. | 2503.04418 | null |
2025-03-06 | Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search | Kou Misaki et.al. | 2503.04412 | null |
2025-03-06 | Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets | Yiwen Dong et.al. | 2503.04076 | null |
2025-03-04 | FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference | Hongchao Du et.al. | 2503.03777 | null |
2025-03-05 | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Rui Ye et.al. | 2503.03686 | null |
2025-03-04 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference | Zihan Liu et.al. | 2503.02236 | null |
2025-02-26 | Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng et.al. | 2503.01873 | null |
2025-03-03 | SAGE: A Framework of Precise Retrieval for RAG | Jintao Zhang et.al. | 2503.01713 | null |
2025-03-03 | DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems | Minoo Hosseinzadeh et.al. | 2503.01704 | null |
2025-03-01 | Tutorial Proposal: Speculative Decoding for Efficient LLM Inference | Heming Xia et.al. | 2503.00491 | null |
2025-02-28 | FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference | Xunhao Lai et.al. | 2502.20766 | link |
2025-02-28 | SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models | Han-Byul Kim et.al. | 2502.20727 | null |
2025-02-27 | ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving | Kai Mei et.al. | 2502.20576 | link |
2025-02-26 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Yiheng Yang et.al. | 2502.19078 | null |
2025-02-24 | LLM Inference Acceleration via Efficient Operation Fusion | Mahsa Salmani et.al. | 2502.17728 | null |
2025-02-24 | CodeSwift: Accelerating LLM Inference for Efficient Code Generation | Qianhui Zhao et.al. | 2502.17139 | null |
2025-02-24 | Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM | Lian Liu et.al. | 2502.16963 | null |
2025-02-24 | DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance | Xuanfan Ni et.al. | 2502.16886 | null |
2025-03-01 | CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter | Yepeng Weng et.al. | 2502.16880 | null |
2025-02-23 | DISC: Dynamic Decomposition Improves LLM Inference Scaling | Jonathan Light et.al. | 2502.16706 | null |
2025-02-23 | TerEffic: Highly Efficient Ternary LLM Inference on FPGA | Chenyang Yin et.al. | 2502.16473 | null |
2025-02-21 | KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse | Jingbo Yang et.al. | 2502.16002 | link |
2025-02-21 | Towards Swift Serverless LLM Cold Starts with ParaServe | Chiheng Lou et.al. | 2502.15524 | null |
2025-02-24 | HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings | Rasmus Aavang et.al. | 2502.15411 | link |
2025-02-24 | Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference | Yaohua Tang et.al. | 2502.15294 | null |
2025-02-21 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou et.al. | 2502.15233 | link |
2025-02-19 | EvoP: Robust LLM Inference via Evolutionary Pruning | Shangyu Wu et.al. | 2502.14910 | null |
2025-02-20 | Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale | Shashwat Jaiswal et.al. | 2502.14617 | null |
2025-02-20 | SR-LLM: Rethinking the Structured Representation in Large Language Model | Jiahuan Zhang et.al. | 2502.14352 | null |
2025-02-19 | RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Payman Behnam et.al. | 2502.14051 | null |
2025-02-19 | Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Qingfa Xiao et.al. | 2502.13542 | null |
2025-02-19 | What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis | Peiran Wang et.al. | 2502.13490 | null |
2025-02-18 | BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference | Ahmed Burak Gulhan et.al. | 2502.13176 | null |
2025-02-18 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767 | link |
2025-02-18 | HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Cheng Luo et.al. | 2502.12574 | link |
2025-02-18 | Distributed On-Device LLM Inference With Over-the-Air Computation | Kai Zhang et.al. | 2502.12559 | null |
2025-02-18 | SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs | Ahmed F. AbouElhamayed et.al. | 2502.12444 | link |
2025-02-17 | Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs | Kan Zhu et.al. | 2502.12216 | null |
2025-02-17 | Designing Role Vectors to Improve LLM Inference Behaviour | Daniele Potertì et.al. | 2502.12055 | null |
2025-02-17 | DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | Ting Sun et.al. | 2502.11417 | null |
2025-02-17 | Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment | Ben Dong et.al. | 2502.11347 | null |
2025-02-16 | Diversified Sampling Improves Scaling LLM inference | Tianchun Wang et.al. | 2502.11027 | null |
2025-02-16 | Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings | Liangqi Yuan et.al. | 2502.11007 | link |
2025-02-15 | Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA | Jindong Li et.al. | 2502.10659 | null |
2025-02-14 | λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu et.al. | 2502.09922 | null |
2025-02-14 | INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing | Hongsun Jang et.al. | 2502.09921 | null |
2025-02-13 | On multi-token prediction for efficient LLM inference | Somesh Mehra et.al. | 2502.09419 | null |
2025-02-13 | InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Heejun Lee et.al. | 2502.08910 | null |
2025-02-12 | Universal Model Routing for Efficient LLM Inference | Wittawat Jitkrittum et.al. | 2502.08773 | null |
2025-02-12 | Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences | Shanshan Han et.al. | 2502.08142 | null |
2025-02-11 | HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang et.al. | 2502.07903 | null |
2025-02-11 | SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters | Yiping Wang et.al. | 2502.07832 | null |
2025-02-11 | PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu et.al. | 2502.07578 | link |
2025-02-13 | Online Scheduling for LLM Inference with KV Cache Constraints | Patrick Jaillet et.al. | 2502.07115 | null |
2025-02-08 | Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models | Soham Poddar et.al. | 2502.05610 | null |
2025-02-08 | Mechanistic Interpretability of Emotion Inference in Large Language Models | Ala N. Tak et.al. | 2502.05489 | null |
2025-02-07 | BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference | Reena Elangovan et.al. | 2502.05376 | null |
2025-02-07 | LLM Query Scheduling with Prefix Reuse and Latency Constraints | Gregory Dexter et.al. | 2502.04677 | null |
2025-02-06 | WaferLLM: A Wafer-Scale LLM Inference System | Congjie He et.al. | 2502.04563 | null |
2025-02-06 | KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li et.al. | 2502.04420 | link |
2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
2025-02-06 | AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference | Qingyue Yang et.al. | 2502.04077 | link |
2025-02-06 | Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective | Yuan Feng et.al. | 2502.03805 | link |
2025-02-06 | Adaptive Semantic Prompt Caching with VectorQ | Luis Gaspar Schroeder et.al. | 2502.03771 | null |
2025-02-05 | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | Zeyu Zhang et.al. | 2502.03589 | null |
2025-02-05 | Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL | Wenbo Sun et.al. | 2502.02818 | null |
2025-02-05 | Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Jingyu Liu et.al. | 2502.02789 | link |
2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
2025-01-30 | Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Sazzad Hossain et.al. | 2502.01651 | null |
2025-02-06 | An Investigation of FP8 Across Accelerators for LLM Inference | Jiwoo Kim et.al. | 2502.01070 | null |
2025-02-02 | Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | Patrick Yubeaton et.al. | 2502.00922 | null |
2025-02-02 | SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models | Jiawen Zhang et.al. | 2502.00847 | null |
2025-02-01 | UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs | Yizhe Xiong et.al. | 2502.00439 | null |
2025-02-01 | ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Xiang Liu et.al. | 2502.00299 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-02-02 | RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | Zunhai Su et.al. | 2501.16383 | link |
2025-01-27 | Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs | Antony Bartlett et.al. | 2501.16191 | null |
2025-01-27 | TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference | Jack Min Ong et.al. | 2501.16007 | null |
2025-01-27 | Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference | Tharindu B. Hewage et.al. | 2501.15829 | link |
2025-01-25 | Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads | Xingyang He et.al. | 2501.15113 | null |
2025-01-24 | Locality-aware Fair Scheduling in LLM Serving | Shiyi Cao et.al. | 2501.14312 | null |
2025-01-20 | Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference | Pouya Hamadanian et.al. | 2501.11779 | link |
2025-01-20 | Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas | Nishant Balepur et.al. | 2501.11549 | link |
2025-01-19 | GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation | Shashikant Ilager et.al. | 2501.11006 | link |
2025-01-17 | A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Xinzhe Li et.al. | 2501.10069 | link |
2025-01-16 | Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition | Takaaki Hori et.al. | 2501.09258 | null |
2025-01-15 | Guiding Retrieval using LLM-based Listwise Rankers | Mandeep Rathee et.al. | 2501.09186 | link |
2025-01-14 | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | Paul Joe Maliakel et.al. | 2501.08219 | null |
2025-01-14 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et.al. | 2501.08192 | null |
2025-01-14 | Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke et.al. | 2501.08090 | null |
2025-01-12 | MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference | Wenxuan Zeng et.al. | 2501.06807 | null |
2025-01-05 | TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms | Jovan Stojkovic et.al. | 2501.02600 | null |
2025-01-04 | AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference | Zhuomin He et.al. | 2501.02336 | link |
2025-01-03 | Efficient LLM Inference with Activation Checkpointing and Hybrid Caching | Sanghyeon Lee et.al. | 2501.01792 | null |
2025-01-03 | BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference | Wonsuk Jang et.al. | 2501.01144 | link |
2025-01-02 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye et.al. | 2501.01005 | link |
2024-12-23 | Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs | Dibakar Gope et.al. | 2501.00032 | link |
2024-12-29 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang et.al. | 2412.20501 | link |
2024-12-28 | LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System | Hyucksung Kwon et.al. | 2412.20166 | null |
2024-12-19 | GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors | Chengming Zhang et.al. | 2412.19829 | null |
2025-01-02 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li et.al. | 2412.19442 | link |
2024-12-27 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong et.al. | 2412.19394 | link |
2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et.al. | 2412.18934 | null |
2024-12-21 | SYMPHONY: Improving Memory Management for LLM Inference Workloads | Saurabh Agarwal et.al. | 2412.16434 | null |
2024-12-20 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan et.al. | 2412.15803 | link |
2024-12-18 | A Survey on LLM Inference-Time Self-Improvement | Xiangjue Dong et.al. | 2412.14352 | link |
2024-12-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
2024-12-17 | A System for Microserving of LLMs | Hongyi Jin et.al. | 2412.12488 | null |
2024-12-16 | CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation | Hongxuan Zhang et.al. | 2412.11741 | null |
2024-12-15 | Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Yun Qu et.al. | 2412.11120 | link |
2024-12-15 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei et.al. | 2412.11053 | link |
2024-12-13 | SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Yucheng Li et.al. | 2412.10319 | null |
2024-12-17 | TurboAttention: Efficient Attention Approximation For High Throughputs LLMs | Hao Kang et.al. | 2412.08585 | null |
2024-12-11 | Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths | Naryeong Kim et.al. | 2412.08281 | null |
2024-12-12 | TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch | Xingchen Song et.al. | 2412.08237 | null |
2024-12-09 | Asynchronous LLM Function Calling | In Gim et.al. | 2412.07017 | null |
2024-12-09 | SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs | James Vo et.al. | 2412.06198 | null |
2024-12-08 | XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference | Weizhuo Li et.al. | 2412.05896 | null |
2024-12-06 | GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Yanyu Chen et.al. | 2412.04788 | null |
2024-12-03 | Multi-Bin Batching for Increasing LLM Inference Throughput | Ozgur Guldogan et.al. | 2412.04504 | null |
2024-11-29 | BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Zhen Zheng et.al. | 2412.03594 | null |
2024-12-03 | Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity | Da Ma et.al. | 2412.02252 | null |
2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et.al. | 2412.01447 | null |
2024-12-02 | Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking | Marco Federici et.al. | 2412.01380 | null |
2024-12-05 | RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy | Geonho Lee et.al. | 2412.01129 | link |
2024-12-02 | TruncFormer: Private LLM Inference Using Only Truncations | Patrick Yubeaton et.al. | 2412.01042 | null |
2024-11-29 | A dynamic parallel method for performance optimization on hybrid CPUs | Luo Yu et.al. | 2411.19542 | null |
2024-12-03 | Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | Akhiad Bercovich et.al. | 2411.19146 | null |
2024-11-29 | InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks | Xinyao Zheng et.al. | 2411.18191 | null |
2024-11-28 | MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | Akshat Sharma et.al. | 2411.18077 | null |
2024-11-24 | Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou et.al. | 2411.17741 | null |
2024-11-26 | PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega et.al. | 2411.17309 | null |
2024-11-26 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya et.al. | 2411.17116 | link |
2024-11-26 | Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation | Chaoyi Jiang et.al. | 2411.17089 | link |
2024-11-25 | MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang et.al. | 2411.16158 | null |
2024-11-24 | eFedLLM: Efficient LLM Inference Based on Federated Learning | Shengwen Ding et.al. | 2411.16003 | null |
2024-11-24 | Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang et.al. | 2411.15982 | null |
2024-11-24 | Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems | Wenxiang Lin et.al. | 2411.15715 | null |
2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
2024-11-21 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
2024-11-20 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et.al. | 2411.13157 | null |
2024-11-21 | LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts | Zhuohan Gu et.al. | 2411.13009 | null |
2024-11-15 | An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 | Pepijn de Reus et.al. | 2411.12758 | link |
2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
2024-11-15 | AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Janghwan Lee et.al. | 2411.09909 | null |
2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
2024-11-15 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et.al. | 2411.09510 | null |
2024-11-14 | Pie: Pooling CPU Memory for LLM Inference | Yi Xu et.al. | 2411.09317 | null |
2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
2024-11-12 | The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving | Kyoungmin Kim et.al. | 2411.07447 | null |
2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et.al. | 2411.05555 | null |
2024-11-07 | Hardware and Software Platform Inference | Cheng Zhang et.al. | 2411.05197 | null |
2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | link |
2024-11-05 | CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | Hongpeng Jin et.al. | 2411.02829 | null |
2024-11-04 | RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Tevin Wang et.al. | 2411.01751 | link |
2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
2024-11-02 | RA-WEBs: Remote Attestation for WEB services | Kosei Akama et.al. | 2411.01340 | null |
2024-11-02 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Xuanlin Jiang et.al. | 2411.01142 | null |
2024-11-01 | LLM-Based Misconfiguration Detection for AWS Serverless Computing | Jinfeng Wen et.al. | 2411.00642 | null |
2024-11-04 | ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models | Anbang Wang et.al. | 2411.00533 | null |
2024-11-01 | Attention Tracker: Detecting Prompt Injection Attacks in LLMs | Kuo-Han Hung et.al. | 2411.00348 | null |
2024-10-31 | LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Krishna Teja Chitty-Venkata et.al. | 2411.00136 | link |
2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et.al. | 2411.00066 | link |
2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et.al. | 2410.23537 | null |
2024-10-30 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao et.al. | 2410.23079 | link |
2024-10-29 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang et.al. | 2410.22480 | link |
2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
2024-10-28 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun et.al. | 2410.21465 | link |
2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et.al. | 2410.20488 | null |
2024-10-29 | Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management | Tuowei Wang et.al. | 2410.19274 | null |
2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
2024-10-30 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
2024-10-24 | BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching | Peizhuang Cong et.al. | 2410.18701 | null |
2024-10-25 | Fast Inference for Augmented Large Language Models | Rana Shahout et.al. | 2410.18248 | null |
2024-10-23 | POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference | Aditya K Kamath et.al. | 2410.18038 | link |
2024-10-22 | FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | Haoran Lin et.al. | 2410.16663 | null |
2024-10-22 | Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency | Prafulla Kumar Choubey et.al. | 2410.16597 | null |
2024-10-20 | EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models | Junhao Hu et.al. | 2410.15332 | null |
2024-10-19 | IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System | Minseok Seo et.al. | 2410.15008 | null |
2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et.al. | 2410.14740 | null |
2024-10-18 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | You Wu et.al. | 2410.14442 | link |
2024-10-18 | Revisiting SLO and Goodput Metrics in LLM Serving | Zhibin Wang et.al. | 2410.14257 | null |
2024-10-17 | RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs | Jiatan Huang et.al. | 2410.13987 | null |
2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | link |
2024-10-17 | Progressive Mixed-Precision Decoding for Efficient LLM Inference | Hao Mark Chen et.al. | 2410.13461 | null |
2024-10-17 | Data Defenses Against Large Language Models | William Agnew et.al. | 2410.13138 | link |
2024-10-19 | In-context KV-Cache Eviction for LLMs via Attention-Gate | Zihao Zeng et.al. | 2410.12876 | null |
2024-10-10 | RecurFormer: Not All Transformer Heads Need Self-Attention | Ruiqing Yan et.al. | 2410.12850 | null |
2024-10-16 | Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Huiwen Wu et.al. | 2410.12130 | null |
2024-10-15 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix | Yingyu Liang et.al. | 2410.11261 | null |
2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
2024-10-16 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et.al. | 2410.10759 | null |
2024-10-12 | Power-Softmax: Towards Secure LLM Inference over Encrypted Data | Itamar Zimerman et.al. | 2410.09457 | null |
2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et.al. | 2410.06916 | link |
2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et.al. | 2410.05589 | null |
2024-10-06 | RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference | Yige Xu et.al. | 2410.04519 | link |
2024-10-14 | Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective | Jinhao Li et.al. | 2410.04466 | link |
2024-10-04 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Aurick Qiao et.al. | 2410.03960 | null |
2024-10-04 | EXAQ: Exponent Aware Quantization For LLMs Acceleration | Moran Shkolnik et.al. | 2410.03185 | link |
2024-10-03 | LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences | Zhenxiao Fu et.al. | 2410.02950 | null |
2024-10-03 | Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Yun Qu et.al. | 2410.02511 | link |
2024-10-03 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Małgorzata Łazuka et.al. | 2410.02425 | link |
2024-10-04 | Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation | Xiaoqun Liu et.al. | 2410.02220 | null |
2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et.al. | 2410.01228 | null |
2024-10-01 | TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | Zonghang Li et.al. | 2410.00531 | link |
2024-09-30 | The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems | Linke Song et.al. | 2409.20002 | null |
2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
2024-09-26 | Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | Shaobo Ma et.al. | 2409.17870 | null |
2024-09-25 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi et.al. | 2409.17422 | link |
2024-09-25 | Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | Amey Agrawal et.al. | 2409.17264 | null |
2024-09-25 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et.al. | 2409.16560 | null |
2024-09-25 | AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization | Yifan Tan et.al. | 2409.16546 | link |
2024-09-23 | Eagle: Efficient Training-Free Router for Multi-LLM Inference | Zesen Zhao et.al. | 2409.15518 | null |
2024-09-24 | UELLM: A Unified and Efficient Approach for LLM Inference Serving | Yiyuan He et.al. | 2409.14961 | null |
2024-09-22 | RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph | Linxi Wei et.al. | 2409.14556 | null |
2024-09-16 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng et.al. | 2409.13761 | link |
2024-09-19 | PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) | Mahmoud Nazzal et.al. | 2409.12699 | link |
2024-09-12 | LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs | Han Xu et.al. | 2409.11424 | null |
2024-09-04 | ISO: Overlap of Computation and Communication within Seqenence For LLM Inference | Bin Xiao et.al. | 2409.11155 | null |
2024-09-18 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
2024-09-08 | InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference | Xiurui Pan et.al. | 2409.04992 | null |
2024-09-07 | Achieving Peak Performance for Large Language Models: A Systematic Review | Zhyar Rzgar K Rostam et.al. | 2409.04833 | null |
2024-09-06 | A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage | Huan Yang et.al. | 2409.04040 | null |
2024-09-13 | Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study | Jianwei Zhu et.al. | 2409.03992 | null |
2024-09-05 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou et.al. | 2409.03856 | link |
2024-08-31 | HSF: Defending against Jailbreak Attacks with Hidden State Filtering | Cheng Qian et.al. | 2409.03788 | null |
2024-09-03 | Contemporary Model Compression on Large Language Models Inference | Dong Liu et.al. | 2409.01990 | link |
2024-09-02 | CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification | Junhui He et.al. | 2409.01366 | link |
2024-09-04 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets et.al. | 2409.01227 | link |
2024-09-01 | Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) | Xu-Hao Chen et.al. | 2409.00661 | null |
2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
2024-08-22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | Kan Zhu et.al. | 2408.12757 | link |
2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et.al. | 2408.11850 | link |
2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et.al. | 2408.11743 | link |
2024-08-20 | Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models | Artem Vazhentsev et.al. | 2408.10692 | null |
2024-08-19 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
2024-08-23 | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Chao Zeng et.al. | 2408.08554 | link |
2024-08-14 | LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference | Seungjae Moon et.al. | 2408.07326 | null |
2024-08-12 | LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration | Zhiwen Mo et.al. | 2408.06003 | null |
2024-08-10 | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Jaehong Cho et.al. | 2408.05499 | link |
2024-08-05 | SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris et.al. | 2408.05235 | null |
2024-08-08 | Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning | Ke Cheng et.al. | 2408.04323 | null |
2024-08-07 | Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference | Zeyu Zhang et.al. | 2408.04107 | null |
2024-08-07 | MPC-Minimized Secure LLM Inference | Deevashwer Rathee et.al. | 2408.03561 | null |
2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
2024-08-02 | The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | Matias Martinez et.al. | 2408.01050 | null |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | Designing Efficient LLM Accelerators for Edge Devices | Jude Haris et.al. | 2408.00462 | null |
2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et.al. | 2408.00214 | null |
2024-07-23 | ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency | Yuhang Yao et.al. | 2408.00008 | null |
2024-08-01 | Responsive ML inference in multi-tenanted environments using AQUA | Abhishek Vijaya Kumar et.al. | 2407.21255 | null |
2024-07-25 | An Efficient Inference Framework for Early-exit Large Language Models | Ruijie Miao et.al. | 2407.20272 | null |
2024-07-29 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Sania Nayab et.al. | 2407.19825 | null |
2024-07-29 | Teaching LLMs at Charles University: Assignments and Activities | Jindřich Helcl et.al. | 2407.19798 | null |
2024-07-22 | RazorAttention: Efficient KV Cache Compression Through Retrieval Heads | Hanlin Tang et.al. | 2407.15891 | null |
2024-07-22 | vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jiale Xu et.al. | 2407.15309 | link |
2024-07-19 | LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Qichen Fu et.al. | 2407.14057 | null |
2024-07-17 | Struct-X: Enhancing Large Language Models Reasoning with Structured Data | Xiaoyu Tan et.al. | 2407.12522 | null |
2024-07-17 | LLM Inference Serving: Survey of Recent Advances and Opportunities | Baolin Li et.al. | 2407.12391 | null |
2024-07-17 | Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models | Ayush Kaushal et.al. | 2407.12327 | link |
2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
2024-07-21 | Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Yuan Feng et.al. | 2407.11550 | link |
2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
2024-07-12 | Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference | Zongyue Qin et.al. | 2407.09722 | null |
2024-07-09 | Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et.al. | 2407.07000 | link |
2024-07-08 | Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU | Daliang Xu et.al. | 2407.05858 | link |
2024-07-07 | A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length | Yuqing Yang et.al. | 2407.05347 | null |
2024-07-05 | Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design | Yiyang Huang et.al. | 2407.04292 | link |
2024-07-04 | Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems | Grant Wilkins et.al. | 2407.04014 | null |
2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
2024-06-29 | Teola: Towards End-to-End Optimization of LLM-based Applications | Xin Tan et.al. | 2407.00326 | link |
2024-06-25 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei et.al. | 2407.00088 | link |
2024-06-28 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Wonbeom Lee et.al. | 2406.19707 | null |
2024-06-24 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758 | link |
2024-06-28 | SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention | Qianchao Zhu et.al. | 2406.15486 | null |
2024-06-21 | Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Qi Liu et.al. | 2406.14848 | link |
2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546 | link |
2024-06-20 | LiveMind: Low-latency Large Language Models with Simultaneous Inference | Chuangtao Chen et.al. | 2406.14319 | link |
2024-06-19 | SDQ: Sparse Decomposed Quantization for LLM Inference | Geonhwa Jeong et.al. | 2406.13868 | null |
2024-06-19 | Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style | Zeping Li et.al. | 2406.13170 | null |
2024-06-16 | Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization | Jungi Lee et.al. | 2406.12930 | null |
2024-06-18 | LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization | Masafumi Enomoto et.al. | 2406.12494 | null |
2024-06-18 | LLMs Are Prone to Fallacies in Causal Inference | Nitish Joshi et.al. | 2406.12158 | null |
2024-06-14 | Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning | Hui Liu et.al. | 2406.11890 | null |
2024-06-17 | Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference | Donghyeon Joo et.al. | 2406.11674 | null |
2024-06-17 | QTIP: Quantization with Trellises and Incoherence Processing | Albert Tseng et.al. | 2406.11235 | link |
2024-06-16 | New Solutions on LLM Acceleration, Optimization, and Application | Yingbing Huang et.al. | 2406.10903 | null |
2024-06-16 | Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | Jiaming Tang et.al. | 2406.10774 | link |
2024-06-15 | Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study | Hao Hao et.al. | 2406.10675 | link |
2024-06-08 | QCQA: Quality and Capacity-aware grouped Query Attention | Vinay Joshi et.al. | 2406.10247 | null |
2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et.al. | 2406.08413 | null |
2024-06-12 | PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Zhenliang Xue et.al. | 2406.06282 | null |
2024-06-09 | A Superalignment Framework in Autonomous Driving with Large Language Models | Xiangrui Kong et.al. | 2406.05651 | null |
2024-06-06 | Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | Jiahao Liu et.al. | 2406.03853 | null |
2024-06-04 | Language Models can Infer Action Semantics for Classical Planners from Environment Feedback | Wang Zhu et.al. | 2406.02791 | null |
2024-06-08 | Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach | Yuxuan Chen et.al. | 2406.02616 | null |
2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et.al. | 2406.02532 | link |
2024-06-03 | Demystifying Platform Requirements for Diverse LLM Inference Use Cases | Abhimanyu Bambhaniya et.al. | 2406.01698 | link |
2024-06-03 | PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration | Ziqian Zeng et.al. | 2406.01394 | null |
2024-06-01 | A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation | Dugang Liu et.al. | 2406.00333 | null |
2024-05-31 | No Free Lunch Theorem for Privacy-Preserving LLM Inference | Xiaojin Zhang et.al. | 2405.20681 | null |
2024-05-30 | Decentralized AI: Permissionless LLM Inference on POKT Network | Daniel Olshansky et.al. | 2405.20450 | null |
2024-06-01 | S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs | Wei Zhong et.al. | 2405.20314 | null |
2024-05-30 | Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models | Yuxiao Luo et.al. | 2405.19850 | null |
2024-05-29 | MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models | Taehyun Kim et.al. | 2405.18832 | null |
2024-05-29 | PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN | Fei Zheng et.al. | 2405.18744 | null |
2024-06-02 | Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | Hao Mark Chen et.al. | 2405.18628 | link |
2024-05-25 | FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference | Chenqi Lin et.al. | 2405.16241 | null |
2024-05-23 | EdgeShard: Efficient LLM Inference via Collaborative Edge Computing | Mingjin Zhang et.al. | 2405.14371 | null |
2024-05-23 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | Akide Liu et.al. | 2405.14366 | null |
2024-05-21 | PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | Dongjie Yang et.al. | 2405.12532 | null |
2024-05-12 | Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization | Xinyuan Zhang et.al. | 2405.07140 | null |
2024-05-11 | Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving | Chengyi Nie et.al. | 2405.06856 | null |
2024-05-21 | Vidur: A Large-Scale Simulation Framework For LLM Inference | Amey Agrawal et.al. | 2405.05465 | link |
2024-05-13 | KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation | Minsik Cho et.al. | 2405.05329 | null |
2024-05-12 | DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature | Dawei Li et.al. | 2405.04819 | link |
2024-05-10 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
2024-05-07 | vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Ramya Prabhu et.al. | 2405.04437 | link |
2024-05-07 | Optimizing Language Model’s Reasoning Abilities with Weak Supervision | Yongqi Tong et.al. | 2405.04086 | null |
2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553 | link |
2024-05-03 | Efficient and Economic Large Language Model Inference with Attention Offloading | Shaoyuan Chen et.al. | 2405.01814 | null |
<a href=#updated-on-20250628>(back to top)</a>
MoE
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jiajie Yang et.al. | 2506.21328 | null |
2025-06-26 | Learning to Skip the Middle Layers of Transformers | Tim Lawson et.al. | 2506.21103 | null |
2025-06-26 | Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Haodong Lu et.al. | 2506.21035 | null |
2025-06-26 | EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Xiao Zhang et.al. | 2506.20986 | null |
2025-06-25 | Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | Jiaxing Huang et.al. | 2506.20282 | null |
2025-06-23 | Multimodal Anomaly Detection with a Mixture-of-Experts | Christoph Willibald et.al. | 2506.19077 | null |
2025-06-23 | Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models | Zihan Wang et.al. | 2506.18945 | null |
2025-06-23 | Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning | Rahul Atul Bhope et.al. | 2506.18789 | null |
2025-06-23 | An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | Shivam Verma et.al. | 2506.18735 | null |
2025-06-23 | Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | Xiaodong Wu et.al. | 2506.18543 | null |
2025-06-23 | SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation | Zichong Li et.al. | 2506.18349 | null |
2025-06-23 | Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies | Junchao Fan et.al. | 2506.18304 | null |
2025-06-22 | Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection | Zheng Zhan et.al. | 2506.18145 | null |
2025-06-21 | Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert | Gelei Xu et.al. | 2506.17787 | null |
2025-06-21 | Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities | Xinghao Huang et.al. | 2506.17755 | null |
2025-06-21 | PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation | Xinyu Xiong et.al. | 2506.17712 | null |
2025-06-20 | SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | Zhenglin Lai et.al. | 2506.17368 | null |
2025-06-19 | FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE | Khiem Le et.al. | 2506.16600 | null |
2025-06-19 | Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models | Daniel Fidel Harvey et.al. | 2506.16419 | null |
2025-06-17 | Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jesmin Jahan Tithi et.al. | 2506.15006 | null |
2025-06-17 | NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification | Wajih Hassan Raza et.al. | 2506.14970 | null |
2025-06-17 | GMT: General Motion Tracking for Humanoid Whole-Body Control | Zixuan Chen et.al. | 2506.14770 | null |
2025-06-17 | Exploring Speaker Diarization with Mixture of Experts | Gaobin Yang et.al. | 2506.14750 | null |
2025-06-18 | Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Ling Team et.al. | 2506.14731 | null |
2025-06-17 | GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors | Hengyuan Zhang et.al. | 2506.14646 | null |
2025-06-17 | Single-Example Learning in a Mixture of GPDMs with Latent Geometries | Jesse St. Amand et.al. | 2506.14563 | null |
2025-06-17 | MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Hongyu Wang et.al. | 2506.14435 | null |
2025-06-16 | Load Balancing Mixture of Experts with Similarity Preserving Routers | Nabil Omi et.al. | 2506.14038 | null |
2025-06-16 | GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics | Qianzhong Chen et.al. | 2506.14009 | null |
2025-06-16 | MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | MiniMax et.al. | 2506.13585 | link |
2025-06-16 | Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization | Guanghui Song et.al. | 2506.13541 | null |
2025-06-16 | EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Zhongqian Fu et.al. | 2506.13329 | link |
2025-06-16 | Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs | Xintong Tang et.al. | 2506.13192 | null |
2025-06-15 | Serving Large Language Models on Huawei CloudMatrix384 | Pengfei Zuo et.al. | 2506.12708 | null |
2025-06-14 | Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts | Shengzhuang Chen et.al. | 2506.12597 | null |
2025-06-14 | Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control | Rongpeng Li et.al. | 2506.12453 | null |
2025-06-17 | HarMoEny: Efficient Multi-GPU Inference of MoE Models | Zachary Doucet et.al. | 2506.12417 | null |
2025-06-14 | Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model | Chong Li et.al. | 2506.12388 | null |
2025-06-13 | Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? | Houyi Li et.al. | 2506.12119 | null |
2025-06-13 | Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Zhangkai Ni et.al. | 2506.11823 | link |
2025-06-12 | Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Zaijing Li et.al. | 2506.10357 | null |
2025-06-11 | GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | GigaChat team et.al. | 2506.09440 | null |
2025-06-11 | DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | Yuchen Feng et.al. | 2506.09351 | null |
2025-06-10 | CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA | Jiale Dong et.al. | 2506.08496 | link |
2025-06-11 | MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | Shivang Chopra et.al. | 2506.08356 | null |
2025-06-11 | STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Yiming Wang et.al. | 2506.08054 | link |
2025-06-09 | A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Jacob Helwig et.al. | 2506.07969 | link |
2025-06-09 | M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | Yongzhen Wang et.al. | 2506.07814 | null |
2025-06-11 | MIRA: Medical Time Series Foundation Model for Real-World Health Data | Hao Li et.al. | 2506.07584 | null |
2025-06-11 | MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Ken Yaggel et.al. | 2506.07563 | link |
2025-06-09 | MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Wei Tao et.al. | 2506.07533 | null |
2025-06-09 | MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | Haiyue Ma et.al. | 2506.07366 | null |
2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013 | null |
2025-06-07 | High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations | Ziwei Li et.al. | 2506.06858 | null |
2025-06-07 | Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | Yuan Yuan et.al. | 2506.06694 | null |
2025-06-06 | Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization | Jonathan Yang et.al. | 2506.06196 | null |
2025-06-06 | MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models | Jie Cao et.al. | 2506.05928 | null |
2025-06-06 | dots.llm1 Technical Report | Bi Huo et.al. | 2506.05767 | null |
2025-06-05 | Mixture-of-Experts Meets In-Context Reinforcement Learning | Wenhao Wu et.al. | 2506.05426 | null |
2025-06-05 | Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | Ziyi Zhou et.al. | 2506.04739 | null |
2025-06-05 | FlashDMoE: Fast Distributed MoE in a Single Kernel | Osayamen Jonathan Aimuyo et.al. | 2506.04667 | link |
2025-06-04 | Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts | Jiaxing Zhang et.al. | 2506.03591 | null |
2025-06-04 | PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs | Ze Yu Zhang et.al. | 2506.02965 | null |
2025-06-03 | Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights | Jakub Krajewski et.al. | 2506.02890 | null |
2025-06-03 | Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | Jack Cook et.al. | 2506.02813 | null |
2025-06-04 | MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection | Juntong Li et.al. | 2506.02535 | null |
2025-06-03 | MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework | Yupeng Qi et.al. | 2506.02460 | null |
2025-05-31 | Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | Duzhen Zhang et.al. | 2506.02041 | null |
2025-06-02 | SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Zhao Yang et.al. | 2506.01833 | link |
2025-06-02 | Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning | Ryotaro Kawata et.al. | 2506.01656 | null |
2025-06-02 | DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models | Jiancheng Ye et.al. | 2506.01257 | null |
2025-06-01 | Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts | Fan Liu et.al. | 2506.00965 | null |
2025-05-30 | Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction | Shuai Liu et.al. | 2505.24597 | null |
2025-05-30 | Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis | Junzhuo Li et.al. | 2505.24593 | null |
2025-05-30 | Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Yilun Kong et.al. | 2505.24378 | link |
2025-05-30 | GradPower: Powering Gradients for Faster Language Model Pre-Training | Mingze Wang et.al. | 2505.24275 | null |
2025-05-30 | On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks | Mingze Wang et.al. | 2505.24205 | null |
2025-05-29 | Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts | Xuweiyi Chen et.al. | 2505.23926 | null |
2025-06-03 | Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert | Zhaokun Wang et.al. | 2505.23868 | null |
2025-05-29 | From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents | Tobias Lindenbauer et.al. | 2505.23422 | link |
2025-05-29 | Context-Aware Semantic Communication for the Wireless Networks | Guangyuan Liu et.al. | 2505.23249 | null |
2025-05-29 | Two Is Better Than One: Rotations Scale LoRAs | Hongcan Guo et.al. | 2505.23184 | null |
2025-05-28 | HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Qi Cai et.al. | 2505.22705 | link |
2025-05-28 | Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | Xue Zhang et.al. | 2505.22582 | null |
2025-05-28 | A Human-Centric Approach to Explainable AI for Personalized Education | Vinitra Swamy et.al. | 2505.22541 | link |
2025-05-28 | Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion | Kewen Chen et.al. | 2505.22360 | null |
2025-05-28 | Advancing Expert Specialization for Better MoE | Hongcan Guo et.al. | 2505.22323 | null |
2025-05-28 | ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | Jiawen Yu et.al. | 2505.22159 | null |
2025-05-28 | AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation | Yan Rong et.al. | 2505.22053 | null |
2025-05-28 | Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Zhongyi Zhou et.al. | 2505.21906 | null |
2025-05-27 | MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis | Yitong Li et.al. | 2505.21698 | null |
2025-05-28 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Yehui Tang et.al. | 2505.21411 | null |
2025-05-27 | Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities | Junyan Zhang et.al. | 2505.21191 | null |
2025-05-27 | Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts | Yue Zhang et.al. | 2505.21079 | null |
2025-05-27 | Multi-objective Large Language Model Alignment with Hierarchical Experts | Zhuo Li et.al. | 2505.20925 | null |
2025-05-26 | FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | Hao Kang et.al. | 2505.20225 | link |
2025-05-26 | NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | Shihao Li et.al. | 2505.20001 | null |
2025-05-26 | Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Junming Liu et.al. | 2505.19699 | null |
2025-05-26 | MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE | Zongle Huang et.al. | 2505.19645 | null |
2025-05-26 | Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate | Liangwei Nathan Zheng et.al. | 2505.19525 | link |
2025-05-26 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | Sihan Chen et.al. | 2505.19427 | link |
2025-05-25 | RankLLM: A Python Package for Reranking with LLMs | Sahel Sharifymoghaddam et.al. | 2505.19284 | null |
2025-05-25 | I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Jiayi Xin et.al. | 2505.19190 | link |
2025-05-24 | TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling | Chonghua Han et.al. | 2505.18670 | null |
2025-05-24 | ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | Jian Liang et.al. | 2505.18640 | link |
2025-05-24 | Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter | Weizhi Zhong et.al. | 2505.18612 | null |
2025-05-23 | Enhancing CTR Prediction with De-correlated Expert Networks | Jiancheng Wang et.al. | 2505.17925 | null |
2025-05-23 | PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval | Zehua Pei et.al. | 2505.17639 | null |
2025-05-23 | CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning | Jinyuan Feng et.al. | 2505.17553 | null |
2025-05-23 | MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation | Kaixing Yang et.al. | 2505.17543 | null |
2025-05-22 | JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Qihao Duan et.al. | 2505.17257 | null |
2025-05-22 | DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving | Zhenjie Yang et.al. | 2505.16278 | null |
2025-05-22 | DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor | Yan Zhao et.al. | 2505.16256 | null |
2025-05-21 | Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | Jingcong Liang et.al. | 2505.16056 | link |
2025-05-21 | MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | Yuxiang Wei et.al. | 2505.15946 | null |
2025-05-21 | CoLA: Collaborative Low-Rank Adaptation | Yiyun Zhou et.al. | 2505.15471 | link |
2025-05-22 | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Tencent Hunyuan Team et.al. | 2505.15431 | null |
2025-05-21 | Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks | Uranik Berisha et.al. | 2505.15414 | null |
2025-05-21 | Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines | Xiaohou Shi et.al. | 2505.15151 | null |
2025-05-20 | Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | Haoyi Qiu et.al. | 2505.14972 | link |
2025-05-20 | Balanced and Elastic End-to-end Training of Dynamic LLMs | Mohamed Wahib et.al. | 2505.14864 | null |
2025-05-20 | Solving MNIST with a globally trained Mixture of Quantum Experts | Paolo Alessandro Xavier Tognini et.al. | 2505.14789 | null |
2025-05-20 | Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | Mengru Wang et.al. | 2505.14681 | null |
2025-05-21 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | Umberto Cappellazzo et.al. | 2505.14336 | null |
2025-05-20 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | Shaolin Zhu et.al. | 2505.14256 | null |
2025-05-20 | THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | Yunlong Liang et.al. | 2505.14173 | null |
2025-05-20 | Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | Shuo Zhang et.al. | 2505.14143 | null |
2025-05-20 | Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging | Ryo Bertolissi et.al. | 2505.14136 | null |
2025-05-20 | StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | Huaijie Wang et.al. | 2505.13997 | null |
2025-05-20 | Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Bao-Ngoc Dao et.al. | 2505.13944 | link |
2025-05-20 | U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Ziqian Wang et.al. | 2505.13880 | link |
2025-05-20 | EfficientLLM: Efficiency in Large Language Models | Zhengqing Yuan et.al. | 2505.13840 | null |
2025-05-19 | CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition | Nam V. Nguyen et.al. | 2505.13380 | link |
2025-05-19 | Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Shuqing Luo et.al. | 2505.13345 | link |
2025-05-19 | Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | Lucas Berry et.al. | 2505.13273 | null |
2025-05-19 | True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | Christoph Jürgen Hemmer et.al. | 2505.13192 | null |
2025-05-19 | Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | Tuan Thai et.al. | 2505.13052 | null |
2025-05-18 | Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization | Hongbiao Zhu et.al. | 2505.12311 | null |
2025-05-20 | Model Merging in Pre-training of Large Language Models | Yunshui Li et.al. | 2505.12082 | null |
2025-05-20 | Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Runduo Han et.al. | 2505.12007 | link |
2025-05-17 | MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | Zihuan Qiu et.al. | 2505.11883 | null |
2025-05-17 | Improving Coverage in Combined Prediction Sets with Weighted p-values | Gina Wong et.al. | 2505.11785 | null |
2025-05-16 | MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | Chao Jin et.al. | 2505.11432 | null |
2025-05-16 | MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yinsicheng Jiang et.al. | 2505.11415 | null |
2025-05-16 | A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | Oliver Schacht et.al. | 2505.11085 | null |
2025-05-16 | On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | Huy Nguyen et.al. | 2505.10860 | null |
2025-05-14 | PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Zongqian Li et.al. | 2505.09519 | link |
2025-05-14 | Qwen3 Technical Report | An Yang et.al. | 2505.09388 | link |
2025-05-14 | Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | Chenggang Zhao et.al. | 2505.09343 | null |
2025-05-13 | Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | Shaoyu Wang et.al. | 2505.08944 | null |
2025-05-13 | PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | Yang Su et.al. | 2505.08719 | null |
2025-05-13 | AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | Yunjie Ji et.al. | 2505.08311 | null |
2025-05-12 | UMoE: Unifying Attention and FFN with Shared Experts | Yuanhang Yang et.al. | 2505.07260 | null |
2025-05-11 | Seed1.5-VL Technical Report | Dong Guo et.al. | 2505.07062 | null |
2025-05-11 | FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | Tianyu Chen et.al. | 2505.06858 | null |
2025-05-11 | The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | Enric Boix-Adsera et.al. | 2505.06839 | null |
2025-05-10 | Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Zihan Qiu et.al. | 2505.06708 | link |
2025-05-10 | Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Dawei Huang et.al. | 2505.06685 | link |
2025-05-10 | QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | HamidReza Imani et.al. | 2505.06481 | null |
2025-05-12 | FloE: On-the-Fly MoE Inference on Memory-constrained GPU | Yuxin Zhou et.al. | 2505.05950 | null |
2025-05-09 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Haojie Duanmu et.al. | 2505.05799 | link |
2025-05-08 | Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | Ming Li et.al. | 2505.05035 | null |
2025-05-07 | Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | Yehui Tang et.al. | 2505.04519 | null |
2025-05-07 | SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | Ning Cheng et.al. | 2505.04201 | null |
2025-05-07 | LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Teddy Foley et.al. | 2505.04075 | link |
2025-05-07 | Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications | Yuanai Xie et.al. | 2505.04068 | null |
2025-05-06 | Towards Smart Point-and-Shoot Photography | Jiawan Li et.al. | 2505.03638 | null |
2025-05-06 | Faster MoE LLM Inference for Extremely Large Models | Haoqi Yang et.al. | 2505.03531 | null |
2025-05-06 | STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | Maolin Wang et.al. | 2505.03484 | null |
2025-05-06 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Lei Liu et.al. | 2505.03310 | null |
2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481 | link |
2025-05-05 | Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | Kai Zhang et.al. | 2505.02381 | null |
2025-05-05 | Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | Sanjay Surendranath Girija et.al. | 2505.02309 | null |
2025-05-04 | Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Zhenxing Mi et.al. | 2505.02005 | link |
2025-05-03 | Backdoor Attacks Against Patch-based Mixture of Experts | Cedric Chan et.al. | 2505.01811 | link |
2025-05-01 | MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling | Abdoul Majid O. Thiombiano et.al. | 2505.01459 | null |
2025-05-02 | Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders | Rogelio A Mancisidor et.al. | 2505.01134 | null |
2025-05-02 | CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | Aurelio Raffa Ugolini et.al. | 2505.01105 | null |
2025-05-01 | Improving Routing in Sparse Mixture of Experts with Graph of Tokens | Tam Nguyen et.al. | 2505.00792 | null |
2025-05-01 | CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | Tian Lan et.al. | 2505.00415 | null |
2025-05-01 | Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Piotr Piękos et.al. | 2505.00315 | link |
2025-04-30 | Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders | Xuwei Yang et.al. | 2505.00216 | null |
2025-04-29 | TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts | Pradip Kunwar et.al. | 2504.21190 | null |
2025-04-29 | Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization | Shuai Gong et.al. | 2504.21063 | null |
2025-04-26 | PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight | Ben Goertzel et.al. | 2504.21029 | null |
2025-04-29 | MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification | Yichu Xu et.al. | 2504.20509 | null |
2025-04-29 | FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks | Wenjing Xiao et.al. | 2504.20446 | null |
2025-04-29 | MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Amaan Izhar et.al. | 2504.20343 | link |
2025-04-28 | Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | Athinagoras Skiadopoulos et.al. | 2504.19925 | null |
2025-04-28 | Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey | Yunting Xu et.al. | 2504.19660 | null |
2025-04-28 | ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving | Renju Feng et.al. | 2504.19580 | link |
2025-04-29 | BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts | Qingyue Wang et.al. | 2504.18598 | null |
2025-04-25 | NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation | Rob Romijnders et.al. | 2504.18147 | null |
2025-04-28 | Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection | Haokai Zhang et.al. | 2504.17834 | link |
2025-04-22 | Compass-V2 Technical Report | Sophia Maria et.al. | 2504.15527 | null |
2025-04-21 | Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Jonathan Brokman et.al. | 2504.15470 | link |
2025-04-17 | D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Haodong Wang et.al. | 2504.15299 | null |
2025-04-23 | MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core | Dennis Liu et.al. | 2504.14960 | null |
2025-04-18 | Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts | Jie Zou et.al. | 2504.13655 | null |
2025-04-18 | HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Alexander Rusnak et.al. | 2504.13590 | null |
2025-04-18 | Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Ashwinee Panda et.al. | 2504.12463 | link |
2025-04-16 | Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models | Yuanbo Tang et.al. | 2504.12359 | null |
2025-04-16 | Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data | Sangwon Hyun et.al. | 2504.12287 | null |
2025-04-16 | MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | Hang Yuan et.al. | 2504.12234 | null |
2025-04-15 | Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology | Henrik Häggström et.al. | 2504.11279 | link |
2025-04-14 | Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | LeiLei Ma et.al. | 2504.09990 | null |
2025-04-14 | Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications | Nathalie Bartoli et.al. | 2504.09930 | null |
2025-04-14 | Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming | Zhiqiang He et.al. | 2504.09906 | null |
2025-04-13 | Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation | Jia Wei et.al. | 2504.09601 | null |
2025-04-12 | MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints | Yichao Yuan et.al. | 2504.09345 | null |
2025-04-12 | Mixture of Group Experts for Learning Invariant Representations | Lei Kang et.al. | 2504.09265 | null |
2025-04-11 | RouterKT: Mixture-of-Experts for Knowledge Tracing | Han Liao et.al. | 2504.08989 | link |
2025-04-11 | Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design | Robin Grapin et.al. | 2504.08671 | null |
2025-04-10 | C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Zhongyang Li et.al. | 2504.07964 | link |
2025-04-11 | Scaling Laws for Native Multimodal Models | Mustafa Shukor et.al. | 2504.07951 | null |
2025-04-10 | Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models | Hongcheng Guo et.al. | 2504.07807 | link |
2025-04-10 | Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network | Peng Jia et.al. | 2504.07777 | null |
2025-04-10 | Kimi-VL Technical Report | Kimi Team et.al. | 2504.07491 | link |
2025-04-09 | MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Zhe Wang et.al. | 2504.07308 | link |
2025-04-11 | Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Ling Team et.al. | 2504.07158 | null |
2025-04-09 | Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations | Zican Dong et.al. | 2504.06792 | null |
2025-04-09 | FedMerge: Federated Personalization via Model Merging | Shutong Chen et.al. | 2504.06768 | null |
2025-04-08 | S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning | Hanqing Zeng et.al. | 2504.06426 | null |
2025-04-08 | HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2504.05897 | link |
2025-04-08 | Adaptive Substructure-Aware Expert Model for Molecular Property Prediction | Tianyi Jiang et.al. | 2504.05844 | null |
2025-04-10 | Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | Ajay Jaiswal et.al. | 2504.05586 | null |
2025-04-07 | SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement | Zuying Xie et.al. | 2504.04818 | null |
2025-04-06 | On the Spatial Structure of Mixture-of-Experts in Transformers | Daniel Bershatsky et.al. | 2504.04444 | null |
2025-04-05 | Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator | Bing Wang et.al. | 2504.04076 | link |
2025-04-04 | HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Yongji Wu et.al. | 2504.03871 | null |
2025-04-01 | Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns | Diego Vallarino et.al. | 2504.03750 | null |
2025-04-04 | RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Hanbo Bi et.al. | 2504.03166 | null |
2025-04-03 | TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models | Xinquan Wang et.al. | 2504.02712 | null |
2025-04-07 | MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Beichen Huang et.al. | 2504.02658 | link |
2025-04-07 | MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Ruidong Zhu et.al. | 2504.02263 | null |
2025-04-02 | Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design | Mohan Zhang et.al. | 2504.01337 | null |
2025-04-01 | Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function | Qiuchen Song et.al. | 2504.00819 | null |
2025-04-01 | DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Dengchun Li et.al. | 2504.00661 | link |
2025-04-01 | Continual Cross-Modal Generalization | Yan Xia et.al. | 2504.00561 | null |
2025-04-01 | Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection | Shunxin Chen et.al. | 2504.00458 | null |
2025-03-31 | Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion | Jiagen Li et.al. | 2503.23721 | null |
2025-03-30 | Mixture of Routers | Jia-Chen Zhang et.al. | 2503.23362 | null |
2025-03-29 | Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models | Zehua Liu et.al. | 2503.23100 | null |
2025-03-29 | S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning | Giang Do et.al. | 2503.23007 | null |
2025-03-29 | Sparse Mixture of Experts as Unified Competitive Learning | Giang Do et.al. | 2503.22996 | null |
2025-04-01 | Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities | Raman Dutt et.al. | 2503.22517 | null |
2025-03-27 | RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction | Armin Abdollahi et.al. | 2503.21971 | null |
2025-03-27 | iMedImage Technical Report | Ran Wei et.al. | 2503.21836 | null |
2025-03-27 | LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models | Hengyuan Zhao et.al. | 2503.21227 | null |
2025-03-26 | Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework | Soham Sane et.al. | 2503.20750 | null |
2025-03-26 | UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines | Chen Tang et.al. | 2503.20748 | null |
2025-03-26 | Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning | Sashuai Zhou et.al. | 2503.20633 | null |
2025-03-26 | MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Rongyu Zhang et.al. | 2503.20384 | null |
2025-03-26 | Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning | Yousef Sadegheih et.al. | 2503.20326 | link |
2025-03-25 | Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion | Konyul Park et.al. | 2503.19776 | null |
2025-03-25 | BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts | Suzhe Xu et.al. | 2503.19769 | null |
2025-03-25 | M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation | Ziyuan Liu et.al. | 2503.19406 | null |
2025-03-27 | Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie et.al. | 2503.18869 | null |
2025-03-24 | Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding | Tianyu Chen et.al. | 2503.18578 | null |
2025-03-24 | SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Wenrui Cai et.al. | 2503.18338 | null |
2025-03-23 | Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding | Ze Zhang et.al. | 2503.18104 | link |
2025-03-22 | Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM | Codefuse et.al. | 2503.17793 | null |
2025-03-25 | Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts | Yike Yuan et.al. | 2503.16057 | null |
2025-03-21 | UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations | Debabrata Mandal et.al. | 2503.15868 | null |
2025-03-20 | Mixture of Lookup Experts | Shibo Jie et.al. | 2503.15798 | link |
2025-03-21 | Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication | Sin-Yu Huang et.al. | 2503.15722 | null |
2025-03-19 | SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation | Thomas Pickard et.al. | 2503.15358 | null |
2025-03-21 | Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition | Seungyeon Cho et.al. | 2503.14960 | null |
2025-03-18 | Core-Periphery Principle Guided State Space Model for Functional Connectome Classification | Minheng Chen et.al. | 2503.14655 | null |
2025-03-18 | MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts | Runqi Meng et.al. | 2503.14355 | null |
2025-03-18 | SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture | Tian Qin et.al. | 2503.13808 | null |
2025-03-17 | Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge | Shengling Qin et.al. | 2503.13421 | null |
2025-03-17 | Channel Estimation for Pinching-Antenna Systems (PASS) | Jian Xiao et.al. | 2503.13268 | null |
2025-03-17 | Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation | Yu Liu et.al. | 2503.13254 | null |
2025-03-16 | Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps | Mohammad Al-Jarrah et.al. | 2503.12633 | link |
2025-03-16 | MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts | Harshit et.al. | 2503.12592 | null |
2025-03-16 | MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jianwei Zhao et.al. | 2503.12401 | null |
2025-03-15 | Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection | Qixian Chen et.al. | 2503.12010 | null |
2025-03-14 | FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA | Jieming Bian et.al. | 2503.11880 | null |
2025-03-14 | A Review of DeepSeek Models’ Key Innovative Techniques | Chengen Wang et.al. | 2503.11486 | null |
2025-03-14 | MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling | Rachel S. Y. Teo et.al. | 2503.11144 | link |
2025-03-13 | Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | Chenpeng Wu et.al. | 2503.10725 | link |
2025-03-14 | dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis | Luyuan Xie et.al. | 2503.10412 | null |
2025-03-13 | StableFusion: Continual Video Retrieval via Frame Adaptation | Zecheng Zhao et.al. | 2503.10111 | link |
2025-03-12 | Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework | Bakary Badjie et.al. | 2503.09504 | null |
2025-03-12 | Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment | Nazanin Moradinasab et.al. | 2503.09498 | link |
2025-03-12 | Astrea: A MOE-based Visual Understanding Model with Progressive Alignment | Xiaoda Yang et.al. | 2503.09445 | null |
2025-03-12 | Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach | Ruifeng She et.al. | 2503.09357 | null |
2025-03-12 | Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mohammad Siavashi et.al. | 2503.09304 | null |
2025-03-13 | FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | Fufangchen Zhao et.al. | 2503.09158 | null |
2025-03-11 | MoE-Loco: Mixture of Experts for Multitask Locomotion | Runhan Huang et.al. | 2503.08564 | null |
2025-03-11 | Accelerating MoE Model Inference with Expert Sharding | Oana Balmau et.al. | 2503.08467 | null |
2025-03-11 | Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models | Junzhe Li et.al. | 2503.08120 | null |
2025-03-11 | MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | Han Zhao et.al. | 2503.08007 | null |
2025-03-10 | GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts | Minwen Liao et.al. | 2503.07417 | null |
2025-03-10 | A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Siyuan Mu et.al. | 2503.07137 | link |
2025-03-10 | VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots | Fu Chen et.al. | 2503.07049 | link |
2025-03-10 | ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration | Mengting Ai et.al. | 2503.06881 | link |
2025-03-10 | eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference | Suraiya Tairin et.al. | 2503.06823 | null |
2025-03-09 | MoFE: Mixture of Frozen Experts Architecture | Jean Seo et.al. | 2503.06491 | null |
2025-03-09 | Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models | Nguyen Do et.al. | 2503.06413 | link |
2025-03-08 | MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering | Vinay Kumar Verma et.al. | 2503.06296 | null |
2025-03-08 | A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | Wenzhuo Du et.al. | 2503.06064 | null |
2025-03-08 | MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model | Miguel Contreras et.al. | 2503.06059 | null |
2025-03-07 | Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Justin Chih-Yao Chen et.al. | 2503.05641 | null |
2025-03-07 | FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework | Jingyu Xu et.al. | 2503.05626 | null |
2025-03-07 | Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Weigao Sun et.al. | 2503.05447 | link |
2025-03-07 | Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs | Ling Team et.al. | 2503.05139 | null |
2025-03-07 | Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts | Shwai He et.al. | 2503.05066 | null |
2025-03-06 | Continual Pre-training of MoEs: How robust is your router? | Benjamin Thérien et.al. | 2503.05029 | null |
2025-03-06 | Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li et.al. | 2503.04715 | null |
2025-03-07 | Question-Aware Gaussian Experts for Audio-Visual Question Answering | Hongyeob Kim et.al. | 2503.04459 | link |
2025-03-07 | Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Yan Li et.al. | 2503.04398 | null |
2025-03-06 | A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery | Yiheng Zhu et.al. | 2503.04362 | null |
2025-03-06 | DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval | Yating Liu et.al. | 2503.04144 | null |
2025-03-05 | VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Enkhtogtokh Togootogtokh et.al. | 2503.03797 | link |
2025-03-05 | Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Haoran Fan et.al. | 2503.03594 | link |
2025-03-05 | Convergence Rates for Softmax Gating Mixture of Experts | Huy Nguyen et.al. | 2503.03213 | null |
2025-03-04 | MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Weihang Wang et.al. | 2503.02799 | link |
2025-03-04 | FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting | Congluo Xu et.al. | 2503.02692 | null |
2025-03-04 | Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Yujiao Yang et.al. | 2503.02495 | link |
2025-03-04 | Tabby: Tabular Data Synthesis with Language Models | Sonia Cromp et.al. | 2503.02152 | null |
2025-03-03 | ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | Nastaran Mansourian et.al. | 2503.01750 | null |
2025-03-03 | Effective High-order Graph Representation Learning for Credit Card Fraud Detection | Yao Zou et.al. | 2503.01556 | null |
2025-03-03 | DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | Yongqi Huang et.al. | 2503.01359 | null |
2025-03-03 | PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation | Linhai Zhang et.al. | 2503.01303 | null |
2025-03-03 | Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | Xiaobin Hong et.al. | 2503.01157 | null |
2025-03-02 | Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | Daiki Nishiyama et.al. | 2503.00925 | null |
2025-03-01 | R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Zhongyang Li et.al. | 2502.20395 | link |
2025-02-27 | Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | Loukas Ilias et.al. | 2502.20213 | null |
2025-02-27 | Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | Zeyi Ren et.al. | 2502.20183 | null |
2025-02-27 | UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | Yidi Jiang et.al. | 2502.20067 | null |
2025-03-01 | Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Shulai Zhang et.al. | 2502.19811 | link |
2025-02-26 | Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | Taishi Nakamura et.al. | 2502.19261 | null |
2025-02-26 | OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | Jiaxin Deng et.al. | 2502.18965 | null |
2025-02-25 | Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking | Changyuan Zhao et.al. | 2502.18118 | null |
2025-02-24 | The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | Andrei Chernov et.al. | 2502.17391 | null |
2025-02-24 | Delta Decompression for MoE-based LLMs Compression | Hao Gu et.al. | 2502.17298 | link |
2025-02-24 | Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Andrei Chernov et.al. | 2502.17187 | null |
2025-02-24 | Muon is Scalable for LLM Training | Jingyuan Liu et.al. | 2502.16982 | link |
2025-02-24 | BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | Zewen Jin et.al. | 2502.16927 | null |
2025-02-24 | ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | Jiho Han et.al. | 2502.16914 | null |
2025-02-26 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Chenghao Fan et.al. | 2502.16894 | link |
2025-02-22 | An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | Masoud Shokrnezhad et.al. | 2502.16198 | null |
2025-02-21 | A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs | Yuan Sun et.al. | 2502.15451 | link |
2025-02-21 | Tight Clusters Make Specialized Experts | Stefan K. Nielsen et.al. | 2502.15315 | link |
2025-02-21 | Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction | Baohang Zhou et.al. | 2502.15290 | link |
2025-02-20 | Ray-Tracing for Conditionally Activated Neural Networks | Claudio Gallicchio et.al. | 2502.14788 | null |
2025-02-21 | ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Zhongyi Zhou et.al. | 2502.14420 | link |
2025-02-19 | Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | Xin Li et.al. | 2502.13577 | null |
2025-02-18 | MoBA: Mixture of Block Attention for Long-Context LLMs | Enzhe Lu et.al. | 2502.13189 | link |
2025-02-18 | Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | Gyeongman Kim et.al. | 2502.12947 | null |
2025-02-18 | DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | Minxuan Lv et.al. | 2502.12455 | null |
2025-02-17 | From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs | Kumari Nishu et.al. | 2502.12325 | null |
2025-02-17 | Accurate Expert Predictions in MoE Inference via Cross-Layer Gate | Zhiyuan Fang et.al. | 2502.12224 | null |
2025-02-17 | How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Ayan Sengupta et.al. | 2502.12051 | null |
2025-02-17 | Connector-S: A Survey of Connectors in Multi-modal Large Language Models | Xun Zhu et.al. | 2502.11453 | null |
2025-02-16 | Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time | Robert Dahlke et.al. | 2502.11096 | null |
2025-02-16 | ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | Shixuan Li et.al. | 2502.11059 | null |
2025-02-15 | Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization | Matthew Lyle Olson et.al. | 2502.10928 | null |
2025-02-12 | Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Bowen Chen et.al. | 2502.09654 | link |
2025-02-14 | Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Nicholas Dronen et.al. | 2502.09500 | link |
2025-02-12 | The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities | Ning Li et.al. | 2502.08381 | null |
2025-02-12 | Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | Xuanze Chen et.al. | 2502.08083 | null |
2025-02-13 | Training Sparse Mixture Of Experts Text Embedding Models | Zach Nussbaum et.al. | 2502.07972 | link |
2025-02-11 | Memory Analysis on the Training Course of DeepSeek Models | Ping Zhang et.al. | 2502.07846 | null |
2025-02-11 | MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | Lotfi Abdelkrim Mecharbat et.al. | 2502.07422 | null |
2025-02-11 | Online Aggregation of Trajectory Predictors | Alex Tong et.al. | 2502.07178 | null |
2025-02-09 | Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Zhiyuan Fang et.al. | 2502.06888 | null |
2025-02-10 | MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | Seokjin Go et.al. | 2502.06643 | null |
2025-02-10 | Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Haiduo Huang et.al. | 2502.06282 | link |
2025-02-10 | Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models | Peiran Wang et.al. | 2502.06094 | null |
2025-02-08 | Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Diego Calanzone et.al. | 2502.05633 | link |
2025-02-08 | UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA | Jiale Dong et.al. | 2502.05602 | link |
2025-02-07 | fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Hanfei Yu et.al. | 2502.05370 | null |
2025-02-07 | Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | Roussel Desmond Nzoyem et.al. | 2502.05335 | null |
2025-02-07 | Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | Jan Ludziejewski et.al. | 2502.05172 | null |
2025-02-06 | Mixture of neural operator experts for learning boundary conditions and model selection | Dwyer Deighan et.al. | 2502.04562 | null |
2025-02-06 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei et.al. | 2502.04416 | link |
2025-02-06 | Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning | Peizhuang Cong et.al. | 2502.03884 | null |
2025-02-05 | (GG) MoE vs. MLP on Tabular Data | Andrei Chernov et.al. | 2502.03608 | null |
2025-02-05 | RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts | Tuan Truong et.al. | 2502.03044 | null |
2025-02-05 | On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation | Nghiem T. Diep et.al. | 2502.03029 | null |
2025-02-05 | Scaling Laws for Upcycling Mixture-of-Experts Language Models | Seng Pei Liew et.al. | 2502.03009 | null |
2025-02-04 | ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals | Jianan Nie et.al. | 2502.02748 | null |
2025-02-04 | Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism | Yuhao Qing et.al. | 2502.02581 | null |
2025-02-05 | Brief analysis of DeepSeek R1 and its implications for Generative AI | Sarah Mercer et.al. | 2502.02523 | null |
2025-02-04 | M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | Nikhil Bhendawade et.al. | 2502.02040 | null |
2025-02-05 | MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Haibo Tong et.al. | 2502.01719 | null |
2025-02-04 | MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Yuhang Zhou et.al. | 2502.00997 | null |
2025-02-03 | CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | Xinze Wang et.al. | 2502.00965 | null |
2025-02-02 | UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs | Yufei He et.al. | 2502.00806 | link |
2025-02-02 | Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Yujin Oh et.al. | 2502.00619 | link |
2025-02-01 | PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning | Yu Feng et.al. | 2502.00354 | link |
2025-02-01 | Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective | Fanqi Yan et.al. | 2502.00281 | null |
2025-01-31 | Pheromone-based Learning of Optimal Reasoning Paths | Anirudh Chari et.al. | 2501.19278 | null |
2025-01-31 | Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | Minh Le et.al. | 2501.18936 | null |
2025-01-30 | MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability | Yan Sun et.al. | 2501.18439 | null |
2025-01-29 | Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework | Jung-Hua Liu et.al. | 2501.17903 | null |
2025-01-29 | Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks | Lucio La Cava et.al. | 2501.17557 | null |
2025-01-28 | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Yueen Ma et.al. | 2501.16698 | null |
2025-01-27 | MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) | Xu Yang et.al. | 2501.16454 | null |
2025-01-27 | Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | Yinghan Li et.al. | 2501.16103 | null |
2025-01-25 | ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning | Shangqian Gao et.al. | 2501.15316 | null |
2025-01-25 | FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Ziqi Liu et.al. | 2501.15125 | link |
2025-01-25 | Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning | Ziyu Zhao et.al. | 2501.15103 | null |
2025-01-24 | Mean-field limit from general mixtures of experts to quantum neural networks | Anderson Melchor Hernandez et.al. | 2501.14660 | null |
2025-01-24 | Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Shengzhe Zhang et.al. | 2501.14269 | link |
2025-01-24 | Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images | Zeyun Deng et.al. | 2501.14198 | null |
2025-01-23 | CSAOT: Cooperative Multi-Agent System for Active Object Tracking | Hy Nguyen et.al. | 2501.13994 | null |
2025-01-22 | Autonomy-of-Experts Models | Ang Lv et.al. | 2501.13074 | null |
2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et.al. | 2501.12981 | null |
2025-01-22 | BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR | Guodong Ma et.al. | 2501.12602 | null |
2025-01-21 | Modality Interactive Mixture-of-Experts for Fake News Detection | Yifan Liu et.al. | 2501.12431 | link |
2025-01-21 | SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | Xiaocheng Zhang et.al. | 2501.12430 | null |
2025-01-21 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | Samira Abnar et.al. | 2501.12370 | null |
2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
2025-01-21 | Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | Zihan Qiu et.al. | 2501.11873 | null |
2025-01-18 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xinglin Pan et.al. | 2501.10714 | null |
2025-01-17 | OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jinyuan Feng et.al. | 2501.10062 | null |
2025-01-17 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636 | null |
2025-01-14 | MiniMax-01: Scaling Foundation Models with Lightning Attention | MiniMax et.al. | 2501.08313 | null |
2025-01-14 | GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism | Chen Tang et.al. | 2501.07890 | null |
2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et.al. | 2501.07762 | null |
2025-01-13 | A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Binyu Zhang et.al. | 2501.07016 | link |
2025-01-12 | Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Hanwen Zhong et.al. | 2501.06884 | link |
2025-01-10 | TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Yinghao Zhu et.al. | 2501.05661 | link |
2025-01-09 | Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Mengfan Liu et.al. | 2501.05313 | null |
2025-01-07 | LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Xiang Xu et.al. | 2501.04004 | link |
2025-01-07 | mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Xudong Liao et.al. | 2501.03905 | null |
2025-01-08 | Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Donatella Genovese et.al. | 2501.03432 | null |
2025-01-12 | Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | Zhongyi Zhou et.al. | 2501.02198 | null |
2025-01-03 | MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Jiajun Cao et.al. | 2501.01709 | null |
2025-01-01 | REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization | Huyen Nguyen et.al. | 2501.00779 | null |
2025-01-06 | Superposition in Transformers: A Novel Way of Building Mixture of Experts | Ayoub Ben Chaliah et.al. | 2501.00530 | link |
2024-12-31 | CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection | Xiaolei Wang et.al. | 2501.00346 | null |
2024-12-29 | Multimodal Variational Autoencoder: a Barycentric View | Peijie Qiu et.al. | 2412.20487 | null |
2024-12-29 | A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement | Sidra Nasir et.al. | 2412.20468 | null |
2024-12-28 | UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Jingbo Lin et.al. | 2412.20157 | link |
2024-12-28 | Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection | Yaning Zhang et.al. | 2412.20156 | null |
2024-12-27 | DeepSeek-V3 Technical Report | DeepSeek-AI et.al. | 2412.19437 | link |
2024-12-26 | AskChart: Universal Chart Understanding through Textual Enhancement | Xudong Yang et.al. | 2412.19146 | link |
2024-12-30 | Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection | Xiaoyu Huang et.al. | 2412.19108 | null |
2024-12-24 | Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making | David Shoresh et.al. | 2412.18593 | link |
2024-12-24 | BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Yingjie Ma et.al. | 2412.18065 | link |
2024-12-23 | UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | Li Fu et.al. | 2412.17507 | null |
2024-12-23 | BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Song Wang et.al. | 2412.17404 | link |
2024-12-22 | Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models | Elie Antoine et.al. | 2412.16971 | null |
2024-12-20 | Theory of Mixture-of-Experts for Mobile Edge Computing | Hongbo Li et.al. | 2412.15690 | null |
2024-12-19 | MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale | Swapnil Gandhi et.al. | 2412.15411 | null |
2024-12-19 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | link |
2024-12-19 | ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Ziteng Wang et.al. | 2412.14711 | link |
2024-12-18 | A Survey on Inference Optimization Techniques for Mixture of Experts Models | Jiacheng Liu et.al. | 2412.14219 | link |
2024-12-18 | SEKE: Specialised Experts for Keyword Extraction | Matej Martinc et.al. | 2412.14087 | link |
2024-12-18 | MedCoT: Medical Chain of Thought via Hierarchical Expert | Jiaxiang Liu et.al. | 2412.13736 | link |
2024-12-17 | SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Mátyás Vincze et.al. | 2412.13053 | link |
2024-12-17 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Moritz Reuss et.al. | 2412.12953 | null |
2024-12-17 | CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition | He Wang et.al. | 2412.12760 | null |
2024-12-16 | Investigating Mixture of Experts in Dense Retrieval | Effrosyni Sokli et.al. | 2412.11864 | null |
2024-12-18 | Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Jingze Shi et.al. | 2412.11834 | link |
2024-12-16 | Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation | Svetlana Pavlitska et.al. | 2412.11608 | link |
2024-12-16 | Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture | Jingyu Xu et.al. | 2412.11557 | null |
2024-12-14 | DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Yuhao Wang et.al. | 2412.10650 | link |
2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
2024-12-13 | Llama 3 Meets MoE: Efficient Upcycling | Aditya Vavre et.al. | 2412.09952 | link |
2024-12-12 | Memory Layers at Scale | Vincent-Pierre Berges et.al. | 2412.09764 | link |
2024-12-12 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278 | link |
2024-12-12 | Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective | Minh Le et.al. | 2412.08285 | null |
2024-12-11 | Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Xuanze Chen et.al. | 2412.08193 | link |
2024-12-10 | MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems | Yao Fu et.al. | 2412.07067 | null |
2024-12-07 | Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts | Arturo Rodriguez et.al. | 2412.06842 | null |
2024-12-09 | Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Xiao Wang et.al. | 2412.06647 | link |
2024-12-09 | UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts | Zhen Wan et.al. | 2412.06340 | null |
2024-12-08 | Hallucination-aware Optimization for Large Language Model-empowered Communications | Yinqiu Liu et.al. | 2412.06007 | link |
2024-12-10 | An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism | Qing Zhang et.al. | 2412.05821 | null |
2024-12-10 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Xu Liu et.al. | 2412.05679 | link |
2024-12-07 | SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | Gengze Zhou et.al. | 2412.05552 | link |
2024-12-07 | Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers | Boxun Xu et.al. | 2412.05540 | null |
2024-12-06 | Steps are all you need: Rethinking STEM Education with Prompt Engineering | Krishnasai Addala et.al. | 2412.05023 | null |
2024-12-09 | Monet: Mixture of Monosemantic Experts for Transformers | Jungwoo Park et.al. | 2412.04139 | link |
2024-12-05 | Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks | Zhaoyang Liu et.al. | 2412.03850 | null |
2024-12-04 | Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond | Loukas Ilias et.al. | 2412.03483 | null |
2024-12-05 | MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption | Siddhant Dutta et.al. | 2412.01858 | null |
2024-12-05 | Yi-Lightning Technical Report | 01.AI et.al. | 2412.01253 | null |
2024-11-30 | Mixture of Experts for Node Classification | Yu Shi et.al. | 2412.00418 | null |
2024-11-30 | HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting | Shaohan Yu et.al. | 2412.00316 | null |
2024-11-27 | Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference | Andrii Skliar et.al. | 2412.00099 | null |
2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | Shuguo Jiang et.al. | 2411.19758 | null |
2024-11-28 | On the effectiveness of discrete representations in sparse mixture of experts | Giang Do et.al. | 2411.19402 | null |
2024-11-28 | Bayesian Cluster Weighted Gaussian Models | Panagiotis Papastamoulis et.al. | 2411.18957 | link |
2024-11-27 | UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS | Haomin Zhuang et.al. | 2411.18797 | null |
2024-11-27 | Complexity Experts are Task-Discriminative Learners for Any Image Restoration | Eduard Zamfir et.al. | 2411.18466 | null |
2024-11-27 | Mixture of Experts in Image Classification: What’s the Sweet Spot? | Mathurin Videau et.al. | 2411.18322 | null |
2024-11-26 | $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs | Selim Furkan Tekin et.al. | 2411.17792 | link |
2024-11-25 | Staleness-Centric Optimizations for Efficient Diffusion MoE Inference | Jiajun Luo et.al. | 2411.16786 | null |
2024-11-29 | MH-MoE: Multi-Head Mixture-of-Experts | Shaohan Huang et.al. | 2411.16205 | null |
2024-11-25 | LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy | Peng Cui et.al. | 2411.16095 | null |
2024-11-24 | Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution | Haiquan Wang et.al. | 2411.15871 | null |
2024-11-24 | LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Xiaoye Qu et.al. | 2411.15708 | link |
2024-11-23 | Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts | Qizhou Chen et.al. | 2411.15432 | null |
2024-11-23 | Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation | Fahao Chen et.al. | 2411.15419 | null |
2024-11-20 | MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification | Yuxuan Chen et.al. | 2411.13004 | null |
2024-11-23 | KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning | Ming Yin et.al. | 2411.12950 | null |
2024-11-19 | Ultra-Sparse Memory Network | Zihao Huang et.al. | 2411.12364 | null |
2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et.al. | 2411.11217 | null |
2024-11-16 | Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Jinqiang Long et.al. | 2411.10669 | link |
2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et.al. | 2411.10356 | link |
2024-11-21 | Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models | Wei Wang et.al. | 2411.10003 | null |
2024-11-13 | Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | Vima Gupta et.al. | 2411.08982 | null |
2024-11-13 | Sparse Upcycling: Inference Inefficient Finetuning | Sasha Doubov et.al. | 2411.08968 | null |
2024-11-13 | LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing | Xiaonan Nie et.al. | 2411.08446 | null |
2024-11-12 | Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach | Renzi Wang et.al. | 2411.08232 | null |
2024-11-12 | PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Yilun Liu et.al. | 2411.08212 | null |
2024-11-12 | Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge | Emmanuel Azuh Mensah et.al. | 2411.07834 | null |
2024-11-11 | Adaptive Conditional Expert Selection Network for Multi-domain Recommendation | Kuiyao Dong et.al. | 2411.06826 | null |
2024-11-11 | WDMoE: Wireless Distributed Mixture of Experts for Large Language Models | Nan Xue et.al. | 2411.06681 | null |
2024-11-09 | Learning Mixtures of Experts with EM | Quentin Fruytier et.al. | 2411.06056 | null |
2024-11-08 | NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Yen-Ting Lin et.al. | 2411.05945 | null |
2024-11-05 | DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts | Zelin Yao et.al. | 2411.03025 | link |
2024-11-05 | Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts | Yuan Xie et.al. | 2411.02787 | null |
2024-11-06 | Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Xingwu Sun et.al. | 2411.02265 | null |
2024-11-04 | FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation | Ziwei Zhan et.al. | 2411.02115 | null |
2024-11-03 | RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Hui Lin et.al. | 2411.01595 | null |
2024-11-03 | Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation | Mingrui Liu et.al. | 2411.01457 | null |
2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et.al. | 2411.01433 | null |
2024-11-07 | HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy | Shuqing Luo et.al. | 2411.01288 | link |
2024-11-02 | PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment | Dongxu Liu et.al. | 2411.01245 | null |
2024-11-01 | MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition | Cheng Yang et.al. | 2411.01016 | null |
2024-11-01 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen et.al. | 2411.00918 | link |
2024-11-01 | MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization | Jingming Guo et.al. | 2411.00662 | link |
2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
2024-10-30 | Efficient and Interpretable Grammatical Error Correction with Mixture of Experts | Muhammad Reza Qorib et.al. | 2410.23507 | link |
2024-10-30 | Stealing User Prompts from Mixture of Experts | Itay Yona et.al. | 2410.22884 | null |
2024-10-30 | MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning | Xujia Wang et.al. | 2410.22782 | null |
2024-10-29 | ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song et.al. | 2410.22134 | null |
2024-10-29 | Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging | Li Shen et.al. | 2410.21804 | null |
2024-10-29 | Neural Experts: Mixture of Experts for Implicit Neural Representations | Yizhak Ben-Shabat et.al. | 2410.21643 | null |
2024-10-28 | FinTeamExperts: Role Specialized MOEs For Financial Analysis | Yue Yu et.al. | 2410.21338 | null |
2024-10-28 | Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving | Jiyao Wang et.al. | 2410.21086 | null |
2024-10-27 | Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Maohao Shen et.al. | 2410.20336 | null |
2024-10-27 | GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields | Yusuke Sekikawa et.al. | 2410.20306 | null |
2024-10-25 | DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unsupervised Dimensionality Reduction | Zelin Zang et.al. | 2410.19504 | link |
2024-10-25 | Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis | Weikai Li et.al. | 2410.19225 | link |
2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et.al. | 2410.19123 | link |
2024-10-24 | Mixture of Parrots: Experts improve memorization more than reasoning | Samy Jelassi et.al. | 2410.19034 | null |
2024-10-24 | MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases | Zhisheng Lin et.al. | 2410.18406 | null |
2024-10-23 | Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches | Kexin Feng et.al. | 2410.18298 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
2024-10-23 | Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | Artem Basharin et.al. | 2410.17765 | null |
2024-10-22 | Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling | Jialong Li et.al. | 2410.17043 | null |
2024-10-21 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Ruikun Zhang et.al. | 2410.16095 | link |
2024-10-22 | CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | Zhenpeng Su et.al. | 2410.16077 | link |
2024-10-21 | Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Qiao Sun et.al. | 2410.15774 | link |
2024-10-21 | ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts | Xumeng Han et.al. | 2410.15732 | null |
2024-10-20 | Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs | Xin Zhou et.al. | 2410.15438 | null |
2024-10-20 | LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Yuang Ai et.al. | 2410.15385 | link |
2024-10-19 | MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | Suning Huang et.al. | 2410.14972 | null |
2024-10-18 | MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Rachel S. Y. Teo et.al. | 2410.14574 | link |
2024-10-18 | ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction | Haoyu He et.al. | 2410.14099 | link |
2024-10-17 | Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks | Jinze Zhao et.al. | 2410.13964 | null |
2024-10-16 | On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | Herun Wan et.al. | 2410.12600 | null |
2024-10-16 | Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts | Fanqi Yan et.al. | 2410.12258 | null |
2024-10-16 | EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | Yulei Qian et.al. | 2410.12247 | null |
2024-10-15 | MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router | Yanyue Xie et.al. | 2410.12013 | null |
2024-10-15 | MoH: Multi-Head Attention as Mixture-of-Head Attention | Peng Jin et.al. | 2410.11842 | link |
2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | link |
2024-10-15 | Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models | James Vo et.al. | 2410.11654 | null |
2024-10-16 | Quadratic Gating Functions in Mixture of Experts: A Statistical Insight | Pedram Akbarian et.al. | 2410.11222 | null |
2024-10-16 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
2024-10-14 | Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Guorui Zheng et.al. | 2410.10626 | link |
2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491 | null |
2024-10-14 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Xu Liu et.al. | 2410.10469 | null |
2024-10-15 | Ada-K Routing: Boosting the Efficiency of MoE-based LLMs | Tongtian Yue et.al. | 2410.10456 | null |
2024-10-14 | Tighter Risk Bounds for Mixtures of Experts | Wissam Akretche et.al. | 2410.10397 | null |
2024-10-14 | Scalable Multi-Domain Adaptation of Language Models using Modular Experts | Peter Schafhalter et.al. | 2410.10181 | null |
2024-10-14 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | Jun Luo et.al. | 2410.10114 | link |
2024-10-14 | AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality | Peijun Qing et.al. | 2410.10054 | link |
2024-10-13 | ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL | Zhanqiu Guo et.al. | 2410.09781 | null |
2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
2024-10-11 | Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering | I-Chun Chen et.al. | 2410.08589 | link |
2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et.al. | 2410.08245 | link |
2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
2024-10-10 | Efficient Dictionary Learning with Switch Sparse Autoencoders | Anish Mudide et.al. | 2410.08201 | link |
2024-10-10 | More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Sagi Shaier et.al. | 2410.08003 | link |
2024-10-10 | SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture | Jiayi Han et.al. | 2410.07739 | null |
2024-10-10 | Upcycling Large Language Models into Mixture of Experts | Ethan He et.al. | 2410.07524 | null |
2024-10-09 | MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Peng Jin et.al. | 2410.07348 | link |
2024-10-09 | Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | David Noever et.al. | 2410.06462 | null |
2024-10-09 | Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Ruijia Niu et.al. | 2410.06431 | null |
2024-10-08 | Probing the Robustness of Theory of Mind in Large Language Models | Christian Nickel et.al. | 2410.06271 | null |
2024-10-08 | MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Wei Huang et.al. | 2410.06270 | link |
2024-10-08 | Aria: An Open Multimodal Native Mixture-of-Experts Model | Dongxu Li et.al. | 2410.05993 | link |
2024-10-08 | Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models | Siqi Wang et.al. | 2410.05661 | null |
2024-10-07 | Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Xinyu Zhao et.al. | 2410.05357 | link |
2024-10-07 | Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Lucia Gordon et.al. | 2410.04833 | link |
2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | Kuan-Chen Mu et.al. | 2410.04511 | null |
2024-10-09 | Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding | Wei Wu et.al. | 2410.03553 | null |
2024-10-04 | Exploring the Benefit of Activation Sparsity in Pre-training | Zhengyan Zhang et.al. | 2410.03440 | link |
2024-10-03 | MLP-KAN: Unifying Deep Representation and Function Learning | Yunhong He et.al. | 2410.03027 | link |
2024-10-03 | On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | Huy Nguyen et.al. | 2410.02935 | null |
2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
2024-10-03 | Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | Ziye Huang et.al. | 2410.02475 | null |
2024-10-03 | MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction | Zhaojian Yu et.al. | 2410.02241 | null |
2024-10-03 | Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts | Minh Le et.al. | 2410.02200 | link |
2024-10-04 | Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Andres Potapczynski et.al. | 2410.02117 | link |
2024-10-04 | EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing | Haotian Sun et.al. | 2410.02098 | null |
2024-10-02 | Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL | Ghada Sokar et.al. | 2410.01930 | null |
2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
2024-10-02 | Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging | Tingfeng Hui et.al. | 2410.01610 | null |
2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417 | null |
2024-10-01 | MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | Sheng Wang et.al. | 2410.00938 | null |
2024-10-01 | UniAdapt: A Universal Adapter for Knowledge Calibration | Tai D. Nguyen et.al. | 2410.00454 | null |
2024-10-01 | Robust Traffic Forecasting against Spatial Shift over Years | Hongjun Wang et.al. | 2410.00373 | link |
2024-09-29 | IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | Chaohui Xu et.al. | 2410.00059 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et.al. | 2409.19291 | link |
2024-09-27 | SciDFM: A Large Language Model with Mixture-of-Experts for Science | Liangtai Sun et.al. | 2409.18412 | null |
2024-09-26 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | link |
2024-09-26 | A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Guangyu Wang et.al. | 2409.17440 | link |
2024-09-24 | Leveraging Mixture of Experts for Improved Speech Deepfake Detection | Viola Negroni et.al. | 2409.16077 | null |
2024-10-02 | Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Xiaoming Shi et.al. | 2409.16040 | link |
2024-09-24 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | Fengrun Zhang et.al. | 2409.15905 | null |
2024-09-24 | Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks | Jiayi He et.al. | 2409.15695 | null |
2024-09-23 | A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts | Hugo Inzirillo et.al. | 2409.15161 | link |
2024-09-23 | Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Hong Chen et.al. | 2409.14993 | null |
2024-09-21 | Routing in Sparsely-gated Language Models responds to Context | Stefan Arnold et.al. | 2409.14107 | null |
2024-09-20 | On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists | Dongyang Fan et.al. | 2409.13931 | link |
2024-09-20 | Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning | Annette Spooner et.al. | 2409.13791 | null |
2024-09-19 | Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | Yihan Wu et.al. | 2409.12370 | null |
2024-09-18 | GRIN: GRadient-INformed MoE | Liyuan Liu et.al. | 2409.12136 | null |
2024-09-18 | Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 | Zhiyong Wang et.al. | 2409.11909 | link |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-19 | LOLA – An Open-Source Massively Multilingual Large Language Model | Nikit Srivastava et.al. | 2409.11272 | link |
2024-09-16 | Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression | Yi-Hsin Li et.al. | 2409.10101 | null |
2024-09-14 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Enming Zhang et.al. | 2409.07267 | link |
2024-09-10 | DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | Maryam Akhavan Aghdam et.al. | 2409.06669 | null |
2024-09-10 | STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Jaeseong Lee et.al. | 2409.06211 | null |
2024-09-10 | VE: Modeling Multivariate Time Series Correlation with Variate Embedding | Shangjiong Wang et.al. | 2409.06169 | link |
2024-09-09 | Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models | Hongyang Lei et.al. | 2409.05929 | link |
2024-09-09 | Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks | Bo Xu et.al. | 2409.05726 | null |
2024-09-09 | Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection | Tianwu Lei et.al. | 2409.05611 | null |
2024-09-05 | Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions | Zemian Ke et.al. | 2409.03282 | null |
2024-09-05 | ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding | Zhengzhuo Xu et.al. | 2409.03277 | null |
2024-09-05 | xLAM: A Family of Large Action Models to Empower AI Agent Systems | Jianguo Zhang et.al. | 2409.03215 | link |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Pluralistic Salient Object Detection | Xuelu Feng et.al. | 2409.02368 | null |
2024-09-03 | OLMoE: Open Mixture-of-Experts Language Models | Niklas Muennighoff et.al. | 2409.02060 | link |
2024-09-05 | Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Hukai Huang et.al. | 2409.02050 | null |
2024-09-02 | Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | Soumajyoti Sarkar et.al. | 2409.01483 | null |
2024-09-02 | Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | Sungmin Yun et.al. | 2409.01141 | null |
2024-09-04 | Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack | Guanzhong Chen et.al. | 2409.00960 | link |
2024-09-02 | Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts | Youngseog Chung et.al. | 2409.00879 | null |
2024-08-29 | Gradient-free variational learning with conditional mixture networks | Conor Heins et.al. | 2408.16429 | link |
2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
2024-08-28 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts | Lean Wang et.al. | 2408.15664 | null |
2024-08-27 | Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis | Sakhinana Sagar Srinivas et.al. | 2408.15305 | null |
2024-08-27 | MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce | Hao Jiang et.al. | 2408.14968 | null |
2024-08-24 | Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Sagar Srinivas Sakhinana et.al. | 2408.13622 | null |
2024-08-23 | The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Venkatesh Balavadhani Parthasarathy et.al. | 2408.13296 | null |
2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
2024-08-23 | DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Xiaowei Mao et.al. | 2408.12809 | link |
2024-08-23 | Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth | Yuxiang Wei et.al. | 2408.12803 | null |
2024-08-23 | La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | Hang Zou et.al. | 2408.12793 | null |
2024-08-22 | SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | Mohammadreza Pourreza et.al. | 2408.12733 | null |
2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | null |
2024-08-22 | Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators | Dingkang Yang et.al. | 2408.12325 | link |
2024-08-21 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou et.al. | 2408.11396 | link |
2024-08-21 | KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Xiao Han et.al. | 2408.11306 | link |
2024-08-21 | FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts | Hanzi Mei et.al. | 2408.11304 | null |
2024-08-20 | Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data | Atmika Gorti et.al. | 2408.11247 | null |
2024-08-20 | Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Jianxiang Zhou et.al. | 2408.10822 | link |
2024-08-20 | AnyGraph: Graph Foundation Model in the Wild | Lianghao Xia et.al. | 2408.10700 | link |
2024-08-20 | HMoE: Heterogeneous Mixture of Experts for Language Modeling | An Wang et.al. | 2408.10681 | null |
2024-08-19 | AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Shuzhang Zhong et.al. | 2408.10284 | link |
2024-08-17 | FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models | Xiaochen Wang et.al. | 2408.10276 | link |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method | Hang Zou et.al. | 2408.09752 | null |
2024-08-16 | Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection | Haohao Zhu et.al. | 2408.08551 | link |
2024-08-17 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
2024-08-14 | Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation | CanYi Liu et.al. | 2408.07427 | null |
2024-08-13 | A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning | Prateek Yadav et.al. | 2408.07057 | null |
2024-08-13 | Layerwise Recurrent Router for Mixture-of-Experts | Zihan Qiu et.al. | 2408.06793 | link |
2024-08-13 | AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | Bo-Wen Zhang et.al. | 2408.06567 | null |
2024-08-10 | HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou | Xu Wang et.al. | 2408.05430 | null |
2024-08-08 | Understanding the Performance and Estimating the Cost of LLM Fine-Tuning | Yuchen Xia et.al. | 2408.04693 | link |
2024-08-08 | Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training | Weilin Cai et.al. | 2408.04307 | null |
2024-08-08 | LaDiMo: Layer-wise Distillation Inspired MoEfier | Sungyoon Kim et.al. | 2408.04278 | null |
2024-08-07 | MoExtend: Tuning New Experts for Modality and Task Extension | Shanshan Zhong et.al. | 2408.03511 | link |
2024-08-05 | Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization | Changtao Miao et.al. | 2408.02306 | null |
2024-08-02 | HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction | Xingyu Lou et.al. | 2408.01332 | null |
2024-08-01 | Multimodal Fusion and Coherence Modeling for Video Topic Segmentation | Hai Yu et.al. | 2408.00365 | null |
2024-08-12 | MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts | Xi Victoria Lin et.al. | 2407.21770 | null |
2024-07-31 | PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | Min Jae Jung et.al. | 2407.21571 | null |
2024-07-30 | Distribution Learning for Molecular Regression | Nima Shoghi et.al. | 2407.20475 | null |
2024-07-29 | Time series forecasting with high stakes: A field study of the air cargo industry | Abhinav Garg et.al. | 2407.20192 | null |
2024-07-30 | Mixture of Nested Experts: Adaptive Processing of Visual Tokens | Gagan Jain et.al. | 2407.19985 | null |
2024-07-28 | Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models | Mohammed Al-Maamari et.al. | 2407.19610 | link |
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li et.al. | 2407.18908 | null |
2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu et.al. | 2407.18616 | link |
2024-07-26 | Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition | Hukai Huang et.al. | 2407.18581 | link |
2024-07-25 | How Lightweight Can A Vision Transformer Be | Jen Hong Tan et.al. | 2407.17783 | null |
2024-07-24 | Exploring Domain Robust Lightweight Reward Models based on Router Mechanism | Hyuk Namgoong et.al. | 2407.17546 | null |
2024-07-24 | M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Junyu Li et.al. | 2407.17267 | link |
2024-07-25 | Cheems: Wonderful Matrices More Efficient and More Effective Architecture | Jingze Shi et.al. | 2407.16958 | null |
2024-07-22 | Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Vikash Sehwag et.al. | 2407.15811 | link |
2024-07-22 | Norface: Improving Facial Expression Analysis by Identity Normalization | Hanwei Liu et.al. | 2407.15617 | link |
2024-07-19 | Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | HamidReza Imani et.al. | 2407.14417 | null |
2024-07-19 | EVLM: An Efficient Vision-Language Model for Visual Understanding | Kaibing Chen et.al. | 2407.14177 | null |
2024-07-19 | Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models | Qiong Wu et.al. | 2407.14093 | null |
2024-07-18 | Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts | Francesco Folino et.al. | 2407.13526 | null |
2024-07-18 | Mixture of Experts based Multi-task Supervise Learning from Crowds | Tao Han et.al. | 2407.13268 | null |
2024-07-15 | MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration | Yulin Ren et.al. | 2407.10833 | null |
2024-07-18 | Qwen2 Technical Report | An Yang et.al. | 2407.10671 | link |
2024-07-15 | Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering | Francesco Di Sario et.al. | 2407.10389 | null |
2024-07-13 | Low-Rank Interconnected Adaptation Across Layers | Yibo Zhong et.al. | 2407.09946 | link |
2024-07-13 | MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts | Zhenpeng Su et.al. | 2407.09816 | link |
2024-07-12 | Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts | Zeliang Zhang et.al. | 2407.09590 | null |
2024-07-11 | An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio | Siding Zeng et.al. | 2407.08239 | null |
2024-07-10 | MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations | Vignesh Prasad et.al. | 2407.07636 | link |
2024-07-10 | Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation | Szymon Płotka et.al. | 2407.07514 | link |
2024-07-09 | A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts | Atilla Özgür et.al. | 2407.06718 | null |
2024-07-06 | SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation | Guoan Wang et.al. | 2407.04938 | null |
2024-07-06 | Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Tianling Liu et.al. | 2407.04916 | link |
2024-07-05 | YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Sungkyun Chang et.al. | 2407.04822 | link |
2024-07-05 | Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | Yongji Wu et.al. | 2407.04656 | null |
2024-07-05 | MobileFlow: A Multimodal LLM For Mobile GUI Agent | Songqin Nong et.al. | 2407.04346 | null |
2024-07-04 | Mixture of A Million Experts | Xu Owen He et.al. | 2407.04153 | null |
2024-07-02 | Terminating Differentiable Tree Experts | Jonathan Thomm et.al. | 2407.02060 | null |
2024-07-05 | Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Zihan Wang et.al. | 2407.01906 | link |
2024-07-01 | Uncertainty Quantification in Table Structure Recognition | Kehinde Ajayi et.al. | 2407.01731 | link |
2024-07-01 | Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning | Yixiao Wang et.al. | 2407.01531 | null |
2024-07-01 | Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation | Nadezhda Chirkova et.al. | 2407.01126 | null |
2024-07-01 | Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Enshu Liu et.al. | 2407.00945 | link |
2024-07-03 | Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules | Xinglin Pan et.al. | 2407.00599 | link |
2024-06-28 | One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts | Ruochen Wang et.al. | 2407.00256 | link |
2024-06-28 | LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models | Renzhi Wang et.al. | 2406.20030 | null |
2024-06-28 | Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model | Longrong Yang et.al. | 2406.19905 | link |
2024-06-28 | SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR | Qiuming Zhao et.al. | 2406.19706 | link |
2024-06-27 | A Teacher Is Worth A Million Instructions | Nikhil Kothari et.al. | 2406.19112 | link |
2024-06-27 | Towards Personalized Federated Multi-scenario Multi-task Recommendation | Yue Ding et.al. | 2406.18938 | null |
2024-06-26 | Mixture of Experts in a Mixture of RL settings | Timon Willi et.al. | 2406.18420 | null |
2024-06-26 | A Closer Look into Mixture-of-Experts in Large Language Models | Ka Man Lo et.al. | 2406.18219 | link |
2024-06-26 | SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Shuaishuai Ye et.al. | 2406.18021 | null |
2024-06-24 | Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Bruce Rushing et.al. | 2406.17150 | link |
2024-06-24 | LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Tong Zhu et.al. | 2406.16554 | link |
2024-06-25 | OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Jingze Shi et.al. | 2406.16495 | link |
2024-06-24 | Theory on Mixture-of-Experts in Continual Learning | Hongbo Li et.al. | 2406.16437 | null |
2024-06-22 | SimSMoE: Solving Representational Collapse via Similarity Measure | Giang Do et.al. | 2406.15883 | null |
2024-06-20 | Voice Disorder Analysis: a Transformer-based Approach | Alkis Koudounas et.al. | 2406.14693 | link |
2024-06-19 | Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation | Qian Chen et.al. | 2406.13583 | null |
2024-06-19 | AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Zihao Zeng et.al. | 2406.13233 | link |
2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
2024-06-18 | P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | Yuhao Dan et.al. | 2406.12548 | null |
2024-06-18 | Variational Distillation of Diffusion Policies into Mixture of Experts | Hongyi Zhou et.al. | 2406.12538 | null |
2024-06-18 | GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Haoze Wu et.al. | 2406.12375 | link |
2024-06-17 | Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding | Ukyo Honda et.al. | 2406.12060 | link |
2024-06-17 | DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | DeepSeek-AI et.al. | 2406.11931 | link |
2024-06-17 | Graph Knowledge Distillation to Mixture of Experts | Pavel Rumiantsev et.al. | 2406.11919 | link |
2024-06-17 | $\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Guanjie Chen et.al. | 2406.11353 | link |
2024-06-17 | Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Tong Zhu et.al. | 2406.11256 | link |
2024-06-14 | Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Anke Tang et.al. | 2406.09770 | link |
2024-06-13 | DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Joel Ong et.al. | 2406.08742 | link |
2024-06-12 | Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Pingzhi Li et.al. | 2406.08155 | link |
2024-06-11 | Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Yixin Song et.al. | 2406.05955 | null |
2024-06-08 | Flexible and Adaptable Summarization via Expertise Separation | Xiuying Chen et.al. | 2406.05360 | link |
2024-06-07 | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jitai Hao et.al. | 2406.04984 | link |
2024-06-07 | MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Xingkui Zhu et.al. | 2406.04801 | link |
2024-06-05 | Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Ahad Jawaid et.al. | 2406.03637 | null |
2024-06-05 | Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach | Haoyu Han et.al. | 2406.03464 | null |
2024-06-05 | Continual Traffic Forecasting via Mixture of Experts | Sanghyun Lee et.al. | 2406.03140 | null |
2024-06-05 | Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models | Raeid Saqur et.al. | 2406.02969 | null |
2024-06-04 | Parrot: Multilingual Visual Instruction Tuning | Hai-Long Sun et.al. | 2406.02539 | link |
2024-06-04 | Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Shwai He et.al. | 2406.02500 | link |
2024-06-02 | Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts – Physics Informed Neural Operator Forward Model | Clement Etienam et.al. | 2406.00889 | link |
2024-06-01 | A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers | Daniel Waxman et.al. | 2406.00570 | link |
2024-06-01 | Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks | Jiacheng Wang et.al. | 2406.00408 | null |
2024-05-30 | Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach | Reza Arabpour et.al. | 2405.20094 | null |
2024-06-02 | MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors | Renzhi Wang et.al. | 2405.19086 | null |
2024-06-02 | Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Markus J. Buehler et.al. | 2405.19076 | link |
2024-05-29 | Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization | Shengcai Liu et.al. | 2405.18884 | link |
2024-05-29 | MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models | Taehyun Kim et.al. | 2405.18832 | null |
2024-05-29 | Yuan 2.0-M32: Mixture of Experts with Attention Router | Shaohua Wu et.al. | 2405.17976 | link |
2024-05-28 | LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design | Rui Kong et.al. | 2405.17741 | null |
2024-05-27 | Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | Andreas Charalampopoulos et.al. | 2405.16836 | link |
2024-05-26 | Mixture of Experts Using Tensor Products | Zhan Su et.al. | 2405.16671 | link |
2024-05-30 | A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts | Mohammed Nowaz Rabbani Chowdhury et.al. | 2405.16646 | null |
2024-05-26 | Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | Rongyu Zhang et.al. | 2405.16486 | link |
2024-05-25 | MoEUT: Mixture-of-Experts Universal Transformers | Róbert Csordás et.al. | 2405.16039 | link |
2024-05-23 | Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Xianzhi Du et.al. | 2405.15052 | link |
2024-05-23 | Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast | Chufan Shi et.al. | 2405.14507 | link |
2024-05-23 | Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models | Yongxin Guo et.al. | 2405.14297 | link |
2024-05-23 | Graph Sparsification via Mixture of Graphs | Guibin Zhang et.al. | 2405.14260 | link |
2024-05-23 | Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts | Huy Nguyen et.al. | 2405.14131 | null |
2024-05-23 | Mixture of Experts Meets Prompt-Based Continual Learning | Minh Le et.al. | 2405.14124 | link |
2024-05-22 | Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts | Huy Nguyen et.al. | 2405.13997 | null |
2024-05-22 | xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token | Xin Cheng et.al. | 2405.13792 | link |
2024-05-24 | MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | Jingwei Xu et.al. | 2405.13053 | link |
2024-05-21 | Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts | Ruichen Zhang et.al. | 2405.12472 | null |
2024-05-21 | Ensemble and Mixture-of-Experts DeepONets For Operator Learning | Ramansh Sharma et.al. | 2405.11907 | link |
2024-05-19 | Learning More Generalized Experts by Merging Experts in Mixture-of-Experts | Sejik Park et.al. | 2405.11530 | null |
2024-05-18 | Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Yunxin Li et.al. | 2405.11273 | link |
2024-05-16 | Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts | Ruolin Su et.al. | 2405.09744 | null |
2024-05-15 | M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | Yufeng Jiang et.al. | 2405.09446 | link |
2024-05-13 | Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition | Zhiyong Yang et.al. | 2405.07780 | link |
2024-05-07 | SUTRA: Scalable Multilingual Language Model Architecture | Abhijit Bendale et.al. | 2405.06694 | null |
2024-05-09 | A Mixture of Experts Approach to 3D Human Motion Prediction | Edmund Shieh et.al. | 2405.06088 | link |
2024-05-09 | A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds | Christopher Z. Cui et.al. | 2405.06059 | null |
2024-05-09 | EWMoE: An effective model for global weather forecasting with mixture-of-experts | Lihao Gan et.al. | 2405.06004 | link |
2024-05-09 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li et.al. | 2405.05949 | link |
2024-05-16 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434 | link |
2024-05-07 | Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts | Changyuan Zhao et.al. | 2405.04198 | null |
2024-05-06 | Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training | Zexuan Zhong et.al. | 2405.03133 | null |
2024-05-06 | WDMoE: Wireless Distributed Large Language Models with Mixture of Experts | Nan Xue et.al. | 2405.03131 | null |
<a href=#updated-on-20250628>(back to top)</a>