
Updated on 2025.01.24

| Publish Date | Title | Authors | PDF (arXiv ID) | Code |
|:---|:---|:---|:---:|:---:|
| 2025-01-20 | Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference | Pouya Hamadanian et al. | 2501.11779 | link |
| 2025-01-20 | Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas | Nishant Balepur et al. | 2501.11549 | link |
| 2025-01-19 | GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation | Shashikant Ilager et al. | 2501.11006 | null |
| 2025-01-17 | A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks | Xinzhe Li et al. | 2501.10069 | null |
| 2025-01-16 | Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition | Takaaki Hori et al. | 2501.09258 | null |
| 2025-01-15 | Guiding Retrieval using LLM-based Listwise Rankers | Mandeep Rathee et al. | 2501.09186 | link |
| 2025-01-14 | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | Paul Joe Maliakel et al. | 2501.08219 | null |
| 2025-01-14 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et al. | 2501.08192 | null |
| 2025-01-14 | Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke et al. | 2501.08090 | null |
| 2025-01-12 | MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference | Wenxuan Zeng et al. | 2501.06807 | null |
| 2025-01-05 | TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms | Jovan Stojkovic et al. | 2501.02600 | null |
| 2025-01-04 | AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference | Zhuomin He et al. | 2501.02336 | link |
| 2025-01-03 | Efficient LLM Inference with Activation Checkpointing and Hybrid Caching | Sanghyeon Lee et al. | 2501.01792 | null |
| 2025-01-03 | BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference | Wonsuk Jang et al. | 2501.01144 | null |
| 2025-01-02 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye et al. | 2501.01005 | link |
| 2024-12-23 | Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs | Dibakar Gope et al. | 2501.00032 | link |
| 2024-12-29 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang et al. | 2412.20501 | link |
| 2024-12-28 | LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System | Hyucksung Kwon et al. | 2412.20166 | null |
| 2024-12-19 | GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors | Chengming Zhang et al. | 2412.19829 | null |
| 2025-01-02 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li et al. | 2412.19442 | link |
| 2024-12-27 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong et al. | 2412.19394 | link |
| 2024-12-25 | Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Libo Zhang et al. | 2412.18934 | null |
| 2024-12-21 | SYMPHONY: Improving Memory Management for LLM Inference Workloads | Saurabh Agarwal et al. | 2412.16434 | null |
| 2024-12-20 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan et al. | 2412.15803 | link |
| 2024-12-18 | A Survey on LLM Inference-Time Self-Improvement | Xiangjue Dong et al. | 2412.14352 | link |
| 2024-12-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et al. | 2412.12687 | null |
| 2024-12-17 | A System for Microserving of LLMs | Hongyi Jin et al. | 2412.12488 | null |
| 2024-12-16 | CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation | Hongxuan Zhang et al. | 2412.11741 | null |
| 2024-12-15 | Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Yun Qu et al. | 2412.11120 | link |
| 2024-12-15 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei et al. | 2412.11053 | link |
| 2024-12-13 | SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Yucheng Li et al. | 2412.10319 | null |
| 2024-12-17 | TurboAttention: Efficient Attention Approximation For High Throughputs LLMs | Hao Kang et al. | 2412.08585 | null |
| 2024-12-11 | Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths | Naryeong Kim et al. | 2412.08281 | null |
| 2024-12-12 | TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch | Xingchen Song et al. | 2412.08237 | null |
| 2024-12-09 | Asynchronous LLM Function Calling | In Gim et al. | 2412.07017 | null |
| 2024-12-09 | SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs | James Vo et al. | 2412.06198 | null |
| 2024-12-08 | XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference | Weizhuo Li et al. | 2412.05896 | null |
| 2024-12-06 | GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Yanyu Chen et al. | 2412.04788 | null |
| 2024-12-03 | Multi-Bin Batching for Increasing LLM Inference Throughput | Ozgur Guldogan et al. | 2412.04504 | null |
| 2024-11-29 | BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | Zhen Zheng et al. | 2412.03594 | null |
| 2024-12-03 | Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity | Da Ma et al. | 2412.02252 | null |
| 2024-12-02 | PLD+: Accelerating LLM inference by leveraging Language Model Artifacts | Shwetha Somasundaram et al. | 2412.01447 | null |
| 2024-12-02 | Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking | Marco Federici et al. | 2412.01380 | null |
| 2024-12-05 | RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy | Geonho Lee et al. | 2412.01129 | null |
| 2024-12-02 | TruncFormer: Private LLM Inference Using Only Truncations | Patrick Yubeaton et al. | 2412.01042 | null |
| 2024-11-29 | A dynamic parallel method for performance optimization on hybrid CPUs | Luo Yu et al. | 2411.19542 | null |
| 2024-12-03 | Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | Akhiad Bercovich et al. | 2411.19146 | null |
| 2024-11-29 | InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks | Xinyao Zheng et al. | 2411.18191 | null |
| 2024-11-28 | MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | Akshat Sharma et al. | 2411.18077 | null |
| 2024-11-24 | Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou et al. | 2411.17741 | null |
| 2024-11-26 | PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega et al. | 2411.17309 | null |
| 2024-11-26 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya et al. | 2411.17116 | link |
| 2024-11-26 | Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation | Chaoyi Jiang et al. | 2411.17089 | null |
| 2024-11-25 | MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang et al. | 2411.16158 | null |
| 2024-11-24 | eFedLLM: Efficient LLM Inference Based on Federated Learning | Shengwen Ding et al. | 2411.16003 | null |
| 2024-11-24 | Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang et al. | 2411.15982 | null |
| 2024-11-24 | Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems | Wenxiang Lin et al. | 2411.15715 | null |
| 2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et al. | 2411.15100 | null |
| 2024-11-21 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et al. | 2411.13504 | link |
| 2024-11-20 | Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding | Hyun Ryu et al. | 2411.13157 | null |
| 2024-11-21 | LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts | Zhuohan Gu et al. | 2411.13009 | null |
| 2024-11-15 | An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 | Pepijn de Reus et al. | 2411.12758 | link |
| 2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et al. | 2411.12692 | null |
| 2024-11-18 | MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao et al. | 2411.11217 | null |
| 2024-11-15 | AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Janghwan Lee et al. | 2411.09909 | null |
| 2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et al. | 2411.09688 | link |
| 2024-11-15 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et al. | 2411.09510 | null |
| 2024-11-14 | Pie: Pooling CPU Memory for LLM Inference | Yi Xu et al. | 2411.09317 | null |
| 2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et al. | 2411.07942 | null |
| 2024-11-12 | The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving | Kyoungmin Kim et al. | 2411.07447 | null |
| 2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et al. | 2411.05555 | null |
| 2024-11-07 | Hardware and Software Platform Inference | Cheng Zhang et al. | 2411.05197 | null |
| 2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et al. | 2411.04975 | null |
| 2024-11-05 | CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration | Hongpeng Jin et al. | 2411.02829 | null |
| 2024-11-04 | RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Tevin Wang et al. | 2411.01751 | link |
| 2024-11-06 | HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference | Peng Tang et al. | 2411.01433 | null |
| 2024-11-02 | RA-WEBs: Remote Attestation for WEB services | Kosei Akama et al. | 2411.01340 | null |
| 2024-11-02 | NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Xuanlin Jiang et al. | 2411.01142 | null |
| 2024-11-01 | LLM-Based Misconfiguration Detection for AWS Serverless Computing | Jinfeng Wen et al. | 2411.00642 | null |
| 2024-11-04 | ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models | Anbang Wang et al. | 2411.00533 | null |
| 2024-11-01 | Attention Tracker: Detecting Prompt Injection Attacks in LLMs | Kuo-Han Hung et al. | 2411.00348 | null |
| 2024-10-31 | LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Krishna Teja Chitty-Venkata et al. | 2411.00136 | link |
| 2024-10-31 | Interpretable Language Modeling via Induction-head Ngram Models | Eunji Kim et al. | 2411.00066 | link |
| 2024-10-31 | ALISE: Accelerating Large Language Model Serving with Speculative Scheduling | Youpeng Zhao et al. | 2410.23537 | null |
| 2024-10-30 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao et al. | 2410.23079 | link |
| 2024-10-29 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang et al. | 2410.22480 | link |
| 2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et al. | 2410.22307 | null |
| 2024-10-28 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun et al. | 2410.21465 | link |
| 2024-10-27 | FIRP: Faster LLM inference via future intermediate representation prediction | Pengfei Wu et al. | 2410.20488 | null |
| 2024-10-29 | Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management | Tuowei Wang et al. | 2410.19274 | null |
| 2024-10-24 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai et al. | 2410.19123 | link |
| 2024-10-30 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et al. | 2410.18952 | link |
| 2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et al. | 2410.18908 | null |
| 2024-10-24 | BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching | Peizhuang Cong et al. | 2410.18701 | null |
| 2024-10-25 | Fast Inference for Augmented Large Language Models | Rana Shahout et al. | 2410.18248 | null |
| 2024-10-23 | POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference | Aditya K Kamath et al. | 2410.18038 | null |
| 2024-10-22 | FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | Haoran Lin et al. | 2410.16663 | null |
| 2024-10-22 | Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency | Prafulla Kumar Choubey et al. | 2410.16597 | null |
| 2024-10-20 | EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models | Junhao Hu et al. | 2410.15332 | null |
| 2024-10-19 | IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System | Minseok Seo et al. | 2410.15008 | null |
| 2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et al. | 2410.14740 | null |
| 2024-10-18 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | You Wu et al. | 2410.14442 | link |
| 2024-10-18 | Revisiting SLO and Goodput Metrics in LLM Serving | Zhibin Wang et al. | 2410.14257 | null |
| 2024-10-17 | RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs | Jiatan Huang et al. | 2410.13987 | null |
| 2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et al. | 2410.13835 | link |
| 2024-10-17 | Progressive Mixed-Precision Decoding for Efficient LLM Inference | Hao Mark Chen et al. | 2410.13461 | null |
| 2024-10-17 | Data Defenses Against Large Language Models | William Agnew et al. | 2410.13138 | link |
| 2024-10-19 | In-context KV-Cache Eviction for LLMs via Attention-Gate | Zihao Zeng et al. | 2410.12876 | null |
| 2024-10-10 | RecurFormer: Not All Transformer Heads Need Self-Attention | Ruiqing Yan et al. | 2410.12850 | null |
| 2024-10-16 | Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Huiwen Wu et al. | 2410.12130 | null |
| 2024-10-15 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix | Yingyu Liang et al. | 2410.11261 | null |
| 2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et al. | 2410.10819 | link |
| 2024-10-16 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et al. | 2410.10759 | null |
| 2024-10-12 | Power-Softmax: Towards Secure LLM Inference over Encrypted Data | Itamar Zimerman et al. | 2410.09457 | null |
| 2024-10-09 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Heming Xia et al. | 2410.06916 | link |
| 2024-10-08 | ParallelSpec: Parallel Drafter for Efficient Speculative Decoding | Zilin Xiao et al. | 2410.05589 | null |
| 2024-10-06 | RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference | Yige Xu et al. | 2410.04519 | link |
| 2024-10-14 | Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective | Jinhao Li et al. | 2410.04466 | null |
| 2024-10-04 | SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | Aurick Qiao et al. | 2410.03960 | null |
| 2024-10-04 | EXAQ: Exponent Aware Quantization For LLMs Acceleration | Moran Shkolnik et al. | 2410.03185 | link |
| 2024-10-03 | LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences | Zhenxiao Fu et al. | 2410.02950 | null |
| 2024-10-03 | Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Yun Qu et al. | 2410.02511 | link |
| 2024-10-03 | LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Małgorzata Łazuka et al. | 2410.02425 | link |
| 2024-10-04 | Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation | Xiaoqun Liu et al. | 2410.02220 | null |
| 2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et al. | 2410.01805 | link |
| 2024-10-02 | ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving | Yifan Qiao et al. | 2410.01228 | null |
| 2024-10-01 | TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | Zonghang Li et al. | 2410.00531 | link |
| 2024-09-30 | The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems | Linke Song et al. | 2409.20002 | null |
| 2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et al. | 2409.18009 | link |
| 2024-09-26 | Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | Shaobo Ma et al. | 2409.17870 | null |
| 2024-09-25 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi et al. | 2409.17422 | link |
| 2024-09-25 | Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | Amey Agrawal et al. | 2409.17264 | null |
| 2024-09-25 | Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | Zongyue Qin et al. | 2409.16560 | null |
| 2024-09-25 | AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization | Yifan Tan et al. | 2409.16546 | link |
| 2024-09-23 | Eagle: Efficient Training-Free Router for Multi-LLM Inference | Zesen Zhao et al. | 2409.15518 | null |
| 2024-09-24 | UELLM: A Unified and Efficient Approach for LLM Inference Serving | Yiyuan He et al. | 2409.14961 | null |
| 2024-09-22 | RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph | Linxi Wei et al. | 2409.14556 | null |
| 2024-09-16 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng et al. | 2409.13761 | link |
| 2024-09-19 | PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) | Mahmoud Nazzal et al. | 2409.12699 | link |
| 2024-09-12 | LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs | Han Xu et al. | 2409.11424 | null |
| 2024-09-04 | ISO: Overlap of Computation and Communication within Seqenence For LLM Inference | Bin Xiao et al. | 2409.11155 | null |
| 2024-09-18 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et al. | 2409.10516 | link |
| 2024-09-08 | InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference | Xiurui Pan et al. | 2409.04992 | null |
| 2024-09-07 | Achieving Peak Performance for Large Language Models: A Systematic Review | Zhyar Rzgar K Rostam et al. | 2409.04833 | null |
| 2024-09-06 | A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage | Huan Yang et al. | 2409.04040 | null |
| 2024-09-13 | Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study | Jianwei Zhu et al. | 2409.03992 | null |
| 2024-09-05 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou et al. | 2409.03856 | link |
| 2024-08-31 | HSF: Defending against Jailbreak Attacks with Hidden State Filtering | Cheng Qian et al. | 2409.03788 | null |
| 2024-09-03 | Contemporary Model Compression on Large Language Models Inference | Dong Liu et al. | 2409.01990 | link |
| 2024-09-02 | CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification | Junhui He et al. | 2409.01366 | null |
| 2024-09-04 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets et al. | 2409.01227 | null |
| 2024-09-01 | Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) | Xu-Hao Chen et al. | 2409.00661 | null |
| 2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et al. | 2408.15907 | null |
| 2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et al. | 2408.15792 | link |
| 2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et al. | 2408.15562 | null |
| 2024-08-22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | Kan Zhu et al. | 2408.12757 | link |
| 2024-09-04 | Parallel Speculative Decoding with Adaptive Draft Length | Tianyu Liu et al. | 2408.11850 | link |
| 2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et al. | 2408.11743 | link |
| 2024-08-20 | Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models | Artem Vazhentsev et al. | 2408.10692 | null |
| 2024-08-19 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et al. | 2408.08869 | null |
| 2024-08-23 | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Chao Zeng et al. | 2408.08554 | link |
| 2024-08-14 | LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference | Seungjae Moon et al. | 2408.07326 | null |
| 2024-08-12 | LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration | Zhiwen Mo et al. | 2408.06003 | null |
| 2024-08-10 | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Jaehong Cho et al. | 2408.05499 | link |
| 2024-08-05 | SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris et al. | 2408.05235 | null |
| 2024-08-08 | Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning | Ke Cheng et al. | 2408.04323 | null |
| 2024-08-07 | Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference | Zeyu Zhang et al. | 2408.04107 | null |
| 2024-08-07 | MPC-Minimized Secure LLM Inference | Deevashwer Rathee et al. | 2408.03561 | null |
| 2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et al. | 2408.02549 | null |
| 2024-08-02 | The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | Matias Martinez et al. | 2408.01050 | null |
| 2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et al. | 2408.00741 | null |
| 2024-08-01 | Designing Efficient LLM Accelerators for Edge Devices | Jude Haris et al. | 2408.00462 | null |
| 2024-08-01 | Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control | Hao Zhou et al. | 2408.00214 | null |
| 2024-07-23 | ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency | Yuhang Yao et al. | 2408.00008 | null |
| 2024-08-01 | Responsive ML inference in multi-tenanted environments using AQUA | Abhishek Vijaya Kumar et al. | 2407.21255 | null |
| 2024-07-25 | An Efficient Inference Framework for Early-exit Large Language Models | Ruijie Miao et al. | 2407.20272 | null |
| 2024-07-29 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Sania Nayab et al. | 2407.19825 | null |
| 2024-07-29 | Teaching LLMs at Charles University: Assignments and Activities | Jindřich Helcl et al. | 2407.19798 | null |
| 2024-07-22 | RazorAttention: Efficient KV Cache Compression Through Retrieval Heads | Hanlin Tang et al. | 2407.15891 | null |
| 2024-07-22 | vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jiale Xu et al. | 2407.15309 | link |
| 2024-07-19 | LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Qichen Fu et al. | 2407.14057 | null |
| 2024-07-17 | Struct-X: Enhancing Large Language Models Reasoning with Structured Data | Xiaoyu Tan et al. | 2407.12522 | null |
| 2024-07-17 | LLM Inference Serving: Survey of Recent Advances and Opportunities | Baolin Li et al. | 2407.12391 | null |
| 2024-07-17 | Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models | Ayush Kaushal et al. | 2407.12327 | link |
| 2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et al. | 2407.11798 | null |
| 2024-07-21 | Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference | Yuan Feng et al. | 2407.11550 | link |
| 2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et al. | 2407.10960 | link |
| 2024-07-12 | Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference | Zongyue Qin et al. | 2407.09722 | null |
| 2024-07-09 | Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et al. | 2407.07000 | link |
| 2024-07-08 | Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU | Daliang Xu et al. | 2407.05858 | link |
| 2024-07-07 | A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length | Yuqing Yang et al. | 2407.05347 | null |
| 2024-07-05 | Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design | Yiyang Huang et al. | 2407.04292 | link |
| 2024-07-04 | Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems | Grant Wilkins et al. | 2407.04014 | null |
| 2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et al. | 2407.02490 | link |
| 2024-06-29 | Teola: Towards End-to-End Optimization of LLM-based Applications | Xin Tan et al. | 2407.00326 | null |
| 2024-06-25 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei et al. | 2407.00088 | link |
| 2024-06-28 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | Wonbeom Lee et al. | 2406.19707 | null |
| 2024-06-24 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et al. | 2406.16758 | link |
| 2024-06-28 | SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention | Qianchao Zhu et al. | 2406.15486 | null |
| 2024-06-21 | Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Qi Liu et al. | 2406.14848 | link |
| 2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et al. | 2406.14546 | link |
| 2024-06-20 | LiveMind: Low-latency Large Language Models with Simultaneous Inference | Chuangtao Chen et al. | 2406.14319 | link |
| 2024-06-19 | SDQ: Sparse Decomposed Quantization for LLM Inference | Geonhwa Jeong et al. | 2406.13868 | null |
| 2024-06-19 | Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style | Zeping Li et al. | 2406.13170 | null |
| 2024-06-16 | Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization | Jungi Lee et al. | 2406.12930 | null |
| 2024-06-18 | LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization | Masafumi Enomoto et al. | 2406.12494 | null |
| 2024-06-18 | LLMs Are Prone to Fallacies in Causal Inference | Nitish Joshi et al. | 2406.12158 | null |
| 2024-06-14 | Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning | Hui Liu et al. | 2406.11890 | null |
| 2024-06-17 | Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference | Donghyeon Joo et al. | 2406.11674 | null |
| 2024-06-17 | QTIP: Quantization with Trellises and Incoherence Processing | Albert Tseng et al. | 2406.11235 | link |
| 2024-06-16 | New Solutions on LLM Acceleration, Optimization, and Application | Yingbing Huang et al. | 2406.10903 | null |
| 2024-06-16 | Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | Jiaming Tang et al. | 2406.10774 | link |
| 2024-06-15 | Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study | Hao Hao et al. | 2406.10675 | link |
| 2024-06-08 | QCQA: Quality and Capacity-aware grouped Query Attention | Vinay Joshi et al. | 2406.10247 | null |
| 2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et al. | 2406.08413 | null |
| 2024-06-12 | PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Zhenliang Xue et al. | 2406.06282 | null |
| 2024-06-09 | A Superalignment Framework in Autonomous Driving with Large Language Models | Xiangrui Kong et al. | 2406.05651 | null |
| 2024-06-06 | Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | Jiahao Liu et al. | 2406.03853 | null |
| 2024-06-04 | Language Models can Infer Action Semantics for Classical Planners from Environment Feedback | Wang Zhu et al. | 2406.02791 | null |
| 2024-06-08 | Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach | Yuxuan Chen et al. | 2406.02616 | null |
| 2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et al. | 2406.02532 | link |
| 2024-06-03 | Demystifying Platform Requirements for Diverse LLM Inference Use Cases | Abhimanyu Bambhaniya et al. | 2406.01698 | link |
| 2024-06-03 | PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration | Ziqian Zeng et al. | 2406.01394 | null |
| 2024-06-01 | A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation | Dugang Liu et al. | 2406.00333 | null |
| 2024-05-31 | No Free Lunch Theorem for Privacy-Preserving LLM Inference | Xiaojin Zhang et al. | 2405.20681 | null |
| 2024-05-30 | Decentralized AI: Permissionless LLM Inference on POKT Network | Daniel Olshansky et al. | 2405.20450 | null |
| 2024-06-01 | S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs | Wei Zhong et al. | 2405.20314 | null |
| 2024-05-30 | Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models | Yuxiao Luo et al. | 2405.19850 | null |
| 2024-05-29 | MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models | Taehyun Kim et al. | 2405.18832 | null |
| 2024-05-29 | PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN | Fei Zheng et al. | 2405.18744 | null |
| 2024-06-02 | Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | Hao Mark Chen et al. | 2405.18628 | link |
| 2024-05-25 | FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference | Chenqi Lin et al. | 2405.16241 | null |
| 2024-05-23 | EdgeShard: Efficient LLM Inference via Collaborative Edge Computing | Mingjin Zhang et al. | 2405.14371 | null |
| 2024-05-23 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | Akide Liu et al. | 2405.14366 | null |
| 2024-05-21 | PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | Dongjie Yang et al. | 2405.12532 | null |
| 2024-05-12 | Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization | Xinyuan Zhang et al. | 2405.07140 | null |
| 2024-05-11 | Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving | Chengyi Nie et al. | 2405.06856 | null |
| 2024-05-21 | Vidur: A Large-Scale Simulation Framework For LLM Inference | Amey Agrawal et al. | 2405.05465 | link |
| 2024-05-13 | KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation | Minsik Cho et al. | 2405.05329 | null |
| 2024-05-12 | DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature | Dawei Li et al. | 2405.04819 | link |
| 2024-05-10 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et al. | 2405.04532 | link |
| 2024-05-07 | vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Ramya Prabhu et al. | 2405.04437 | null |
| 2024-05-07 | Optimizing Language Model’s Reasoning Abilities with Weak Supervision | Yongqi Tong et al. | 2405.04086 | null |
| 2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et al. | 2405.03553 | link |
| 2024-05-03 | Efficient and Economic Large Language Model Inference with Attention Offloading | Shaoyuan Chen et al. | 2405.01814 | null |


MoE

| Publish Date | Title | Authors | PDF (arXiv ID) | Code |
|:---|:---|:---|:---:|:---:|
| 2025-01-22 | Autonomy-of-Experts Models | Ang Lv et al. | 2501.13074 | null |
| 2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et al. | 2501.12983 | null |
| 2025-01-22 | UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | Xu Zhang et al. | 2501.12981 | null |
| 2025-01-22 | BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR | Guodong Ma et al. | 2501.12602 | null |
| 2025-01-21 | Modality Interactive Mixture-of-Experts for Fake News Detection | Yifan Liu et al. | 2501.12431 | null |
| 2025-01-21 | SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | Xiaocheng Zhang et al. | 2501.12430 | null |
| 2025-01-21 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | Samira Abnar et al. | 2501.12370 | null |
| 2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et al. | 2501.12281 | link |
| 2025-01-21 | Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | Zihan Qiu et al. | 2501.11873 | null |
| 2025-01-18 | FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | Xinglin Pan et al. | 2501.10714 | null |
| 2025-01-17 | OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jinyuan Feng et al. | 2501.10062 | null |
| 2025-01-17 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et al. | 2501.09636 | null |
| 2025-01-14 | MiniMax-01: Scaling Foundation Models with Lightning Attention | MiniMax et al. | 2501.08313 | null |
| 2025-01-14 | GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism | Chen Tang et al. | 2501.07890 | null |
| 2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et al. | 2501.07762 | null |
| 2025-01-13 | A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Binyu Zhang et al. | 2501.07016 | link |
| 2025-01-12 | Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Hanwen Zhong et al. | 2501.06884 | link |
| 2025-01-10 | TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Yinghao Zhu et al. | 2501.05661 | link |
| 2025-01-09 | Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Mengfan Liu et al. | 2501.05313 | null |
| 2025-01-07 | LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Xiang Xu et al. | 2501.04004 | link |
| 2025-01-07 | mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Xudong Liao et al. | 2501.03905 | null |
| 2025-01-08 | Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Donatella Genovese et al. | 2501.03432 | null |
| 2025-01-12 | Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | Zhongyi Zhou et al. | 2501.02198 | null |
| 2025-01-03 | MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Jiajun Cao et al. | 2501.01709 | null |
| 2025-01-01 | REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization | Huyen Nguyen et al. | 2501.00779 | null |
2025-01-06 Superposition in Transformers: A Novel Way of Building Mixture of Experts Ayoub Ben Chaliah et.al. 2501.00530 link
2024-12-31 CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Xiaolei Wang et.al. 2501.00346 null
2024-12-29 Multimodal Variational Autoencoder: a Barycentric View Peijie Qiu et.al. 2412.20487 null
2024-12-29 A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement Sidra Nasir et.al. 2412.20468 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection Yaning Zhang et.al. 2412.20156 null
2024-12-27 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-26 AskChart: Universal Chart Understanding through Textual Enhancement Xudong Yang et.al. 2412.19146 link
2024-12-30 Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Xiaoyu Huang et.al. 2412.19108 null
2024-12-24 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 link
2024-12-24 BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing Yingjie Ma et.al. 2412.18065 link
2024-12-23 UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition Li Fu et.al. 2412.17507 null
2024-12-23 BrainMAP: Learning Multiple Activation Pathways in Brain Networks Song Wang et.al. 2412.17404 null
2024-12-22 Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models Elie Antoine et.al. 2412.16971 null
2024-12-20 Theory of Mixture-of-Experts for Mobile Edge Computing Hongbo Li et.al. 2412.15690 null
2024-12-19 MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale Swapnil Gandhi et.al. 2412.15411 null
2024-12-19 Qwen2.5 Technical Report Qwen et.al. 2412.15115 link
2024-12-19 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Ziteng Wang et.al. 2412.14711 link
2024-12-18 A Survey on Inference Optimization Techniques for Mixture of Experts Models Jiacheng Liu et.al. 2412.14219 link
2024-12-18 SEKE: Specialised Experts for Keyword Extraction Matej Martinc et.al. 2412.14087 link
2024-12-18 MedCoT: Medical Chain of Thought via Hierarchical Expert Jiaxiang Liu et.al. 2412.13736 link
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 null
2024-12-17 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Moritz Reuss et.al. 2412.12953 null
2024-12-17 CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition He Wang et.al. 2412.12760 null
2024-12-16 Investigating Mixture of Experts in Dense Retrieval Effrosyni Sokli et.al. 2412.11864 null
2024-12-18 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Jingze Shi et.al. 2412.11834 link
2024-12-16 Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation Svetlana Pavlitska et.al. 2412.11608 null
2024-12-16 Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture Jingyu Xu et.al. 2412.11557 null
2024-12-14 DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification Yuhao Wang et.al. 2412.10650 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Llama 3 Meets MoE: Efficient Upcycling Aditya Vavre et.al. 2412.09952 link
2024-12-12 Memory Layers at Scale Vincent-Pierre Berges et.al. 2412.09764 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-12 Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Minh Le et.al. 2412.08285 null
2024-12-11 Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification Xuanze Chen et.al. 2412.08193 null
2024-12-10 MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems Yao Fu et.al. 2412.07067 null
2024-12-07 Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts Arturo Rodriguez et.al. 2412.06842 null
2024-12-09 Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset Xiao Wang et.al. 2412.06647 link
2024-12-09 UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts Zhen Wan et.al. 2412.06340 null
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-10 An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism Qing Zhang et.al. 2412.05821 null
2024-12-10 RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Xu Liu et.al. 2412.05679 link
2024-12-07 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Gengze Zhou et.al. 2412.05552 link
2024-12-07 Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers Boxun Xu et.al. 2412.05540 null
2024-12-06 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-09 Monet: Mixture of Monosemantic Experts for Transformers Jungwoo Park et.al. 2412.04139 link
2024-12-05 Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks Zhaoyang Liu et.al. 2412.03850 null
2024-12-04 Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond Loukas Ilias et.al. 2412.03483 null
2024-12-05 MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption Siddhant Dutta et.al. 2412.01858 null
2024-12-05 Yi-Lightning Technical Report 01. AI et.al. 2412.01253 null
2024-11-30 Mixture of Experts for Node Classification Yu Shi et.al. 2412.00418 null
2024-11-30 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting Shaohan Yu et.al. 2412.00316 null
2024-11-27 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar et.al. 2412.00099 null
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References Shuguo Jiang et.al. 2411.19758 null
2024-11-28 On the effectiveness of discrete representations in sparse mixture of experts Giang Do et.al. 2411.19402 null
2024-11-28 Bayesian Cluster Weighted Gaussian Models Panagiotis Papastamoulis et.al. 2411.18957 link
2024-11-27 UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS Haomin Zhuang et.al. 2411.18797 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Mixture of Experts in Image Classification: What’s the Sweet Spot? Mathurin Videau et.al. 2411.18322 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-25 Staleness-Centric Optimizations for Efficient Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-11-29 MH-MoE: Multi-Head Mixture-of-Experts Shaohan Huang et.al. 2411.16205 null
2024-11-25 LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy Peng Cui et.al. 2411.16095 null
2024-11-24 Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution Haiquan Wang et.al. 2411.15871 null
2024-11-24 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Xiaoye Qu et.al. 2411.15708 link
2024-11-23 Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen et.al. 2411.15432 null
2024-11-23 Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen et.al. 2411.15419 null
2024-11-20 MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification Yuxuan Chen et.al. 2411.13004 null
2024-11-23 KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning Ming Yin et.al. 2411.12950 null
2024-11-19 Ultra-Sparse Memory Network Zihao Huang et.al. 2411.12364 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 link
2024-11-21 Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models Wei Wang et.al. 2411.10003 null
2024-11-13 Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection Vima Gupta et.al. 2411.08982 null
2024-11-13 Sparse Upcycling: Inference Inefficient Finetuning Sasha Doubov et.al. 2411.08968 null
2024-11-13 LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Xiaonan Nie et.al. 2411.08446 null
2024-11-12 Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach Renzi Wang et.al. 2411.08232 null
2024-11-12 PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model Yilun Liu et.al. 2411.08212 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge Emmanuel Azuh Mensah et.al. 2411.07834 null
2024-11-11 Adaptive Conditional Expert Selection Network for Multi-domain Recommendation Kuiyao Dong et.al. 2411.06826 null
2024-11-11 WDMoE: Wireless Distributed Mixture of Experts for Large Language Models Nan Xue et.al. 2411.06681 null
2024-11-09 Learning Mixtures of Experts with EM Quentin Fruytier et.al. 2411.06056 null
2024-11-08 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts Yen-Ting Lin et.al. 2411.05945 null
2024-11-05 DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts Zelin Yao et.al. 2411.03025 link
2024-11-05 Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts Yuan Xie et.al. 2411.02787 null
2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Xingwu Sun et.al. 2411.02265 null
2024-11-04 FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation Ziwei Zhan et.al. 2411.02115 null
2024-11-03 RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin et.al. 2411.01595 null
2024-11-03 Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation Mingrui Liu et.al. 2411.01457 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-07 HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy Shuqing Luo et.al. 2411.01288 link
2024-11-02 PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment Dongxu Liu et.al. 2411.01245 null
2024-11-01 MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition Cheng Yang et.al. 2411.01016 null
2024-11-01 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen et.al. 2411.00918 link
2024-11-01 MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization Jingming Guo et.al. 2411.00662 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-30 Efficient and Interpretable Grammatical Error Correction with Mixture of Experts Muhammad Reza Qorib et.al. 2410.23507 link
2024-10-30 Stealing User Prompts from Mixture of Experts Itay Yona et.al. 2410.22884 null
2024-10-30 MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning Xujia Wang et.al. 2410.22782 null
2024-10-29 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2024-10-29 Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging Li Shen et.al. 2410.21804 null
2024-10-29 Neural Experts: Mixture of Experts for Implicit Neural Representations Yizhak Ben-Shabat et.al. 2410.21643 null
2024-10-28 FinTeamExperts: Role Specialized MOEs For Financial Analysis Yue Yu et.al. 2410.21338 null
2024-10-28 Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving Jiyao Wang et.al. 2410.21086 null
2024-10-27 Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen et.al. 2410.20336 null
2024-10-27 GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields Yusuke Sekikawa et.al. 2410.20306 null
2024-10-25 DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction Zelin Zang et.al. 2410.19504 link
2024-10-25 Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis Weikai Li et.al. 2410.19225 link
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-24 Mixture of Parrots: Experts improve memorization more than reasoning Samy Jelassi et.al. 2410.19034 null
2024-10-24 MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases Zhisheng Lin et.al. 2410.18406 null
2024-10-23 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches Kexin Feng et.al. 2410.18298 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-23 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling Jialong Li et.al. 2410.17043 null
2024-10-21 LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang et.al. 2410.16095 link
2024-10-22 CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Zhenpeng Su et.al. 2410.16077 link
2024-10-21 Generalizing Motion Planners with Mixture of Experts for Autonomous Driving Qiao Sun et.al. 2410.15774 link
2024-10-21 ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Xumeng Han et.al. 2410.15732 null
2024-10-20 Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Xin Zhou et.al. 2410.15438 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Suning Huang et.al. 2410.14972 null
2024-10-18 MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts Rachel S. Y. Teo et.al. 2410.14574 link
2024-10-18 ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction Haoyu He et.al. 2410.14099 link
2024-10-17 Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks Jinze Zhao et.al. 2410.13964 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts Fanqi Yan et.al. 2410.12258 null
2024-10-16 EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference Yulei Qian et.al. 2410.12247 null
2024-10-15 MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie et.al. 2410.12013 null
2024-10-15 MoH: Multi-Head Attention as Mixture-of-Head Attention Peng Jin et.al. 2410.11842 link
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 link
2024-10-15 Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models James Vo et.al. 2410.11654 null
2024-10-16 Quadratic Gating Functions in Mixture of Experts: A Statistical Insight Pedram Akbarian et.al. 2410.11222 null
2024-10-16 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469 null
2024-10-15 Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue et.al. 2410.10456 null
2024-10-14 Tighter Risk Bounds for Mixtures of Experts Wissam Akretche et.al. 2410.10397 null
2024-10-14 Scalable Multi-Domain Adaptation of Language Models using Modular Experts Peter Schafhalter et.al. 2410.10181 null
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 null
2024-10-14 AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Peijun Qing et.al. 2410.10054 link
2024-10-13 ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL Zhanqiu Guo et.al. 2410.09781 null
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering I-Chun Chen et.al. 2410.08589 link
2024-10-10 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Efficient Dictionary Learning with Switch Sparse Autoencoders Anish Mudide et.al. 2410.08201 link
2024-10-10 More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing Sagi Shaier et.al. 2410.08003 null
2024-10-10 SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture Jiayi Han et.al. 2410.07739 null
2024-10-10 Upcycling Large Language Models into Mixture of Experts Ethan He et.al. 2410.07524 null
2024-10-09 MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts Peng Jin et.al. 2410.07348 link
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Probing the Robustness of Theory of Mind in Large Language Models Christian Nickel et.al. 2410.06271 null
2024-10-08 MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More Wei Huang et.al. 2410.06270 link
2024-10-08 Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li et.al. 2410.05993 link
2024-10-08 Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models Siqi Wang et.al. 2410.05661 null
2024-10-07 Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild Xinyu Zhao et.al. 2410.05357 link
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-09 Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding Wei Wu et.al. 2410.03553 null
2024-10-04 Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang et.al. 2410.03440 link
2024-10-03 MLP-KAN: Unifying Deep Representation and Function Learning Yunhong He et.al. 2410.03027 link
2024-10-03 On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions Huy Nguyen et.al. 2410.02935 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping Ziye Huang et.al. 2410.02475 null
2024-10-03 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Zhaojian Yu et.al. 2410.02241 null
2024-10-03 Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts Minh Le et.al. 2410.02200 null
2024-10-04 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices Andres Potapczynski et.al. 2410.02117 link
2024-10-04 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL Ghada Sokar et.al. 2410.01930 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Tingfeng Hui et.al. 2410.01610 null
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-01 MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Sheng Wang et.al. 2410.00938 null
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Robust Traffic Forecasting against Spatial Shift over Years Hongjun Wang et.al. 2410.00373 link
2024-09-29 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method Chaohui Xu et.al. 2410.00059 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-09-27 SciDFM: A Large Language Model with Mixture-of-Experts for Science Liangtai Sun et.al. 2409.18412 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-26 A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction Guangyu Wang et.al. 2409.17440 link
2024-09-24 Leveraging Mixture of Experts for Improved Speech Deepfake Detection Viola Negroni et.al. 2409.16077 null
2024-10-02 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Xiaoming Shi et.al. 2409.16040 link
2024-09-24 Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Fengrun Zhang et.al. 2409.15905 null
2024-09-24 Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks Jiayi He et.al. 2409.15695 null
2024-09-23 A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts Hugo Inzirillo et.al. 2409.15161 link
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-21 Routing in Sparsely-gated Language Models responds to Context Stefan Arnold et.al. 2409.14107 null
2024-09-20 On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan et.al. 2409.13931 link
2024-09-20 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning Annette Spooner et.al. 2409.13791 null
2024-09-19 Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu et.al. 2409.12370 null
2024-09-18 GRIN: GRadient-INformed MoE Liyuan Liu et.al. 2409.12136 null
2024-09-18 Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 Zhiyong Wang et.al. 2409.11909 null
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-09-19 LOLA – An Open-Source Massively Multilingual Large Language Model Nikit Srivastava et.al. 2409.11272 link
2024-09-16 Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression Yi-Hsin Li et.al. 2409.10101 null
2024-09-14 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-10 DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models Maryam Akhavan Aghdam et.al. 2409.06669 null
2024-09-10 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Jaeseong Lee et.al. 2409.06211 null
2024-09-10 VE: Modeling Multivariate Time Series Correlation with Variate Embedding Shangjiong Wang et.al. 2409.06169 link
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 null
2024-09-09 Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks Bo Xu et.al. 2409.05726 null
2024-09-09 Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection Tianwu Lei et.al. 2409.05611 null
2024-09-05 Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions Zemian Ke et.al. 2409.03282 null
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 xLAM: A Family of Large Action Models to Empower AI Agent Systems Jianguo Zhang et.al. 2409.03215 link
2024-09-04 Configurable Foundation Models: Building LLMs from a Modular Perspective Chaojun Xiao et.al. 2409.02877 null
2024-09-04 Pluralistic Salient Object Detection Xuelu Feng et.al. 2409.02368 null
2024-09-03 OLMoE: Open Mixture-of-Experts Language Models Niklas Muennighoff et.al. 2409.02060 link
2024-09-05 Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model Hukai Huang et.al. 2409.02050 null
2024-09-02 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning Soumajyoti Sarkar et.al. 2409.01483 null
2024-09-02 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching Sungmin Yun et.al. 2409.01141 null
2024-09-04 Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack Guanzhong Chen et.al. 2409.00960 link
2024-09-02 Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts Youngseog Chung et.al. 2409.00879 null
2024-08-29 Gradient-free variational learning with conditional mixture networks Conor Heins et.al. 2408.16429 link
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-08-28 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Lean Wang et.al. 2408.15664 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-27 MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce Hao Jiang et.al. 2408.14968 null
2024-08-24 Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings Sagar Srinivas Sakhinana et.al. 2408.13622 null
2024-08-23 The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities Venkatesh Balavadhani Parthasarathy et.al. 2408.13296 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 null
2024-08-23 Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth Yuxiang Wei et.al. 2408.12803 null
2024-08-23 La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection Hang Zou et.al. 2408.12793 null
2024-08-22 SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging Mohammadreza Pourreza et.al. 2408.12733 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 null
2024-08-22 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 link
2024-08-21 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou et.al. 2408.11396 link
2024-08-21 KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? Xiao Han et.al. 2408.11306 link
2024-08-21 FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts Hanzi Mei et.al. 2408.11304 null
2024-08-20 Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data Atmika Gorti et.al. 2408.11247 null
2024-08-20 Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting Jianxiang Zhou et.al. 2408.10822 link
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700 link
2024-08-20 HMoE: Heterogeneous Mixture of Experts for Language Modeling An Wang et.al. 2408.10681 null
2024-08-19 AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference Shuzhang Zhong et.al. 2408.10284 link
2024-08-17 FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models Xiaochen Wang et.al. 2408.10276 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method Hang Zou et.al. 2408.09752 null
2024-08-16 Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection Haohao Zhu et.al. 2408.08551 null
2024-08-17 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-08-14 Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation CanYi Liu et.al. 2408.07427 null
2024-08-13 A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning Prateek Yadav et.al. 2408.07057 null
2024-08-13 Layerwise Recurrent Router for Mixture-of-Experts Zihan Qiu et.al. 2408.06793 link
2024-08-13 AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies Bo-Wen Zhang et.al. 2408.06567 null
2024-08-10 HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou Xu Wang et.al. 2408.05430 null
2024-08-08 Understanding the Performance and Estimating the Cost of LLM Fine-Tuning Yuchen Xia et.al. 2408.04693 link
2024-08-08 Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training Weilin Cai et.al. 2408.04307 null
2024-08-08 LaDiMo: Layer-wise Distillation Inspired MoEfier Sungyoon Kim et.al. 2408.04278 null
2024-08-07 MoExtend: Tuning New Experts for Modality and Task Extension Shanshan Zhong et.al. 2408.03511 link
2024-08-05 Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization Changtao Miao et.al. 2408.02306 null
2024-08-02 HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction Xingyu Lou et.al. 2408.01332 null
2024-08-01 Multimodal Fusion and Coherence Modeling for Video Topic Segmentation Hai Yu et.al. 2408.00365 null
2024-08-12 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Xi Victoria Lin et.al. 2407.21770 null
2024-07-31 PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning Min Jae Jung et.al. 2407.21571 null
2024-07-30 Distribution Learning for Molecular Regression Nima Shoghi et.al. 2407.20475 null
2024-07-29 Time series forecasting with high stakes: A field study of the air cargo industry Abhinav Garg et.al. 2407.20192 null
2024-07-30 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain et.al. 2407.19985 null
2024-07-28 Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models Mohammed Al-Maamari et.al. 2407.19610 link
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908 null
2024-07-26 MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition Chang Liu et.al. 2407.18616 link
2024-07-26 Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition Hukai Huang et.al. 2407.18581 link
2024-07-25 How Lightweight Can A Vision Transformer Be Jen Hong Tan et.al. 2407.17783 null
2024-07-24 Exploring Domain Robust Lightweight Reward Models based on Router Mechanism Hyuk Namgoong et.al. 2407.17546 null
2024-07-24 M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis Junyu Li et.al. 2407.17267 link
2024-07-25 Cheems: Wonderful Matrices More Efficient and More Effective Architecture Jingze Shi et.al. 2407.16958 null
2024-07-22 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag et.al. 2407.15811 link
2024-07-22 Norface: Improving Facial Expression Analysis by Identity Normalization Hanwei Liu et.al. 2407.15617 link
2024-07-19 Mixture of Experts with Mixture of Precisions for Tuning Quality of Service HamidReza Imani et.al. 2407.14417 null
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-19 Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models Qiong Wu et.al. 2407.14093 null
2024-07-18 Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts Francesco Folino et.al. 2407.13526 null
2024-07-18 Mixture of Experts based Multi-task Supervise Learning from Crowds Tao Han et.al. 2407.13268 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-18 Qwen2 Technical Report An Yang et.al. 2407.10671 link
2024-07-15 Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering Francesco Di Sario et.al. 2407.10389 null
2024-07-13 Low-Rank Interconnected Adaptation Across Layers Yibo Zhong et.al. 2407.09946 link
2024-07-13 MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Zhenpeng Su et.al. 2407.09816 link
2024-07-12 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Zeliang Zhang et.al. 2407.09590 null
2024-07-11 An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio Siding Zeng et.al. 2407.08239 null
2024-07-10 MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations Vignesh Prasad et.al. 2407.07636 link
2024-07-10 Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation Szymon Płotka et.al. 2407.07514 link
2024-07-09 A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts Atilla Özgür et.al. 2407.06718 null
2024-07-06 SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation Guoan Wang et.al. 2407.04938 null
2024-07-06 Completed Feature Disentanglement Learning for Multimodal MRIs Analysis Tianling Liu et.al. 2407.04916 null
2024-07-05 YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation Sungkyun Chang et.al. 2407.04822 link
2024-07-05 Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement Yongji Wu et.al. 2407.04656 null
2024-07-05 MobileFlow: A Multimodal LLM For Mobile GUI Agent Songqin Nong et.al. 2407.04346 null
2024-07-04 Mixture of A Million Experts Xu Owen He et.al. 2407.04153 null
2024-07-02 Terminating Differentiable Tree Experts Jonathan Thomm et.al. 2407.02060 null
2024-07-05 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Zihan Wang et.al. 2407.01906 link
2024-07-01 Uncertainty Quantification in Table Structure Recognition Kehinde Ajayi et.al. 2407.01731 link
2024-07-01 Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning Yixiao Wang et.al. 2407.01531 null
2024-07-01 Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation Nadezhda Chirkova et.al. 2407.01126 null
2024-07-01 Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs Enshu Liu et.al. 2407.00945 link
2024-07-03 Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules Xinglin Pan et.al. 2407.00599 link
2024-06-28 One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Ruochen Wang et.al. 2407.00256 link
2024-06-28 LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models Renzhi Wang et.al. 2406.20030 null
2024-06-28 Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang et.al. 2406.19905 link
2024-06-28 SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR Qiuming Zhao et.al. 2406.19706 link
2024-06-27 A Teacher Is Worth A Million Instructions Nikhil Kothari et.al. 2406.19112 null
2024-06-27 Towards Personalized Federated Multi-scenario Multi-task Recommendation Yue Ding et.al. 2406.18938 null
2024-06-26 Mixture of Experts in a Mixture of RL settings Timon Willi et.al. 2406.18420 null
2024-06-26 A Closer Look into Mixture-of-Experts in Large Language Models Ka Man Lo et.al. 2406.18219 link
2024-06-26 SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Shuaishuai Ye et.al. 2406.18021 null
2024-06-24 Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction Bruce Rushing et.al. 2406.17150 link
2024-06-24 LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Tong Zhu et.al. 2406.16554 link
2024-06-25 OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser Jingze Shi et.al. 2406.16495 link
2024-06-24 Theory on Mixture-of-Experts in Continual Learning Hongbo Li et.al. 2406.16437 null
2024-06-22 SimSMoE: Solving Representational Collapse via Similarity Measure Giang Do et.al. 2406.15883 null
2024-06-20 Voice Disorder Analysis: a Transformer-based Approach Alkis Koudounas et.al. 2406.14693 link
2024-06-19 Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation Qian Chen et.al. 2406.13583 null
2024-06-19 AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Zihao Zeng et.al. 2406.13233 link
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts Yuhao Dan et.al. 2406.12548 null
2024-06-18 Variational Distillation of Diffusion Policies into Mixture of Experts Hongyi Zhou et.al. 2406.12538 null
2024-06-18 GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory Haoze Wu et.al. 2406.12375 link
2024-06-17 Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding Ukyo Honda et.al. 2406.12060 link
2024-06-17 DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence DeepSeek-AI et.al. 2406.11931 link
2024-06-17 Graph Knowledge Distillation to Mixture of Experts Pavel Rumiantsev et.al. 2406.11919 link
2024-06-17 MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts Guanjie Chen et.al. 2406.11353 link
2024-06-17 Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Tong Zhu et.al. 2406.11256 link
2024-06-14 Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion Anke Tang et.al. 2406.09770 link
2024-06-13 DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts Joel Ong et.al. 2406.08742 link
2024-06-12 Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark Pingzhi Li et.al. 2406.08155 link
2024-06-11 Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Yixin Song et.al. 2406.05955 null
2024-06-08 Flexible and Adaptable Summarization via Expertise Separation Xiuying Chen et.al. 2406.05360 link
2024-06-07 MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter Jitai Hao et.al. 2406.04984 link
2024-06-07 MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks Xingkui Zhu et.al. 2406.04801 link
2024-06-05 Style Mixture of Experts for Expressive Text-To-Speech Synthesis Ahad Jawaid et.al. 2406.03637 null
2024-06-05 Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach Haoyu Han et.al. 2406.03464 null
2024-06-05 Continual Traffic Forecasting via Mixture of Experts Sanghyun Lee et.al. 2406.03140 null
2024-06-05 Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models Raeid Saqur et.al. 2406.02969 null
2024-06-04 Parrot: Multilingual Visual Instruction Tuning Hai-Long Sun et.al. 2406.02539 link
2024-06-04 Demystifying the Compression of Mixture-of-Experts Through a Unified Framework Shwai He et.al. 2406.02500 link
2024-06-02 Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts – Physics Informed Neural Operator Forward Model Clement Etienam et.al. 2406.00889 link
2024-06-01 A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers Daniel Waxman et.al. 2406.00570 link
2024-06-01 Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks Jiacheng Wang et.al. 2406.00408 null
2024-05-30 Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach Reza Arabpour et.al. 2405.20094 null
2024-06-02 MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors Renzhi Wang et.al. 2405.19086 null
2024-06-02 Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design Markus J. Buehler et.al. 2405.19076 link
2024-05-29 Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization Shengcai Liu et.al. 2405.18884 link
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 Yuan 2.0-M32: Mixture of Experts with Attention Router Shaohua Wu et.al. 2405.17976 link
2024-05-28 LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design Rui Kong et.al. 2405.17741 null
2024-05-27 Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node Andreas Charalampopoulos et.al. 2405.16836 link
2024-05-26 Mixture of Experts Using Tensor Products Zhan Su et.al. 2405.16671 link
2024-05-30 A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts Mohammed Nowaz Rabbani Chowdhury et.al. 2405.16646 null
2024-05-26 Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation Rongyu Zhang et.al. 2405.16486 link
2024-05-25 MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás et.al. 2405.16039 link
2024-05-23 Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training Xianzhi Du et.al. 2405.15052 link
2024-05-23 Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast Chufan Shi et.al. 2405.14507 link
2024-05-23 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Yongxin Guo et.al. 2405.14297 link
2024-05-23 Graph Sparsification via Mixture of Graphs Guibin Zhang et.al. 2405.14260 link
2024-05-23 Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts Huy Nguyen et.al. 2405.14131 null
2024-05-23 Mixture of Experts Meets Prompt-Based Continual Learning Minh Le et.al. 2405.14124 link
2024-05-22 Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts Huy Nguyen et.al. 2405.13997 null
2024-05-22 xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token Xin Cheng et.al. 2405.13792 link
2024-05-24 MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models Jingwei Xu et.al. 2405.13053 link
2024-05-21 Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts Ruichen Zhang et.al. 2405.12472 null
2024-05-21 Ensemble and Mixture-of-Experts DeepONets For Operator Learning Ramansh Sharma et.al. 2405.11907 null
2024-05-19 Learning More Generalized Experts by Merging Experts in Mixture-of-Experts Sejik Park et.al. 2405.11530 null
2024-05-18 Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Yunxin Li et.al. 2405.11273 link
2024-05-16 Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts Ruolin Su et.al. 2405.09744 null
2024-05-15 M⁴oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts Yufeng Jiang et.al. 2405.09446 link
2024-05-13 Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition Zhiyong Yang et.al. 2405.07780 link
2024-05-07 SUTRA: Scalable Multilingual Language Model Architecture Abhijit Bendale et.al. 2405.06694 null
2024-05-09 A Mixture of Experts Approach to 3D Human Motion Prediction Edmund Shieh et.al. 2405.06088 link
2024-05-09 A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds Christopher Z. Cui et.al. 2405.06059 null
2024-05-09 EWMoE: An effective model for global weather forecasting with mixture-of-experts Lihao Gan et.al. 2405.06004 link
2024-05-09 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Jiachen Li et.al. 2405.05949 link
2024-05-16 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI et.al. 2405.04434 link
2024-05-07 Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts Changyuan Zhao et.al. 2405.04198 null
2024-05-06 Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training Zexuan Zhong et.al. 2405.03133 null
2024-05-06 WDMoE: Wireless Distributed Large Language Models with Mixture of Experts Nan Xue et.al. 2405.03131 null

<a href=#updated-on-20250124>(back to top)</a>