Table of Contents
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-01-13 | Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions | Xiantong Zhao et.al. | 2501.07133 | null |
2025-01-05 | DeTrack: In-model Latent Denoising Learning for Visual Object Tracking | Xinyu Zhou et.al. | 2501.02467 | null |
2025-01-13 | FusionSORT: Fusion Methods for Online Multi-object Visual Tracking | Nathanael L. Baisa et.al. | 2501.00843 | link |
2025-01-01 | Less is More: Token Context-aware Learning for Object Tracking | Chenlong Xu et.al. | 2501.00758 | null |
2024-12-28 | Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking | You Wu et.al. | 2412.20002 | link |
2024-12-26 | SUTrack: Towards Simple and Unified Single Object Tracking | Xin Chen et.al. | 2412.19138 | link |
2024-12-15 | Exploring Enhanced Contextual Information for Video-Level Object Tracking | Ben Kang et.al. | 2412.11023 | link |
2024-12-13 | Visual Object Tracking across Diverse Data Modalities: A Review | Mengmeng Wang et.al. | 2412.09991 | null |
2024-12-13 | MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | Zhaofeng Hu et.al. | 2412.02734 | link |
2024-12-03 | GSOT3D: Towards Generic 3D Single Object Tracking in the Wild | Yifan Jiao et.al. | 2412.02129 | link |
2024-11-28 | Improving Accuracy and Generalization for Efficient Visual Tracking | Ram Zaveri et.al. | 2411.18855 | null |
2024-11-27 | A comparison of extended object tracking with multi-modal sensors in indoor environment | Jiangtao Shuai et.al. | 2411.18476 | null |
2024-12-04 | A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Jovana Videnovic et.al. | 2411.17576 | link |
2024-11-23 | How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking | Xuchen Li et.al. | 2411.15600 | null |
2024-11-24 | ClickTrack: Towards Real-time Interactive Single Object Tracking | Kuiran Wang et.al. | 2411.13183 | null |
2024-11-30 | SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Cheng-Yen Yang et.al. | 2411.11922 | link |
2024-12-09 | Vision Eagle Attention: a new lens for advancing image classification | Mahmudul Hasan et.al. | 2411.10564 | link |
2024-11-14 | MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation | Jonas Serych et.al. | 2411.09551 | link |
2024-11-12 | Visual Tracking with Intermittent Visibility: Switched Control Design and Implementation | Yangge Li et.al. | 2411.08144 | null |
2024-12-16 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Yiming Sun et.al. | 2411.01756 | null |
2024-10-30 | IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking | Run Luo et.al. | 2410.23907 | null |
2024-10-27 | NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking | Yu Liu et.al. | 2410.20421 | link |
2024-10-19 | The Solution for Single Object Tracking Task of Perception Test Challenge 2024 | Zhiqiang Zhong et.al. | 2410.16329 | null |
2024-10-13 | Gaussian Splatting Visual MPC for Granular Media Manipulation | Wei-Cheng Tseng et.al. | 2410.09740 | null |
2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2410.02492 | null |
2024-09-30 | Opt-in Camera: Person Identification in Video via UWB Localization and Its Application to Opt-in Systems | Matthew Ishige et.al. | 2409.19891 | null |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-26 | General Compression Framework for Efficient Transformer Object Tracking | Lingyi Hong et.al. | 2409.17564 | null |
2024-09-25 | Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 | Chunhui Zhang et.al. | 2409.16902 | link |
2024-09-25 | Conditional Generative Denoiser for Nighttime UAV Tracking | Yucheng Wang et.al. | 2409.16834 | link |
2024-09-25 | Progressive Representation Learning for Real-Time UAV Tracking | Changhong Fu et.al. | 2409.16652 | link |
2024-09-25 | Enhancing Nighttime UAV Tracking with Light Distribution Suppression | Liangliang Yao et.al. | 2409.16631 | link |
2024-09-19 | WeHelp: A Shared Autonomy System for Wheelchair Users | Abulikemu Abuduweili et.al. | 2409.12159 | link |
2024-09-18 | Distilling Channels for Efficient Deep Tracking | Shiming Ge et.al. | 2409.11785 | null |
2024-09-13 | Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Xuchen Li et.al. | 2409.08887 | null |
2024-09-10 | VBIT: Towards Enhancing Privacy Control Over IoT Devices | Jad Al Aaraj et.al. | 2409.06233 | null |
2024-09-03 | Ultra-broadband room-temperature Fourier transform spectrometer with watt-level power consumption | Jakub Mnich et.al. | 2409.01875 | null |
2024-08-25 | Camouflaged_Object_Tracking__A_Benchmark | Xiaoyu Guo et.al. | 2408.13877 | null |
2024-08-21 | Low-Light Object Tracking: A Benchmark | Pengzhi Zhong et.al. | 2408.11463 | link |
2024-08-20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Xiao Wang et.al. | 2408.10487 | link |
2024-08-05 | VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking | Yuxuan Lu et.al. | 2408.02263 | null |
2024-09-06 | 3D Single-object Tracking in Point Clouds with High Temporal Variation | Qiao Wu et.al. | 2408.02049 | null |
2024-09-09 | SiamMo: Siamese Motion-Centric 3D Object Tracking | Yuxiang Yang et.al. | 2408.01688 | link |
2024-08-02 | Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Yabin Zhu et.al. | 2408.00969 | link |
2024-08-06 | Broadband THz wave generation and detection in organic crystal PNPA at MHz repetition rates | Lukasz A. Sterczewski et.al. | 2407.20745 | null |
2024-07-16 | Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers | Zhengbo Zhang et.al. | 2407.08394 | null |
2024-07-11 | PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers | Xing Wang et.al. | 2407.08222 | null |
2024-07-07 | Addressing single object tracking in satellite imagery through prompt-engineered solutions | Athena Psalta et.al. | 2407.05518 | null |
2024-07-07 | Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking | You Wu et.al. | 2407.05383 | null |
2024-07-09 | P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jiahao Nie et.al. | 2407.05238 | link |
2024-07-07 | Tracking Reflected Objects: A Benchmark | Xiaoyu Guo et.al. | 2407.05235 | null |
2024-07-04 | TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2407.03946 | link |
2024-07-02 | FlowTrack: Point-level Flow Network for 3D Single Object Tracking | Shuo Li et.al. | 2407.01959 | null |
2024-09-07 | eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking | Yucheng Chen et.al. | 2406.20024 | null |
2024-06-14 | Constrained Motion Planning for a Robotic Endoscope Holder based on Hierarchical Quadratic Programming | Jacinto Colan et.al. | 2406.09982 | null |
2024-06-14 | Robust compressive tracking via online weighted multiple instance learning | Sandeep Singh Sengar et.al. | 2406.09914 | null |
2024-07-01 | Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking | Xiangyang Yang et.al. | 2406.08037 | null |
2024-06-07 | Multi-Granularity Language-Guided Multi-Object Tracking | Yuhao Li et.al. | 2406.04844 | link |
2024-06-02 | Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection | Zhuang Qi et.al. | 2406.00589 | null |
2024-05-28 | Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion | Hongze Sun et.al. | 2405.17903 | link |
2024-05-27 | LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking | Shaohua Dong et.al. | 2405.17660 | null |
2024-05-31 | Awesome Multi-modal Object Tracking | Chunhui Zhang et.al. | 2405.14200 | link |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-16 | A Novel Bounding Box Regression Method for Single Object Tracking | Omar Abdelaziz et.al. | 2405.10444 | null |
2024-05-16 | Beyond Traditional Single Object Tracking: A Survey | Omar Abdelaziz et.al. | 2405.10439 | null |
2024-05-08 | TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking | Pengcheng Shao et.al. | 2405.05004 | link |
2024-04-22 | 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos | Yinzhe Xu et.al. | 2404.13953 | null |
2024-05-25 | An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
2024-04-16 | Attention-Aware Visualization: Tracking and Responding to User Perception Over Time | Arvind Srinivasan et.al. | 2404.10732 | null |
2024-04-15 | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Fangwei Zhong et.al. | 2404.09857 | null |
2024-04-15 | Learning Tracking Representations from Single Point Annotations | Qiangqiang Wu et.al. | 2404.09504 | null |
2024-04-11 | PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds | Weisheng Xu et.al. | 2404.07495 | link |
2024-05-02 | Longitudinal Analysis and Quantitative Assessment of Child Development through Mobile Interaction | Juan Carlos Ruiz-Garcia et.al. | 2404.06919 | link |
2024-04-09 | LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks | Jianlang Chen et.al. | 2404.06247 | link |
2024-04-08 | Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction | Umberto Albertin et.al. | 2404.05351 | null |
2024-03-29 | Context-Aware Integration of Language and Visual References for Natural Language Tracking | Yanyan Shao et.al. | 2403.19975 | null |
2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | Liangyu Xu et.al. | 2403.18238 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu et.al. | 2403.17651 | null |
2024-03-29 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
2024-03-25 | Multi-attention Associate Prediction Network for Visual Tracking | Xinglong Sun et.al. | 2403.16395 | null |
2024-03-28 | SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Xiaojun Hou et.al. | 2403.16002 | link |
2024-03-23 | Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking | Shaoyu Sun et.al. | 2403.15831 | null |
2024-03-19 | TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO | Chaoran Xiong et.al. | 2403.12504 | null |
2024-03-18 | Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model | Jan Krejčí et.al. | 2403.11978 | null |
2024-03-16 | A Spectrum-based Image Denoising Method with Edge Feature Enhancement | Peter Luvton et.al. | 2403.11036 | null |
2024-03-15 | Autoregressive Queries for Adaptive Tracking with Spatio-TemporalTransformers | Jinxia Xie et.al. | 2403.10574 | null |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
2024-02-27 | ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking | Yushan Han et.al. | 2403.07914 | null |
2024-04-03 | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline | Xiao Wang et.al. | 2403.05839 | link |
2024-03-08 | Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Liting Lin et.al. | 2403.05231 | link |
2024-03-08 | Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy | Yuelin Zhang et.al. | 2403.05146 | link |
2024-03-06 | VastTrack: Vast Category Visual Object Tracking | Liang Peng et.al. | 2403.03493 | link |
2024-02-28 | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | Zhewei Wu et.al. | 2402.17976 | null |
2024-02-26 | SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking | Yu Lin et.al. | 2402.16249 | link |
2024-02-26 | Reading Relevant Feature from Global Representation Memory for Visual Object Tracking | Xinyu Zhou et.al. | 2402.14392 | null |
2024-02-13 | Optimized Information Flow for Transformer Tracking | Janani Kugarajeevan et.al. | 2402.08195 | link |
2024-02-07 | BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision | Xin Zhao et.al. | 2402.04519 | null |
2024-02-04 | Spatio-temporal Prompting Network for Robust Video Feature Extraction | Guanxiong Sun et.al. | 2402.02574 | link |
2024-01-24 | Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region | Shengjing Tian et.al. | 2401.13285 | null |
2024-01-23 | Correlation-Embedded Transformer Tracking: A Single-Branch Framework | Fei Xie et.al. | 2401.12743 | link |
2024-01-20 | Unifying Visual and Vision-Language Tracking via Contrastive Learning | Yinchao Ma et.al. | 2401.11228 | link |
2024-01-20 | Towards Category Unification of 3D Single Object Tracking on Point Clouds | Jiahao Nie et.al. | 2401.11204 | null |
2024-01-18 | Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking | Amir M. Mansourian et.al. | 2401.09942 | null |
2024-01-12 | Dense Optical Flow Estimation Using Sparse Regularizers from Reduced Measurements | Muhammad Wasim Nawaz et.al. | 2401.06396 | null |
2024-01-18 | Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots | Immanuel Ampomah Mensah et.al. | 2401.04650 | null |
2024-01-06 | Explicit Visual Prompts for Visual Object Tracking | Liangtao Shi et.al. | 2401.03142 | link |
2024-01-03 | ODTrack: Online Dense Temporal Token Learning for Visual Tracking | Yaozong Zheng et.al. | 2401.01686 | link |
2023-12-27 | X Modality Assisting RGBT Object Tracking | Zhaisheng Ding et.al. | 2312.17273 | null |
2023-12-22 | Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset | Lei Liu et.al. | 2312.14446 | link |
2023-12-18 | Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking | Shihao Feng et.al. | 2312.11051 | link |
2023-12-17 | Robust 3D Tracking with Quality-Aware Shape Completion | Jingwen Zhang et.al. | 2312.10608 | null |
2023-12-15 | Tracking Skiers from the Top to the Bottom | Matteo Dunnhofer et.al. | 2312.09723 | null |
2023-12-11 | M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking | Jiaming Liu et.al. | 2312.06117 | link |
2023-12-07 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao et.al. | 2312.04117 | link |
2024-02-19 | Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking | Jiawei Ge et.al. | 2311.17085 | null |
2023-11-21 | Visual tracking brain computer interface | Changxing Huang et.al. | 2311.12592 | null |
2024-01-10 | ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers | Edison P. Velasco Sánchez et.al. | 2311.07268 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-01-31 | Low-Rank Adapting Models for Sparse Autoencoders | Matthew Chen et.al. | 2501.19406 | null |
2025-01-31 | Vintix: Action Model via In-Context Reinforcement Learning | Andrey Polubarov et.al. | 2501.19400 | link |
2025-01-31 | Scalable-Softmax Is Superior for Attention | Ken M. Nakanishi et.al. | 2501.19399 | null |
2025-01-31 | Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game | Mustafa O. Karabag et.al. | 2501.19398 | null |
2025-01-31 | s1: Simple test-time scaling | Niklas Muennighoff et.al. | 2501.19393 | link |
2025-01-31 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | Alina Shutova et.al. | 2501.19392 | null |
2025-01-31 | Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | Wenzhi Fang et.al. | 2501.19389 | null |
2025-01-31 | Decoding-based Regression | Xingyou Song et.al. | 2501.19383 | link |
2025-01-31 | TableMaster: A Recipe to Advance Table Understanding with Language Models | Lang Cao et.al. | 2501.19378 | null |
2025-01-31 | SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions | Dominik Wagner et.al. | 2501.19377 | null |
2025-01-31 | We're Different, We're the Same: Creative Homogeneity Across LLMs | Emily Wenger et.al. | 2501.19361 | null |
2025-01-31 | Mechanical Properties of the Meninges: Large Language Model Assisted Systematic Review of over 25,000 Studies | Brandon P. Chelstrom et.al. | 2501.19359 | null |
2025-01-31 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking | Yuchun Miao et.al. | 2501.19358 | null |
2025-01-31 | Towards Adaptive Self-Improvement for Smarter Energy Systems | Alexander Sommer et.al. | 2501.19340 | null |
2025-01-31 | PixelWorld: Towards Perceiving Everything as Pixels | Zhiheng Lyu et.al. | 2501.19339 | null |
2025-01-31 | Homogeneity Bias as Differential Sampling Uncertainty in Language Models | Messi H. J. Lee et.al. | 2501.19337 | null |
2025-01-31 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao et.al. | 2501.19324 | null |
2025-01-31 | MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems | Anirudh Chari et.al. | 2501.19318 | null |
2025-01-31 | LLM-based Affective Text Generation Quality Based on Different Quantization Values | Yarik Menchaca Resendiz et.al. | 2501.19317 | null |
2025-01-31 | An Efficient Approach for Machine Translation on Low-resource Languages: A Case Study in Vietnamese-Chinese | Tran Ngoc Son et.al. | 2501.19314 | null |
2025-01-30 | Foundational Models for 3D Point Clouds: A Survey and Outlook | Vishal Thengane et.al. | 2501.18594 | null |
2025-01-30 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Hao Dong et.al. | 2501.18592 | link |
2025-01-30 | Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Yue Wang et.al. | 2501.18585 | null |
2025-01-30 | Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling | Dan M. Kluger et.al. | 2501.18577 | link |
2025-01-30 | Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Evgenii Evstafev et.al. | 2501.18576 | null |
2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565 | null |
2025-01-30 | SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation | Haoquan Fang et.al. | 2501.18564 | null |
2025-01-30 | Semantic Web and Creative AI -- A Technical Report from ISWS 2023 | Raia Abu Ahmad et.al. | 2501.18542 | null |
2025-01-30 | Loss Functions and Operators Generated by f-Divergences | Vincent Roulet et.al. | 2501.18537 | null |
2025-01-30 | Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges | Manveer Singh Tamber et.al. | 2501.18536 | link |
2025-01-30 | Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models | Yi Ding et.al. | 2501.18533 | null |
2025-01-30 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel et.al. | 2501.18532 | link |
2025-01-30 | Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models | Guanqun Cao et.al. | 2501.18516 | null |
2025-01-30 | Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | Arthur Douillard et.al. | 2501.18512 | null |
2025-01-30 | WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training | Benjamin Feuer et.al. | 2501.18511 | link |
2025-01-30 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Peter J. Bentley et.al. | 2501.18504 | null |
2025-01-30 | A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models | Changshu Liu et.al. | 2501.18482 | null |
2025-01-30 | CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Yanxia Deng et.al. | 2501.18475 | null |
2025-01-30 | Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations | Chengxi Zeng et.al. | 2501.18474 | null |
2025-01-30 | A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models | Shiho Noda et.al. | 2501.18463 | link |
2025-01-29 | Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning? | Pouya Pezeshkpour et.al. | 2501.17840 | link |
2025-01-29 | Matrix Product Sketching via Coordinated Sampling | Majid Daliri et.al. | 2501.17836 | null |
2025-01-29 | Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology | Sobhan Hemati et.al. | 2501.17822 | null |
2025-01-29 | Leveraging Multimodal LLM for Inspirational User Interface Search | Seokhyeon Park et.al. | 2501.17799 | link |
2025-01-29 | BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Chan-Jan Hsu et.al. | 2501.17790 | null |
2025-01-29 | Reasoning Over the Glyphs: Evaluation of LLM's Decipherment of Rare Scripts | Yu-Fei Shih et.al. | 2501.17785 | null |
2025-01-29 | AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | Peter Pak et.al. | 2501.17784 | null |
2025-01-29 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri et.al. | 2501.17771 | link |
2025-01-29 | Hybrid Graphs for Table-and-Text based Question Answering using LLMs | Ankush Agarwal et.al. | 2501.17767 | null |
2025-01-29 | On the Partitioning of GPU Power among Multi-Instances | Tirth Vamja et.al. | 2501.17752 | null |
2025-01-29 | Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation | Aitor Arrieta et.al. | 2501.17749 | null |
2025-01-29 | A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches | Ana R. Baião et.al. | 2501.17729 | null |
2025-01-29 | Using Code Generation to Solve Open Instances of Combinatorial Design Problems | Christopher D. Rosin et.al. | 2501.17725 | link |
2025-01-29 | RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Eujeong Choi et.al. | 2501.17715 | link |
2025-01-29 | Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Yubo Wang et.al. | 2501.17703 | null |
2025-01-29 | Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching | Xuzhe Dang et.al. | 2501.17665 | null |
2025-01-29 | Exploring Vision Language Models for Multimodal and Multilingual Stance Detection | Jake Vasilakes et.al. | 2501.17654 | null |
2025-01-29 | Tonguescape: Exploring Language Models Understanding of Vowel Articulation | Haruki Sakajo et.al. | 2501.17643 | link |
2025-01-29 | Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation | Lin Chen et.al. | 2501.17642 | null |
2025-01-29 | In-Context Meta LoRA Generation | Yihua Shao et.al. | 2501.17635 | null |
2025-01-28 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Tianzhe Chu et.al. | 2501.17161 | null |
2025-01-28 | AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders | Zhengxuan Wu et.al. | 2501.17148 | link |
2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
2025-01-28 | ASTRAL: Automated Safety Testing of Large Language Models | Miriam Ugarte et.al. | 2501.17132 | null |
2025-01-28 | Scenario Understanding of Traffic Scenes Through Large Visual Language Models | Rivera Esteban et.al. | 2501.17131 | null |
2025-01-28 | Histoires Morales: A French Dataset for Assessing Moral Alignment | Thibaud Leteno et.al. | 2501.17117 | link |
2025-01-28 | Optimizing Large Language Model Training Using FP4 Quantization | Ruizhe Wang et.al. | 2501.17116 | null |
2025-01-28 | Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction | Carl-Leander Henneking et.al. | 2501.17112 | null |
2025-01-28 | COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models | Tobias Materzok et.al. | 2501.17104 | null |
2025-01-28 | Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Evgenii Evstafev et.al. | 2501.17084 | null |
2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053 | null |
2025-01-28 | How Linguistics Learned to Stop Worrying and Love the Language Models | Richard Futrell et.al. | 2501.17047 | null |
2025-01-28 | Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models | Minghan Li et.al. | 2501.17039 | null |
2025-01-28 | Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Manojkumar Parmar et.al. | 2501.17030 | null |
2025-01-28 | Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs | Alessandro Midolo et.al. | 2501.17024 | link |
2025-01-28 | Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement | Kei Katsumata et.al. | 2501.17022 | link |
2025-01-28 | Large Language Models for Code Generation: The Practitioners Perspective | Zeeshan Rasheed et.al. | 2501.16998 | link |
2025-01-28 | Artificial Intelligence Clones | Annie Liang et.al. | 2501.16996 | null |
2025-01-28 | FedEFM: Federated Endovascular Foundation Model with Unseen Data | Tuong Do et.al. | 2501.16992 | null |
2025-01-28 | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | Xiangyu Gao et.al. | 2501.16981 | null |
2025-01-27 | LUCY: Linguistic Understanding and Control Yielding Early Stage of Her | Heting Gao et.al. | 2501.16327 | link |
2025-01-27 | Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology | Meiyun Cao et.al. | 2501.16309 | null |
2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303 | null |
2025-01-27 | Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width | Zheng Liu et.al. | 2501.16302 | null |
2025-01-27 | Large Models in Dialogue for Active Perception and Anomaly Detection | Tzoulio Chamiti et.al. | 2501.16300 | link |
2025-01-27 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | Renshan Zhang et.al. | 2501.16297 | null |
2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282 | null |
2025-01-27 | Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation | Jiayi Hong et.al. | 2501.16277 | link |
2025-01-27 | URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT | Long Nguyen et.al. | 2501.16276 | null |
2025-01-27 | Return of the Encoder: Maximizing Parameter Efficiency for SLMs | Mohamed Elfeki et.al. | 2501.16273 | link |
2025-01-27 | A foundation model for human-AI collaboration in medical literature mining | Zifeng Wang et.al. | 2501.16255 | null |
2025-01-27 | Multi-Agent Geospatial Copilots for Remote Sensing Workflows | Chaehong Lee et.al. | 2501.16254 | null |
2025-01-27 | Zero-Shot Decision Tree Construction via Large Language Models | Lucas Carrasco et.al. | 2501.16247 | null |
2025-01-27 | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Xiaochuan Ma et.al. | 2501.16246 | null |
2025-01-27 | Phase Transitions in Large Language Models and the |
Youran Sun et.al. | 2501.16241 | null |
2025-01-27 | AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses | Runze Cai et.al. | 2501.16240 | null |
2025-01-27 | Distilling foundation models for robust and efficient models in digital pathology | Alexandre Filiot et.al. | 2501.16239 | null |
2025-01-27 | Language-Based Bayesian Optimization Research Assistant (BORA) | Abdoulatif Cissé et.al. | 2501.16224 | null |
2025-01-27 | Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models | Huayu Li et.al. | 2501.16215 | link |
2025-01-27 | Provence: efficient and robust context pruning for retrieval-augmented generation | Nadezhda Chirkova et.al. | 2501.16214 | null |
2025-01-24 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Xin Zhou et.al. | 2501.14729 | link |
2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
2025-01-24 | Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models | Naihao Deng et.al. | 2501.14717 | null |
2025-01-24 | FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing | James Seale Smith et.al. | 2501.14713 | null |
2025-01-24 | The Karp Dataset | Mason DiCicco et.al. | 2501.14705 | null |
2025-01-24 | Rethinking Table Instruction Tuning | Naihao Deng et.al. | 2501.14693 | null |
2025-01-24 | Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST | Fuping Wu et.al. | 2501.14685 | null |
2025-01-24 | An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations | Shabnam Hassani et.al. | 2501.14683 | null |
2025-01-24 | Diffusion based Text-to-Music Generationwith Global and Local Text based Conditioning | Jisi Zhang et.al. | 2501.14680 | null |
2025-01-24 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang et.al. | 2501.14654 | link |
2025-01-24 | Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion | Ziyao Xu et.al. | 2501.14649 | link |
2025-01-24 | Recommending Actionable Strategies: A Semantic Approach to Integrating Analytical Frameworks with Decision Heuristics | Renato Ghisellini et.al. | 2501.14634 | null |
2025-01-24 | Extracting Problem Structure with LLMs for Optimized SAT Local Search | André Schilder et.al. | 2501.14630 | null |
2025-01-24 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Tianming Liang et.al. | 2501.14607 | null |
2025-01-24 | Knowledge Graphs Construction from Criminal Court Appeals: Insights from the French Cassation Court | Alexander V. Belikov et.al. | 2501.14579 | null |
2025-01-24 | ZETA: Leveraging Z-order Curves for Efficient Top-k Attention | Qiuhao Zeng et.al. | 2501.14577 | null |
2025-01-24 | Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Zhongyi Shui et.al. | 2501.14548 | link |
2025-01-24 | Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research | Hamid Sarmadi et.al. | 2501.14546 | null |
2025-01-24 | VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning | Benjamin Callewaert et.al. | 2501.14540 | null |
2025-01-24 | Design and Implementation of a Psychiatry Resident Training System Based on Large Language Models | Zhenguang Zhong et.al. | 2501.14530 | link |
2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
2025-01-23 | The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Chan-Jan Hsu et.al. | 2501.13921 | link |
2025-01-23 | Analysis of Indic Language Capabilities in LLMs | Aatman Vaidya et.al. | 2501.13912 | null |
2025-01-23 | Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models | Linh Tran et.al. | 2501.13904 | null |
2025-01-23 | Exploring Finetuned Audio-LLM on Heart Murmur Features | Adrian Florea et.al. | 2501.13884 | null |
2025-01-23 | The machine learning platform for developers of large systems | Alexey Naikov et.al. | 2501.13881 | null |
2025-01-23 | A RAG-Based Institutional Assistant | Gustavo Kuratomi et.al. | 2501.13880 | null |
2025-01-23 | Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning | Shiyu Zhang et.al. | 2501.13859 | null |
2025-01-23 | Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Shiling Deng et.al. | 2501.13851 | link |
2025-01-23 | Think Outside the Data: Colonial Biases and Systemic Issues in Automated Moderation Pipelines for Low-Resource Languages | Farhana Shahid et.al. | 2501.13836 | null |
2025-01-23 | On the Reasoning Capacity of AI Models and How to Quantify It | Santosh Kumar Radha et.al. | 2501.13833 | null |
2025-01-23 | Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing | Hao Zhang et.al. | 2501.13831 | null |
2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824 | null |
2025-01-23 | Large Language Model driven Policy Exploration for Recommender Systems | Jie Wang et.al. | 2501.13816 | null |
2025-01-23 | Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change | Mowafak Allaham et.al. | 2501.13802 | null |
2025-01-23 | PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments | Changhao Wang et.al. | 2501.13796 | null |
2025-01-23 | Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models | Chaolei Han et.al. | 2501.13795 | null |
2025-01-23 | Parameter-Efficient Fine-Tuning for Foundation Models | Dan Zhang et.al. | 2501.13787 | link |
2025-01-23 | Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling | Tanya Rodchenko et.al. | 2501.13779 | null |
2025-01-23 | Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework | Yoonsang Kim et.al. | 2501.13778 | link |
2025-01-22 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Boqiang Zhang et.al. | 2501.13106 | link |
2025-01-22 | Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment | Melissa Kazemi Rad et.al. | 2501.13080 | null |
2025-01-22 | Autonomy-of-Experts Models | Ang Lv et.al. | 2501.13074 | null |
2025-01-22 | Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning | Bohao Yang et.al. | 2501.13042 | link |
2025-01-22 | Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Yantao Liu et.al. | 2501.13007 | link |
2025-01-22 | Large Language Model-Based Semantic Communication System for Image Transmission | Soheyb Ribouh et.al. | 2501.12988 | null |
2025-01-22 | LLM4WM: Adapting LLM for Wireless Multi-Tasking | Xuanyu Liu et.al. | 2501.12983 | null |
2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
2025-01-22 | Accessible Smart Contracts Verification: Synthesizing Formal Models with Tamed LLMs | Jan Corazza et.al. | 2501.12972 | null |
2025-01-22 | It's complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act | Kristof Meding et.al. | 2501.12962 | null |
2025-01-22 | Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference | Weizhi Fei et.al. | 2501.12959 | null |
2025-01-22 | GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | Pengxiang Zhao et.al. | 2501.12956 | null |
2025-01-22 | Correctness Assessment of Code Generated by Large Language Models Using Internal Representations | Tuan-Dung Bui et.al. | 2501.12934 | null |
2025-01-22 | DynamicEarth: How Far are We from Open-Vocabulary Change Detection? | Kaiyu Li et.al. | 2501.12931 | null |
2025-01-22 | A Functional Software Reference Architecture for LLM-Integrated Systems | Alessio Bucaioni et.al. | 2501.12904 | null |
2025-01-22 | Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration | Offa Kingsleigh et.al. | 2501.12901 | null |
2025-01-22 | Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback | Yafu Li et.al. | 2501.12895 | link |
2025-01-22 | Generative AI Misuse Potential in Cyber Security Education: A Case Study of a UK Degree Program | Carlton Shepherd et.al. | 2501.12883 | null |
2025-01-22 | WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge | Jingyuan Chen et.al. | 2501.12877 | null |
2025-01-22 | HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks | Qiuyu Zhu et.al. | 2501.12857 | null |
2025-01-21 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386 | link |
2025-01-21 | MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Yilun Zhao et.al. | 2501.12380 | link |
2025-01-21 | Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists | Thomas F. Eisenmann et.al. | 2501.12374 | link |
2025-01-21 | Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL | Yeounoh Chung et.al. | 2501.12372 | null |
2025-01-21 | Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | Samira Abnar et.al. | 2501.12370 | null |
2025-01-21 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Yuhang Zang et.al. | 2501.12368 | link |
2025-01-21 | Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2 | Md. Rakibul Islam et.al. | 2501.12356 | null |
2025-01-21 | Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration | Thomas Walshe et.al. | 2501.12332 | null |
2025-01-21 | Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops | Mohamed Harmanani et.al. | 2501.12331 | link |
2025-01-21 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang et.al. | 2501.12327 | link |
2025-01-21 | LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations | Hasan Abu-Rasheed et.al. | 2501.12300 | null |
2025-01-21 | MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Qishen Zhou et.al. | 2501.12281 | link |
2025-01-21 | Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Maosong Cao et.al. | 2501.12273 | link |
2025-01-21 | CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification | Cristiano Patrício et.al. | 2501.12266 | null |
2025-01-21 | FOCUS: First Order Concentrated Updating Scheme | Yizhou Liu et.al. | 2501.12243 | null |
2025-01-21 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | Pha Nguyen et.al. | 2501.12231 | null |
2025-01-21 | CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning | Yuanheng Fang et.al. | 2501.12226 | null |
2025-01-21 | Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces | Allard Oelen et.al. | 2501.12221 | null |
2025-01-21 | You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense | Wuyuao Mai et.al. | 2501.12210 | null |
2025-01-21 | Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model | Kazi Hasan Ibn Arif et.al. | 2501.12206 | link |
2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360 | link |
2025-01-17 | Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems | Weibo Gao et.al. | 2501.10332 | null |
2025-01-17 | BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Suvodip Dey et.al. | 2501.10328 | link |
2025-01-17 | Large language models for automated scholarly paper review: A survey | Zhenzhen Zhuang et.al. | 2501.10326 | null |
2025-01-17 | Hierarchical Autoregressive Transformers: Combining Byte-~and Word-Level Processing for Robust, Adaptable Language Models | Pit Neitemeier et.al. | 2501.10322 | null |
2025-01-17 | HiMix: Reducing Computational Complexity in Large Vision-Language Models | Xuange Zhang et.al. | 2501.10318 | null |
2025-01-17 | Addressing Popularity Bias in Third-Party Library Recommendations Using LLMs | Claudio Di Sipio et.al. | 2501.10313 | null |
2025-01-17 | Computational Protein Science in the Era of Large Language Models (LLMs) | Wenqi Fan et.al. | 2501.10282 | null |
2025-01-17 | Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation | Azat Abdullin et.al. | 2501.10200 | null |
2025-01-17 | Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education | William Hersh et.al. | 2501.10186 | null |
2025-01-17 | Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval | Vera Pavlova et.al. | 2501.10175 | null |
2025-01-17 | Dual Debiasing: Remove Stereotypes and Keep Factual Gender for Fair Language Modeling and Translation | Tomasz Limisiewicz et.al. | 2501.10150 | null |
2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144 | null |
2025-01-17 | Exploring the Impact of Generative Artificial Intelligence in Education: A Thematic Analysis | Abhishek Kaushik et.al. | 2501.10134 | null |
2025-01-17 | ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario | Lucen Zhong et.al. | 2501.10132 | link |
2025-01-17 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He et.al. | 2501.10120 | link |
2025-01-17 | LLM Reasoner and Automated Planner: A new NPC approach | Israel Puerta-Merino et.al. | 2501.10106 | null |
2025-01-17 | Universal Actions for Enhanced Embodied Foundation Models | Jinliang Zheng et.al. | 2501.10105 | link |
2025-01-17 | Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks | Michael Schwingshackl et.al. | 2501.10080 | link |
2025-01-17 | SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning | Yuecheng Liu et.al. | 2501.10074 | null |
2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et.al. | 2501.09757 | null |
2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et.al. | 2501.09754 | null |
2025-01-16 | OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking | Zekun Xi et.al. | 2501.09751 | null |
2025-01-16 | Enhancing Lexicon-Based Text Embeddings with Large Language Models | Yibin Lei et.al. | 2501.09749 | null |
2025-01-16 | Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models | Bihui Jin et.al. | 2501.09745 | null |
2025-01-16 | Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps | Nanye Ma et.al. | 2501.09732 | null |
2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et.al. | 2501.09720 | link |
2025-01-16 | CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education | Tianyu Wang et.al. | 2501.09709 | link |
2025-01-16 | Domain Adaptation of Foundation LLMs for e-Commerce | Christian Herold et.al. | 2501.09706 | null |
2025-01-16 | Cueless EEG imagined speech for subject identification: dataset and benchmarks | Ali Derakhshesh et.al. | 2501.09700 | link |
2025-01-16 | Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Zhihe Yang et.al. | 2501.09695 | link |
2025-01-16 | Simulated Interactive Debugging | Yannic Noller et.al. | 2501.09694 | null |
2025-01-16 | Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | Fengli Xu et.al. | 2501.09686 | null |
2025-01-16 | Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review | Masatoshi Uehara et.al. | 2501.09685 | null |
2025-01-16 | Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark | Alexis Roger et.al. | 2501.09672 | null |
2025-01-16 | A Survey of Research in Large Language Models for Electronic Design Automation | Jingyu Pan et.al. | 2501.09655 | null |
2025-01-16 | The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models | Jonathan Katzy et.al. | 2501.09653 | null |
2025-01-16 | CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding | Johannes Kirmayr et.al. | 2501.09645 | link |
2025-01-16 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636 | null |
2025-01-16 | Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | Yushen Lin et.al. | 2501.09631 | null |
2025-01-15 | Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians | Ishan Amin et.al. | 2501.09009 | link |
2025-01-15 | Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails | Shaona Ghosh et.al. | 2501.09004 | null |
2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
2025-01-15 | CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation | Qi Ma et.al. | 2501.08982 | null |
2025-01-15 | Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models | Emma Croxford et.al. | 2501.08977 | null |
2025-01-15 | Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models | Karukriti Kaushik Ghosh et.al. | 2501.08974 | null |
2025-01-15 | Analyzing the Ethical Logic of Six Large Language Models | W. Russell Neuman et.al. | 2501.08951 | null |
2025-01-15 | Applying General Turn-taking Models to Conversational Human-Robot Interaction | Gabriel Skantze et.al. | 2501.08946 | null |
2025-01-15 | Disentangling Exploration of Large Language Models by Optimal Exploitation | Tim Grams et.al. | 2501.08925 | null |
2025-01-15 | GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge | Liam Dugan et.al. | 2501.08913 | link |
2025-01-15 | Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning | Qinyu Ma et.al. | 2501.08897 | link |
2025-01-15 | Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving | Tengpeng Li et.al. | 2501.08861 | null |
2025-01-15 | Exploring Task-Level Optimal Prompts for Visual In-Context Learning | Yan Zhu et.al. | 2501.08841 | null |
2025-01-15 | IDEA: Image Description Enhanced CLIP-Adapter | Zhipeng Ye et.al. | 2501.08816 | link |
2025-01-15 | How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering | Christoph Treude et.al. | 2501.08774 | null |
2025-01-15 | Admitting Ignorance Helps the Video Question Answering Models to Answer | Haopeng Li et.al. | 2501.08771 | null |
2025-01-15 | Enhanced Large Language Models for Effective Screening of Depression and Anxiety | June M. Liu et.al. | 2501.08769 | null |
2025-01-15 | Leveraging LLM Agents for Translating Network Configurations | Yunze Wei et.al. | 2501.08760 | null |
2025-01-15 | Expanding Vietnamese SentiWordNet to Improve Performance of Vietnamese Sentiment Analysis Models | Hong-Viet Tran et.al. | 2501.08758 | null |
2025-01-15 | The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities | Irina Bigoulaeva et.al. | 2501.08716 | link |
2025-01-14 | PokerBench: Training Large Language Models to become Professional Poker Players | Richard Zhuang et.al. | 2501.08328 | link |
2025-01-14 | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Miran Heo et.al. | 2501.08326 | null |
2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324 | null |
2025-01-14 | Exploring Robustness of Multilingual LLMs on Real-World Noisy Data | Amirhossein Aliakbarzadeh et.al. | 2501.08322 | link |
2025-01-14 | Enhancing Automated Interpretability with Output-Centric Feature Descriptions | Yoav Gur-Arieh et.al. | 2501.08319 | link |
2025-01-14 | MiniMax-01: Scaling Foundation Models with Lightning Attention | MiniMax et.al. | 2501.08313 | null |
2025-01-14 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | Abhilasha Ravichander et.al. | 2501.08292 | null |
2025-01-14 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li et.al. | 2501.08282 | link |
2025-01-14 | Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing | Pulkit Arora et.al. | 2501.08276 | null |
2025-01-14 | Addressing the sustainable AI trilemma: a case study on LLM agents and RAG | Hui Wu et.al. | 2501.08262 | null |
2025-01-14 | Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models | Yifu Qiu et.al. | 2501.08248 | null |
2025-01-14 | Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints | Jonathan Nöther et.al. | 2501.08246 | null |
2025-01-14 | Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | Paul Joe Maliakel et.al. | 2501.08219 | null |
2025-01-14 | ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems | Mohita Chowdhury et.al. | 2501.08208 | null |
2025-01-14 | ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Zain Ul Abedin et.al. | 2501.08203 | null |
2025-01-14 | CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation | Jinjun Peng et.al. | 2501.08200 | link |
2025-01-14 | OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training | Yijiong Yu et.al. | 2501.08197 | link |
2025-01-14 | PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler et.al. | 2501.08192 | null |
2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188 | null |
2025-01-14 | A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following | Yin Fang et.al. | 2501.08187 | link |
2025-01-13 | Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Xinyu Zhang et.al. | 2501.07563 | null |
2025-01-13 | SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing | Varun Biyyala et.al. | 2501.07554 | link |
2025-01-13 | Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Chengzu Li et.al. | 2501.07542 | null |
2025-01-13 | ML Mule: Mobile-Driven Context-Aware Collaborative Learning | Haoxiang Yu et.al. | 2501.07536 | null |
2025-01-13 | Investigating Large Language Models in Inferring Personality Traits from User Conversations | Jianfeng Zhu et.al. | 2501.07532 | null |
2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
2025-01-13 | Parallel Key-Value Cache Fusion for Position Invariant RAG | Philhoon Oh et.al. | 2501.07523 | null |
2025-01-13 | Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards | Yangsibo Huang et.al. | 2501.07493 | null |
2025-01-13 | TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models | Thales Sales Almeida et.al. | 2501.07482 | null |
2025-01-13 | A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities | Yihao Liu et.al. | 2501.07468 | null |
2025-01-13 | Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI | Rolf Pfister et.al. | 2501.07458 | null |
2025-01-13 | Enhancing LLM's Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection | Xin Yin et.al. | 2501.07425 | null |
2025-01-13 | Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion | Lala Shakti Swarup Ray et.al. | 2501.07408 | null |
2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et.al. | 2501.07396 | null |
2025-01-13 | Enhancing Retrieval-Augmented Generation: A Study of Best Practices | Siran Li et.al. | 2501.07391 | link |
2025-01-13 | Extracting Participation in Collective Action from Social Media | Arianna Pera et.al. | 2501.07368 | null |
2025-01-13 | Emergent effects of scaling on the functional hierarchies within large language models | Paul C. Bogdan et.al. | 2501.07359 | null |
2025-01-13 | Evaluating Pre-Trained Models for Multi-Language Vulnerability Patching | Zanis Ali Khan et.al. | 2501.07339 | null |
2025-01-13 | TempoGPT: Enhancing Temporal Reasoning via Quantizing Embedding | Haochuan Zhang et.al. | 2501.07335 | null |
2025-01-13 | Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring | Buse Sibel Korkmaz et.al. | 2501.07324 | link |
2025-01-10 | LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Omkar Thawakar et.al. | 2501.06186 | link |
2025-01-10 | PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | Yangyu Huang et.al. | 2501.06184 | null |
2025-01-10 | VideoAuteur: Towards Long Narrative Video Generation | Junfei Xiao et.al. | 2501.06173 | null |
2025-01-10 | Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories | Gerd Kortemeyer et.al. | 2501.06143 | null |
2025-01-10 | Supervision policies can shape long-term risk management in general-purpose AI models | Manuel Cebrian et.al. | 2501.06137 | link |
2025-01-10 | CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems | Haichao Liu et.al. | 2501.06132 | link |
2025-01-10 | Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI | Yuya Asano et.al. | 2501.06129 | null |
2025-01-10 | Merging Feed-Forward Sublayers for Compressed Transformers | Neha Verma et.al. | 2501.06126 | link |
2025-01-10 | Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Fabian David Schmidt et.al. | 2501.06117 | link |
2025-01-10 | From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy | Elham Aghakhani et.al. | 2501.06101 | null |
2025-01-10 | Personalized Language Model Learning on Text Data Without User Identifiers | Yucheng Ding et.al. | 2501.06062 | link |
2025-01-10 | AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery | Johann Wenckstern et.al. | 2501.06039 | link |
2025-01-10 | Generate, Transduct, Adapt: Iterative Transduction with VLMs | Oindrila Saha et.al. | 2501.06031 | null |
2025-01-10 | Addressing speaker gender bias in large scale speech translation systems | Shubham Bansal et.al. | 2501.05989 | null |
2025-01-10 | Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing | Eklavya Sarkar et.al. | 2501.05987 | link |
2025-01-10 | Exploring LLMs for Automated Pre-Testing of Cross-Cultural Surveys | Divya Mani Adhikari et.al. | 2501.05985 | null |
2025-01-10 | Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea | Eunjung Cho et.al. | 2501.05981 | null |
2025-01-10 | Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory | Yunmeng Shu et.al. | 2501.05965 | null |
2025-01-10 | Effective faking of verbal deception detection with target-aligned adversarial attacks | Bennett Kleinberg et.al. | 2501.05962 | null |
2025-01-10 | Scalable Vision Language Model Training via High Quality Data Curation | Hongyuan Dong et.al. | 2501.05952 | null |
2025-01-09 | An Empirical Study of Autoregressive Pre-training from Videos | Jathushan Rajasegaran et.al. | 2501.05453 | null |
2025-01-09 | ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Xingyu Fu et.al. | 2501.05452 | null |
2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446 | link |
2025-01-09 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark | Yunzhuo Hao et.al. | 2501.05444 | null |
2025-01-09 | A survey of textual cyber abuse detection using cutting-edge language models and large language models | Jose A. Diaz-Garcia et.al. | 2501.05443 | null |
2025-01-09 | Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers | Jerry Chongyi Hu et.al. | 2501.05423 | null |
2025-01-09 | LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation | Xi Ye et.al. | 2501.05414 | null |
2025-01-09 | Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation | Darius Petermann et.al. | 2501.05413 | null |
2025-01-09 | A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics | Maximilian Alber et.al. | 2501.05409 | null |
2025-01-09 | Mechanistic understanding and validation of large AI models with SemanticLens | Maximilian Dreyer et.al. | 2501.05398 | null |
2025-01-09 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du et.al. | 2501.05396 | link |
2025-01-09 | Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models | Kristian G. Barman et.al. | 2501.05382 | null |
2025-01-09 | Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance | Dimitrios Gerogiannis et.al. | 2501.05379 | null |
2025-01-09 | Accelerated Diffusion Models via Speculative Sampling | Valentin De Bortoli et.al. | 2501.05370 | null |
2025-01-09 | Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Hantao Lou et.al. | 2501.05336 | link |
2025-01-09 | "What's Happening"- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles | Xuewen Luo et.al. | 2501.05322 | null |
2025-01-09 | Comparison Study: Glacier Calving Front Delineation in Synthetic Aperture Radar Images With Deep Learning | Nora Gourmelon et.al. | 2501.05281 | link |
2025-01-09 | CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Fabian Hörst et.al. | 2501.05269 | link |
2025-01-09 | Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing | Atharva Mutsaddi et.al. | 2501.05260 | link |
2025-01-09 | CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models | Yewei Song et.al. | 2501.05255 | null |
2025-01-08 | EditAR: Unified Conditional Generation with Autoregressive Models | Jiteng Mu et.al. | 2501.04699 | null |
2025-01-08 | Re-ranking the Context for Multimodal Retrieval Augmented Generation | Matin Mortaheb et.al. | 2501.04695 | null |
2025-01-08 | URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Ruilin Luo et.al. | 2501.04686 | link |
2025-01-08 | Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Archita Srivastava et.al. | 2501.04675 | null |
2025-01-08 | DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Charles Corbière et.al. | 2501.04671 | null |
2025-01-08 | On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena | Tarek Naous et.al. | 2501.04662 | null |
2025-01-08 | Assessing Language Comprehension in Large Language Models Using Construction Grammar | Wesley Scivetti et.al. | 2501.04661 | null |
2025-01-08 | Multi-task retriever fine-tuning for domain-specific and efficient RAG | Patrice Béchard et.al. | 2501.04652 | null |
2025-01-08 | FlairGPT: Repurposing LLMs for Interior Designs | Gabrielle Littlefair et.al. | 2501.04648 | null |
2025-01-08 | A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Kazusato Oko et.al. | 2501.04641 | link |
2025-01-08 | Knowledge Retrieval Based on Generative AI | Te-Lun Yang et.al. | 2501.04635 | null |
2025-01-08 | "Can you be my mum?": Manipulating Social Robots in the Large Language Models Era | Giulio Antonio Abbo et.al. | 2501.04633 | null |
2025-01-08 | MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation | Daniele Molino et.al. | 2501.04614 | null |
2025-01-08 | Quantum-inspired Embeddings Projection and Similarity Metrics for Representation Learning | Ivan Kankeu et.al. | 2501.04591 | link |
2025-01-08 | Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models | Miaoyang He et.al. | 2501.04582 | null |
2025-01-08 | InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | Yuhang Liu et.al. | 2501.04575 | link |
2025-01-08 | Supervision-free Vision-Language Alignment | Giorgio Giannone et.al. | 2501.04568 | null |
2025-01-08 | OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Run Luo et.al. | 2501.04561 | link |
2025-01-08 | The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas? | Christopher Lazik et.al. | 2501.04543 | null |
2025-01-08 | rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Xinyu Guan et.al. | 2501.04519 | null |
2025-01-07 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | Lingdong Kong et.al. | 2501.04005 | null |
2025-01-07 | Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives | Shaoyuan Xie et.al. | 2501.04003 | link |
2025-01-07 | Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Haobo Yuan et.al. | 2501.04001 | link |
2025-01-07 | RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance | Matin Mortaheb et.al. | 2501.03995 | null |
2025-01-07 | Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles | Yuxi Xia et.al. | 2501.03991 | null |
2025-01-07 | (De)-Indexing and the Right to be Forgotten | Salvatore Vilella et.al. | 2501.03989 | null |
2025-01-07 | VLM-driven Behavior Tree for Context-aware Task Planning | Naoki Wake et.al. | 2501.03968 | link |
2025-01-07 | Vision Language Models as Values Detectors | Giulio Antonio Abbo et.al. | 2501.03957 | null |
2025-01-07 | Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jurgita Kapočiūtė-Dzikienė et.al. | 2501.03952 | null |
2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940 | null |
2025-01-07 | Visual question answering: from early developments to recent advances -- a survey | Ngoc Dung Huynh et.al. | 2501.03939 | null |
2025-01-07 | Exploring the Potential of Large Language Models in Public Transportation: San Antonio Case Study | Ramya Jonnala et.al. | 2501.03904 | null |
2025-01-07 | LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Shaolei Zhang et.al. | 2501.03895 | link |
2025-01-07 | AlphaPO -- Reward shape matters for LLM alignment | Aman Gupta et.al. | 2501.03884 | null |
2025-01-07 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | Keonwoo Kim et.al. | 2501.03879 | null |
2025-01-07 | Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study | Xaver Maria Krückl et.al. | 2501.03863 | link |
2025-01-07 | Progressive Document-level Text Simplification via Large Language Models | Dengzhao Fang et.al. | 2501.03857 | null |
2025-01-07 | BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context | Alexis Matzopoulos et.al. | 2501.03855 | null |
2025-01-07 | OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints | Mingjie Pan et.al. | 2501.03841 | null |
2025-01-07 | MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention | Aadya Arora et.al. | 2501.03839 | null |
2025-01-06 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Beichen Zhang et.al. | 2501.03226 | link |
2025-01-06 | Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Yuhui Zhang et.al. | 2501.03225 | link |
2025-01-06 | Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text | Ayat Najjar et.al. | 2501.03212 | null |
2025-01-06 | Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity | Ayat A. Najjar et.al. | 2501.03203 | null |
2025-01-06 | The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | Alon Jacovi et.al. | 2501.03200 | null |
2025-01-06 | CLIX: Cross-Lingual Explanations of Idiomatic Expressions | Aaron Gluck et.al. | 2501.03191 | null |
2025-01-06 | Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text | Ali Al-Lawati et.al. | 2501.03166 | link |
2025-01-06 | Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy | Risha Goel et.al. | 2501.03153 | link |
2025-01-06 | Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Alhassan Mumuni et.al. | 2501.03151 | null |
2025-01-06 | VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity | Yerong Li et.al. | 2501.03139 | null |
2025-01-06 | PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Mingyang Song et.al. | 2501.03124 | link |
2025-01-06 | CAT: Content-Adaptive Image Tokenization | Junhong Shen et.al. | 2501.03120 | null |
2025-01-06 | LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard et.al. | 2501.03112 | link |
2025-01-06 | Sentiment-guided Commonsense-aware Response Generation for Mental Health Counseling | Aseem Srivastava et.al. | 2501.03088 | null |
2025-01-06 | Retrieval-Augmented TLAPS Proof Generation with Large Language Models | Yuhao Zhou et.al. | 2501.03073 | null |
2025-01-06 | Trust Modeling in Counseling Conversations: A Benchmark Study | Aseem Srivastava et.al. | 2501.03064 | null |
2025-01-06 | ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | Duygu Sezen Islakoglu et.al. | 2501.03040 | null |
2025-01-06 | Piano Transcription by Hierarchical Language Modeling with Pretrained Roll-based Encoders | Dichucheng Li et.al. | 2501.03038 | null |
2025-01-06 | Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Zhen Li et.al. | 2501.03035 | null |
2025-01-06 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng et.al. | 2501.02997 | link |
2025-01-03 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Chaoyou Fu et.al. | 2501.01957 | link |
2025-01-03 | Metadata Conditioning Accelerates Language Model Pre-training | Tianyu Gao et.al. | 2501.01956 | link |
2025-01-03 | Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Weizhi Zhang et.al. | 2501.01945 | link |
2025-01-03 | Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges | Shagun Sinha et.al. | 2501.01933 | null |
2025-01-03 | Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models | Manh Duong Nguyen et.al. | 2501.01932 | link |
2025-01-03 | Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding | Jiaming Li et.al. | 2501.01926 | link |
2025-01-03 | Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Yifan Du et.al. | 2501.01904 | link |
2025-01-03 | QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | Shvetank Prakash et.al. | 2501.01892 | null |
2025-01-03 | Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions | Rachneet Sachdeva et.al. | 2501.01872 | link |
2025-01-03 | Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification | Xiangxiang Dai et.al. | 2501.01849 | link |
2025-01-03 | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Pu Yang et.al. | 2501.01834 | null |
2025-01-03 | Time Series Language Model for Descriptive Caption Generation | Mohamed Trabelsi et.al. | 2501.01832 | null |
2025-01-03 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | Yanjiang Liu et.al. | 2501.01830 | null |
2025-01-03 | SDPO: Segment-Level Direct Preference Optimization for Social Agents | Aobo Kong et.al. | 2501.01821 | link |
2025-01-03 | BERT4MIMO: A Foundation Model using BERT Architecture for Massive MIMO Channel State Information Prediction | Ferhat Ozgur Catak et.al. | 2501.01802 | link |
2025-01-03 | Reading Between the Lines: A dataset and a study on why some texts are tougher than others | Nouran Khallaf et.al. | 2501.01796 | link |
2025-01-03 | Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation | Mohammad Khalil et.al. | 2501.01793 | link |
2025-01-03 | Efficient LLM Inference with Activation Checkpointing and Hybrid Caching | Sanghyeon Lee et.al. | 2501.01792 | null |
2025-01-03 | LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | Er Jin et.al. | 2501.01767 | null |
2025-01-03 | SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation | Mingjie Li et.al. | 2501.01765 | null |
2025-01-02 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428 | null |
2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426 | link |
2025-01-02 | Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | Jingfeng Yao et.al. | 2501.01423 | link |
2025-01-02 | OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios | Xize Cheng et.al. | 2501.01384 | null |
2025-01-02 | Training Medical Large Vision-Language Models with Abnormal-Aware Feedback | Yucheng Zhou et.al. | 2501.01377 | null |
2025-01-02 | ScarNet: A Novel Foundation Model for Automated Myocardial Scar Quantification from LGE in Cardiac MRI | Neda Tavakoli et.al. | 2501.01372 | link |
2025-01-02 | CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Ben Vardi et.al. | 2501.01371 | null |
2025-01-02 | Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability | Dong Shu et.al. | 2501.01346 | null |
2025-01-02 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao et.al. | 2501.01336 | link |
2025-01-02 | CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Johan Wahréus et.al. | 2501.01335 | link |
2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
2025-01-02 | The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation | Shuzheng Gao et.al. | 2501.01329 | null |
2025-01-02 | Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking | Xiaoxue Cheng et.al. | 2501.01306 | null |
2025-01-02 | Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments -- The Depression and Anxiety Case | Kaushik Roy et.al. | 2501.01305 | null |
2025-01-02 | NeutraSum: A Language Model can help a Balanced Media Diet by Neutralizing News Summaries | Xi Luo et.al. | 2501.01284 | null |
2025-01-02 | CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries | Shudong Liu et.al. | 2501.01282 | null |
2025-01-02 | Language Models for Code Optimization: Survey, Challenges and Future Directions | Jingzhi Gong et.al. | 2501.01277 | link |
2025-01-02 | Does a Large Language Model Really Speak in Human-Like Language? | Mose Park et.al. | 2501.01273 | null |
2025-01-02 | ProgCo: Program Helps Self-Correction of Large Language Models | Xiaoshuai Song et.al. | 2501.01264 | null |
2025-01-02 | CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | Shanghaoran Quan et.al. | 2501.01257 | null |
2024-12-30 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra et.al. | 2412.21200 | link |
2024-12-31 | HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Zhaojian Yu et.al. | 2412.21199 | link |
2024-12-30 | Aviary: training language agents on challenging scientific tasks | Siddharth Narayanan et.al. | 2412.21154 | null |
2024-12-30 | Facilitating large language model Russian adaptation with Learned Embedding Propagation | Mikhail Tikhomirov et.al. | 2412.21140 | link |
2024-12-30 | Training Software Engineering Agents and Verifiers with SWE-Gym | Jiayi Pan et.al. | 2412.21139 | link |
2024-12-30 | Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism | Tim Tsz-Kit Lau et.al. | 2412.21124 | null |
2024-12-30 | ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation | Ruixuan Liu et.al. | 2412.21123 | null |
2024-12-30 | Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Yifei Huang et.al. | 2412.21080 | link |
2024-12-30 | Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring | Ehsan Latif et.al. | 2412.21065 | null |
2024-12-30 | Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | Yuyang Zhou et.al. | 2412.21051 | link |
2024-12-30 | Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration | Wanglong Lu et.al. | 2412.21042 | link |
2024-12-30 | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Chia-Yu Hung et.al. | 2412.21037 | link |
2024-12-30 | GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models | Shangyu Xing et.al. | 2412.21036 | null |
2024-12-30 | MapQaTor: A System for Efficient Annotation of Map Query Datasets | Mahir Labib Dihan et.al. | 2412.21015 | link |
2024-12-31 | Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria | Joonwon Jang et.al. | 2412.21006 | null |
2024-12-30 | Plug-and-Play Training Framework for Preference Optimization | Jingyuan Ma et.al. | 2412.20996 | null |
2024-12-30 | KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model's Reasoning Path Aggregation | Siyuan Fang et.al. | 2412.20995 | null |
2024-12-30 | Efficiently Serving LLM Reasoning Programs with Certaindex | Yichao Fu et.al. | 2412.20993 | null |
2024-12-30 | AlignAb: Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies | Yibo Wen et.al. | 2412.20984 | null |
2024-12-30 | UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI | Fangwei Zhong et.al. | 2412.20977 | null |
2024-12-27 | MVTamperBench: Evaluating Robustness of Vision-Language Models | Amit Agarwal et.al. | 2412.19794 | null |
2024-12-27 | InfAlign: Inference-aware language model alignment | Ananth Balashankar et.al. | 2412.19792 | null |
2024-12-27 | Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization | Kumud Tripathi et.al. | 2412.19785 | null |
2024-12-27 | Can AI Help with Your Personal Finances? | Oudom Hean et.al. | 2412.19784 | null |
2024-12-27 | Fortran2CPP: Automating Fortran-to-C++ Migration using LLMs via Multi-Turn Dialogue and Dual-Agent Integration | Le Chen et.al. | 2412.19770 | link |
2024-12-27 | On dual-projectively equivalent connections associated to second order superintegrable systems | Andreas Vollmer et.al. | 2412.19739 | null |
2024-12-27 | Can Large Language Models Adapt to Other Agents In-Context? | Matthew Riemer et.al. | 2412.19726 | null |
2024-12-27 | OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | Qiushi Sun et.al. | 2412.19723 | null |
2024-12-27 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen et.al. | 2412.19707 | link |
2024-12-27 | A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Jingchun Lian et.al. | 2412.19685 | null |
2024-12-27 | Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework | Jiang Liu et.al. | 2412.19684 | null |
2024-12-27 | CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | Siyu Wang et.al. | 2412.19663 | null |
2024-12-27 | Asymmetrical Reciprocity-based Federated Learning for Resolving Disparities in Medical Diagnosis | Jiaqi Wang et.al. | 2412.19654 | link |
2024-12-27 | FreStega: A Plug-and-Play Method for Boosting Imperceptibility and Capacity in Generative Linguistic Steganography for Real-World Scenarios | Kaiyi Pang et.al. | 2412.19652 | null |
2024-12-27 | Xmodel-2 Technical Report | Wang Qun et.al. | 2412.19638 | null |
2024-12-27 | IMTP: Search-based Code Generation for In-memory Tensor Programs | Yongwon Shin et.al. | 2412.19630 | null |
2024-12-27 | Signatures of prediction during natural listening in MEG data? | Sahel Azizpour et.al. | 2412.19622 | null |
2024-12-27 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang et.al. | 2412.19616 | link |
2024-12-27 | Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models | Minhao Bai et.al. | 2412.19603 | null |
2024-12-27 | SocRATES: Towards Automated Scenario-based Testing of Social Navigation Algorithms | Shashank Rao Marpally et.al. | 2412.19595 | null |
2024-12-24 | Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models | Jinhui Yi et.al. | 2412.18609 | link |
2024-12-24 | Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Zehan Wang et.al. | 2412.18605 | link |
2024-12-24 | Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models | Tahira Kazimi et.al. | 2412.18604 | null |
2024-12-24 | Long-Form Speech Generation with Spoken Language Models | Se Jin Park et.al. | 2412.18603 | link |
2024-12-24 | Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems | Fernando Jia et.al. | 2412.18601 | link |
2024-12-24 | A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs | OpenMind et.al. | 2412.18588 | null |
2024-12-24 | Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control | Sergey Sedov et.al. | 2412.18582 | null |
2024-12-24 | Zero-resource Speech Translation and Recognition with LLMs | Karel Mundnich et.al. | 2412.18566 | null |
2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552 | link |
2024-12-24 | Token-Budget-Aware LLM Reasoning | Tingxu Han et.al. | 2412.18547 | link |
2024-12-24 | Consistency Checks for Language Model Forecasters | Daniel Paleka et.al. | 2412.18544 | null |
2024-12-24 | PLD-Tree: Persistent Laplacian Decision Tree for Protein-Protein Binding Free Energy Prediction | Xingjian Xu et.al. | 2412.18541 | null |
2024-12-24 | Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Derong Xu Xinhang Li et.al. | 2412.18537 | link |
2024-12-24 | Automated Code Review In Practice | Umut Cihan et.al. | 2412.18531 | null |
2024-12-24 | The Key of Understanding Vision Tasks: Explanatory Instructions | Yang Shen et.al. | 2412.18525 | link |
2024-12-24 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang et.al. | 2412.18511 | null |
2024-12-24 | Think or Remember? Detecting and Directing LLMs Towards Memorization or Generalization | Yi-Fu Fu et.al. | 2412.18497 | null |
2024-12-24 | Generating event descriptions under syntactic and semantic constraints | Angela Cao et.al. | 2412.18496 | link |
2024-12-24 | Segment-Based Attention Masking for GPTs | Shahar Katz et.al. | 2412.18487 | link |
2024-12-24 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova et.al. | 2412.18450 | link |
2024-12-23 | ChatGarment: Garment Estimation, Generation and Editing via Large Language Models | Siyuan Bian et.al. | 2412.17811 | null |
2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806 | null |
2024-12-23 | Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models | Precious Jones et.al. | 2412.17803 | null |
2024-12-23 | Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Yitong Chen et.al. | 2412.17800 | link |
2024-12-23 | Automating the Search for Artificial Life with Foundation Models | Akarsh Kumar et.al. | 2412.17799 | link |
2024-12-23 | Memory makes computation universal, remember? | Erik Garrison et.al. | 2412.17794 | null |
2024-12-23 | Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Xinmiao Yu et.al. | 2412.17787 | null |
2024-12-23 | PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion | Sophia Tang et.al. | 2412.17780 | null |
2024-12-23 | ResearchTown: Simulator of Human Research Community | Haofei Yu et.al. | 2412.17767 | link |
2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759 | null |
2024-12-23 | ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback | Wei Zhang et.al. | 2412.17754 | null |
2024-12-23 | Deliberation in Latent Space via Differentiable Cache Augmentation | Luyang Liu et.al. | 2412.17747 | null |
2024-12-23 | YuLan-Mini: An Open Data-efficient Language Model | Yiwen Hu et.al. | 2412.17743 | link |
2024-12-23 | Reasoning to Attend: Try to Understand How Token Works | Rui Qian et.al. | 2412.17741 | link |
2024-12-23 | Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | Ermo Hua et.al. | 2412.17739 | link |
2024-12-23 | Knowledge Editing through Chain-of-Thought | Changyue Wang et.al. | 2412.17727 | link |
2024-12-23 | From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question Answering | Nathaniel Weir et.al. | 2412.17701 | link |
2024-12-23 | Understanding the Logic of Direct Preference Alignment through Logic | Kyle Richardson et.al. | 2412.17696 | null |
2024-12-23 | FedTLU: Federated Learning with Targeted Layer Updates | Jong-Ik Park et.al. | 2412.17692 | null |
2024-12-23 | Large Language Model Safety: A Holistic Survey | Dan Shi et.al. | 2412.17686 | link |
2024-12-20 | HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding | Chenxin Tao et.al. | 2412.16158 | null |
2024-12-20 | Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training | Mingliang Liang et.al. | 2412.16148 | link |
2024-12-20 | Offline Reinforcement Learning for LLM Multi-Step Reasoning | Huaijie Wang et.al. | 2412.16145 | link |
2024-12-20 | Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation | Seyedreza Mohseni et.al. | 2412.16135 | null |
2024-12-20 | Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information | Dirk Bergemann et.al. | 2412.16132 | null |
2024-12-20 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics | Daniil Larionov et.al. | 2412.16120 | null |
2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119 | link |
2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117 | link |
2024-12-20 | The Content Moderator's Dilemma: Removal of Toxic Content and Distortions to Online Discourse | Mahyar Habibi et.al. | 2412.16114 | null |
2024-12-20 | Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring | Ahmet Bahaddin Ersoz et.al. | 2412.16108 | null |
2024-12-20 | Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers | Yifan Yang et.al. | 2412.16102 | null |
2024-12-20 | Logical Consistency of Large Language Models in Fact-checking | Bishwamittra Ghosh et.al. | 2412.16100 | null |
2024-12-20 | The Evolution of LLM Adoption in Industry Data Curation Practices | Crystal Qian et.al. | 2412.16089 | null |
2024-12-20 | Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Jun Ma et.al. | 2412.16085 | link |
2024-12-20 | Formal Mathematical Reasoning: A New Frontier in AI | Kaiyu Yang et.al. | 2412.16075 | null |
2024-12-20 | A Framework for Streaming Event-Log Prediction in Business Processes | Benedikt Bollig et.al. | 2412.16032 | null |
2024-12-20 | The Only Way is Ethics: A Guide to Ethical Research with Large Language Models | Eddie L. Ungless et.al. | 2412.16022 | link |
2024-12-20 | Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs | Lynn Greschner et.al. | 2412.15993 | null |
2024-12-20 | BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models | Patrick Haller et.al. | 2412.15978 | null |
2024-12-20 | Legommenders: A Comprehensive Content-Based Recommendation Library with LLM Support | Qijiong Liu et.al. | 2412.15973 | link |
2024-12-19 | PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation | Muntasir Wahed et.al. | 2412.15209 | null |
2024-12-19 | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Shuo Xing et.al. | 2412.15208 | link |
2024-12-19 | AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Shuo Xing et.al. | 2412.15206 | link |
2024-12-19 | MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Qihao Zhao et.al. | 2412.15194 | link |
2024-12-19 | EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | Sagar Soni et.al. | 2412.15190 | null |
2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | null |
2024-12-19 | Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning | Simon Frieder et.al. | 2412.15184 | null |
2024-12-19 | STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning | Marius Memmel et.al. | 2412.15182 | null |
2024-12-19 | HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages | Aman Chaturvedi et.al. | 2412.15178 | null |
2024-12-19 | Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Federico Castagna et.al. | 2412.15177 | link |
2024-12-19 | Rethinking Uncertainty Estimation in Natural Language Generation | Lukas Aichberger et.al. | 2412.15176 | null |
2024-12-19 | Language Models as Continuous Self-Evolving Data Engineers | Peidong Wang et.al. | 2412.15151 | null |
2024-12-19 | Adaptive Pruning for Large Language Models with Structural Importance Awareness | Haotian Zheng et.al. | 2412.15127 | null |
2024-12-19 | Outcome-Refining Process Supervision for Code Generation | Zhuohao Yu et.al. | 2412.15118 | link |
2024-12-19 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | link |
2024-12-19 | Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture | Thomas F Burns et.al. | 2412.15113 | link |
2024-12-19 | Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search | Lei Tan et.al. | 2412.15106 | null |
2024-12-19 | Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability | Xiangsen Chen et.al. | 2412.15101 | null |
2024-12-19 | Nano-ESG: Extracting Corporate Sustainability Information from News Articles | Fabian Billert et.al. | 2412.15093 | link |
2024-12-19 | ScamChatBot: An End-to-End Analysis of Fake Account Recovery on Social Media via Chatbots | Bhupendra Acharya et.al. | 2412.15072 | null |
2024-12-18 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang et.al. | 2412.14171 | link |
2024-12-18 | TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | Frank F. Xu et.al. | 2412.14161 | link |
2024-12-18 | Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics with Large Language Models | Atin Sakkeer Hussain et.al. | 2412.14146 | null |
2024-12-18 | Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation | Jianyu Zhang et.al. | 2412.14145 | null |
2024-12-18 | LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research | Tianyang Gu et.al. | 2412.14141 | null |
2024-12-18 | Design choices made by LLM-based test generators prevent them from finding bugs | Noble Saji Mathews et.al. | 2412.14137 | null |
2024-12-18 | Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models | Ido Cohen et.al. | 2412.14133 | link |
2024-12-18 | Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation | Rémi Marsal et.al. | 2412.14103 | null |
2024-12-18 | Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts | Jihye Choi et.al. | 2412.14097 | null |
2024-12-18 | Alignment faking in large language models | Ryan Greenblatt et.al. | 2412.14093 | link |
2024-12-18 | Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report | Markus Dablander et.al. | 2412.14085 | null |
2024-12-18 | Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification | Kyle Thompson et.al. | 2412.14063 | link |
2024-12-18 | Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets | Simon Thorne et.al. | 2412.14062 | null |
2024-12-18 | Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Xinghang Li et.al. | 2412.14058 | null |
2024-12-18 | A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future | Shilin Sun et.al. | 2412.14056 | link |
2024-12-18 | Digestion Algorithm in Hierarchical Symbolic Forests: A Fast Text Normalization Algorithm and Semantic Parsing Framework for Specific Scenarios and Lightweight Deployment | Kevin You et.al. | 2412.14054 | null |
2024-12-18 | Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation | Vera Neplenbroek et.al. | 2412.14050 | link |
2024-12-18 | CAD-Recode: Reverse Engineering CAD Code from Point Clouds | Danila Rukhovich et.al. | 2412.14042 | link |
2024-12-18 | Hansel: Output Length Controlling Framework for Large Language Models | Seoha Song et.al. | 2412.14033 | null |
2024-12-18 | Discovering maximally consistent distribution of causal tournaments with Large Language Models | Federico Baldo et.al. | 2412.14019 | null |
2024-12-17 | Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents | Yifei Zhou et.al. | 2412.13194 | null |
2024-12-17 | GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Haoyi Jiang et.al. | 2412.13193 | link |
2024-12-17 | HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction | Chen Bao et.al. | 2412.13187 | null |
2024-12-17 | Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration | Mark Endo et.al. | 2412.13180 | null |
2024-12-17 | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Sheng Yin et.al. | 2412.13178 | link |
2024-12-17 | DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation | Miriam Wanner et.al. | 2412.13175 | null |
2024-12-17 | Locate n' Rotate: Two-stage Openable Part Detection with Foundation Model Priors | Siqi Li et.al. | 2412.13173 | link |
2024-12-17 | Compressed Chain of Thought: Efficient Reasoning Through Dense Representations | Jeffrey Cheng et.al. | 2412.13171 | null |
2024-12-17 | Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study | Bolei Ma et.al. | 2412.13169 | link |
2024-12-17 | C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Parker Addison et.al. | 2412.13163 | null |
2024-12-17 | SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction | Chao Ma et.al. | 2412.13148 | null |
2024-12-17 | Are Your LLMs Capable of Stable Reasoning? | Junnan Liu et.al. | 2412.13147 | link |
2024-12-17 | A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis | Xiao Zhou et.al. | 2412.13126 | null |
2024-12-17 | AI PERSONA: Towards Life-long Personalization of LLMs | Tiannan Wang et.al. | 2412.13103 | null |
2024-12-17 | AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Jianlyu Chen et.al. | 2412.13102 | link |
2024-12-17 | Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election | Roberto Mondini et.al. | 2412.13098 | null |
2024-12-17 | LMUnit: Fine-grained Evaluation with Natural Language Unit Tests | Jon Saad-Falcon et.al. | 2412.13091 | null |
2024-12-17 | Taming Multi-Domain, -Fidelity Data: Towards Foundation Models for Atomistic Scale Simulations | Tomoya Shiota et.al. | 2412.13088 | link |
2024-12-17 | Modality-Inconsistent Continual Learning of Multimodal Large Language Models | Weiguo Pian et.al. | 2412.13050 | null |
2024-12-17 | Harnessing Event Sensory Data for Error Pattern Prediction in Vehicles: A Language Model Approach | Hugo Math et.al. | 2412.13041 | link |
2024-12-16 | SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | Guoxuan Chen et.al. | 2412.12094 | link |
2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087 | null |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077 | null |
2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075 | null |
2024-12-16 | Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats | Kuleen Sasse et.al. | 2412.12072 | link |
2024-12-16 | How Private are Language Models in Abstractive Summarization? | Anthony Hughes et.al. | 2412.12040 | null |
2024-12-16 | Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | Ira Ceka et.al. | 2412.12039 | null |
2024-12-16 | FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning | Gaojian Wang et.al. | 2412.12032 | link |
2024-12-16 | SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval | Yueqian Lin et.al. | 2412.12009 | null |
2024-12-16 | The Open Source Advantage in Large Language Models (LLMs) | Jiya Manchanda et.al. | 2412.12004 | null |
2024-12-16 | LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Zhuhao Wang et.al. | 2412.12001 | link |
2024-12-16 | SAMIC: Segment Anything with In-Context Spatial Prompt Engineering | Savinay Nagendra et.al. | 2412.11998 | null |
2024-12-16 | Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Devika Venugopalan et.al. | 2412.11995 | link |
2024-12-16 | ExecRepoBench: Multi-level Executable Code Completion Evaluation | Jian Yang et.al. | 2412.11990 | null |
2024-12-16 | SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation | Debarshi Kundu et.al. | 2412.11988 | link |
2024-12-16 | Cost-Effective Label-free Node Classification with LLMs | Taiyan Zhang et.al. | 2412.11983 | null |
2024-12-16 | AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws | Oren Neumann et.al. | 2412.11979 | link |
2024-12-16 | Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection | Beomseok Lee et.al. | 2412.11978 | null |
2024-12-16 | Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Qi Sun et.al. | 2412.11974 | link |
2024-12-16 | DARWIN 1.5: Large Language Models as Materials Science Adapted Learners | Tong Xie et.al. | 2412.11970 | link |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372 | link |
2024-12-13 | A Grounded Typology of Word Classes | Coleman Haley et.al. | 2412.10369 | null |
2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353 | null |
2024-12-13 | Towards a foundation model for heavy-ion collision experiments through point cloud diffusion | Manjunath Omana Kuttan et.al. | 2412.10352 | null |
2024-12-13 | A dual contrastive framework | Yuan Sun et.al. | 2412.10348 | null |
2024-12-13 | COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models | Yuchen Ren et.al. | 2412.10347 | null |
2024-12-13 | Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | Zhiqi Ge et.al. | 2412.10342 | null |
2024-12-13 | AdvPrefix: An Objective for Nuanced LLM Jailbreaks | Sicheng Zhu et.al. | 2412.10321 | link |
2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | null |
2024-12-13 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu et.al. | 2412.10302 | link |
2024-12-13 | Still "Talking About Large Language Models": Some Clarifications | Murray Shanahan et.al. | 2412.10291 | null |
2024-12-13 | One world, one opinion? The superstar effect in LLM responses | Sofie Goethals et.al. | 2412.10281 | null |
2024-12-13 | Benchmarking Linguistic Diversity of Large Language Models | Yanzhu Guo et.al. | 2412.10271 | link |
2024-12-13 | Cultural Evolution of Cooperation among LLM Agents | Aron Vallinder et.al. | 2412.10270 | null |
2024-12-13 | Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Danielle R. Thomas et.al. | 2412.10267 | link |
2024-12-13 | Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media | Jiaqing Yuan et.al. | 2412.10266 | null |
2024-12-13 | Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models | Harry J. Davies et.al. | 2412.10257 | null |
2024-12-13 | Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts | Hazel Kim et.al. | 2412.10246 | null |
2024-12-13 | Efficient Continual Pre-training of LLMs for Low-resource Languages | Arijit Nag et.al. | 2412.10244 | null |
2024-12-13 | Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization | Xiao Zhang et.al. | 2412.10207 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
2024-12-12 | V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Junqi Ge et.al. | 2412.09616 | link |
2024-12-12 | PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models | Chenyu Yang et.al. | 2412.09613 | null |
2024-12-12 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612 | link |
2024-12-12 | Feat2GS: Probing Visual Foundation Models with Gaussian Splatting | Yue Chen et.al. | 2412.09606 | null |
2024-12-12 | AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials | Yiheng Xu et.al. | 2412.09605 | null |
2024-12-12 | SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding | Hao Li et.al. | 2412.09604 | null |
2024-12-12 | Do Multimodal Large Language Models See Like Humans? | Jiaying Lin et.al. | 2412.09603 | null |
2024-12-12 | Hidden Biases of End-to-End Driving Datasets | Julian Zimmerlin et.al. | 2412.09602 | link |
2024-12-12 | InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions | Pan Zhang et.al. | 2412.09596 | link |
2024-12-12 | OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages | Chester Palen-Michel et.al. | 2412.09587 | null |
2024-12-12 | DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | Yu Feng et.al. | 2412.09572 | null |
2024-12-12 | Does Representation Matter? Exploring Intermediate Layers in Large Language Models | Oscar Skean et.al. | 2412.09563 | null |
2024-12-12 | Foundational Large Language Models for Materials Research | Vaibhav Mishra et.al. | 2412.09560 | link |
2024-12-12 | Video Creation by Demonstration | Yihong Sun et.al. | 2412.09551 | null |
2024-12-12 | Exemplar Masking for Multimodal Incremental Learning | Yi-Lun Lee et.al. | 2412.09549 | link |
2024-12-12 | Capturing the Temporal Dependence of Training Data Influence | Jiachen T. Wang et.al. | 2412.09538 | null |
2024-12-12 | Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM | Han Wang et.al. | 2412.09530 | link |
2024-12-12 | Can Modern LLMs Act as Agent Cores in Radiology~Environments? | Qiaoyu Zheng et.al. | 2412.09529 | link |
2024-12-12 | Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Clinical Pathology Analysis | Shengxuming Zhang et.al. | 2412.09521 | null |
2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642 | null |
2024-12-11 | Fast Prompt Alignment for Text-to-Image Generation | Khalil Mrini et.al. | 2412.08639 | link |
2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | link |
2024-12-11 | Synthetic Vision: Training Vision-Language Models to Understand Physics | Vahid Balazadeh et.al. | 2412.08619 | null |
2024-12-11 | Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Jiahui Li et.al. | 2412.08615 | link |
2024-12-11 | Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Fan Lu et.al. | 2412.08614 | link |
2024-12-11 | Competition and Diversity in Generative AI | Manish Raghavan et.al. | 2412.08610 | link |
2024-12-11 | AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models | Mintong Kang et.al. | 2412.08608 | null |
2024-12-11 | Preference Discerning with LLM-Enhanced Generative Retrieval | Fabian Paischer et.al. | 2412.08604 | null |
2024-12-11 | Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node | Imran Latif et.al. | 2412.08602 | null |
2024-12-11 | Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks | Arsalan Masoudifard et.al. | 2412.08593 | null |
2024-12-11 | Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning | Hang Zhao et.al. | 2412.08587 | null |
2024-12-11 | TURBOATTENTION: Efficient Attention Approximation For High Throughputs LLMs | Hao Kang et.al. | 2412.08585 | null |
2024-12-11 | LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Zejian Li et.al. | 2412.08580 | link |
2024-12-11 | Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning | Rongzhe Wei et.al. | 2412.08559 | null |
2024-12-11 | MaestroMotif: Skill Design from Artificial Intelligence Feedback | Martin Klissarov et.al. | 2412.08542 | null |
2024-12-11 | SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Pallavi Jain et.al. | 2412.08536 | link |
2024-12-11 | Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck | Andor Diera et.al. | 2412.08528 | null |
2024-12-11 | EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance | Yingxin Li et.al. | 2412.08521 | null |
2024-12-11 | Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation | Pengyue Jia et.al. | 2412.08519 | null |
2024-12-10 | Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences | Alan Nawzad Amin et.al. | 2412.07763 | link |
2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755 | null |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746 | null |
2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743 | null |
2024-12-10 | AI Expands Scientists' Impact but Contracts Science's Focus | Qianyue Hao et.al. | 2412.07727 | link |
2024-12-10 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | link |
2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689 | link |
2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687 | null |
2024-12-10 | TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | Alfredo Garrachón Ruiz et.al. | 2412.07682 | null |
2024-12-10 | RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models | Greg Heinrich et.al. | 2412.07679 | link |
2024-12-10 | Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting | Shuyu Shen et.al. | 2412.07673 | link |
2024-12-10 | FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks | Bocheng Chen et.al. | 2412.07672 | null |
2024-12-10 | Automating Business Intelligence Requirements with Generative AI and Semantic Search | Nimrod Busany et.al. | 2412.07668 | null |
2024-12-10 | Searching for Structure: Investigating Emergent Communication with Large Language Models | Tom Kouwenhoven et.al. | 2412.07646 | null |
2024-12-10 | TrojanWhisper: Evaluating Pre-trained LLMs to Detect and Localize Hardware Trojans | Md Omar Faruque et.al. | 2412.07636 | null |
2024-12-10 | ChocoLlama: Lessons Learned From Teaching Llamas Dutch | Matthieu Meeus et.al. | 2412.07633 | null |
2024-12-10 | Piece of Table: A Divide-and-Conquer Approach for Selecting Sub-Tables in Table Question Answering | Wonjin Lee et.al. | 2412.07629 | null |
2024-12-10 | OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Linke Ouyang et.al. | 2412.07626 | link |
2024-12-10 | DRUM: Learning Demonstration Retriever for Large MUlti-modal Models | Ellen Yi-Ge et.al. | 2412.07619 | null |
2024-12-09 | Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models | Yi-Lun Lee et.al. | 2412.06775 | link |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | null |
2024-12-09 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | null |
2024-12-09 | Ranking-aware adapter for text-driven image ordering with CLIP | Wei-Hsiang Yu et.al. | 2412.06760 | link |
2024-12-09 | Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | Joy Krishan Das et.al. | 2412.06757 | null |
2024-12-09 | Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models | Neel Jain et.al. | 2412.06748 | null |
2024-12-09 | ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | Adhiraj Ghosh et.al. | 2412.06745 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | link |
2024-12-09 | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li et.al. | 2412.06724 | link |
2024-12-09 | How to Merge Your Multimodal Models Over Time? | Sebastian Dziadzio et.al. | 2412.06712 | link |
2024-12-09 | OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Yi-Kai Zhang et.al. | 2412.06693 | null |
2024-12-09 | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu et.al. | 2412.06684 | null |
2024-12-09 | Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework | Tianming Liu et.al. | 2412.06681 | null |
2024-12-09 | I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Roi Cohen et.al. | 2412.06676 | null |
2024-12-09 | ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Chunwei Wang et.al. | 2412.06673 | null |
2024-12-09 | MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models | Shansong Liu et.al. | 2412.06660 | link |
2024-12-09 | Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben | Rainer Mühlhoff et.al. | 2412.06651 | null |
2024-12-09 | The Narrow Gate: Localized Image-Text Communication in Vision-Language Models | Alessandro Serra et.al. | 2412.06646 | null |
2024-12-09 | MAVias: Mitigate any Visual Bias | Ioannis Sarridis et.al. | 2412.06632 | null |
2024-12-09 | Copyright-Protected Language Generation via Adaptive Model Fusion | Javier Abad et.al. | 2412.06619 | link |
2024-12-06 | Birth and Death of a Rose | Chen Geng et.al. | 2412.05278 | null |
2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276 | link |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | link |
2024-12-06 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | link |
2024-12-06 | Uncertainty Quantification for Transformer Models for Dark-Pattern Detection | Javier Muñoz et.al. | 2412.05251 | null |
2024-12-06 | Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization | Luca Masserano et.al. | 2412.05244 | null |
2024-12-06 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Xiaohui Chen et.al. | 2412.05243 | null |
2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | null |
2024-12-06 | BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits | Wazib Ansar et.al. | 2412.05225 | null |
2024-12-06 | 100% Hallucination Elimination Using Acurai | Michael C. Wood et.al. | 2412.05223 | link |
2024-12-06 | Evaluating and Aligning CodeLLMs on Human Preference | Jian Yang et.al. | 2412.05210 | null |
2024-12-06 | A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges | Aditi Singh et.al. | 2412.05208 | null |
2024-12-06 | Are Frontier Large Language Models Suitable for Q&A in Science Centres? | Jacob Watson et.al. | 2412.05200 | null |
2024-12-06 | SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Jinlin Wu et.al. | 2412.05187 | link |
2024-12-06 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | Lishuai Gao et.al. | 2412.05185 | link |
2024-12-06 | QueEn: A Large Language Model for Quechua-English Translation | Junhao Chen et.al. | 2412.05184 | null |
2024-12-06 | Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | Kuofeng Gao et.al. | 2412.05167 | null |
2024-12-06 | Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation | Manish Bhattarai et.al. | 2412.05159 | null |
2024-12-06 | Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies | Recep Firat Cekinel et.al. | 2412.05155 | link |
2024-12-06 | A text-to-tabular approach to generate synthetic patient data using LLMs | Margaux Tornqvist et.al. | 2412.05153 | link |
2024-12-05 | Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Luca Bartolomei et.al. | 2412.04472 | link |
2024-12-05 | NVILA: Efficient Frontier Visual Language Models | Zhijian Liu et.al. | 2412.04468 | null |
2024-12-05 | VisionZip: Longer is Better but Not Necessary in Vision Language Models | Senqiao Yang et.al. | 2412.04467 | link |
2024-12-05 | Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection | Enshen Zhou et.al. | 2412.04455 | null |
2024-12-05 | p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Jun Zhang et.al. | 2412.04449 | link |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446 | null |
2024-12-05 | Moto: Latent Motion Token as the Bridging Language for Robot Manipulation | Yi Chen et.al. | 2412.04445 | null |
2024-12-05 | Towards Real-Time Open-Vocabulary Video Instance Segmentation | Bin Yan et.al. | 2412.04434 | null |
2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
2024-12-05 | Establishing Task Scaling Laws via Compute-Efficient Model Ladders | Akshita Bhagia et.al. | 2412.04403 | null |
2024-12-05 | SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | Rong Li et.al. | 2412.04383 | null |
2024-12-05 | Discriminative Fine-tuning of LVLMs | Yassine Ouali et.al. | 2412.04378 | null |
2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368 | null |
2024-12-05 | Approximate Top- |
Oscar Key et.al. | 2412.04358 | null |
2024-12-05 | Retrieval-Augmented Machine Translation with Unstructured Knowledge | Jiaan Wang et.al. | 2412.04342 | link |
2024-12-05 | Liquid: Language Models are Scalable Multi-modal Generators | Junfeng Wu et.al. | 2412.04332 | link |
2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563 | link |
2024-12-04 | FLAIR: VLM with Fine-grained Language-informed Image Representations | Rui Xiao et.al. | 2412.03561 | link |
2024-12-04 | Best-of-N Jailbreaking | John Hughes et.al. | 2412.03556 | link |
2024-12-04 | PaliGemma 2: A Family of Versatile VLMs for Transfer | Andreas Steiner et.al. | 2412.03555 | null |
2024-12-04 | SPICE: Smart Projection Interface for Cooking Enhancement | Vera Prohaska et.al. | 2412.03551 | null |
2024-12-04 | Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Mahtab Bigverdi et.al. | 2412.03548 | null |
2024-12-04 | Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models | Natalie Mackraz et.al. | 2412.03537 | null |
2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
2024-12-04 | FANAL -- Financial Activity News Alerting Language Modeling Framework | Urjitkumar Patel et.al. | 2412.03527 | null |
2024-12-04 | You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512 | null |
2024-12-04 | Tight PAC-Bayesian Risk Certificates for Contrastive Learning | Anna van Elst et.al. | 2412.03486 | link |
2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
2024-12-04 | Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks | Dario Serez et.al. | 2412.03453 | link |
2024-12-04 | From Words to Workflows: Automating Business Processes | Laura Minkova et.al. | 2412.03446 | null |
2024-12-04 | Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine | Matthias Christenson et.al. | 2412.03427 | null |
2024-12-04 | PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | Ao Wang et.al. | 2412.03409 | link |
2024-12-04 | RedStone: Curating General, Code, Math, and QA Data for Large Language Models | Yaoyao Chang et.al. | 2412.03398 | null |
2024-12-04 | Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs | Ge Zheng et.al. | 2412.03390 | null |
2024-12-04 | WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis | Chengwei Hu et.al. | 2412.03359 | null |
2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | null |
2024-12-03 | Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | Yuda Song et.al. | 2412.02674 | null |
2024-12-03 | LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs | Pranav Doma et.al. | 2412.02655 | null |
2024-12-03 | Time-Reversal Provides Unsupervised Feedback to LLMs | Yerram Varun et.al. | 2412.02626 | null |
2024-12-03 | Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions | Kai Sun et.al. | 2412.02621 | null |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617 | null |
2024-12-03 | GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Aohan Zeng et.al. | 2412.02612 | link |
2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong et.al. | 2412.02611 | null |
2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605 | null |
2024-12-03 | CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Abhas Kumar et.al. | 2412.02602 | null |
2024-12-03 | PrefixLLM: LLM-aided Prefix Circuit Design | Weihua Xiao et.al. | 2412.02594 | null |
2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Junyuan Zhang et.al. | 2412.02592 | link |
2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
2024-12-03 | Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Chenyang Liu et.al. | 2412.02573 | link |
2024-12-03 | SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Joongwon Chae et.al. | 2412.02565 | link |
2024-12-03 | Semantic Tokens in Retrieval Augmented Generation | Joel Suro et.al. | 2412.02563 | null |
2024-12-03 | Patent-CR: A Dataset for Patent Claim Revision | Lekang Jiang et.al. | 2412.02549 | null |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531 | null |
2024-12-03 | LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data | Hanyu Zhang et.al. | 2412.02525 | null |
2024-12-03 | OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Caixin Kang et.al. | 2412.02479 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951 | link |
2024-12-02 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | null |
2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et.al. | 2411.19939 | null |
2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et.al. | 2411.19930 | null |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
2024-11-29 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | Chang Won Lee et.al. | 2411.19888 | null |
2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
2024-12-02 | LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states | Luis Ibanez-Lissen et.al. | 2411.19876 | null |
2024-11-29 | DeMo: Decoupled Momentum Optimization | Bowen Peng et.al. | 2411.19870 | link |
2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869 | link |
2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865 | null |
2024-11-29 | Cross-Domain Recommendation Meets Large Language Models | Ajay Krishna Vajjala et.al. | 2411.19862 | link |
2024-11-29 | What fifty-one years of Linguistics and Artificial Intelligence research tell us about their correlation: A scientometric review | Mohammed Q. Shormani et.al. | 2411.19858 | null |
2024-11-29 | Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation | Dimosthenis Antypas et.al. | 2411.19832 | null |
2024-11-29 | Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation | Robin D. Pesl et.al. | 2411.19804 | null |
2024-11-29 | INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge | Angelika Romanou et.al. | 2411.19799 | null |
2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et.al. | 2411.19786 | null |
2024-11-29 | PerLA: Perceptive 3D Language Assistant | Guofeng Mei et.al. | 2411.19774 | null |
2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Tiantian Geng et.al. | 2411.19772 | null |
2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et.al. | 2411.19757 | link |
2024-11-27 | Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation | Yueru Jia et.al. | 2411.18623 | null |
2024-11-27 | Cross-modal Information Flow in Multimodal Large Language Models | Zhi Zhang et.al. | 2411.18620 | null |
2024-11-27 | Diffusion Self-Distillation for Zero-Shot Customized Image Generation | Shengqu Cai et.al. | 2411.18616 | null |
2024-11-27 | Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation | Nurshat Fateh Ali et.al. | 2411.18583 | null |
2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
2024-11-27 | A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models | Rong Wang et.al. | 2411.18564 | null |
2024-11-27 | DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation | Zhixuan Liang et.al. | 2411.18562 | null |
2024-11-27 | Retrofitting (Large) Language Models with Dynamic Tokenization | Darius Feher et.al. | 2411.18553 | null |
2024-11-27 | AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans | Dillon Loh et.al. | 2411.18539 | link |
2024-11-27 | Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models | Minhyeok Lee et.al. | 2411.18530 | link |
2024-11-27 | LLM-ABBA: Understand time series via symbolic approximation | Erin Carson et.al. | 2411.18506 | null |
2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | null |
2024-11-27 | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Jinyang Wu et.al. | 2411.18478 | null |
2024-11-27 | Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding | Ziyin Zhang et.al. | 2411.18462 | link |
2024-11-27 | Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator | Frederic Kirstein et.al. | 2411.18444 | null |
2024-11-27 | An AI-Assisted Multi-Agent Dual Dialogue System to Support Mental Health Care Providers | Onno P. Kampman et.al. | 2411.18429 | null |
2024-11-27 | FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Ao Shen et.al. | 2411.18424 | null |
2024-11-27 | Politicians vs ChatGPT. A study of presuppositions in French and Italian political communication | Davide Garassino et.al. | 2411.18403 | null |
2024-11-27 | Topic Modeling and Sentiment Analysis on Japanese Online Media's Coverage of Nuclear Energy | Yifan Sun et.al. | 2411.18383 | null |
2024-11-27 | ChatGPT as speechwriter for the French presidents | Dominique Labbé et.al. | 2411.18382 | null |
2024-11-26 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats | Jiaxin Wen et.al. | 2411.17693 | null |
2024-11-26 | Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens | Xu Ouyang et.al. | 2411.17691 | null |
2024-11-26 | Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | Yuhang Han et.al. | 2411.17686 | null |
2024-11-26 | Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | Zhu Xu et.al. | 2411.17679 | link |
2024-11-26 | Instance-Aware Graph Prompt Learning | Jiazheng Li et.al. | 2411.17676 | null |
2024-11-26 | Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting | Liyun Zhang et.al. | 2411.17674 | null |
2024-11-26 | SketchAgent: Language-Driven Sequential Sketch Generation | Yael Vinker et.al. | 2411.17673 | null |
2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
2024-11-26 | How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations | Hyunji Lee et.al. | 2411.17666 | null |
2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637 | null |
2024-11-26 | MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | Harsh Singh et.al. | 2411.17636 | null |
2024-11-26 | Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining | Jaewoong Lee et.al. | 2411.17625 | null |
2024-11-26 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Aohan Zeng et.al. | 2411.17607 | null |
2024-11-26 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Cong Wei et.al. | 2411.17606 | link |
2024-11-26 | Making History Readable | Bipasha Banerjee et.al. | 2411.17600 | null |
2024-11-26 | Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals | William A. Ingram et.al. | 2411.17598 | null |
2024-11-26 | Can artificial intelligence predict clinical trial outcomes? | Shuyi Jin et.al. | 2411.17595 | null |
2024-11-26 | RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation | Lakshmi Likhitha Mankali et.al. | 2411.17569 | null |
2024-11-26 | Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey | Jiayi Kuang et.al. | 2411.17558 | null |
2024-11-25 | Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? | Sohee Yang et.al. | 2411.16679 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
2024-11-25 | Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective | Jean Marie Tshimula et.al. | 2411.16642 | null |
2024-11-25 | StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | Kaustubh Ponkshe et.al. | 2411.16618 | null |
2024-11-25 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu et.al. | 2411.16602 | null |
2024-11-25 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li et.al. | 2411.16594 | link |
2024-11-25 | Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles | Klinsmann Agyei et.al. | 2411.16587 | link |
2024-11-25 | MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series | Aaron Wheeler et.al. | 2411.16585 | link |
2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
2024-11-25 | Predictive Power of LLMs in Financial Markets | Jerick Shi et.al. | 2411.16569 | null |
2024-11-25 | EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | Shahriyar Zaman Ridoy et.al. | 2411.16561 | null |
2024-11-25 | Generating Out-Of-Distribution Scenarios Using Language Models | Erfan Aasi et.al. | 2411.16554 | null |
2024-11-25 | Representation Collapsing Problems in Vector Quantization | Wenhao Zhao et.al. | 2411.16550 | null |
2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | null |
2024-11-25 | Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings | Carolin M. Schuster et.al. | 2411.16527 | null |
2024-11-25 | Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency | Jerry Yao-Chieh Hu et.al. | 2411.16525 | null |
2024-11-25 | LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Steven Song et.al. | 2411.16523 | link |
2024-11-25 | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis | Boming Miao et.al. | 2411.16503 | null |
2024-11-22 | Measuring Bullshit in the Language Games played by ChatGPT | Alessandro Trevisan et.al. | 2411.15129 | null |
2024-11-22 | Health AI Developer Foundations | Atilla P. Kiraly et.al. | 2411.15128 | null |
2024-11-22 | TÜLU 3: Pushing Frontiers in Open Language Model Post-Training | Nathan Lambert et.al. | 2411.15124 | link |
2024-11-22 | RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Hjalmar Wijk et.al. | 2411.15114 | link |
2024-11-22 | Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion | Samarth N Ramesh et.al. | 2411.15113 | null |
2024-11-22 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
2024-11-22 | What You See is Not What You Get: Neural Partial Differential Equations and The Illusion of Learning | Arvind Mohan et.al. | 2411.15101 | null |
2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
2024-11-22 | Context-Aware Multimodal Pretraining | Karsten Roth et.al. | 2411.15099 | null |
2024-11-22 | mR |
Tao Zhang et.al. | 2411.15041 | null |
2024-11-22 | One to rule them all: natural language to bind communication, perception and action | Simone Colombani et.al. | 2411.15033 | null |
2024-11-22 | Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot | Simone Colombani et.al. | 2411.15027 | null |
2024-11-22 | DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Keda Tao et.al. | 2411.15024 | link |
2024-11-22 | FTA generation using GenAI with an Autonomy sensor Usecase | Sneha Sudhir Shetiya et.al. | 2411.15007 | null |
2024-11-22 | ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Junhong Shen et.al. | 2411.15004 | link |
2024-11-22 | Generative AI may backfire for counterspeech | Dominik Bär et.al. | 2411.14986 | null |
2024-11-22 | Exploring Foundation Models Fine-Tuning for Cytology Classification | Manon Dausort et.al. | 2411.14975 | link |
2024-11-22 | Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models | Alec Wright et.al. | 2411.14972 | link |
2024-11-22 | SwissADT: An Audio Description Translation System for Swiss Languages | Lukas Fischer et.al. | 2411.14967 | null |
2024-11-22 | LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement | Jieming Bian et.al. | 2411.14961 | null |
2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
2024-11-21 | Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation | Zhuoman Liu et.al. | 2411.14423 | null |
2024-11-21 | From RNNs to Foundation Models: An Empirical Study on Commercial Building Energy Consumption | Shourya Bose et.al. | 2411.14421 | null |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401 | null |
2024-11-21 | Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings | Aaron Zheng et.al. | 2411.14398 | null |
2024-11-21 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema et.al. | 2411.14343 | link |
2024-11-21 | SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching | Arjun P S et.al. | 2411.14322 | link |
2024-11-21 | Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training | Zheheng Luo et.al. | 2411.14318 | null |
2024-11-21 | Automated Generation of Code Debugging Exercises | Victor-Alexandru Pădurean et.al. | 2411.14303 | null |
2024-11-21 | Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams | Jitendra Bhandari et.al. | 2411.14299 | link |
2024-11-21 | EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Yumeng Liu et.al. | 2411.14280 | null |
2024-11-21 | Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance | Haozhe Zhao et.al. | 2411.14279 | null |
2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272 | link |
2024-11-21 | Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Ernests Lavrinovics et.al. | 2411.14258 | null |
2024-11-21 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando et.al. | 2411.14257 | null |
2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256 | null |
2024-11-21 | Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification | Junhua Liu et.al. | 2411.14252 | null |
2024-11-21 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2411.14251 | link |
2024-11-21 | FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression | Yuke Zhu et.al. | 2411.14228 | null |
2024-11-21 | Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data | Paul Fergus et.al. | 2411.14219 | null |
2024-11-20 | Find Any Part in 3D | Ziqi Ma et.al. | 2411.13550 | null |
2024-11-20 | SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs | Shirley Kokane et.al. | 2411.13547 | null |
2024-11-20 | Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm | Rushabh Solanki et.al. | 2411.13546 | null |
2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | null |
2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
2024-11-20 | Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse | S. Chapagain et.al. | 2411.13534 | link |
2024-11-20 | Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models | Chanseo Lee et.al. | 2411.13518 | null |
2024-11-20 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
2024-11-20 | Neural machine translation of seismic waves for petrophysical inversion | José Cunha Teixeira et.al. | 2411.13491 | null |
2024-11-20 | Utilizing Large Language Models to Synthesize Product Desirability Datasets | John D. Hastings et.al. | 2411.13485 | null |
2024-11-20 | PatentEdits: Framing Patent Novelty as Textual Entailment | Ryan Lee et.al. | 2411.13477 | null |
2024-11-20 | When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Haonan Wang et.al. | 2411.13476 | link |
2024-11-20 | SoK: A Systems Perspective on Compound AI Threats and Countermeasures | Sarbartha Banerjee et.al. | 2411.13459 | null |
2024-11-20 | LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models | Salvatore Mario Carta et.al. | 2411.13453 | null |
2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
2024-11-20 | WaterPark: A Robustness Assessment of Language Model Watermarking | Jiacheng Liang et.al. | 2411.13425 | link |
2024-11-20 | Unleashing the Power of Large Language Models for Group POI Recommendations | Jing Long et.al. | 2411.13415 | null |
2024-11-20 | A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback | Alireza Rashidi Laleh et.al. | 2411.13410 | null |
2024-11-20 | Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology | Muhammad Sharif et.al. | 2411.13409 | null |
2024-11-20 | Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese | Dat Van-Thanh Nguyen et.al. | 2411.13407 | null |
2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
2024-11-19 | Information Theory of Meaningful Communication | Doron Sivan et.al. | 2411.12728 | null |
2024-11-19 | CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Zhehan Kan et.al. | 2411.12713 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712 | null |
2024-11-19 | Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT? | Ahmed Akib Jawad Karim et.al. | 2411.12703 | null |
2024-11-19 | When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations | Huaizhi Ge et.al. | 2411.12701 | null |
2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
2024-11-19 | Neurosymbolic Graph Enrichment for Grounded World Models | Stefano De Giorgis et.al. | 2411.12671 | null |
2024-11-19 | DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Vinay Kumar Sankarapu et.al. | 2411.12643 | link |
2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
2024-11-19 | Provable unlearning in topic modeling and downstream tasks | Stanley Wei et.al. | 2411.12600 | null |
2024-11-19 | AdaCM |
Yuanbin Man et.al. | 2411.12593 | null |
2024-11-19 | Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | Laura Ruis et.al. | 2411.12580 | link |
2024-11-19 | Large Language Models for Combinatorial Optimization of Design Structure Matrix | Shuo Jiang et.al. | 2411.12571 | null |
2024-11-19 | Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | Riccardo Grazzi et.al. | 2411.12537 | link |
2024-11-19 | Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Yang Zou et.al. | 2411.12530 | link |
2024-11-19 | Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Terufumi Morishita et.al. | 2411.12498 | link |
2024-11-19 | AI Flow at the Network Edge | Jiawei Shao et.al. | 2411.12469 | null |
2024-11-19 | Guide-to-Explain for Controllable Summarization | Sangwon Ryu et.al. | 2411.12460 | null |
2024-11-19 | \textsc{Neon}: News Entity-Interaction Extraction for Enhanced Question Answering | Sneha Singhania et.al. | 2411.12449 | null |
2024-11-18 | Bi-Mamba: Towards Accurate 1-Bit State Space Models | Shengkun Tang et.al. | 2411.11843 | null |
2024-11-18 | Tackling prediction tasks in relational databases with LLMs | Marek Wydmuch et.al. | 2411.11829 | null |
2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795 | null |
2024-11-18 | LLM-IE: A Python Package for Generative Information Extraction with Large Language Models | Enshuo Hsu et.al. | 2411.11779 | null |
2024-11-18 | sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI | Yunhao Xing et.al. | 2411.11752 | null |
2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
2024-11-18 | Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment | Allison Huang et.al. | 2411.11731 | link |
2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
2024-11-18 | FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models | Tao Fan et.al. | 2411.11707 | null |
2024-11-18 | MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Ruichuan An et.al. | 2411.11706 | link |
2024-11-18 | Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search | Jinhao Jiang et.al. | 2411.11694 | null |
2024-11-18 | TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World | Xianlong Wang et.al. | 2411.11683 | null |
2024-11-18 | PSPO: An Effective Process-supervised Policy Optimization for Reasoning Alignment* | Jiawei Li et.al. | 2411.11681 | link |
2024-11-18 | Dissecting Misalignment of Multimodal Large Language Models via Influence Function | Lijie Hu et.al. | 2411.11667 | null |
2024-11-18 | TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection | Mengxuan Li et.al. | 2411.11641 | link |
2024-11-18 | Chapter 7 Review of Data-Driven Generative AI Models for Knowledge Extraction from Scientific Literature in Healthcare | Leon Kopitar et.al. | 2411.11635 | null |
2024-11-18 | Signaling and Social Learning in Swarms of Robots | Leo Cazenille et.al. | 2411.11616 | null |
2024-11-18 | Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining | Danny Barash et.al. | 2411.11613 | null |
2024-11-18 | VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation | Bangguo Yu et.al. | 2411.11609 | null |
2024-11-18 | Exploring LLMs for Verifying Technical System Specifications Against Requirements | Lasse M. Reinpold et.al. | 2411.11582 | null |
2024-11-15 | VeriGraph: Scene Graphs for Execution Verifiable Robot Planning | Daniel Ekpo et.al. | 2411.10446 | null |
2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | null |
2024-11-15 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
2024-11-15 | MARS: Unleashing the Power of Variance Reduction for Training Large Models | Huizhuo Yuan et.al. | 2411.10438 | link |
2024-11-15 | Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Yuhan Fu et.al. | 2411.10436 | null |
2024-11-15 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi et.al. | 2411.10422 | link |
2024-11-15 | On the Foundation Model for Cardiac MRI Reconstruction | Chi Zhang et.al. | 2411.10403 | null |
2024-11-15 | Interactive Cycle Model -- The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et.al. | 2411.10362 | link |
2024-11-15 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling et.al. | 2411.10351 | null |
2024-11-15 | Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images | Ammar Qammaz et.al. | 2411.10334 | null |
2024-11-15 | Number it: Temporal Grounding Videos like Flipping Manga | Yongliang Wu et.al. | 2411.10332 | link |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
2024-11-15 | Static network structure cannot stabilize cooperation among Large Language Model agents | Jin Han et.al. | 2411.10294 | null |
2024-11-15 | Scaling Law for Post-training after Model Pruning | Xiaodong Chen et.al. | 2411.10272 | null |
2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | null |
2024-11-15 | Measuring Non-Adversarial Reproduction of Training Data in Large Language Models | Michael Aerni et.al. | 2411.10242 | null |
2024-11-15 | Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | J. Bieniek et.al. | 2411.10234 | null |
2024-11-15 | An Empirical Study on LLM-based Agents for Automated Bug Fixing | Xiangxin Meng et.al. | 2411.10213 | null |
2024-11-15 | Agentic LLMs in the Supply Chain: Towards Autonomous Multi-Agent Consensus-Seeking | Valeria Jannelli et.al. | 2411.10184 | null |
2024-11-15 | CART: Compositional Auto-Regressive Transformer for Image Generation | Siddharth Roheda et.al. | 2411.10180 | null |
2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | null |
2024-11-14 | Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | Wei Wang et.al. | 2411.09691 | null |
2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
2024-11-14 | Adaptive Decoding via Latent Preference Optimization | Shehzaad Dhuliawala et.al. | 2411.09661 | null |
2024-11-14 | On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse | Alkis Kalavasis et.al. | 2411.09642 | null |
2024-11-14 | Local deployment of large-scale music AI models on commodity hardware | Xun Zhou et.al. | 2411.09625 | null |
2024-11-14 | PTR: Precision-Driven Tool Recommendation for Large Language Models | Hang Gao et.al. | 2411.09613 | null |
2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612 | null |
2024-11-14 | Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Ronak Pradeep et.al. | 2411.09607 | null |
2024-11-14 | Accelerating Knowledge Graph and Ontology Engineering with Large Language Models | Cogan Shimizu et.al. | 2411.09601 | null |
2024-11-14 | Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images | Bipasha Kundu et.al. | 2411.09598 | null |
2024-11-14 | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | Zhengyi Wang et.al. | 2411.09595 | null |
2024-11-14 | Adopting RAG for LLM-Aided Future Vehicle Design | Vahid Zolfaghari et.al. | 2411.09590 | null |
2024-11-14 | BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency | Akari Haga et.al. | 2411.09587 | null |
2024-11-14 | Software Performance Engineering for Foundation Model-Powered Software (FMware) | Haoxiang Zhang et.al. | 2411.09580 | null |
2024-11-14 | Piecing It All Together: Verifying Multi-Hop Multimodal Claims | Haoran Wang et.al. | 2411.09547 | null |
2024-11-14 | A Practical Guide to Fine-tuning Language Models with Limited Data | Márton Szép et.al. | 2411.09539 | null |
2024-11-14 | Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents | Yuyou Gan et.al. | 2411.09523 | null |
2024-11-14 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et.al. | 2411.09510 | null |
2024-11-14 | Spider: Any-to-Many Multimodal LLM | Jinxiang Lai et.al. | 2411.09439 | null |
2024-11-13 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | Sadjad Alikhani et.al. | 2411.08872 | link |
2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | link |
2024-11-13 | CamemBERT 2.0: A Smarter French Language Model Aged to Perfection | Wissam Antoun et.al. | 2411.08868 | null |
2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
2024-11-13 | FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Tianyu Zhou et.al. | 2411.08804 | link |
2024-11-13 | Evaluating World Models with LLM for Decision Making | Chang Yang et.al. | 2411.08794 | null |
2024-11-13 | Can sparse autoencoders be used to decompose and interpret steering vectors? | Harry Mayne et.al. | 2411.08790 | link |
2024-11-13 | Sharingan: Extract User Action Sequence from Desktop Recordings | Yanting Chen et.al. | 2411.08768 | null |
2024-11-13 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Clément Dumas et.al. | 2411.08745 | link |
2024-11-13 | A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models | Dingdong Wang et.al. | 2411.08742 | null |
2024-11-13 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Somanshu Singla et.al. | 2411.08733 | link |
2024-11-13 | Polymetis:Large Language Modeling for Multiple Material Domains | Chao Huang et.al. | 2411.08728 | null |
2024-11-13 | Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification | Jose-Luis Matez-Bandera et.al. | 2411.08727 | link |
2024-11-13 | Theoretical Analysis of Byte-Pair Encoding | László Kozma et.al. | 2411.08671 | null |
2024-11-13 | OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances | Youqi Liao et.al. | 2411.08665 | link |
2024-11-13 | UniMat: Unifying Materials Embeddings through Multi-modal Learning | Janghoon Ock et.al. | 2411.08664 | null |
2024-11-13 | Accelerating Quasi-Static Time Series Simulations with Foundation Models | Alban Puech et.al. | 2411.08652 | null |
2024-11-13 | A System Level Performance Evaluation for Superconducting Digital Systems | Joyjit Kundu et.al. | 2411.08645 | null |
2024-11-13 | Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs | Mojdeh Karbalaee Motalleb et.al. | 2411.08640 | null |
2024-11-12 | Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data | Juanhui Li et.al. | 2411.08028 | null |
2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027 | null |
2024-11-12 | Language Models as Causal Effect Generators | Lucius E. J. Bynum et.al. | 2411.08019 | link |
2024-11-12 | ExpressivityArena: Can LLMs Express Information Implicitly? | Joshua Tint et.al. | 2411.08010 | null |
2024-11-12 | Can adversarial attacks by large language models be attributed? | Manuel Cebrian et.al. | 2411.08003 | null |
2024-11-12 | Derivational Morphology Reveals Analogical Generalization in Large Language Models | Valentin Hofmann et.al. | 2411.07990 | null |
2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Yiyang Ma et.al. | 2411.07975 | link |
2024-11-12 | From General to Specific: Utilizing General Hallucation to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer's Disease | Francesco Chiumento et.al. | 2411.07871 | null |
2024-11-12 | Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | Xiaofeng Zhu et.al. | 2411.07870 | null |
2024-11-12 | Verbosity |
Yusen Zhang et.al. | 2411.07858 | link |
2024-11-12 | Tucano: Advancing Neural Text Generation for Portuguese | Nicholas Kluge Corrêa et.al. | 2411.07854 | link |
2024-11-12 | NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN | Sonia Raychaudhuri et.al. | 2411.07848 | null |
2024-11-12 | Chain Association-based Attacking and Shielding Natural Language Processing Systems | Jiacheng Huang et.al. | 2411.07843 | null |
2024-11-12 | FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training | Philip Zmushko et.al. | 2411.07837 | link |
2024-11-12 | Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices | Kilian Pfeiffer et.al. | 2411.07826 | null |
2024-11-12 | Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models | Youan Cong et.al. | 2411.07820 | null |
2024-11-12 | Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks | Tianqu Kang et.al. | 2411.07806 | null |
2024-11-12 | Likelihood as a Performance Gauge for Retrieval-Augmented Generation | Tianyu Liu et.al. | 2411.07773 | link |
2024-11-11 | UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Bo Yang et.al. | 2411.07240 | link |
2024-11-11 | OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model | Sumeth Yuenyong et.al. | 2411.07238 | null |
2024-11-11 | Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations | Chaitanya Malaviya et.al. | 2411.07237 | null |
2024-11-11 | Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving | Botao Yu et.al. | 2411.07228 | null |
2024-11-11 | TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models | Matheus Simão et.al. | 2411.07224 | null |
2024-11-11 | Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks | Madeline Brumley et.al. | 2411.07213 | null |
2024-11-11 | General Geospatial Inference with a Population Dynamics Foundation Model | Mohit Agarwal et.al. | 2411.07207 | link |
2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191 | link |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186 | null |
2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184 | link |
2024-11-11 | Counterfactual Generation from Language Models | Shauli Ravfogel et.al. | 2411.07180 | link |
2024-11-11 | More Expressive Attention with Negative Weights | Ang Lv et.al. | 2411.07176 | link |
2024-11-11 | Continual Memorization of Factoids in Large Language Models | Howard Chen et.al. | 2411.07175 | link |
2024-11-11 | A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 | Vedant Khandelwal et.al. | 2411.07163 | null |
2024-11-11 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | Yancheng He et.al. | 2411.07140 | null |
2024-11-11 | Stronger Models are NOT Stronger Teachers for Instruction Tuning | Zhangchen Xu et.al. | 2411.07133 | null |
2024-11-11 | Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Taihang Hu et.al. | 2411.07132 | link |
2024-11-11 | Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Kaijian Zou et.al. | 2411.07130 | link |
2024-11-11 | Benchmarking LLMs' Judgments with No Gold Standard | Shengwei Xu et.al. | 2411.07127 | link |
2024-11-08 | Recycled Attention: Efficient inference for long-context language models | Fangyuan Xu et.al. | 2411.05787 | null |
2024-11-08 | Using Language Models to Disambiguate Lexical Choices in Translation | Josh Barua et.al. | 2411.05781 | link |
2024-11-08 | Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? | Veronica Chatrath et.al. | 2411.05775 | null |
2024-11-08 | Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 | Christopher Malon et.al. | 2411.05762 | null |
2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Dylan Goetting et.al. | 2411.05755 | link |
2024-11-08 | Aioli: A Unified Optimization Framework for Language Model Data Mixing | Mayee F. Chen et.al. | 2411.05735 | link |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734 | null |
2024-11-08 | STARS: Sensor-agnostic Transformer Architecture for Remote Sensing | Ethan King et.al. | 2411.05714 | null |
2024-11-08 | Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal | Fuka Matsuzaki et.al. | 2411.05665 | link |
2024-11-08 | The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent | Leon O. H. Kroczek et.al. | 2411.05653 | null |
2024-11-08 | LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution | Yuheng Zhao et.al. | 2411.05651 | null |
2024-11-08 | Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation | Elena V. Epure et.al. | 2411.05649 | link |
2024-11-08 | Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Long Truong To et.al. | 2411.05641 | null |
2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639 | null |
2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | Cristiano Patrício et.al. | 2411.05609 | link |
2024-11-08 | Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages | JA Meaney et.al. | 2411.05593 | null |
2024-11-08 | Open-set object detection: towards unified problem formulation and benchmarking | Hejer Ammar et.al. | 2411.05564 | null |
2024-11-08 | Training objective drives the consistency of representational similarity across datasets | Laure Ciernik et.al. | 2411.05561 | link |
2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et.al. | 2411.05555 | null |
2024-11-08 | Assessing the Answerability of Queries in Retrieval-Augmented Code Generation | Geonmin Kim et.al. | 2411.05547 | null |
2024-11-07 | SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
2024-11-07 | Analyzing The Language of Visual Tokens | David M. Chan et.al. | 2411.05001 | null |
2024-11-07 | Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Jonathan Roberts et.al. | 2411.05000 | null |
2024-11-07 | DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Peiqi Liu et.al. | 2411.04999 | link |
2024-11-07 | LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | Weiquan Huang et.al. | 2411.04997 | link |
2024-11-07 | Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Weixin Liang et.al. | 2411.04996 | null |
2024-11-07 | Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Hao Sun et.al. | 2411.04991 | link |
2024-11-07 | The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities | Zhaofeng Wu et.al. | 2411.04986 | link |
2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981 | null |
2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
2024-11-07 | BitNet a4.8: 4-bit Activations for 1-bit LLMs | Hongyu Wang et.al. | 2411.04965 | null |
2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
2024-11-07 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | Jingwei Xu et.al. | 2411.04954 | null |
2024-11-07 | M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Jaemin Cho et.al. | 2411.04952 | null |
2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942 | null |
2024-11-07 | VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Shehan Munasinghe et.al. | 2411.04923 | null |
2024-11-07 | GPTKB: Building Very Large Knowledge Bases from Language Models | Yujia Hu et.al. | 2411.04920 | link |
2024-11-07 | OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | Siming Huang et.al. | 2411.04905 | null |
2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892 | null |
2024-11-07 | GUI Agents with Foundation Models: A Comprehensive Survey | Shuai Wang et.al. | 2411.04890 | null |
2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | link |
2024-11-06 | How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Guan Zhe Hong et.al. | 2411.04105 | null |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097 | link |
2024-11-06 | Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation | Ke Fan et.al. | 2411.04079 | null |
2024-11-06 | H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models | Nhi Pham et.al. | 2411.04077 | null |
2024-11-06 | M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models | Chuhan Li et.al. | 2411.04075 | null |
2024-11-06 | Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Ping Li et.al. | 2411.04059 | link |
2024-11-06 | Beemo: Benchmark of Expert-edited Machine-generated Outputs | Ekaterina Artemova et.al. | 2411.04032 | null |
2024-11-06 | Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Aniket Deroy et.al. | 2411.04025 | null |
2024-11-06 | Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Davide Buoso et.al. | 2411.04006 | null |
2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | link |
2024-11-06 | What Really is Commonsense Knowledge? | Quyet V. Do et.al. | 2411.03964 | null |
2024-11-06 | How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching? | Zhangcheng Qiang et.al. | 2411.03962 | link |
2024-11-06 | Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model | Hatef Otroshi Shahreza et.al. | 2411.03960 | null |
2024-11-06 | Fine-Grained Guidance for Retrievers: Leveraging LLMs' Feedback in Retrieval-Augmented Generation | Yuhang Liu et.al. | 2411.03957 | null |
2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et.al. | 2411.03948 | link |
2024-11-06 | Interactions Across Blocks in Post-Training Quantization of Large Language Models | Khasmamad Shabanovi et.al. | 2411.03934 | null |
2024-11-06 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models | Minh Duc Bui et.al. | 2411.03888 | link |
2024-11-06 | Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | Zhijian Zhuo et.al. | 2411.03884 | link |
2024-11-06 | MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Laura Cabello et.al. | 2411.03883 | link |
2024-11-05 | Inference Optimal VLMs Need Only One Visual Token but Larger Models | Kevin Y. Li et.al. | 2411.03312 | link |
2024-11-05 | LLMs for Domain Generation Algorithm Detection | Reynier Leyva La O et.al. | 2411.03307 | null |
2024-11-05 | VERITAS: A Unified Approach to Reliability Evaluation | Rajkumar Ramamurthy et.al. | 2411.03300 | null |
2024-11-05 | Examining Human-AI Collaboration for Co-Writing Constructive Comments Online | Farhana Shahid et.al. | 2411.03295 | null |
2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | link |
2024-11-05 | The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare | Souren Pashangpour et.al. | 2411.03287 | null |
2024-11-05 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li et.al. | 2411.03284 | link |
2024-11-05 | Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities | Ryosuke Takata et.al. | 2411.03252 | null |
2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
2024-11-05 | From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice | Alicia Guo et.al. | 2411.03137 | null |
2024-11-05 | "Create a Fear of Missing Out" -- ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning | Veronika Krauß et.al. | 2411.03108 | null |
2024-11-05 | Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation | Jinbao Chen et.al. | 2411.03079 | null |
2024-11-05 | Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning | Bei Li et.al. | 2411.03042 | null |
2024-11-05 | HumanVLM: Foundation for Human-Scene Vision-Language Model | Dawei Dai et.al. | 2411.03034 | null |
2024-11-05 | Leveraging Large Language Models in Code Question Answering: Baselines and Issues | Georgy Andryushchenko et.al. | 2411.03012 | link |
2024-11-05 | Controlling for Unobserved Confounding with Large Language Model Classification of Patient Smoking Status | Samuel Lee et.al. | 2411.03004 | null |
2024-11-05 | Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation | Junchen Fu et.al. | 2411.02992 | null |
2024-11-05 | Growing a Tail: Increasing Output Diversity in Large Language Models | Michal Shur-Ofry et.al. | 2411.02989 | null |
2024-11-05 | [Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI | Maren Pielka et.al. | 2411.02973 | null |
2024-11-05 | Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation | Xavier Timoneda et.al. | 2411.02969 | null |
2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
2024-11-04 | Adaptive Length Image Tokenization via Recurrent Allocation | Shivam Duggal et.al. | 2411.02393 | link |
2024-11-04 | Attacking Vision-Language Computer Agents via Pop-ups | Yanzhe Zhang et.al. | 2411.02391 | link |
2024-11-04 | Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models | Guangzhi Xiong et.al. | 2411.02382 | null |
2024-11-04 | Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI | Ramneet Kaur et.al. | 2411.02381 | null |
2024-11-04 | Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis | Neel Dey et.al. | 2411.02372 | link |
2024-11-04 | DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Yang Yue et.al. | 2411.02359 | link |
2024-11-04 | "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | Eldar Kurtic et.al. | 2411.02355 | null |
2024-11-04 | Machine learning identification of maternal inflammatory response and histologic choroamnionitis from placental membrane whole slide images | Abhishek Sharma et.al. | 2411.02354 | null |
2024-11-04 | Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences | Ruotong Wang et.al. | 2411.02353 | null |
2024-11-04 | Can Large Language Models generalize analogy solving like people can? | Claire E. Stevenson et.al. | 2411.02348 | null |
2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | link |
2024-11-04 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | Yuqi Luo et.al. | 2411.02335 | link |
2024-11-04 | Disrupting Test Development with AI Assistants | Vijay Joshi et.al. | 2411.02328 | null |
2024-11-04 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327 | link |
2024-11-04 | An Empirical Study on the Code Refactoring Capability of Large Language Models | Jonathan Cordeiro et.al. | 2411.02320 | null |
2024-11-04 | Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast | Marilyn Rego et.al. | 2411.02318 | null |
2024-11-04 | Defining and Evaluating Physical Safety for Large Language Models | Yung-Chen Tang et.al. | 2411.02317 | null |
2024-11-04 | Evaluating Creative Short Story Generation in Humans and Large Language Models | Mete Ismayilzada et.al. | 2411.02316 | link |
2024-11-04 | Taking AI Welfare Seriously | Robert Long et.al. | 2411.00986 | null |
2024-10-31 | P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation | Mohamed Elgaar et.al. | 2410.24201 | null |
2024-11-01 | SelfCodeAlign: Self-Alignment for Code Generation | Yuxiang Wei et.al. | 2410.24198 | link |
2024-10-31 | DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang et.al. | 2410.24177 | null |
2024-10-31 | Constraint Back-translation Improves Complex Instruction Following of Large Language Models | Yunjia Qi et.al. | 2410.24175 | null |
2024-10-31 | Kevin Black et.al. | 2410.24164 | null | |
2024-10-31 | GPT or BERT: why not both? | Lucas Georges Gabriel Charpentier et.al. | 2410.24159 | link |
2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148 | null |
2024-10-31 | Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing | Akash Dhruv et.al. | 2410.24119 | link |
2024-10-31 | Repository-Level Compositional Code Translation and Validation | Ali Reza Ibrahimzada et.al. | 2410.24117 | link |
2024-10-31 | Matchmaker: Self-Improving Large Language Model Programs for Schema Matching | Nabeel Seedat et.al. | 2410.24105 | null |
2024-10-31 | Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning | Nabil Omi et.al. | 2410.24096 | null |
2024-10-31 | In-Context Fine-Tuning for Time-Series Foundation Models | Abhimanyu Das et.al. | 2410.24087 | null |
2024-10-31 | Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs | Muhammed Saeed et.al. | 2410.24049 | null |
2024-10-31 | Handwriting Recognition in Historical Documents with Multimodal LLM | Lucian Li et.al. | 2410.24034 | null |
2024-10-31 | Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks | Yingzhe Peng et.al. | 2410.24032 | null |
2024-10-31 | AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Yifan Xu et.al. | 2410.24024 | link |
2024-10-31 | SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation | Liang He et.al. | 2410.24022 | null |
2024-10-31 | Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Ioannis Tsiamas et.al. | 2410.24019 | null |
2024-10-30 | ReferEverything: Towards Segmenting Everything We Can Speak of in Videos | Anurag Bagchi et.al. | 2410.23287 | null |
2024-10-30 | A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction | Qidong Yang et.al. | 2410.23272 | null |
2024-10-30 | TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Ziyao Shangguan et.al. | 2410.23266 | link |
2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
2024-10-30 | Evaluating Cultural and Social Awareness of LLM Web Agents | Haoyi Qiu et.al. | 2410.23252 | null |
2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | link |
2024-10-30 | EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning | Peide Huang et.al. | 2410.23234 | null |
2024-10-30 | COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | Yixin Liu et.al. | 2410.23223 | link |
2024-10-30 | Partial Channel Dependence with Channel Masks for Time Series Foundation Models | Seunghan Lee et.al. | 2410.23222 | null |
2024-10-30 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Zhiyong Wu et.al. | 2410.23218 | link |
2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
2024-10-30 | ProTransformer: Robustify Transformers via Plug-and-Play Paradigm | Zhichao Hou et.al. | 2410.23182 | link |
2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
2024-10-30 | TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | Haiyang Wang et.al. | 2410.23168 | link |
2024-10-30 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang et.al. | 2410.23166 | link |
2024-10-30 | FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities | Jingge Xiao et.al. | 2410.23160 | link |
2024-10-30 | VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning | Yichao Liang et.al. | 2410.23156 | null |
2024-10-30 | Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms | Jordan Meyer et.al. | 2410.23144 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332 | null |
2024-10-29 | Task Vectors are Cross-Modal | Grace Luo et.al. | 2410.22330 | null |
2024-10-29 | Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323 | null |
2024-10-29 | Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Can Chen et.al. | 2410.22318 | link |
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | Kai Wang et.al. | 2410.22317 | link |
2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | Paola Cascante-Bonilla et.al. | 2410.22315 | null |
2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et.al. | 2410.22313 | link |
2024-10-29 | GPT-4o reads the mind in the eyes | James W. A. Strachan et.al. | 2410.22309 | null |
2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
2024-10-29 | Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Yihe Deng et.al. | 2410.22304 | null |
2024-10-29 | LLMs are Highly-Constrained Biophysical Sequence Optimizers | Angelica Chen et.al. | 2410.22296 | null |
2024-10-29 | Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats | Mohammad Setak et.al. | 2410.22293 | null |
2024-10-29 | From melodic note sequences to pitches using word2vec | Daniel Defays et.al. | 2410.22285 | null |
2024-10-29 | Embedding-based classifiers can detect prompt injection attacks | Md. Ahsan Ayub et.al. | 2410.22284 | link |
2024-10-29 | Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models | Renzhe Yu et.al. | 2410.22282 | null |
2024-10-29 | Fourier Head: Helping Large Language Models Learn Complex Probability Distributions | Nate Gillman et.al. | 2410.22269 | null |
2024-10-29 | Meta-Learning Adaptable Foundation Models | Jacob L. Block et.al. | 2410.22264 | null |
2024-10-29 | FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation | Farima Fatahi Bayat et.al. | 2410.22257 | null |
2024-10-29 | Abrupt Learning in Transformers: A Case Study on Matrix Completion | Pulkit Gopalani et.al. | 2410.22244 | null |
2024-10-29 | Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | Yuxuan Chen et.al. | 2410.22240 | link |
2024-10-28 | Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Yaniv Nikankin et.al. | 2410.21272 | link |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | null |
2024-10-28 | BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference | Changwoo Lee et.al. | 2410.21262 | link |
2024-10-28 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao et.al. | 2410.21259 | link |
2024-10-28 | Multi-modal AI for comprehensive breast cancer prognostication | Jan Witowski et.al. | 2410.21256 | null |
2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242 | null |
2024-10-28 | Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | Zhantao Yang et.al. | 2410.21237 | null |
2024-10-28 | Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Weizhe Chen et.al. | 2410.21236 | null |
2024-10-28 | LoRA vs Full Fine-tuning: An Illusion of Equivalence | Reece Shuttleworth et.al. | 2410.21228 | null |
2024-10-28 | Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines | Zhixin Zhang et.al. | 2410.21220 | link |
2024-10-28 | Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations | Kaifeng Huang et.al. | 2410.21218 | null |
2024-10-28 | BongLLaMA: LLaMA for Bangla Language | Abdullah Khan Zehady et.al. | 2410.21200 | null |
2024-10-28 | Belief in the Machine: Investigating Epistemological Blind Spots of Language Models | Mirac Suzgun et.al. | 2410.21195 | link |
2024-10-29 | Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction | Qintong Zhang et.al. | 2410.21169 | null |
2024-10-28 | M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation | Jiaheng Liu et.al. | 2410.21157 | null |
2024-10-28 | Palisade -- Prompt Injection Detection Framework | Sahasra Kokkula et.al. | 2410.21146 | null |
2024-10-28 | LLM-initialized Differentiable Causal Discovery | Shiv Kampani et.al. | 2410.21141 | null |
2024-10-28 | Do LLMs generate test oracles that capture the actual or the expected program behaviour? | Michael Konstantinou et.al. | 2410.21136 | null |
2024-10-28 | Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments | Marharyta Domnich et.al. | 2410.21131 | link |
2024-10-25 | The Potential and Value of AI Chatbot in Personalized Cognitive Training | Zilong Wang et.al. | 2410.19733 | null |
2024-10-25 | Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models | Yucheng Zhou et.al. | 2410.19732 | null |
2024-10-25 | Counting Ability of Large Language Models and Impact of Tokenization | Xiang Zhang et.al. | 2410.19730 | link |
2024-10-25 | FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning | Nicole Cho et.al. | 2410.19727 | null |
2024-10-25 | 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision | Shilong Li et.al. | 2410.19720 | null |
2024-10-25 | Multi-view biomedical foundation models for molecule-target and property prediction | Parthasarathy Suryanarayanan et.al. | 2410.19704 | link |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | null |
2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697 | null |
2024-10-25 | Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs | Yifei Zhang et.al. | 2410.19694 | null |
2024-10-25 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang et.al. | 2410.19656 | null |
2024-10-25 | Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models | Shenghao Fu et.al. | 2410.19635 | null |
2024-10-25 | Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina | Yuan Gao et.al. | 2410.19599 | null |
2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | link |
2024-10-25 | ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems | Ritvik Aggarwal Ishneet Sukhvinder Singh Ibrahim Allahverdiyev et.al. | 2410.19572 | null |
2024-10-25 | GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing | Hosam Elgendy et.al. | 2410.19552 | link |
2024-10-25 | Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Antonia Wüst et.al. | 2410.19546 | link |
2024-10-25 | Brain-like Functional Organization within Large Language Models | H. Sun et.al. | 2410.19542 | null |
2024-10-25 | Detection of Human and Machine-Authored Fake News in Urdu | Muhammad Zain Ali et.al. | 2410.19517 | link |
2024-10-25 | SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models | Jahyun Koo et.al. | 2410.19503 | null |
2024-10-25 | Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization | Anthony Cui et.al. | 2410.19499 | null |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
2024-10-24 | Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques | David Ortiz-Perez et.al. | 2410.18972 | null |
2024-10-24 | ConceptDrift: Uncovering Biases through the Lens of Foundational Models | Cristian Daniel Păduraru et.al. | 2410.18970 | null |
2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | null |
2024-10-24 | Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Yujuan Fu et.al. | 2410.18966 | null |
2024-10-24 | On the Crucial Role of Initialization for Matrix Factorization | Bingcong Li et.al. | 2410.18965 | null |
2024-10-24 | OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning | Xiaoqiang Wang et.al. | 2410.18963 | null |
2024-10-24 | Context is Key: A Benchmark for Forecasting with Essential Textual Information | Andrew Robert Williams et.al. | 2410.18959 | link |
2024-10-24 | Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | Jipeng Zhang et.al. | 2410.18957 | null |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
2024-10-24 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
2024-10-24 | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Zonghao Ying et.al. | 2410.18927 | null |
2024-10-24 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | A M Muntasir Rahman et.al. | 2410.18921 | null |
2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
2024-10-24 | PRISM: A Methodology for Auditing Biases in Large Language Models | Leif Azzopardi et.al. | 2410.18906 | link |
2024-10-24 | LLMs for Extremely Low-Resource Finno-Ugric Languages | Taido Purason et.al. | 2410.18902 | null |
2024-10-24 | Creating and Repairing Robot Programs in Open-World Domains | Claire Schlesinger et.al. | 2410.18893 | null |
2024-10-24 | Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Graziano A. Manduzio et.al. | 2410.18890 | null |
2024-10-24 | Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | Omer Nahum et.al. | 2410.18889 | null |
2024-10-24 | Provably Robust Watermarks for Open-Source Language Models | Miranda Christ et.al. | 2410.18861 | null |
2024-10-23 | TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
2024-10-23 | CLEAR: Character Unlearning in Textual and Visual Modalities | Alexey Dontsov et.al. | 2410.18057 | null |
2024-10-23 | LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Qingfei Zhao et.al. | 2410.18050 | link |
2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-23 | GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | Xin Li et.al. | 2410.18032 | link |
2024-10-23 | MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting | Sungil Seok et.al. | 2410.18012 | null |
2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001 | link |
2024-10-23 | MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers | Zebin Yang et.al. | 2410.17957 | null |
2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
2024-10-23 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | Ran Xu et.al. | 2410.17952 | null |
2024-10-23 | Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling | Nirav Bhan et.al. | 2410.17950 | null |
2024-10-23 | Toward path-invariant embeddings for local distance source characterization | Lisa Linville et.al. | 2410.17937 | null |
2024-10-23 | Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models | He Cao et.al. | 2410.17922 | link |
2024-10-23 | Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Shansan Gong et.al. | 2410.17891 | link |
2024-10-23 | R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Linger Deng et.al. | 2410.17885 | link |
2024-10-23 | Lightweight Neural App Control | Filippos Christianos et.al. | 2410.17883 | null |
2024-10-23 | AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | Yehonathan Refael et.al. | 2410.17881 | null |
2024-10-23 | Understanding Layer Significance in LLM Alignment | Guangyuan Shi et.al. | 2410.17875 | null |
2024-10-23 | DataTales: A Benchmark for Real-World Intelligent Data Narration | Yajing Yang et.al. | 2410.17859 | link |
2024-10-22 | PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction | Long Xing et.al. | 2410.17247 | link |
2024-10-22 | Towards Reliable Evaluation of Behavior Steering Interventions in LLMs | Itamar Pres et.al. | 2410.17245 | null |
2024-10-22 | Frontiers in Intelligent Colonoscopy | Ge-Peng Ji et.al. | 2410.17241 | link |
2024-10-22 | Large Language Models Empowered Personalized Web Agents | Hongru Cai et.al. | 2410.17236 | null |
2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235 | link |
2024-10-22 | Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | Benedict Aaron Tjandra et.al. | 2410.17234 | null |
2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222 | null |
2024-10-22 | MiniPLM: Knowledge Distillation for Pre-Training Language Models | Yuxian Gu et.al. | 2410.17215 | link |
2024-10-22 | Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling | Azmine Toushik Wasi et.al. | 2410.17210 | link |
2024-10-22 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen et.al. | 2410.17196 | link |
2024-10-23 | Non-myopic Generation of Language Model for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | link |
2024-10-22 | Remote Timing Attacks on Efficient Language Model Inference | Nicholas Carlini et.al. | 2410.17175 | null |
2024-10-22 | From Attention to Activation: Unravelling the Enigmas of Large Language Models | Prannay Kaul et.al. | 2410.17174 | null |
2024-10-22 | Self-calibration for Language Model Quantization and Pruning | Miles Williams et.al. | 2410.17170 | null |
2024-10-22 | Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence | İlker Işık et.al. | 2410.17161 | null |
2024-10-22 | Improving Pinterest Search Relevance Using Large Language Models | Han Wang et.al. | 2410.17152 | null |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149 | null |
2024-10-22 | Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation ? | Jirat Chiaranaipanich et.al. | 2410.17145 | null |
2024-10-22 | Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | Isamu Isozaki et.al. | 2410.17141 | link |
2024-10-21 | Reflection-Bench: probing AI intelligence with reflection | Lingyu Li et.al. | 2410.16270 | link |
2024-10-21 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Shuangrui Ding et.al. | 2410.16268 | link |
2024-10-21 | xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | Michael S. Ryoo et.al. | 2410.16267 | null |
2024-10-22 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
2024-10-21 | Elucidating the design space of language models for image generation | Xuantong Liu et.al. | 2410.16257 | link |
2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
2024-10-21 | Can Knowledge Editing Really Correct Hallucinations? | Baixiang Huang et.al. | 2410.16251 | link |
2024-10-21 | Analyzing Context Contributions in LLM-based Machine Translation | Emmanouil Zaranis et.al. | 2410.16246 | null |
2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237 | null |
2024-10-21 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai et.al. | 2410.16236 | link |
2024-10-21 | ToW: Thoughts of Words Improve Reasoning in Large Language Models | Zhikun Xu et.al. | 2410.16235 | link |
2024-10-21 | Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | Ryan Li et.al. | 2410.16232 | null |
2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | link |
2024-10-21 | A Realistic Threat Model for Large Language Model Jailbreaks | Valentyn Boreiko et.al. | 2410.16222 | link |
2024-10-21 | Pre-training Distillation for Large Language Models: A Design Space Exploration | Hao Peng et.al. | 2410.16215 | null |
2024-10-21 | Comprehensive benchmarking of large language models for RNA secondary structure prediction | L. I. Zablocki et.al. | 2410.16212 | link |
2024-10-21 | CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning | Kumar Manas et.al. | 2410.16207 | null |
2024-10-21 | Improve Vision Language Model Chain-of-thought Reasoning | Ruohong Zhang et.al. | 2410.16198 | link |
2024-10-22 | LASER: Script Execution by Autonomous Agents for On-demand Traffic Simulation | Hao Gao et.al. | 2410.16197 | link |
2024-10-21 | Contamination Report for Multilingual Benchmarks | Sanchit Ahuja et.al. | 2410.16186 | null |
2024-10-18 | Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | German Gritsai et.al. | 2410.14677 | link |
2024-10-18 | SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | Qin Liu et.al. | 2410.14676 | null |
2024-10-18 | Enhancing Large Language Models' Situated Faithfulness to External Contexts | Yukun Huang et.al. | 2410.14675 | link |
2024-10-18 | Decomposing The Dark Matter of Sparse Autoencoders | Joshua Engels et.al. | 2410.14670 | link |
2024-10-18 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Baiqi Li et.al. | 2410.14669 | null |
2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
2024-10-18 | Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Zhepeng Cen et.al. | 2410.14655 | null |
2024-10-18 | EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | Oliver Sieberling et.al. | 2410.14649 | link |
2024-10-18 | Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs | Runchu Tian et.al. | 2410.14641 | link |
2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635 | link |
2024-10-18 | Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning | Yuxiang Lu et.al. | 2410.14633 | null |
2024-10-18 | On the Regularization of Learnable Embeddings for Time Series Processing | Luca Butera et.al. | 2410.14630 | null |
2024-10-18 | CELI: Controller-Embedded Language Model Interactions | Jan-Samuel Wagner et.al. | 2410.14627 | null |
2024-10-18 | DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search | Simon Lupart et.al. | 2410.14609 | null |
2024-10-18 | Teaching Models to Balance Resisting and Accepting Persuasion | Elias Stengel-Eskin et.al. | 2410.14596 | link |
2024-10-18 | Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets | Namid R. Stillman et.al. | 2410.14587 | null |
2024-10-18 | Do LLMs estimate uncertainty well in instruction-following? | Juyeon Heo et.al. | 2410.14582 | null |
2024-10-18 | Large Language Models Are Overparameterized Text Encoders | Thennal D K et.al. | 2410.14578 | null |
2024-10-18 | MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Rachel S. Y. Teo et.al. | 2410.14574 | link |
2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863 | null |
2024-10-17 | PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Rongyao Fang et.al. | 2410.13861 | link |
2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et.al. | 2410.13860 | link |
2024-10-17 | Yaxin Luo et.al. | 2410.13859 | null | |
2024-10-17 | How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | Guhao Feng et.al. | 2410.13857 | null |
2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
2024-10-17 | Retrospective Learning from Interactions | Zizhao Chen et.al. | 2410.13852 | null |
2024-10-17 | Differentiable Robot Rendering | Ruoshi Liu et.al. | 2410.13851 | null |
2024-10-17 | SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | Xuan Zhang et.al. | 2410.13846 | link |
2024-10-17 | A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | Qiaoyu Tang et.al. | 2410.13841 | null |
2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | link |
2024-10-17 | A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | Hui Yuan et.al. | 2410.13828 | link |
2024-10-17 | Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models | Mazda Moayeri et.al. | 2410.13826 | null |
2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825 | null |
2024-10-18 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
2024-10-17 | Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning | Xiaodan Xing et.al. | 2410.13823 | link |
2024-10-17 | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Mitsuhiko Nakamoto et.al. | 2410.13816 | null |
2024-10-17 | De-mark: Watermark Removal in Large Language Models | Ruibo Chen et.al. | 2410.13808 | null |
2024-10-17 | A Watermark for Order-Agnostic Language Models | Ruibo Chen et.al. | 2410.13805 | null |
2024-10-18 | BenTo: Benchmark Task Reduction with In-Context Transferability | Hongyu Zhao et.al. | 2410.13804 | link |
2024-10-16 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | Ce Zhang et.al. | 2410.12790 | link |
2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788 | link |
2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
2024-10-16 | Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information | Yingya Li et.al. | 2410.12774 | null |
2024-10-16 | Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions | Zhenyu Jiang et.al. | 2410.12773 | null |
2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | Jiří Vyskočil et.al. | 2410.12769 | null |
2024-10-16 | The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse | Ekansh Sharma et.al. | 2410.12766 | null |
2024-10-16 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples | Ajay Patel et.al. | 2410.12757 | null |
2024-10-17 | CREAM: Consistency Regularized Self-Rewarding Language Models | Zhaoyang Wang et.al. | 2410.12735 | null |
2024-10-16 | WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation | João Matos et.al. | 2410.12722 | link |
2024-10-16 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang et.al. | 2410.12707 | null |
2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | Genta Indra Winata et.al. | 2410.12705 | link |
2024-10-16 | Sarcasm Detection in a Less-Resourced Language | Lazar Đoković et.al. | 2410.12704 | link |
2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | Xingqi Wang et.al. | 2410.12700 | link |
2024-10-16 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Lingxiao Luo et.al. | 2410.12694 | link |
2024-10-16 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi et.al. | 2410.12686 | null |
2024-10-16 | 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Dewei Zhou et.al. | 2410.12669 | link |
2024-10-16 | Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models | Shicheng Xu et.al. | 2410.12662 | null |
2024-10-16 | Evaluating Morphological Compositional Generalization in Large Language Models | Mete Ismayilzada et.al. | 2410.12656 | null |
2024-10-16 | Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals | Orchid Chetia Phukan et.al. | 2410.12645 | null |
2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | link |
2024-10-15 | A Hitchhiker's Guide to Scaling Law Estimation | Leshem Choshen et.al. | 2410.11840 | link |
2024-10-15 | MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Yue Cao et.al. | 2410.11829 | link |
2024-10-15 | Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws | Yiding Jiang et.al. | 2410.11820 | link |
2024-10-15 | Improving Long-Text Alignment for Text-to-Image Diffusion Models | Luping Liu et.al. | 2410.11817 | link |
2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
2024-10-15 | NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | Han Han et.al. | 2410.11805 | link |
2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802 | null |
2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786 | null |
2024-10-15 | Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty | Joey Wilson et.al. | 2410.11783 | link |
2024-10-15 | G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Guibin Zhang et.al. | 2410.11782 | null |
2024-10-15 | Language Models Encode Numbers Using Digit Representations in Base 10 | Amit Arnold Levy et.al. | 2410.11781 | link |
2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
2024-10-15 | Time-Series Foundation Model for Value-at-Risk | Anubha Goel et.al. | 2410.11773 | link |
2024-10-15 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao et.al. | 2410.11772 | link |
2024-10-15 | SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Ying Chen et.al. | 2410.11761 | null |
2024-10-15 | Latent Action Pretraining from Videos | Seonghyeon Ye et.al. | 2410.11758 | null |
2024-10-15 | Personas with Attitudes: Controlling LLMs for Diverse Data Annotation | Leon Fröhling et.al. | 2410.11745 | link |
2024-10-15 | DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | Yunfan Xiong et.al. | 2410.11744 | null |
2024-10-15 | Light-Weight Fault Tolerant Attention for Large Language Model Training | Yuhang Liang et.al. | 2410.11720 | null |
2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
2024-10-14 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
2024-10-14 | LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Di Wu et.al. | 2410.10813 | link |
2024-10-14 | Local and Global Decoding in Text Generation | Daniel Gareev et.al. | 2410.10810 | link |
2024-10-14 | Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | Aakanksha et.al. | 2410.10801 | null |
2024-10-14 | Towards Foundation Models for 3D Vision: How Close Are We? | Yiming Zuo et.al. | 2410.10799 | link |
2024-10-15 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | Jian Yang et.al. | 2410.10798 | null |
2024-10-14 | Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance | Sachin Goyal et.al. | 2410.10796 | link |
2024-10-15 | LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Nimrod Shabtay et.al. | 2410.10783 | link |
2024-10-14 | When Attention Sink Emerges in Language Models: An Empirical View | Xiangming Gu et.al. | 2410.10781 | link |
2024-10-14 | Focused ReAct: Improving ReAct through Reiterate and Early Stop | Shuoqiu Li et.al. | 2410.10779 | null |
2024-10-14 | AFlow: Automating Agentic Workflow Generation | Jiayi Zhang et.al. | 2410.10762 | link |
2024-10-14 | Denial-of-Service Poisoning Attacks against Large Language Models | Kuofeng Gao et.al. | 2410.10760 | link |
2024-10-14 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et.al. | 2410.10759 | null |
2024-10-14 | Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification | Jan Cegin et.al. | 2410.10756 | link |
2024-10-14 | NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models | Yanbiao Ji et.al. | 2410.10743 | null |
2024-10-14 | SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Pengrui Quan et.al. | 2410.10741 | link |
2024-10-14 | Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs | Ishan Jindal et.al. | 2410.10739 | null |
2024-10-14 | Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | Kuofeng Gao et.al. | 2410.10735 | null |
2024-10-14 | Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection | Giorgos Iacovides et.al. | 2410.10728 | null |
2024-10-11 | Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models | Qin Liu et.al. | 2410.09047 | null |
2024-10-11 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang et.al. | 2410.09040 | link |
2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
2024-10-11 | SimpleStrat: Diversifying Language Model Generation with Stratification | Justin Wong et.al. | 2410.09038 | null |
2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
2024-10-11 | PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | Xiangyu Yin et.al. | 2410.09034 | link |
2024-10-11 | MedMobile: A mobile-sized language model with expert-level clinical capabilities | Krithik Vishwanath et.al. | 2410.09019 | link |
2024-10-11 | Parameter-Efficient Fine-Tuning of State Space Models | Kevin Galim et.al. | 2410.09016 | link |
2024-10-11 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals | Xiaofeng Wu et.al. | 2410.09013 | null |
2024-10-11 | Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models | Hao Li et.al. | 2410.09012 | link |
2024-10-11 | SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Ling Yang et.al. | 2410.09008 | link |
2024-10-11 | From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts | Zhuohao Jerry Zhang et.al. | 2410.09006 | null |
2024-10-11 | DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Haochen Li et.al. | 2410.09004 | null |
2024-10-11 | Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference | Grace Proebsting et.al. | 2410.08996 | null |
2024-10-11 | The structure of the token space for large language models | Michael Robinson et.al. | 2410.08993 | null |
2024-10-11 | Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory | Rebecca M. M. Hicke et.al. | 2410.08991 | link |
2024-10-11 | SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning | Ziming Yu et.al. | 2410.08989 | link |
2024-10-11 | Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective | Bo Ni et.al. | 2410.08985 | null |
2024-10-11 | NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Zheng Yi Ho et.al. | 2410.08970 | null |
2024-10-11 | Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements | Jingyu Zhang et.al. | 2410.08968 | null |
2024-10-10 | DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models | Xiaoxiao He et.al. | 2410.08207 | null |
2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
2024-10-10 | Adam Exploits |
Shuo Xie et.al. | 2410.08198 | link |
2024-10-10 | From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Changle Qu et.al. | 2410.08197 | link |
2024-10-10 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Zimu Lu et.al. | 2410.08196 | link |
2024-10-10 | Features are fate: a theory of transfer learning in high-dimensional regression | Javan Tahir et.al. | 2410.08194 | null |
2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
2024-10-10 | MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models | Wenbo Hu et.al. | 2410.08182 | null |
2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174 | null |
2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172 | null |
2024-10-10 | Visual Scratchpads: Enabling Global Reasoning in Vision | Aryo Lotfi et.al. | 2410.08165 | null |
2024-10-10 | Agent S: An Open Agentic Framework that Uses Computers Like a Human | Saaket Agashe et.al. | 2410.08164 | link |
2024-10-10 | The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading | Keren Gruteke Klein et.al. | 2410.08162 | link |
2024-10-10 | DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation | Jiatao Gu et.al. | 2410.08159 | null |
2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | link |
2024-10-10 | DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | Yutong Wang et.al. | 2410.08143 | link |
2024-10-10 | Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction | Jarrid Rector-Brooks et.al. | 2410.08134 | null |
2024-10-10 | Think Beyond Size: Dynamic Prompting for More Effective Reasoning | Kamesh R et.al. | 2410.08130 | null |
2024-10-10 | Mars: Situated Inductive Reasoning in an Open-World Environment | Xiaojuan Tang et.al. | 2410.08126 | null |
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | Hanrong Ye et.al. | 2410.07177 | null |
2024-10-09 | Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | Fei Wang et.al. | 2410.07176 | null |
2024-10-09 | Do better language models have crisper vision? | Jona Ruthardt et.al. | 2410.07173 | null |
2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | link |
2024-10-09 | Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Cheol Jun Cho et.al. | 2410.07168 | link |
2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Qidong Huang et.al. | 2410.07167 | link |
2024-10-09 | Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Manling Li et.al. | 2410.07166 | link |
2024-10-09 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan et.al. | 2410.07163 | link |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
2024-10-09 | Towards Interpreting Visual Information Processing in Vision-Language Models | Clement Neo et.al. | 2410.07149 | link |
2024-10-09 | Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling | Yingfa Chen et.al. | 2410.07145 | null |
2024-10-09 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Xiaosen Zheng et.al. | 2410.07137 | link |
2024-10-10 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Rui Zhao et.al. | 2410.07133 | link |
2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129 | null |
2024-10-09 | Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy | Tagore Rao Kosireddy et.al. | 2410.07118 | link |
2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | link |
2024-10-09 | VHELM: A Holistic Evaluation of Vision Language Models | Tony Lee et.al. | 2410.07112 | link |
2024-10-09 | I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | Gian Maria Campedelli et.al. | 2410.07109 | link |
2024-10-09 | Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context | Sangwon Yu et.al. | 2410.07103 | null |
2024-10-09 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Jun Shern Chan et.al. | 2410.07095 | link |
2024-10-07 | Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia | Mohammad Fahes et.al. | 2410.05270 | link |
2024-10-07 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang et.al. | 2410.05269 | null |
2024-10-07 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen et.al. | 2410.05265 | link |
2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
2024-10-07 | TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Ya-Qi Yu et.al. | 2410.05261 | null |
2024-10-07 | Differential Transformer | Tianzhu Ye et.al. | 2410.05258 | link |
2024-10-07 | GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | Eilam Shapira et.al. | 2410.05254 | link |
2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252 | null |
2024-10-07 | SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | Yuxin Xiao et.al. | 2410.05248 | null |
2024-10-07 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Boyu Gou et.al. | 2410.05243 | link |
2024-10-08 | TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Rabin Adhikari et.al. | 2410.05239 | link |
2024-10-07 | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Iman Mirzadeh et.al. | 2410.05229 | null |
2024-10-07 | Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates | Avanika Narayan et.al. | 2410.05224 | null |
2024-10-07 | Precise Model Benchmarking with Only a Few Observations | Riccardo Fogliato et.al. | 2410.05222 | null |
2024-10-07 | Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Toni J. B. Liu et.al. | 2410.05218 | null |
2024-10-07 | Organizing Unstructured Image Collections using Natural Language | Mingxuan Liu et.al. | 2410.05217 | null |
2024-10-07 | Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality | Youngtaek Oh et.al. | 2410.05210 | link |
2024-10-07 | RevisEval: Improving LLM-as-a-Judge via Response-Adapted References | Qiyuan Zhang et.al. | 2410.05193 | null |
2024-10-07 | Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective | Kaiyue Wen et.al. | 2410.05192 | null |
2024-10-07 | LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | Zhijie Wang et.al. | 2410.05191 | null |
2024-10-04 | Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models | Zhuochun Li et.al. | 2410.03663 | null |
2024-10-04 | Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models | Tinghui Zhu et.al. | 2410.03659 | link |
2024-10-04 | RAFT: Realistic Attacks to Fool Text Detectors | James Wang et.al. | 2410.03658 | link |
2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
2024-10-04 | Conditional Enzyme Generation Using Protein Language Models with Adapters | Jason Yang et.al. | 2410.03634 | null |
2024-10-04 | Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation | Jie Xiao et.al. | 2410.03613 | null |
2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
2024-10-04 | LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos | Noriaki Hirose et.al. | 2410.03603 | null |
2024-10-04 | Efficiently Identifying Watermarked Segments in Mixed-Source Texts | Xuandong Zhao et.al. | 2410.03600 | null |
2024-10-04 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View | Lijie Hu et.al. | 2410.03595 | null |
2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | link |
2024-10-04 | Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | Abrar Rahman et.al. | 2410.03568 | null |
2024-10-04 | Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding | Wei Wu et.al. | 2410.03553 | null |
2024-10-04 | Re-examining Sexism and Misogyny Classification with Annotator Attitudes | Aiqi Jiang et.al. | 2410.03543 | null |
2024-10-04 | No Need to Talk: Asynchronous Mixture of Language Models | Anastasiia Filippova et.al. | 2410.03529 | null |
2024-10-04 | Steering Large Language Models between Code Execution and Textual Reasoning | Yongchao Chen et.al. | 2410.03524 | null |
2024-10-04 | A Probabilistic Perspective on Unlearning and Alignment for Large Language Models | Yan Scholten et.al. | 2410.03523 | null |
2024-10-04 | CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios | Zetian Ouyang et.al. | 2410.03502 | link |
2024-10-04 | FedStein: Enhancing Multi-Domain Federated Learning Through James-Stein Estimator | Sunny Gupta et.al. | 2410.03499 | link |
2024-10-04 | Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores | Robert E. Blackwell et.al. | 2410.03492 | null |
2024-10-03 | Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Nick Jiang et.al. | 2410.02762 | link |
2024-10-03 | FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models | Zhipei Xu et.al. | 2410.02761 | link |
2024-10-03 | Erasing Conceptual Knowledge from Language Models | Rohit Gandikota et.al. | 2410.02760 | link |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
2024-10-03 | SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost | Jifan Zhang et.al. | 2410.02755 | null |
2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749 | link |
2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | link |
2024-10-03 | Contrastive Localized Language-Image Pre-Training | Hong-You Chen et.al. | 2410.02746 | null |
2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | link |
2024-10-03 | Grounding Large Language Models In Embodied Environment With Imperfect World Models | Haolan Liu et.al. | 2410.02742 | null |
2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | link |
2024-10-03 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Zhengfeng Lai et.al. | 2410.02740 | null |
2024-10-04 | Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge | Jiayi Ye et.al. | 2410.02736 | null |
2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
2024-10-03 | Unified Multi-Modal Interleaved Document Representation for Information Retrieval | Jaewoo Lee et.al. | 2410.02729 | null |
2024-10-03 | Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | Rohin Manvi et.al. | 2410.02725 | null |
2024-10-03 | Large Language Models as Markov Chains | Oussama Zekri et.al. | 2410.02724 | null |
2024-10-03 | Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization | Ryan C. Barron et.al. | 2410.02721 | null |
2024-10-03 | UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation | Zixuan Li et.al. | 2410.02719 | null |
2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
2024-10-02 | Efficient |
Alex W. Neal Riasanovsky et.al. | 2410.01799 | null |
2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
2024-10-02 | When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | R. Thomas McCoy et.al. | 2410.01792 | null |
2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
2024-10-02 | OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Heng Yang et.al. | 2410.01784 | link |
2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
2024-10-03 | Quantifying Generalization Complexity for Large Language Models | Zhenting Qi et.al. | 2410.01769 | link |
2024-10-02 | Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes | Hossein Sholehrasa et.al. | 2410.01755 | null |
2024-10-03 | Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Mengzhao Jia et.al. | 2410.01744 | link |
2024-10-02 | VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models | Kailai Feng et.al. | 2410.01738 | link |
2024-10-02 | Visual Perception in Text Strings | Qi Jia et.al. | 2410.01733 | link |
2024-10-02 | Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing | Yilmazcan Ozyurt et.al. | 2410.01727 | link |
2024-10-02 | Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting | Longyu Feng et.al. | 2410.01724 | null |
2024-10-02 | Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Zeyu Gan et.al. | 2410.01720 | link |
2024-10-02 | Examining the Role of Relationship Alignment in Large Language Models | Kristen M. Altenburger et.al. | 2410.01708 | null |
2024-10-02 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
2024-10-02 | An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Soham Govande et.al. | 2410.01704 | link |
2024-10-02 | CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs | Kangsheng Wang et.al. | 2410.01696 | null |
2024-10-02 | U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models | Tung-Yu Wu et.al. | 2410.01692 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
2024-09-30 | LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner | Xiaopan Zhang et.al. | 2409.20560 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557 | null |
2024-09-30 | UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | Qiaojun Yu et.al. | 2409.20551 | null |
2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | null |
2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548 | null |
2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
2024-09-30 | A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media | Dung Ha Nguyen et.al. | 2409.20467 | null |
2024-09-30 | Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments | Mohamed Elnoor et.al. | 2409.20445 | null |
2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441 | null |
2024-09-30 | HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding | Fan Yuan et.al. | 2409.20429 | link |
2024-09-30 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Jiacong Wang et.al. | 2409.20424 | link |
2024-09-30 | Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing | Connor Baumler et.al. | 2409.20390 | null |
2024-09-30 | Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation | Shan Chen et.al. | 2409.20385 | null |
2024-09-30 | Word-wise intonation model for cross-language TTS systems | Tomilov A. A. et.al. | 2409.20374 | null |
2024-09-30 | The Perfect Blend: Redefining RLHF with Mixture of Judges | Tengyu Xu et.al. | 2409.20370 | null |
2024-09-30 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao et.al. | 2409.20365 | link |
2024-09-30 | Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models | Yizhou Huang et.al. | 2409.20364 | null |
2024-09-30 | Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference | Ke Yi et.al. | 2409.20361 | null |
2024-09-27 | Exploring Token Pruning in Vision State Space Models | Zheng Zhan et.al. | 2409.18962 | null |
2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957 | link |
2024-09-27 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li et.al. | 2409.18943 | link |
2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938 | link |
2024-09-27 | Social Media Bot Policies: Evaluating Passive and Active Enforcement | Kristina Radivojevic et.al. | 2409.18931 | null |
2024-09-27 | AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Huizi Yu et.al. | 2409.18924 | null |
2024-09-27 | Soft Measures for Extracting Causal Collective Intelligence | Maryam Berijanian et.al. | 2409.18911 | link |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-27 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | link |
2024-09-27 | Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models | Zehan Li et.al. | 2409.18878 | null |
2024-09-27 | Predicting and analyzing memorization within fine-tuned Large Language Models | Jérémie Dentan et.al. | 2409.18858 | null |
2024-09-27 | Mitigating Selection Bias with Node Pruning and Auxiliary Options | Hyeong Kyu Choi et.al. | 2409.18857 | null |
2024-09-27 | LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis | Hamed Babaei Giglou et.al. | 2409.18812 | link |
2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794 | null |
2024-09-27 | A Survey on the Honesty of Large Language Models | Siheng Li et.al. | 2409.18786 | link |
2024-09-27 | Enhancing Explainability in Multimodal Large Language Models Using Ontological Context | Jihen Amara et.al. | 2409.18753 | null |
2024-09-27 | OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph | Yujie Tang et.al. | 2409.18743 | null |
2024-09-27 | Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs | Gleb Mezentsev et.al. | 2409.18721 | link |
2024-09-27 | Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity | Sergey Berezin et.al. | 2409.18708 | link |
2024-09-27 | Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models | Yiming Chen et.al. | 2409.18680 | link |
2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127 | null |
2024-09-26 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Jing He et.al. | 2409.18124 | null |
2024-09-26 | Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography | Yuexi Du et.al. | 2409.18119 | null |
2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111 | link |
2024-09-26 | Open-World Evaluation for Retrieving Diverse Perspectives | Hung-Ting Chen et.al. | 2409.18110 | null |
2024-09-26 | MALPOLON: A Framework for Deep Species Distribution Modeling | Theo Larcher et.al. | 2409.18102 | link |
2024-09-26 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
2024-09-26 | Infer Human's Intentions Before Following Natural Language Instructions | Yanming Wan et.al. | 2409.18073 | link |
2024-09-26 | Infering Alt-text For UI Icons With Large Language Models During App Development | Sabrina Haque et.al. | 2409.18060 | null |
2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053 | link |
2024-09-26 | EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | Kai Chen et.al. | 2409.18042 | null |
2024-09-26 | Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective | Yotam Wolf et.al. | 2409.18028 | null |
2024-09-26 | An Adversarial Perspective on Machine Unlearning for AI Safety | Jakub Łucki et.al. | 2409.18025 | link |
2024-09-26 | DARE: Diverse Visual Question Answering with Robustness Evaluation | Hannah Sterz et.al. | 2409.18023 | null |
2024-09-26 | Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles | Lewei He et.al. | 2409.18014 | null |
2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
2024-09-26 | Multilingual Evaluation of Long Context Retrieval and Reasoning | Ameeta Agrawal et.al. | 2409.18006 | link |
2024-09-26 | Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation | Ashmi Banerjee et.al. | 2409.18003 | null |
2024-09-26 | Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models | Georg Ahnert et.al. | 2409.17990 | link |
2024-09-26 | LLM4Brain: Training a Large Language Model for Brain Video Understanding | Ruizhe Zheng et.al. | 2409.17987 | null |
2024-09-25 | Attention Prompting on Image for Large Vision-Language Models | Runpeng Yu et.al. | 2409.17143 | link |
2024-09-25 | FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Fazal Mittu et.al. | 2409.17141 | link |
2024-09-25 | Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | Junting Lu et.al. | 2409.17140 | null |
2024-09-25 | Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset | Andrew Goldberg et.al. | 2409.17126 | null |
2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
2024-09-25 | Unveiling Ontological Commitment in Multi-Modal Foundation Models | Mert Keser et.al. | 2409.17109 | null |
2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
2024-09-25 | Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Bowen Zhao et.al. | 2409.17080 | link |
2024-09-25 | VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | Yifei Liu et.al. | 2409.17066 | link |
2024-09-25 | Benchmarking Domain Generalization Algorithms in Computational Pathology | Neda Zamanitajeddin et.al. | 2409.17063 | link |
2024-09-25 | Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia | Azmul Asmar Irfan et.al. | 2409.17054 | null |
2024-09-25 | GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design | Phillip Mueller et.al. | 2409.17045 | null |
2024-09-25 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Francesco Verdini et.al. | 2409.17044 | null |
2024-09-25 | Counterfactual Token Generation in Large Language Models | Ivi Chatzi et.al. | 2409.17027 | link |
2024-09-25 | LLM-CARD: Towards a Description and Landscape of Large Language Models | Shengwei Tian et.al. | 2409.17011 | link |
2024-09-25 | Models Can and Should Embrace the Communicative Nature of Human-Generated Math | Sasha Boguraev et.al. | 2409.17005 | null |
2024-09-26 | INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Shimao Chen et.al. | 2409.16997 | link |
2024-09-25 | Harnessing Diversity for Important Data Selection in Pretraining Large Language Models | Chi Zhang et.al. | 2409.16986 | null |
2024-09-25 | AXCEL: Automated eXplainable Consistency Evaluation using LLMs | P Aditya Sreekar et.al. | 2409.16984 | null |
2024-09-25 | Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions | Zeyneb N. Kaya et.al. | 2409.16974 | null |
2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278 | null |
2024-09-24 | LLM Echo Chamber: personalized and automated disinformation | Tony Ma et.al. | 2409.16241 | link |
2024-09-24 | EuroLLM: Multilingual Language Models for Europe | Pedro Henrique Martins et.al. | 2409.16235 | null |
2024-09-24 | Fine-Tuning is Fine, if Calibrated | Zheda Mai et.al. | 2409.16223 | link |
2024-09-24 | Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models | Omar Mussa et.al. | 2409.16220 | link |
2024-09-24 | LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM | Boyan Li et.al. | 2409.16209 | null |
2024-09-25 | CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data | Qian-Wen Zhang et.al. | 2409.16202 | link |
2024-09-24 | Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking | Jun Bai et.al. | 2409.16198 | link |
2024-09-24 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que et.al. | 2409.16191 | link |
2024-09-24 | Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation | Xiaohong Liu et.al. | 2409.16183 | null |
2024-09-24 | SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image | Dimitrije Antić et.al. | 2409.16178 | null |
2024-09-24 | Cyber Knowledge Completion Using Large Language Models | Braden K Webb et.al. | 2409.16176 | null |
2024-09-24 | Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | Ziyu Zhao et.al. | 2409.16167 | null |
2024-09-24 | EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges | Talor Abramovich et.al. | 2409.16165 | link |
2024-09-24 | ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Emanuele Vivoli et.al. | 2409.16159 | link |
2024-09-24 | Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Lu Chen et.al. | 2409.16146 | link |
2024-09-24 | Evaluation of state-of-the-art ASR Models in Child-Adult Interactions | Aditya Ashvin et.al. | 2409.16135 | null |
2024-09-24 | MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | Ming Zhu et.al. | 2409.16120 | link |
2024-09-25 | Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Pin-Jui Ku et.al. | 2409.16117 | link |
2024-09-24 | Exploring Hint Generation Approaches in Open-Domain Question Answering | Jamshid Mozafari et.al. | 2409.16096 | link |
2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194 | null |
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191 | link |
2024-09-18 | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Zayne Sprague et.al. | 2409.12183 | link |
2024-09-23 | A Controlled Study on Long Context Extension and Generalization in LLMs | Yi Lu et.al. | 2409.12181 | link |
2024-09-18 | Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Arslan Chaudhry et.al. | 2409.12180 | null |
2024-09-18 | Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Najmeh Forouzandehmehr et.al. | 2409.12150 | null |
2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
2024-09-18 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | null |
2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139 | null |
2024-09-18 | GRIN: GRadient-INformed MoE | Liyuan Liu et.al. | 2409.12136 | null |
2024-09-18 | Linguini: A benchmark for language-agnostic linguistic reasoning | Eduardo Sánchez et.al. | 2409.12126 | link |
2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
2024-09-18 | Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference | Edresson Casanova et.al. | 2409.12117 | null |
2024-09-18 | Measuring Human and AI Values based on Generative Psychometrics with Large Language Models | Haoran Ye et.al. | 2409.12106 | link |
2024-09-19 | Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval | Warren Jouanneau et.al. | 2409.12097 | null |
2024-09-19 | The Impact of Element Ordering on LM Agent Performance | Wayne Chi et.al. | 2409.12089 | link |
2024-09-18 | Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking | Ningyuan Xi et.al. | 2409.12059 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-18 | All-in-one foundational models learning across quantum chemical levels | Yuxinxin Chen et.al. | 2409.12015 | link |
2024-09-18 | Mixture of Prompt Learning for Vision Language Models | Yu Du et.al. | 2409.12011 | null |
2024-09-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
2024-09-17 | NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai et.al. | 2409.11402 | null |
2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390 | null |
2024-09-17 | Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Simon Yu et.al. | 2409.11378 | link |
2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376 | null |
2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E- Jannat et.al. | 2409.11375 | null |
2024-09-17 | Learning Spatially-Aware Language and Audio Embedding | Bhavika Devnani et.al. | 2409.11369 | null |
2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
2024-09-17 | CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | Zachary S. Siegel et.al. | 2409.11363 | link |
2024-09-17 | AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | Dhruv Agarwal et.al. | 2409.11360 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | link |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-17 | SOAP: Improving and Stabilizing Shampoo using Adam | Nikhil Vyas et.al. | 2409.11321 | link |
2024-09-17 | Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models | Divij Gupta et.al. | 2409.11302 | null |
2024-09-17 | Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Marcel Lamott et.al. | 2409.11282 | null |
2024-09-17 | P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task | Weiye Xu et.al. | 2409.11279 | null |
2024-09-17 | Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | Maria Rigaki et.al. | 2409.11276 | null |
2024-09-17 | Task Arithmetic for Language Expansion in Speech Translation | Yao-Fei Cheng et.al. | 2409.11274 | null |
2024-09-17 | LOLA -- An Open-Source Massively Multilingual Large Language Model | Nikit Srivastava et.al. | 2409.11272 | link |
2024-09-17 | Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models | Jiahao Qin et.al. | 2409.11263 | null |
2024-09-16 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
2024-09-16 | Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | Momoko Shiraishi et.al. | 2409.10506 | null |
2024-09-16 | DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction | John Wu et.al. | 2409.10504 | null |
2024-09-16 | Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | Kulin Shah et.al. | 2409.10502 | link |
2024-09-16 | Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | Shaznin Sultana et.al. | 2409.10490 | null |
2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et.al. | 2409.10488 | null |
2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et.al. | 2409.10484 | null |
2024-09-16 | Schrodinger's Memory: Large Language Models | Wei Wang et.al. | 2409.10482 | null |
2024-09-16 | Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face | Adekunle Ajibode et.al. | 2409.10472 | null |
2024-09-16 | LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | Jicong Ao et.al. | 2409.10444 | link |
2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441 | null |
2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419 | null |
2024-09-16 | A Large-Scale Privacy Assessment of Android Third-Party SDKs | Mark Huasong Meng et.al. | 2409.10411 | null |
2024-09-16 | A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration | Zhang Zheng et.al. | 2409.10403 | null |
2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
2024-09-16 | Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation | Tianrui Song et.al. | 2409.10343 | null |
2024-09-16 | The 20 questions game to distinguish large language models | Gurvan Richardeau et.al. | 2409.10338 | null |
2024-09-16 | MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation | Shanshan Wang et.al. | 2409.10294 | null |
2024-09-16 | ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | Jiahao Yuan et.al. | 2409.10289 | link |
2024-09-16 | ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code | Jia Feng et.al. | 2409.10280 | link |
2024-09-13 | Agents in Software Engineering: Survey, Landscape, and Vision | Yanxian Huang et.al. | 2409.09030 | link |
2024-09-13 | Contri(e)ve: Context + Retrieve for Scholarly Question Answering | Kanchan Shivashankar et.al. | 2409.09010 | null |
2024-09-13 | Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | Lucio La Cava et.al. | 2409.08963 | null |
2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
2024-09-13 | SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records | Paloma Rabaey et.al. | 2409.08936 | link |
2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931 | null |
2024-09-13 | Affective Computing Has Changed: The Foundation Model Disruption | Björn Schuller et.al. | 2409.08907 | null |
2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | link |
2024-09-13 | A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research | Martin Obschonka et.al. | 2409.08890 | null |
2024-09-13 | Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Xuchen Li et.al. | 2409.08887 | null |
2024-09-13 | Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies | Zhiqiang Zhong et.al. | 2409.08864 | null |
2024-09-13 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu et.al. | 2409.08846 | null |
2024-09-13 | AIPO: Improving Training Objective for Iterative Preference Optimization | Yaojie Shen et.al. | 2409.08845 | link |
2024-09-13 | A RAG Approach for Generating Competency Questions in Ontology Engineering | Xueli Pan et.al. | 2409.08820 | null |
2024-09-13 | Your Weak LLM is Secretly a Strong Teacher for Alignment | Leitian Tao et.al. | 2409.08813 | null |
2024-09-13 | Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task | Shao Zhang et.al. | 2409.08811 | null |
2024-09-13 | LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Huan Zhang et.al. | 2409.08795 | link |
2024-09-13 | Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes | Luis Rita et.al. | 2409.08792 | null |
2024-09-13 | Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling | Jialu Tang et.al. | 2409.08788 | null |
2024-09-13 | Uncertainty and Generalizability in Foundation Models for Earth Observation | Raul Ramos-Pollan et.al. | 2409.08744 | null |
2024-09-12 | Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | Rogerio Bonatti et.al. | 2409.08264 | link |
2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250 | null |
2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
2024-09-12 | LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Hakan T. Otal et.al. | 2409.08234 | link |
2024-09-12 | Adaptive Language-Guided Abstraction from Contrastive Explanations | Andi Peng et.al. | 2409.08212 | null |
2024-09-12 | ComAlign: Compositional Alignment in Vision-Language Models | Ali Abdollah et.al. | 2409.08206 | null |
2024-09-12 | What Makes a Maze Look Like a Maze? | Joy Hsu et.al. | 2409.08202 | null |
2024-09-12 | AudioBERT: Audio Knowledge Augmented Language Model | Hyunjong Ok et.al. | 2409.08199 | link |
2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185 | link |
2024-09-12 | On the Role of Context in Reading Time Prediction | Andreas Opedal et.al. | 2409.08160 | link |
2024-09-12 | Faster Speech-LLaMA Inference with Multi-token Prediction | Desh Raj et.al. | 2409.08148 | null |
2024-09-12 | LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models | Zhengliang Liu et.al. | 2409.08147 | null |
2024-09-12 | Towards a graph-based foundation model for network traffic analysis | Louis Van Langendonck et.al. | 2409.08111 | null |
2024-09-12 | The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language | Michael Ong et.al. | 2409.08103 | null |
2024-09-12 | The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Huiyuan Xie et.al. | 2409.08098 | null |
2024-09-12 | Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks | Benji Peng et.al. | 2409.08087 | null |
2024-09-12 | SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality | Chenyang Lei et.al. | 2409.08083 | link |
2024-09-12 | SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing | An Guo et.al. | 2409.08081 | link |
2024-09-12 | TravelAgent: An AI Assistant for Personalized Travel Planning | Aili Chen et.al. | 2409.08069 | null |
2024-09-12 | An Evaluation Framework for Attributed Information Retrieval using Large Language Models | Hanane Djeddal et.al. | 2409.08014 | link |
2024-09-11 | "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays | Shengxin Hong et.al. | 2409.07453 | null |
2024-09-11 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | Sijie Zhao et.al. | 2409.07447 | null |
2024-09-11 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | Ben Bogin et.al. | 2409.07440 | link |
2024-09-11 | A Suite for Acoustic Language Model Evaluation | Gallil Maimon et.al. | 2409.07437 | link |
2024-09-11 | Synthetic continued pretraining | Zitong Yang et.al. | 2409.07431 | link |
2024-09-11 | Agent Workflow Memory | Zora Zhiruo Wang et.al. | 2409.07429 | link |
2024-09-11 | CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification | Zeqing Qin et.al. | 2409.07407 | null |
2024-09-11 | AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | Han Wang et.al. | 2409.07394 | link |
2024-09-11 | Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination | Daniel Zhang-Li et.al. | 2409.07372 | null |
2024-09-11 | Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code | Khiem Ton et.al. | 2409.07368 | null |
2024-09-11 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |
2024-09-11 | Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | Md Zarif Hossain et.al. | 2409.07353 | link |
2024-09-11 | Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization | Mehrdad Zakershahrak et.al. | 2409.07335 | null |
2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331 | null |
2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
2024-09-11 | Exploring User-level Gradient Inversion with a Diffusion Prior | Zhuohang Li et.al. | 2409.07291 | null |
2024-09-11 | STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM | Qijiong Liu et.al. | 2409.07276 | null |
2024-09-11 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Enming Zhang et.al. | 2409.07267 | link |
2024-09-11 | Alignment of Diffusion Models: Fundamentals, Challenges, and Future | Buhua Liu et.al. | 2409.07253 | link |
2024-09-11 | PiTe: Pixel-Temporal Alignment for Large Video-Language Model | Yang Liu et.al. | 2409.07239 | link |
2024-09-10 | Benchmarking Sub-Genre Classification For Mainstage Dance Music | Hongzhi Shu et.al. | 2409.06690 | null |
2024-09-10 | E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | Zihan Liao et.al. | 2409.06679 | null |
2024-09-10 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang et.al. | 2409.06666 | link |
2024-09-10 | Human Perception of LLM-generated Text Content in Social Media Environments | Kristina Radivojevic et.al. | 2409.06653 | null |
2024-09-10 | Optimal Workload Placement on Multi-Instance GPUs | Bekir Turkkan et.al. | 2409.06646 | null |
2024-09-10 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Danli Shi et.al. | 2409.06644 | null |
2024-09-11 | Segmenting sea ice floes in close-range optical imagery with active contour and foundation models | Giulio Passerotti et.al. | 2409.06641 | null |
2024-09-10 | TeXBLEU: Automatic Metric for Evaluate LaTeX Format | Kyudan Jung et.al. | 2409.06639 | link |
2024-09-10 | MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders | Wenyu Zhang et.al. | 2409.06635 | null |
2024-09-10 | A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio | Ningyuan Xi et.al. | 2409.06624 | null |
2024-09-10 | Exploring Italian sentence embeddings properties through multi-tasking | Vivi Nastase et.al. | 2409.06622 | link |
2024-09-10 | Alleviating Hallucinations in Large Language Models with Scepticism Modeling | Yetao Wu et.al. | 2409.06601 | null |
2024-09-10 | GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | Sacha Muller et.al. | 2409.06595 | link |
2024-09-10 | Quantifying and Enabling the Interpretability of CLIP-like Models | Avinash Madasu et.al. | 2409.06579 | null |
2024-09-10 | Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Vivi Nastase et.al. | 2409.06567 | null |
2024-09-10 | MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science | Mahdieh Aliazam et.al. | 2409.06558 | null |
2024-09-10 | Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games | Juhwan Choi et.al. | 2409.06518 | link |
2024-09-10 | Aligning Machine and Human Visual Representations across Abstraction Levels | Lukas Muttenthaler et.al. | 2409.06509 | null |
2024-09-10 | Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding | Xiaoyu Liang et.al. | 2409.06485 | null |
2024-09-10 | Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Qiujing Lu et.al. | 2409.06450 | null |
2024-09-09 | MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Run Luo et.al. | 2409.05840 | null |
2024-09-09 | Are Large Language Models a Threat to Programming Platforms? An Exploratory Study | Md Mustakim Billah et.al. | 2409.05824 | null |
2024-09-09 | VFA: Vision Frequency Analysis of Foundation Models and Human | Mohammad-Javad Darvishi-Bayazi et.al. | 2409.05817 | null |
2024-09-09 | Improving Pretraining Data Using Perplexity Correlations | Tristan Thrush et.al. | 2409.05816 | null |
2024-09-09 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu et.al. | 2409.05806 | link |
2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng et.al. | 2409.05771 | null |
2024-09-09 | Model Input Verification of Large Scale Simulations | Rumyana Neykova et.al. | 2409.05768 | null |
2024-09-09 | A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System | B. Sankar et.al. | 2409.05747 | null |
2024-09-09 | LLMs Will Always Hallucinate, and We Need to Live With This | Sourav Banerjee et.al. | 2409.05746 | null |
2024-09-09 | A System and Benchmark for LLM-based Q&A on Heterogeneous Data | Achille Fokoue et.al. | 2409.05735 | null |
2024-09-09 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou et.al. | 2409.05732 | null |
2024-09-09 | The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy | Qihang He et.al. | 2409.05703 | null |
2024-09-09 | Segmentation by Factorization: Unsupervised Semantic Segmentation for Pathology by Factorizing Foundation Model Features | Jacob Gildenblat et.al. | 2409.05697 | null |
2024-09-09 | Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! | Yuchen Shen et.al. | 2409.05672 | null |
2024-09-09 | Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case | Vagrant Gautam et.al. | 2409.05653 | link |
2024-09-10 | MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Hongjin Qian et.al. | 2409.05591 | link |
2024-09-09 | Leveraging Content and Acoustic Representations for Efficient Speech Emotion Recognition | Soumya Dutta et.al. | 2409.05566 | link |
2024-09-09 | CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | Jinwei He et.al. | 2409.05559 | null |
2024-09-09 | SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | Alireza Ghafarollahi et.al. | 2409.05556 | link |
2024-09-09 | Harmonic Reasoning in Large Language Models | Anna Kruspe et.al. | 2409.05521 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429 | link |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424 | null |
2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
2024-09-06 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | null |
2024-09-06 | Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Aliakbar Nafar et.al. | 2409.04318 | link |
2024-09-06 | An optically accelerated extreme learning machine using hot atomic vapors | Pierre Azam et.al. | 2409.04312 | null |
2024-09-06 | Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | Desiree Heim et.al. | 2409.04286 | null |
2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
2024-09-06 | An overview of domain-specific foundation model: key technologies, applications and challenges | Haolong Chen et.al. | 2409.04267 | null |
2024-09-06 | UniDet3D: Multi-dataset Indoor 3D Object Detection | Maksim Kolodiazhnyi et.al. | 2409.04234 | link |
2024-09-06 | Fast Forwarding Low-Rank Training | Adir Rahamim et.al. | 2409.04206 | null |
2024-09-06 | Residual Stream Analysis with Multi-Layer SAEs | Tim Lawson et.al. | 2409.04185 | link |
2024-09-06 | GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | Ziyin Zhang et.al. | 2409.04183 | null |
2024-09-06 | Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering | Larissa Pusch et.al. | 2409.04181 | null |
2024-09-06 | From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Andreas Stephan et.al. | 2409.04168 | null |
2024-09-06 | Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation | Luis Mayer et.al. | 2409.04164 | null |
2024-09-06 | Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering | Jan Hofmann et.al. | 2409.04122 | null |
2024-09-06 | Multi-Programming Language Ensemble for Code Generation in Large Language Model | Tengfei Xue et.al. | 2409.04114 | link |
2024-09-06 | Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Chenglei Si et.al. | 2409.04109 | link |
2024-09-06 | UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity | Yicheng Fu et.al. | 2409.04081 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
2024-09-05 | Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution | Marga Don et.al. | 2409.03754 | link |
2024-09-05 | Attention Heads of Large Language Models: A Survey | Zifan Zheng et.al. | 2409.03752 | link |
2024-09-05 | LLM-CI: Assessing Contextual Integrity Norms in Language Models | Yan Shvartzshnaider et.al. | 2409.03735 | null |
2024-09-05 | Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry | Meena Jagadeesan et.al. | 2409.03734 | null |
2024-09-05 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang et.al. | 2409.03733 | link |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-05 | LAST: Language Model Aware Speech Tokenization | Arnon Turetzky et.al. | 2409.03701 | null |
2024-09-05 | TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems | Stylianos Loukas Vasileiou et.al. | 2409.03671 | link |
2024-09-05 | A Fused Large Language Model for Predicting Startup Success | Abdurahman Maarouf et.al. | 2409.03668 | null |
2024-09-05 | The representation landscape of few-shot learning and fine-tuning in large language models | Diego Doimo et.al. | 2409.03662 | link |
2024-09-06 | LLM-based multi-agent poetry generation in non-cooperative environments | Ran Zhang et.al. | 2409.03659 | link |
2024-09-05 | On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | Yong Lin et.al. | 2409.03650 | null |
2024-09-05 | Text-Guided Mixup Towards Long-Tailed Image Categorization | Richard Franklin et.al. | 2409.03583 | link |
2024-09-05 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | Xi Chen et.al. | 2409.03525 | null |
2024-09-05 | Have Large Vision-Language Models Mastered Art History? | Ombretta Strafforello et.al. | 2409.03521 | null |
2024-09-05 | Tissue Concepts: supervised foundation models in computational pathology | Till Nicke et.al. | 2409.03519 | link |
2024-09-05 | From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | Jifan Yu et.al. | 2409.03512 | null |
2024-09-05 | LLM-based event abstraction and integration for IoT-sourced logs | Mohsen Shirali et.al. | 2409.03478 | link |
2024-09-05 | How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes | Inacio Vieira et.al. | 2409.03454 | null |
2024-09-04 | RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) | Yao Mu et.al. | 2409.02920 | null |
2024-09-04 | Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | Yuhang Lu et.al. | 2409.02914 | null |
2024-09-04 | Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | Kaiwen Zheng et.al. | 2409.02908 | null |
2024-09-05 | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Jiajie Zhang et.al. | 2409.02897 | link |
2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889 | link |
2024-09-04 | CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently | Jonathan Zalach et.al. | 2409.02885 | null |
2024-09-04 | Benchmarking Spurious Bias in Few-Shot Image Classifiers | Guangtao Zheng et.al. | 2409.02882 | link |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Historical German Text Normalization Using Type- and Token-Based Language Modeling | Anton Ehrmanntraut et.al. | 2409.02841 | null |
2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836 | null |
2024-09-04 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | link |
2024-09-04 | ExpLLM: Towards Chain of Thought for Facial Expression Recognition | Xing Lan et.al. | 2409.02828 | null |
2024-09-04 | Design Contradictions: Help or Hindrance? | Aron E. Owen et.al. | 2409.02823 | null |
2024-09-04 | Language Understanding as a Constraint on Consensus Size in LLM Societies | Giordano De Marzo et.al. | 2409.02822 | null |
2024-09-04 | Towards a Unified View of Preference Learning for Large Language Models: A Survey | Bofei Gao et.al. | 2409.02795 | link |
2024-09-05 | Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | Yixuan Tang et.al. | 2409.02727 | link |
2024-09-04 | Pre-training data selection for biomedical domain adaptation using journal impact metrics | Mathieu Laï-king et.al. | 2409.02725 | null |
2024-09-04 | Alignment-Aware Model Extraction Attacks on Large Language Models | Zi Liang et.al. | 2409.02718 | link |
2024-09-04 | Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | Mohammad Reshadati et.al. | 2409.02711 | null |
2024-09-04 | LLM-Assisted Visual Analytics: Opportunities and Challenges | Maeve Hutchinson et.al. | 2409.02691 | null |
2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437 | link |
2024-08-30 | DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model | Mona Sheikh Zeinoddin et.al. | 2408.17433 | link |
2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
2024-08-30 | CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Jonathan Bourne et.al. | 2408.17428 | link |
2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422 | null |
2024-08-30 | Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach | Jialiang Wei et.al. | 2408.17404 | link |
2024-08-30 | EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution | Francesco Argenziano et.al. | 2408.17379 | null |
2024-08-30 | NDP: Next Distribution Prediction as a More Broad Target | Junhao Ruan et.al. | 2408.17377 | null |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
2024-08-30 | Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage | Md Rafi Ur Rashid et.al. | 2408.17354 | null |
2024-09-02 | LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation | Shuyi Ouyang et.al. | 2408.17347 | null |
2024-08-30 | Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering | Nicholas Pochinkov et.al. | 2408.17322 | link |
2024-08-30 | Bridging Domain Knowledge and Process Discovery Using Large Language Models | Ali Norouzifar et.al. | 2408.17316 | link |
2024-08-30 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
2024-08-30 | Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach | Tong Nie et.al. | 2408.17258 | null |
2024-08-30 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | Mouxiang Chen et.al. | 2408.17253 | link |
2024-08-30 | Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study | Shubham Agarwal et.al. | 2408.17181 | null |
2024-08-30 | Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Zhen Ye et.al. | 2408.17175 | link |
2024-08-30 | Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning | Xiaoye Qu et.al. | 2408.17150 | link |
2024-08-30 | Reasoning AI Performance Degradation in 6G Networks with Large Language Models | Liming Huang et.al. | 2408.17097 | null |
2024-08-29 | PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning | Noor Hussein et.al. | 2408.16769 | link |
2024-08-29 | How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models | Jiyue Jiang et.al. | 2408.16756 | link |
2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
2024-08-29 | A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models | Yi-Lin Tuan et.al. | 2408.16751 | null |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749 | null |
2024-08-29 | Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models | Jiří Milička et.al. | 2408.16740 | null |
2024-08-29 | Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | Hritik Bansal et.al. | 2408.16737 | null |
2024-08-29 | VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation | Shiwei Wu et.al. | 2408.16730 | null |
2024-08-30 | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Zhifei Xie et.al. | 2408.16725 | link |
2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2408.16700 | link |
2024-08-29 | Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | Ziniu Li et.al. | 2408.16673 | null |
2024-08-29 | Space3D-Bench: Spatial 3D Question Answering Benchmark | Emilia Szymanska et.al. | 2408.16662 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-08-29 | Examination of Code generated by Large Language Models | Robin Beer et.al. | 2408.16601 | link |
2024-08-29 | Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies | Zhiyang Qi et.al. | 2408.16586 | null |
2024-08-29 | WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Shengpeng Ji et.al. | 2408.16532 | link |
2024-08-29 | CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues | Rena Gao et.al. | 2408.16518 | link |
2024-08-29 | LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs? | Jan Cegin et.al. | 2408.16502 | null |
2024-08-29 | CogVLM2: Visual Language Models for Image and Video Understanding | Wenyi Hong et.al. | 2408.16500 | link |
2024-08-29 | A Survey on Evaluating Large Language Models in Code Generation Tasks | Liguo Chen et.al. | 2408.16498 | null |
2024-08-28 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi et.al. | 2408.15998 | link |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996 | null |
2024-08-28 | Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration | Xu Zhang et.al. | 2408.15994 | null |
2024-08-28 | BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | Wei Wang et.al. | 2408.15971 | null |
2024-08-28 | More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding | Yuan Tang et.al. | 2408.15966 | link |
2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
2024-08-28 | DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval | Yuying Zhang et.al. | 2408.15919 | null |
2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
2024-08-28 | LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | Ruirui Chen et.al. | 2408.15903 | null |
2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
2024-08-28 | Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models | Sebastian Vallejo Vera et.al. | 2408.15895 | null |
2024-08-28 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | Persuasion Games using Large Language Models | Ganesh Prasath Ramani et.al. | 2408.15879 | null |
2024-08-28 | Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations : A Tool-Chaining Problem-Solving Framework with Attributable Reflection | Sagar Srinivas Sakhinana et.al. | 2408.15866 | null |
2024-08-28 | Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Peter Neidlinger et.al. | 2408.15823 | null |
2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
2024-08-28 | Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization | Léo Hemamou et.al. | 2408.15801 | null |
2024-08-28 | Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models | Hédi Zhegidi et.al. | 2408.15796 | link |
2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
2024-08-27 | Generative Verifiers: Reward Modeling as Next-Token Prediction | Lunjun Zhang et.al. | 2408.15240 | null |
2024-08-27 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang et.al. | 2408.15237 | link |
2024-08-27 | Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | Yucheng Jiang et.al. | 2408.15232 | null |
2024-08-27 | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li et.al. | 2408.15221 | null |
2024-08-27 | Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks | Shide Zhou et.al. | 2408.15207 | null |
2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
2024-08-27 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Kristina Gligorić et.al. | 2408.15204 | link |
2024-08-27 | Infusing Acoustic Pause Context into Text-Based Dementia Assessment | Franziska Braun et.al. | 2408.15188 | null |
2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et.al. | 2408.15176 | null |
2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
2024-08-27 | Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman et.al. | 2408.15171 | null |
2024-08-27 | How transformers learn structured data: insights from hierarchical filtering | Jerome Garnier-Brun et.al. | 2408.15138 | link |
2024-08-27 | CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP | Zhenchen Tang et.al. | 2408.15098 | null |
2024-08-27 | Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models | Xiyu Liu et.al. | 2408.15091 | null |
2024-08-27 | BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | Guosheng Dong et.al. | 2408.15079 | null |
2024-08-27 | Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models | Ned Cooper et.al. | 2408.15066 | null |
2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et.al. | 2408.15065 | null |
2024-08-28 | DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Wenhui Liao et.al. | 2408.15045 | null |
2024-08-28 | A Survey of Large Language Models for European Languages | Wazir Ali et.al. | 2408.15040 | null |
2024-08-27 | Speech Recognition Transformers: Topological-lingualism Perspective | Shruti Singh et.al. | 2408.14991 | null |
2024-08-26 | A Practitioner's Guide to Continual Multimodal Pretraining | Karsten Roth et.al. | 2408.14471 | link |
2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | link |
2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469 | null |
2024-08-26 | Explicit Inductive Inference using Large Language Models | Tianyang Liu et.al. | 2408.14467 | null |
2024-08-26 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu Shuo Zhao et.al. | 2408.14438 | null |
2024-08-26 | Social perception of faces in a vision-language model | Carina I. Hausladen et.al. | 2408.14435 | link |
2024-08-26 | CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models | Shubham Bharti et.al. | 2408.14419 | null |
2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
2024-08-26 | Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse | Yahao Ding et.al. | 2408.14416 | null |
2024-08-26 | Language-specific Calibration for Pruning Multilingual Language Models | Simon Kurz et.al. | 2408.14398 | null |
2024-08-26 | Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380 | link |
2024-08-26 | An Embedding is Worth a Thousand Noisy Labels | Francesco Di Salvo et.al. | 2408.14358 | link |
2024-08-26 | SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Daoguang Zan et.al. | 2408.14354 | link |
2024-08-26 | Assessing Contamination in Large Language Models: Introducing the LogProber method | Nicolas Yax et.al. | 2408.14352 | null |
2024-08-26 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | Claim Verification in the Age of Large Language Models: A Survey | Alphaeus Dmonte et.al. | 2408.14317 | null |
2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
2024-08-26 | Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails | Malte Josten et.al. | 2408.14293 | link |
2024-08-26 | Predictability and Causality in Spanish and English Natural Language Generation | Andrea Busto-Castiñeira et.al. | 2408.14283 | null |
2024-08-23 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang et.al. | 2408.13257 | null |
2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D'Cruz et.al. | 2408.13253 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | Yingyu Liang et.al. | 2408.13233 | null |
2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214 | null |
2024-08-23 | DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Qiming Zhu et.al. | 2408.13204 | null |
2024-08-23 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | link |
2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
2024-08-23 | SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks | Kai-Wei Chang et.al. | 2408.13040 | null |
2024-08-23 | VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models | Wentao Wu et.al. | 2408.13031 | link |
2024-08-23 | In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting | Haowei Du et.al. | 2408.13028 | null |
2024-08-23 | A Web-Based Solution for Federated Learning with LLM-Based Automation | Chamith Mawela et.al. | 2408.13010 | null |
2024-08-23 | Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates | Hui Wei et.al. | 2408.13006 | link |
2024-08-23 | CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | Ruiyang Xu et.al. | 2408.13001 | null |
2024-08-23 | Open Llama2 Model for the Lithuanian Language | Artūras Nakvosas et.al. | 2408.12963 | null |
2024-08-23 | Multimodal Contrastive In-Context Learning | Yosuke Miyanishi et.al. | 2408.12959 | null |
2024-08-23 | Image Segmentation in Foundation Model Era: A Survey | Tianfei Zhou et.al. | 2408.12957 | link |
2024-08-23 | E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group | Yue Pan et.al. | 2408.12948 | null |
2024-08-23 | Causal-Guided Active Learning for Debiasing Large Language Models | Zhouhao Sun et.al. | 2408.12942 | link |
2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
2024-08-23 | Non-Homophilic Graph Pre-Training and Prompt Learning | Xingtong Yu et.al. | 2408.12594 | link |
2024-08-22 | RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | Xiaohan Wang et.al. | 2408.12579 | null |
2024-08-22 | MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | Haojun Shi et.al. | 2408.12574 | link |
2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
2024-08-22 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | null |
2024-08-22 | MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | Hao Wei et.al. | 2408.12496 | null |
2024-08-22 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang et.al. | 2408.12494 | link |
2024-08-23 | Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Khang T. Doan et.al. | 2408.12480 | null |
2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
2024-08-22 | DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems | Jiaju Chen et.al. | 2408.12470 | link |
2024-08-22 | Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning | Mushui Liu et.al. | 2408.12469 | null |
2024-08-22 | Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing | Mengqi Zhang et.al. | 2408.12456 | null |
2024-08-22 | Positional Description for Numerical Normalization | Deepanshu Gupta et.al. | 2408.12430 | null |
2024-08-22 | FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing | Jue Wang et.al. | 2408.12429 | link |
2024-08-22 | Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification | Sudi Murindanyi et.al. | 2408.12426 | null |
2024-08-22 | Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code | Mahdi Kazemi et.al. | 2408.12416 | null |
2024-08-22 | Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes | Sota Kato et.al. | 2408.12406 | link |
2024-08-21 | Great Memory, Shallow Reasoning: Limits of |
Shangyi Geng et.al. | 2408.11815 | link |
2024-08-21 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Yuanyang Yin et.al. | 2408.11813 | null |
2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811 | null |
2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804 | link |
2024-08-21 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang et.al. | 2408.11801 | null |
2024-08-21 | PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain | Rounak Meyur et.al. | 2408.11800 | null |
2024-08-21 | Practical token pruning for foundation models in few-shot conversational virtual assistant systems | Haode Qi et.al. | 2408.11799 | null |
2024-08-21 | EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma et.al. | 2408.11795 | null |
2024-08-21 | Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design | Nathaniel H. Park et.al. | 2408.11793 | null |
2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | Personality Alignment of Large Language Models | Minjun Zhu et.al. | 2408.11779 | link |
2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
2024-08-21 | Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks | Yiyi Chen et.al. | 2408.11749 | link |
2024-08-21 | DH-Bench: Probing Depth and Height Perception of Large Visual-Language Models | Shehreen Azad et.al. | 2408.11748 | link |
2024-08-21 | Open-Ended 3D Point Cloud Instance Segmentation | Phuc D. A. Nguyen et.al. | 2408.11747 | null |
2024-08-21 | Mixed Sparsity Training: Achieving 4 |
Pihe Hu et.al. | 2408.11746 | null |
2024-08-21 | FocusLLM: Scaling LLM's Context by Parallel Decoding | Zhenyu Li et.al. | 2408.11745 | link |
2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et.al. | 2408.11743 | link |
2024-08-21 | CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering | Yuliang Cai et.al. | 2408.11742 | link |
2024-08-20 | Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement | Satoshi Kosugi et.al. | 2408.11055 | link |
2024-08-20 | Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks | Nathaniel Pinckney et.al. | 2408.11053 | link |
2024-08-20 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu et.al. | 2408.11051 | link |
2024-08-20 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
2024-08-20 | Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders | Yuan Xin et.al. | 2408.11046 | null |
2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
2024-08-20 | Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Chunting Zhou et.al. | 2408.11039 | null |
2024-08-20 | Scaling Law with Learning Rate Annealing | Howe Tissue et.al. | 2408.11029 | null |
2024-08-20 | Athena: Safe Autonomous Agents with Verbal Contrastive Learning | Tanmana Sadhu et.al. | 2408.11021 | null |
2024-08-20 | While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output? | Wen Cheng et.al. | 2408.11006 | link |
2024-08-20 | SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining | Jonathan Prexl et.al. | 2408.11000 | link |
2024-08-20 | CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | Michael Reinisch et.al. | 2408.10995 | null |
2024-08-20 | Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models | Yuyan Chen et.al. | 2408.10947 | null |
2024-08-20 | Large Language Model Driven Recommendation | Anton Korikov et.al. | 2408.10946 | null |
2024-08-20 | HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | Kazi Hasan Ibn Arif et.al. | 2408.10945 | link |
2024-08-20 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin et.al. | 2408.10943 | link |
2024-08-20 | Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience | Yoonseo Choi et.al. | 2408.10937 | null |
2024-08-20 | LBC: Language-Based-Classifier for Out-Of-Variable Generalization | Kangjun Noh et.al. | 2408.10923 | link |
2024-08-21 | BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model | Yeyong Yu et.al. | 2408.10903 | link |
2024-08-20 | Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs | John Mendonça et.al. | 2408.10902 | link |
2024-08-19 | SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP | Yusuke Hirota et.al. | 2408.10202 | null |
2024-08-19 | Demystifying the Communication Characteristics for Distributed Transformer Models | Quentin Anthony et.al. | 2408.10197 | null |
2024-08-19 | Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models | Aviv Bick et.al. | 2408.10189 | null |
2024-08-19 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Fuzhao Xue et.al. | 2408.10188 | link |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models | Amey Hengle et.al. | 2408.10151 | link |
2024-08-19 | In-Context Learning with Representations: Contextual Generalization of Trained Transformers | Tong Yang et.al. | 2408.10147 | null |
2024-08-19 | Instruction Finetuning for Leaderboard Generation from Empirical AI Research | Salomon Kabongo et.al. | 2408.10141 | null |
2024-08-19 | Rhyme-aware Chinese lyric generator based on GPT | Yixiao Yuan et.al. | 2408.10130 | null |
2024-08-19 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Feiyu Pan et.al. | 2408.10125 | null |
2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
2024-08-19 | Geometry Informed Tokenization of Molecules for Language Model Generation | Xiner Li et.al. | 2408.10120 | null |
2024-08-19 | GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization | Ran Liu et.al. | 2408.10115 | link |
2024-08-20 | PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities | Yuanjian Xu et.al. | 2408.10111 | null |
2024-08-19 | ARMADA: Attribute-Based Multimodal Data Augmentation | Xiaomeng Jin et.al. | 2408.10086 | null |
2024-08-19 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar et.al. | 2408.10075 | null |
2024-08-19 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang et.al. | 2408.10072 | link |
2024-08-19 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory | Haoran Li et.al. | 2408.10053 | null |
2024-08-19 | Defense Priorities in the Open-Source AI Debate: A Preliminary Assessment | Masao Dahlgren et.al. | 2408.10026 | null |
2024-08-16 | SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation | Xinyu Xiong et.al. | 2408.08870 | link |
2024-08-16 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
2024-08-16 | A Hassle-free Algorithm for Private Learning in Practice: Don't Use Tree Aggregation, Use BLTs | H. Brendan McMahan et.al. | 2408.08868 | null |
2024-08-16 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | link |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855 | null |
2024-08-16 | GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms | Yuhao Jia et.al. | 2408.08852 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849 | link |
2024-08-16 | PsychoLex: Unveiling the Psychological Mind of Large Language Models | Mohammad Amin Abbasi et.al. | 2408.08848 | null |
2024-08-16 | FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats | Xuanliang Zhang et.al. | 2408.08841 | link |
2024-08-16 | EasyRec: Simple yet Effective Language Models for Recommendation | Xubin Ren et.al. | 2408.08821 | link |
2024-08-16 | Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models | Lin Zhao et.al. | 2408.08813 | null |
2024-08-16 | Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors | Felipe A. Csaszar et.al. | 2408.08811 | null |
2024-08-16 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
2024-08-16 | CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems | Joanito Agili Lopo et.al. | 2408.08805 | null |
2024-08-16 | A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks | Boa Jang et.al. | 2408.08790 | link |
2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
2024-08-16 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
2024-08-16 | DAC: Decomposed Automation Correction for Text-to-SQL | Dingzirui Wang et.al. | 2408.08779 | link |
2024-08-16 | Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Dingwei Chen et.al. | 2408.08769 | null |
2024-08-16 | Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM | Wanting Yang et.al. | 2408.08765 | null |
2024-08-15 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu et.al. | 2408.08313 | null |
2024-08-15 | ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws | Ruihang Li et.al. | 2408.08310 | null |
2024-08-15 | Towards Flexible Visual Relationship Segmentation | Fangrui Zhu et.al. | 2408.08305 | null |
2024-08-15 | Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Usman Syed et.al. | 2408.08302 | null |
2024-08-15 | VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps | Senthil Hariharan Arul et.al. | 2408.08301 | null |
2024-08-15 | HELP: Hierarchical Embeddings-based Log Parsing | Andy Xu et.al. | 2408.08300 | null |
2024-08-15 | The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | Shachar Don-Yehiya et.al. | 2408.08291 | null |
2024-08-15 | Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model | Jin Wang et.al. | 2408.08282 | null |
2024-08-15 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
2024-08-15 | DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | Xihong Yang et.al. | 2408.08231 | null |
2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217 | null |
2024-08-15 | Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Javier González et.al. | 2408.08210 | null |
2024-08-15 | LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation | Bohao Wang et.al. | 2408.08208 | null |
2024-08-15 | Heavy Labels Out! Dataset Distillation with Label Space Lightening | Ruonan Yu et.al. | 2408.08201 | null |
2024-08-15 | Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy | Shaojun Xu et.al. | 2408.08188 | null |
2024-08-15 | General-purpose Clothes Manipulation with Semantic Keypoints | Yuhong Deng et.al. | 2408.08160 | null |
2024-08-15 | EmBARDiment: an Embodied AI Agent for Productivity in XR | Riccardo Bovo et.al. | 2408.08158 | null |
2024-08-15 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Huajian Xin et.al. | 2408.08152 | link |
2024-08-15 | P/D-Serve: Serving Disaggregated Large Language Model at Scale | Yibo Jin et.al. | 2408.08147 | null |
2024-08-15 | KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning | Kaiqi Zhang et.al. | 2408.08146 | null |
2024-08-14 | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | Karime Maamari et.al. | 2408.07702 | null |
2024-08-15 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang et.al. | 2408.07666 | link |
2024-08-14 | Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models | Yi-Cheng Lin et.al. | 2408.07665 | link |
2024-08-14 | Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions | Quan Liu et.al. | 2408.07663 | link |
2024-08-14 | WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | Weijian Xie et.al. | 2408.07611 | null |
2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | Hamza Kheddar et.al. | 2408.07583 | null |
2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
2024-08-15 | Usefulness of data flow diagrams and large language models for security threat validation: a registered report | Winnie Bahati Mbaka et.al. | 2408.07537 | null |
2024-08-14 | Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | Seungjun Han et.al. | 2408.07531 | null |
2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
2024-08-14 | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach | Shizhou Zhang et.al. | 2408.07500 | link |
2024-08-14 | QirK: Question Answering via Intermediate Representation on Knowledge Graphs | Jan Luca Scheerer et.al. | 2408.07494 | null |
2024-08-14 | Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems | Ning Lu et.al. | 2408.07482 | null |
2024-08-14 | Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Yuxin Jiang et.al. | 2408.07471 | link |
2024-08-14 | Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification | Yongcheng Li et.al. | 2408.07467 | link |
2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
2024-08-14 | From Brazilian Portuguese to European Portuguese | João Sanches et.al. | 2408.07457 | null |
2024-08-14 | Fact or Fiction? Improving Fact Verification with Knowledge Graphs through Simplified Subgraph Retrievals | Tobias A. Opsahl et.al. | 2408.07453 | link |
2024-08-15 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Asif Hanif et.al. | 2408.07440 | link |
2024-08-14 | Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation | CanYi Liu et.al. | 2408.07427 | null |
2024-08-13 | Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | Kexun Zhang et.al. | 2408.07060 | null |
2024-08-13 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai et.al. | 2408.07055 | link |
2024-08-13 | Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models | Chun Jie Chong et.al. | 2408.07004 | null |
2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
2024-08-13 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs | Dongyuan Li et.al. | 2408.06966 | null |
2024-08-13 | Towards Holistic Disease Risk Prediction using Small Language Models | Liv Björkdahl et.al. | 2408.06943 | null |
2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941 | link |
2024-08-13 | The advantages of context specific language models: the case of the Erasmian Language Model | João Gonçalves et.al. | 2408.06931 | link |
2024-08-13 | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok et.al. | 2408.06929 | link |
2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et.al. | 2408.06926 | null |
2024-08-13 | Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives | Zhihu Wang et.al. | 2408.06904 | null |
2024-08-13 | Leveraging Language Models for Emotion and Behavior Analysis in Education | Kaito Tanaka et.al. | 2408.06874 | null |
2024-08-13 | LoRA |
Jia-Chen Zhang et.al. | 2408.06854 | null |
2024-08-13 | Causal Agent based on Large Language Model | Kairong Han et.al. | 2408.06849 | link |
2024-08-13 | DracoGPT: Extracting Visualization Design Preferences from Large Language Models | Huichen Will Wang et.al. | 2408.06845 | null |
2024-08-13 | How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts | Huichen Will Wang et.al. | 2408.06837 | null |
2024-08-13 | Efficient Search for Customized Activation Functions with Gradient Descent | Lukas Strack et.al. | 2408.06820 | link |
2024-08-13 | MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Yongjin Yang et.al. | 2408.06816 | null |
2024-08-13 | HLSPilot: LLM-based High-Level Synthesis | Chenwei Xiong et.al. | 2408.06810 | link |
2024-08-13 | Layerwise Recurrent Router for Mixture-of-Experts | Zihan Qiu et.al. | 2408.06793 | link |
2024-08-12 | FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection | Yufei Huang et.al. | 2408.06333 | link |
2024-08-12 | Animate, or Inanimate, That is the Question for Large Language Models | Leonardo Ranaldi et.al. | 2408.06332 | null |
2024-08-12 | Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example | Yanan Chen et.al. | 2408.06318 | null |
2024-08-12 | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh et.al. | 2408.06303 | null |
2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
2024-08-12 | MovieSum: An Abstractive Summarization Dataset for Movie Screenplays | Rohit Saxena et.al. | 2408.06281 | link |
2024-08-13 | Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | Jieyong Kim et.al. | 2408.06276 | link |
2024-08-12 | FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | Haoran Sun et.al. | 2408.06273 | link |
2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272 | null |
2024-08-12 | Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Karel D'Oosterlinck et.al. | 2408.06266 | link |
2024-08-12 | Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning | Yingjin Song et.al. | 2408.06259 | null |
2024-08-12 | On Effects of Steering Latent Representation for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2408.06223 | link |
2024-08-12 | Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Zhenting Qi et.al. | 2408.06195 | link |
2024-08-12 | FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework | Lukas Meyer et.al. | 2408.06190 | link |
2024-08-12 | Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting | Halley Young et.al. | 2408.06186 | null |
2024-08-12 | OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Mushui Liu et.al. | 2408.06158 | link |
2024-08-12 | LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library | Tianhao Yu et.al. | 2408.06150 | null |
2024-08-12 | Self-Supervised Learning on MeerKAT Wide-Field Continuum Images | Erica Lastufka et.al. | 2408.06147 | link |
2024-08-12 | Med42-v2: A Suite of Clinical LLMs | Clément Christophe et.al. | 2408.06142 | null |
2024-08-12 | Utilize Transformers for translating Wikipedia category names | Hoang-Thang Ta et.al. | 2408.06124 | null |
2024-08-10 | Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions | Michele Miranda et.al. | 2408.05212 | link |
2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211 | link |
2024-08-09 | Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners | Michael Vaccaro Jr et.al. | 2408.05204 | null |
2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-09 | ECG-FM: An Open Electrocardiogram Foundation Model | Kaden McKeen et.al. | 2408.05178 | link |
2024-08-09 | Weak-Annotation of HAR Datasets using Vision Foundation Models | Marius Bock et.al. | 2408.05169 | link |
2024-08-09 | AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | Pritam Deka et.al. | 2408.05149 | null |
2024-08-09 | A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning | Ye Yuan et.al. | 2408.05141 | null |
2024-08-09 | Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations | Jasmine Latendresse et.al. | 2408.05128 | null |
2024-08-09 | Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media | Petre Breazu et.al. | 2408.05126 | null |
2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123 | null |
2024-08-09 | A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | Xinyu Liu et.al. | 2408.05109 | link |
2024-08-09 | Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection | Xincheng Pang et.al. | 2408.05107 | null |
2024-08-09 | How Well Do LLMs Identify Cultural Unity in Diversity? | Jialin Li et.al. | 2408.05102 | link |
2024-08-09 | Hyperbolic Learning with Multimodal Large Language Models | Paolo Mandica et.al. | 2408.05097 | null |
2024-08-09 | Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts | Tingchen Fu et.al. | 2408.05094 | null |
2024-08-09 | Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models | Zikai Xie et.al. | 2408.05093 | link |
2024-08-09 | Generating novel experimental hypotheses from language models: A case study on cross-dative generalization | Kanishka Misra et.al. | 2408.05086 | link |
2024-08-09 | RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records | Sangjoon Park et.al. | 2408.05074 | null |
2024-08-09 | Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Marcelo Sartori Locatelli et.al. | 2408.05035 | null |
2024-08-08 | Better Alignment with Instruction Back-and-Forth Translation | Thao Nguyen et.al. | 2408.04614 | null |
2024-08-08 | Code-switching in text and speech reveals information-theoretic audience design | Debasmita Bhattacharya et.al. | 2408.04596 | null |
2024-08-09 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao et.al. | 2408.04594 | link |
2024-08-08 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
2024-08-08 | SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Tianrun Chen et.al. | 2408.04579 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
2024-08-08 | Learning Fine-Grained Grounded Citations for Attributed Large Language Models | Lei Huang et.al. | 2408.04568 | link |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-08 | Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation | Daniele Rege Cambrin et.al. | 2408.04523 | link |
2024-08-08 | Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models | Fabio Pernisi et.al. | 2408.04522 | null |
2024-08-08 | What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant | Jonan Richards et.al. | 2408.04477 | null |
2024-08-08 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | Yiqun Zhang et.al. | 2408.04472 | link |
2024-08-08 | RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents | Zihao Zhu et.al. | 2408.04449 | link |
2024-08-08 | Large Language Models for cross-language code clone detection | Micheline Bénédicte Moumoula et.al. | 2408.04430 | null |
2024-08-08 | Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models | Philipp Müller et.al. | 2408.04420 | null |
2024-08-08 | Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning | Seong-Il Park et.al. | 2408.04414 | null |
2024-08-08 | Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers | Moritz Scherer et.al. | 2408.04413 | null |
2024-08-08 | Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset | Kentaro Ozeki et.al. | 2408.04403 | link |
2024-08-08 | Automated Educational Question Generation at Different Bloom's Skill Levels using Large Language Models: Strategies and Evaluation | Nicy Scaria et.al. | 2408.04394 | link |
2024-08-08 | Open-domain Implicit Format Control for Large Language Model Generation | Yiqun Yao et.al. | 2408.04392 | link |
2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940 | null |
2024-08-07 | SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature | Vinícius Di Oliveira et.al. | 2408.03936 | null |
2024-08-07 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu et.al. | 2408.03910 | link |
2024-08-07 | Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models | Shachi H Kumar et.al. | 2408.03907 | null |
2024-08-07 | Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Beomseok Lee et.al. | 2408.03900 | link |
2024-08-07 | Simplifying Scholarly Abstracts for Accessible Digital Libraries | Haining Wang et.al. | 2408.03899 | link |
2024-08-07 | From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems | Leixian Shen et.al. | 2408.03876 | null |
2024-08-07 | PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training | Haoran Xu et.al. | 2408.03865 | null |
2024-08-07 | GAIA -- A Large Language Model for Advanced Power Dispatch | Yuheng Cheng et.al. | 2408.03847 | null |
2024-08-07 | MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models | Yuchen Dong et.al. | 2408.03841 | null |
2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
2024-08-07 | Target Prompting for Information Extraction with Vision Language Model | Dipankar Medhi et.al. | 2408.03834 | null |
2024-08-07 | Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning | Simret Araya Gebreegziabher et.al. | 2408.03819 | null |
2024-08-07 | Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring | Zifan Wang et.al. | 2408.03811 | null |
2024-08-07 | 'Finance Wizard' at the FinLLM Challenge Task: Financial Text Summarization | Meisin Lee et.al. | 2408.03762 | null |
2024-08-07 | MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video | Xiaoqing Guo et.al. | 2408.03761 | null |
2024-08-07 | Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation | Jingjing Xie et.al. | 2408.03735 | link |
2024-08-07 | Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks | Zizhang Chen et.al. | 2408.03732 | null |
2024-08-07 | A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models | Pengxiang Zhao et.al. | 2408.03728 | null |
2024-08-07 | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction | Benjamin Matthias Ruppik et.al. | 2408.03706 | null |
2024-08-06 | CoverBench: A Challenging Benchmark for Complex Claim Verification | Alon Jacovi et.al. | 2408.03325 | null |
2024-08-06 | Segment Anything in Medical Images and Videos: Benchmark and Deployment | Jun Ma et.al. | 2408.03322 | link |
2024-08-06 | TextIM: Part-aware Interactive Motion Synthesis from Text | Siyuan Fan et.al. | 2408.03302 | null |
2024-08-06 | KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models | Ruizhe Zhang et.al. | 2408.03297 | null |
2024-08-06 | Biomedical SAM 2: Segment Anything in Biomedical Images and Videos | Zhiling Yan et.al. | 2408.03286 | link |
2024-08-07 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao et.al. | 2408.03281 | link |
2024-08-06 | Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | Angie Boggust et.al. | 2408.03274 | null |
2024-08-06 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs | Jiaxi Yang et.al. | 2408.03256 | null |
2024-08-06 | Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | Yifei Wang et.al. | 2408.03247 | link |
2024-08-06 | Making Long-Context Language Models Better Multi-Hop Reasoners | Yanyang Li et.al. | 2408.03246 | link |
2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
2024-08-06 | Conditioning LLMs with Emotion in Neural Machine Translation | Charles Brazier et.al. | 2408.03150 | null |
2024-08-06 | Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization | Yanghai Zhang et.al. | 2408.03149 | link |
2024-08-06 | Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations | Leo Donisch et.al. | 2408.03130 | null |
2024-08-06 | Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation | Artur Guimarães et.al. | 2408.03127 | link |
2024-08-06 | Evaluating the Translation Performance of Large Language Models Based on Euas-20 | Yan Huang et.al. | 2408.03119 | null |
2024-08-06 | Topic Modeling with Fine-tuning LLMs and Bag of Sentences | Johannes Schneider et.al. | 2408.03099 | link |
2024-08-07 | TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration | Siqi Gu et.al. | 2408.03095 | null |
2024-08-06 | 500xCompressor: Generalized Prompt Compression for Large Language Models | Zongqian Li et.al. | 2408.03094 | link |
2024-08-06 | Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | Le Yu et.al. | 2408.03092 | link |
2024-08-05 | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Dongyang Liu et.al. | 2408.02657 | link |
2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
2024-08-05 | Command-line Obfuscation Detection using Small Language Models | Vojtech Outrata et.al. | 2408.02637 | null |
2024-08-05 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao et.al. | 2408.02632 | null |
2024-08-05 | Language Model Can Listen While Speaking | Ziyang Ma et.al. | 2408.02622 | null |
2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
2024-08-05 | Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection | Sajal Aggarwal et.al. | 2408.02595 | null |
2024-08-05 | Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization | Ankan Mullick et.al. | 2408.02584 | null |
2024-08-05 | DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions | Siying Hu et.al. | 2408.02574 | null |
2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
2024-08-05 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer et.al. | 2408.02545 | link |
2024-08-05 | Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Xinbei Ma et.al. | 2408.02544 | link |
2024-08-05 | Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph | Zhao Kaichen et.al. | 2408.02535 | null |
2024-08-05 | Practical Attacks against Black-box Code Completion Engines | Slobodan Jenko et.al. | 2408.02509 | null |
2024-08-05 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li et.al. | 2408.02503 | link |
2024-08-05 | Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation | Aaron Imani et.al. | 2408.02502 | null |
2024-08-05 | A First Look at License Compliance Capability of LLMs in Code Generation | Weiwei Xu et.al. | 2408.02487 | link |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484 | link |
2024-08-05 | From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Haolin Jin et.al. | 2408.02479 | null |
2024-08-02 | Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting | Xiangyu Zhao et.al. | 2408.01423 | null |
2024-08-02 | Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | Jingtong Su et.al. | 2408.01420 | null |
2024-08-02 | DebateQA: Evaluating Question Answering on Debatable Knowledge | Rongwu Xu et.al. | 2408.01419 | link |
2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
2024-08-02 | Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | Yu Yang et.al. | 2408.01402 | null |
2024-08-02 | Coalitions of Large Language Models Increase the Robustness of AI Agents | Prattyush Mangal et.al. | 2408.01380 | null |
2024-08-02 | Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation | Jheng-Hong Yang et.al. | 2408.01363 | null |
2024-08-02 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding et.al. | 2408.01355 | link |
2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | link |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346 | null |
2024-08-02 | MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Benno Weck et.al. | 2408.01337 | link |
2024-08-02 | A Backbone for Long-Horizon Robot Task Understanding | Xiaoshuai Chen et.al. | 2408.01334 | null |
2024-08-02 | FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only | He Zhu et.al. | 2408.01323 | null |
2024-08-02 | A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks | Jiaqi Wang et.al. | 2408.01319 | null |
2024-08-02 | Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models | Ying Zhang et.al. | 2408.01308 | null |
2024-08-02 | The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models | Hannah Chen et.al. | 2408.01285 | null |
2024-08-02 | RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Kunlun Zhu et.al. | 2408.01262 | link |
2024-08-02 | The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models | Simone Caldarella et.al. | 2408.01228 | null |
2024-08-02 | High-Throughput Phenotyping of Clinical Text Using Large Language Models | Daniel B. Hier et.al. | 2408.01214 | null |
2024-08-02 | Misinforming LLMs: vulnerabilities, challenges and opportunities | Bo Zhou et.al. | 2408.01168 | null |
2024-08-01 | AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | Mengkang Hu et.al. | 2408.00764 | link |
2024-08-01 | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Xiangyu Fan et.al. | 2408.00762 | null |
2024-08-01 | Tamper-Resistant Safeguards for Open-Weight LLMs | Rishub Tamirisa et.al. | 2408.00761 | link |
2024-08-01 | Thermal Conductivity Predictions with Foundation Atomistic Models | Balázs Póta et.al. | 2408.00755 | link |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754 | null |
2024-08-01 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | Siyu Jiao et.al. | 2408.00744 | link |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | Virchow 2: Scaling Self-Supervised Mixed Magnification Models in Pathology | Eric Zimmermann et.al. | 2408.00738 | null |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | link |
2024-08-01 | An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Yangzhen Wu et.al. | 2408.00724 | null |
2024-08-01 | Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities | Sunder Ali Khowaja et.al. | 2408.00722 | null |
2024-08-01 | SAM 2: Segment Anything in Images and Videos | Nikhila Ravi et.al. | 2408.00714 | link |
2024-08-01 | Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM | Xiaofeng Liu et.al. | 2408.00706 | null |
2024-08-01 | Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Trapoom Ukarapol et.al. | 2408.00690 | link |
2024-08-01 | Can Developers Prompt? A Controlled Experiment for Code Documentation Generation | Hans-Alexander Kruse et.al. | 2408.00686 | null |
2024-08-01 | ExpertAF: Expert Actionable Feedback from Video | Kumar Ashutosh et.al. | 2408.00672 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | link |
2024-08-01 | Disentangling Dense Embeddings with Sparse Autoencoders | Charles O'Neill et.al. | 2408.00657 | null |
2024-08-02 | SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models | Hongjun An et.al. | 2408.00655 | link |
2024-08-01 | Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning | Xuri Ge et.al. | 2408.00644 | null |
2024-07-31 | Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey | Atsuyuki Miyai et.al. | 2407.21794 | null |
2024-07-31 | Vision-Language Model Based Handwriting Verification | Mihir Chauhan et.al. | 2407.21788 | null |
2024-07-31 | Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Bradley Brown et.al. | 2407.21787 | link |
2024-07-31 | The Llama 3 Herd of Models | Abhimanyu Dubey et.al. | 2407.21783 | null |
2024-07-31 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Shi Liu et.al. | 2407.21771 | null |
2024-07-31 | MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts | Xi Victoria Lin et.al. | 2407.21770 | null |
2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762 | null |
2024-07-31 | Learning Video Context as Interleaved Multimodal Sequences | Kevin Qinghong Lin et.al. | 2407.21757 | link |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739 | null |
2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721 | null |
2024-07-31 | Adaptive Retrieval-Augmented Generation for Conversational Systems | Xi Wang et.al. | 2407.21712 | null |
2024-07-31 | CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature | Stefan Langer et.al. | 2407.21708 | null |
2024-07-31 | TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities | Ming Zhang et.al. | 2407.21693 | link |
2024-07-31 | Synth-Empathy: Towards High-Quality Synthetic Empathy Data | Hao Liang et.al. | 2407.21669 | link |
2024-08-01 | Defending Jailbreak Attack in VLMs via Cross-modality Information Detector | Yue Xu et.al. | 2407.21659 | link |
2024-07-31 | MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment | Anurag Das et.al. | 2407.21654 | null |
2024-07-31 | Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation | Xiang Luo et.al. | 2407.21633 | link |
2024-07-31 | TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods | Gabriel Loiseau et.al. | 2407.21630 | link |
2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
2024-07-31 | A Performance Study of LLM-Generated Code on Leetcode | Tristan Coignion et.al. | 2407.21579 | null |
2024-07-30 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | null |
2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Yuexi Du et.al. | 2407.21011 | link |
2024-07-30 | GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Ali Abdollahi et.al. | 2407.21001 | link |
2024-07-30 | MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | Yupeng Chen et.al. | 2407.20999 | null |
2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | link |
2024-07-30 | Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks | Alakesh Kalita et.al. | 2407.20970 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962 | link |
2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | Huiyu Duan et.al. | 2407.20928 | link |
2024-07-30 | SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition | Hao Tan et.al. | 2407.20920 | null |
2024-07-30 | Automated Review Generation Method Based on Large Language Models | Shican Wu et.al. | 2407.20906 | link |
2024-07-30 | Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Adam Wojciechowski et.al. | 2407.20899 | link |
2024-07-30 | ThinkRepair: Self-Directed Automated Program Repair | Xin Yin et.al. | 2407.20898 | link |
2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884 | null |
2024-07-30 | Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification | Boyang Zhang et.al. | 2407.20859 | null |
2024-07-30 | Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations | Sarthak Anand et.al. | 2407.20856 | null |
2024-07-30 | Large Language Model (LLM)-enabled Graphs in Dynamic Networking | Geng Sun et.al. | 2407.20840 | null |
2024-07-30 | How to Measure the Intelligence of Large Language Models? | Nils Körber et.al. | 2407.20828 | null |
2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
2024-07-30 | Interpretable Pre-Trained Transformers for Heart Time-Series Data | Harry J. Davies et.al. | 2407.20775 | link |
2024-07-30 | OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance | Yongqiang Yao et.al. | 2407.20761 | link |
2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
2024-07-29 | Improving 2D Feature Representations by 3D-Aware Fine-Tuning | Yuanwen Yue et.al. | 2407.20229 | null |
2024-07-29 | FlexAttention for Efficient High-Resolution Vision-Language Models | Junyan Li et.al. | 2407.20228 | null |
2024-07-29 | Can Editing LLMs Inject Harm? | Canyu Chen et.al. | 2407.20224 | null |
2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214 | null |
2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207 | null |
2024-07-29 | MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Zehui Chen et.al. | 2407.20183 | link |
2024-07-29 | Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jinghuan Shang et.al. | 2407.20179 | link |
2024-07-29 | AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs | Feiyang Kang et.al. | 2407.20177 | link |
2024-07-29 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng et.al. | 2407.20174 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
2024-07-29 | rLLM: Relational Table Learning with LLMs | Weichen Li et.al. | 2407.20157 | link |
2024-07-29 | ByteCheckpoint: A Unified Checkpointing System for LLM Development | Borui Wan et.al. | 2407.20143 | null |
2024-07-29 | Strong Copyright Protection for Language Models via Adaptive Model Fusion | Javier Abad et.al. | 2407.20105 | null |
2024-07-29 | Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models | Zhe Li et.al. | 2407.20053 | null |
2024-07-29 | Exploring Large Language Models to generate Easy to Read content | Paloma Martínez et.al. | 2407.20046 | null |
2024-07-29 | MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Walid Bousselham et.al. | 2407.20034 | null |
2024-07-29 | Efficient Training of Large Language Models on Distributed Infrastructures: A Survey | Jiangfei Duan et.al. | 2407.20018 | null |
2024-07-29 | Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs | Lars Vogt et.al. | 2407.20007 | null |
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li et.al. | 2407.18908 | null |
2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
2024-07-26 | A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web | Juliana Barbosa et.al. | 2407.18898 | link |
2024-07-26 | Small Molecule Optimization with Large Language Models | Philipp Guevorguian et.al. | 2407.18897 | link |
2024-07-26 | Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models | Mutahar Safdar et.al. | 2407.18827 | null |
2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787 | link |
2024-07-26 | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant et.al. | 2407.18786 | null |
2024-07-26 | Foundation Models for the Digital Twin Creation of Cyber-Physical Systems | Shaukat Ali et.al. | 2407.18779 | null |
2024-07-26 | TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals | Kevin Kliimask et.al. | 2407.18764 | null |
2024-07-26 | Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery | Yuni Susanti et.al. | 2407.18752 | link |
2024-07-26 | Towards Effective and Efficient Continual Pre-training of Large Language Models | Jie Chen et.al. | 2407.18743 | null |
2024-07-26 | Towards Generalized Offensive Language Identification | Alphaeus Dmonte et.al. | 2407.18738 | null |
2024-07-26 | LLASP: Fine-tuning Large Language Models for Answer Set Programming | Erica Coppolillo et.al. | 2407.18723 | null |
2024-07-26 | Neurosymbolic AI for Enhancing Instructability in Generative AI | Amit Sheth et.al. | 2407.18722 | null |
2024-07-26 | Cluster-norm for Unsupervised Probing of Knowledge | Walter Laurito et.al. | 2407.18712 | link |
2024-07-26 | Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation | Esteban Garces Arias et.al. | 2407.18698 | link |
2024-07-26 | Collaborative Evolving Strategy for Automatic Data-Centric Development | Xu Yang et.al. | 2407.18690 | null |
2024-07-26 | The BIAS Detection Framework: Bias Detection in Word Embeddings and Language Models for European Languages | Alexandre Puttick et.al. | 2407.18689 | link |
2024-07-26 | Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift | Seongho Son et.al. | 2407.18676 | null |
2024-07-26 | Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models | Xiang Shi et.al. | 2407.18626 | link |
2024-07-25 | Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Tianduo Wang et.al. | 2407.18248 | link |
2024-07-25 | LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Zhengbo Wang et.al. | 2407.18242 | link |
2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
2024-07-26 | Exploring Scaling Trends in LLM Robustness | Nikolaus Howe et.al. | 2407.18213 | null |
2024-07-25 | AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Chunan Liu et.al. | 2407.18184 | link |
2024-07-25 | Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning | Sindhura Kommu et.al. | 2407.18181 | null |
2024-07-25 | Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | Sanae Lotfi et.al. | 2407.18158 | null |
2024-07-25 | Vlad Sobal et.al. | 2407.18134 | null | |
2024-07-25 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
2024-07-25 | Efficient Inference of Vision Instruction-Following Models with Elastic Cache | Zuyan Liu et.al. | 2407.18121 | link |
2024-07-25 | Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping | Jack Breen et.al. | 2407.18105 | link |
2024-07-25 | Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow | Tian Guo et.al. | 2407.18103 | null |
2024-07-25 | PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization | Christopher Clarke et.al. | 2407.18078 | link |
2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
2024-07-25 | ComPeer: A Generative Conversational Agent for Proactive Peer Support | Tianjian Liu et.al. | 2407.18064 | link |
2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062 | link |
2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061 | null |
2024-07-25 | The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | Eric Yang et.al. | 2407.18044 | null |
2024-07-25 | RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | Haoyu Chen et.al. | 2407.18035 | null |
2024-07-25 | GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | Jan Batzner et.al. | 2407.18008 | null |
2024-07-24 | I Could've Asked That: Reformulating Unanswerable Questions | Wenting Zhao et.al. | 2407.17469 | link |
2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
2024-07-24 | CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models | Jiawei Gu et.al. | 2407.17467 | null |
2024-07-24 | Yunhao Fang et.al. | 2407.17453 | null | |
2024-07-24 | Fluent Student-Teacher Redteaming | T. Ben Thompson et.al. | 2407.17447 | link |
2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
2024-07-24 | Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models | Yida Zhao et.al. | 2407.17406 | link |
2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404 | link |
2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
2024-07-24 | PERSONA: A Reproducible Testbed for Pluralistic Alignment | Louis Castricato et.al. | 2407.17387 | null |
2024-07-24 | A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance | Amirreza Naziri et.al. | 2407.17383 | null |
2024-07-24 | MMRA: A Benchmark for Multi-granularity Multi-image Relational Association | Siwei Wu et.al. | 2407.17379 | link |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | Gradient-based inference of abstract task representations for generalization in neural networks | Ali Hummos et.al. | 2407.17356 | null |
2024-07-24 | Scalify: scale propagation for efficient low-precision LLM training | Paul Balança et.al. | 2407.17353 | link |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349 | link |
2024-07-24 | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation | Qian Feng et.al. | 2407.17348 | null |
2024-07-24 | Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition | Ke Bao et.al. | 2407.17344 | null |
2024-07-24 | How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? | Leo Yu-Ho Lo et.al. | 2407.17291 | null |
2024-07-23 | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Junyi Li et.al. | 2407.16696 | link |
2024-07-23 | Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | Xiaoyue Xu et.al. | 2407.16695 | link |
2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
2024-07-23 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | Pengfei Chen et.al. | 2407.16682 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | Course-Correction: Safety Alignment Using Synthetic Preferences | Rongwu Xu et.al. | 2407.16637 | link |
2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
2024-07-23 | Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? | Jonathan Hayase et.al. | 2407.16607 | link |
2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
2024-07-23 | A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions | Giorgos Lysandrou et.al. | 2407.16593 | null |
2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
2024-07-23 | TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback | Eunseop Yoon et.al. | 2407.16574 | link |
2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | link |
2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | link |
2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
2024-07-23 | Quantifying the Role of Textual Predictability in Automatic Speech Recognition | Sean Robertson et.al. | 2407.16537 | null |
2024-07-23 | Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Aristeidis Panos et.al. | 2407.16526 | null |
2024-07-23 | AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | Yizhou Chi et.al. | 2407.16521 | null |
2024-07-23 | Language-Based Security for Low-Level MPC | Christian Skalka et.al. | 2407.16504 | null |
2024-07-23 | Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models | Kenza Benkirane et.al. | 2407.16470 | link |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | link |
2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841 | link |
2024-07-22 | MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Yangzhou Liu et.al. | 2407.15838 | link |
2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | null |
2024-07-22 | J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling | Wataru Nakata et.al. | 2407.15828 | null |
2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
2024-07-22 | Perceptions of Linguistic Uncertainty by Language Models and Humans | Catarina G Belem et.al. | 2407.15814 | link |
2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795 | link |
2024-07-22 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning | Emanuele Frascaroli et.al. | 2407.15793 | link |
2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
2024-07-22 | Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels | Zhuorui Ye et.al. | 2407.15786 | null |
2024-07-22 | Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | Kaiwen Wang et.al. | 2407.15762 | null |
2024-07-22 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | Marco Simoni et.al. | 2407.15748 | null |
2024-07-22 | OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context | Steffen Kleinle et.al. | 2407.15736 | null |
2024-07-22 | TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | John Chong Min Tan et.al. | 2407.15734 | link |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731 | null |
2024-07-22 | SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection | Dimitrios Kollias et.al. | 2407.15728 | null |
2024-07-22 | DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design | Zhi Hao Luo et.al. | 2407.15723 | link |
2024-07-22 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu et.al. | 2407.15720 | link |
2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding | Chenshu Hou et.al. | 2407.14491 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
2024-07-19 | Data-Centric Human Preference Optimization with Rationales | Hoang Anh Just et.al. | 2407.14477 | link |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467 | null |
2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452 | null |
2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
2024-07-19 | Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan et.al. | 2407.14435 | null |
2024-07-19 | Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | HamidReza Imani et.al. | 2407.14417 | null |
2024-07-19 | System-1.x: Learning to Balance Fast and Slow Planning with Language Models | Swarnadeep Saha et.al. | 2407.14414 | link |
2024-07-19 | DEAL: Disentangle and Localize Concept-level Explanations for VLMs | Tang Li et.al. | 2407.14412 | link |
2024-07-19 | The Vision of Autonomic Computing: Can LLMs Make It a Reality? | Zhiyang Zhang et.al. | 2407.14402 | null |
2024-07-19 | Frontiers of Deep Learning: From Novel Application to Real-World Deployment | Rui Xie et.al. | 2407.14386 | null |
2024-07-19 | Open Artificial Knowledge | Vadim Borisov et.al. | 2407.14371 | null |
2024-07-19 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu et.al. | 2407.14355 | link |
2024-07-19 | Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | Akash Kumar Mohankumar et.al. | 2407.14346 | null |
2024-07-19 | LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains | Raphael Hernandes et.al. | 2407.14344 | null |
2024-07-19 | Multimodal Misinformation Detection using Large Vision-Language Models | Sahar Tahmasebi et.al. | 2407.14321 | null |
2024-07-18 | Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | Charles Jin et.al. | 2407.13765 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
2024-07-18 | Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2407.13757 | null |
2024-07-18 | CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications | Mirza Masfiqur Rahman et.al. | 2407.13742 | null |
2024-07-18 | Baba Is AI: Break the Rules to Beat the Benchmark | Nathan Cloos et.al. | 2407.13729 | null |
2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | link |
2024-07-18 | Understanding Reference Policies in Direct Preference Optimization | Yixin Liu et.al. | 2407.13709 | link |
2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
2024-07-18 | Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation | Yotam Perlitz et.al. | 2407.13696 | link |
2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | null |
2024-07-18 | Shaded Route Planning Using Active Segmentation and Identification of Satellite Images | Longchao Da et.al. | 2407.13689 | null |
2024-07-18 | FuLG: 150B Romanian Corpus for Language Model Pretraining | Vlad-Andrei Bădoiu et.al. | 2407.13657 | null |
2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648 | null |
2024-07-18 | Weak-to-Strong Reasoning | Yuqing Yang et.al. | 2407.13647 | link |
2024-07-18 | Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | Chaofan Tao et.al. | 2407.13623 | link |
2024-07-18 | KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration | Youfu Yan et.al. | 2407.13598 | null |
2024-07-18 | PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks | Vishal Pallagani et.al. | 2407.13597 | null |
2024-07-18 | EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596 | link |
2024-07-18 | Robust Calibration of Large Vision-Language Adapters | Balamurali Murugesan et.al. | 2407.13588 | link |
2024-07-18 | Towards Zero-Shot Multimodal Machine Translation | Matthieu Futeral et.al. | 2407.13579 | link |
2024-07-17 | LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Kaichen Zhang et.al. | 2407.12772 | link |
2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
2024-07-17 | The Future of Learning: Large Language Models through the Lens of Students | He Zhang et.al. | 2407.12723 | null |
2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
2024-07-17 | Subgraph-Aware Training of Text-based Methods for Knowledge Graph Completion | Youmin Ko et.al. | 2407.12703 | link |
2024-07-17 | Patch-Level Training for Large Language Models | Chenze Shao et.al. | 2407.12665 | link |
2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
2024-07-17 | Domain-specific or Uncertainty-aware models: Does it really make a difference for biomedical text classification? | Aman Sinha et.al. | 2407.12626 | null |
2024-07-17 | Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | Claudio Pinhanez et.al. | 2407.12620 | null |
2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
2024-07-17 | VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Ofir Abramovich et.al. | 2407.12594 | null |
2024-07-18 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | **[link](https://github.com/la |