Daily Updates on 3D-Related Papers

This repository automatically fetches new and updated arXiv papers in the cs.CV category every day, uses ChatGPT to check whether each one is relevant to "3D reconstruction" or "3D generation", and lists the relevant papers below.

How It Works

  1. A GitHub Actions workflow runs daily at 09:00 UTC.
  2. It uses the script fetch_cv_3d_papers.py to:
    • Retrieve the latest arXiv papers in cs.CV.
    • Use ChatGPT to keep only the papers related to 3D reconstruction/generation (a minimal sketch of this step is shown below).
    • Update this README.md with the new findings.
    • Send an email via 163 Mail if any relevant papers are found.
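
The following is a minimal sketch of the fetch-and-filter step, not the actual fetch_cv_3d_papers.py: the function names, the relevance prompt, and the 7.0 score threshold are assumptions, and the ChatGPT call is stubbed out. It queries the public arXiv Atom API for the newest cs.CV submissions and keeps papers a scorer deems relevant.

```python
# Sketch only: mirrors the README's description of fetch_cv_3d_papers.py,
# but all names, the prompt, and the threshold are assumptions.
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv API: newest cs.CV submissions, most recent first.
ARXIV_API = (
    "http://export.arxiv.org/api/query"
    "?search_query=cat:cs.CV&sortBy=submittedDate"
    "&sortOrder=descending&max_results=100"
)
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def fetch_latest_cs_cv():
    """Return (arxiv_id, title, abstract) tuples for the newest cs.CV papers."""
    with urllib.request.urlopen(ARXIV_API) as resp:
        feed = ET.fromstring(resp.read())
    papers = []
    for entry in feed.iter(f"{ATOM}entry"):
        arxiv_id = entry.find(f"{ATOM}id").text.rsplit("/", 1)[-1]
        # Collapse the newlines arXiv inserts into titles/abstracts.
        title = " ".join(entry.find(f"{ATOM}title").text.split())
        abstract = " ".join(entry.find(f"{ATOM}summary").text.split())
        papers.append((arxiv_id, title, abstract))
    return papers


def is_relevant(title, abstract):
    """Stand-in for the ChatGPT relevance check described above.

    The real script would send `prompt` to the chat model and parse a
    numeric score; here a crude keyword test keeps the sketch runnable.
    """
    prompt = (
        "Rate from 0 to 10 how relevant this paper is to 3D reconstruction "
        f"or 3D generation.\nTitle: {title}\nAbstract: {abstract}"
    )
    # score = call_chat_model(prompt)  # hypothetical ChatGPT call
    score = 10.0 if "3d" in (title + abstract).lower() else 0.0
    return score >= 7.0  # assumed threshold


if __name__ == "__main__":
    for arxiv_id, title, abstract in fetch_latest_cs_cv():
        if is_relevant(title, abstract):
            print(f"[relevant] {arxiv_id} {title}")
```

The real script additionally rewrites the Paper List section of this README and sends the 163 Mail notification; those steps are omitted from the sketch.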

Paper List

arXiv 2025-02-07

Relevance Title Research Topic Keywords Pipeline

9.5 2502.03901 LeAP: Consistent multi-domain 3D labeling using Foundation Models
Simon Gebraad, Andras Palffy, Holger Caesar
3D Semantic Understanding
3D semantic labeling
Bayesian update
Vision Foundation Models
Input: Unlabeled image-pointcloud pairs
Step1: Generate soft 2D labels using Vision Foundation Models
Step2: Apply Bayesian updating to obtain 3D pseudo-labels
Step3: Use a 3D Consistency Network to improve label quality
Output: High-quality 3D semantic labels

9.5 2502.04318 sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views
Eyvaz Najafli, Marius Kästingschäfer, Sebastian Bernhard, Thomas Brox, Andreas Geiger
3D Reconstruction and Modeling
3D reconstruction
sparse views
latent features
Input: Sparse view images
Step1: Generate intermediate virtual views
Step2: Decode Gaussian primitives
Step3: Render novel views
Output: 360-degree reconstructed scene

9.0 2502.04139 Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation
Jiahao Lu, Jiacheng Deng, Tianzhu Zhang
3D Instance Segmentation
3D instance segmentation
transformer-based methods
Input: Scene point cloud
Step1: Query initialization
Step2: Hierarchical query fusion
Step3: Instance segmentation
Output: Binary foreground masks with semantic labels

8.5 2502.03510 Mapping and Localization Using LiDAR Fiducial Markers
Yibo Liu
Mapping and Localization
LiDAR
fiducial markers
mapping
localization
Input: LiDAR sensors and fiducial markers
Step1: Development of an intensity-image-based LiDAR fiducial marker system
Step2: Detection of 3D fiducials from intensity images
Step3: Algorithm enhancement for 3D map merging and localization
Output: Optimized mapping and localization using LFMs

8.5 2502.03628 The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
Zhuowei Li, Haizhou Shi, Yunhe Gao, Di Liu, Zhenting Wang, Yuxiao Chen, Ting Liu, Long Zhao, Hao Wang, Dimitris N. Metaxas
Vision-Language Models (VLMs)
Vision-Language Models
hallucination
VISTA
multimodal learning
Input: Visual tokens from large Vision-Language Models (LVLMs)
Step1: Analyze token logits ranking
Step2: Identify visual information loss
Step3: Propose the VISTA framework
Output: Enhanced decoding with reduced hallucination

8.5 2502.03639 Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen, Junli Cao, Anil Kag, Vidit Goel, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren
Image and Video Generation
Video Generation
3D Point Regularization
Diffusion Models
Input: 2D videos with 3D point trajectories
Step1: Data augmentation
Step2: Model fine-tuning
Step3: Regularization of shape and motion
Output: Enhanced video quality

8.5 2502.03836 Adapting Human Mesh Recovery with Vision-Language Feedback
Chongyang Xu, Buzhen Huang, Chengfang Zhang, Ziliang Feng, Yangang Wang
3D Reconstruction and Modeling
human mesh recovery
vision-language models
3D reconstruction
diffusion-based framework
Input: Monocular images
Step1: Initial pose prediction using a regression model
Step2: 2D keypoint extraction from images
Step3: Integration of vision-language descriptions
Step4: Refinement of the 3D mesh using diffusion modeling
Output: Enhanced 3D human mesh

8.5 2502.03877 Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks
Yuhui Jin, Yaqiong Zhang, Zheyuan Xu, Wenqing Zhang, Jingyu Xu
6D Object Detection and Pose Estimation
6D object detection
pose estimation
Hybrid Task Cascade
High-Resolution Network
Input: 6D object detection data
Step1: Hybrid Task Cascade integration
Step2: High-Resolution Network backbone usage
Step3: Advanced post-processing techniques
Output: Improved object detection and pose estimation models

8.5 2502.04111 Adaptive Margin Contrastive Learning for Ambiguity-aware 3D Semantic Segmentation
Yang Chen, Yueqi Duan, Runzhong Zhang, Yap-Peng Tan
3D Reconstruction and Modeling
3D Semantic Segmentation
Point Cloud Processing
Contrastive Learning
Input: 3D point cloud
Step1: Ambiguity estimation based on position embeddings
Step2: Development of an adaptive margin contrastive learning algorithm
Step3: Evaluation on large-scale datasets
Output: Improved semantic segmentation results

8.5 2502.04293 GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam
3D Reconstruction and Modeling
3D reconstruction
semantic shape
pose estimation
Input: Partial RGB-D observations
Step1: Semantic Shape Reconstruction (SSR)
Step2: Global Context Enhanced (GCE) feature fusion module
Output: Enhanced object poses

8.5 2502.04329 SMART: Advancing Scalable Map Priors for Driving Topology Reasoning
Junjie Ye, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Henrik I. Christensen, Yue Wang, Liu Ren
Autonomous Systems and Robotics
autonomous driving
lane topology reasoning
Input: Standard-definition (SD) and satellite maps
Step1: Train a map prior model to infer lane graphs
Step2: Integrate the model with online topology reasoning models
Output: Enhanced lane topology understanding

7.5 2502.03813 Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation
Xuan Li, Quanchao Lu, Yankaiqi Li, Muqing Li, Yijiashun Qi
Image Generation
semantic segmentation
attention mechanism
autonomous driving
Input: Multi-scale images
Step1: Implement the attention mechanism
Step2: Optimize the Unet architecture
Step3: Evaluate on the Cityscapes dataset
Output: Improved segmentation results

7.5 2502.04244 An object detection approach for lane change and overtake detection from motion profiles
Andrea Benericetti, Niccolò Bellaccini, Henrique Piñeiro Monteagudo, Matteo Simoncini, Francesco Sambo
Autonomous Driving
object detection
lane change
ADAS
motion profiles
autonomous driving
Input: Motion profile images
Step1: Dataset creation
Step2: Object detection model development
Step3: Performance evaluation
Output: Detection of lane change and overtake maneuvers

arXiv 2025-02-06

Relevance Title Research Topic Keywords Pipeline

9.5 2502.02936 Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud
Junkun Jiang, Jie Chen, Ho Yin Au, Mingyuan Chen, Wei Xue, Yike Guo
3D Reconstruction and Modeling
3D reconstruction
Joint Cloud
multi-view motion capture
Input: Multi-view images
Step1: Triangulate 2D joints into a Joint Cloud
Step2: Process with JCSAT to explore correlations
Step3: Use OTAP for feature selection
Output: 3D motion estimation

9.5 2502.03449 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics
Xuan Li, Chang Yu, Wenxin Du, Ying Jiang, Tianyi Xie, Yunuo Chen, Yin Yang, Chenfanfu Jiang
3D Reconstruction
3D reconstruction
garment generation
multi-view images
simulation-ready
Input: In-the-wild image
Step1: Pre-trained image-to-sewing-pattern generation model
Step2: Multi-view diffusion model for producing images
Step3: Refinement using a differentiable garment simulator
Output: Simulation-ready 3D garment

8.5 2502.02907 PoleStack: Robust Pole Estimation of Irregular Objects from Silhouette Stacking
Jacopo Villa, Jay W. McMahon, Issa A. D. Nesnas
3D Reconstruction and Modeling
3D pole estimation
silhouette stacking
Input: Silhouette images from multiple camera poses
Step1: Create a silhouette-stack image
Step2: Apply the Discrete Fourier Transform to enhance robustness
Step3: Estimate 3D pole orientation using projected-pole measurements
Output: Accurate pole orientation estimation

8.5 2502.02977 Disentangling CLIP Features for Enhanced Localized Understanding
Samyak Rawelekar, Yujun Cai, Yiwei Wang, Ming-Hsuan Yang, Narendra Ahuja
Vision-Language Models (VLMs)
mutual feature information (MFI)
vision-language models (VLM)
multi-label recognition (MLR)
Input: CLIP features from vision-language models
Step1: Analyze feature correlation
Step2: Implement the MFI loss
Step3: Align text and image features
Output: Improved localized understanding

8.5 2502.03005 Driver Assistance System Based on Multimodal Data Hazard Detection
Long Zhouxiang, Ovanes Petrosian
Autonomous Driving
multimodal data
hazard detection
autonomous driving
incident recognition
Input: Multimodal data (video, audio)
Step1: Data integration
Step2: Attention-based fusion strategy
Step3: Incident recognition
Output: Enhanced detection accuracy

8.5 2502.03465 Seeing World Dynamics in a Nutshell
Qiuhong Shen, Xuanyu Yi, Mingbao Lin, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
3D Reconstruction and Modeling
3D representation
Monocular video
Dynamic Gaussian Splatting
Input: Monocular videos
Step1: Transform videos into dynamic Gaussian representations
Step2: Introduce the STAG (structured spatio-temporally aligned Gaussian) representation
Step3: Optimize for spatial and temporal coherence
Output: High-fidelity video reconstruction and spatio-temporal modeling

7.5 2502.02951 VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
Madhuri Latha Madaka, Chakravarthy Bhagvati
Vision-Language Models (VLMs)
Visual Question Answering
VQA dataset
Hierarchical questions
Input: Visual content and questions
Step1: Dataset development
Step2: Classification of questions
Step3: Initial testing on VQA systems
Output: VQA-Levels dataset

arXiv 2025-02-05

Relevance Title Research Topic Keywords Pipeline

9.5 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan
Depth Estimation
monocular depth estimation
3D reconstruction
generative models
autonomous driving
Input: RGB image
Step1: Extract latent features using an Image Encoder
Step2: Extract a semantic vector through an Image Semantic Encoder
Step3: Integrate features within a denoising UNet
Step4: Generate the final metric depth map
Output: Enhanced depth prediction

9.5 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash
3D Reconstruction and Modeling
3D Gaussian Splatting
diffusion models
3D generation
structured representation
Input: 3D Gaussian Splatting data
Step1: Spherical mapping to transform the data into a structured 2D representation
Step2: Multi-branch network for feature compression
Step3: Integration with existing 2D models via zero-shot learning
Output: Structured 3D representation ready for generative tasks

9.5 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction
Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt
3D Reconstruction and Modeling
3D reconstruction
shape abstraction
cuboids
unsupervised learning
structural analysis
Input: Collections of 3D shapes
Step1: Initialize with a fine reconstruction to capture details
Step2: Gradually reduce primitives while optimizing the loss
Step3: Evaluate performance on shape benchmarks
Output: Compact cuboid-based representations

9.5 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection
Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee
3D Object Detection
LiDAR-camera fusion
3D object detection
autonomous driving
Input: LiDAR and camera data
Step1: Spatio-Temporal Feature Aggregation (STFA) module processes the input
Step2: Reliability module assigns confidence scores
Step3: Confidence-Weighted Mutual Cross-Attention (CW-MCA) module balances information by confidence
Output: Enhanced 3D object detection

9.5 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy
Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi
3D Perception and Modeling
LiDAR
3D perception
object detection
Input: Noisy LiDAR data
Step1: Meta-learning phase
Step2: Generate robust saliency maps
Step3: Adversarial curriculum training
Output: Enhanced noise resilience

9.5 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration
Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo
3D Registration
3D registration
point cloud
outlier removal
reconstruction
robustness
Input: Point cloud data
Step1: Prior-guided local grouping using generalized mutual matching
Step2: Local correspondence correction using center-aware three-point consistency
Step3: Global correspondence refinement over extensive iterations
Output: High-quality point correspondences

9.5 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion
Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun
3D Generation
3D Generation
Shape Variations
Input: Reference 3D model
Step1: Sparse voxel grid and point sampling
Step2: Multiscale neural architecture training
Step3: Generate shape variations
Output: High-quality 3D shapes

9.5 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He
3D Reconstruction and Modeling
3D point cloud analysis
domain generalization
rotation robustness
Input: 3D point clouds
Step1: Identify challenging rotations
Step2: Construct an intricate orientation set
Step3: Use contrastive learning against orientations
Output: Generalizable features with rotation consistency

9.5 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang
3D Reconstruction and Modeling
3D Gaussian Splatting
Structure-from-Motion
point clouds
novel view synthesis
Input: Sparse SfM point clouds
Step1: Dynamic sampling
Step2: Gaussian Process modeling
Step3: Densification of point clouds
Output: Enhanced 3D Gaussian representation

9.5 2502.02334 Event-aided Semantic Scene Completion
Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang
3D Reconstruction and Modeling
Semantic Scene Completion
3D Reconstruction
Input: Multi-view images
Step1: Data integration
Step2: Algorithm development
Step3: Model evaluation
Output: Enhanced 3D models

9.5 2502.02338 Geometric Neural Process Fields
Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves
Neural Rendering
Neural Radiance Fields
3D scenes
probabilistic modeling
Input: Limited context images
Step1: Probabilistic modeling
Step2: Integrate geometric bases
Step3: Hierarchical latent variable design
Output: Improved generalization

9.5 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng
Neural Rendering
Neural Radiance Fields
avatar generation
continual learning
Input: Image data of avatars
Step1: Implement a continual learning strategy
Step2: Develop a Global-Local Joint Storage Module
Step3: Develop a Pose Distillation Module
Output: Maintainable virtual avatar

9.5 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy
3D Segmentation
3D segmentation
open-vocabulary
Vision-Language Models
Input: Multi-view images
Step1: Data generation
Step2: Data annotation
Step3: Model training
Output: Open-vocabulary segmentation model

9.5 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan
3D Reconstruction and Modeling
3D articulated objects
Vision-Language Models
3D modeling
Input: 3D meshes
Step1: Movable Part Segmentation
Step2: Articulation Estimation
Step3: Refinement
Output: Articulated 3D objects

9.2 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach
Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz
Autonomous Driving
3D reconstruction
autonomous driving
depth maps
Input: Images from 4D radar detectors and RGB cameras
Step1: Integrate radar depth maps and RGB images
Step2: Apply a pixel positional encoding algorithm
Step3: Develop spectrum estimation algorithms
Step4: Train depth-map generative models
Output: Enhanced depth maps

9.2 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation
Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde
3D Reconstruction and Modeling
3D Reconstruction
Dense Depth Generation
LiDAR
Input: LiDAR sensor data
Step1: 3D environment reconstruction
Step2: Dynamic object classification
Step3: Dense depth generation
Output: Dense depth annotations

8.5 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
Dazhou Yu, Genpei Zhang, Liang Zhao
3D Reconstruction and Modeling
3D reconstruction
polyhedral representation
surface-attributed graph
Input: Polyhedral data
Step1: Decompose into local rigid representations
Step2: Hierarchical aggregation of representations
Output: Global representation of polyhedra

8.5 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Goodarz Mehr, Azim Eskandarian
Autonomous Systems and Robotics
Synthetic Data Generation
Autonomous Driving
BEV Representation
Input: Multi-sensor data collection
Step1: Configuration of synthetic data generation
Step2: Data generation for BEV representation
Step3: Annotation of perception data
Output: SimBEV dataset with annotated driving scenarios

8.5 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation
Yang Zhou, Zongjin He, Qixuan Li, Chao Wang
3D Generation
3D scene generation
physically consistent layouts
text-guided generation
Input: Text prompt
Step1: Convert text to a scene graph
Step2: Adjust Gaussian densities and layouts
Step3: Make dynamic camera adjustments
Output: 3D compositional scene generation

8.5 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning
Chengwei Xia, Chaoxi Niu, Kun Zhan
Multi-view and Stereo Vision
multiview feature learning
hierarchical consensus
3D reconstruction
Input: Multi-view images
Step1: Learn view-consistent features
Step2: Hierarchical consensus derivation
Step3: Comprehensive feature extraction
Output: Discriminative features

8.5 2502.02091 Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
JooHyun Kwon, Hanbyel Cho, Junmo Kim
Image and Video Generation
4D Gaussian Splatting
dynamic scene editing
computer vision
motion artifacts
Input: 4D dynamic scene data
Step1: Model static 3D Gaussians
Step2: Implement a Hexplane-based deformation field
Step3: Perform editing on the static 3D Gaussians
Step4: Apply score distillation for refinement
Output: Enhanced edited dynamic scenes

8.5 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu
3D Object Detection
3D object detection
autonomous driving
generalization
Input: Source-domain 3D point clouds
Step1: Downsample the point cloud based on confidence scores
Step2: Teacher-student framework to align BEV features
Step3: Apply FCA and GERA to maintain consistency
Output: Domain-agnostic 3D object detector

8.5 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute
Akash Haridas, Imran N. Junejo
3D Reconstruction and Modeling
3D Morphable Models
Photo-realistic Rendering
Avatar Generation
Input: Images from consumer-grade laptop webcams
Step1: Shape generation by fitting 3DMM shape parameters
Step2: Texture map generation
Step3: Rendering using pre-defined parameters
Output: High-fidelity animatable avatars

8.5 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks
Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao
Autonomous Systems and Robotics
Collaborative Object Detection
Uncertainty Quantification
Adversarial Attacks
Autonomous Driving
Input: Collaborative Object Detection (COD) models
Step1: Apply adversarial training during collaboration
Step2: Estimate output uncertainty through a learning-based module
Step3: Calibrate uncertainty using conformal prediction
Output: Enhanced object detection accuracy

7.5 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen
Vision-Language Models (VLMs)
vision-language models
Decomposed Attention
cross-modal learning
Input: Visual and textual embeddings
Step1: Decompose the self-attention mechanism
Step2: Optimize visual-to-visual self-attention
Step3: Merge visual and textual information
Output: Improved efficiency and performance of LVLMs

7.5 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu
Vision-Language Models (VLMs)
Vision-Language Models
object hallucination
attention calibration
Input: Large Vision-Language Models (LVLMs)
Step1: Bias estimation from the input image
Step2: Uniform Attention Calibration (UAC) application
Step3: Dynamic Attention Calibration (DAC) implementation
Output: Reduced object hallucination

arXiv 2025-02-05

Relevance Title Research Topic Keywords Pipeline

9.5 2502.01814 PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
Dazhou Yu, Genpei Zhang, Liang Zhao
3D Reconstruction and Modeling
3D reconstruction
polyhedral representation
Input: 3D polyhedral objects
Step1: Surface-attributed graph construction
Step2: Local rigid representation learning
Step3: Hierarchical aggregation of representations
Output: Global representation of polyhedra

9.5 2502.01846 UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash
3D Reconstruction and Modeling
3D Gaussian Splatting
UV Mapping
image-based generation
3D reconstruction
Input: 3D Gaussian Splatting (3DGS) data
Step1: Spherical mapping to create a structured 2D representation
Step2: Compression of heterogeneous features into a shared feature space
Step3: Integration with pre-trained 2D generative models
Output: Structured 2D UV Gaussian Splatting representation

9.5 2502.01856 Reliability-Driven LiDAR-Camera Fusion for Robust 3D Object Detection
Reza Sadeghian, Niloofar Hooshyaripour, Chris Joslin, WonSook Lee
3D Object Detection
3D object detection
LiDAR-camera fusion
autonomous driving
Input: Sensor data from LiDAR and camera
Step1: Integration of spatial and semantic information
Step2: Implementation of a Reliability module to assess confidence
Step3: Use of CW-MCA for dynamic weighting of modalities
Output: Robust 3D object detection results

9.5 2502.01940 Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach
Mohammed Alsakabi, Aidan Erickson, John M. Dolan, Ozan K. Tonguz
Depth Estimation
Depth Estimation
Autonomous Vehicles
Radar-RGB Integration
Input: Radar depth maps and RGB images
Step1: Pixel positional encoding
Step2: Transformation to a spatial spectrum
Step3: Generate denser depth maps
Output: Enhanced depth maps

9.5 2502.02144 DOC-Depth: A novel approach for dense depth ground truth generation
Simon de Moreau, Mathias Corsia, Hassan Bouchiba, Yasser Almehio, Andrei Bursuc, Hafid El-Idrissi, Fabien Moutarde
Depth Estimation
depth estimation
LiDAR
3D reconstruction
Input: LiDAR measurements
Step1: Data aggregation
Step2: Dynamic object classification
Step3: Dense depth generation
Output: Fully-dense depth annotations

9.5 2502.02163 Progressive Correspondence Regenerator for Robust 3D Registration
Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo
3D Registration
3D registration
point cloud registration
Input: Point clouds from different perspectives
Step1: Prior-guided local grouping
Step2: Generalized mutual matching
Step3: Center-aware three-point consistency
Step4: Global correspondence refinement
Output: High-quality correspondences

9.5 2502.02187 ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion
Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun
3D Generation
3D Generation
shape variations
multiscale neural architecture
interactive generation
Input: A single reference 3D model
Step1: Shape variation generation
Step2: Multiscale diffusion sampling
Step3: Interactive editing
Output: High-quality 3D shape variants

9.5 2502.02247 Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning
Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Shengfeng He
3D Reconstruction and Modeling
3D point cloud
domain generalization
rotation robustness
Input: Point clouds with variable orientations
Step1: Identify challenging rotations
Step2: Construct an intricate orientation set
Step3: Apply contrastive learning using intricate samples
Output: Enhanced orientation-aware 3D representations

9.5 2502.02283 GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang
3D Reconstruction and Modeling
3D reconstruction
Gaussian Processes
novel view synthesis
Input: Sparse SfM point clouds
Step1: Develop a multi-output Gaussian Process (MOGP) model
Step2: Adaptive sampling and filtering strategy
Step3: Densify the point clouds
Output: High-quality 3D Gaussians

9.5 2502.02322 Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
Hsin-Cheng Lu, Chung-Yi Lin, Winston H. Hsu
3D Object Detection
3D object detection
autonomous driving
domain generalization
Input: LiDAR point clouds from various domains
Step1: Data subsampling based on confidence scores
Step2: Teacher-student framework implementation
Step3: Feature alignment between domains
Output: Generalized 3D object detector

9.5 2502.02334 Event-aided Semantic Scene Completion
Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang
3D Reconstruction and Modeling
3D reconstruction
semantic scene completion
autonomous driving
event cameras
Input: Event and RGB images
Step1: Data integration
Step2: Event-aided Lifting Module (ELM)
Step3: 3D scene reconstruction
Output: Enhanced 3D semantic occupancy models

9.5 2502.02338 Geometric Neural Process Fields
Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves
Neural Rendering
Neural Radiance Fields
Geometric Neural Process Fields
3D reconstruction
Input: Limited context observations
Step1: Formulate NeF generalization as a probabilistic problem
Step2: Design geometric bases to encode structural information
Step3: Develop a hierarchical latent variable model for parameterization
Output: Improved generalization for novel scenes and signals

9.5 2502.02548 Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy
3D Segmentation
3D segmentation
open-vocabulary
Input: 3D scene datasets
Step1: Data generation
Step2: Model training
Step3: Segmentation validation
Output: Open-vocabulary 3D segmentation results

9.5 2502.02590 Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan
3D Reconstruction and Modeling
3D modeling
articulated objects
Input: 3D mesh
Step1: Movable Part Segmentation
Step2: Articulation Estimation and Refinement
Output: Articulated 3D object

9.0 2502.01666 Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Jingming Xia, Guanqun Cao, Guang Ma, Yiben Luo, Qinzhao Li, John Oyekan
Depth Estimation
Monocular Depth Estimation
Autonomous Driving
3D Reconstruction
Input: Single RGB image
Step1: Image-based semantic embedding using SeeCoder
Step2: Integration of features via a denoising UNet
Step3: Depth map generation
Output: Enhanced depth map

9.0 2502.01855 Learning Fine-to-Coarse Cuboid Shape Abstraction
Gregor Kobsik, Morten Henkel, Yanjiang He, Victor Czech, Tim Elsner, Isaak Lim, Leif Kobbelt
3D Reconstruction and Modeling
3D shape abstraction
unsupervised learning
cuboids
Input: Collections of 3D shapes
Step1: Initial fine reconstruction
Step2: Apply fine-to-coarse abstraction
Step3: Optimize reconstruction and volume preservation
Output: Cuboid-based structural abstraction

8.5 2502.01894 SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Goodarz Mehr, Azim Eskandarian
Autonomous Driving
BEV perception
synthetic data generation
autonomous driving
Input: Multi-sensor data
Step1: Data generation
Step2: Ground-truth capture
Step3: Dataset creation
Output: Comprehensive BEV dataset

8.5 2502.01896 INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy
Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi
3D Point Cloud Processing
LiDAR
adversarial training
3D perception
Input: Noisy LiDAR data
Step1: Prepare saliency maps
Step2: Apply adversarial curriculum training
Step3: Train the student network
Output: Robust deep learning model

8.5 2502.01949 LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation
Yang Zhou, Zongjin He, Qixuan Li, Chao Wang
3D Generation
3D scene generation
3D Gaussian Splatting
physics-guided generation
Input: Text prompt
Step1: Convert text to a scene graph
Step2: Adjust density and layout
Step3: Dynamic camera adjustments
Output: Compositional 3D scenes

8.5 2502.01961 Hierarchical Consensus Network for Multiview Feature Learning
Chengwei Xia, Chaoxi Niu, Kun Zhan
Multi-view and Stereo Vision
Multiview Learning
Consensus Learning
Feature Integration
Input: Multi-view data
Step1: Learn distinct and common information
Step2: Derive consensus indices
Step3: Perform hierarchical consensus learning
Output: Comprehensive and discriminative features

8.5 2502.01969 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu
Vision-Language Models (VLMs)
Vision-Language Models
object hallucination
Input: LVLMs with visual tokens
Step1: Analyze attention biases
Step2: Implement Uniform Attention Calibration (UAC)
Step3: Develop Dynamic Attention Calibration (DAC)
Output: Improved alignment and reduced hallucination

8.5 2502.02171 DeepForest: Sensing Into Self-Occluding Volumes of Vegetation With Aerial Imaging
Mohamed Youssef, Jian Peng, Oliver Bimber
3D Reconstruction and Modeling
3D reconstruction
remote sensing
vegetation analysis
Input: Aerial images from drones
Step1: Synthetic-aperture imaging
Step2: Use 3D convolutional neural networks to reduce out-of-focus signals
Step3: Combine multiple reflectance stacks from various spectral channels
Output: Volumetric representations of vegetation

8.5 2502.02372 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng
Neural Rendering
Neural Radiance Fields
3D rendering
continual learning
Input: Limited training data
Step1: Employ NeRF for 3D rendering
Step2: Implement a Global-Local Joint Storage Module
Step3: Utilize a Pose Distillation Module
Output: Maintainable virtual avatars

8.5 2502.02468 High-Fidelity Human Avatars from Laptop Webcams using Edge Compute
Akash Haridas, Imran N. Junejo
3D Reconstruction and Modeling
3D reconstruction
avatar generation
differentiable rendering
Input: Consumer-grade laptop webcam images
Step1: Shape generation using 3D morphable models
Step2: Landmark detection using optimization
Step3: Texture generation with GANs
Step4: Differentiable rendering to create avatars
Output: High-fidelity human avatars

8.5 2502.02525 Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
Jian Liu, Wei Sun, Hui Yang, Pengchao Deng, Chongpei Liu, Nicu Sebe, Hossein Rahmani, Ajmal Mian
Object Pose Estimation
9-DoF object pose estimation
domain generalization
robotic grasping
Input: Rendered synthetic data
Step1: Model training
Step2: Pose estimation
Step3: Real-time performance optimization
Output: Estimated 9-DoF object poses

8.5 2502.02537 Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks
Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao
Autonomous Systems and Robotics
Collaborative Object Detection
Uncertainty Quantification
Adversarial Robustness
Autonomous Vehicles
Input: Collaborative object detection models
Step1: Adversarial training for robustness
Step2: Uncertainty quantification estimation
Step3: Calibration of uncertainty using conformal prediction
Output: Enhanced object detection accuracy

8.0 2502.01890 Geometric Framework for 3D Cell Segmentation Correction
Peter Chen, Bryan Chang, Olivia Annette Creasey, Julie Beth Sneddon, Yining Liu
3D Reconstruction and Modeling
3D Segmentation
Geometric Framework
Input: 2D cell segmentation results
Step1: Extract geometric features
Step2: Train a binary classifier
Step3: Correct segmentation errors
Output: Accurate 3D cell body reconstruction

8.0 2502.01906 Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen
Vision-Language Models (VLMs)
Vision-Language Models
Decomposed Attention
Computational Efficiency
Input: Visual and textual embeddings
Step1: Decompose the attention mechanism
Step2: Optimize visual-to-visual self-attention
Step3: Debias positional encodings
Output: Enhanced processing of visual and textual embeddings

7.5 2502.02225 Exploring the latent space of diffusion models directly through singular value decomposition
Li Wang, Boyan Gao, Yanran Li, Zhao Wang, Xiaosong Yang, David A. Clifton, Jun Xiao
Image Generation
diffusion models
image editing
latent space
Singular Value Decomposition
image generation
Input: Latent space of diffusion models
Step1: Investigate the latent space using Singular Value Decomposition (SVD)
Step2: Discover properties of the latent space
Step3: Propose an image editing framework based on these properties
Output: Enhanced image editing capabilities

arXiv 2025-02-04

Relevance Title Research Topic Keywords Pipeline

9.5 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee
3D Reconstruction and Modeling
3D instance segmentation
Gaussian Splatted Radiance Fields
novel view synthesis
Input: Posed 2D image data
Step1: Extract per-image 2D segmentation masks
Step2: 2D-to-3D lifting to assign unique object IDs
Step3: Incremental merging of object fragments into coherent objects
Output: High-quality 3D object segments

9.5 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics
Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang
3D Shape Generation
3D reconstruction
shape generation
semantic input
Input: Semantic descriptions
Step1: Distill 3D geometry from 2D diffusion models
Step2: Refine textures using image and video generation models
Step3: Represent the refined 3D model with neural implicit representations
Output: Fabricable high-quality meshes

9.5 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration
Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan
3D Reconstruction and Modeling
LiDAR-camera calibration
3D reconstruction
autonomous driving
Input: LiDAR and camera data
Step1: Environment interpretation
Step2: Data fusion
Step3: Dual-path correspondence matching
Step4: Spatial-temporal optimization
Output: Accurate extrinsic calibration

9.5 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction
Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo
3D Reconstruction and Modeling
3D reconstruction
generative models
dynamic avatars
Input: Monocular video
Step1: Generative prior usage
Step2: Dual-Space Optimization
Step3: View selection strategy
Step4: Pose feature injection
Output: High-fidelity dynamic human avatars

9.5 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control
Diego Gomez, Bingchen Gong, Maks Ovsjanikov
3D Reconstruction and Modeling
Few-Shot NeRF
3D Reconstruction
Neural Rendering
Input: Limited input views
Step1: Frequency control
Step2: Curriculum training
Step3: Scene reconstruction
Output: Accurate 3D representations

9.2 2502.00262 [Title unavailable: the fetched metadata contained an arXiv submission-error message instead of the paper title]
Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang
Autonomous Systems and Robotics
hazard detection
vision-language model
autonomous driving
Input: Multimodal data fusion
Step1: Semantic and visual input integration
Step2: Supervised fine-tuning of vision-language models
Step3: Hazard detection and edge-case evaluation
Output: Enhanced situational awareness

9.2 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model
Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim
3D Object Detection
3D object detection
monocular vision
depth estimation
Input: Monocular images
Step1: Depth estimation using a Vision Transformer
Step2: Feature extraction with Hierarchical Feature Fusion
Step3: Object detection using the DETR architecture
Output: 3D bounding boxes for detected objects

8.5 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection
Dong-Hee Paek, Seung-Hyun Kong
3D Object Detection
4D Radar
3D object detection
energy efficiency
autonomous driving
Input: 4D radar point clouds
Step1: Convert RTNH to an SNN architecture
Step2: Implement biological top-down inference (BTI)
Step3: Model evaluation and comparison
Output: Energy-efficient 3D object detection model

8.5 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering
Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar
Vision-Language Models (VLMs)
3D Scene Question Answering
multimodal models
Input: 3D scene representation and query
Step1: Systematic literature review
Step2: Dataset analysis
Step3: Methodology evaluation
Output: Comprehensive insights and challenges on 3D SQA

8.5 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao, Zhao Song, Chiwun Yang
Video Generation
video generation
interpolation
extrapolation
latent flow matching
Input: Video frames
Step1: Model the latent flow
Step2: Polynomial projection
Step3: Generate time-dependent frames
Output: Video with interpolation and extrapolation

8.5 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
Qixuan Li, Chao Wang, Zongjin He, Yan Peng
3D Generation
3D generation
compositional scenes
large language models
Input: Complex scene descriptions
Step1: Semantic parsing and relationship extraction
Step2: Scene graph generation
Step3: 2D and 3D asset generation
Step4: Layout prediction and planning
Output: High-quality 3D compositional scenes

8.5 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma
Vision-Language Models (VLMs)
Visual Question Answering
Vision-Language Models
Autonomous Driving
Input: Visual Question Answering tasks in autonomous driving
Step1: Integrate Vision-Language Models with continual learning
Step2: Implement selective memory replay and knowledge distillation
Step3: Apply task-specific projection layer regularization
Output: Enhanced VQA performance in autonomous driving environments

8.5 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk
3D Reasoning in Scenes
3D reasoning
visual question answering
hypothetical reasoning
Input: Context change descriptions
Step1: Dataset construction
Step2: Model evaluation
Output: Performance analysis

8.5 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation
Mingyu Yang, Jitong Lu, Hun-Seok Kim
3D Semantic Segmentation
3D semantic segmentation
domain adaptation
pseudo labels
autonomous driving
Input: 3D point cloud and SAM masks
Step1: Class label determination using majority voting
Step2: Application of filtering constraints to unreliable labels
Step3: Geometry-Aware Progressive Propagation (GAPP) to propagate labels to all 3D points
Output: Enhanced pseudo-labels and improved segmentation performance

8.5 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation
Yicong Hong, Long Mai, Yuan Yao, Feng Liu
Image and Video Generation
image generation
video generation
state-space models
transformer models
Input: Images and video sequences
Step1: Develop an SSM-Transformer hybrid model
Step2: Efficient processing of visual sequences
Step3: Generate images and videos
Output: High-quality images and dynamic videos

8.5 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking
Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He
Autonomous Systems and Robotics
6D pose estimation
bin-picking
zero-shot learning
robotic manipulation
Input: RGB-D image and CAD model
Step1: Object detection
Step2: Point cloud extraction
Step3: Position-Aware Correspondence learning
Step4: Pose estimation
Output: 6D pose predictions

8.5 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi
Neural Rendering
differentiable rendering
volumetric meshes
real-time rendering
Input: Volumetric mesh representations
Step1: Mesh parameterization
Step2: Differentiable ray tracing
Step3: Rendering and evaluation
Output: Real-time rendering results

8.5 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras
Henrik Toikka, Eerik Alamikkotervo, Risto Ojala
Autonomous Systems and Robotics
road segmentation
autonomous vehicles
image registration
deep learning
Input: Roadside camera images
Step1: Automatic data collection
Step2: Semi-automatic annotation method
Step3: Image registration to correct labels
Output: Enhanced road segmentation models

8.5 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications
Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang
Autonomous Systems and Robotics
Visual Inertial Odometry
Initialization
Feature Matching
AR
VR
Input: Visual Inertial Odometry (VIO) data
Step1: Initialization using gyroscope and visual measurements
Step2: Hybrid feature matching using optical flow and descriptor methods
Step3: Evaluation on benchmarks and practical applications
Output: Enhanced VIO performance

8.5 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar
Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong
Autonomous Driving
3D multi-object tracking
4D Radar
Input: 4D radar data
Step1: Object detection using Bayesian approximation
Step2: Motion prediction with a transformer network
Step3: Two-stage data association integrating Doppler measurements
Output: Accurate 3D MOT results

8.5 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang
3D Visual Grounding
3D visual grounding
Large Language Model
3D reconstruction
vision-language model
Input: Referring utterances and 3D scene scans
Step1: Parse the utterance into a symbolic expression
Step2: Generate spatial relation features
Step3: Use a VLM to process visual information
Output: Identified target object

8.0 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data
Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du
Image Generation
Generative Adversarial Networks
Data Augmentation
Image Generation
Input: Limited training data
Step1: Estimate covariance matrices
Step2: Identify semantic transformation directions
Step3: Apply adversarial semantic augmentation
Output: Improved generation quality

7.5 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models
Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li
Vision-Language Models (VLMs)
vision-language models
knowledge forgetting
general attributes
Input: Pretrained Vision-Language Models (VLMs)
Step1: Generate general attribute descriptions
Step2: Establish vision-GA-class associations
Step3: Tune the visual encoder
Output: Enhanced adaptation with reduced knowledge forgetting

7.5 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng
Image Generation
Diffusion Model
Image Generation
Video Generation
Input: Diffusion Model (DM)
Step1: Analyze variance and bias
Step2: Develop the Recursive Likelihood Ratio optimizer
Step3: Validate on image and video tasks
Output: Fine-tuned model

7.0 2502.01530 The in-context inductive biases of vision-language models differ across modalities
Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen
Vision-Language Models (VLMs)
vision-language models
inductive biases
generalization
Input: Visual and textual stimuli
Step1: Inductive bias analysis
Step2: Experimental paradigm application
Step3: Data collection and evaluation
Output: Insights on model generalization

6.5 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective
Xiaorui Ma, Haoran Xie, S. Joe Qin
Vision-Language Models (VLMs)
multimodal learning
Large Language Models
parameter-efficient learning
Vision-Language Models
Input: Vision-language models
Step1: Categorize and review VLLMs
Step2: Discuss training paradigms
Step3: Summarize benchmarks
Output: Comprehensive survey report

Arxiv 2025-02-04

Relavance Title Research Topic Keywords Pipeline
9.5 [9.5] 2502.00173 Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
[{'name': 'Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee'}]
3D Reconstruction and Modeling 三维重建 3D instance segmentation 3D实例分割
Gaussian Splatted Radiance Fields 高斯喷溅辐射场
Input: 2D segmentation masks 2D分割掩码
Step1: Feature integration 特征集成
Step2: 3D Gaussian lifting 3D高斯提升
Step3: Segmentation application 分割应用
Output: 3D segmented assets 3D分割资产
9.5 [9.5] 2502.00360 Shape from Semantics: 3D Shape Generation from Multi-View Semantics
[{'name': 'Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang'}]
3D Generation 三维生成 3D reconstruction
shape generation
semantics
Input: Multi-view semantics 多视角语义
Step1: Semantic input analysis 语义输入分析
Step2: Geometry and appearance distillation from 2D models 从2D模型提取几何与外观
Step3: Image restoration and detail enhancement 图像修复与细节增强
Step4: Shape reconstruction using neural SDF representation 使用神经签名距离场重建形状
Output: Complex detailed 3D meshes 复杂细节的三维网格
9.5 [9.5] 2502.00801 Environment-Driven Online LiDAR-Camera Extrinsic Calibration
[{'name': 'Zhiwei Huang, Jiaqi Li, Ping Zhong, Rui Fan'}]
3D Reconstruction and Modeling 三维重建 LiDAR-camera calibration
3D reconstruction
data fusion
Input: LiDAR and camera data LiDAR和相机数据
Step1: Environmental interpretation 环境解释
Step2: Dual-path correspondence matching 双路径对应匹配
Step3: Spatial-temporal optimization 空间时间优化
Output: Precise extrinsic calibration 精确的外部标定
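
For the calibration entry above, the basic primitive every LiDAR-camera extrinsic pipeline optimizes around is reprojecting LiDAR points into the image with a candidate extrinsic. A minimal sketch with hypothetical matrices and frame conventions, not the paper's actual optimizer:

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    points_lidar : (N, 3) points in the LiDAR frame
    T_cam_lidar  : (4, 4) candidate extrinsic (LiDAR -> camera)
    K            : (3, 3) camera intrinsic matrix
    Returns (M, 2) pixel coordinates of the points in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # transform to camera frame
    in_front = pts_cam[:, 2] > 0.1               # keep points with positive depth
    uvw = (K @ pts_cam[in_front].T).T
    return uvw[:, :2] / uvw[:, 2:3]              # perspective divide

# Toy usage: identity extrinsic, simple pinhole intrinsics.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.random.uniform([-5, -5, 2], [5, 5, 30], size=(1000, 3))
print(project_lidar_to_image(pts, np.eye(4), K).shape)
```
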
8.5 [8.5] 2502.00074 SpikingRTNH: Spiking Neural Network for 4D Radar Object Detection
[{'name': 'Dong-Hee Paek, Seung-Hyun Kong'}]
3D Object Detection 三维物体检测 3D object detection
neural networks
autonomous driving
Input: 4D Radar data 4D 雷达数据
Step1: Process high-density point clouds 处理高密度点云
Step2: Implement spiking neural network architecture 实现脉冲神经网络架构
Step3: Apply biological top-down inference (BTI) 应用生物学的自上而下推理法
Output: Efficient 3D object detection results 高效的三维物体检测结果
8.5 [8.5] 2502.00262 [title unavailable: arXiv metadata returned a submission note about a missing main.bbl instead of the paper title]
[{'name': 'Dianwei Chen, Zifan Zhang, Yuchen Liu, Xianfeng Terry Yang'}]
Autonomous Driving 自动驾驶 hazard detection
autonomous driving
multimodal data fusion
Input: Multimodal data 输入: 多模态数据
Step1: Data integration 数据集成
Step2: Hazard detection 危险检测
Step3: Spatial localization 空间定位
Output: Enhanced hazard prediction 改进的危险预测
8.5 [8.5] 2502.00315 MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model
[{'name': 'Jihyeok Kim, Seongwoo Moon, Sungwon Nah, David Hyunchul Shim'}]
3D Reconstruction 三维重建 3D object detection
depth estimation
Input: Monocular images 单目图像
Step1: Feature extraction using Vision Transformer 基于视觉变换器的特征提取
Step2: Depth estimation using a relative depth model 使用相对深度模型进行深度估计
Step3: Object detection using DETR architecture 使用DETR架构进行物体检测
Output: Enhanced 3D object detection capabilities 改进的3D物体检测能力
8.5 [8.5] 2502.00528 Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings
[{'name': 'Zachary Huemann, Samuel Church, Joshua D. Warner, Daniel Tran, Xin Tie, Alan B McMillan, Junjie Hu, Steve Y. Cho, Meghan Lubner, Tyler J. Bradshaw'}]
VLM & VLA 视觉语言模型 3D vision-language model
PET/CT
visual grounding
Input: PET/CT reports and images PET/CT 报告和图像
Step1: Automation of weak labeling pipeline 弱标记生成管道自动化
Step2: Data extraction from reports 报告中数据提取
Step3: Training of ConTEXTual Net 3D 训练 ConTEXTual Net 3D
Output: 3D visual grounding model 3D 视觉定位模型
8.5 [8.5] 2502.00708 PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation
[{'name': 'Qixuan Li, Chao Wang, Zongjin He, Yan Peng'}]
3D Generation 三维生成 text-to-3D generation
compositional scenes
physics-guided generation
Input: Complex scene descriptions 复杂场景描述
Step1: Scene graph generation 场景图生成
Step2: Asset creation using multimodal agents 使用多模态代理进行资产创建
Step3: Layout prediction with physical model 使用物理模型进行布局预测
Output: Compositional scenes with physical rationality 具有物理合理性的组合场景
8.5 [8.5] 2502.00843 VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
[{'name': 'Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma'}]
VLM & VLA 视觉语言模型与视觉语言对齐 Vision-Language Models
Visual Question Answering
autonomous driving
continual learning
Input: Visual Question Answering tasks in autonomous driving 在自动驾驶中的视觉问答任务
Step1: Integrate Vision-Language Models with continual learning 整合视觉语言模型与持续学习
Step2: Implement selective memory replay and knowledge distillation 实施选择性记忆重放和知识蒸馏
Step3: Apply task-specific projection layer regularization 应用任务特定投影层正则化
Output: Improved VQA system performance 改进的视觉问答系统性能
8.5 [8.5] 2502.00954 Hypo3D: Exploring Hypothetical Reasoning in 3D
[{'name': 'Ye Mao, Weixun Luo, Junpeng Jing, Anlan Qiu, Krystian Mikolajczyk'}]
3D Reasoning 3D推理 3D reasoning
Visual Question Answering
scene understanding
Input: Context changes and indoor scene descriptions 上下文变化和室内场景描述
Step1: Benchmark formulation 基准测试制定
Step2: Model performance evaluation 模型性能评估
Output: Hypothetical reasoning capabilities 设想推理能力
8.5 [8.5] 2502.00960 SAM-guided Pseudo Label Enhancement for Multi-modal 3D Semantic Segmentation
[{'name': 'Mingyu Yang, Jitong Lu, Hun-Seok Kim'}]
3D Reconstruction and Modeling 三维重建 3D semantic segmentation
domain adaptation
pseudo-labels
autonomous driving
Input: 3D point cloud and SAM masks 输入: 3D点云和SAM掩码
Step1: Class label determination using majority voting 步骤1: 使用投票法确定类别标签
Step2: Unreliable mask label filtering using constraints 步骤2: 使用约束过滤不可靠的掩码标签
Step3: Geometry-Aware Progressive Propagation (GAPP) to propagate mask labels 步骤3: 使用几何感知逐步传播来传递掩码标签
Output: Enhanced pseudo-labels with improved quality 输出: 质量提升的增强伪标签
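
For the SAM-guided pseudo-label entry above, Step1's majority voting can be sketched as follows (array shapes are assumptions; the constraint filtering and GAPP propagation of Steps 2-3 are not reproduced):

```python
import numpy as np

def mask_label_by_majority_vote(point_labels, mask_point_indices, num_classes,
                                min_ratio=0.6):
    """Assign each SAM mask the majority class of the 3D pseudo-labels that
    project into it; return -1 for masks whose vote share is unreliable.

    point_labels       : (N,) per-point pseudo-labels in [0, num_classes)
    mask_point_indices : list of index arrays, one per mask
    min_ratio          : minimum vote share required to accept a label
    """
    mask_labels = []
    for idx in mask_point_indices:
        votes = np.bincount(point_labels[idx], minlength=num_classes)
        winner = int(votes.argmax())
        ratio = votes[winner] / max(votes.sum(), 1)
        mask_labels.append(winner if ratio >= min_ratio else -1)
    return np.array(mask_labels)

# Toy usage: points are mostly class 0 with some class 3 mixed in.
labels3d = np.zeros(1000, dtype=int)
labels3d[::7] = 3
masks = [np.arange(100), np.random.choice(1000, 50, replace=False)]
print(mask_label_by_majority_vote(labels3d, masks, num_classes=5))
```
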
8.5 [8.5] 2502.01004 ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking
[{'name': 'Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He'}]
Autonomous Systems and Robotics 自主系统与机器人 6D pose estimation
bin-picking
robotic manipulation
zero-shot learning
Input: Scene instances and CAD models 场景实例与CAD模型
Step1: Feature extraction 特征提取
Step2: Position-aware correspondence learning 基于位置的对应学习
Step3: Pose estimation 位置估计
Output: Accurate 6D poses 准确的6D姿势
8.5 [8.5] 2502.01045 WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction
[{'name': 'Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo'}]
3D Reconstruction 三维重建 3D human reconstruction
photorealistic rendering
Input: Monocular video 单目视频
Step1: Dual-Space Optimization 双空间优化
Step2: Score Distillation Sampling (SDS) 评分蒸馏采样
Step3: View Selection Strategy 视图选择策略
Step4: Pose Feature Injection 姿态特征注入
Output: High-fidelity dynamic human avatars 高保真动态人类虚拟形象
8.5 [8.5] 2502.01157 Radiant Foam: Real-Time Differentiable Ray Tracing
[{'name': 'Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi'}]
Neural Rendering 神经渲染 differentiable rendering
ray tracing
computer vision
Input: Scene representations 场景表示
Step1: Implement volumetric mesh ray tracing 实现体积网格光线追踪
Step2: Develop a novel scene representation 发展新场景表示
Step3: Evaluate rendering speed and quality 评估渲染速度和质量
Output: Real-time rendering model 实时渲染模型
8.5 [8.5] 2502.01281 Label Correction for Road Segmentation Using Road-side Cameras
[{'name': 'Henrik Toikka, Eerik Alamikkotervo, Risto Ojala'}]
Autonomous Driving 自动驾驶 road segmentation
deep learning
autonomous vehicles
data annotation
Input: Roadside camera feeds 路边摄像头视频
Step1: Manual labeling of one frame 手动标注一帧
Step2: Transfer labels to other frames 转移标签到其他帧
Step3: Compensate for camera movements via frequency-domain image registration 使用频域图像配准补偿相机位移
Output: Semi-automatically labeled road data 半自动标注的道路数据
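
Step3 of the entry above compensates camera movement via frequency-domain registration; the standard tool for this is phase correlation. A minimal translational version, assuming grayscale frames of equal size:

```python
import numpy as np

def phase_correlation_shift(img_a, img_b):
    """Estimate the integer (dy, dx) translation between two grayscale
    frames via phase correlation (frequency-domain registration)."""
    F_a = np.fft.fft2(img_a)
    F_b = np.fft.fft2(img_b)
    cross_power = F_a * np.conj(F_b)
    cross_power /= np.abs(cross_power) + 1e-12   # keep phase only
    corr = np.abs(np.fft.ifft2(cross_power))
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    # Wrap shifts larger than half the image size to negative offsets.
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx

# Toy check: shift a random image by (5, -3) and recover the shift.
rng = np.random.default_rng(0)
a = rng.random((128, 128))
b = np.roll(a, shift=(5, -3), axis=(0, 1))
print(phase_correlation_shift(b, a))   # -> (5, -3)
```
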
8.5 [8.5] 2502.01297 XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications
[{'name': 'Shangjin Zhai, Nan Wang, Xiaomeng Wang, Danpeng Chen, Weijian Xie, Hujun Bao, Guofeng Zhang'}]
Visual Odometry 视觉里程计 Visual Inertial Odometry
Structure from Motion
Augmented Reality
Virtual Reality
Input: Visual inertial measurements 视觉惯性测量
Step1: Robust initialization 稳健初始化
Step2: Feature matching 特征匹配
Step3: State estimation 状态估计
Output: Accurate visual inertial odometry result 精确的视觉惯性里程计结果
8.5 [8.5] 2502.01356 Quasi-Conformal Convolution : A Learnable Convolution for Deep Learning on Riemann Surfaces
[{'name': 'Han Zhang, Tsz Lok Ip, Lok Ming Lui'}]
3D Reconstruction and Modeling 3D重建 3D facial analysis
Riemann surfaces
Input: Geometric data and Riemann surfaces 几何数据和黎曼曲面
Step1: Define quasi-conformal mappings 定义准保形映射
Step2: Develop Quasi-Conformal Convolution operators 开发准保形卷积算子
Step3: Implement Quasi-Conformal Convolutional Neural Network (QCCNN) 实现准保形卷积神经网络
Output: Adaptive convolution for geometric data 自适应卷积用于几何数据
8.5 [8.5] 2502.01357 Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar
[{'name': 'Dong-In Kim, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong'}]
Robotic Perception 机器人感知 3D multi-object tracking
Bayesian approximation
autonomous driving
Input: 4D Radar data 4D 雷达数据
Step1: Motion prediction using transformer-based network 使用基于变换器的网络进行运动预测
Step2: Bayesian approximation for detection and prediction 步骤 2: 检测和预测中的贝叶斯近似
Step3: Two-stage data association leveraging Doppler measurements 基于多普勒测量的两阶段数据关联
Output: Enhanced multi-object tracking performance 提升的多目标跟踪性能
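
The Bayesian approximation in Step2 of the entry above is commonly realized with Monte-Carlo dropout; whether the paper uses exactly this variant is an assumption. A small PyTorch sketch:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, num_samples=20):
    """Bayesian approximation via Monte-Carlo dropout: keep dropout active
    at inference, run several stochastic forward passes, and use the
    sample variance as an uncertainty estimate."""
    model.eval()
    for m in model.modules():          # re-enable dropout layers only
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(num_samples)])
    return preds.mean(dim=0), preds.var(dim=0)

# Toy usage with a hypothetical regression head.
net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Dropout(0.2), nn.Linear(32, 3))
mean, var = mc_dropout_predict(net, torch.randn(4, 8))
print(mean.shape, var.shape)   # torch.Size([4, 3]) twice
```
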
8.5 [8.5] 2502.01401 Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
[{'name': 'Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang'}]
3D Visual Grounding 3D视觉定位 3D visual grounding
weakly supervised learning
Input: 3D visual information and language 3D视觉信息与语言
Step1: Code generation using LLM 通过LLM生成代码
Step2: Spatial relationship computation 空间关系计算
Step3: Quality evaluation and optimization 质量评估和优化
Output: Efficient grounding results 高效的定位结果
8.5 [8.5] 2502.01405 FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control
[{'name': 'Diego Gomez, Bingchen Gong, Maks Ovsjanikov'}]
3D Reconstruction 三维重建 Few-Shot NeRFs 少样本神经辐射场
3D Reconstruction 三维重建
Input: Scene images 场景图像
Step1: Curriculum training 课程训练
Step2: Feature parameterization 特征参数化
Step3: Scene complexity increment 增加场景复杂性
Output: High-quality reconstruction 高质量重建
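
The entry above controls Fourier frequencies progressively during training; a coarse-to-fine frequency mask in the spirit of such curricula (the schedule and parameterization here are assumptions, not the paper's) could look like:

```python
import numpy as np

def frequency_mask(num_freqs, step, total_steps, start_frac=0.1):
    """Soft low-pass mask over positional-encoding frequency bands that
    opens up linearly during training (coarse-to-fine curriculum)."""
    alpha = num_freqs * (start_frac + (1 - start_frac) * step / total_steps)
    k = np.arange(num_freqs)
    # 1 for bands below alpha, 0 above, with a smooth ramp in between.
    return np.clip(alpha - k, 0.0, 1.0)

def masked_positional_encoding(x, num_freqs, step, total_steps):
    """Fourier features sin/cos(2^k * pi * x) weighted by the curriculum mask."""
    w = frequency_mask(num_freqs, step, total_steps)
    feats = []
    for k in range(num_freqs):
        feats += [w[k] * np.sin(2.0**k * np.pi * x),
                  w[k] * np.cos(2.0**k * np.pi * x)]
    return np.concatenate(feats, axis=-1)

print(frequency_mask(8, step=0, total_steps=1000))      # only low bands open
print(frequency_mask(8, step=1000, total_steps=1000))   # all bands open
print(masked_positional_encoding(np.array([0.5]), 4, 100, 1000).shape)
```
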
8.0 [8.0] 2502.00342 Embodied Intelligence for 3D Understanding: A Survey on 3D Scene Question Answering
[{'name': 'Zechuan Li, Hongshan Yu, Yihao Ding, Yan Li, Yong He, Naveed Akhtar'}]
3D Reconstruction and Modeling 3D重建与建模 3D scene question answering
multimodal modelling
datasets
Input: 3D scene data 3D场景数据
Step1: Systematic review of datasets 数据集的系统评审
Step2: Analysis of methodologies 方法论分析
Step3: Evaluation of metrics 评估指标
Output: Comprehensive understanding of 3D SQA 3D场景问答的综合理解
8.0 [8.0] 2502.00800 Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data
[{'name': 'Mengping Yang, Zhe Wang, Ziqiu Chi, Dongdong Li, Wenli Du'}]
Image Generation 图像生成 Generative Adversarial Networks
data augmentation
image synthesis
semantic features
Input: Limited image datasets 有限图像数据集
Step1: Estimate covariance matrices 估计协方差矩阵
Step2: Identify meaningful transformation directions 识别有意义的转化方向
Step3: Apply transformations to semantic features 对语义特征应用转化
Output: Enhanced synthetic images 增强合成图像
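
Steps 1-3 of the entry above amount to estimating class-wise feature covariance and perturbing features along the resulting directions. A hedged NumPy sketch of that idea (not the paper's exact adversarial selection of directions):

```python
import numpy as np

def semantic_augment(features, labels, scale=0.5, rng=None):
    """Perturb features with zero-mean Gaussian noise drawn from each
    class's own covariance, a common way to realize semantic feature
    augmentation under limited data."""
    rng = rng or np.random.default_rng()
    out = features.copy()
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue                      # too few samples to estimate covariance
        cov = np.cov(features[idx], rowvar=False)
        noise = rng.multivariate_normal(np.zeros(cov.shape[0]),
                                        scale * cov, size=len(idx))
        out[idx] += noise
    return out

feats = np.random.randn(100, 16)
labels = np.random.randint(0, 4, size=100)
print(semantic_augment(feats, labels).shape)   # (100, 16)
```
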
7.5 [7.5] 2502.00333 BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
[{'name': 'Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang'}]
Image Generation 图像生成 super-resolution
diffusion model
binarization
model compression
Input: Diffusion model for super-resolution 超分辨率扩散模型
Step1: Binarization of the model 模型的二值化
Step2: One-step distillation into extreme compression 一步蒸馏以实现极端压缩
Step3: Integration of sparse and low rank matrix branches 结合稀疏和低秩矩阵分支
Output: Compressed and accelerated super-resolution model 压缩和加速的超分辨率模型
7.5 [7.5] 2502.00500 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
[{'name': 'Yang Cao, Zhao Song, Chiwun Yang'}]
Image and Video Generation 图像与视频生成 video generation
interpolation
extrapolation
Input: Video frames 视频帧
Step1: Hypothesis generation 假设生成
Step2: Optimal projection approximation 最优投影近似
Step3: Interpolation and extrapolation 插值和外推
Output: Time-dependent video frames 时间依赖视频帧
7.5 [7.5] 2502.00639 Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
[{'name': 'Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng'}]
Image Generation 图像生成 Diffusion Model
image generation
video generation
Input: Probabilistic diffusion model 概率扩散模型
Step1: Pre-training on unlabeled data 在无标签数据上进行预训练
Step2: Recursive Likelihood Ratio optimizer proposal 提出递归似然比优化器
Step3: Implementation of zero-order gradient estimation 零阶梯度估计的实施
Output: Aligned diffusion models 对齐的扩散模型
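
Step3 of the entry above relies on zeroth-order gradient estimation, i.e. probing the objective along random directions instead of backpropagating. A generic two-point Gaussian-smoothing estimator (the paper's recursive likelihood-ratio construction is more elaborate):

```python
import numpy as np

def zeroth_order_grad(loss_fn, theta, num_samples=16, mu=1e-2, rng=None):
    """Two-point zeroth-order gradient estimator: average finite
    differences of the loss along random Gaussian directions. Useful when
    the objective (e.g. a black-box reward on generated samples) is not
    differentiable."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)
        delta = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
        grad += (delta / (2.0 * mu)) * u
    return grad / num_samples

# Toy check against the analytic gradient of f(x) = ||x||^2 (grad = 2x).
theta = np.array([1.0, -2.0, 0.5])
g = zeroth_order_grad(lambda t: np.sum(t**2), theta, num_samples=5000)
print(g, 2 * theta)
```
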
7.5 [7.5] 2502.00662 Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation
[{'name': 'Yimu Wang, Evelien Riddell, Adrian Chow, Sean Sedwards, Krzysztof Czarnecki'}]
VLM & VLA 视觉语言模型与对齐 vision-language models
out-of-distribution detection
few-shot learning
Input: ID image and text prototypes 输入: ID图像和文本原型
Step1: Theoretical analysis 理论分析
Step2: Incorporation of image prototypes 图像原型的整合
Step3: Development of biased prompts generation (BPG) module 偏差提示生成(BPG)模块的开发
Step4: Implementation of image-text consistency (ITC) module 图像文本一致性(ITC)模块的实施
Output: Enhanced VLM-based OOD detection performance 输出: 改进的基于VLM的OOD检测性能
7.5 [7.5] 2502.00711 VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
[{'name': 'Chunbai Zhang, Chao Wang, Yang Zhou, Yan Peng'}]
Vision-Language Models (VLMs) 视觉语言模型 visual reasoning
evidence-based reasoning
VLM
Input: Visual information (images/videos) 输入: 视觉信息(图像/视频)
Step1: Extract fine-grained visual knowledge from visual relationships 第一步: 从视觉关系中提取细粒度视觉知识
Step2: Paraphrase questions with underspecification using extracted knowledge 第二步: 利用提取的知识对欠规范的问题进行改写
Step3: Employ Chain-of-Evidence prompting for interpretable reasoning 第三步: 使用证据链提示进行可解释推理
Output: Enhanced visual reasoning capabilities 输出: 改进的视觉推理能力
7.5 [7.5] 2502.00719 Vision and Language Reference Prompt into SAM for Few-shot Segmentation
[{'name': 'Kosuke Sakurai, Ryotaro Shimizu, Masayuki Goto'}]
VLM & VLA 视觉语言模型与对齐 few-shot segmentation
vision-language model
Input: Annotated reference images and text labels 参考图像和文本标签
Step1: Input visual and semantic reference information 输入视觉和语义参考信息
Step2: Integrate prompt embeddings into SAM 将提示嵌入集成到SAM
Step3: Few-shot segmentation via VLP-SAM 通过VLP-SAM进行少样本分割
Output: High-performance segmentation results 高性能的分割结果
7.5 [7.5] 2502.00972 Pushing the Boundaries of State Space Models for Image and Video Generation
[{'name': 'Yicong Hong, Long Mai, Yuan Yao, Feng Liu'}]
Image Generation 图像生成 image generation
video generation
Input: Visual sequences 视觉序列
Step1: Model development 模型开发
Step2: Integration of SSM and Transformers SSM与变换器的整合
Step3: Evaluation of generated outputs 生成结果的评估
Output: Generated images and videos 生成的图像和视频
7.5 [7.5] 2502.01524 Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective
[{'name': 'Xiaorui Ma, Haoran Xie, S. Joe Qin'}]
VLM & VLA 视觉语言模型与对齐 Vision-Language
Large Language Models
parameter efficiency
Step1: Introduce architecture of LLMs 介绍LLM架构
Step2: Discuss parameter-efficient learning methods 讨论参数效率学习方法
Step3: Present taxonomy of modality integrators 提出模态集成器分类
Step4: Review training paradigms and efficiency considerations 回顾训练范式及效率考虑
Step5: Compare experimental results of representative models 比较代表模型的实验结果
7.5 [7.5] 2502.01530 The in-context inductive biases of vision-language models differ across modalities
[{'name': 'Kelsey Allen, Ishita Dasgupta, Eliza Kosoy, Andrew K. Lampinen'}]
Vision-Language Models (VLMs) 视觉语言模型 vision-language models
inductive biases
generalization
Input: Stimuli presented in vision and text 视觉和文本中呈现的刺激
Step1: Conduct experiments 进行实验
Step2: Analyze generalization across models 分析模型间的概括性
Output: Insights on inductive biases regarding shape and color 对形状和颜色的归纳偏见的见解
5.0 [5.0] 2502.00618 DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models
[{'name': 'Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li'}]
Vision-Language Models (VLMs) 视觉语言模型 vision-language models
continual adaptation
attribute descriptions
Input: Visual features and class texts 视觉特征和类别文本
Step1: Generate general attribute descriptions 生成一般属性描述
Step2: Design anchor-based embedding filter 设计基于锚点的嵌入过滤器
Step3: Tune visual encoder 调整视觉编码器
Output: Robust vision-GA-class associations 稳健的视觉-一般属性-类别关联

Arxiv 2025-01-31

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
view-dependent representation
3D高斯渲染
视角依赖表示
input: images 图片
extend the 3D Gaussian Splatting model 扩展3D高斯渲染模型
introduce an additional symmetric matrix 引入额外的对称矩阵
achieve view-dependent opacity representation 实现视角依赖的透明度表示
output: improved 3D scene reconstruction 输出:改进的3D场景重建
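
The extra symmetric matrix in the VoD-3DGS entry above makes each Gaussian's opacity a function of viewing direction. One plausible functional form, a guess for illustration only (the paper defines its own parameterization):

```python
import numpy as np

def view_dependent_opacity(base_opacity, M, view_dir):
    """Modulate a Gaussian's opacity by the viewing direction through a
    learnable symmetric 3x3 matrix, so view-dependent effects such as
    specular highlights can appear only from some angles."""
    d = view_dir / np.linalg.norm(view_dir)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return base_opacity * sigmoid(d @ M @ d)

M = np.diag([4.0, -4.0, 0.0])   # hypothetical learned parameters
for d in ([1, 0, 0], [0, 1, 0]):
    print(view_dependent_opacity(0.9, M, np.array(d, dtype=float)))
# High opacity along x, nearly transparent along y.
```
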
8.5 [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping 3D reconstruction 三维重建 3D reconstruction
3D Gaussian Splatting
endoscopic SLAM
depth reconstruction
三维重建
3D高斯斑点
内窥镜SLAM
深度重建
input: endoscopic image sequences 内窥镜图像序列
Step 1: tracking using Gaussian Splatting 使用高斯斑点的跟踪
Step 2: mapping and bundle adjustment 映射与束调整
Step 3: surface normal-aware reconstruction 结合表面法向量进行重构
output: accurate 3D reconstruction and real-time tracking 输出: 精确的3D重建与实时跟踪
8.5 [8.5] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way 3D reconstruction 三维重建 Point Cloud Completion
3D Shape Completion
Knowledge Distillation
Points Completion
点云补全
3D形状补全
知识蒸馏
点补全
input: incomplete point cloud 有缺失的点云
step1: apply autoencoder to encode the point cloud 应用自编码器对点云进行编码
step2: use knowledge distillation for completion 使用知识蒸馏进行补全
output: completed 3D shape 输出:完整的3D形状
8.5 [8.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
Gaussian Splatting
3D高斯喷溅
高斯喷溅
Input: 2D images 2D图像
Ray-tracing mechanism 射线追踪机制
Intersection computation 交点计算
Ray-tracing algorithms construction 射线追踪算法构建
Final 3D object with lighting and shadows 最终带有光影效果的三维物体
8.5 [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
3D reconstruction
实时渲染
3D高斯喷溅
三维重建
input: 3D key points (输入:3D关键点)
Step 1: Create a joint-driven 3D Gaussian representation (步骤1:创建联合驱动的3D高斯表示)
Step 2: Implement differentiable spatial transformations (步骤2:实现可微分的空间变换)
Step 3: Apply real-time shadow simulation method (步骤3:应用实时阴影模拟方法)
output: High-fidelity hand images (输出:高保真的手部图像)
8.5 [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation 3D generation 3D生成 3D generation
3D gaussian
物体生成
3D高斯
input: 3D assets 3D资产
extract: physical properties 提取物理属性
generate: physics-based dynamics 生成基于物理的动态
output: dynamic scene 输出动态场景
7.5 [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks Autonomous Driving 自动驾驶 LiDAR
loop closure detection
graph attention networks
place recognition
semantic registration
激光雷达
回环闭合检测
图注意力网络
地点识别
语义注册
input: semantic graphs 语义图
step1: encode semantic graphs using graph attention networks 使用图注意力网络编码语义图
step2: compare graph vectors to identify loop closure 比较图向量以识别回环闭合
step3: estimate 6 DoF pose constraint using semantic registration 使用语义注册估计6自由度位姿约束
output: loop closure detection results 回环闭合检测结果
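
Step2 of the entry above compares graph vectors; with per-scan embeddings from a GAT encoder, loop-closure candidates can be flagged by cosine similarity plus a temporal gap, as in this sketch (threshold and gap values are assumptions):

```python
import numpy as np

def find_loop_closures(graph_vectors, min_gap=50, threshold=0.9):
    """Flag candidate loop closures by cosine similarity between per-scan
    semantic-graph embeddings, skipping temporally adjacent scans."""
    v = graph_vectors / np.linalg.norm(graph_vectors, axis=1, keepdims=True)
    sim = v @ v.T
    candidates = []
    for i in range(len(v)):
        for j in range(i + min_gap, len(v)):
            if sim[i, j] >= threshold:
                candidates.append((i, j, float(sim[i, j])))
    return candidates

emb = np.random.randn(200, 64)
emb[150] = emb[10] + 0.01 * np.random.randn(64)   # synthetic revisit
print(find_loop_closures(emb)[:3])                # catches the (10, 150) pair
```
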
7.5 [7.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge Autonomous Driving 自主驾驶 Autonomous Driving
Neuromorphic Vision
Real-time Navigation
Autonomous Systems
自驾驶
神经形态视觉
实时导航
自主系统
Input: Human speech commands 人类语音指令
Step 1: Translate speech into planning commands 将语音翻译成规划指令
Step 2: Execute commands using neuromorphic vision 执行命令使用神经形态视觉
Step 3: Navigate and avoid obstacles in real-time 实时导航和避免障碍
Output: Autonomous drone navigation output 自主无人机导航输出
7.5 [7.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Video Generation 视频生成 video generation
text-to-video models
视频生成
文本到视频模型
input: diffusion model inputs 输入:扩散模型输入
step1: align video frames with text prompts 步骤1:将视频帧与文本提示对齐
step2: utilize a beam search strategy to optimize output 使用束搜索策略优化输出
step3: compute metrics for perceptual quality evaluation 计算感知质量评估的指标
output: high-quality, aligned video generation 输出:高质量、对齐的视频生成
7.5 [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging Autonomous Driving 自动驾驶 Semantic Segmentation
LiDAR Imaging
Autonomous Driving
语义分割
LiDAR成像
自动驾驶
input: LiDAR data 输入: LiDAR 数据
step1: generate synthetic dataset 生成合成数据集
step2: utilize CARLA simulator 使用 CARLA 模拟器
step3: train segmentation algorithms 训练分割算法
output: improved segmentation performance 输出: 改进的分割性能
7.5 [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait Image Generation 图像生成 personalized portrait generation
identity preservation
view-consistent reconstruction
个性化肖像生成
身份保留
视角一致重建
input: reference images 参考图像
step1: Lighting-Aware Stitching 光照感知拼接
step2: View-Consistent Adaptation 视角一致自适应
step3: ControlNet-like supervision 控制网络样监督
output: personalized portraits 个性化肖像
6.5 [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration Autonomous Driving (自动驾驶) Monocular Positioning
Extended Kalman Filter
Deep Learning
Single-shot
单目定位
扩展卡尔曼滤波
深度学习
单次
input: monocular images 单目图像
step1: Absolute Pose Regression (APR) 绝对姿态回归
step2: Relative Pose Regression (RPR) 相对姿态回归
step3: Integrate APR and RPR using EKF 通过扩展卡尔曼滤波整合APR和RPR
output: accurate positioning results 精确定位结果
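
The APR + RPR + EKF structure of the entry above reduces, in a toy 1-D linear setting, to a Kalman filter that predicts with relative poses and updates with absolute ones (the paper's variational Bayesian EKF is more involved):

```python
import numpy as np

def ekf_fuse(abs_poses, rel_poses, q=0.05, r=0.5):
    """Toy 1-D Kalman fusion of absolute pose regression (measurements)
    with relative pose regression (motion model)."""
    x, P = abs_poses[0], 1.0
    fused = [x]
    for z, u in zip(abs_poses[1:], rel_poses):
        # Predict with the relative-pose "control" u.
        x, P = x + u, P + q
        # Update with the absolute-pose measurement z.
        K = P / (P + r)
        x, P = x + K * (z - x), (1 - K) * P
        fused.append(x)
    return np.array(fused)

truth = np.cumsum(np.ones(50))                            # 1 unit per step
abs_meas = truth + np.random.normal(0, 0.7, 50)           # noisy APR
rel_meas = np.diff(truth) + np.random.normal(0, 0.1, 49)  # noisy RPR
print(np.abs(ekf_fuse(abs_meas, rel_meas) - truth).mean())
```
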
6.0 [6.0] 2501.19331v1 Consistent Video Colorization via Palette Guidance Video Generation 视频生成 Video Colorization
Stable Video Diffusion
Palette Guidance
视频上色
稳定视频扩散
调色板引导
input: video sequences 视频序列
step 1: design palette-based color guider 设计调色板引导器
step 2: utilize Stable Video Diffusion as base model 利用稳定视频扩散作为基础模型
step 3: generate vivid colors using color context 根据颜色上下文生成生动的颜色
output: colorized video sequences 上色的视频序列
5.5 [5.5] 2501.18865v1 REG: Rectified Gradient Guidance for Conditional Diffusion Models Image Generation 图像生成 conditional generation
diffusion models
conditional generation 条件生成
扩散模型
input: guidance techniques 指导技术
step1: replace the scaled marginal distribution target 替换缩放的边际分布目标
step2: implement rectified gradient guidance 实施修正梯度引导
step3: conduct experiments on image generation tasks 进行图像生成任务的实验
output: improved image generation results 改进的图像生成结果

Arxiv 2025-01-31

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2501.19196v1 RaySplats: Ray Tracing based Gaussian Splatting 3D generation 三维生成 3D Gaussian Splatting
Ray Tracing
3D高斯点云
光线追踪
input: 2D images 2D图像
process: Gaussian Splatting 高斯点云渲染
process: ray tracing based on Gaussian primitives 基于高斯原始体的光线追踪
output: 3D objects with light and shadow effects 输出具有光影效果的3D物体
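
The intersection computation in the RaySplats entry above needs, per ray, the point of peak Gaussian response; for a single 3D Gaussian this has a closed form (a standard derivation, not code from the paper):

```python
import numpy as np

def max_gaussian_response_t(origin, direction, mean, cov):
    """Closed-form ray parameter t* at which a ray o + t*d attains the peak
    density of a 3D Gaussian: minimize the quadratic Mahalanobis distance
    along the ray. This is the core primitive a ray-traced Gaussian
    renderer evaluates per ray."""
    P = np.linalg.inv(cov)            # precision matrix
    d, m = direction, mean - origin
    return (d @ P @ m) / (d @ P @ d)

o = np.zeros(3)
d = np.array([0.0, 0.0, 1.0])
mu = np.array([0.1, -0.2, 4.0])
print(max_gaussian_response_t(o, d, mu, np.eye(3) * 0.25))
# ≈ 4.0 for an isotropic Gaussian centered near the ray
```
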
9.0 [9.0] 2501.17978v2 VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
view-dependent rendering
3D高斯点云
视角依赖的渲染
input: images for 3D scene reconstruction 用于3D场景重建的图像
step 1: extend 3D Gaussian Splatting model 扩展3D高斯点云模型
step 2: introduce symmetric matrix to enhance opacity representation 引入对称矩阵以增强不透明性表示
step 3: optimize suppression of Gaussians based on viewer perspective 根据观察者视角优化高斯的抑制
output: improved representation of view-dependent reflections and specular highlights 输出:改进视角依赖的反射和镜面高光的表示
8.5 [8.5] 2501.19319v1 Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping 3D reconstruction 三维重建 3D Gaussian Splatting
SLAM
endoscopic reconstruction
depth reconstruction
3D 高斯点
SLAM
内窥镜重建
深度重建
input: endoscopic images 内窥镜图像
step1: surface normal-aware tracking 表面法线感知跟踪
step2: accurate mapping 精确地图构建
step3: bundle adjustment 捆绑调整
output: geometrically accurate 3D reconstruction 准确的三维重建
8.5 [8.5] 2501.19252v1 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Video Generation 视频生成 Text-to-video
Diffusion models
Video generation
评分调整
文本转视频
扩散模型
视频生成
奖励校准
input: video generation prompts 视频生成提示
step1: employ diffusion latent beam search 使用扩散潜在光束搜索
step2: maximize alignment reward 最大化对齐奖励
step3: improve perceptual quality 提升感知质量
output: high-quality video optimized for natural movement 输出:高质量视频,优化自然运动
8.5 [8.5] 2501.19088v1 JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting 3D generation 3D生成 3D Gaussian Splatting
animatable hand avatar
3D高斯喷涂
可动画手部化身
input: 3D key points 3D关键点
Joint-driven 3D Gaussian Splatting (3DGS) representation 关节驱动的3D高斯喷涂(3DGS)表示
apply spatial transformations based on 3D key points 基于3D关键点应用空间变换
real-time rendering and shadow simulation 实时渲染和阴影模拟
output: animatable high-fidelity hand images 输出:可动画的高保真手部图像
8.5 [8.5] 2501.18982v1 OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation 3D generation 3D生成 3D generation
3D gaussian
3D生成
3D高斯
input: user-specified prompts 用户指定的提示
step1: define a scene according to user prompts 根据用户提示定义场景
step2: estimate material weighting factors using a pretrained video diffusion model 使用预训练的视频扩散模型估计材料权重因子
step3: represent each 3D asset as a collection of constitutive 3D Gaussians 将每个3D资产表示为一组组成的3D高斯分布
output: a physics-based 3D dynamic scene 输出:基于物理的3D动态场景
8.0 [8.0] 2501.19270v1 Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way 3D reconstruction 三维重建 Point Cloud Completion
Multi-view Distillation
3D Shape Recovery
点云补全
多视图蒸馏
3D形状恢复
input: incomplete point cloud 输入: 不完整的点云
step1: apply autoencoder architecture 应用自编码器架构
step2: use knowledge distillation strategy to enhance completion 使用知识蒸馏策略以增强完成度
output: completed point cloud 输出: 完整的点云
7.5 [7.5] 2501.19382v1 LiDAR Loop Closure Detection using Semantic Graphs with Graph Attention Networks Autonomous Driving 自主驾驶 Loop Closure Detection
Semantic Graphs
Graph Attention Networks
闭环检测
语义图
图注意力网络
input: point cloud 输入: 点云
step1: encode semantic graphs using graph attention networks 步骤1: 使用图注意力网络编码语义图
step2: generate graph vectors through self-attention mechanisms 步骤2: 通过自注意力机制生成图向量
step3: compare graph vectors to detect loop closure 步骤3: 比较图向量以检测闭环
output: loop closure candidates 输出: 闭环候选
7.5 [7.5] 2501.19035v1 SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging Autonomous Driving 自主驾驶 Semantic segmentation
LiDAR imaging
autonomous driving
语义分割
LiDAR成像
自主驾驶
input: LiDAR images (输入: LiDAR图像)
modify CARLA simulator (修改CARLA模拟器)
generate SynthmanticLiDAR dataset (生成SynthmanticLiDAR数据集)
evaluate with transfer learning (使用迁移学习进行评估)
output: improved semantic segmentation performance (输出: 改进的语义分割性能)
7.5 [7.5] 2501.17159v2 IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait Image Generation 图像生成 Personalized Portrait Generation
3D-aware relighting
个性化肖像生成
具3D感知的重光照
Input: reference portrait images 参考肖像图像
Step 1: Lighting-Aware Stitching 具光照感知的拼接
Step 2: View-Consistent Adaptation 具视图一致的适配
Output: personalized portraits with identity preservation 具有身份保留的个性化肖像
7.0 [7.0] 2501.19243v1 Accelerating Diffusion Transformer via Error-Optimized Cache Image Generation 图像生成 Image Generation
Diffusion Transformer
ImageNet Dataset
图像生成
扩散变换器
ImageNet数据集
input: Diffusion Transformer features (扩散变换器特征)
extract caching differences (提取缓存差异)
optimize cache based on errors (基于错误优化缓存)
output: improved generated images (输出: 改进的生成图像)
6.5 [6.5] 2501.19259v1 Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge Autonomous Driving 自主驾驶 autonomous driving
natural language processing
neuroscience
autonomous navigation
自主驾驶
自然语言处理
神经科学
自主导航
input: human speech and dynamic environment 输入:人类语言和动态环境
step1: translate human speech into planning commands 步骤1:将人类语言翻译为规划命令
step2: navigate and avoid obstacles using neuromorphic vision 步骤2:利用神经形态视觉导航并避免障碍物
output: real-time autonomous navigation output 实时自主导航结果
6.5 [6.5] 2501.18994v1 VKFPos: A Learning-Based Monocular Positioning with Variational Bayesian Extended Kalman Filter Integration Autonomous Driving 自主驾驶 monocular positioning
extended kalman filter
variational bayesian inference
单目定位
扩展卡尔曼滤波
变分贝叶斯推理
input: monocular images 单目图像
step1: Absolute Pose Regression (APR) 绝对姿态回归
step2: Relative Pose Regression (RPR) 相对姿态回归
step3: Integration with Extended Kalman Filter (EKF) 通过扩展卡尔曼滤波整合
output: accurate positional predictions 准确的位置信息预测

Arxiv 2025-01-30

Relevance Title Research Topic Keywords Pipeline
8.5 [8.5] 2501.18594v1 Foundational Models for 3D Point Clouds: A Survey and Outlook 3D reconstruction 3D重建 3D point clouds
foundational models
3D视觉理解
基础模型
3D点云
input: 3D point clouds 3D点云
step1: review of foundational models (FMs) 基础模型的回顾
step2: categorize use of FMs in 3D tasks 分类基础模型在3D任务中的应用
step3: summarize state-of-the-art methods 总结最新的方法
output: comprehensive overview of FMs for 3D understanding 输出:基础模型在3D理解中的综合概述
8.5 [8.5] 2501.18162v1 IROAM: Improving Roadside Monocular 3D Object Detection Learning from Autonomous Vehicle Data Domain Autonomous Driving 自动驾驶 3D object detection
autonomous driving
3D对象检测
自动驾驶
input: roadside data and vehicle-side data
In-Domain Query Interaction module learns content and depth information
Cross-Domain Query Enhancement decouples queries into semantic and geometry parts
output: enhanced object queries
8.5 [8.5] 2501.18110v1 Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems 3D reconstruction 三维重建 3D Mapping
3D Reconstruction
Lifelong Mapping
激光雷达
三维映射
三维重建
终身映射
Input: Hand-held and robot-mounted LiDAR maps 输入:手持和机器人安装的激光雷达地图
Dynamic point removal algorithm 动态点去除算法
Multi-session map alignment using feature descriptor matching and fine registration 多会话地图对齐,使用特征描述符匹配和精细配准
Map change detection to identify changes between aligned maps 地图变化检测以识别对齐地图之间的变化
Map version control for maintaining current environmental state and querying changes 地图版本控制,用于维护当前环境状态和查询变化
8.0 [8.0] 2501.18595v1 ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail Transfer Mesh Reconstruction 网格重建 Mesh Reconstruction
3D reconstruction
网格重建
三维重建
input: limited set of images 有限的图像集
step1: optimize mesh geometry 优化网格几何形状
step2: refine mesh with spatially adaptive resolution 使用空间自适应分辨率细化网格
step3: reconstruct high-resolution textures 重新构建高分辨率纹理
output: textured mesh with detailed appearance 带有详细外观的纹理网格
7.5 [7.5] 2501.18590v1 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Rendering Techniques 渲染技术 Inverse Rendering
Forward Rendering
Video Diffusion Models
逆向渲染
正向渲染
视频扩散模型
input: real-world videos, 真实世界视频
step1: estimate G-buffers using inverse rendering model, 使用逆向渲染模型估计G-buffer
step2: generate photorealistic images from G-buffers, 从G-buffer生成照片级真实图像
output: relit images, material edited images, realistic object insertions, 重新照明图像,材料编辑图像,逼真的物体插入
7.5 [7.5] 2501.18315v1 Surface Defect Identification using Bayesian Filtering on a 3D Mesh Mesh Reconstruction 网格重建 3D Mesh
Mesh Reconstruction
3D网格
网格重建
input: CAD model and point cloud data 输入:CAD模型和点云数据
transform CAD model into polygonal mesh 将CAD模型转换为多边形网格
apply weighted least squares algorithm 应用加权最小二乘算法
estimate state based on point cloud measurements 根据点云测量估计状态
output: high-precision defect identification 输出:高精度缺陷识别
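
The weighted least squares step of the entry above has the usual closed-form normal-equation solution; a generic sketch, with a toy plane fit standing in for the mesh-state estimation:

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Solve min_x sum_i w_i (A_i x - b_i)^2 in closed form:
    x = (A^T W A)^{-1} A^T W b. Here w could down-weight noisy
    point-cloud measurements far from their mesh facet."""
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Toy usage: fit a plane z = ax + by + c to weighted point samples.
rng = np.random.default_rng(1)
xy = rng.uniform(-1, 1, size=(200, 2))
z = 0.3 * xy[:, 0] - 0.7 * xy[:, 1] + 2.0 + rng.normal(0, 0.02, 200)
A = np.hstack([xy, np.ones((200, 1))])
print(weighted_least_squares(A, z, np.ones(200)))   # ~ [0.3, -0.7, 2.0]
```
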
7.5 [7.5] 2501.17636v2 Efficient Interactive 3D Multi-Object Removal 3D reconstruction 三维重建 3D scene understanding
multi-object removal
3D场景理解
多对象移除
input: selected areas and objects for removal 选定的移除区域和对象
step1: mask matching and refinement 掩码匹配和细化
step2: homography-based warping 基于单应性变换的扭曲
step3: inpainting process 修复过程
output: modified 3D scene 修改后的3D场景
7.0 [7.0] 2501.18246v1 Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation 3D reconstruction 三维重建 point cloud segmentation
outdoor point clouds
semantic segmentation
point cloud
点云分割
户外点云
语义分割
点云
input: outdoor point clouds 户外点云
compute Digital Terrain Models (DTMs) 计算数字地形模型
employ RandLA-Net for segmentation 使用 RandLA-Net 进行分割
evaluate performance on datasets 评估在数据集上的表现
integrate relative elevation features 集成相对高程特征
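
The relative elevation feature in the entry above is each point's height over a Digital Terrain Model; a minimal grid-minimum DTM sketch (cell size and the min-z ground model are assumptions):

```python
import numpy as np

def relative_elevation(points, cell=1.0):
    """Compute each point's height above a coarse Digital Terrain Model
    built by taking the minimum z in every ground-plane grid cell."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                       # shift to non-negative indices
    dtm = np.full(ij.max(axis=0) + 1, np.inf)
    np.minimum.at(dtm, (ij[:, 0], ij[:, 1]), points[:, 2])
    return points[:, 2] - dtm[ij[:, 0], ij[:, 1]]

pts = np.random.uniform([0, 0, 0], [10, 10, 5], size=(1000, 3))
rel = relative_elevation(pts)
print(rel.min(), rel.max())   # min is 0 (the per-cell ground point)
```
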
6.5 [6.5] 2501.18494v1 Runway vs. Taxiway: Challenges in Automated Line Identification and Notation Approaches Autonomous Driving 自动驾驶 Automated line identification 自动化线识别
Convolutional Neural Network 卷积神经网络
runway markings 跑道标记
autonomous systems 自动化系统
labeling algorithms 标记算法
input: runway and taxiway images 跑道和滑行道图像
Step 1: color threshold adjustment 颜色阈值调整
Step 2: refine region of interest selection 精细化感兴趣区域选择
Step 3: integrate CNN classification 集成CNN分类
output: improved marking identification 改进的标记识别

Newly Found Papers on ...

(Older entries get replaced automatically when the script runs again.)
