Posts in the 'NLP, LLM, Multi-modal' category

List: NLP, LLM, Multi-modal (15)

JINWOOJUNG

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

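Paper: https://arxiv.org/pdf/1910.02527 https://3dscenegraph.stanford.edu/images/supp_mat.pdf Introduction: Effectively storing information such as the geometric structure of objects and spaces, their categories (classes), and the viewpoint of a particular scene is a very important problem. An ideal space for storing this information should be invariant to change, that is, it should invariantly encompass the overall spatial information. It should also connect easily and deterministically to various domains such as images and video. In this respect, images are not an ideal solution. Images are constrained by viewpoint and cannot effectively handle information such as depth and size. Therefore, in this paper, 3D Scene Gra..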

NLP, LLM, Multi-modal 2025. 7. 17. 21:02

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Paper: https://arxiv.org/abs/2402.01591 BAT: Learning to Reason about Spatial Sounds with Large Language Models. Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with.. (arxiv.org) Introduction: BLIP-2, CLIP..

NLP, LLM, Multi-modal 2025. 7. 4. 12:17

3D Concept Learning and Reasoning from Multi-View Images

Paper: https://arxiv.org/abs/2303.11327 3D Concept Learning and Reasoning from Multi-View Images. Humans are able to accurately reason in 3D by gathering multi-view observations of the surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for 3D multi-view visual question answering (3DMV-VQA). This dataset is collected b.. (arxiv.org) Introduction: Visual Reasoning, with respect to a visual scene, ..

NLP, LLM, Multi-modal 2025. 6. 26. 00:59

Habitat-Matterport 3D semantic dataset

This post is protected.

NLP, LLM, Multi-modal 2025. 6. 25. 15:18

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

Paper: https://arxiv.org/abs/2206.08312 SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning. We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments. Given a 3D mesh of a real-world environment, SoundSpaces can generate highly realistic acoustics for arbitrary sounds captured from arbitrary microp.. (arxiv.org) Introduction: Sight (Vision) and hearing (..

NLP, LLM, Multi-modal 2025. 6. 22. 02:41

[ Transformer to LLaMA ] ELMo: Embeddings from Language Models

This post summarizes what I studied from the lecture materials and lectures of the Transformer to LLaMA course by Professor Pilsung Kang (강필성) of Seoul National University. https://www.youtube.com/watch?v=zV8kIUwH32M&list=PLRmOKHpXQgr8CDy-eG4pC1hSNkkneCaWJ Embeddings from Language Models: ELMo is a new word embedding method proposed in 2018. ELMo, short for Embeddings from Language Models, uses a pre-trained language model, as its name suggests. That is, based on a pre-trained language model, where a word's embedding varies with its context, Con..

NLP, LLM, Multi-modal 2025. 4. 28. 20:24

[ Transformer to LLaMA ] Transformer..02

This post summarizes what I studied from the lecture materials and lectures of the Transformer to LLaMA course by Professor Pilsung Kang (강필성) of Seoul National University. https://www.youtube.com/watch?v=Yk1tV_cXMMU&t=1021s 08-2: Transformer. The Transformer is a deep learning model that, built on the attention mechanism, has driven new advances in many fields such as NLP and CV. If you have not yet studied the attention mechanism, I recommend reading the post below first. https://jinwoo-jung.tistory.com/148 [EECS 498] Lecture 17: Attention. This post, Michigan Univ.'s EECS 4..

NLP, LLM, Multi-modal 2025. 4. 25. 16:59

[ NLP ] BLEU Score: A Machine Translation Evaluation Metric

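In NLP, BLEU (Bilingual Evaluation Understudy) is one of the evaluation metrics used to measure how well a machine translation system performs. Today, let's look at how BLEU is computed. BLEU: BLEU measures translation quality by comparing how similar a machine translation output is to a human translation. The BLEU score is computed from the following three main components. n-gram Precision: the fraction of n-grams in the predicted sentence that overlap with the reference sentence (1- to 4-grams). Clipping: when the same n-gram appears repeatedly, only up to the maximum number of times it appears in the reference sentence is counted toward Precision. Brevity Penalty (BP): when the predicted sentence is too short, so that Precision alone ..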
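
The three components listed above can be sketched in a few lines of Python. This is a minimal sentence-level sketch with a single reference and no smoothing, not the post's own implementation. The function names (`ngrams`, `clipped_precision`, `brevity_penalty`, `bleu`) are hypothetical. Because the excerpt is cut off before it defines BP, the penalty and the uniform-weight geometric mean used here are assumptions taken from the standard BLEU definition.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """n-gram precision where each n-gram count is clipped to its reference count."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, reference):
    """Standard BP (assumed): 1 if the candidate is at least as long as the reference, else exp(1 - r/c)."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    return 1.0 if c >= r else math.exp(1.0 - r / c)


def bleu(candidate, reference, max_n=4):
    """BP times the geometric mean of the 1..max_n clipped precisions (uniform weights)."""
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision makes the score zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, reference) * math.exp(log_mean)
```

For example, `clipped_precision(["the"] * 7, "the cat is on the mat".split(), 1)` gives 2/7, because "the" occurs only twice in the reference. Clipping is what stops a degenerate output that repeats one frequent word from scoring a perfect unigram precision.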

NLP, LLM, Multi-modal 2025. 4. 25. 15:12

