Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, J. | - |
dc.contributor.author | Lee, J. | - |
dc.contributor.author | Jeon, S. | - |
dc.contributor.author | Kim, S. | - |
dc.contributor.author | Kim, S. | - |
dc.contributor.author | Sohn, K. | - |
dc.date.accessioned | 2023-04-21T01:40:24Z | - |
dc.date.available | 2023-04-21T01:40:24Z | - |
dc.date.issued | 2018-10 | - |
dc.identifier.issn | 0000-0000 | - |
dc.identifier.uri | https://yscholarhub.yonsei.ac.kr/handle/2021.sw.yonsei/6639 | - |
dc.description.abstract | While recognizing human actions and surrounding scenes addresses different aspects of video understanding, the two tasks are strongly correlated and can complement each other's individual information. In this paper, we propose an approach for joint action and scene recognition, formulated in an end-to-end learning framework based on temporal attention techniques and their fusion. By applying temporal attention modules to a generic feature network, action and scene features are extracted efficiently and then composed into a single feature vector through the proposed fusion module. Our experiments on the CoVieW18 dataset show that our model is able to detect temporal attention with only weak supervision, and remarkably improves multi-task action and scene classification accuracies. © 2018 Association for Computing Machinery. | - |
dc.format.extent | 6 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Association for Computing Machinery, Inc | - |
dc.title | Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos | - |
dc.type | Article | - |
dc.identifier.doi | 10.1145/3265987.3265989 | - |
dc.identifier.scopusid | 2-s2.0-85058144673 | - |
dc.identifier.bibliographicCitation | CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018, pp 21 - 26 | - |
dc.citation.title | CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018 | - |
dc.citation.startPage | 21 | - |
dc.citation.endPage | 26 | - |
dc.type.docType | Conference Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | other | - |
dc.subject.keywordPlus | Semantics | - |
dc.subject.keywordPlus | Action classifications | - |
dc.subject.keywordPlus | Learning frameworks | - |
dc.subject.keywordPlus | Scene classification | - |
dc.subject.keywordPlus | Semantic features | - |
dc.subject.keywordPlus | Singular information | - |
dc.subject.keywordPlus | Strong correlation | - |
dc.subject.keywordPlus | Video classification | - |
dc.subject.keywordPlus | Video understanding | - |
dc.subject.keywordPlus | Classification (of information) | - |
dc.subject.keywordAuthor | Action Classification | - |
dc.subject.keywordAuthor | Scene Classification | - |
dc.subject.keywordAuthor | Semantic Feature Fusion | - |
dc.subject.keywordAuthor | Video Classification | - |
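The abstract describes temporal attention modules applied to per-frame features from a generic feature network, with the attended action and scene features then composed into a single vector by a fusion module. The following is a minimal NumPy sketch of that idea, not the paper's implementation: the scoring vectors, the concatenation-plus-projection fusion, and all dimensions are illustrative assumptions (in practice these would be learned end-to-end).

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention(frame_feats, w):
    # frame_feats: (T, D) per-frame features from a generic feature network
    # w: (D,) hypothetical scoring vector; the real module is learned
    scores = frame_feats @ w           # (T,) per-frame relevance scores
    alpha = softmax(scores)            # temporal attention weights, sum to 1
    return alpha @ frame_feats         # (D,) attention-pooled feature

rng = np.random.default_rng(0)
T, D = 8, 16                           # 8 frames, 16-D features (illustrative)
feats = rng.normal(size=(T, D))

# separate attention branches for the action and scene tasks
action_feat = temporal_attention(feats, rng.normal(size=D))
scene_feat = temporal_attention(feats, rng.normal(size=D))

# fusion module sketched as concatenation followed by a linear projection
W_fuse = rng.normal(size=(2 * D, D))
fused = np.concatenate([action_feat, scene_feat]) @ W_fuse
print(fused.shape)                     # a single fused feature vector
```

The fused vector would then feed task-specific classifiers for the joint action and scene predictions; the weak supervision mentioned in the abstract means the attention weights are learned from video-level labels alone, without per-frame annotation.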