<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--suppress CheckImageSize -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script src="js/head.js?prefix="></script>
<meta name="description" content="Cycle-Contrast for Self-Supervised Video Represenation Learning">
<meta name="keywords" content="Lumada Data Science Lab,Hitachi,Vision,Video Representation,Learning,Action Recognition,Computer Science">
<title>Cycle-Contrast for Self-Supervised Video Representation Learning</title>
<link rel="stylesheet" href="css/main.css">
</head>
<body>
<div class="outercontainer">
<div class="container">
<div class="content project_title">
<h1>Cycle-Contrast for Self-Supervised Video Representation Learning</h1>
<h3>Lumada Data Science Lab. Hitachi, Ltd.</h3>
<span class="hr"></span>
</div>
<div class="content project_headline">
<div class="img">
<center>
<img class="img_responsive" src="img/ccl_web.png" width="80%" alt="Overview of the framework"/>
</center>
</div>
<div class="text">
<p class="text-left"><b>Cycle-Contrastive Learning</b> - Learn a video representation is supposed to be closed across video and its frames yet distant to all the other videos and frames in corresponding domain, respectively.
</p>
</div>
</div>
<div class="content">
<div class="text">
<h2>Abstract</h2>
<p>Cycle-Contrastive Learning (CCL) is a self-supervised method for learning video representations.
Following the natural belonging and inclusion relation between a video and its frames,
CCL is designed to find correspondences across frames and videos while considering contrastive representations in their respective domains.
It is different from recent approaches that merely learn correspondences across frames or clips.
In our method, the frame and video representations are learned from a single network based on an R3D architecture,
with a shared non-linear transformation for embedding both frame and video features before the cycle-contrastive loss.
We demonstrate that the video representation learned by CCL can be transferred well to downstream tasks of video understanding,
outperforming previous methods in nearest neighbour retrieval and action recognition on UCF101, HMDB51 and MMAct.</p>
</div>
</div>
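<div class="content">
<div class="text">
<h2>Method Sketch</h2>
<p>The following is a minimal, hypothetical PyTorch-style sketch of a cycle-contrastive objective under the assumptions above: a shared backbone and projection head produce video embeddings <code>v</code> and per-video frame embeddings <code>f</code>, and the loss contrasts each video against frames in the frame domain and each frame against videos in the video domain. All names and details are illustrative assumptions, not the released implementation.</p>
<pre><code>
import torch
import torch.nn.functional as F

def cycle_contrastive_loss(v, f, temperature=0.1):
    """Illustrative cycle-contrastive objective (not the official CCL code).

    v: (B, D) video embeddings from the shared projection head.
    f: (B, T, D) frame embeddings from the same head.
    """
    B, T, D = f.shape
    v = F.normalize(v, dim=-1)                 # unit-norm video embeddings
    f = F.normalize(f, dim=-1)                 # unit-norm frame embeddings
    frames = f.reshape(B * T, D)               # all frames in the frame domain
    owner = torch.arange(B, device=v.device).repeat_interleave(T)  # video index of each frame

    # Video -> frame domain: a video should be close to its own frames,
    # distant from the frames of all other videos.
    sim_vf = v @ frames.t() / temperature      # (B, B*T)
    pos = (owner.unsqueeze(0) == torch.arange(B, device=v.device).unsqueeze(1)).float()
    log_prob = sim_vf - sim_vf.logsumexp(dim=1, keepdim=True)
    loss_v2f = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1)

    # Frame -> video domain (closing the cycle): each frame should map back
    # to its own video, distant from all other videos.
    sim_fv = frames @ v.t() / temperature      # (B*T, B)
    loss_f2v = F.cross_entropy(sim_fv, owner)

    return loss_v2f.mean() + loss_f2v
</code></pre>
</div>
</div>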
<div class="content">
<div class="text">
<h2>Publication</h2>
<ul>
<li>
<div class="title"><a name="ccl_neurips">Cycle-Contrast for Self-Supervised Video Representation Learning</a></div>
<div class="authors">
Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, and Tomokazu Murakami
</div>
<div>
<span class="venue">NeurIPS 2020</span>
<span class="tag"><a href="https://papers.nips.cc/paper/2020/file/5c9452254bccd24b8ad0bb1ab4408ad1-Paper.pdf">PDF</a></span>
<span class="tag"><a href="https://ent.box.com/s/18ppgfecswpmvzg7xjmriyanu0v3iev9">Poster</a></span>
<span class="tag"><a href="bibtex/ccl.bib">Bibtex</a></span>
</div>
</li>
</ul>
</div>
</div>
<div class="content">
<h2>Presentation</h2>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/3DxkBwnB8OgZ6J" width="50%" height="300px" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"
style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
<div style="margin-bottom:5px"></div>
</div>
<div class="content">
<div class="text">
<h2>Related Publications</h2>
<ul>
<li>
<div class="title"><a name="mmact_iccv">MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding</a></div>
<div class="authors">
Quan Kong, Ziming Wu, Ziwei Deng, Klinkigt Martin, Bin Tong, and Tomokazu Murakami
</div>
<div>
<span class="venue">ICCV 2019</span>
<span class="tag"><a href="http://openaccess.thecvf.com/content_ICCV_2019/papers/Kong_MMAct_A_Large-Scale_Dataset_for_Cross_Modal_Human_Action_Understanding_ICCV_2019_paper.pdf">PDF</a></span>
<span class="tag"><a href="https://mmact19.github.io/2019/">Project Page</a></span>
<span class="tag"><a href="bibtex/mmact_iccv.bib">Bibtex</a></span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<style>
.hr {
display: block;
flex: 1;
margin: 0;
height: 2px;
background: #D4D4D4;
}
</style>
</body>
</html>