I am a first-year PhD student at Johns Hopkins University, advised by Professor Vishal Patel. My primary research focus is on computer vision, generative models, and AI. Before coming to JHU, I was a Research Assistant at CITI, Academia Sinica, working with Professor Jun-Cheng Chen. I received my undergraduate degree in Computer Science and Engineering from National Sun Yat-sen University, where I got my start in research working with Professor Chia-Ping Chen.
My primary interest lies in machine learning and generative models. I'm currently working on video generation using image Diffusion Models. (Last updated on June 1, 2023)
Before joining CITI, I also worked part-time as a full-stack web developer at the Office of International Affairs (OIA), NSYSU. I developed web applications for exchange programs, mostly using PHP, Express, and Vue, and maintained all websites across the OIA office.
I was also involved in subjects such as computer graphics, socket programming, attribute-based encryption, data visualization, compiler design, and Chrome extension development during my undergraduate studies.
This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observed-space scores in latent-space Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach.
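To make the closed-form consistency step more concrete, below is a minimal NumPy sketch, not the actual MeDM implementation: the function name, the correspondence format, and the assumption that the objective reduces to a per-group least-squares fit are all hypothetical simplifications. The idea it illustrates is that pixels linked across frames by optical flow are pulled to a shared value, and the least-squares solution for that shared value is simply the group mean.

```python
import numpy as np

def enforce_temporal_consistency(frames, correspondences):
    """Illustrative sketch of a closed-form consistency step.

    frames:          (T, H, W, C) array of independently denoised frames.
    correspondences: list of pixel groups; each group is a list of
                     (t, y, x) indices that optical flow links to the
                     same underlying scene point (hypothetical format).

    Minimizing the squared deviation of linked pixels from one shared
    value has a closed-form solution: set every pixel in a group to
    the group's mean.
    """
    out = frames.copy()
    for group in correspondences:
        vals = np.stack([frames[t, y, x] for (t, y, x) in group])
        mean = vals.mean(axis=0)
        for (t, y, x) in group:
            out[t, y, x] = mean
    return out
```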
Deepfake detection has attracted extensive attention due to the widespread circulation of forged images on social media. Recently, self-supervised learning (SSL)-based Deepfake detection approaches have outperformed supervised methods in terms of model generalization. However, we notice that most SSL-based methods do not take into account the manipulation strength of synthesized forgery samples, which varies with the synthesis parameters, and therefore yield suboptimal detection performance. To address this issue, we introduce several auxiliary losses to the state-of-the-art SSL-based method that infer the synthesis parameters of the different synthesis sub-tasks during data generation, where the ground-truth labels are obtained for free from the synthesis pipeline. With comprehensive evaluations on various benchmarks, our approach achieves noticeable performance improvements. Specifically, for the cross-dataset evaluation, the proposed approach outperforms the state-of-the-art method in terms of AUC with improvements of 3.4%, 1.47%, 1.56%, and 1.3% on the CDF, DFDC, DFDCP, and FFIW datasets, and achieves competitive performance on the DFD dataset. This further demonstrates the generalization ability of the proposed approach.
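The sketch below illustrates the general shape of such auxiliary supervision, assuming a shared backbone with one real/fake head and extra heads that predict synthesis parameters (the specific heads, parameter set, and loss weights here are hypothetical, not the paper's actual configuration).

```python
import torch
import torch.nn as nn

class DetectorWithAuxHeads(nn.Module):
    """Shared backbone with a real/fake head plus auxiliary heads that
    predict synthesis parameters of the forgery pipeline (illustrative)."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.cls_head = nn.Linear(feat_dim, 2)     # real vs. fake
        self.blend_head = nn.Linear(feat_dim, 1)   # e.g., blending ratio (assumed)
        self.mask_head = nn.Linear(feat_dim, 4)    # e.g., mask-type class (assumed)

    def forward(self, x):
        f = self.backbone(x)
        return self.cls_head(f), self.blend_head(f), self.mask_head(f)

def total_loss(outputs, labels, lambdas=(1.0, 0.1, 0.1)):
    """Combine the main detection loss with auxiliary losses whose
    ground-truth labels come for free from the synthesis pipeline."""
    cls_logits, blend_pred, mask_logits = outputs
    y_cls, y_blend, y_mask = labels
    loss_cls = nn.functional.cross_entropy(cls_logits, y_cls)
    loss_blend = nn.functional.mse_loss(blend_pred.squeeze(-1), y_blend)
    loss_mask = nn.functional.cross_entropy(mask_logits, y_mask)
    return lambdas[0] * loss_cls + lambdas[1] * loss_blend + lambdas[2] * loss_mask
```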
Please refer to my CV and Google Scholar profile for a full list.