Student Spotlight: Shiye “Sally” Cao
The fourth-year undergraduate was named a finalist for the 2022 CRA Outstanding Undergraduate Researcher Award for her work in AI research.
Research Projects
Faculty Research Advisor: Kenton Murray
Abstract: In this work, we focus on intrasentential code-mixing and propose several different synthetic code-mixing (SCM) data augmentation methods that outperform the baseline on downstream sentiment analysis tasks across various amounts of labeled gold data. Our proposed methods demonstrate that strategically replacing parts of sentences in the matrix language with a constant mask significantly improves classification accuracy, motivating further linguistic insights into the phenomenon of code-mixing. We test our data augmentation method in a variety of low-resource and cross-lingual settings, reaching up to a relative improvement of 7.73% on the extremely scarce English-Malayalam dataset. We conclude that the code-switch pattern in code-mixing sentences is also important for the model to learn. Finally, we propose a language-agnostic SCM algorithm that is cheap—yet extremely helpful—for low-resource languages.
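The core idea of the masking strategy described above can be sketched in a few lines: replace a random fraction of tokens in a matrix-language sentence with a constant mask token to produce synthetic "code-mixed" training data. This is an illustrative sketch only; the function name, mask token, and replacement probability are assumptions, and the paper's actual replacement strategy may differ.

```python
import random

def mask_augment(tokens, mask_token="[MASK]", p=0.3, seed=0):
    """Sketch of constant-mask synthetic code-mixing (SCM) augmentation.

    Replaces each matrix-language token with a constant mask token with
    probability p, simulating the positions where embedded-language
    material would appear in a code-mixed sentence.
    (Hypothetical illustration; not the authors' exact algorithm.)
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [mask_token if rng.random() < p else t for t in tokens]

sentence = "this movie was really great".split()
augmented = mask_augment(sentence, p=0.5, seed=42)
print(augmented)
```

A downstream sentiment classifier would then be trained on a mix of gold sentences and such masked variants, so that it learns to rely on the switch pattern rather than on every surface token.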
Faculty Research Advisor: Michael F. Bonner
Abstract: Memory often fills in what is not there. A striking example of this is boundary extension, whereby observers mistakenly recall a view that extends beyond what was actually seen. However, not all visual memories extend in this way, suggesting that this process depends on specific scene properties. What factors determine when visual memories will include details that go beyond perceptual experience? Here, seven experiments (N = 1,100 adults) explored whether spatial scale—specifically, perceived viewing distance—drives boundary extension. We created fake miniatures by exploiting tilt shift, a photographic effect that selectively reduces perceived distance while preserving other scene properties (e.g., making a distant railway appear like a model train). Fake miniaturization increased boundary extension for otherwise identical scenes; participants who performed a scene-memory task misremembered fake-miniaturized views as farther away than they actually were. This effect went beyond low-level image changes and generalized to a completely different distance manipulation. Thus, visual memory is modulated by the spatial scale at which the environment is viewed.
Faculty Research Advisor: Benjamin Van Durme
Abstract: Our common-sense knowledge about objects includes their typical visual attributes; for example, we know that bananas are typically yellow or green, not purple. Text and image corpora, being subject to reporting bias, represent this world knowledge with varying degrees of faithfulness. In this paper, we investigate to what degree unimodal (language-only) and multimodal (image and language) models capture a broad range of visually salient attributes. To this end, we create the Visual Commonsense Tests (ViComTe) dataset, covering five property types (color, shape, material, size, and visual co-occurrence) for over 5,000 subjects. We validate this dataset by showing that our grounded color data correlates much better than ungrounded text-only data with crowdsourced color judgments provided by Paik et al. (2021). We then use our dataset to evaluate pretrained unimodal models and multimodal models. Our results indicate that multimodal models better reconstruct attribute distributions, but are still subject to reporting bias. Moreover, increasing model size does not enhance performance, suggesting that the key to visual common sense lies in the data themselves.
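The validation step described above (comparing a model's attribute distribution for a subject against crowdsourced judgments) can be illustrated with a rank correlation. The distributions below for "banana" are invented for illustration, and Spearman's rho is used here as one plausible correlation measure; the dataset's actual numbers and the paper's exact metric may differ.

```python
def spearman(xs, ys):
    """Spearman rank correlation for tie-free value lists (minimal sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical color distributions for the subject "banana":
# one predicted by a model, one aggregated from crowdsourced judgments.
model = {"yellow": 0.55, "green": 0.30, "brown": 0.10, "purple": 0.05}
human = {"yellow": 0.60, "green": 0.25, "brown": 0.12, "purple": 0.03}
colors = list(model)
rho = spearman([model[c] for c in colors], [human[c] for c in colors])
print(rho)  # close to 1.0 when the model ranks colors like humans do
```

A higher correlation indicates that the model's attribute distribution is better grounded; a model dominated by reporting bias (e.g., over-reporting "purple banana" from text) would score lower.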
Student Spotlight: 2022 Pistritto Fellows
Chinat Yu, Kyuhee Jo, and Jingyu “Jack” Zhang were awarded Pistritto Fellowships this year. The fellowship program was created to foster student-faculty collaboration as undergraduates explore research areas.