Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2 
17th International Workshop on Treebanks and Linguistic Theories (TLT), Dec 2018
Building a dataset for Native Language Identification from language learning forums.
PeerJ Computer Science, Oct 2021
Using paraphrasing to improve performance across multitask question answering benchmarks.
Proceedings of the Language Resources and Evaluation Conference (LREC 2022), Jun 2022
A new long-document benchmark consisting only of documents over 10,000 tokens.
The 14th International Conference on Learning Representations (ICLR 2023), May 2023
An editing method that improves the robustness of models against length attacks; the improvement can be attributed to reduced length information in the embeddings and more robust intra-document token interaction.
EMNLP/CoNLL 2023 BabyLM Challenge, May 2023
A contextualizer pretraining strategy to produce more human-like language models; winner of the EMNLP/CoNLL BabyLM Challenge Loose Track.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Dec 2023
We show that contrastive learning models are sensitive to text length in ways that distort semantic representations, and propose a length-agnostic framework that improves robustness and retrieval performance.
IEEE Transactions on Neural Networks and Learning Systems, Jul 2025
Pixel-space spatiotemporal transformers for predicting the future states of dynamic physical simulations.
International Conference on Computer Vision (ICCV 2025), Oct 2025
A unified framework that treats all modalities as video and learns through next-frame prediction.
Nature Communications Medicine, Mar 2026
Building a model that considers clinical free text alongside dermatology images.
npj Digital Medicine, Apr 2026
Using large language models to automatically monitor and improve the quality of hospital discharge summaries.
Nature Communications Medicine, Apr 2026
Using synthetic data to build robust anonymization models.
npj Digital Medicine, May 2026
Combining survival analysis with machine learning to estimate healthy life expectancy from personal health records.
Presenting my work on using NLP for exploring hospital discharge summaries.