Publications from 2022
-
Do Text-to-Text Multi-Task Learners Suffer from Task Conflict?
Traditional multi-task learning architectures learn a single model across multiple tasks through a shared encoder followed by task-specific decoders. Learning these models often requires specialized training algorithms that address task-conflict in the shared parameter updates, which otherwise can lead to negative transfer. A new type of multi-task learning within NLP homogenizes multi-task architectures as a shared encoder and language model decoder, which does surprisingly well across a range of diverse tasks. Does this new architecture suffer from task-conflicts that require specialized training algorithms? We study how certain factors in the shift towards text-to-text models affects multi-task conflict and negative transfer, finding that both directional conflict and transfer are surprisingly constant across architectures.
David Mueller , Nicholas Andrews , Mark Dredze
Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
-
The Importance of Temperature in Multi-Task Optimization
The promise of multi-task learning is that optimizing a single model on multiple related tasks will lead to a better solution for all tasks than independently trained models. In practice, optimization difficulties, such as conflicting gradients, can result in negative transfer, where multi-task models which perform worse than single-task models. In this work, we identify the optimization temperature---the ratio of learning rate to batch size---as a key factor in negative transfer. Temperature controls the level of noise in each optimization step, which prior work has shown to have a strong correlation with generalization. We demonstrate that, in some multi-task settings, negative transfer may arise due to poorly set optimization temperature, rather than inherently high task conflict. The implication of this finding is that in some settings, SGD with a carefully controlled temperature achieves comparable, and in some cases superior, performance to that of specialized optimization procedures such as PCGrad, MGDA, and GradNorm. In particular, our results suggest that the significant additional computational burden of these specialized methods may not always be necessary. Finally, we observe a conflict between the optimal temperatures of different tasks in a multi-task objective, with different levels of noise promoting better generalization for different tasks. Our work suggests the need for novel multi-task optimization methods which consider individual task noise-levels, and their impact on generalization.
David Mueller , Mark Dredze , Nicholas Andrews
OPT 2022: Optimization for Machine Learning (NeurIPS 2022 Workshop), 2022
-
Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
Authorship style transfer involves altering text to match the style of a target author whilst preserving the original meaning. Existing unsupervised approaches like STRAP have largely focused on style transfer to target authors with many examples of their writing style in books, speeches, or other published works. This high-resource training data requirement (often greater than 100,000 words) makes these approaches primarily useful for style transfer to published authors, politicians, or other well-known figures and authorship styles, while style transfer to non-famous authors has not been well-studied. We introduce the low-resource authorship style transfer task, a more challenging class of authorship style transfer where only a limited amount of text in the target author's style may exist. In our experiments, we specifically choose source and target authors from Reddit and style transfer their Reddit posts, limiting ourselves to just 16 posts (on average ~500 words) of the target author's style. Style transfer accuracy is typically measured by how often a classifier or human judge will classify an output as written by the target author. Recent authorship representations models excel at authorship identification even with just a few writing samples, making automatic evaluation of this task possible for the first time through evaluation metrics we propose. Our results establish an in-context learning technique we develop as the strongest baseline, though we find current approaches do not yet achieve mastery of this challenging task. We release our data and implementations to encourage further investigation.
Ajay Patel , Nicholas Andrews , Chris Callison-Burch
arXiv preprint arXiv:2212.08986, 2022