Andreas Rücklé - News/Blog

Leading a SoftwareCampus Project: A Summary

Mon, 01 Jun 2020 00:00:00 +0000

I am grateful to have received a SoftwareCampus grant at the end of 2017 after a multi-stage selection process consisting of interviews and pitching the project idea to potential industry partners. The SoftwareCampus program has significantly influenced my PhD journey, and I am glad to share my experiences here in the hope that they will be helpful to other PhD students who may be considering applying to SoftwareCampus.

SoftwareCampus

SoftwareCampus aims to train doctoral students with outstanding academic achievements and an entrepreneurial mindset to become future IT leaders. The program in itself is truly unique: Each recipient of a SoftwareCampus grant works with an industry partner and leads a joint research project over two years. The industry partner contributes networking events, mentoring, and executive trainings. Bonus: The German Federal Ministry of Education and Research (BMBF) funds the project with up to 100,000€!

From February 2018 to March 2020, I led the project "Intelligent Search in the Social Web" and coordinated our cooperation with DATEV eG. As part of the project, I taught seminars and supervised several students and interns at UKP Lab, either through their theses or by employing them with the project's funding.

Highlights

The most apparent advantage of SoftwareCampus is that it teaches you as a doctoral student to plan and manage your own dedicated research projects (with adequate funding). There are many other, less obvious aspects that I consider very important, including both advantages and disadvantages:

You can expect to learn a lot of new skills. Participation in executive training courses was particularly fruitful for me. For example, I attended training courses to learn negotiation skills, to conduct sales presentations, and to understand the different personality types in teams. I consider these skills very important, and throughout the program, I was able to strengthen them considerably. Without a doubt, the executive training courses are one of the biggest advantages of the program. I strongly recommend that every prospective SoftwareCampus participant book the courses as early as possible to ensure admission to the most interesting ones (they are quickly overbooked).
The program provides a great platform to connect with people who share a similar entrepreneurial mindset but come from diverse backgrounds. For instance, each executive training course is accompanied by social events. Depending on the project partner, such events might also be part of the quarterly status updates. Most notably, this leads to insightful discussions with other participants as well as senior executives from the project partners, and it makes it easy to establish new collaborations and pursue joint projects.
Participation in SoftwareCampus requires a considerable investment of time. Writing a project proposal, applying for funding, managing and executing the project, participating in executive training courses, conducting status meetings, and finishing project reports all take time. This may prolong the duration of your PhD studies, and you need to take this into account in your planning. Notably, the exact amount of additional time depends on how much the project topic and PhD topic overlap.
You'll learn to pitch ideas to diverse audiences. For example, the application process involves pitching a project idea to HR managers, and the quarterly meetings with the project partner typically involve pitching the newest ideas and results to employees, in my case from the IT and legal departments of DATEV eG. There are also other interesting formats: for example, we recorded short project pitches on video, which was a particularly exciting experience.

In summary, there are several key advantages to participating in SoftwareCampus. Most importantly, you can expect to considerably strengthen many essential skills beyond research. If the proposed project overlaps strongly with your PhD topic, the time investment is moderate and it will pay off significantly. Even if the two topics are entirely different, I believe the program is an excellent way to broaden your horizons.

In the following, I briefly outline our SoftwareCampus project titled "Intelligent Search in the Social Web".

Intelligent Search in the Social Web

Our project in a nutshell:

The amount of knowledge available on the web has grown exponentially in recent years. Online discussion forums, in particular, have gained popularity. For example, Reddit accumulated more than 1.2 billion user comments in 2018 alone.
Properly accessing this knowledge has considerable potential. For example, we can use it to automatically answer questions similar to the ones that have already been discussed. Therefore, we proposed a question answering system that performs an intelligent search in online forums to identify suitable answers.
In collaboration with DATEV eG, we defined several critical research questions that correspond to practical use-cases of such a system: How can we automatically answer programming questions posed by developers in German? How can we identify similar questions without having access to labeled training data? How can we transfer our models across different domains?
Based on our research questions, we planned and carried out a number of research projects, the results of which have been incorporated into our publications (see the sidebar on the left). In parallel to our research, we developed a prototype QA system for integration into a social Q&A platform (see image below).

Launching AdapterHub: A Framework and Central Repository for Adapters

Thu, 13 Aug 2020 00:00:00 +0000

We have just open-sourced AdapterHub.ml — a framework and central repository enabling the effortless adaptation of pre-trained transformers such as BERT, RoBERTa, and XLM‑R to new tasks and languages!

Pre-trained transformers have led to considerable advances in NLP, achieving state-of-the-art results across the board. They often contain hundreds of millions of parameters, and thus, storing and distributing a separate fine-tuned transformer model for every task can be prohibitively expensive. Adapters are a light-weight alternative to full model fine-tuning, consisting of only a tiny set of newly introduced parameters at every transformer layer. During training, the weights of the pre-trained transformer are fixed, which allows us to share the large majority of parameters between tasks. This results in extremely light-weight models (one adapter is around 3MB!). Adapters have been shown to achieve performance comparable to full model fine-tuning. Moreover, they provide elegant ways to share knowledge across multiple tasks more effectively, and we can leverage them for zero-shot cross-lingual transfer.
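To get a feeling for the size claim above, here is a rough back-of-the-envelope sketch. It assumes a common bottleneck design (one down-projection and one up-projection per layer, each with a bias) and illustrative values that are not stated in this post: a hidden size of 768, 12 layers, and a reduction factor of 16. The function name and all numbers are assumptions for illustration, not the exact configuration of any adapter in the repository.

```python
def adapter_parameter_count(hidden_size: int, reduction_factor: int, num_layers: int) -> int:
    """Count the parameters of one bottleneck adapter per transformer layer.

    Assumed design: a down-projection (hidden -> bottleneck) and an
    up-projection (bottleneck -> hidden), each with a bias vector.
    """
    bottleneck = hidden_size // reduction_factor
    down = hidden_size * bottleneck + bottleneck    # weights + bias
    up = bottleneck * hidden_size + hidden_size     # weights + bias
    return num_layers * (down + up)


# Assumed values: BERT-base-like dimensions and a reduction factor of 16.
params = adapter_parameter_count(hidden_size=768, reduction_factor=16, num_layers=12)
size_mb = params * 4 / 1e6  # 4 bytes per float32 parameter

print(f"{params:,} parameters, about {size_mb:.1f} MB")
```

Under these assumptions the sketch gives 894,528 parameters, or about 3.6 MB in float32, which is in the same range as the figure quoted above and well under 1% of a transformer with more than a hundred million parameters.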

Before AdapterHub, training, sharing, and re-using adapters was not straightforward, because there are several different adapter architectures, several ways of composing them, and a wide variety of pre-trained transformers. AdapterHub solves this issue by providing a unified interface to the most recent adapter architectures and composition techniques. Built on top of HuggingFace's popular Transformers library, AdapterHub has access to the most widely used pre-trained transformers.

AdapterHub covers the entire lifecycle of adapter training, inference, and sharing by (a) providing an open-source framework and (b) establishing a central repository of pre-trained adapters.

An Open-Source Framework for Adapting Transformers

Our AdapterHub framework is a fork of the popular Transformers library by HuggingFace. Therefore, researchers can effortlessly integrate AdapterHub into existing experimental code as a drop-in replacement for the original library.

AdapterHub provides researchers with additional methods, e.g., to instantiate adapters in transformers and to freeze/unfreeze their weights. The following code provides a minimal example (this corresponds to figure 2 in our paper):

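from transformers import AutoModelWithHeads, AdapterType  # the AdapterHub fork replaces the original library
model = AutoModelWithHeads.from_pretrained('roberta-base')
# Add a new task adapter under the name "sst-2"
model.add_adapter("sst-2", AdapterType.text_task)
# Train only the adapter; the pre-trained transformer weights are frozen
model.train_adapter(["sst-2"])
...
# Export the adapter's config and weights
model.save_adapter("adapters/text-task/sst-2/", "sst-2")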

model.add_adapter adds the new adapter weights to the model, here under the name "sst-2". model.train_adapter indicates that we wish to train the adapter (and only the adapter), thus freezing the pre-trained transformer model weights. There exist several possible options for configuring adapters, e.g., we can choose among different architectures. Our documentation describes this in greater detail.
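The freezing behaviour described for model.train_adapter can be pictured with a small library-free sketch. This is not the framework's implementation: the Param class, the train_only_adapter helper, and the parameter names are all hypothetical and only illustrate the idea of leaving just the named adapter's weights trainable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter."""
    name: str
    num_values: int
    requires_grad: bool = True


def train_only_adapter(params: List[Param], adapter_name: str) -> None:
    """Freeze every parameter except those belonging to the named adapter."""
    marker = f".adapters.{adapter_name}."
    for p in params:
        p.requires_grad = marker in p.name


# Hypothetical parameter names; real models use their own naming scheme.
params = [
    Param("encoder.layer.0.attention.weight", 2_359_296),
    Param("encoder.layer.0.adapters.sst-2.down.weight", 36_864),
    Param("encoder.layer.0.adapters.sst-2.up.weight", 36_864),
]

train_only_adapter(params, "sst-2")
trainable = [p.name for p in params if p.requires_grad]
print(trainable)
```

After the call, only the two adapter entries remain trainable, so an optimizer that skips frozen parameters would update the adapter and leave the pre-trained weights untouched.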

After adapter training has finished, we call model.save_adapter to export the adapter's config and weights. We can seamlessly load our adapter into the same pre-trained transformer model, which makes it possible to share and open-source trained adapters, e.g., through our central repository.

A Central Repository of Pre-Trained Adapters

AdapterHub is accompanied by a central repository and website to share, find, and reuse pre-trained adapter models. After identifying a suitable adapter, we can simply load it:

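from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Load a pre-trained adapter from the central repository by its identifier
model.load_adapter("sentiment/sst-2@ukp")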

Our repository already contains over 100 adapters across several tasks and languages; see our explore page!

We leverage GitHub's infrastructure, automating the deployment and verification of new adapters using GitHub Actions. Adding new tasks, datasets, and adapters is as simple as creating a single YAML configuration file based on one of our example files. Contributions are managed through pull requests to our Hub repository.

For more details, see the AdapterHub lifecycle below and check out our quickstart guide!

MultiCQA and AdapterHub accepted at EMNLP 2020

Wed, 16 Sep 2020 00:00:00 +0000

I'm glad to announce that two of our publications have been accepted at EMNLP 2020!

In MultiCQA, we study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains from community question answering forums in English. Surprisingly, we find that neither domain similarity nor data size is a critical factor for the best zero-shot transferability. We also demonstrate that considering a broad selection of source domains is crucial for obtaining the best zero-shot transfer performance, which contrasts with the standard procedure that merely relies on the largest and most similar ones. We extensively study how to best combine multiple source domains and propose combining self-supervised with supervised multi-task learning. Our MultiCQA model trained on all available source domains considerably outperforms in-domain BERT on six out of nine benchmarks.
In AdapterHub, we present a novel framework for adapting transformers, built on top of the popular HuggingFace Transformers library. AdapterHub covers the entire lifecycle of training, sharing, and reusing adapter models in state-of-the-art pre-trained transformers such as BERT, RoBERTa, and XLM-R. AdapterHub provides researchers with a unified interface to the most recent adapter architectures and composition techniques, and it allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. Researchers can easily export their adapters and share them with the community through our central repository.

ArXiv links:

MultiCQA: https://arxiv.org/abs/2010.00980
AdapterHub: https://arxiv.org/abs/2007.07779

Ph.D. defended, new position at Amazon

Thu, 29 Apr 2021 00:00:00 +0000

I am happy to announce that I've successfully defended my dissertation titled "Representation Learning and Learning from Limited Labeled Data for Community Question Answering" on April 12! I would like to sincerely thank Iryna Gurevych for mentoring me over the past five years, and for the great opportunities she gave me at UKP Lab. I would also like to thank Jonathan Berant and Goran Glavaš for reviewing my thesis and for the very insightful discussion we had.

On May 1st, I will start a new position as Applied Scientist NLP/IR at Amazon in Berlin. I am particularly thrilled to be conducting research with direct practical relevance and for the benefit of many customers. Through the close connection between research and real-world applications, I look forward to continuing to contribute to our terrific research community.

Renewed commitment in support of the Efficient NLP community

Sun, 31 Jul 2022 00:00:00 +0000

I am glad to announce that I'm renewing my commitment in support of the Efficient NLP community. I believe that inventing simple and efficient solutions is critical to increasing the participation in NLP research. This will enable more scientists to contribute to the state of the art and create sustainable progress in the long run.

First, I will be an Area Chair for the Efficient NLP Track at EMNLP 2022, held in December 2022. The Efficient NLP Track encourages submissions that focus on reducing memory requirements, improving training and inference efficiency, performing efficient model selection, and coming up with new methods to measure efficiency. I served as Area Chair for the 2021 Efficient NLP Track at EMNLP, which was a great success with over 130 submissions. I believe this track is critical to promoting research that goes beyond optimizing model accuracy.

Second, I am a co-organizer of SustaiNLP 2022, the third workshop on simple and efficient natural language processing (co-located with EMNLP 2022). SustaiNLP will provide a dedicated venue and discussion platform for researchers working on efficiency topics. The 2021 edition received 51 submissions and hosted a number of high-profile keynotes. SustaiNLP will promote research focusing on clever methods that improve efficiency along several dimensions.

Third, as a member of ACL's Efficient NLP working group, I will contribute to the implementation of the Efficient NLP policy. The ACL executive committee approved our policy earlier this year, and it will affect important aspects of future conferences. Our recommendations include: (i) better aligning experiments and research claims by changing the review process, (ii) rewarding the open release of models of different sizes, and (iii) setting up efficiency tracks at conferences. I believe the Efficient NLP policy marks a significant milestone for the community: it will improve equity in NLP research and ensure that future conferences reward scientific contributions without requiring enormous resources.