# Software Team Experiences and Challenges: A Report from Day 1 of the 2021 Collegeville Workshop on Scientific Software

### Contributors

- Todd Munson, Argonne National Laboratory, GitHub: tmunson
- Anshu Dubey, Argonne National Laboratory, GitHub: adubey64
- Sam Yates, Swiss National Supercomputing Centre, GitHub: halfflat
- Barry Smith, Argonne National Laboratory, GitHub: BarrySmith
- Sarah Knepper, Intel Corporation, GitHub: sknepper
- Ulrike Meier Yang, Lawrence Livermore National Laboratory, GitHub: ulrikeyang
- Reed Milewicz, Sandia National Labs, GitHub: tbd
- Jacob Moxley, Sandia National Labs, GitHub: tbd
- Ben Cowan, Pilot AI, GitHub: benc303
- Sarah Osborn, Lawrence Livermore National Laboratory, GitHub: osborn9
- Robert Jacob, Argonne National Laboratory, jacob@anl.gov
- Jim Pivarski, Princeton and IRIS-HEP, GitHub: jpivarski
- Gerasimos Chourdakis, Technical University of Munich, GitHub: MakisH
- Jay Lofstead, Sandia National Labs, GitHub: gflofst
- Cody Balos, Lawrence Livermore National Laboratory, GitHub: balos1
- Lois Curfman McInnes, Argonne National Laboratory, GitHub: curfman
- Jed Brown, CU Boulder, GitHub: tbd
- James Willenbring, Sandia National Labs, GitHub: tbd
- Vadim Dyadechko, ExxonMobil, GitHub: tbd
- Elaine Raybourn, Sandia National Labs, GitHub: tbd

### Editors

- Michael A. Heroux, Sandia National Labs, GitHub: maherou
- Johanna Cohoon, UT Austin, GitHub: jlcohoon

## Background: The Collegeville Workshop Series

The Collegeville Workshop Series on Scientific Software is intended to bring together three communities of scientific software contributors: academia, industry, and laboratories. While there are existing exchanges among these communities, we are dedicated to improving awareness of common needs, unique contributions, and career paths that span these communities. Workshop contributions include short white papers, video interviews, and a three-day live event with panels, small-group discussions, and teatime sessions for themed conversations.

## Collegeville 2021 Theme: Software Teams

The Collegeville 2021 theme was scientific software teams. The first day of live discussion focused on software team definitions and challenges; the second day on technical strategies for improvement; and the third on cultural approaches for improvement.

Little scientific software is developed by individual scientists. Instead, teams with diverse skills collaborate on producing and using software to advance scientific discovery and understanding. Understanding how teams function and how teamwork can be improved represent two of the frontiers in improving the impact of software on science.

Software team skills and cultures can vary. A scientific software team will have science domain experts but increasingly also has expertise in computer science, mathematics, and software engineering. As we increase our focus on understanding and improving software teams, we see growing value in including expertise in the social and cognitive sciences.

## Workshop Small Group Discussions

During each of the live discussions, small groups gathered to discuss the topic of the day, creating a shared notes file. This blog is the first in a series of three that summarize the output from these discussions.

## Day 1: Workshop software team experiences and challenges

To provide a framework for discussing software team improvement, we spent Day 1 learning about each other, our backgrounds, and what we see as the biggest challenges to improving software teams.
About half of the discussion participants were from research labs; the remaining half were evenly split between universities and industry. Twenty-one participants chose to receive attribution for their contributions.

### Summary of participant software team experiences

Discussion participants from labs and universities represented a number of well-known open-source software projects. Others came from industry, representing the oil & gas sector and technical computing software providers. Finally, a number of participants were from the social and cognitive science communities, where their domain of study includes scientific software teams and developers.

In aggregate, the discussion participants have approximately 300 years of collective software development and software project leadership experience, ranging from individual contributors to leaders of large, multi-team efforts. Together, these participants provide software to thousands of users throughout the world. Participant experiences further represent approximately four decades of focused study of scientific software teams via methodologies from the social and cognitive sciences.

### Key Challenges

In the remainder of this report, we summarize the key challenges identified during the small group discussions. The detailed notes from these discussions are available on the [Collegeville 2021 Workshop website](https://collegeville.github.io/CW21).

**Improving business models for research software sustainment:** All scientific software has a business model, even if it is implicit. Making the business model explicit and actively managing toward sustainability is much less common. Having a good business model is important for all kinds of software, but it is critical where a distributed and evolving community is involved in the software's development.

**Curating and maintaining knowledge:** Teams have many conversations and tools that support maintenance of the software product; software practices, regression tests, and continuous integration are examples. Less attention is paid to ensuring that knowledge about the product is curated and maintained. Preparing for staff departures, creating paths for long-term membership, and otherwise committing to the long-term viability of a scientific software project represent opportunities for improvement. Curation and maintenance will differ depending on the details of a team's purpose: software development teams will need different approaches than software deployment and support teams. Sustaining product knowledge is essential to maintaining trust in what the software product does.

**Maintaining and growing software teams:** Finding people with the right skills, and competing for them within the salary structures of institutions that develop scientific software, is challenging. In this context, developing better strategies for acquiring and retaining the right talent is a growing need. We are beginning to address the needs of research software engineers (RSEs). However, in general, we need to improve the recognition, value, and career paths for all members of a scientific software team, and the management of its diverse members.

**Evolving team practices as the software matures and grows in size and scope:** The initial practices, tools, and infrastructure adopted by a team will need to evolve, both to take advantage of emerging approaches and tools and to improve effectiveness and efficiency as a team grows.
Evolution includes enabling bottom-up change and giving individual contributors autonomy to explore new approaches and receive recognition for their contributions.

**Scientific software development is increasingly large-scale and complex, with development spread across multiple institutions and scientific domains:** Increased diversity and complexity require better ways of communicating, collaborating, and coordinating across disciplines and institutions. Common challenges include managing team organization as sub-teams or hierarchies, running effective and efficient meetings, acknowledging and incorporating backgrounds that are diverse scientifically (computer scientists, domain scientists, mathematicians, social scientists) and culturally (geography, languages), and communicating effectively in remote settings. Attempted solutions to these challenges, e.g., teach-ins, are hard to justify when deadlines loom.

**Aligning incentives and rewards around code development vs. research and publishing:** People often do not get adequate credit and recognition for their contributions to software, relative to other forms of contribution (e.g., publications). In addition, there is a disconnect between what funding proposals highlight as the purpose of the funding and what is actually needed to make a software product successful. This disconnect influences what skills are considered during hiring.

**Scientific software teams need diverse skill sets:** Advanced scientific software teams need testing, integration, deployment, and user support skills, but these skills are seldom listed as explicitly funded activities. In addition, we need career paths for junior members who have these important skills. We need to support maintainers who do not want to be principal investigators (PIs). Without being recognized and funded, non-PIs find it difficult to continue contributing to a software project and to establish their careers. Software teams need members to handle this important software work, yet funding agencies do not provide sufficient opportunities to pursue funding for these activities. We need to distribute credit across the team. We need to address the question, "Why can't I be a phenomenal contributor?"

**Improving awareness and collaboration across end-user application development teams and reusable (library) software teams:** Application teams develop software that is targeted toward specific computational problems. Teams that develop libraries and other reusable components need to focus on a broader set of uses, based on characteristic examples derived from interaction with their user community. Application programming interfaces (APIs) are joint responsibilities; application teams have design concerns on one end (dependencies), library teams on both (dependencies and dependents). For dependents, backward compatibility is an issue, with approaches such as mocking in tests and adopting semantic versioning as possible solutions. Collaborations between application and library teams require determining how to design and develop capabilities in the spaces between software providers and clients. Challenges include how to integrate the library team with the application user community, how to foster software ecosystem perspectives, especially around collaborative development of APIs, and how to designate concrete staffing roles for shepherding API development and adoption.
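To make the mocking and semantic-versioning ideas above concrete, here is a minimal sketch in Python. It assumes a hypothetical application function that depends on a third-party solver; the names (`run_step`, the injected `solver`) are illustrative only and are not taken from any project discussed at the workshop.

```python
# Minimal sketch (hypothetical names): an application-side unit test that uses a
# mock in place of the real library call, so the test verifies only the
# application's own logic and stays stable across compatible library releases.

import unittest
from unittest import mock


def run_step(state, solver):
    """Application code under test: advance the state using an injected solver."""
    residual = solver(state)
    return state - residual


class RunStepTest(unittest.TestCase):
    def test_run_step_applies_solver_result(self):
        # Stand in for the library with a mock that returns a known residual.
        fake_solver = mock.Mock(return_value=0.25)
        self.assertAlmostEqual(run_step(1.0, fake_solver), 0.75)
        fake_solver.assert_called_once_with(1.0)


if __name__ == "__main__":
    unittest.main()
```

On the dependency side, a semantic-versioning constraint such as `solverlib>=2.1,<3` (again, a hypothetical package name) expresses the compatibility expectation declaratively: minor releases are expected to preserve the API, while a new major version signals potentially breaking changes.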
**Characterizing what software quality means and establishing ways to measure and improve it:** We need to understand and calibrate the effort vs. payoff balance for software quality. Similarly, we need to address how much we value innovation vs. stability, and portability vs. the cost of special-purpose coding. Our stakeholders have varying priorities, and we might not have support for the quality we desire. In addition, we need funding for evaluating and improving software quality. We need to develop ways to understand whether software is fulfilling its purpose, and we need our sponsors to value and invest in code quality.

## Final Remarks from Day 1 Discussions

Day 1 discussions at the Collegeville 2021 Workshop represent the input of a diverse and experienced group of scientific software developers and leaders, and their colleagues from the social and cognitive sciences. We hope that the challenges summarized in this blog resonate with the reader and help the scientific software community when prioritizing efforts to improve the quality and impact of software in the pursuit of scientific discovery.