Jacob Krüger

Assistant Professor for Software Engineering
Eindhoven University of Technology




Research

In my research, I focus on understanding, describing, and facilitating software evolution and the impact of human factors, particularly developers' cognition, by integrating psychological and economic concepts. Roughly, my research can be structured into five intersecting sub-topics, yielding empirical insights, novel techniques, conceptual models, and guidelines for conducting research. If you are interested in any of these topics or their intersections, have a read below or ping me (e.g., to discuss projects or theses).

Re-engineering variant-rich systems

The primary focus of my dissertation has been on the re-engineering of variant-rich systems. A variant-rich system describes a number of reused software variants that are similar, but have unique functionalities (i.e., features) to fulfill individual customer requirements. Organizations implement variant-rich systems through different techniques, which can be primarily divided into clone-based (e.g., copy-paste, clone & own) and platform-based (e.g., product-line engineering) strategies. Most developers start with clone-based development by creating and adapting a copy of an existing variant, since it is well supported and readily available, for instance, via forking on GitHub. However, an increasing number of cloned variants can easily cause problems in developing and maintaining the variant-rich system, for instance, because new features or bug fixes must be propagated between the independent and co-evolving variants. In such cases, organizations often decide to adopt a platform by re-engineering their cloned variants. A platform builds on a variability mechanism (i.e., a technique for implementing configuration options, such as the C preprocessor) and automated tool support (e.g., for modeling features, configuring, and deriving variants) to help developers reuse software artifacts more systematically.
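To make the idea of a variability mechanism more concrete, here is a minimal sketch of a platform from which variants are derived by selecting features. It is only an illustration: the feature names (`trim`, `uppercase`, `signature`) and the function `derive_variant` are hypothetical, and the feature checks happen at run time, whereas the C preprocessor resolves such options at compile time.

```python
# Hypothetical configuration options (features) of a tiny text-formatting platform.
FEATURES = frozenset({"trim", "uppercase", "signature"})


def derive_variant(selected):
    """Derive one variant (a formatting function) from a feature selection."""
    selected = frozenset(selected)
    unknown = selected - FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")

    def variant(text):
        # Each check plays the role of an #ifdef block around optional code.
        if "trim" in selected:
            text = text.strip()
        if "uppercase" in selected:
            text = text.upper()
        if "signature" in selected:
            text = text + " -- sent by platform"
        return text

    return variant


# Two variants derived from the same shared code base.
basic = derive_variant({"trim"})
premium = derive_variant({"trim", "uppercase", "signature"})
```

Because both variants come from one code base, a bug fix in the shared code reaches every variant, which is the kind of propagation that must be done manually between cloned variants.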

Contributions
  • We empirically elicited economic data on the (re-)engineering of variant-rich systems, highlighting that organizations should aim to iteratively move towards platform-based software reuse but must be aware of costly factors (e.g., feature location). The data can help organizations in their decision making and confirms/refutes established assumptions in research (e.g., change propagation can be more challenging in a platform than often assumed).
  • We published academic and industrial case studies on re-engineering variant-rich systems, providing insights into the processes, pitfalls, and benefits.
  • We collected empirical insights into the feature-location problem, and how to tackle it by eagerly tracing features in advance, recommending that feature traces in the source code should be lightweight and separated from variability mechanisms to facilitate program comprehension.
  • We constructed a process model, conceptual model, and operations for specifying and supporting the (re-)engineering and evolution of variant-rich systems by providing an understanding of contemporary practices.
  • We develop techniques for supporting developers in (re-)engineering endeavors of variant-rich systems, for instance, for analyzing the variability of source code, feature modeling, or visualizations.
  • We have proposed guidelines for assessing and planning the (re-)engineering of variant-rich systems, for instance, feature modeling principles or product-structuring concepts for systems platforms.
  • Other contributions on variant-rich systems include, for instance, datasets, definitions of benchmarks, concepts for promoting systems security/safety, visions for future research on variant-rich systems, and support for quality assurance.
Some interesting reads
  • Empirical Software Engineering 2022: We have proposed a conceptual model for unifying variability in space and time to guide contemporary research and tool development.
  • Empirical Software Engineering 2021: With funding from pure-systems GmbH, we have formalized and implemented operations in a tool that enables collaborative, distributed feature modeling, which works similarly to Google Docs for text.
  • Empirical Software Engineering 2021: We have instantiated the Family Evaluation Framework for assessing software product-line engineering in a company, reporting how to use the framework for managing and monitoring.
  • ESEC/FSE 2020: We elicited data through a literature review and interviews to collect reliable empirical insights on the costs of clone- and platform-based software reuse, confirming many established hypotheses, but also refuting some.
  • ICSME 2020: We conducted a large-scale study with open-source developers, revealing that while they preferred refactored preprocessor directives, they actually performed worse in two program comprehension tasks and had particular challenges in understanding the configurability of the source code.
  • ESEC/FSE 2019: We report an online experiment in which practitioners solved six comprehension tasks while exposed to different types of feature traces, indicating that virtual traces (i.e., annotations) can facilitate program comprehension.
  • ESEC/FSE 2019: We elicited a collection of feature modeling principles from the literature and interviews with practitioners to facilitate the construction of feature models.
  • Journal of Systems and Software 2019: We investigated how to recover feature facets for two open-source systems, showing what information sources in social-coding platforms (e.g., GitHub) can be helpful to understand important properties of the respective system.
Projects and funding
  • Pure-Systems GmbH: Go SPLC 2019 Challenge project
  • German Academic Exchange Service: IFI fellowship, research visits fellowship, conference traveling fellowship
  • European Union: Erasmus traineeship grant


Quality in software evolution

Most software systems exist for a long time, and thus evolve. There are numerous reasons why software evolves, for instance, because new features are added (with the system potentially becoming variant-rich), refactorings are applied, or bugs are fixed. However, not every change improves the system. Instead, a change may lead to new bugs in the system or a general degeneration of the source code, causing, for instance, architectural or code smells, technical debt, or incomprehensible code. It is important to understand how software degenerates and how this affects, and is affected by, developers. Specifically, a system may become less and less comprehensible, requiring major re-engineering to improve its quality and make it usable for an organization.

Contributions
  • We are working on improving techniques for automatically repairing bugs and quality problems that stem from the misuse of APIs, for instance, after an API has been updated.
  • We contribute empirical insights into quality problems arising during the evolution of software systems, such as architectural degeneration, support for program comprehension, or tangled evolution histories.
  • We study how organizations can benefit from measuring their software evolution, and thus improve their practices using KPIs, process mining, or other metrics.
  • Other contributions on software quality in evolution involve datasets, industrial case studies, visions of related cognitive challenges, and investigations of how communities provide information or use tools.
Some interesting reads
  • ICSE-SEIP 2024: We have interviewed developers at ASML to understand unintended software dependencies (e.g., architecture smells) in real-world, multi-lingual systems, finding that these are often similar to unintended dependencies in mono-lingual systems, but pose additional challenges to identify and resolve.
  • Journal of Systems and Software 2023: We analyzed for 11 open-source microservice systems to what extent their changes are tangled with respect to what microservices and what types of changes they involve, finding that there seems to be no focus or clear separation of distinct business/microservice-oriented changes.
  • ICSME 2023 Industry Track: We worked with Thermo Fisher Scientific to design a new process-mining technique that analyzes Jira issues, enabling us to recover and shed light on process activities that are not fully visible in typical version-control data.
  • ICSME 2021: We advanced existing tools to conduct a novel mining study on the evolution of architectural smells in open-source software and their impact on technical debt (e.g., cyclic dependencies have particular impact).
  • ESEC/FSE Industry Track 2020: We contribute an experience report of how a large German company implemented and benefited from measuring key performance indicators for its software development.
  • Empirical Software Engineering 2019: We conducted a large experiment in which developers had to comprehend smaller code examples with different types of comments, with our results and comparison to the related work indicating that comments are less helpful for smaller code excerpts and that developers often mistrust them.


Cognition in software engineering

Developers, and thus humans, implement and evolve software. Consequently, software engineering is subject to cognitive biases and other psychological or sociological concepts. Unfortunately, such biases can cause problems in the software itself, for instance, bugs, architectural degeneration, or performance problems. A prime example is that developers reuse an existing implementation that solves a problem similar to theirs. However, that solution may not work properly in the new context or may not be the best solution. The developers may not properly reflect on such problems, since cognitive biases impair their rationality. Understanding cognition in software engineering, studying program comprehension, and mitigating biases can help developers improve the quality of their systems.

Contributions
  • We improve our foundational understanding of what knowledge developers aim to remember, and how to measure their remaining expertise, with the results indicating that more abstract (e.g., feature) knowledge is easier for them to remember.
  • We are aiming to incorporate our work on human cognition to facilitate and ensure the correctness of software evolution.
  • Other contributions on developers' cognition involve studies on program comprehension and recommendations on how to improve it (see also other topics).
Some interesting reads
  • Journal of Systems and Software: In Practice 2024: We report on the development of the German Corona-Warn App, eliciting how the development process varied compared to other applications developed at SAP and how these variations impacted the developers involved.
  • Journal of Systems and Software: In Practice 2024: We share the experiences of a startup with developing a COVID-certificate verification system, focusing on how the startup context, emergency situation, and pressure of developing an application for helping the general public impacted the startup developers.
  • ICSME 2020: We conducted an interview survey on smaller software systems that indicates that developers focus on memorizing more abstract knowledge about their system, and are quite good at remembering knowledge.
  • ICSE 2018: We report the results of a developer survey in which we investigated what factors impact developers' memory and whether we could adopt psychological forgetting curves to measure their remaining expertise.
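As a rough illustration of the forgetting curves mentioned in the ICSE 2018 entry, the sketch below uses the commonly cited exponential form of the Ebbinghaus forgetting curve, R = exp(-t / s). This is an assumption for illustration only: the exact model and parameters used in the study are not given here, and the function name `retention`, the unit (days), and the parameter `memory_strength` are hypothetical.

```python
import math


def retention(days_elapsed, memory_strength):
    """Estimated share of knowledge retained, between 0 and 1.

    Assumes the exponential forgetting curve R = exp(-t / s), where t is the
    time since the developer last worked on the code and s is a memory
    strength (a larger s means slower forgetting).
    """
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    if memory_strength <= 0:
        raise ValueError("memory_strength must be positive")
    return math.exp(-days_elapsed / memory_strength)


# Illustrative values only: the same developer after one week and one month.
after_week = retention(7, memory_strength=30)
after_month = retention(30, memory_strength=30)
```

Under this assumed model, retention starts at 1 immediately after working on the code and decays towards 0, so `after_month` is lower than `after_week`.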
Projects and funding
  • German Research Foundation: INKleSS
  • Otto-von-Guericke University Magdeburg: Innovation fund


Fork-based software development and fork ecosystems

Fork-based software development refers to developers creating a fork (i.e., copy) of a system and implementing, for instance, a new feature or bug fix on that fork. Afterwards, the developers request that the fork be merged back into the system, for which a review is typically performed. This development paradigm enables collaborative and distributed work, while also providing means to improve the management of concurrent development efforts. However, forks can also easily become long-living clones of their system, resulting in variant-rich systems. Consequently, most research related to this area is also closely connected to the area of variant-rich systems.

Contributions
  • We are developing techniques to support developers during the merging of forked variants and in understanding fork ecosystems.
  • We provide insights into the properties of fork ecosystems and how they can be analyzed.
Some interesting reads
  • SANER 2024: We have analyzed a community discussion about GitHub's network graph, based on which we designed and evaluated visualizations intended to help developers explore fork ecosystems.
  • ASE 2023: We analyzed to what extent test cases within fork ecosystems are reused, finding that there is huge potential for sharing more test cases to improve the quality of other forks across an ecosystem.


Guidelines for conducting research

In parallel to the other research topics, publishing experiences and recommendations on scientific methods is an important topic. In particular, recommendations on pitfalls of literature analyses (e.g., literature reviews) and empirical studies are important to improve community practices. As such, this meta-research aims at improving science itself and facilitating researchers' tasks. Moreover, it is concerned with potential impediments that (certain groups of) researchers face when participating in the scientific community.

Contributions
  • We empirically study the software engineering/computer science community, particularly what impediments junior researchers face when they start their scientific careers.
  • We report on pitfalls of conducting literature reviews, and suggest ways to mitigate these.
  • We work on techniques for facilitating literature analyses for researchers and practitioners.
  • Other contributions on such guidelines are concerned with artifact sharing, article recommendation platforms, (alt)metrics, and suggestions for different types of empirical studies.
Some interesting reads
  • ICSE-SEET 2024: We have reported our experiences of integrating video-creation tasks in a requirements-engineering course with the intention of increasing student participation and interaction.
  • EASE 2022: We conducted a survey among software engineering researchers to explore the challenges they observe for junior researchers to actively participate in the community.
  • EASE 2022: We propose a technique that incorporates various metrics to facilitate the discovery, selection, and quality assessment of publications during literature analyses.
  • JCDL 2022: We report on a large study of the computer science community in which we compare citations and altmetrics for a number of high-reputation venues.
  • JCDL 2021: We performed a comparative survey of existing article recommendation platforms, which help researchers disseminate and discuss their publications after they have been accepted.
  • Empirical Software Engineering 2020: We have studied how research artifacts are (and should be) shared, indicating that researchers should be motivated to publish their artifacts in persistent repositories.
  • Empirical Software Engineering 2020: We report on a large-scale experiment that shows the limitations and problems of search engines in computer science, indicating clear threats to the conduct and replication of literature reviews.