How a BPM Dream Team Ranked the Risks and Tools for AB-BPM

DATE POSTED: June 17, 2025
Table of Links

Abstract and 1. Introduction

  2. Background and Related Work
  3. Research Method
  4. Results
  5. Discussion
  6. Conclusion and References


3 Research Method

In order to address the research questions, we apply the grounded theory (GT) research method [4], combined with the Delphi method [17]. The GT research methodology is suitable for building a theory and answering questions about fields where little is known. We selected it because, so far, no research has been conducted on the industry perspective of applying AB testing for BPI. After the initial data collection with semi-structured interviews (purposive sampling in GT), we approached the second stage of the GT methodology (also called theoretical sampling [4]) as a shortened Delphi method study. The novelty and complexity of the topic, and the fact that the experts from the interviews had already been introduced to the AB-BPM method, made a follow-up within this group of panelists more suitable than a broader follow-up survey, which would have required training the larger group and would have led to more heterogeneous exposure to the topic among participants. The Delphi method has multiple sub-categories; the version we use is called the ranking-type Delphi method (RTDM). The goal of RTDM is to identify and rank key issues regarding a given topic [17]. In the following paragraphs, we describe the research methods used in more detail.

Expert Selection. We recruited experts from a multinational software company with more than 100,000 employees. The company develops enterprise software, and the majority of study participants are employees of a sub-unit that specializes in developing BPM software. Due to the study's exploratory nature, the aim was to obtain the perspective of a broad range of experts. For this purpose, we set a number of goals for the selection of the experts: i) to include people who develop BPM software as well as people who work in consulting (though not necessarily both at once); ii) to cover various areas of technical skills, e.g., software engineering and data science; and iii) to include only participants with experience in business process improvement initiatives. The aim was to have a panel of ten experts, in line with standard practice for RTDM studies in Information Systems [17]. Of the eleven people we contacted, ten agreed to take part. Most study participants have a background in software engineering or other product-related roles (e.g., product management), but the panel also includes experts from consulting and a data scientist. The study participants had, on average, 7.6 years (SD = 2.4 years) of full-time industry experience and had worked in the BPM field for an average of 4.3 years (SD = 1.8 years). Most of the experts (seven) hold a degree in Science/Engineering, while the others (three) were educated in Business/Management. The highest educational degree is a Ph.D. for five of the study participants, a master's degree for four, and a bachelor's degree for one. The experts went through three study rounds: the interview, a validation survey, and a ranking survey. Participation dropped from ten to five in the validation survey, whereas the final ranking survey reached eight people. Since the ranking survey is more important for the final results and also included the option to give feedback on the coding, the level of participation after the initial interviews can be seen as relatively high.

Interviews. As is common in GT research, we conducted semi-structured qualitative interviews with subject matter experts, aiming to capture a wide range of ideas and thoughts on the complexities of a topic by openly engaging in conversation with subjects and analyzing these interactions [21, 3]. Since the order and wording of questions and the follow-up questions are highly flexible in semi-structured interviews, the interview guide is more a collection of topics to be covered than a verbatim script. We also made minor adjustments to the interview guide during the interview phase in response to knowledge gained, in line with standard practice. Such adjustments are considered unproblematic since the goal is not to compare different subgroups, test a hypothesis, or find out how many people hold certain beliefs, but to find out what kinds of beliefs are present [3]. We used the following interview guideline, given here in condensed form: 1. prior experience with BPI, 2. short introduction to AB-BPM (not a question; a 5-10 minute presentation), 3. execution of AB tests/feasibility, 4. suitability, 5. prerequisites for adopting the AB-BPM method, 6. risks, 7. tool requirements, 8. open discussion.

Consolidation and Validation. After the interviews, the transcripts were coded and topics were consolidated (the initial coding phase of GT [4]). After the consolidation, the categories risks and tool features were selected for further data collection. The selection was motivated by the fact that the experts seemed highly interested in these categories and provided many ideas around them; moreover, the categories are highly relevant for the elicitation of requirements. The item lists were sent to the experts, who then had to validate whether their stance on the issues was properly represented. If not, the experts could give feedback on which items were missing or whether some points should be separated and specified more clearly. Note that the narrowing-down phase, which asks the experts to exclude the least important items from each list, was skipped because the lists we presented to the experts had fewer than 20 items to begin with. This is in accordance with common practice and guidelines [17, 16].

Ranking. After the relevant points have been validated, the ranking phase aims to rank the items, often with respect to the importance of the issues. Since our two lists, i.e., those regarding risks and tool features, are topically distinct, we operationalized the ranking metrics differently for each list. Multiple rounds of ranking, as are common in RTDM studies, were outside the scope of this work due to the extensive interviews and the focus on exploring new insights rather than quantifying known facts. Since risk is a complex and hard-to-poll topic, we operationalized it as the product of the perceived likelihood of occurrence and the potential damage if the situation manifests [20]. The participants were asked to rate both the probability and the impact on a Likert scale: very low (1) - low (2) - moderate (3) - high (4) - very high (5). This results in risk scores from 1 to 25. Furthermore, we asked the study participants to rate the importance of possible tool features on a Likert scale. The possible choices were: extremely unimportant (1) - somewhat unimportant (2) - neutral (3) - somewhat important (4) - extremely important (5).
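
To make the risk operationalization concrete: an expert who rates an item's probability as high (4) and its impact as moderate (3) assigns it a risk score of 4 × 3 = 12. The sketch below is not from the paper; the item names, the ratings, and the aggregation by mean score across the panel are illustrative assumptions. It merely shows how per-expert risk scores of this kind could be computed and how items could then be ranked.

```python
# Illustrative sketch (not from the paper): risk score = probability x impact,
# each rated on a 1-5 Likert scale, giving per-expert scores from 1 to 25.
# Item names, ratings, and mean-based aggregation are assumptions for illustration.
from statistics import mean

# (probability, impact) ratings per expert for each hypothetical risk item
ratings = {
    "hypothetical risk A": [(4, 3), (5, 4), (3, 3)],
    "hypothetical risk B": [(2, 5), (3, 4), (2, 4)],
}

def risk_scores(per_expert_ratings):
    """Per-expert risk score: probability * impact (range 1-25)."""
    return [p * i for p, i in per_expert_ratings]

# Rank items by their mean risk score across the panel, highest risk first.
ranking = sorted(
    ((item, mean(risk_scores(votes))) for item, votes in ratings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for rank, (item, score) in enumerate(ranking, start=1):
    print(f"{rank}. {item}: mean risk score {score:.1f}")
```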


:::info Authors:

(1) Aaron Friedrich Kurz [0000-0002-2547-6780], Technische Universität Berlin, Berlin, Germany and SAP Signavio, Berlin, Germany ([email protected]);

(2) Timotheus Kampik, SAP Signavio, Berlin, Germany ([email protected]);

(3) Luise Pufahl, Technische Universität München, Munich, Germany ([email protected]);

(4) Ingo Weber [0000-0002-4833-5921], Technische Universität München, Munich, Germany and Fraunhofer Gesellschaft, Munich, Germany ([email protected]).

:::

:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
