**Table of Links**

- 2 Survey with Industry Professionals
- 3 RQ1: Real-World Use Cases That Necessitate Output Constraints
- 4.2 Integrating with Downstream Processes and Workflows
- 4.3 Satisfying UI and Product Requirements and 4.4 Improving User Experience, Trust, and Adoption
- 5.2 The Case for NL: More Intuitive and Expressive for Complex Constraints
- 6 The ConstraintMaker Tool and 6.1 Iterative Design and User Feedback
2 SURVEY WITH INDUSTRY PROFESSIONALS

**Methodology.** To get a broad range of insights from people with experience prompting and building LLM-powered applications, we deployed an online survey to users of an internal prompt-based prototyping platform (similar to the OpenAI API Playground [28] and Google AI Studio [11]) at a large technology company for two weeks during Fall 2023. We chose this platform because it was explicitly designed to lower the barriers to entry into LLM prompting and to encourage a broader population (beyond machine learning professionals) to prototype and develop LLM-powered applications. We publicized the survey through the platform's user mailing list. Participants were compensated $10 USD for their participation. The survey was approved by our organization's IRB.
**Instrument.** Our survey started with questions about participants' backgrounds and technical proficiency, such as their job roles and their level of experience in designing and engineering LLM prompts. It then investigated RQ1 and RQ2 by asking participants to report three real-world use cases in which they believe applying constraints to LLM outputs is necessary or advantageous. For each use case, they were encouraged to detail the specific scenario where they would like to apply constraints, the type of constraint they would prefer to implement, the degree of precision required in adhering to the constraint, and the importance of this constraint to their workflow. Finally, the survey investigated RQ3 by asking participants to reflect on scenarios where they would prefer expressing constraints via a GUI (sliders, buttons, etc.) over natural language (in prompts, etc.) and vice versa, as well as any alternative ways they would prefer to articulate constraints to LLMs. The GUI alternative draws inspiration from tools like the OpenAI Playground that let users adjust settings like temperature and top-k through buttons, toggles, and sliders. Detailed survey questions are documented in Section A of the Appendix.
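To make the GUI-versus-natural-language distinction concrete, the sketch below shows the same output constraint expressed both ways using the OpenAI Python SDK. This is our illustration, not an artifact of the study; the model name and prompts are placeholders.

```python
# Minimal sketch: two ways to constrain LLM output.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# (a) GUI-style constraints: discrete API parameters, analogous to the
# sliders and toggles in the OpenAI Playground.
gui_style = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[{"role": "user", "content": "List three fruits."}],
    temperature=0.2,      # slider-style sampling control
    max_tokens=30,        # hard cap on output length
    stop=["\n\n"],        # stop sequence, a toggle-style control
)

# (b) Natural-language constraints: the requirement is written into the
# prompt itself, and adherence depends on the model's compliance.
nl_style = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "List exactly three fruits on one comma-separated "
                   "line, with no extra commentary.",
    }],
)

print(gui_style.choices[0].message.content)
print(nl_style.choices[0].message.content)
```

Note the asymmetry the survey probes: parameter-level controls are enforced by the API itself, while prompt-level constraints are requests that the model may or may not honor.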
**Results.** 51 individuals responded to our survey. Over half of the respondents were software engineers (58.8%) across various product teams; others held a variety of roles, including consultant & specialist (9.8%), analyst (7.8%), researcher (5.9%), UX engineer (5.9%), designer (3.9%), data scientist (3.9%), product manager (2.0%), and customer relationship manager (2.0%). All respondents had experience with prompt design and engineering, with the majority (62.7%) reporting extensive experience. The target audiences and use cases of their prompts were split approximately evenly among consumers and end-users (33.3%), downstream development teams (31.4%), and both (29.4%), with some prompts created specifically for exploratory research & analysis (5.9%). Together, respondents contributed 134 unique use cases for output constraints. To analyze the open-ended responses, the first author read through all responses and used inductive analysis [34] to generate and refine themes for each research question, with frequent discussions with the research team. We present the resulting themes for each research question in Sections 3-5.
**Limitations.** Our findings largely capture the views of industry professionals and may not encompass those of casual users who engage with LLMs conversationally [25]. Additionally, because our respondent sample is limited to a single corporation, the results described in the following sections may not be representative of the entire industry. Furthermore, our frequent use of open-ended questions might have negatively impacted the response rate. However, the saturation of novel findings and insights towards the end of the survey deployment suggests that we captured a comprehensive range of perspectives.
:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.
:::
:::info Authors:
(1) Michael Xieyang Liu, Google Research, Pittsburgh, PA, USA ([email protected]);
(2) Frederick Liu, Google Research, Seattle, Washington, USA ([email protected]);
(3) Alexander J. Fiannaca, Google Research, Seattle, Washington, USA ([email protected]);
(4) Terry Koo, Google, Indiana, USA ([email protected]);
(5) Lucas Dixon, Google Research, Paris, France ([email protected]);
(6) Michael Terry, Google Research, Cambridge, Massachusetts, USA ([email protected]);
(7) Carrie J. Cai, Google Research, Mountain View, California, USA ([email protected]).
:::