Thread by @sh_reya on Thread Reader App

Content


I have a lot of thoughts on this as someone who has manually combed through hundreds of humans' prompt deltas

first, humans tend to underspecify the first version of their prompt. if they're in the right environment where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, the OpenAI Playground), they just want to see what the LLM can do

there's a lot of literature on LLM sensemaking from the HCI community here (our own "Who Validates the Validators" paper is one of many), but I still think LLM sensemaking is woefully unexplored, especially with respect to where it sits in the MLOps lifecycle

not only do people want to just see what the LLM can do, but they also don't fully know what they are supposed to say in the prompt, or what answer they want (they won't know it until they see it).

I think of a prompt as a form you need to fill out to submit your task to the LLM, and the fields of this form are unknown and dynamic (i.e., task-specific). a prompt writing tool can make these fields more known
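To make the "prompt as a form" framing concrete, here is a minimal sketch in Python (the class and field names are my own illustrative assumptions, not anything from the thread): the task is the only field you know up front, and the other fields start out empty and get filled in as the user makes sense of what the task actually requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PromptForm:
    # The one field you always know: what you want done.
    task: str
    # Everything else is task-specific and starts out unknown.
    definitions: Dict[str, str] = field(default_factory=dict)   # key terms, clarified over time
    output_format: Optional[str] = None                         # pinned down after seeing outputs
    examples: List[str] = field(default_factory=list)           # added after sensemaking

    def render(self) -> str:
        parts = [self.task]
        if self.definitions:
            parts.append("Definitions:")
            parts += [f"- {term}: {meaning}" for term, meaning in self.definitions.items()]
        if self.output_format:
            parts.append(f"Return the answer as: {self.output_format}")
        if self.examples:
            parts.append("Examples:")
            parts += self.examples
        return "\n".join(parts)

# v1: underspecified, just to see what the LLM can do
form = PromptForm(task="Find all instances of police misconduct in this document.")

# later edits, driven by sensemaking: clarify terms, pin down the output
form.definitions["misconduct"] = "excessive force, falsified reports, or unlawful searches"
form.output_format = "a JSON list of {quote, type_of_misconduct}"
print(form.render())
```

A prompt-writing tool can surface those latent fields up front instead of letting the user discover them one weird output at a time.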

second, humans derive a lot of the prompt content from their sensemaking. if they observe a weird output, they edit their prompt (usually by adding another instruction). many of these edits are clarifications/definitions

for example (see our docetl paper), if you are an investigative journalist and want an LLM to find all instances of police misconduct in a report/document, you have to define misconduct. there are many types of misconduct, each of which may also require its own definition

the LLM-generated prompt writer is GREAT here to relieve blank page syndrome for defining important terms in prompts. if I ask Claude to generate a prompt for "find all instances of police misconduct in this document", it makes an attempt to start definitions, which I can then refine
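A hedged sketch of what that meta-prompting step can look like with the Anthropic Python SDK; the model name and the exact wording of the request are illustrative assumptions, not taken from the thread:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask the model to draft a task prompt, including starter definitions of key terms
# that a human can then review and refine.
draft = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Write a detailed prompt for this task: "
            "'find all instances of police misconduct in this document'. "
            "Include working definitions of the key terms so a human can refine them."
        ),
    }],
)
print(draft.content[0].text)  # starter prompt with draft definitions, ready for human editing
```

The point is not that the draft is correct, but that it gives the human something concrete to edit instead of a blank page.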

third, with docetl (our AI-powered data processing tool), some of our users don't have much programming experience, so I've sent them starter pipelines they can use to process their data.

surprisingly, many of them can move forward without me: tweaking pipeline prompts, editing them, adding new operations, etc. I think an AI assistant can do my job of initially drafting the pipeline
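For a sense of what such a starter pipeline might contain, here is a minimal illustrative sketch as a Python dict. The structure and field names below are my own assumptions for illustration, not DocETL's actual configuration schema:

```python
# Illustrative starter pipeline: a map operation that extracts misconduct instances
# from each document, then a reduce operation that aggregates them by type.
# Field names are assumptions for illustration, not DocETL's real schema.
starter_pipeline = {
    "dataset": "police_reports.json",
    "operations": [
        {
            "name": "extract_misconduct",
            "type": "map",  # applied to each document independently
            "prompt": (
                "List every instance of police misconduct in the document below. "
                "For each, give a short quote and the type of misconduct.\n\n{{ document }}"
            ),
            "output_keys": ["quote", "misconduct_type"],
        },
        {
            "name": "summarize_by_type",
            "type": "reduce",  # groups extracted rows and summarizes each group
            "group_by": "misconduct_type",
            "prompt": "Summarize the pattern across these instances:\n\n{{ instances }}",
            "output_keys": ["summary"],
        },
    ],
}
```

Because the pipeline is just structured text, a user with little programming experience can tweak a prompt or add an operation by editing it directly, which matches the behavior described above.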

overall I think prompting is going to be a collaborative effort between humans and LLMs in the future. humans alone are limited; LLM-written prompts alone are limited (you need to have _some_ human input and feedback to solve the human's fuzzy or underspecified task).



Summary
In this thread, Shreya Shankar discusses the interaction between humans and large language models (LLMs), particularly around prompt writing. She notes that humans tend to underspecify the first version of a prompt, often just wanting to see what the LLM can do. When people observe odd outputs, they revise their prompts based on their own sensemaking, usually by adding instructions that clarify definitions. Shankar points out that AI-generated prompt-writing tools can help users define important terms, relieving "blank page syndrome." She also observes that many users without programming experience can tweak and edit AI-drafted data-processing pipelines on their own, which suggests an AI assistant could handle the initial drafting work. Overall, Shankar argues that prompt writing will be a collaboration between humans and LLMs, with input and feedback from both being essential.