Thread by @sh_reya on Thread Reader App

Content


I have a lot of thoughts on this as someone who has manually combed through hundreds of humans' prompt deltas

first, humans tend to underspecify the first version of their prompt. if they're in the right environment where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, the OpenAI Playground), they just want to see what the LLM can do

there's a lot of literature on LLM sensemaking from the HCI community here (our own "Who Validates the Validators" paper is one of many), but I still think LLM sensemaking is woefully unexplored, especially with respect to where it sits in the MLOps lifecycle

not only do people want to just see what the LLM can do, but they also don't fully know what they are supposed to say in the prompt, or what answer they want (they won't know it until they see it).

I think of a prompt as a form you need to fill out to submit your task to the LLM, and the fields of this form are unknown and dynamic (i.e., task-specific). a prompt writing tool can make these fields more known
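To make the "prompt as a form" framing concrete, here is a minimal sketch in Python (the class and field names are my own illustrative assumptions, not anything from the thread): the task is the only field you know up front, and the other fields start out empty and get filled in as the user makes sense of what the task actually requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PromptForm:
    # The one field you always know: what you want done.
    task: str
    # Everything else is task-specific and starts out unknown.
    definitions: Dict[str, str] = field(default_factory=dict)   # key terms, clarified over time
    output_format: Optional[str] = None                         # pinned down after seeing outputs
    examples: List[str] = field(default_factory=list)           # added after sensemaking

    def render(self) -> str:
        parts = [self.task]
        if self.definitions:
            parts.append("Definitions:")
            parts += [f"- {term}: {meaning}" for term, meaning in self.definitions.items()]
        if self.output_format:
            parts.append(f"Return the answer as: {self.output_format}")
        if self.examples:
            parts.append("Examples:")
            parts += self.examples
        return "\n".join(parts)

# v1: underspecified, just to see what the LLM can do
form = PromptForm(task="Find all instances of police misconduct in this document.")

# later edits, driven by sensemaking: clarify terms, pin down the output
form.definitions["misconduct"] = "excessive force, falsified reports, or unlawful searches"
form.output_format = "a JSON list of {quote, type_of_misconduct}"
print(form.render())
```

A prompt-writing tool can surface those latent fields up front instead of letting the user discover them one weird output at a time.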

second, humans derive a lot of the prompt content from their sensemaking. if they observe a weird output, they edit their prompt (usually by adding another instruction). many of these edits are clarifications/definitions

for example (see our docetl paper), if you are an investigative journalist and want an LLM to find all instances of police misconduct in a report/document, you have to define misconduct. there are many types of misconduct, each of which may also require its own definition

the LLM-generated prompt writer is GREAT here to relieve blank page syndrome for defining important terms in prompts. if I ask Claude to generate a prompt for "find all instances of police misconduct in this document", it makes an attempt to start definitions, which I can then refine
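A hedged sketch of what that meta-prompting step can look like with the Anthropic Python SDK; the model name and the exact wording of the request are illustrative assumptions, not taken from the thread:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask the model to draft a task prompt, including starter definitions of key terms
# that a human can then review and refine.
draft = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Write a detailed prompt for this task: "
            "'find all instances of police misconduct in this document'. "
            "Include working definitions of the key terms so a human can refine them."
        ),
    }],
)
print(draft.content[0].text)  # starter prompt with draft definitions, ready for human editing
```

The point is not that the draft is correct, but that it gives the human something concrete to edit instead of a blank page.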

third, with docetl (our AI-powered data processing tool), some of our users don't have much programming experience, so I've sent them starter pipelines they can use to process their data.

surprisingly, many of them can move forward without me: tweaking pipeline prompts, editing them, adding new operations, etc. I think an AI assistant can do my job of initially drafting the pipeline
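For a sense of what such a starter pipeline might contain, here is a minimal illustrative sketch as a Python dict. The structure and field names below are my own assumptions for illustration, not DocETL's actual configuration schema:

```python
# Illustrative starter pipeline: a map operation that extracts misconduct instances
# from each document, then a reduce operation that aggregates them by type.
# Field names are assumptions for illustration, not DocETL's real schema.
starter_pipeline = {
    "dataset": "police_reports.json",
    "operations": [
        {
            "name": "extract_misconduct",
            "type": "map",  # applied to each document independently
            "prompt": (
                "List every instance of police misconduct in the document below. "
                "For each, give a short quote and the type of misconduct.\n\n{{ document }}"
            ),
            "output_keys": ["quote", "misconduct_type"],
        },
        {
            "name": "summarize_by_type",
            "type": "reduce",  # groups extracted rows and summarizes each group
            "group_by": "misconduct_type",
            "prompt": "Summarize the pattern across these instances:\n\n{{ instances }}",
            "output_keys": ["summary"],
        },
    ],
}
```

Because the pipeline is just structured text, a user with little programming experience can tweak a prompt or add an operation by editing it directly, which matches the behavior described above.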

overall I think prompting is going to be a collaborative effort between humans and LLMs in the future. humans alone are limited; LLM-written prompts alone are limited (you need to have _some_ human input and feedback to solve the human's fuzzy or underspecified task).



Summary
In this thread, Shreya Shankar discusses the interaction between humans and large language models (LLMs), particularly around prompt writing. She notes that humans tend to underspecify the first version of a prompt, often just wanting to see what the LLM can do. When people observe odd outputs, they revise their prompts based on their own sensemaking, usually by adding instructions that clarify definitions. Shankar points out that AI-generated prompt-writing tools can help users define important terms, relieving "blank page syndrome." She also observes that many users without programming experience can tweak and edit AI-drafted data-processing pipelines on their own, which suggests an AI assistant could handle the initial drafting work. Overall, Shankar argues that prompt writing will be a collaboration between humans and LLMs, with input and feedback from both being essential.