A visual programming environment for prompt engineering

ChainForge is an open-source visual programming environment for prompt engineering. With ChainForge, you can evaluate the robustness of prompts and text generation models in a way that goes beyond anecdotal evidence. We believe prompting multiple LLMs, comparing their responses and testing hypotheses about them should be not only easy, but fun.

To learn more, read our documentation.

Try out ChainForge: chainforge.ai/play

*Note that you must be on a Chrome, Firefox, Edge, or Brave browser.*

We've made some Example Flows to help you get started (see the top-right corner). For instance, one example flow evaluates model robustness to prompt injection attacks.

For any questions, comments, or feature requests, please submit an Issue on our GitHub, or submit a Google Form here.

Or... install ChainForge locally

The web version of ChainForge has a slightly limited feature set. For instance, in the full version you can load API keys from environment variables, write Python code to evaluate LLM responses, or query locally-run Alpaca/Llama models hosted via Dalai.
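As a rough illustration of what Python code that evaluates an LLM response might look like, the sketch below checks whether a response consists of nothing but a single fenced code block. The `Response` class, its `text` field, and the `evaluate` name are stand-ins invented for this example and are not taken from ChainForge's API; see the documentation for the actual evaluator interface.

```python
import re
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical stand-in for an LLM response; not ChainForge's own class."""
    text: str


# Three backticks, built programmatically so this example nests cleanly.
FENCE = "`" * 3

# Matches text that is exactly one fenced code block with an optional
# language tag. The negative lookahead keeps the body from spanning a
# second fence, so "code, prose, code" is rejected.
CODE_ONLY = re.compile(
    rf"\A{FENCE}[\w+-]*\n(?:(?!{FENCE}).)*\n{FENCE}\Z",
    re.DOTALL,
)


def evaluate(response: Response) -> bool:
    """Return True if the response is a single fenced code block and nothing else."""
    return CODE_ONLY.match(response.text.strip()) is not None
```

Running a check like this over many responses per prompt yields a pass rate, which is easier to compare across prompts and models than a handful of hand-inspected outputs.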

To install ChainForge on your machine, simply do:

pip install chainforge
chainforge serve

Open localhost:8000 in a Chrome, Firefox, Edge, or Brave browser.

What can I do with ChainForge?

Software built on LLM calls requires verifying the quality of its outputs. ChainForge provides a suite of tools to evaluate and visualize prompt (and model) quality with minimal effort on your part. In other words, it aims to make evaluating LLMs a piece of cake 🍰.

Every day, developers on social media claim that such-and-such prompt works for them. But these claims are anecdotal, with no data verifying robustness: no plots, no hard evidence, no way to verify that one model works better than another for your use case. What if you could know, precisely and in a split second, which prompt was actually the 'best'? And not only that, but which model gave the best-performing responses?

With ChainForge, out of the box, you can:

  • test robustness to prompt injection attacks
  • test consistency of output when instructing the LLM to respond only in a certain format (e.g., only code)
  • send off a ton of parametrized prompts, cache them and export them to an Excel file, without having to write a line of code
  • verify quality of responses for the same model, but at different settings
  • measure the impact of different system messages on ChatGPT output
  • run example evaluations generated from OpenAI evals
  • ...and more
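To make the idea of parametrized prompts concrete, here is a minimal sketch of expanding one template into every combination of its variable values. It illustrates the general technique only and is not ChainForge's implementation; `expand_prompts`, the template text, and the variable names are all hypothetical, and Python's `str.format` placeholders are used purely for convenience.

```python
from itertools import product


def expand_prompts(template: str, variables: dict[str, list[str]]) -> list[str]:
    """Fill a template with every combination of the given variable values."""
    names = list(variables)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(variables[name] for name in names))
    ]


# Hypothetical template and values, chosen only for illustration.
prompts = expand_prompts(
    "Translate '{phrase}' into {language}.",
    {
        "phrase": ["good morning", "thank you"],
        "language": ["French", "Japanese"],
    },
)

for prompt in prompts:
    print(prompt)
```

Two phrases and two languages produce four prompts. The count grows multiplicatively with each added variable, which is why caching and exporting the responses matters once an experiment gets large.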

Development + Contributing

ChainForge is in active development and is currently provided as an open beta test. We welcome and encourage contributors. If you'd like to contribute, just submit an Issue or fork the repository and make a Pull Request.

ChainForge was created by Ian Arawjo, a postdoctoral scholar at Harvard in the Glassman Lab of the Harvard HCI group. He is currently the lead developer. Ongoing collaborators include Elena Glassman, Martin Wattenberg, Priyan Vaithilingam, and Chelse Swoopes.

This work was partially funded by the NSF grants IIS-2107391, IIS-2040880, and IIS-1955699. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright © Ian Arawjo
