- Visibility into every LLM call, input, and output in your application
- Systematic evaluation to measure performance against curated test cases
- Version tracking for prompts, models, and data so you can understand what changed
- Feedback collection to capture human judgments and production signals
The main threads of Weave
Traces
Tracks end-to-end how a specific LLM application arrives at its response.
- See the inputs and outputs of each application run.
- See the source documents used to produce the LLM response.
- See the cost, token count, and latency of each LLM call.
- Drill down into specific prompts and how answers are produced.
- Collect feedback on responses from users.
- In your code, you can use Weave ops and calls to track what your functions are doing.
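For example, here is a minimal sketch of tracking a function and attaching feedback to the resulting call (the project name and function are placeholders; the `.call()` helper and reaction API follow the Python SDK's documented usage):

```python
import weave

weave.init("my-project")  # placeholder project name

@weave.op
def answer(question: str) -> str:
    # Stand-in for a real LLM call.
    return f"You asked: {question}"

# .call() returns both the output and a handle to the recorded call.
output, call = answer.call("What does Weave trace?")

# Attach user feedback directly to this trace.
call.feedback.add_reaction("👍")
```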
Evaluations
Systematic benchmarking of your application so you know how well it performs and can deploy it to production with confidence.
- Easily track which versions of a model or prompt produced which results.
- Define metrics to evaluate responses using one or more scoring functions.
- Compare two or more evaluations across multiple metrics, and contrast specific examples to see where performance differs (see the sketch after this list).
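The evaluation flow looks roughly like this (the dataset rows, scorer, and model function are hypothetical placeholders):

```python
import asyncio
import weave

weave.init("my-project")  # placeholder project name

# A tiny hypothetical dataset of question/expected-answer pairs.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]

@weave.op
def exact_match(expected: str, output: str) -> dict:
    # Scoring function: compare the model output to the expected answer.
    return {"correct": expected == output}

@weave.op
def model(question: str) -> str:
    # Stand-in for a real model call.
    return "4" if "2 + 2" in question else "Paris"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(model))
```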
Version everything
Weave tracks versions of your prompts, datasets, and model configurations. When something breaks, you can see exactly what changed. When something works, you can reproduce it. Learn about versioning.
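For instance, publishing an object records a tracked version, and republishing it with changed contents records a new one (the names below are placeholders):

```python
import weave

weave.init("my-project")  # placeholder project name

dataset = weave.Dataset(
    name="qa-examples",  # hypothetical dataset name
    rows=[{"question": "What is the capital of France?", "expected": "Paris"}],
)

# Publishing stores the dataset under a version; republish after edits
# and Weave records a new version of the same named object.
weave.publish(dataset)
```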
Experiment with prompts and models
Bring your own API keys and quickly test prompts and compare responses from various commercial models in the Playground. Experiment in the Weave Playground.
Collect feedback
Capture human feedback, annotations, and corrections from production use. Use this data to build better test cases and improve your application. Collect feedback.
Monitor production
Score production traffic with the same scorers you use in evaluation. Set up guardrails to catch issues before they reach users. Set up guardrails and monitors.
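One simple guardrail pattern is to run a scoring function inline and withhold a response that fails it. This is a minimal sketch with hypothetical names, not Weave's dedicated guardrail API:

```python
import weave

weave.init("my-project")  # placeholder project name

@weave.op
def contains_blocked_terms(output: str) -> bool:
    # Hypothetical check; a production scorer would be more robust.
    return any(term in output.lower() for term in ("password", "ssn"))

@weave.op
def generate(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"Echo: {prompt}"

@weave.op
def respond(prompt: str) -> str:
    output = generate(prompt)
    if contains_blocked_terms(output):
        return "[response withheld by guardrail]"
    return output
```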
Get started using Weave
Weave provides SDKs for Python and TypeScript. Both SDKs support tracing, evaluation, datasets, and other core Weave features. Some advanced features, such as class-based Models and Scorers, are not yet available in the Weave TypeScript SDK. To get started using Weave:
- Create a Weights & Biases account at https://wandb.ai/site and get your API key from https://wandb.ai/authorize.
- Install Weave (see the snippet after this list).
- In your script, import Weave and initialize a project (see the snippet after this list).
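For the Python SDK, the install-and-initialize steps look like this (the project name is a placeholder; `weave.init` expects you to be logged in to W&B with your API key):

```python
# Install the SDK first: pip install weave
import weave

# Initialize a project; traces from this process are logged to it.
weave.init("my-project")  # placeholder project name
```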
Weave integrates with popular LLM providers and frameworks. When you use a supported integration, Weave automatically traces LLM calls without additional code changes.
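For example, with the OpenAI integration, calls made through the official client are traced automatically once `weave.init` has run (the project and model names are placeholders, and an OPENAI_API_KEY is assumed to be set):

```python
import weave
from openai import OpenAI

weave.init("my-project")  # placeholder project name

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
)
# The completion call above is captured as a trace with no extra code.
```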
However, to trace calls to custom methods, add the one-line `weave.op` decorator to any function. This works in development and in production.
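For example (the function and project names are placeholders):

```python
import weave

weave.init("my-project")  # placeholder project name

@weave.op
def extract_keywords(text: str) -> list:
    # Hypothetical custom logic; inputs, outputs, and latency are traced.
    return [word for word in text.split() if len(word) > 6]

extract_keywords("Weave traces arbitrary application functions")
```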