The Claude Computer Use API doesn't require a full VM to be useful! Anthropic has an easy-to-run VM setup in their quickstarts repo, which I used some code from, but I wanted to explore if you could accomplish tasks with a more limited set of capabilities.
The API is really: input a picture of a computer screen, and get a described "action", including coordinates. So my theory was that I could use the same API and substitute an image of a PDF page as the "screenshot", use the move_mouse, click, and type actions to determine what text to put where, and then manually add the text myself in the background using the Pillow library in Python. And it turns out it works pretty well!
I hope this inspires more projects where the "screen" is a specific interface that the user wants to manipulate--I think there are a lot of interesting things to do in between "the LLM can only call APIs and can't use a UI" and "the LLM has complete control of a full VM with shell access".
On Mac, pdf2image requires installation of poppler:
brew install poppler
pip install formfill
You must provide your Anthropic API key via environment variable:
export ANTHROPIC_API_KEY=sk-ant-api-***
FormFill can take input data either directly as a string or from a CSV file:
# Using a string input
formfill path/to/form.pdf -s "Name: John Smith, Age: 30, Occupation: Engineer"
# Using a file
formfill path/to/form.pdf -f data.csv
The filled form will be saved as {original_name}_filled.pdf
in the same directory as the command is run.