Anthropic, an AI startup backed by Amazon, has taught its Claude 3.5 Sonnet model new tricks. The upgraded model can control desktop applications, performing tasks traditionally handled by humans. The announcement was made on Tuesday, and the model is now available in open beta through Anthropic’s application programming interface (API).
New Capabilities
The Claude 3.5 Sonnet model can understand and interact with any desktop application. A new feature called “Computer Use” enables the model to imitate keystrokes, mouse movements, and button clicks, essentially allowing the AI to emulate a person using a computer. According to Anthropic, the model has been trained to observe what is displayed on a screen and use available software tools to complete tasks.
In a blog post, Anthropic explained, “When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place.” This capability allows Claude to perform a variety of tasks, from searching the web to filling out forms.
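The pixel-counting behavior described in the blog post can be sketched in a few lines. This is a hedged illustration only: the tool definition mirrors the shape of Anthropic’s published computer-use beta, but the `computer_20241022` type string is an assumption, and `pixel_delta` is a hypothetical helper, not the model’s actual internals.

```python
# Sketch of the inputs a "computer use" agent works with. The tool definition
# mirrors the shape of Anthropic's computer-use beta (type string assumed);
# pixel_delta is a hypothetical helper, not part of the API.

# Tool definition telling the model the screen resolution it will reason over.
computer_tool = {
    "type": "computer_20241022",  # beta tool type at launch (assumption)
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
}

def pixel_delta(cursor, target):
    """How many pixels to move horizontally and vertically to reach target."""
    return (target[0] - cursor[0], target[1] - cursor[1])

# Cursor at (100, 200); the model decides a button at (640, 360) must be clicked.
dx, dy = pixel_delta((100, 200), (640, 360))  # -> (540, 160)
```

In practice the model returns such coordinates as tool-use output, and the developer’s harness performs the actual click.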
Developers can access the Computer Use feature through platforms such as Amazon Bedrock and Google Cloud’s Vertex AI. The updated model also includes performance improvements over its predecessor, the original Claude 3.5 Sonnet.
The Rise of AI Agents
The introduction of the updated Claude 3.5 Sonnet comes amid a growing interest in AI agents, which are designed to automate software tasks. While the concept of automating tasks on a computer is not new, the competition in this space has intensified. Various companies, including Salesforce and Microsoft, have made announcements regarding their own AI agent technologies.
Anthropic describes its approach as an “action-execution layer,” which allows Claude 3.5 Sonnet to perform desktop-level commands. The model can browse the web and use any application, making it a versatile tool for developers. An Anthropic spokesperson stated, “Humans remain in control by providing specific prompts that direct Claude’s actions, like ‘use data from my computer and online to fill out this form.’”
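Conceptually, an action-execution layer is a loop: capture the screen, ask the model for the next action, carry it out, and repeat until the task is done. A minimal sketch under stated assumptions (every function name here is a hypothetical stub for illustration, not Anthropic’s API):

```python
# Minimal action-execution loop. All names below (model_step, take_screenshot,
# execute) are hypothetical stubs for illustration, not Anthropic's actual API.

def run_agent(task, model_step, take_screenshot, execute, max_steps=20):
    """Show the model the current screen, perform the action it returns, repeat."""
    for _ in range(max_steps):
        action = model_step(task, take_screenshot())  # model picks next action
        if action["type"] == "done":
            return action.get("result")
        execute(action)  # e.g. {"type": "click", "x": 640, "y": 360}
    raise RuntimeError("step budget exhausted before the task finished")

# Demo with fake stubs: a "model" that clicks once, then reports completion.
performed = []
scripted = iter([
    {"type": "click", "x": 10, "y": 20},
    {"type": "done", "result": "form submitted"},
])
outcome = run_agent(
    task="fill out this form",
    model_step=lambda task, screenshot: next(scripted),
    take_screenshot=lambda: b"<png bytes>",
    execute=performed.append,
)
# outcome == "form submitted"; performed holds the single click action
```

A real harness would replace the stubs with a screenshot library and OS-level input events, and `model_step` with an API call that sends the screenshot to the model.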
Several companies are already testing the capabilities of the new model. For instance, Replit is using an early version of Claude 3.5 Sonnet to create an “autonomous verifier” for evaluating applications during development. Canva is exploring how the model can assist in design and editing tasks.
Performance and Limitations
Despite its advanced capabilities, the Claude 3.5 Sonnet model has limitations. In tests designed to evaluate its ability to assist with tasks such as airline bookings, the model successfully completed less than half of the tasks. In another test involving return processes, it failed approximately one-third of the time. Anthropic acknowledges that the model struggles with basic actions like scrolling and zooming, and it can miss short-lived actions due to its method of capturing screenshots.
The new model also carries potential risks. A recent study indicated that even AI models without desktop capabilities can engage in harmful behaviors when subjected to specific techniques. Anthropic acknowledges that models with desktop access could pose greater risks, such as exploiting software vulnerabilities, but argues that the benefits outweigh them.
In response to these concerns, Anthropic has implemented measures to mitigate misuse. The company will not train the model on users’ screenshots and prompts, and it has developed classifiers to steer the model away from high-risk actions. With US elections approaching, Anthropic is also focused on preventing election-related abuse of its models.
Future Developments
In addition to the new Claude 3.5 Sonnet, Anthropic announced an updated version of its more efficient model, Claude 3.5 Haiku, which is expected to be released soon. The new Haiku aims to match the performance of Claude 3 Opus, Anthropic’s former flagship model, while maintaining a lower cost and faster processing speed. It will initially be available as a text-only version, with a multimodal package that can analyze both text and images planned to follow.
(Photo by Rock’n Roll Monkey on Unsplash)