computer-use: For compressed screenshots, format=webp is advertised in all SDKs but silently returns PNG #4568

@rovle

Summary

The compressed-screenshot endpoints accept a format query parameter that every public SDK documents as supporting webp, but the daemon's encoder does not implement WebP at all. Today, a request with format=webp silently returns PNG bytes: the call returns 200, so callers believe they received WebP when they actually got PNG.
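
A caller can confirm what actually came back by inspecting the leading magic bytes of the payload. The following is a minimal sketch, not part of any SDK: `sniff_image_format` and `check_format` are hypothetical helper names, and the sketch assumes the caller already has the raw image bytes (if the response carries the image in an encoded form, it must be decoded first).

```python
def sniff_image_format(data: bytes) -> str:
    """Return 'png', 'jpeg', 'webp', or 'unknown' from the leading magic bytes."""
    # PNG files always start with this fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # JPEG streams start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def check_format(requested: str, data: bytes) -> None:
    """Hypothetical guard: raise if the payload is not in the requested format."""
    actual = sniff_image_format(data)
    if actual != requested:
        raise ValueError(f"requested {requested!r} but payload is {actual!r}")
```

With the behaviour described in this issue, `check_format("webp", payload)` would raise, because the payload begins with the PNG signature.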

Where webp is currently advertised

SDK source:

  • libs/sdk-python/src/daytona/common/computer_use.py:34,42: Pydantic field description and class docstring list 'png', 'jpeg', 'webp'.
  • libs/sdk-python/src/daytona/_sync/computer_use.py:394 and _async/computer_use.py:395: example: ScreenshotOptions(format="webp", quality=80, show_cursor=True).
  • libs/sdk-typescript/src/ComputerUse.ts:398: JSDoc example uses format: 'webp'.
  • libs/sdk-typescript/src/__tests__/ComputerUse.test.ts:118,122: unit test asserts webp pass-through.
  • libs/sdk-go/pkg/types/types.go:258,259: type comments mention "jpeg", "webp", etc. and JPEG/WebP.
  • libs/sdk-ruby/lib/daytona/computer_use.rb:321,443,453: multiple docstring/comment refs.
  • libs/sdk-java/src/main/java/io/daytona/sdk/ComputerUse.java:108: @param format ... {@code webp}.

User-facing docs:

  • apps/docs/src/content/docs/en/computer-use.mdx (lines 1223, 1234, 1248): top-level user guide, code examples in Python, TypeScript, and Ruby use webp.
  • Per-SDK docs mirror the source: en/python-sdk/{sync,async}/...computer-use.mdx, en/typescript-sdk/computer-use.mdx, en/go-sdk/types.mdx, en/java-sdk/computer-use.mdx (likely autogenerated from SDK source).

Daemon-side:

  • libs/computer-use/pkg/computeruse/display.go: encodeImageWithCompression switches on "jpeg" and "png", with a default case that emits PNG. There is no WebP arm.

The spread suggests WebP support was intended but never finished, rather than a stray copy-paste: five SDKs, an active unit test, and the top-level user guide all advertise it. Yet anyone who follows the docs gets PNG without warning.

Possible resolutions

  1. Implement WebP encoding in Go. The Go standard library has no WebP encoder. The viable libraries (chai2010/webp, kolesa-team/go-webp) are CGO-based and require libwebp. The daemon already uses CGO via robotgo + X11, so adding libwebp is incremental. Open questions: lossless vs lossy default, how the quality parameter maps to WebP (presumably much as it does for JPEG), and Dockerfile/base-image impact.
  2. Reject format=webp server-side and remove WebP from SDK docs/examples. This would surface the gap as a 400 instead of a silent PNG.
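
To make the difference between today's fallback and option 2 concrete, here is a small model of the two behaviours. It is written in Python for brevity and is not the daemon's Go code: the function names, the `UnsupportedFormatError` type, and the choice to keep defaulting an omitted format to PNG are all assumptions made for illustration.

```python
# The two formats the issue says the encoder actually handles.
SUPPORTED_FORMATS = ("png", "jpeg")


class UnsupportedFormatError(ValueError):
    """Hypothetical error that a request handler would translate into a 400."""


def resolve_format_current(requested: str) -> str:
    """Model of today's behaviour: anything unrecognised falls through to PNG."""
    return requested if requested in SUPPORTED_FORMATS else "png"


def resolve_format_strict(requested: str) -> str:
    """Model of option 2: unknown formats are rejected instead of rewritten."""
    if requested == "":
        # Assumption: an omitted format keeps defaulting to PNG.
        return "png"
    if requested not in SUPPORTED_FORMATS:
        raise UnsupportedFormatError(
            f"unsupported format {requested!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return requested
```

Under the current model, `resolve_format_current("webp")` yields `"png"` with no signal to the caller. Under the strict model the same input raises, which is the point at which a 400 would be returned.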
