Open-AutoGLM Phone Agent

Skill by ara.so — Daily 2026 Skills collection.

Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."

Architecture Overview

User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
  • Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
  • Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
  • Model serving: vLLM or SGLang (self-hosted) or BigModel/ModelScope API
  • Input: Screenshot + task description → Output: structured action commands
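The perceive-then-act loop described above can be sketched roughly as follows for the Android (ADB) path. This is a minimal illustration under stated assumptions, not the project's implementation: the action dictionary schema (`type`, `x`, `y`, and so on), the `finish` action, and the function names `action_to_adb` and `run_task` are hypothetical, since this document does not specify the model's actual output format. Only the `adb` subcommands themselves (`exec-out screencap -p`, `shell input tap`, `shell input swipe`, `shell input keyevent`) are standard ADB usage.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical action schema, for illustration only. The real structured
# output of AutoGLM is not described in this document.
Action = Dict[str, object]


def _adb_base(serial: Optional[str]) -> List[str]:
    """Return the adb prefix, targeting one device when a serial is given."""
    return ["adb"] + (["-s", serial] if serial else [])


def adb_screenshot_command(serial: Optional[str] = None) -> List[str]:
    """Command that writes a PNG screenshot of the device to stdout."""
    return _adb_base(serial) + ["exec-out", "screencap", "-p"]


def action_to_adb(action: Action, serial: Optional[str] = None) -> List[str]:
    """Translate one (hypothetical) structured action into an adb command."""
    base = _adb_base(serial)
    kind = action["type"]
    if kind == "tap":
        return base + ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        return base + ["shell", "input", "swipe"] + coords
    if kind == "back":
        return base + ["shell", "input", "keyevent", "KEYCODE_BACK"]
    raise ValueError(f"unsupported action type: {kind!r}")


def run_task(
    task: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, str], Action],
    execute: Callable[[List[str]], None],
    max_steps: int = 10,
) -> int:
    """Loop: screenshot + task -> model -> action -> device.

    Returns the number of model calls made. The loop stops early when the
    model returns the (assumed) "finish" action.
    """
    for step in range(1, max_steps + 1):
        screenshot = take_screenshot()
        action = ask_model(screenshot, task)
        if action["type"] == "finish":
            return step
        execute(action_to_adb(action))
    return max_steps
```

The device and model are injected as callables so the loop can be exercised without hardware. On a real setup, `execute` could be `lambda cmd: subprocess.run(cmd, check=True)`, `take_screenshot` could run `adb_screenshot_command()` with `capture_output=True` and return `.stdout`, and `ask_model` would wrap whichever serving option is in use (vLLM, SGLang, or a hosted API). HarmonyOS NEXT and iOS would need equivalent translators for HDC and WebDriverAgent.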

Installation
