AI Tech Digest

AI Tech Digest: April 20, 2026

The AI Tech Digest is evolving. We’re shifting our focus from industry news to what matters to builders: new tools, trending open-source projects, and the best of the AI developer community. If you want earnings reports and CEO drama, there are plenty of other newsletters. This one is for people who ship.

This Week’s Top Stories

1. Gemma 4 Crosses 2 Million Downloads — Apache 2.0 Is Paying Off

Google’s Gemma 4 family has hit 2 million downloads less than three weeks after launch, already roughly 30% of Gemma 3’s full-year total of 6.7M. The model family (2B, 4B, 26B-A4B MoE, 31B dense, plus E2B/E4B edge variants) ships under Apache 2.0, a first for Google’s open-weight models, and includes native audio and vision capabilities.

The 31B dense variant benchmarks close to Gemini 3.1 Pro on reasoning tasks while running on consumer GPUs. The E4B edge model is designed for on-device deployment, and the Hugging Face community has already built Ollama and llama.cpp integration for running the smaller variants locally.
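For readers who want to try the smaller variants through Ollama, here is a minimal sketch that sends one prompt to a local Ollama server over its REST API using only the Python standard library. It assumes Ollama is running on its default port (11434) and that a Gemma 4 model has already been pulled; the model tag `gemma4:e4b` is a hypothetical placeholder, so check `ollama list` for the name your install actually uses.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes the server is already running.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical tag: replace with the Gemma 4 name shown by `ollama list`.
MODEL_TAG = "gemma4:e4b"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to the local model and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama replies with a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

Calling `generate(MODEL_TAG, "Explain the Apache 2.0 patent grant in two sentences.")` returns the reply as a string. The request only goes to localhost, so nothing leaves the machine.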

The licensing switch to Apache 2.0 is the headline. Developers can now fine-tune, deploy commercially, and ship products without negotiating with Google Legal. For teams that avoided Gemma over licensing uncertainty, that constraint is gone.

2. GLM-5.1 Goes Open Source Under MIT — Beats GPT-5.4 on SWE-Bench Pro

Zhipu AI open-sourced GLM-5.1 on April 7 under the MIT license. The 744B-parameter MoE model (only 40B parameters active per token) outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro, a benchmark built from real-world software engineering tasks.
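The gap between 744B total and 40B active parameters comes from mixture-of-experts routing: a small router scores every expert for each token, and only the top few actually run. The sketch below is a generic, pure-Python illustration of top-k routing with toy experts; it is not GLM-5.1’s router, and the expert count, `k`, and the scaling "experts" are arbitrary assumptions. In a real MoE model the active count also includes attention and any always-on shared layers.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, router: Sequence[Vector], experts: Sequence[Expert],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token through its top-k experts and mix their outputs.

    router[i] is expert i's scoring vector; experts[i] maps a vector to a vector.
    Returns the mixed output and the indices of the experts that actually ran.
    """
    # 1. Score every expert (cheap: one dot product each).
    scores = [sum(w * x for w, x in zip(row, token)) for row in router]
    # 2. Keep only the k highest-scoring experts.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # 3. Renormalize gate weights over the chosen experts only.
    gates = softmax([scores[i] for i in chosen])
    # 4. Run just those experts; every other expert's parameters stay untouched.
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: 8 experts that each scale their input by a different factor.
experts = [lambda v, s=s: [s * x for x in v] for s in range(1, 9)]
router = [[math.cos(i), math.sin(i)] for i in range(8)]
output, used = moe_forward([1.0, 0.5], router, experts, k=2)
print(f"experts used: {used} ({len(used)}/{len(experts)} active)")
```

Only two of the eight toy experts run per token. Scaled up, the same idea gives GLM-5.1’s ratio of 40B active out of 744B total, roughly 5% of the weights per token.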

The model also ships as GLM-5V-Turbo, a multimodal variant tuned for coding from screenshots and diagrams. At $0.28 per million input tokens via API, it undercuts OpenAI’s GPT-5.4 ($1.75/M) by more than 6x while leading on coding-specific benchmarks. For high-volume code review, refactoring, and test-generation workloads, the cost difference alone justifies an evaluation.
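To put the price gap in dollars, here is a small calculation using the input-token prices quoted above. The workload (2,000 reviews a day at about 8,000 input tokens each) is a made-up example, and output-token pricing is left out because it isn’t given here, so treat the totals as input-side costs only.

```python
# Input-token list prices quoted in this story, in USD per million tokens.
# Output-token prices are not included, so these are input-side costs only.
PRICE_PER_M_INPUT = {
    "GLM-5.1 (API)": 0.28,
    "GPT-5.4": 1.75,
}


def monthly_input_cost(model: str, tokens_per_month: int) -> float:
    """Monthly input-token cost in USD for a given token volume."""
    return PRICE_PER_M_INPUT[model] * tokens_per_month / 1_000_000


# Hypothetical workload: 2,000 code reviews/day, ~8,000 input tokens each, 30 days.
tokens = 2_000 * 8_000 * 30  # 480 million input tokens

glm = monthly_input_cost("GLM-5.1 (API)", tokens)
gpt = monthly_input_cost("GPT-5.4", tokens)
print(f"GLM-5.1: ${glm:,.2f}/month, GPT-5.4: ${gpt:,.2f}/month, ratio {gpt / glm:.2f}x")
# → GLM-5.1: $134.40/month, GPT-5.4: $840.00/month, ratio 6.25x
```

The ratio stays at 6.25x at any volume; what grows with volume is the absolute saving, about $700 a month in this example.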

The MIT license is about as permissive as licenses get: beyond keeping the copyright and license notice in copies, there are no commercial-use caveats and no maximum-revenue clauses. Zhipu AI is essentially saying: take this and build whatever you want.

3. llama.cpp Ships Vulkan Flash Attention + Qwen3 Audio Support

The latest llama.cpp releases bring two upgrades for local inference. Vulkan flash attention speeds up inference on AMD GPUs and integrated graphics, narrowing the performance gap with NVIDIA’s CUDA backend. That matters for anyone running LLMs on non-NVIDIA hardware: Steam Deck owners, Linux laptop users, and cloud instances with AMD accelerators.
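Flash attention’s speedup comes from computing attention in tiles with a running ("online") softmax, so the full row of attention scores never has to be written out to slow memory. The pure-Python sketch below shows that rescaling trick for a single query and checks it against the naive formula. It illustrates the general technique only: llama.cpp’s actual Vulkan implementation is GPU shader code, and the block size and toy vectors here are arbitrary.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _scores(q: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product scores q·k / sqrt(d) for each key."""
    scale = 1.0 / math.sqrt(len(q))
    return [sum(a * b for a, b in zip(q, k)) * scale for k in keys]


def naive_attention(q, keys, values) -> Vector:
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size: int = 2) -> Vector:
    """Same result, processed one key/value block at a time with an online softmax."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])  # un-normalized weighted sum of values
    for start in range(0, len(keys), block_size):
        block_scores = _scores(q, keys[start:start + block_size])
        block_values = values[start:start + block_size]
        new_max = max(running_max, max(block_scores))
        # Rescale everything accumulated so far to the new maximum
        # (exp(-inf) == 0.0, so the first block starts from zero).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(block_scores, block_values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [-1.0, 0.5, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
reference = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=2)
print(max(abs(a - b) for a, b in zip(reference, tiled)))  # only floating-point rounding
```

In the GPU versions the payoff is mostly reduced memory traffic: each tile of keys and values is loaded once into fast on-chip memory instead of round-tripping a full score matrix, which is why a Vulkan port matters for AMD and integrated GPUs.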

Separately, llama.cpp now supports Qwen3 audio input, enabling speech-to-text and audio-understanding pipelines that run entirely locally. Together with the existing vision support, that means speech, vision, and text models can all run without any cloud dependency.

There are still rough edges — several open issues report bugs with Qwen3.5 models on both Vulkan and ROCm backends, and the community is actively debugging CUDA regressions with hybrid attention architectures. But the trajectory is clear: local inference is getting faster, cheaper, and more capable every week.

  • Why it matters: Flash attention on Vulkan means fast local inference on hardware most people already own. The AMD ecosystem for local LLMs has lagged behind CUDA for years, and this narrows the gap.
  • GitHub · Vulkan flash attention issues tracker

4. METATRON: Offline AI Penetration Testing with Local LLMs

METATRON is a new open-source penetration testing assistant that runs entirely offline. Built for Parrot OS and Debian-based Linux distributions, it combines automated reconnaissance tooling with a locally hosted LLM (typically a fine-tuned Qwen 3.5 via Ollama) for vulnerability analysis and remediation suggestions.

No API keys, no cloud connectivity, no subscriptions. Feed it an IP address or a domain, and it performs automated scanning, analyzes results through a local model, and generates vulnerability reports with remediation guidance. The project hit 705 GitHub stars within weeks of its early April release.

For security researchers who work in air-gapped environments, handle sensitive infrastructure, or simply don’t want their reconnaissance traffic routed through third-party AI services, METATRON solves a real problem. The MIT license and Python 3 codebase make it straightforward to extend with custom scanning modules.
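METATRON’s own module API isn’t described here, so the following is a hypothetical sketch of what a pluggable scan-module layer in Python 3 can look like: modules register themselves, the target is validated as an IP address or hostname, and the findings are folded into a prompt for the local model. Every name below (`register`, `Finding`, `banner_stub`, `build_prompt`, `run_modules`) is illustrative rather than taken from the project, and the stub module returns canned data instead of scanning anything.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical plugin layer; METATRON's real module interface may differ entirely.


@dataclass
class Finding:
    port: int
    service: str
    detail: str


ScanModule = Callable[[str], List[Finding]]
MODULES: Dict[str, ScanModule] = {}

# RFC 1123-style hostname: dot-separated labels of letters, digits, and inner hyphens.
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*$"
)


def register(name: str) -> Callable[[ScanModule], ScanModule]:
    """Decorator that adds a scan module to the registry under `name`."""
    def decorator(fn: ScanModule) -> ScanModule:
        MODULES[name] = fn
        return fn
    return decorator


def classify_target(target: str) -> str:
    """Return "ip" or "domain", or raise ValueError for anything else."""
    try:
        ipaddress.ip_address(target)
        return "ip"
    except ValueError:
        pass
    if _HOSTNAME.match(target):
        return "domain"
    raise ValueError(f"not an IP address or hostname: {target!r}")


def build_prompt(target: str, findings: List[Finding]) -> str:
    """Turn raw findings into an analysis prompt for the local model."""
    lines = [f"- port {f.port}/{f.service}: {f.detail}" for f in findings]
    return (
        f"Authorized assessment of {target}. For each finding, rate the risk "
        "and suggest remediation.\n" + "\n".join(lines)
    )


def run_modules(target: str) -> str:
    """Validate the target, run every registered module, and build the prompt."""
    classify_target(target)  # reject malformed targets before any module runs
    findings: List[Finding] = []
    for module in MODULES.values():
        findings.extend(module(target))
    return build_prompt(target, findings)


@register("banner_stub")
def banner_stub(target: str) -> List[Finding]:
    # Placeholder: a real module would wrap an actual scanner and parse its output.
    return [Finding(22, "ssh", "SSH service banner observed")]
```

Keeping modules as plain functions that return structured findings means the LLM step never sees raw tool output it has to parse, only a normalized list.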

5. Ring-a-Ding: AI Agents That Can Make Phone Calls

Ring-a-Ding launched this week as a telephony skill for AI agents — and it just shipped an OpenClaw integration. For $19/month, your agent gets a phone number and can make outbound calls: booking appointments, getting price quotes, checking store inventory, or any other task that still requires a phone call in 2026.

It handles SIP connectivity, real-time voice routing, call transcription, and automatic summaries. The calls are recorded and transcribed, so your agent can reference them later. Think of it as giving your AI assistant a phone — not a new concept, but the OpenClaw integration makes it trivially composable with existing agent workflows.
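Giving an agent a phone raises the stakes on tool inputs, since a bad argument means a real call to a real person. A minimal sketch, assuming a JSON-schema style tool definition like the ones most function-calling agent frameworks accept, plus a validation step before anything is dialed. The tool name, fields, and limits are hypothetical; they are not Ring-a-Ding’s or OpenClaw’s actual schema.

```python
import json
import re

# Hypothetical tool definition; not Ring-a-Ding's or OpenClaw's real schema.
PLACE_CALL_TOOL = {
    "name": "place_phone_call",
    "description": "Call a business to complete one well-scoped task and return a summary.",
    "parameters": {
        "type": "object",
        "properties": {
            "to_number": {"type": "string", "description": "E.164 format, e.g. +14155550123"},
            "goal": {"type": "string", "description": "What the call must accomplish"},
            "max_minutes": {"type": "integer", "minimum": 1, "maximum": 15},
        },
        "required": ["to_number", "goal"],
    },
}

# E.164: a plus sign, then up to 15 digits with no leading zero.
_E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def validate_call_args(raw_arguments: str) -> dict:
    """Parse the model's tool-call arguments and reject anything unsafe to dial."""
    args = json.loads(raw_arguments)
    schema = PLACE_CALL_TOOL["parameters"]
    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    number = args["to_number"]
    if not isinstance(number, str) or not _E164.match(number):
        raise ValueError(f"not an E.164 number: {number!r}")
    limits = schema["properties"]["max_minutes"]
    minutes = args.setdefault("max_minutes", 5)  # hypothetical default call length
    if not isinstance(minutes, int) or not limits["minimum"] <= minutes <= limits["maximum"]:
        raise ValueError(
            f"max_minutes must be an integer in [{limits['minimum']}, {limits['maximum']}], got {minutes!r}"
        )
    return args
```

Defaulting and capping `max_minutes` keeps a looping agent from holding a line open indefinitely, and rejecting non-E.164 numbers catches the most common malformed-argument failure before a call goes out.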

If your business still relies on phone-based processes (and many do — restaurants, contractors, healthcare), this bridges agent automation and the real world.

6. r/LocalLLaMA April 2026 Megathread: Qwen 3.5 vs. Gemma 4 Debate

The monthly Best Local LLMs thread hit 330 upvotes and 141 comments this week, with the community split between Qwen 3.5-35B-A3B and Gemma 4-31B as the go-to model for local deployment.

The consensus from the thread:

  • Qwen 3.5-35B-A3B wins for agentic coding workflows. Its hybrid architecture, which interleaves linear-attention layers with standard attention, delivers strong performance with few active parameters, making it efficient on consumer hardware. Community members report it’s “a gamechanger for agentic coding.”
  • Gemma 4-31B wins for general-purpose chat and reasoning. The Apache 2.0 license and native multimodal support (audio + vision) give it broader applicability. But tool-calling reliability with agent frameworks like Hermes and OpenCode remains a work in progress.
  • GLM-5.1 is generating buzz but isn’t yet available on Ollama, so the local community hasn’t fully tested it. Several commenters flagged that Chinese model providers have been slow to release weights to Western download mirrors.
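The "efficient on consumer hardware" argument for the A3B model comes down to total versus active parameters: memory scales with the total, while per-token work scales with the active count. A back-of-the-envelope sketch, assuming a flat 4 bits per weight (real GGUF quantizations run somewhat higher) and the common rule of thumb of about 2 FLOPs per active parameter per generated token; it ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float = 4.0) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    return total_params_b * bits_per_weight / 8  # billions of params × bytes per param


def gflops_per_token(active_params_b: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params_b


# (total params, active params), both in billions, read off the model names.
MODELS = {
    "Qwen 3.5-35B-A3B": (35, 3),
    "Gemma 4-31B (dense)": (31, 31),
}

for name, (total, active) in MODELS.items():
    print(f"{name}: ~{weight_memory_gb(total):.1f} GB of 4-bit weights, "
          f"~{gflops_per_token(active):.0f} GFLOPs per token")
# → Qwen 3.5-35B-A3B: ~17.5 GB of 4-bit weights, ~6 GFLOPs per token
# → Gemma 4-31B (dense): ~15.5 GB of 4-bit weights, ~62 GFLOPs per token
```

Both models need a similar amount of memory, but the A3B model does roughly a tenth of the arithmetic per token. Since local decoding is usually limited by memory bandwidth, it also reads far fewer weight bytes per token, which is where much of its speed on consumer GPUs comes from.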

The thread also captures an ongoing tension between local-first purists and those who argue cloud frontier models (Opus 4.7, GPT-5.4) have pulled too far ahead for local models to compete on anything except cost.

7. OpenClaw 2026.4.14: The Security-Hardening Release

OpenClaw dropped version 2026.4.14 this week. Not a flashy feature release, but the most important update of the month for anyone running AI agents in production: over 50 fixes addressing prompt injection vectors, config file protection, and subagent reliability.

The standout changes: smarter GPT-5.4 routing with automatic recovery when model calls fail, Chrome CDP improvements for browser-automation stability, and fixes for the Slack, Telegram, and Discord integrations that were leaving subagents stuck in infinite loops. An earlier release, v2026.4.9, introduced a “Dreaming” feature that uses REM-style memory consolidation to process historical conversation data during idle periods, essentially letting your agent sleep on its experiences.
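Automatic recovery from failed model calls usually boils down to retrying with backoff and then falling back to another model. The sketch below shows that generic pattern; it is not OpenClaw’s implementation, and `ModelCallError`, the model names, and the `call_model` callback are hypothetical stand-ins for whatever client an agent actually uses. A bounded retry budget also matters for the infinite-loop problem mentioned above: every path through the function terminates.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ModelCallError(Exception):
    """Hypothetical error a model client raises for retryable failures (timeouts, 5xx)."""


def call_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    retries_per_model: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each model in order, retrying with exponential backoff before falling back.

    Returns (model_used, response_text). Raises RuntimeError once every model
    has exhausted its retries, so the caller never loops forever.
    """
    errors: List[str] = []
    for model in models:
        for attempt in range(retries_per_model + 1):
            try:
                return model, call_model(model, prompt)
            except ModelCallError as exc:
                errors.append(f"{model} attempt {attempt + 1}: {exc}")
                if attempt < retries_per_model:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("all model calls failed:\n" + "\n".join(errors))
```

Passing `sleep` in as a parameter keeps the function testable: a test can record the backoff delays instead of actually waiting.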

OpenClaw is now past 250K GitHub stars, having overtaken React’s total, which took a decade to accumulate, in just 60 days.

  • Why it matters: When your agent has shell access and messaging integrations, security isn’t optional. This release treats prompt injection as a first-class bug class.
  • GitHub · Release notes · Security analysis

Quick Hits

  • OpenAI Codex CLI hit 0.121 alpha — expanding beyond pure coding into computer use, web workflows, image generation, and SSH devbox integration. The open-source Rust-based terminal agent is becoming a full automation platform. GitHub

  • Alibaba released Qwen 3.6-Plus (April 16) with 1M token context and improved agentic coding. Available free on OpenRouter. OpenRouter

  • Open-source AI landscape April 2026: Five of the six major open models now use mixture-of-experts architectures. Apache 2.0 and MIT licenses cover Gemma 4, Qwen 3.6, Mistral Small 4, gpt-oss-120b, and GLM-5. Full guide

What to Watch

  • GPT-5.5 “Spud”: Pretraining is complete. Prediction markets point to a Q2 2026 release, likely before June 30. OpenAI may brand it as GPT-6 if the capability jump is large enough.

  • Claude Mythos / Project Glasswing: Anthropic’s unreleased frontier model (internally codenamed Capybara) remains gated to ~50 partner organizations. No public release date. The restricted preview, priced at $25/$125 per million input/output tokens, suggests a different approach to frontier model distribution.

  • Meta’s next open-source models: Meta is preparing to release the first AI models developed under Alexandr Wang’s leadership, with plans for eventual open-source licensing.

  • Google Gemma 4 keynote (London): Google is expected to announce expanded Gemma 4 capabilities and new deployment partnerships.


That’s this week’s digest. If you’re building something with any of these tools, I’d love to hear about it. Drop a comment or hit me up on X/Twitter.