Daily Intelligence Briefing — Tech / AI / IT Monitor
Executive Summary
Google released Gemma 4, an open-weights model in 26B MoE and 31B dense variants that reaches state-of-the-art performance on consumer hardware, including MacBooks and RTX GPUs. NousResearch's Hermes Agent became the third fastest-growing GitHub repository of the week, with users migrating from OpenClaw over bloat and tool-call parsing issues; Hermes also shipped 7 pluggable memory providers. An anecdotal benchmark showed dense models (Qwen 3.5 27B, Gemma 4 31B) on consumer hardware outperforming much larger (120B+) MoE models on a coding task. The 2026 Vibe Jam game-development competition was announced, requiring at least 90% AI-generated code.
Key Events
- Gemma 4 Released: Google's new open-weights model ships in 26B MoE and 31B dense variants; the dense variant reportedly reaches parity with far larger models such as Kimi K2.5 (1.1T MoE) while running on consumer hardware, including MacBooks and RTX 3090/4090/5090 GPUs. → link
- Hermes Agent Momentum: Now the 3rd fastest-growing GitHub repo of the week, with rapid migration from OpenClaw due to cleaner architecture and better tool call parsing. 7 pluggable memory providers shipped. → link
- Dense vs MoE Benchmark Surprise: Qwen 3.5 27B dense (Q4 quantized on single RTX 3090) one-shotted a game challenge that 120B MoE models on enterprise H200 hardware failed at. → link
- Vibe Jam 2026 Announced: AI game development competition with $35,000 in total prizes, requiring at least 90% AI-generated code and running for one month (deadline 1 May 2026). → link
- Cursor 3 Launched: New version with Composer 2, built for agent-driven code generation workflows. → link
- Block Open-Sources mesh-llm: Peer-to-peer system for pooling GPU compute to run large open-source AI models. → link
Analysis
Patterns: Dense-architecture models are showing unexpected strength in agent coding tasks relative to much larger MoE models, suggesting per-token active compute and harness efficiency may matter more than total parameter count. The Hermes Agent migration wave points to growing user frustration with bloated agent frameworks that interfere with model reasoning. Local inference is reaching genuine productivity parity with cloud services for many use cases.
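One way to ground the parameter-count point: single-stream decode is roughly memory-bandwidth bound, so throughput is capped by how many bytes of weights must be read per generated token. A back-of-envelope sketch in Python (the ~936 GB/s RTX 3090 bandwidth is the published spec; the ~4.8 bits-per-weight average for Q4_K_M and the 4B-active MoE figure are working assumptions):

```python
def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed, assuming decode is
    memory-bandwidth bound: every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# RTX 3090: ~936 GB/s memory bandwidth (published spec).
dense_27b = decode_ceiling_tok_s(27, 4.8, 936)  # all 27B params active
moe_a4b = decode_ceiling_tok_s(4, 4.8, 936)     # MoE with ~4B active params

print(f"dense 27B ceiling: {dense_27b:.0f} tok/s")
print(f"MoE 4B-active ceiling: {moe_a4b:.0f} tok/s")
```

This is why MoE models decode faster per token despite their size; the dense model's counterweight is that every parameter contributes compute to every token.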
Escalation/De-escalation: The competition between agent frameworks (Hermes vs OpenClaw) is intensifying, with Hermes capturing market share through open-source ethos and rapid feature shipping. GPU hardware discussions remain focused on VRAM efficiency rather than raw compute.
What to Watch Next: Expect deeper benchmarking of dense vs MoE architectures on agent tasks, and more migration tooling as users leave bloated frameworks. The Vibe Jam will serve as a real-world test of AI coding at scale; its results should offer a snapshot of the current state of autonomous code generation.
Tweet Feed
AI Models & Benchmarks
@TheAhmadOsman · 2026-04-02T18:42
MASSIVE Gemma 4 (31B, Dense), a model that performs on parity w/ Kimi K2.5 (1.1T, MoE) > 35x SMALLER than Kimi K2.5 Would run on any hardware at home - RTX 3090 / 4090 / 5090 * - DGX Spark / Mac Studios - MacBook Pro (24GB+) → tweet link
@TheAhmadOsman · 2026-04-03T07:20
How Fast is Gemma 4 on a MacBook Pro M4? Benchmarking Google's new MoE (26B-A4B) - TTFT: 5.68s - prompt: 3,701 tokens @ 652 tok/s - decode: 40.08 tok/s → tweet link
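The throughput numbers quoted above are internally consistent: time-to-first-token is approximately prompt length divided by prefill speed. A quick check in Python (the 500-token reply length is an arbitrary illustration, not from the tweet):

```python
prompt_tokens = 3701   # prompt size from the benchmark
prefill_tok_s = 652    # reported prompt-processing speed
decode_tok_s = 40.08   # reported generation speed

ttft = prompt_tokens / prefill_tok_s   # ≈ 5.68 s, matching the reported TTFT
gen_time_500 = 500 / decode_tok_s      # ≈ 12.5 s for a 500-token reply

print(f"TTFT: {ttft:.2f} s")
print(f"500-token generation: {gen_time_500:.1f} s")
```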
@sudoingX · 2026-04-03T13:28
I am still in shock that Qwen 3.5 27B dense on a single RTX 3090, a $900 GPU, one shotted a game challenge that 120B MoE at full precision on $70K+ production hardware could not. Dense models with all parameters active on every token might matter more than total parameter count for agent coding. → tweet link
@TheAhmadOsman · 2026-04-03T10:37
Fundamentals of LLMs: MoE vs Dense - Dense models (Qwen 3.5 27B, Gemma 4 31B) every parameter fires on every token. MoE models (MiniMax M2, Kimi K2.5) router + many experts, per token: activate top-k. This one design choice changes everything for inference speed, memory/VRAM, compute/FLOPs. → tweet link
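The dense/MoE distinction in the tweet above can be made concrete with the standard rule of thumb of ~2 FLOPs per active parameter per forward-pass token (the 4B-active figure follows the 26B-A4B naming; the 12B-active figure for a 120B MoE is a hypothetical for illustration):

```python
def flops_per_token(active_params_b: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_b * 1e9

dense_27b = flops_per_token(27)  # dense: every parameter fires on every token
gemma_moe = flops_per_token(4)   # 26B-A4B MoE: ~4B params active per token
big_moe = flops_per_token(12)    # hypothetical 120B MoE with 12B active

# A 27B dense model spends more compute per token than a far larger MoE,
# one reason total parameter count alone is a poor predictor on agent tasks.
assert dense_27b > big_moe > gemma_moe
```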
@FrameworkPuter · 2026-04-03T16:32
Google's new Gemma 4 is excellent, and the 26B MoE version is likely the best model to run on a 32GB Framework Desktop. It's fast, smart, and also great for tool call if you use it with @openclaw or other local agent platforms. → tweet link
@alexocheema · 2026-04-03T18:12
Direct comparison of NVIDIA RTX 5090 to M3 Ultra. Small models should always be faster on the 5090. The best perf for large models is to use both together. Using llama.cpp isn't super fair given the performance is not great on Apple Silicon. MLX is better. → tweet link
Agent Frameworks (Hermes vs OpenClaw)
@sudoingX · 2026-04-03T14:14
Hermes agent is on absolute fire. 5th most used AI agent in the world right now. The only one on that list that is fully open source from head to toe. Open models, open harness, open memory system, open everything. NousResearch just shipped 7 pluggable memory providers in one release. → tweet link
@Teknium · 2026-04-03T15:53
Hermes Agent is the third fastest growing GitHub repo this week! → tweet link
@sudoingX · 2026-04-03T04:34
Teknium just shipped 7 pluggable memory providers. Your agent can now remember you across sessions with the backend YOU choose. Run 'hermes update' then 'hermes memory setup'. Self-hosted or cloud. Local sqlite or knowledge graphs. → tweet link
@sudoingX · 2026-04-03T04:54
My timeline is full of OpenClaw users migrating to Hermes Agent on their own. Pattern is always the same: slow gateway, broken tool calls on small local models, generic parsing that blames your model for bad output. Then they try Hermes and everything works. → tweet link
@sudoingX · 2026-04-03T13:48
I have seen the same model perform completely differently depending on what agent framework is driving it. A model that looks broken on one bloated harness will one shot the same task on a clean one. If you're running local models and they feel dumber than they should, check your harness. → tweet link
Developer Tools & Infrastructure
@TheAhmadOsman · 2026-04-03T18:05
DROP EVERYTHING: install Harbor, harbor pull unsloth/gemma-4-31B-it-GGUF:Q4_K_M, harbor up llamacpp searxng webui, open Open WebUI, load Gemma 4. Now your local model has a UI, web search, and a sandboxed stack. → tweet link
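The workflow in the tweet, spelled out as shell commands (taken verbatim from the tweet; Harbor's CLI syntax is not independently verified here):

```shell
# Pull a quantized Gemma 4 build, then bring up a local stack:
# llama.cpp for inference, SearXNG for web search, Open WebUI as the front end.
harbor pull unsloth/gemma-4-31B-it-GGUF:Q4_K_M
harbor up llamacpp searxng webui
```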
@jezell · 2026-04-02T20:30
The inevitable is happening. Codex is moving to usage based API pricing for credits. This is the way it should be. → tweet link
@gdb · 2026-04-02T22:22
We've changed our pricing so it's now possible to try Codex at work without any up-front commitment. Codex (especially through the app!) has gotten really good. → tweet link
@karpathy · 2026-04-02T20:42
LLM Knowledge Bases: Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. Raw data compiled by LLM into .md wiki, operated on by CLIs for Q&A. You rarely write or edit the wiki manually. → tweet link
@steipete · 2026-04-03T14:02
I keep hitting quota limits from GitHub's API. This hasn't been designed with agents in mind. → tweet link
Hardware & Local Inference
@sudoingX · 2026-04-03T16:28
If you want to see what 12GB of VRAM actually builds in 2026 or want to get started running local models—any GPU with 12GB VRAM you are sleeping on intelligence that can replace a few of your AI subscriptions. No one reading your prompts. No rate limits. → tweet link
@sudoingX · 2026-04-03T14:20
People keep asking me what model to run on a single 3090. It's not even close. Qwen 3.5 27B dense Q4_K_M. Undisputed. → tweet link
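A rough sanity check on why this pairing fits (assuming Q4_K_M averages ~4.8 bits per weight, with the remaining VRAM left for KV cache and activations):

```python
params_b = 27          # Qwen 3.5 27B, dense
bits_per_weight = 4.8  # approximate Q4_K_M average (assumption)
vram_gb = 24           # RTX 3090

weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9  # ≈ 16.2 GB
headroom_gb = vram_gb - weights_gb                       # ≈ 7.8 GB for KV cache

print(f"weights: {weights_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```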
@TheAhmadOsman · 2026-04-03T14:50
You like Chinese opensource models then use Qwen 3.5 27B. You like American opensource models then use Gemma 4 31B. Both can run easily on consumer hardware at home and they're State of The Art models. → tweet link
@nummanali · 2026-04-03T17:44
Bun should overtake Python for ML. TypeScript is more readable. It makes it much more accessible. DevEx is better IMO. Bun unlocks so much over node. → tweet link
AI Coding & Game Development
@levelsio · 2026-04-02T18:51
THE VIBE JAM IS BACK! 2026 Vibe Coding Game Jam. Sponsored by @boltdotnew + @cursor_ai. Start: Today! Deadline: 1 May 2026. REAL CASH PRIZES: Gold: $20,000, Silver: $10,000, Bronze: $5,000. RULES: at least 90% AI-written code, multiplayer preferred, web accessible. → tweet link
@levelsio · 2026-04-03T16:11
Okay honestly this makes vibe coding into production very dangerous, you guys were all right. I think what I'll do is cut off all access to DBs and run it as a user with almost no privileges. → tweet link
@MengTo · 2026-04-03T14:24
I made a tool that generates UIs with a design system that's copyable as prompts. Everything is a prompt: font pairings, color system, spacing, icon sets, buttons and even webgl & threejs with full code samples. → tweet link
@jezell · 2026-04-03T05:39
AI is really good at cloning things everyone uses. It's still hard to make something 10x better than that thing you can clone. → tweet link
Open Source & Community
@jack · 2026-04-03T00:16
Block just open-sourced mesh-llm, a peer-to-peer system that lets anyone pool spare GPU compute to run large open-source AI models. → tweet link
@badlogicgames · 2026-04-03T18:32
People of pi. I'm emerging from the refactoring mines to give you the first pi release with some new internals. If you have extensions, please direct your agent at the changelog. → tweet link
@nummanali · 2026-04-03T14:28
Intelligence now 10x densely packed. Inference speed much faster. Training faster. They're no Opus distilled data. It'll run on your MacBook 24/7. Extrapolate this to the next SOTA models. → tweet link