Tech / AI / IT Intelligence Briefing
Period: 2026-03-22T19:00 – 2026-03-23T19:30 UTC
Executive Summary
The open-source AI agent space is experiencing a notable momentum surge, with Hermes Agent by NousResearch hitting 10,000 GitHub stars and generating significant community buzz around local-first, privacy-respecting AI tooling. MiniMax's M2.7 model is drawing attention for strong coding and multi-agent performance, with the MiniMax team actively engaging the developer community around upcoming releases including M3. On the infrastructure side, Apple's M5 Pro/Max MacBooks are enabling novel RDMA-over-Thunderbolt 5 clusters for running trillion-parameter models locally with near-linear scaling. Sam Altman announced his departure from the Helion board as OpenAI and the fusion energy startup begin exploring deeper collaboration. Meanwhile, tinygrad reported a 1.8x flash attention performance improvement over PyTorch's AOTriton on AMD hardware, and the broader developer community continues debating AI coding tool costs and the viability of local vs. cloud inference.
Key Events
- Hermes Agent (NousResearch) hits 10K GitHub stars with a near-vertical growth curve; 32 PRs merged in a single day; features 11 per-model tool call parsers for local models (Qwen, DeepSeek, LLaMA, Mistral, GLM); community positions it as the leading open-source, local-first agent harness. → link
- Sam Altman steps down from Helion board as OpenAI and Helion prepare to work together "at significant scale"; Altman retains financial interest but cites governance clarity as motivation. → link
- MiniMax M2.7 shows strong benchmark results across MLE and semi-autonomous RL; team met with developers in San Jose; upcoming M3 model discussed; MiniMax team endorses Hermes Agent compatibility. → link
- Apple M5 MacBook RDMA clusters enable daisy-chaining up to 4 MacBooks via Thunderbolt 5 for single-digit microsecond latency tensor parallelism, running 1T parameter models at 70+ tok/s with near-linear scaling. → link
- tinygrad achieves 1.8x faster flash attention than PyTorch/AOTriton on AMD RDNA3/Strix Halo hardware, attributed to kernel profiler quality; LLM-coded flash attention confirmed outperforming PyTorch's implementation. → link
- Cursor's new coding model revealed to be built on Moonshot AI's Kimi, per TechCrunch report; highlights the growing role of Chinese AI labs in powering Western developer tools. → link
- NVIDIA Kimodo released: text-to-3D motion model trained on 700 hours of professional mocap data; supports human and robot skeletons; available free on Hugging Face. → link
- OpenAI offering PE firms 17.5% guaranteed minimum return plus early model access, per RT of @AndrewCurran_; signals aggressive capital strategy. → link
- Hugging Face Daily Papers SKILL.md tool enables agents to read papers as markdown, search, and find linked resources — a new agentic research interface. → link
- Pi agent now available in Hugging Face "Use this model" menu for compatible MLX models, lowering barrier to local agent deployment. → link
- ArrowJS 1.0 open-sourced: described as the first UI framework designed for coding agents — no compiler, no build step, React/Vue-like paradigm. → link
- Agones moves to CNCF: open-source multiplayer game server infrastructure now under CNCF governance, broadening enterprise adoption path. → link
- Ghostty terminal surpasses Terraform in GitHub stars, reaching the milestone in a fraction of the time (Terraform took 12 years). → link
- Framework Computer announces pricing reductions on select products for US customers following tariff updates. → link
- jack (Jack Dorsey) on the future of open source: argues value is shifting from code to data, provenance, protocols, evals, and model weights — in that order. → link
- GB10 / Flash-MoE local inference optimization: developer reports 1.94x speedup via zero-copy GPU reads from mmap'd NVMe page cache on NVIDIA GB10, plus a pre-attention expert prediction model achieving 93–97% cache hit rate via async prefetch. → link
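The single-digit-microsecond latency figure in the Apple RDMA item matters because tensor parallelism inserts a communication step into every layer. As a generic illustration only (shapes and device count are made up, and NumPy arrays stand in for per-device shards), a column-parallel matmul shows what each machine computes independently and where the interconnect hop lands:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512))      # activations: batch x hidden
W = rng.standard_normal((512, 2048))   # full weight matrix

n_devices = 4
# Column parallelism: each device holds one vertical shard of W
shards = np.split(W, n_devices, axis=1)

# Each device computes its partial output with no communication...
partials = [x @ w for w in shards]

# ...then an all-gather reassembles the row — this is the step whose
# latency the RDMA link must keep small, once per parallelized layer
y_parallel = np.concatenate(partials, axis=1)

y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

Because the all-gather sits on the critical path of every layer, interconnect latency (not just bandwidth) bounds tokens per second, which is why microsecond-class RDMA rather than ordinary TCP over Thunderbolt is the enabling detail here.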
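tinygrad's AMD kernels themselves are out of scope for this briefing, but the online-softmax tiling that flash attention builds on fits in a few lines of NumPy. This is a generic sketch of the algorithm — not tinygrad's, AOTriton's, or PyTorch's implementation — showing that processing keys/values block by block with a running max and normalizer reproduces exact attention without materializing the full score matrix:

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: full score matrix, then softmax
    s = (q @ k.T) / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=16):
    # Flash-attention-style online softmax over key/value blocks
    scale = 1.0 / np.sqrt(q.shape[-1])
    n = q.shape[0]
    out = np.zeros_like(q)            # unnormalized accumulator
    m = np.full((n, 1), -np.inf)      # running row max
    l = np.zeros((n, 1))              # running normalizer
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)
        corr = np.exp(m - m_new)      # rescale old state to new max
        l = l * corr + p.sum(axis=-1, keepdims=True)
        out = out * corr + p @ vb
        m = m_new
    return out / l

rng = np.random.default_rng(1)
q = rng.standard_normal((32, 64))
k = rng.standard_normal((48, 64))
v = rng.standard_normal((48, 64))
assert np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v))
```

The speedups reported in the tinygrad item come from how this loop is scheduled and fused on RDNA3 hardware, not from the math itself, which is identical to standard attention.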
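The GB10 report's GPU-side zero-copy path and expert-prediction model are not public, but the mmap half of the idea can be sketched host-side: map a weight file and view experts directly out of the OS page cache, with no read() copy into a Python buffer. Everything below is hypothetical (file name, expert shape, the load_expert helper) and stands in for an NVMe-resident MoE checkpoint:

```python
import mmap
import os
import tempfile

import numpy as np

EXPERT_SHAPE = (64, 64)                            # hypothetical expert weight shape
EXPERT_BYTES = int(np.prod(EXPERT_SHAPE)) * 4      # float32

# Write a toy weight file standing in for the on-disk checkpoint
path = os.path.join(tempfile.gettempdir(), "experts.bin")
n_experts = 8
with open(path, "wb") as f:
    f.write(np.arange(n_experts * np.prod(EXPERT_SHAPE),
                      dtype=np.float32).tobytes())

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def load_expert(idx):
    # Zero-copy: the returned array is a read-only view over the
    # mmap'd page cache; no bytes are duplicated into user space
    off = idx * EXPERT_BYTES
    return np.frombuffer(mm, dtype=np.float32,
                         count=int(np.prod(EXPERT_SHAPE)),
                         offset=off).reshape(EXPERT_SHAPE)

w3 = load_expert(3)
assert w3[0, 0] == 3 * 64 * 64   # first element of expert 3
```

The reported 93–97% hit rate would come from predicting which experts the next token needs and touching their pages ahead of time, so that load_expert-style views resolve from RAM rather than faulting to NVMe; that predictor is the part this sketch does not attempt.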
Analysis
Pattern: Open-Source Local AI Momentum Is Real and Accelerating
The volume of organic developer activity around Hermes Agent, Pi, and local model stacks (RTX 3060/3090, MacBook RDMA clusters) is unusually high. This is not astroturf — it reflects genuine frustration with per-token API costs and privacy concerns. The 10K-star milestone for Hermes Agent within what appears to be days of launch, combined with 32 merged PRs in a single day, signals a genuinely active maintainer and community. Framework Computer's public endorsement of Hermes running on their hardware adds a hardware-vendor dimension to this momentum.
Pattern: Chinese AI Labs Increasingly Powering Western Dev Tools
The Cursor/Kimi revelation is significant — it suggests Western developer tool companies are quietly sourcing foundation models from Chinese labs (MiniMax, Moonshot AI, Qwen) for cost or capability reasons. MiniMax M2.7 is gaining developer mindshare rapidly, with multiple independent demos of coding and multi-agent tasks. Watch for further disclosures of similar arrangements.
Pattern: Infrastructure Debate — Bubble vs. Buildout
Multiple accounts (@thdxr, @TheAhmadOsman) are pushing back against "AI bubble" narratives, arguing AI inference demand is structurally growing and current infrastructure is already strained. The OpenAI/PE 17.5% guaranteed return offering fits this pattern — aggressive capital deployment to lock in infrastructure at scale.
Pattern: Agentic Tooling Ecosystem Maturing Rapidly
ArrowJS (UI for agents), Hugging Face SKILL.md (agents reading papers), Kubernetes Agent Sandbox, WASM sandboxes, terminal diff tools with agent remote control — the tooling layer for agentic AI is rapidly filling in. Expect consolidation pressure on smaller tools within 60–90 days.
What to Watch
- MiniMax M3 release timeline and benchmark positioning against GPT-4-class models
- Hermes Agent vs. OpenClaw competitive dynamics — the community debate is intensifying and FrameworkPuter is now publicly aligned with Hermes
- Apple RDMA cluster ecosystem — whether exolabs and similar tools formalize multi-MacBook inference as a supported deployment pattern
- Cursor's model sourcing and whether other Western dev tools have similar undisclosed dependencies on Chinese foundation models
- OpenAI/Helion collaboration scope — energy infrastructure for compute is a long-term strategic story
Tweet Feed
🤖 Hermes Agent / NousResearch / Open-Source Agents
@sudoingX · 2026-03-23T05:57
hermes agent hit 10K stars and the curve is vertical. no other agent harness is shipping this fast while staying fully open source and local focused. 11 per-model tool call parsers. your local qwen, deepseek, llama, mistral, glm all get parsed correctly out of the box... teknium merged 32 PRs in a single day this week. → tweet link
@Teknium · 2026-03-23T19:11
RT @NousResearch: Did you feel that vibe shift anon? Open Source is in the air. → tweet link
@Teknium · 2026-03-23T13:14
Hermes Agent tip of the day: use /bg or /background to have Hermes Agent execute an additional task in the background. When it's done, it just pops it back into your main session... → tweet link
@Teknium · 2026-03-23T13:09
I think this is still an underrated tool! Not only does this make it easy to install Hermes (and apparently on windows native) - it also gives it access to like 50+ other AI tools that are also built into Pinokio → tweet link
@Teknium · 2026-03-23T12:53
RT @nyk_builderz: Just shipped awesome-hermes-agent — A curated list of 40+ skills, tools, integrations, and resources for the @NousResearch Hermes Agent → tweet link
@Teknium · 2026-03-23T01:28
RT @LottoLabs: Qwen 3.5 27b + hermes agent saas — All done in my phone, minimal steering, working autonomously w/ tests, self auditing its... → tweet link
@sudoingX · 2026-03-23T03:44
the founder of openclaw joined the company that was founded to make AI open and now charges you per token... "open models aren't there yet" is what you say when your harness can't parse tool calls on local models and you blame the model instead of fixing the harness. → tweet link
@sudoingX · 2026-03-23T13:21
we are trending, hermes agent is trending, opensource is trending. did you make the switch? → tweet link
@FrameworkPuter · 2026-03-22T23:15
Awesome to see the rapid adoption on Hermes. We're seeing a lot of people running it locally on Framework Desktops! → tweet link
@SkylerMiao7 · 2026-03-23T01:17
This is an outstanding project. I really like it; the taste is excellent. M2.7 has good performance on it. We are also considering further iterations and optimizations. @NousResearch → tweet link
@badlogicgames · 2026-03-23T13:23
RT @thorstenball: Btw, it's Amp → tweet link
@badlogicgames · 2026-03-23T13:23
RT @trq212: we're testing a new version of /init based on your feedback — it should interview you and help setup skills, hooks, etc. → tweet link
🧠 MiniMax M2.7 / Upcoming Models
@SkylerMiao7 · 2026-03-23T01:22
We've pushed performance quite a bit here — M2.7 is showing strong results across both MLE and semi-autonomous RL. More details in the report 👇 → tweet link
@TheAhmadOsman · 2026-03-23T11:30
Was really great to meet with the MiniMax team. We discussed the upcoming M2.7 weights, the highly-anticipated M3, and opensource AI → tweet link
@TheAhmadOsman · 2026-03-22T22:53
Peter is wrong. He needs to try MiniMax M2.7 and Qwen 3.5 27B in OpenClaw before making these comments → tweet link
@SkylerMiao7 · 2026-03-23T07:19
RT @chetaslua: My first test of MiniMax M2.7 — this made 3d fibre physics with all the specs of m2.7 is written → tweet link
@SkylerMiao7 · 2026-03-23T07:19
RT @chetaslua: MiniMax M2.7 multi agents system — made a water physics with proper temperature control website → tweet link
@badlogicgames · 2026-03-23T08:47
hey @MiniMax_AI your OpenAI Completions endpoint emits \<thinking> inside of assistant text blocks, instead of emitting separate thinking blocks. No other provider claiming OpenAI Completions compatibility does this. Can this be fixed? → tweet link
💻 Local Inference / Hardware / GPU
@alexocheema · 2026-03-23T00:18
The new M5 Pro/Max MacBooks have 3 Thunderbolt 5 ports, enabling you to create RDMA clusters with up to 4 MacBooks. The latency with RDMA over Thunderbolt is single digit microseconds, fast enough for tensor parallelism with close to linear scaling. → tweet link
@alexocheema · 2026-03-23T02:08
The new LAN party is RDMA party. Daisy chain MacBooks up to 4 to run 1T parameter models with linear scaling. → tweet link
@sudoingX · 2026-03-23T15:36
i ran a 9 billion parameter model on a single RTX 3060. 50 tokens per second. it wrote