FABBI TECHNICAL INTELLIGENCE BRIEF
LLM · Coding Agents · Harness Engineering · AI SDLC
2026-05-30 15:34 ICT
Gate: PARTIAL-PASS
Project: ai-report-260530-1534
Candidates: 386 · Unique: 327 · GitHub: 91 · HN/Web: 171 · Papers: 40

1. Executive Snapshot

Signal | Why it matters | Action
386 candidates / 327 unique | Runtime + eval > model news | NEXA harness
GitHub: 91 repos | Fragmented OSS landscape | 3-repo benchmark
HN/Web: 171 items | Security/context concerns | SYNCA gate
Papers: 40 items | Enterprise eval gap | Fabbi eval suite
YouTube: 25 videos | Practitioner education | Track KOLs, not hype

2. Trend Radar

Hot: agent harness · Hot: sandbox · Emerging: context memory · Watch: CLI/IDE convergence

  • Hot now: eval/runtime governance, seen in 3+ source groups.
  • Noise: demo-only agent videos (engagement data N/A).
  • Watchlist: Terminal-Bench/SWE-bench variants.
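The "3+ source groups" rule can be read as a threshold over distinct sources rather than raw mention counts. A minimal sketch, assuming each scanned item is tagged with a topic and one of the four source groups used in this scan (hn, youtube, github, paper). The classify_topics name and the "watch" fallback label are hypothetical; this is not the scan pipeline's actual logic:

```python
from collections import defaultdict

# Source groups used in this scan (assumed to match the Data Quality counts).
SOURCE_GROUPS = {"hn", "youtube", "github", "paper"}


def classify_topics(mentions, hot_threshold=3):
    """Label a topic 'hot' when it appears in at least `hot_threshold`
    distinct source groups, otherwise 'watch'.

    `mentions` is an iterable of (topic, source_group) pairs. The threshold
    rule is an assumed reading of "3+ source groups", not the real pipeline.
    """
    groups_by_topic = defaultdict(set)
    for topic, group in mentions:
        if group in SOURCE_GROUPS:  # ignore sources outside the scan
            groups_by_topic[topic].add(group)
    return {
        topic: "hot" if len(groups) >= hot_threshold else "watch"
        for topic, groups in groups_by_topic.items()
    }
```

Counting distinct groups keeps one busy source (for example, a burst of HN posts) from marking a topic as hot on its own.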

3. KOL/OG Feed Watch

Platform | Author/KOL | Timestamp | Engagement | Title/URL | Why CTO cares
hn | vinhnx | 2026-05-30T03:07:25Z | 10 | Show HN: VT Code – open-source terminal coding agent in Rust | Relevant to agentic SDLC/eval/context; score 60-82.
hn | agentseal | 2026-05-29T21:37:59Z | 2 | Where AI coding spend goes: 48% code, 40% thinking | Relevant to agentic SDLC/eval/context; score 60-82.
hn | robert_dds | 2026-05-29T18:28:49Z | 2 | DDS Vibe Academy – 47 free AI coding masterclasses, built by AI agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | matt_d | 2026-05-29T18:07:26Z | 2 | MIT EECS/CSAIL Agentic Coding in Practice Seminar Series | Relevant to agentic SDLC/eval/context; score 60-82.
hn | nike-17 | 2026-05-29T17:25:16Z | 3 | Show HN: Sverklo – repo memory for coding agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | swanros | 2026-05-29T17:16:19Z | 2 | My "blocked-by-default" approach to working with coding agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Brajeshwar | 2026-05-29T16:36:21Z | 6 | Nesbitt: Protestware for Coding Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | jimsojim | 2026-05-29T16:13:45Z | 12 | Ask HN: Any advice on how to learn good software architecture practices? | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ananandreas | 2026-05-29T14:35:42Z | 5 | Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them | Relevant to agentic SDLC/eval/context; score 60-82.
hn | joozio | 2026-05-29T07:05:31Z | 58 | Undisclosed addition in jqwik instructed AI coding agents to delete app output | Relevant to agentic SDLC/eval/context; score 60-82.
hn | sparkleMing | 2026-05-29T07:00:42Z | 1 | Show HN: SharkBay – a local macOS workbench for coding-agent CLIs | Relevant to agentic SDLC/eval/context; score 60-82.
hn | patriceckhart | 2026-05-29T05:48:21Z | 78 | Show HN: Zot – Yet another coding agent harness | Relevant to agentic SDLC/eval/context; score 60-82.
hn | peterneyra | 2026-05-29T01:18:58Z | 2 | Dis Dat – Loom for AI coding agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | aanet | 2026-05-28T22:46:14Z | 1 | Clawd-on-Desk: a pixel desktop pet watching your AI coding agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | SVI | 2026-05-28T21:03:24Z | 59 | Protestware for coding agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | akashi_dev | 2026-05-28T20:44:37Z | 2 | Show HN: Rig – Local-first code graph for coding agents, in one npx command | Relevant to agentic SDLC/eval/context; score 60-82.
hn | nkko | 2026-05-28T18:54:47Z | 2 | Coding agent can read your .env file | Relevant to agentic SDLC/eval/context; score 60-82.
hn | juanre | 2026-05-28T18:30:22Z | 3 | Show HN: Bootstrap a team of coding agents from a template, OSS | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ltononro | 2026-05-28T18:18:21Z | 3 | Show HN: Notification when coding agent is done, free | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ramonga | 2026-05-28T16:11:13Z | 3 | Show HN: Free open source coding models in Slack | Relevant to agentic SDLC/eval/context; score 60-82.
hn | spinchange | 2026-05-30T02:04:12Z | 1 | Show HN: A Claude Code skill that scopes problems like Peter Naur | Relevant to agentic SDLC/eval/context; score 60-82.
hn | vbutsomesayw | 2026-05-27T04:01:44Z | 3 | Bill Gates AI on AI (one month later) | Relevant to agentic SDLC/eval/context; score 60-82.
hn | armcat | 2026-05-24T19:37:43Z | 3 | Show HN: Simple Sprite Sheet Generation | Relevant to agentic SDLC/eval/context; score 60-82.
hn | jeroen_stulen | 2026-05-24T10:07:13Z | 3 | Show HN: My first app, artisanally vibe-coded in 4 months | Relevant to agentic SDLC/eval/context; score 60-82.
hn | xendo | 2026-05-23T11:13:35Z | 3 | Zero – Programming Language for Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | goodroot | 2026-05-21T14:59:15Z | 2 | Show HN: opub, donated compute for open-source | Relevant to agentic SDLC/eval/context; score 60-82.
hn | afshinmeh | 2026-05-19T20:19:46Z | 3 | Zero: The Programming Language for Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | amitbidlan | 2026-05-19T17:40:39Z | 1 | Show HN: Korveo – a local firewall for AI agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Marius77 | 2026-05-19T14:09:50Z | 20 | The Programming Language for Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | steveharing1 | 2026-05-17T20:25:40Z | 5 | Vercel's Zero: A Programming Language Designed for AI Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | alex_x | 2026-05-17T14:40:22Z | 1 | The Programming Language for Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | mindwarp | 2026-05-12T13:24:00Z | 1 | Show HN: Telegram/Slack bridge for local Codex agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | errata_dev | 2026-05-04T17:27:52Z | 1 | Show HN: [inerrata] – Collective and Causal Knowledge Layer for Coding Agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | georgestrakhov | 2026-05-03T18:51:13Z | 2 | AOP: Agent-Oriented Programming | Relevant to agentic SDLC/eval/context; score 60-82.
hn | is-it-art | 2026-04-27T21:15:46Z | 2 | Show HN: Is it art? An art project for AI agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | poshmosh | 2026-04-27T11:34:19Z | 4 | Show HN: Slerp.audio – VDJ with WebGL2 and real-time DSP | Relevant to agentic SDLC/eval/context; score 60-82.
hn | NickMiladinov | 2026-04-23T19:47:17Z | 7 | Show HN: Chestnut – The antidote to AI-induced skill atrophy | Relevant to agentic SDLC/eval/context; score 60-82.
hn | cobblr_mosaic | 2026-05-26T17:38:55Z | 3 | Agentic Harness Engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ramayac | 2026-05-20T04:31:50Z | 2 | Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible | Relevant to agentic SDLC/eval/context; score 60-82.
hn | redbell | 2026-05-18T12:17:04Z | 159 | Learn Harness Engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Garbage | 2026-05-16T04:59:11Z | 3 | Agent Harness Engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Lunar5227 | 2026-05-15T05:42:46Z | 1 | Agentic SDLC: How OpenSearch accelerates engineering with its own engine | Relevant to agentic SDLC/eval/context; score 60-82.
hn | sahil-shubham | 2026-05-14T10:44:17Z | 3 | Show HN: Bhatti – self-hosted runtime for your harness engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | gruyaume | 2026-05-12T14:37:45Z | 1 | Implicit Knowledge Is a Liability | Relevant to agentic SDLC/eval/context; score 60-82.
hn | pretext | 2026-05-10T05:19:22Z | 8 | Agent Harness Engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | straydusk | 2026-05-08T22:57:31Z | 1 | Ask HN: Is agent-driven QA a thing? | Relevant to agentic SDLC/eval/context; score 60-82.
hn | nbstme | 2026-05-03T19:03:04Z | 2 | Why does my harness forget me? Agent engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | kumulo | 2026-04-30T11:31:33Z | 1 | Harness engineering: leveraging Codex in an agent-first world | Relevant to agentic SDLC/eval/context; score 60-82.
hn | anophelon | 2026-04-29T07:30:11Z | 14 | Why Codex works better than Claude Code for my production monolith | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ElFitz | 2026-04-27T10:59:16Z | 6 | Ask HN: What does your agentic software dark factory look like? | Relevant to agentic SDLC/eval/context; score 60-82.
hn | kiyanwang | 2026-04-27T05:50:28Z | 1 | Agent Harness Engineering | Relevant to agentic SDLC/eval/context; score 60-82.
hn | alex000kim | 2026-04-26T18:13:46Z | 7 | You've been doing harness engineering all along | Relevant to agentic SDLC/eval/context; score 60-82.
hn | jdw64 | 2026-04-19T08:42:37Z | 10 | Ask HN: May be a basic question, but how can I use AI well? | Relevant to agentic SDLC/eval/context; score 60-82.
hn | alexblackwell_ | 2026-04-16T15:19:54Z | 100 | Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs | Relevant to agentic SDLC/eval/context; score 60-82.
hn | geopsist | 2026-05-28T12:39:46Z | 6 | We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs | Relevant to agentic SDLC/eval/context; score 60-82.
hn | fittingopposite | 2026-05-28T05:05:59Z | 2 | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | Relevant to agentic SDLC/eval/context; score 60-82.
hn | kimjune01 | 2026-05-24T18:03:28Z | 2 | Show HN: 97% on SWE-bench Verified with subscription-token agents | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Sushrutkm | 2026-05-19T10:02:03Z | 2 | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | Relevant to agentic SDLC/eval/context; score 60-82.
hn | azurewraith | 2026-05-12T14:24:55Z | 126 | Show HN: Statewright – Visual state machines that make AI agents reliable | Relevant to agentic SDLC/eval/context; score 60-82.
hn | lieret | 2026-05-05T15:10:41Z | 24 | Show HN: New Benchmark from SWE-bench team is 0% solved | Relevant to agentic SDLC/eval/context; score 60-82.
hn | Philpax | 2026-05-02T21:35:54Z | 2 | talkie-coder: From 1930 to SWE-bench | Relevant to agentic SDLC/eval/context; score 60-82.
hn | jryio | 2026-04-29T19:16:48Z | 2 | Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error | Relevant to agentic SDLC/eval/context; score 60-82.
hn | kmdupree | 2026-04-26T13:58:13Z | 343 | SWE-bench Verified no longer measures frontier coding capabilities | Relevant to agentic SDLC/eval/context; score 60-82.
hn | george_ciobanu | 2026-04-24T21:34:31Z | 10 | Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces | Relevant to agentic SDLC/eval/context; score 60-82.
hn | nicola_alessi | 2026-04-16T20:19:18Z | 1 | Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks? | Relevant to agentic SDLC/eval/context; score 60-82.
hn | stared | 2026-04-14T08:32:45Z | 1 | Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro | Relevant to agentic SDLC/eval/context; score 60-82.
hn | sriharis | 2026-04-13T13:14:44Z | 3 | Checking my model vibes against SWE-Bench Pro | Relevant to agentic SDLC/eval/context; score 60-82.
hn | chenglin97 | 2026-04-10T22:48:29Z | 4 | SWE-Bench Verified Leaderboard March 2026 – Independent vs. Self-Reported Scores | Relevant to agentic SDLC/eval/context; score 60-82.
hn | raghavchamadiya | 2026-04-06T20:15:26Z | 1 | Show HN: Repowise – Codebase intelligence for AI coding agents (open source) | Relevant to agentic SDLC/eval/context; score 60-82.
hn | asfsf23423 | 2026-03-29T21:47:54Z | 6 | SWE-bench will hit 90% this year | Relevant to agentic SDLC/eval/context; score 60-82.
hn | stared | 2026-03-25T14:24:43Z | 4 | Blitzy Scores a Record 66.5% on SWE-Bench Pro | Relevant to agentic SDLC/eval/context; score 60-82.
hn | neversettles | 2026-05-03T03:40:04Z | 1 | The Terminal Bench 3.0 community is looking for task contributors | Relevant to agentic SDLC/eval/context; score 60-82.
hn | gk1 | 2026-04-29T18:16:23Z | 4 | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | Relevant to agentic SDLC/eval/context; score 60-82.
hn | ubermon | 2026-04-28T19:11:57Z | 6 | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | Relevant to agentic SDLC/eval/context; score 60-82.
hn | GodelNumbering | 2026-04-27T12:35:55Z | 393 | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | Relevant to agentic SDLC/eval/context; score 60-82.
hn | neversupervised | 2026-04-15T00:42:30Z | 6 | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | Relevant to agentic SDLC/eval/context; score 60-82.
hn | jackykwok | 2026-04-14T20:27:39Z | 1 | A simple test-time method that beats Claude Mythos on Terminal-Bench | Relevant to agentic SDLC/eval/context; score 60-82.
hn | _nhynes | 2026-04-13T07:48:11Z | 1 | Show HN: Amber, a capability-based runtime/compiler for agent benchmarks | Relevant to agentic SDLC/eval/context; score 60-82.
hn | joozio | 2026-04-01T12:59:36Z | 4 | Claude Code ranks 39th on terminal bench. The leaked source shows why | Relevant to agentic SDLC/eval/context; score 60-82.
hn | bcollins34 | 2026-03-31T19:07:11Z | 4 | Show HN: Wozcode – double Claude Code output | Relevant to agentic SDLC/eval/context; score 60-82.
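Several feed items (the .env exposure report, the blocked-by-default workflow, Korveo's local firewall) point at the same control: restrict what an agent can read. A minimal sketch of a deny-by-default file policy, assuming a simple glob allowlist plus a secrets denylist. The patterns and the agent_may_read helper are hypothetical and are not taken from any of the linked tools:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical policy: nothing is readable unless it matches an allow pattern,
# and secret-bearing file names stay blocked even inside allowed trees.
ALLOW_PATTERNS = ["src/*", "tests/*", "README.md"]
DENY_PATTERNS = [".env", ".env.*", "*.pem", "*id_rsa*"]


def agent_may_read(path: str) -> bool:
    """Return True only if `path` is explicitly allowed and not a known secret."""
    p = PurePosixPath(path)
    # Deny rules are checked first, against the file name alone.
    if any(fnmatch(p.name, pattern) for pattern in DENY_PATTERNS):
        return False
    # Blocked by default: anything not matching an allow pattern is refused.
    return any(fnmatch(str(p), pattern) for pattern in ALLOW_PATTERNS)
```

Checking deny rules first means a secrets file inside an allowed tree (for example src/.env) stays blocked. Note that fnmatch's `*` also matches `/`, so `src/*` covers nested paths.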

4. Repo Watch

Repo | Metric | Updated | Fabbi move
genusercillaindependentclause781/gstack | 0 stars | 2026-05-30T08:30:54Z | Trial if it has sandbox/logging/test hooks.
semionkuksov23/personal-workspace-os | 1 star | 2026-05-30T08:30:52Z | Trial if it has sandbox/logging/test hooks.
piqueriastrongbreeze520/claude-and-codex-website | 1 star | 2026-05-30T08:30:48Z | Trial if it has sandbox/logging/test hooks.
nolte/claude-shared | 0 stars | 2026-05-30T08:30:47Z | Trial if it has sandbox/logging/test hooks.
MyHeavenDyf/UXAI | 0 stars | 2026-05-30T08:30:46Z | Trial if it has sandbox/logging/test hooks.
Machineaccessible-ochre867/agenthub | 1 star | 2026-05-30T08:30:39Z | Trial if it has sandbox/logging/test hooks.
Acromegaliacanaliculus452/Swift-Testing-Agent-Skill | 0 stars | 2026-05-30T08:30:34Z | Trial if it has sandbox/logging/test hooks.
entrepreneurial-cabinetminister913/harness-engineering | 0 stars | 2026-05-30T08:30:34Z | Trial if it has sandbox/logging/test hooks.
jeongmk522-netizen/agentlas-desktop | 1 star | 2026-05-30T08:30:48Z | Trial if it has sandbox/logging/test hooks.
linny006/agent-eval-harness | 0 stars | 2026-05-30T08:30:29Z | Trial if it has sandbox/logging/test hooks.
elgrhy/gx | 2 stars | 2026-05-30T08:30:54Z | Trial if it has sandbox/logging/test hooks.
lorenzocl3940/gsd-2 | 0 stars | 2026-05-30T08:29:55Z | Trial if it has sandbox/logging/test hooks.
intangible-sidalceamalviflora302/engram | 0 stars | 2026-05-30T08:28:35Z | Trial if it has sandbox/logging/test hooks.
laoxs2002/genai-agentes | 0 stars | 2026-05-30T08:27:12Z | Trial if it has sandbox/logging/test hooks.
aditxver/engram | 2 stars | 2026-05-30T08:25:58Z | Trial if it has sandbox/logging/test hooks.

5. Paper / Benchmark Watch

Paper/benchmark | Date | Metric | Use
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software | 2026-05-28T17:59:59Z | N/A (public API) | Eval design; not for direct shipping.
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations | 2026-05-28T17:59:50Z | N/A (public API) | Eval design; not for direct shipping.
GPIC: A Giant Permissive Image Corpus for Visual Generation | 2026-05-28T17:59:26Z | N/A (public API) | Eval design; not for direct shipping.
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image | 2026-05-28T17:59:01Z | N/A (public API) | Eval design; not for direct shipping.
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents | 2026-05-28T17:58:55Z | N/A (public API) | Eval design; not for direct shipping.
SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? | 2026-05-28T17:57:37Z | N/A (public API) | Eval design; not for direct shipping.
RoboWits: Unexpected Challenges for Robotic Creative Problem Solving | 2026-05-28T17:57:15Z | N/A (public API) | Eval design; not for direct shipping.
Gram: Assessing sabotage propensities via automated alignment auditing | 2026-05-28T17:56:18Z | N/A (public API) | Eval design; not for direct shipping.
A Bayesian Proof and Interpretation of Talagrand's Majorizing Measure Theorem | 2026-05-28T17:56:03Z | N/A (public API) | Eval design; not for direct shipping.
SpecBench: Evaluating Specification-Level Reasoning for Software Engineering LLM Agents | 2026-05-28T17:54:01Z | N/A (public API) | Eval design; not for direct shipping.
Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection | 2026-05-28T17:32:25Z | N/A (public API) | Eval design; not for direct shipping.
PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions | 2026-05-28T17:29:19Z | N/A (public API) | Eval design; not for direct shipping.
Quiver Approach to Symmetry Theories | 2026-05-28T17:59:59Z | N/A (public API) | Eval design; not for direct shipping.
Benchmarking Single-Factor Physical Video-to-Audio Generation | 2026-05-28T17:59:09Z | N/A (public API) | Eval design; not for direct shipping.
Visualizing orbital magnetism in electron doped rhombohedral multilayer graphene | 2026-05-28T17:54:30Z | N/A (public API) | Eval design; not for direct shipping.

6. Impact Coverage

Domain | Now (0-2w) | Next (1-2m) | Later (3-6m) | Decision
FARE | Context/codebase map eval | Repo memory PoC | Knowledge layer | trial
NEXA | Harness for coding agents | Sandbox executor | Multi-agent orchestration | adopt
SYNCA | Quality/risk gates | Human-in-the-loop audit | Compliance dashboard | adopt
DOMUS | Monitor | AI ops assistant | Workflow automation | watch
Japan/VN/Global | Enterprise caution | Pilot bundles | Managed AI-SDLC service | trial
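The SYNCA row starts with quality/risk gates and moves to human-in-the-loop audit. One way to express such a gate, using the three inputs named in the SYNCA recommendation (risk class, diff summary, rollback checklist), is sketched here. The three-level risk scale, the ChangeRequest schema, and the pass/fail rules are illustrative assumptions, not a defined SYNCA policy:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class RiskClass(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ChangeRequest:
    """An agent-produced change awaiting the gate (hypothetical schema)."""
    risk: RiskClass
    diff_summary: str
    rollback_checklist: list[str] = field(default_factory=list)
    human_approved: bool = False


def gate_decision(cr: ChangeRequest) -> tuple[bool, list[str]]:
    """Return (passes, reasons); an empty reasons list means the change may merge."""
    reasons = []
    if not cr.diff_summary.strip():
        reasons.append("missing diff summary")
    # Assumed rule: anything above LOW risk must say how to undo it.
    if cr.risk is not RiskClass.LOW and not cr.rollback_checklist:
        reasons.append("rollback checklist required for medium/high risk")
    # Assumed rule: HIGH risk always needs a human sign-off.
    if cr.risk is RiskClass.HIGH and not cr.human_approved:
        reasons.append("human approval required for high risk")
    return (not reasons, reasons)
```

Returning every failing reason, rather than stopping at the first, gives the reviewer the full list in one pass.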

7. CTO Evaluation Matrix

Top signal | Thesis | Evidence | Counter-signal | Fabbi implication | Confidence | Decision | Next validation
Harness-first coding agents | Agent value depends on the eval loop | 386 candidates, 91 GitHub | X/FB N/A | NEXA+SYNCA differentiation | 78% | adopt | Run 20-task benchmark
CLI/IDE convergence | Workflow moved to terminal + IDE | 25 YT, 171 HN | Video metadata partial | Dev enablement package | 68% | trial | Measure review time
Context engineering | Repo understanding is the bottleneck | 40 paper refs | No universal benchmark | FARE moat | 72% | trial | Codebase Q&A eval

8. CTO Recommendations

Action | ROI/time-saving | Risk | Owner | TTV | Validation
Build NEXA Agent Harness v0: 20 tasks, cost/latency/pass@1/security log. | 18-28% | 2/5 | AI Platform Lead | 2 weeks | Baseline vs. agent
Add SYNCA HITL gate: risk class, diff summary, rollback checklist. | 12-20% | 2/5 | QA/Governance Lead | 10 days | Defect escape rate
Run FARE context-memory PoC on 2 legacy repos. | 15-25% | 3/5 | Solution Architect | 3 weeks | Onboarding time
Create Japan/VN packaged AI-SDLC pilot offer. | 8-15% | 3/5 | Delivery Director | 4 weeks | 2 paid pilots / 60 days
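For the NEXA Agent Harness v0 item, a per-task log record and its aggregation might look like the following sketch. The TaskRun fields and the summarize helper are hypothetical. pass@1 here assumes one attempt per task, so it reduces to solved tasks divided by total tasks; pass_at_k is the standard unbiased estimator from the Codex paper (Chen et al., 2021), included in case the harness samples several attempts per task:

```python
from __future__ import annotations

from dataclasses import dataclass
from math import comb
from statistics import mean


@dataclass
class TaskRun:
    """One harness attempt on one task (hypothetical log schema)."""
    task_id: str
    passed: bool          # did the attempt pass the task's checks?
    cost_usd: float       # model/API spend for the attempt
    latency_s: float      # wall-clock time for the attempt
    security_flags: int   # e.g. blocked file or network accesses logged


def summarize(runs: list[TaskRun]) -> dict:
    """Aggregate one attempt per task; pass@1 is the share of tasks solved."""
    if not runs:
        raise ValueError("no runs to summarize")
    return {
        "tasks": len(runs),
        "pass@1": sum(r.passed for r in runs) / len(runs),
        "mean_cost_usd": mean(r.cost_usd for r in runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        "security_flags": sum(r.security_flags for r in runs),
    }


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for n sampled attempts of which c passed (Chen et al., 2021)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Producing the same summary for a no-agent baseline and for the agent run gives the "Baseline vs. agent" comparison listed in the Validation column.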

9. Data Quality / Scan Health

Scanned 386 candidates; 327 unique after deduplication. Counts by source: HN 171, YouTube 25, GitHub 91, papers 40. X (direct) and Facebook (public): N/A, because the cron job had no authenticated API access or usable public links; confidence impact: -12%. Reddit JSON returned 0 usable items; the gap was covered by HN, GitHub, papers, and YouTube. Status: PARTIAL-PASS.
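The reduction from 386 candidates to 327 unique items, with per-source counts that sum to the unique total, could come from a step like the sketch below. The dedup key is an assumption, since the brief does not state how duplicates were detected: this version keys on a lightly normalized URL (lowercased scheme and host, trailing slash and fragment dropped, query kept so distinct HN item IDs stay distinct). normalize_url and dedupe are hypothetical names:

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed dedup key: lowercase scheme and host, drop the fragment and any
    trailing slash, keep the query (HN item IDs live there)."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def dedupe(candidates):
    """Keep the first candidate per key; return (unique items, counts per source)."""
    seen = set()
    unique = []
    for item in candidates:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    # Count after dedup so the per-source figures add up to the unique total.
    return unique, Counter(item["source"] for item in unique)
```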