Technical Intelligence Brief (scan status: PARTIAL-PASS)

Candidates: 386 | Unique: 327 | GitHub: 91 | HN/Web: 171 | Papers: 40

1. Executive Snapshot

Signal	Why it matters	Action
386 candidates / 327 unique	Runtime and eval outweigh model news	Build NEXA harness
GitHub 91 repos	OSS is fragmented	3-repo benchmark
HN/Web 171 items	Security/context concerns	SYNCA gate
Papers 40 items	Enterprise eval gap	Fabbi eval suite
YouTube 25 videos	Practitioner education	Track KOLs, not hype

2. Trend Radar

Hot: agent harness | Hot: sandbox | Emerging: context memory | Watch: CLI/IDE convergence

Hot now: eval/runtime governance, seen across 3+ source groups.
Noise: demo-only agent videos (engagement data N/A).
Watchlist: Terminal-Bench/SWE-bench variants.
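The "3+ source groups" rule for "Hot now" can be made reproducible with a small classifier. The sketch below assumes that raw mentions are available as `(topic, source_group)` pairs, which the brief does not publish. Only the "hot" cut-off comes from the brief. The "emerging" (2 groups) and "watch" (1 group) cut-offs and the `classify_topics` name are hypothetical.

```python
from collections import defaultdict


def classify_topics(mentions, hot_threshold=3):
    """Label topics by the number of distinct source groups that mention them.

    Hypothetical rule: >= hot_threshold groups -> "hot" (the brief's "3+ source
    groups"); >= 2 groups -> "emerging"; otherwise "watch". Only the "hot"
    cut-off comes from the brief; the other two are illustrative.
    """
    groups_by_topic = defaultdict(set)
    for topic, source_group in mentions:
        groups_by_topic[topic].add(source_group)  # a set ignores repeat mentions

    labels = {}
    for topic, groups in groups_by_topic.items():
        if len(groups) >= hot_threshold:
            labels[topic] = "hot"
        elif len(groups) >= 2:
            labels[topic] = "emerging"
        else:
            labels[topic] = "watch"
    return labels


# Illustrative mentions only; not the brief's underlying data.
mentions = [
    ("agent harness", "hn"), ("agent harness", "github"), ("agent harness", "paper"),
    ("context memory", "hn"), ("context memory", "github"),
    ("cli/ide convergence", "youtube"),
    ("agent harness", "hn"),  # same group again: does not raise the count
]
print(classify_topics(mentions))
```

Counting distinct groups rather than raw mentions keeps a single noisy source, such as a burst of demo videos, from promoting a topic to "hot" by itself.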

3. KOL/OG Feed Watch

Platform	Author/KOL	Timestamp	Engagement	Title (URL not captured)	Why the CTO cares
hn	vinhnx	2026-05-30T03:07:25Z	10	Show HN: VT Code – open-source terminal coding agent in Rust	Related to agentic SDLC/eval/context; score 60-82.
hn	agentseal	2026-05-29T21:37:59Z	2	Where AI coding spend goes: 48% code, 40% thinking	Related to agentic SDLC/eval/context; score 60-82.
hn	robert_dds	2026-05-29T18:28:49Z	2	DDS Vibe Academy – 47 free AI coding masterclasses, built by AI agents	Related to agentic SDLC/eval/context; score 60-82.
hn	matt_d	2026-05-29T18:07:26Z	2	MIT EECS/CSAIL Agentic Coding in Practice Seminar Series	Related to agentic SDLC/eval/context; score 60-82.
hn	nike-17	2026-05-29T17:25:16Z	3	Show HN: Sverklo – repo memory for coding agents	Related to agentic SDLC/eval/context; score 60-82.
hn	swanros	2026-05-29T17:16:19Z	2	My "blocked-by-default" approach to working with coding agents	Related to agentic SDLC/eval/context; score 60-82.
hn	Brajeshwar	2026-05-29T16:36:21Z	6	Nesbitt: Protestware for Coding Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	jimsojim	2026-05-29T16:13:45Z	12	Ask HN: Any advice on how to learn good software architecture practices?	Related to agentic SDLC/eval/context; score 60-82.
hn	ananandreas	2026-05-29T14:35:42Z	5	Show HN: OpenHive – AI agents share solutions so other agents dont re-solve them	Related to agentic SDLC/eval/context; score 60-82.
hn	joozio	2026-05-29T07:05:31Z	58	Undisclosed addition in jqwik instructed AI coding agents to delete app output	Related to agentic SDLC/eval/context; score 60-82.
hn	sparkleMing	2026-05-29T07:00:42Z	1	Show HN: SharkBay – a local macOS workbench for coding-agent CLIs	Related to agentic SDLC/eval/context; score 60-82.
hn	patriceckhart	2026-05-29T05:48:21Z	78	Show HN: Zot – Yet another coding agent harness	Related to agentic SDLC/eval/context; score 60-82.
hn	peterneyra	2026-05-29T01:18:58Z	2	Dis Dat – Loom for AI coding agents	Related to agentic SDLC/eval/context; score 60-82.
hn	aanet	2026-05-28T22:46:14Z	1	Clawd-on-Desk: a pixel desktop pet watching your AI coding agents	Related to agentic SDLC/eval/context; score 60-82.
hn	SVI	2026-05-28T21:03:24Z	59	Protestware for coding agents	Related to agentic SDLC/eval/context; score 60-82.
hn	akashi_dev	2026-05-28T20:44:37Z	2	Show HN: Rig – Local-first code graph for coding agents, in one npx command	Related to agentic SDLC/eval/context; score 60-82.
hn	nkko	2026-05-28T18:54:47Z	2	Coding agent can read your .env file	Related to agentic SDLC/eval/context; score 60-82.
hn	juanre	2026-05-28T18:30:22Z	3	Show HN: Bootstrap a team of coding agents from a template, OSS	Related to agentic SDLC/eval/context; score 60-82.
hn	ltononro	2026-05-28T18:18:21Z	3	Show HN: Notification when coding agent is done, free	Related to agentic SDLC/eval/context; score 60-82.
hn	ramonga	2026-05-28T16:11:13Z	3	Show HN: Free open source coding models in Slack	Related to agentic SDLC/eval/context; score 60-82.
hn	spinchange	2026-05-30T02:04:12Z	1	Show HN: A Claude Code skill that scopes problems like Peter Naur	Related to agentic SDLC/eval/context; score 60-82.
hn	vbutsomesayw	2026-05-27T04:01:44Z	3	Bill Gates AI on AI (one month later)	Related to agentic SDLC/eval/context; score 60-82.
hn	armcat	2026-05-24T19:37:43Z	3	Show HN: Simple Sprite Sheet Generation	Related to agentic SDLC/eval/context; score 60-82.
hn	jeroen_stulen	2026-05-24T10:07:13Z	3	Show HN: My first app, artisanally vibe-coded in 4 months	Related to agentic SDLC/eval/context; score 60-82.
hn	xendo	2026-05-23T11:13:35Z	3	Zero – Programming Language for Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	goodroot	2026-05-21T14:59:15Z	2	Show HN: opub, donated compute for open-source	Related to agentic SDLC/eval/context; score 60-82.
hn	afshinmeh	2026-05-19T20:19:46Z	3	Zero: The Programming Language for Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	amitbidlan	2026-05-19T17:40:39Z	1	Show HN: Korveo – a local firewall for AI agents	Related to agentic SDLC/eval/context; score 60-82.
hn	Marius77	2026-05-19T14:09:50Z	20	The Programming Language for Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	steveharing1	2026-05-17T20:25:40Z	5	Vercel's Zero: A Programming Language Designed for AI Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	alex_x	2026-05-17T14:40:22Z	1	The Programming Language for Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	mindwarp	2026-05-12T13:24:00Z	1	Show HN: Telegram/Slack bridge for local Codex agents	Related to agentic SDLC/eval/context; score 60-82.
hn	errata_dev	2026-05-04T17:27:52Z	1	Show HN: [inerrata] – Collective and Causal Knowledge Layer for Coding Agents	Related to agentic SDLC/eval/context; score 60-82.
hn	georgestrakhov	2026-05-03T18:51:13Z	2	AOP: Agent-Oriented Programming	Related to agentic SDLC/eval/context; score 60-82.
hn	is-it-art	2026-04-27T21:15:46Z	2	Show HN: Is it art? An art project for AI agents	Related to agentic SDLC/eval/context; score 60-82.
hn	poshmosh	2026-04-27T11:34:19Z	4	Show HN: Slerp.audio – VDJ with WebGL2 and real-time DSP	Related to agentic SDLC/eval/context; score 60-82.
hn	NickMiladinov	2026-04-23T19:47:17Z	7	Show HN: Chestnut – The antidote to AI-induced skill atrophy	Related to agentic SDLC/eval/context; score 60-82.
hn	cobblr_mosaic	2026-05-26T17:38:55Z	3	Agentic Harness Engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	ramayac	2026-05-20T04:31:50Z	2	Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible	Related to agentic SDLC/eval/context; score 60-82.
hn	redbell	2026-05-18T12:17:04Z	159	Learn Harness Engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	Garbage	2026-05-16T04:59:11Z	3	Agent Harness Engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	Lunar5227	2026-05-15T05:42:46Z	1	Agentic SDLC: How OpenSearch accelerates engineering with its own engine	Related to agentic SDLC/eval/context; score 60-82.
hn	sahil-shubham	2026-05-14T10:44:17Z	3	Show HN: Bhatti – self-hosted runtime for your harness engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	gruyaume	2026-05-12T14:37:45Z	1	Implicit Knowledge Is a Liability	Related to agentic SDLC/eval/context; score 60-82.
hn	pretext	2026-05-10T05:19:22Z	8	Agent Harness Engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	straydusk	2026-05-08T22:57:31Z	1	Ask HN: Is agent-driven QA a thing?	Related to agentic SDLC/eval/context; score 60-82.
hn	nbstme	2026-05-03T19:03:04Z	2	Why does my harness forget me? Agent engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	kumulo	2026-04-30T11:31:33Z	1	Harness engineering: leveraging Codex in an agent-first world	Related to agentic SDLC/eval/context; score 60-82.
hn	anophelon	2026-04-29T07:30:11Z	14	Why Codex works better than Claude Code for my production monolith	Related to agentic SDLC/eval/context; score 60-82.
hn	ElFitz	2026-04-27T10:59:16Z	6	Ask HN: What does your agentic software dark factory look like?	Related to agentic SDLC/eval/context; score 60-82.
hn	kiyanwang	2026-04-27T05:50:28Z	1	Agent Harness Engineering	Related to agentic SDLC/eval/context; score 60-82.
hn	alex000kim	2026-04-26T18:13:46Z	7	You've been doing harness engineering all along	Related to agentic SDLC/eval/context; score 60-82.
hn	jdw64	2026-04-19T08:42:37Z	10	Ask HN: May be a basic question, but how can I use AI well?	Related to agentic SDLC/eval/context; score 60-82.
hn	alexblackwell_	2026-04-16T15:19:54Z	100	Launch HN: Kampala (YC W26) – Reverse-Engineer Apps into APIs	Related to agentic SDLC/eval/context; score 60-82.
hn	geopsist	2026-05-28T12:39:46Z	6	We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs	Related to agentic SDLC/eval/context; score 60-82.
hn	fittingopposite	2026-05-28T05:05:59Z	2	Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code	Related to agentic SDLC/eval/context; score 60-82.
hn	kimjune01	2026-05-24T18:03:28Z	2	Show HN: 97% on SWE-bench Verified with subscription-token agents	Related to agentic SDLC/eval/context; score 60-82.
hn	Sushrutkm	2026-05-19T10:02:03Z	2	Bito's AI Architect Boosts Claude Opus's task success rate by 35%	Related to agentic SDLC/eval/context; score 60-82.
hn	azurewraith	2026-05-12T14:24:55Z	126	Show HN: Statewright – Visual state machines that make AI agents reliable	Related to agentic SDLC/eval/context; score 60-82.
hn	lieret	2026-05-05T15:10:41Z	24	Show HN: New Benchmark from SWE-bench team is 0% solved	Related to agentic SDLC/eval/context; score 60-82.
hn	Philpax	2026-05-02T21:35:54Z	2	talkie-coder: From 1930 to SWE-bench	Related to agentic SDLC/eval/context; score 60-82.
hn	jryio	2026-04-29T19:16:48Z	2	Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error	Related to agentic SDLC/eval/context; score 60-82.
hn	kmdupree	2026-04-26T13:58:13Z	343	SWE-bench Verified no longer measures frontier coding capabilities	Related to agentic SDLC/eval/context; score 60-82.
hn	george_ciobanu	2026-04-24T21:34:31Z	10	Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces	Related to agentic SDLC/eval/context; score 60-82.
hn	nicola_alessi	2026-04-16T20:19:18Z	1	Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks?	Related to agentic SDLC/eval/context; score 60-82.
hn	stared	2026-04-14T08:32:45Z	1	Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro	Related to agentic SDLC/eval/context; score 60-82.
hn	sriharis	2026-04-13T13:14:44Z	3	Checking my model vibes against SWE-Bench Pro	Related to agentic SDLC/eval/context; score 60-82.
hn	chenglin97	2026-04-10T22:48:29Z	4	SWE-Bench Verified Leaderboard March 2026 – Independent vs. Self-Reported Scores	Related to agentic SDLC/eval/context; score 60-82.
hn	raghavchamadiya	2026-04-06T20:15:26Z	1	Show HN: Repowise – Codebase intelligence for AI coding agents (open source)	Related to agentic SDLC/eval/context; score 60-82.
hn	asfsf23423	2026-03-29T21:47:54Z	6	SWE-bench will hit 90% this year	Related to agentic SDLC/eval/context; score 60-82.
hn	stared	2026-03-25T14:24:43Z	4	Blitzy Scores a Record 66.5% on SWE-Bench Pro	Related to agentic SDLC/eval/context; score 60-82.
hn	neversettles	2026-05-03T03:40:04Z	1	The Terminal Bench 3.0 community is looking for task contributors	Related to agentic SDLC/eval/context; score 60-82.
hn	gk1	2026-04-29T18:16:23Z	4	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	Related to agentic SDLC/eval/context; score 60-82.
hn	ubermon	2026-04-28T19:11:57Z	6	Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)	Related to agentic SDLC/eval/context; score 60-82.
hn	GodelNumbering	2026-04-27T12:35:55Z	393	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	Related to agentic SDLC/eval/context; score 60-82.
hn	neversupervised	2026-04-15T00:42:30Z	6	Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments	Related to agentic SDLC/eval/context; score 60-82.
hn	jackykwok	2026-04-14T20:27:39Z	1	A simple test-time method that beats Claude Mythos on Terminal-Bench	Related to agentic SDLC/eval/context; score 60-82.
hn	_nhynes	2026-04-13T07:48:11Z	1	Show HN: Amber, a capability-based runtime/compiler for agent benchmarks	Related to agentic SDLC/eval/context; score 60-82.
hn	joozio	2026-04-01T12:59:36Z	4	Claude Code ranks 39th on terminal bench. The leaked source shows why	Related to agentic SDLC/eval/context; score 60-82.
hn	bcollins34	2026-03-31T19:07:11Z	4	Show HN: Wozcode – double Claude Code output	Related to agentic SDLC/eval/context; score 60-82.

4. Repo Watch

Repo	Metric	Updated	Fabbi move
genusercillaindependentclause781/gstack	0 stars	2026-05-30T08:30:54Z	Trial if it has sandbox/logging/test hooks.
semionkuksov23/personal-workspace-os	1 star	2026-05-30T08:30:52Z	Trial if it has sandbox/logging/test hooks.
piqueriastrongbreeze520/claude-and-codex-website	1 star	2026-05-30T08:30:48Z	Trial if it has sandbox/logging/test hooks.
nolte/claude-shared	0 stars	2026-05-30T08:30:47Z	Trial if it has sandbox/logging/test hooks.
MyHeavenDyf/UXAI	0 stars	2026-05-30T08:30:46Z	Trial if it has sandbox/logging/test hooks.
Machineaccessible-ochre867/agenthub	1 star	2026-05-30T08:30:39Z	Trial if it has sandbox/logging/test hooks.
Acromegaliacanaliculus452/Swift-Testing-Agent-Skill	0 stars	2026-05-30T08:30:34Z	Trial if it has sandbox/logging/test hooks.
entrepreneurial-cabinetminister913/harness-engineering	0 stars	2026-05-30T08:30:34Z	Trial if it has sandbox/logging/test hooks.
jeongmk522-netizen/agentlas-desktop	1 star	2026-05-30T08:30:48Z	Trial if it has sandbox/logging/test hooks.
linny006/agent-eval-harness	0 stars	2026-05-30T08:30:29Z	Trial if it has sandbox/logging/test hooks.
elgrhy/gx	2 stars	2026-05-30T08:30:54Z	Trial if it has sandbox/logging/test hooks.
lorenzocl3940/gsd-2	0 stars	2026-05-30T08:29:55Z	Trial if it has sandbox/logging/test hooks.
intangible-sidalceamalviflora302/engram	0 stars	2026-05-30T08:28:35Z	Trial if it has sandbox/logging/test hooks.
laoxs2002/genai-agentes	0 stars	2026-05-30T08:27:12Z	Trial if it has sandbox/logging/test hooks.
aditxver/engram	2 stars	2026-05-30T08:25:58Z	Trial if it has sandbox/logging/test hooks.
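A cheap first pass at the "Trial if it has sandbox/logging/test hooks" rule is to look for marker files in a repo's file listing. The sketch below is hypothetical throughout. The marker names in `MARKERS`, the choice to require all three capabilities, and the `repo_move` helper are assumptions. A path match only shows that a file exists, not that the sandbox, logging, or tests actually work.

```python
from pathlib import PurePosixPath

# Hypothetical marker names per capability. A path-only check is a cheap first
# filter; it does not prove the sandbox, logging, or tests actually work.
MARKERS = {
    "sandbox": {"dockerfile", "devcontainer.json", "docker-compose.yml"},
    "logging": {"logging.py", "logging.yaml", "otel-config.yaml"},
    "test_hooks": {"tests", "conftest.py", "pytest.ini"},
}


def detect_capabilities(paths):
    """Return the capability names whose marker appears as any path component."""
    found = set()
    for raw in paths:
        components = {part.lower() for part in PurePosixPath(raw).parts}
        for capability, markers in MARKERS.items():
            if components & markers:  # exact component match, not substring
                found.add(capability)
    return found


def repo_move(paths):
    """Return ("trial", set()) when all three capabilities are present,
    otherwise ("watch", missing_capabilities)."""
    missing = set(MARKERS) - detect_capabilities(paths)
    return ("trial", set()) if not missing else ("watch", missing)


print(repo_move(["Dockerfile", "src/agent/logging.py", "tests/test_loop.py"]))
print(repo_move(["README.md", "src/main.py"]))
```

Matching whole path components rather than substrings avoids false positives such as `latest.txt` counting as a test hook.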

5. Paper / Benchmark Watch

Paper/benchmark	Date	Metric	Use
Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software	2026-05-28T17:59:59Z	N/A (no public API)	Eval design only; do not ship directly.
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations	2026-05-28T17:59:50Z	N/A (no public API)	Eval design only; do not ship directly.
GPIC: A Giant Permissive Image Corpus for Visual Generation	2026-05-28T17:59:26Z	N/A (no public API)	Eval design only; do not ship directly.
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image	2026-05-28T17:59:01Z	N/A (no public API)	Eval design only; do not ship directly.
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents	2026-05-28T17:58:55Z	N/A (no public API)	Eval design only; do not ship directly.
SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?	2026-05-28T17:57:37Z	N/A (no public API)	Eval design only; do not ship directly.
RoboWits: Unexpected Challenges for Robotic Creative Problem Solving	2026-05-28T17:57:15Z	N/A (no public API)	Eval design only; do not ship directly.
Gram: Assessing sabotage propensities via automated alignment auditing	2026-05-28T17:56:18Z	N/A (no public API)	Eval design only; do not ship directly.
A Bayesian Proof and Interpretation of Talagrand's Majorizing Measure Theorem	2026-05-28T17:56:03Z	N/A (no public API)	Eval design only; do not ship directly.
SpecBench: Evaluating Specification-Level Reasoning for Software Engineering LLM Agents	2026-05-28T17:54:01Z	N/A (no public API)	Eval design only; do not ship directly.
Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection	2026-05-28T17:32:25Z	N/A (no public API)	Eval design only; do not ship directly.
PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions	2026-05-28T17:29:19Z	N/A (no public API)	Eval design only; do not ship directly.
Quiver Approach to Symmetry Theories	2026-05-28T17:59:59Z	N/A (no public API)	Eval design only; do not ship directly.
Benchmarking Single-Factor Physical Video-to-Audio Generation	2026-05-28T17:59:09Z	N/A (no public API)	Eval design only; do not ship directly.
Visualizing orbital magnetism in electron doped rhombohedral multilayer graphene	2026-05-28T17:54:30Z	N/A (no public API)	Eval design only; do not ship directly.

6. Impact Coverage

Domain	Now 0-2w	Next 1-2m	Later 3-6m	Decision
FARE	Context/codebase map eval	Repo memory PoC	Knowledge layer	trial
NEXA	Harness for coding agent	Sandbox executor	Multi-agent orchestration	adopt
SYNCA	Quality/risk gates	Human-in-loop audit	Compliance dashboard	adopt
DOMUS	Monitor	AI ops assistant	Workflow automation	watch
Japan/VN/Global	Enterprise caution	Pilot bundles	Managed AI-SDLC service	trial

7. CTO Evaluation Matrix

Top signal	Thesis	Evidence	Counter-signal	Fabbi implication	Confidence	Decision	Next validation
Harness-first coding agents	Agent value depends on the eval loop	386 candidates, 91 GitHub repos	X/Facebook data N/A	NEXA+SYNCA differentiation	78%	adopt	Run 20-task benchmark
CLI/IDE convergence	Workflow moved to terminal+IDE	25 YouTube videos, 171 HN items	Video metadata partial	Dev enablement package	68%	trial	Measure review time
Context engineering	Repo understanding bottleneck	40 paper refs	No universal benchmark	FARE moat	72%	trial	Codebase Q&A eval
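The "Run 20-task benchmark" validation needs a consistent per-task record before baseline and agent runs can be compared. The sketch below assumes one attempt per task, so pass@1 reduces to the plain pass rate. The `TaskResult` fields and the `summarize`/`compare` helpers are hypothetical names, not an existing NEXA API.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class TaskResult:
    task_id: str
    passed: bool        # first and only attempt passed the task's acceptance tests
    cost_usd: float     # model/API spend for the run
    latency_s: float    # wall-clock time to a final answer
    security_events: list = field(default_factory=list)  # e.g. ["blocked .env read"]


def summarize(results):
    """Aggregate one run. With a single sample per task, pass@1 is the pass rate."""
    if not results:
        raise ValueError("summarize() needs at least one TaskResult")
    return {
        "tasks": len(results),
        "pass_at_1": sum(r.passed for r in results) / len(results),
        "mean_cost_usd": mean(r.cost_usd for r in results),
        "median_latency_s": median(r.latency_s for r in results),
        "security_events": sum(len(r.security_events) for r in results),
    }


def compare(baseline, agent):
    """Agent minus baseline for headline metrics (positive pass_at_1 delta = better)."""
    b, a = summarize(baseline), summarize(agent)
    return {key: a[key] - b[key] for key in ("pass_at_1", "mean_cost_usd", "median_latency_s")}


# Illustrative numbers only.
runs = [
    TaskResult("t1", True, 0.10, 30.0),
    TaskResult("t2", False, 0.30, 50.0),
    TaskResult("t3", True, 0.20, 40.0, ["blocked .env read"]),
]
print(summarize(runs))
print(compare([TaskResult("b1", False, 0.0, 60.0)], runs))
```

Median latency is used instead of the mean so that one stuck task does not dominate a 20-task run. Counting security events alongside pass rate keeps the "security log" from the recommendation visible in the same report.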

8. CTO Recommendations

Action	ROI/time-saving	Risk	Owner	TTV	Validation
Build NEXA Agent Harness v0: 20 tasks, cost/latency/pass@1/security log.	18-28%	2/5	AI Platform Lead	2 weeks	Baseline vs agent
Add SYNCA HITL gate: risk class, diff summary, rollback checklist.	12-20%	2/5	QA/Governance Lead	10 days	Defect escape rate
Run FARE context-memory PoC on 2 legacy repos.	15-25%	3/5	Solution Architect	3 weeks	Onboarding time
Create Japan/VN packaged AI-SDLC pilot offer.	8-15%	3/5	Delivery Director	4 weeks	2 paid pilots / 60 days
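The SYNCA HITL gate combines three inputs: a risk class, a diff summary, and a rollback checklist. The sketch below shows one way to wire them together. The sensitive-path patterns, the 400-line "medium" threshold, and the rule that high-risk diffs need both a complete checklist and human approval are all hypothetical choices. It is not SYNCA's actual policy.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical sensitive-path patterns (fnmatch "*" also matches "/"); tune per project.
SENSITIVE_PATTERNS = ("*.env", ".env*", "*/.env*", "*/migrations/*", "*auth*", "*secret*")
LARGE_DIFF_LINES = 400  # hypothetical churn threshold for "medium" risk


@dataclass
class Change:
    path: str
    added: int
    removed: int


def risk_class(changes):
    """high: touches a sensitive path; medium: large churn; low: everything else."""
    if any(fnmatchcase(c.path, p) for c in changes for p in SENSITIVE_PATTERNS):
        return "high"
    churn = sum(c.added + c.removed for c in changes)
    return "medium" if churn > LARGE_DIFF_LINES else "low"


def diff_summary(changes):
    added = sum(c.added for c in changes)
    removed = sum(c.removed for c in changes)
    return f"{len(changes)} file(s), +{added}/-{removed}"


def gate(changes, rollback_checklist, human_approved):
    """Return (allowed, reason) for an agent-produced diff."""
    risk = risk_class(changes)
    summary = diff_summary(changes)
    if risk == "low":
        return True, f"low risk: {summary}"
    # Medium and high risk both need a non-empty, fully ticked rollback checklist.
    if not rollback_checklist or not all(rollback_checklist.values()):
        return False, f"{risk} risk, rollback checklist incomplete: {summary}"
    if risk == "high" and not human_approved:
        return False, f"high risk, human approval required: {summary}"
    return True, f"{risk} risk: {summary}"


print(gate([Change("src/app.py", 10, 2)], {}, False))
print(gate([Change("config/.env", 1, 0)], {"backup": True, "revert_plan": True}, False))
```

Keeping the decision and its reason string together makes each blocked or passed diff auditable. That matters if "defect escape rate" is later traced back to individual gate decisions.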

9. Data Quality / Scan Health

Scanned 386 candidates; 327 remain after deduplication. Counts by source: HN 171, YouTube 25, GitHub 91, papers 40. X (direct) and Facebook (public): N/A, because the cron job has no authenticated API access or usable public links; confidence impact -12%. Reddit JSON returned 0 usable items; the gap was compensated by HN, GitHub, papers, and YouTube. Status: PARTIAL-PASS.
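The scan-health line can be reproduced from raw items with a dedup pass, per-source counts, and a status rule. The sketch below is hypothetical. The URL normalization rule and the `scan_health` helper are assumptions. So is the rule that any empty expected source yields PARTIAL-PASS with a flat penalty. Only the -12% figure comes from the brief.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_url(url):
    """Dedup key: lower-cased host without "www.", path without trailing "/",
    query kept, scheme and fragment dropped (an assumption, not the brief's rule)."""
    parts = urlsplit(url.strip())
    key = parts.netloc.lower().removeprefix("www.") + parts.path.rstrip("/")
    return f"{key}?{parts.query}" if parts.query else key


def scan_health(items, expected_sources, missing_penalty=0.12):
    """items: list of (source, url) pairs from one scan run."""
    first_source = {}
    for source, url in items:
        first_source.setdefault(normalize_url(url), source)  # first occurrence wins
    counts = Counter(first_source.values())
    missing = sorted(s for s in expected_sources if counts[s] == 0)
    return {
        "candidates": len(items),
        "unique": len(first_source),
        "counts": dict(counts),
        "missing_sources": missing,
        # Flat penalty when any expected source is empty, mirroring the brief's -12%.
        "confidence_impact": -missing_penalty if missing else 0.0,
        "status": "PARTIAL-PASS" if missing else "PASS",
    }


# Illustrative items only.
items = [
    ("hn", "https://news.ycombinator.com/item?id=1"),
    ("hn", "https://news.ycombinator.com/item?id=1#comments"),
    ("github", "https://www.github.com/org/repo/"),
    ("github", "https://github.com/org/repo"),
    ("paper", "https://arxiv.org/abs/2605.00001"),
]
print(scan_health(items, ["hn", "github", "paper", "x", "facebook"]))
```

The query string is kept in the key because HN item pages differ only by `?id=`. Dropping it would merge distinct discussions into one.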