Help us build the counterweight.
aiworklab is small, ambitious, and deliberately slow to hire. The people who join now will shape the product, the company, and, if we're right, the way a generation of engineers learns their craft.
Six values, written down. No mission statement.
We don't have a mission statement. We have a thesis: AI tools should make humans better, not weaker. Around that thesis, we work like this.
Craft is currency.
We hire for taste and pay for quality. Senior engineers who care about typography. Designers who can debug. Writers who can ship.
Honest measurement.
The metrics on our dashboard aren't flattering. We optimise for things that are hard to game and that tell us the truth, even when that means slower progress.
Strong opinions, held loosely.
We argue. We change our minds. The product gets better because we're willing to be wrong out loud and then move.
Standing on shoulders.
We integrate, contribute upstream, and credit our predecessors. We do not believe building everything ourselves is a virtue.
Async by default.
Deep work needs uninterrupted hours. We have one all-hands a week. The rest is writing, code review, and time to think.
Use the product.
Every engineer here ships in Coach mode at least one day a week. We take the medicine we sell; it's how we know it works.
Not a perks list. A working environment.
We aim to pay above market in cash, offer meaningful equity, and build a working environment that makes both worth showing up for.
- + Above-market cash & meaningful equity. We benchmark against the top quartile for SF/NY senior engineers.
- + Fully covered health, dental, and vision, including dependents. US for now; international as we scale.
- + Hardware & software you actually want. M-series MacBook Pro, monitor, mechanical keyboard, every paid tool you ask for.
- + Hybrid SF or remote. Our SF office is in the Financial District; you're equally welcome to work remotely anywhere within ±5 hours of US Pacific time.
- + Deep-work calendar. No-meeting Tuesdays and Thursdays. One all-hands a week. The rest is yours.
- + $3,000/year learning budget. Books, courses, conferences. We mean it — we'll ask what you read.
- + Generous time off. Minimum 4 weeks, no maximum. Two-week winter shutdown. We rest because we want you here for the long run.
No negotiation games.
We publish levels and salary bands internally. New hires are offered the same pay as colleagues at the same level. Equity vests over four years with a one-year cliff and double-trigger acceleration. We will never lowball you and expect you to negotiate up, nor will we pay differently based on where you're from or who else is making you an offer.
how we hire
No LeetCode.
Our process is a paid take-home, a pairing session on a real codebase, and three values conversations. End to end, it takes about 10–14 days. No coding-trivia interviews; we'd be hypocrites if we ran them.
Who we're hiring next.
We are growing slowly on purpose. If you don't see your role but think you'd be a force-multiplier, write to careers@aiworklab.com and tell us how you'd spend your first 90 days.
Founding engineer · teaching kernel
This is the most important engineering hire we'll make. The teaching kernel is the layer that makes aiworklab different from every other agentic coding tool — it's the concept tagger, the skill graph, the FSRS scheduler, the LLM-judge service, and the explain-to-merge gate. Most of our defensible IP lives here.
You'll define the data model for the skill graph (the per-user, per-concept state machine that drives all learning decisions), design the tree-sitter rules for concept extraction across ten languages, fine-tune or prompt the LLM judge that evaluates user explanations, and implement the FSRS-based scheduler that decides which concepts come up for retrieval each day. You'll own the quality signals that tell us whether the product is working — concept retention rates, solo-throughput trend, explain-to-merge pass rates.
- → Concept extraction pipeline: tree-sitter rule layer + LLM fallback for novel patterns
- → Skill graph schema and state machine (SQLite local, sync layer for cloud)
- → FSRS scheduler with code-grounded retrieval prompt generation
- → LLM judge service: evaluates explain-to-merge answers against the diff
- → Internal eval framework to track tagger precision / judge calibration over time
- ✓ Strong background in compilers, program analysis, or language tooling (tree-sitter, LSP, ASTs)
- ✓ Experience prompting or fine-tuning LLMs for structured tasks (classification, judgement, extraction)
- ✓ Comfort with TypeScript, plus Rust or Python; most of the kernel will be written in one of these
- ✓ Interest in spaced repetition and learning science (you don't need a PhD, but curiosity is required)
- ✓ Experience at a company where correctness and precision were non-negotiable (compilers, security, fintech)
- + Published work or research on FSRS or spaced repetition
- + Contribution to or deep knowledge of one of: Claude Code SDK, Codex CLI, OpenCode
- + Prior founding-engineer role or equivalent ownership
Senior full-stack engineer · desktop & extensions
Every user interaction with aiworklab happens through the surfaces you'll own: the Tauri desktop app, the VS Code extension, and the language-client interface that connects to the IDE. This is the job where design and engineering intersect daily — you'll work closely with the designer on motion, transitions, and the micro-interactions that make a diff-review window feel like a place of learning rather than a blocker.
You'll build the diff viewer, the inline concept-card system, the explain-to-merge input and judge UI, the three-mode switcher, the daily retrieval session surface, and eventually the mobile companion app. Performance matters: the app needs to feel instantaneous at diff sizes up to 5,000 lines.
- → Tauri v2 desktop app: multi-window, IPC-connected to the teaching kernel
- → VS Code extension (Language Server Protocol + webview panels)
- → Diff viewer with inline concept annotations and merge gating
- → Spaced-retrieval session UI: daily card queue, answer input, verdict display
- → Skill graph visualisation (React + D3 or similar)
- ✓ TypeScript + React fluency — our frontend stack throughout
- ✓ Experience with Tauri, Electron, or native desktop apps (understanding of IPC, window management)
- ✓ VS Code extension experience or deep familiarity with Language Server Protocol
- ✓ Strong product taste — you notice when transitions are 10ms too slow
- ✓ Willingness to work on Rust when the performance case is clear
- + Experience building diff viewers or code-review UIs
- + React Native or Flutter experience (the mobile companion app is planned for 2027)
- + WebGL or canvas animation experience for graph visualisations
ML engineer · concept tagger & LLM judge
The quality of the entire product depends on two inference pipelines: the concept tagger, which reads a diff and outputs a list of programming concepts with confidence scores, and the LLM judge, which evaluates a user's plain-English explanation against the diff. If the tagger mislabels, users get explain-to-merge checks on concepts they've already mastered, and the product feels like a chore. If the judge is miscalibrated, it fails people who understand and passes people who don't; either failure undermines trust.
This role owns both. You'll build the eval harness, curate the labelled dataset for the tagger, design the judge prompt with calibration criteria, run experiments on model choice and inference cost vs. quality trade-offs, and maintain the offline test suite that flags regressions before they ship.
- → Tagger pipeline: tree-sitter rules + LLM hybrid; precision / recall targets per language
- → Judge prompt engineering: structured criteria, rubric design, confidence scoring
- → Evaluation framework: automated evals, human-in-the-loop annotation tooling
- → Model selection and latency optimisation for on-device inference paths
- → Dataset curation and labelling guidelines for concept annotation at scale
- ✓ Strong prompting and LLM evaluation background — you've shipped judge pipelines before
- ✓ Python proficiency; comfortable with data pipelines (pandas, duckdb, or similar)
- ✓ Experience designing structured annotation tasks and inter-annotator agreement frameworks
- ✓ Comfort reasoning about precision / recall trade-offs in the context of user experience
- ✓ Enough software engineering to build and maintain production services independently
- + Background in code understanding, program synthesis, or static analysis
- + Familiarity with tree-sitter and writing grammar queries
- + Research background in natural language understanding or code intelligence
Founding designer
You'll own design end-to-end: the desktop app, the VS Code extension UI, the org dashboard, and the marketing site. The design language of the product is editorial and typographically driven — it should feel like working inside a well-designed technical journal, not a generic SaaS dashboard. You'll have real influence over that direction and the space to execute it properly.
The hardest design problems here are all about friction: when to interrupt the user and when to stay invisible; how to make a merge gate feel like an invitation to learn rather than an obstacle; how to visualise a skill graph in a way that is both honest and motivating. We want someone who finds these problems genuinely interesting.
- → Full visual and interaction design system across desktop app and extensions
- → Org dashboard: data visualisation design for concept coverage, retention curves, bus-factor
- → Marketing site iteration and campaign assets
- → Motion: micro-interaction spec, animation library decisions
- → Design infrastructure: tokens, component documentation, handoff patterns
- ✓ Portfolio that shows real typographic sensibility and attention to detail in dense interfaces
- ✓ Experience designing developer tools or technical products — not just consumer apps
- ✓ Comfortable with motion and interaction design, not just static screens
- ✓ Ability to write HTML/CSS at a level that lets you prototype and review implementation quality
- ✓ Used to owning design end-to-end on a small team, with no specialists to hand off to
- + Experience designing data visualisation for complex graphs or skill systems
- + Framer or Rive animation experience
- + Prior founding-designer role
Founding devrel
The category we're building, teaching-layer AI tooling, doesn't have an established name yet. Part of your job is naming it, shaping the discourse around it, and becoming the voice senior engineers trust when they want a clear-eyed take on what AI is doing to their craft. This is a writing-first role. Most devrel is demo-and-tweet; we want long-form essays, technical deep-dives, and original research that engineers share because it's worth sharing.
You'll run the blog, the YouTube channel, the newsletter, and our HN presence. You'll also manage the Discord and be the first point of contact for early users. Each quarter you'll organise a "skill telemetry" report: public data from our anonymised user base on concept retention trends across the industry. That report is the artefact that earns press.
- → Blog: 2–4 long-form pieces per month; research essays, tutorials, field notes
- → YouTube: product deep-dives and "skill telemetry" explainer series
- → Quarterly industry skill report: methodology, analysis, media outreach
- → Discord and GitHub Discussions: first response, escalation triage, feedback synthesis
- → Conference talks, podcast appearances, academic research partnerships
- ✓ Writing samples that make engineers stop and read, not marketing copy
- ✓ Engineering background strong enough to write credibly about tree-sitter, FSRS, and LLM judges
- ✓ Distribution track record: HN front page, substantial Twitter/X following, or a newsletter engineers actually open
- ✓ Comfort with quantitative research: you know how to avoid p-hacking a survey
- ✓ Opinions about the craft of software engineering that you're willing to defend publicly
- + Video production chops (the YouTube audience for serious engineering content is underserved)
- + Prior devrel at a developer tools company
- + Academic or research background in cognitive science or learning
Don't see your role?
If you think you'd be a force-multiplier but none of the above fits, write to us and tell us how you'd spend your first 90 days at aiworklab. We read every application personally and respond to each one.
Then write to us.
A short note on how you'd spend your first 90 days at aiworklab is worth more to us than a 5-page CV. We respond personally and quickly.