Careers

Help us build the counterweight.

aiworklab is small, ambitious, and intentionally slow on hiring. The people who join now will shape the product, the company, and, if we're right, the way a generation of engineers learns their craft.

See open roles → careers@aiworklab.com

How we work

Six values, written down. No mission statement.

We don't have a mission statement. We have a thesis: AI tools should make humans better, not weaker. Around that thesis, we work like this.

Craft is currency.

We hire for taste and pay for quality. Senior engineers who care about typography. Designers who can debug. Writers who can ship.

Honest measurement.

The metrics on our dashboard aren't flattering. We optimise for things that are hard to game and tell us the truth, even when it's slower.

Strong opinions, held loosely.

We argue. We change our minds. The product gets better because we're willing to be wrong out loud and then move.

Standing on shoulders.

We integrate, contribute upstream, and credit our predecessors. We do not believe building everything ourselves is a virtue.

Async by default.

Deep work needs uninterrupted hours. We have one all-hands a week. The rest is writing, code review, and time to think.

Use the product.

Every engineer here ships in Coach mode at least one day a week. We dogfood the medicine we sell. It's how we know it works.

What we offer

Not a perks list. A working environment.

We aim to pay above market on cash and offer meaningful equity. We aim for a working environment that makes the cash and equity worth showing up for.

Above-market cash & meaningful equity. We benchmark against the top quartile for SF/NY senior engineers.

Fully covered health, dental, vision. Including dependents. US for now; international as we scale.

Hardware & software you actually want. M-series MacBook Pro, monitor, mechanical keyboard, every paid tool you ask for.

Hybrid Gurugram, SF, or remote. HQ is in Gurugram with a satellite office in San Francisco; you're equally welcome to be remote anywhere that overlaps one of our offices for a few hours a day.

Deep-work calendar. No-meeting Tuesday and Thursday. One all-hands a week. The rest is yours.

$3,000/year learning budget. Books, courses, conferences. We mean it; we'll ask what you read.

Generous time off. Minimum 4 weeks, no maximum. Two-week winter shutdown. We rest because we want you here for the long run.

Our compensation philosophy

No negotiation games.

We publish levelling and bands internally. New hires are offered the same number their counterparts make. Equity vesting is 4-year, 1-year cliff, with double-trigger acceleration. We will never lowball you and ask you to negotiate up; nor will we pay differently because of where you're from or who else is offering.

How we hire

No LeetCode.

Our process is a paid take-home, a real-codebase pair session, and three values conversations. End to end in roughly 10 to 14 days. No coding-trivia interviews; we'd be hypocrites if we ran them.

Open roles

Who we're hiring next.

We are growing slowly on purpose. If you don't see your role but think you'd be a force-multiplier, write to careers@aiworklab.com with how you'd spend your first 90 days.

Founding engineer · teaching kernel

Engineering · SF / Remote (PT)

This is the most important engineering hire we'll make. The teaching kernel is the layer that makes aiworklab different from every other agentic coding tool: it's the concept tagger, the skill graph, the FSRS scheduler, the LLM-judge service, and the explain-to-merge gate. Most of our defensible IP lives here.

You'll define the data model for the skill graph (the per-user, per-concept state machine that drives all learning decisions), design the tree-sitter rules for concept extraction across ten languages, fine-tune or prompt the LLM judge that evaluates user explanations, and implement the FSRS-based scheduler that decides which concepts come up for retrieval each day. You'll own the quality signals that tell us whether the product is working: concept retention rates, solo-throughput trend, explain-to-merge pass rates.
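To make "per-user, per-concept state machine" concrete, here is a minimal sketch of what one record and its transitions might look like. Everything in it is an assumption for illustration: the state names, the `ConceptState` fields, and the `record_result` and `due_today` helpers are hypothetical, and the doubling interval is a placeholder, not FSRS. The real data model is yours to define.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    # Hypothetical states; the real skill graph may use different ones.
    NEW = "new"
    LEARNING = "learning"
    REVIEW = "review"
    RELEARNING = "relearning"


@dataclass
class ConceptState:
    """One row of the skill graph: a single user's state for a single concept."""
    user_id: str
    concept: str
    state: State = State.NEW
    interval_days: int = 0
    due: Optional[date] = None


def record_result(cs: ConceptState, passed: bool, today: date) -> ConceptState:
    """Return the next state after one retrieval or explain-to-merge attempt."""
    if passed:
        # Placeholder growth rule: 1 day, then double each time.
        # An FSRS scheduler would derive the interval from its memory model.
        interval = 1 if cs.interval_days == 0 else cs.interval_days * 2
        next_state = State.LEARNING if cs.state is State.NEW else State.REVIEW
    else:
        # A failed attempt resets the interval and brings the concept back tomorrow.
        interval = 1
        if cs.state in (State.NEW, State.LEARNING):
            next_state = State.LEARNING
        else:
            next_state = State.RELEARNING
    return ConceptState(
        cs.user_id, cs.concept, next_state, interval, today + timedelta(days=interval)
    )


def due_today(records, today: date):
    """Select the concepts that should come up for retrieval on a given day."""
    return [r for r in records if r.due is not None and r.due <= today]
```

Returning a new record instead of mutating in place keeps each transition easy to log and replay, which helps when the same history has to be re-scored under a different scheduler.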

What you'll build

Concept extraction pipeline: tree-sitter rule layer plus LLM fallback for novel patterns
Skill graph schema and state machine (SQLite local, sync layer for cloud)
FSRS scheduler with code-grounded retrieval prompt generation
LLM judge service: evaluates explain-to-merge answers against the diff
Internal eval framework to track tagger precision and judge calibration over time

We're looking for

Strong background in compilers, program analysis, or language tooling (tree-sitter, LSP, ASTs)
Experience prompting or fine-tuning LLMs for structured tasks (classification, judgement, extraction)
Comfort with TypeScript and either Rust or Python; most of the kernel will be written in one of these
Interest in spaced repetition and learning science (you don't need a PhD, but curiosity is required)
Experience working somewhere correctness and precision were non-negotiable (compilers, security, fintech)

Nice to have

Published work or research on FSRS or spaced repetition
Contribution to or deep knowledge of one of: Claude Code SDK, Codex CLI, OpenCode
Prior founding-engineer role or equivalent ownership

Apply now → careers@aiworklab.com

Senior full-stack engineer · desktop & extensions

Engineering · SF / Remote (PT)

Every user interaction with aiworklab happens through the surfaces you'll own: the Tauri desktop app, the VS Code extension, and the language-client interface that connects to the IDE. This is the job where design and engineering intersect daily. You'll work closely with the designer on motion, transitions, and the micro-interactions that make a diff-review window feel like a place of learning rather than a blocker.

You'll build the diff viewer, the inline concept-card system, the explain-to-merge input and judge UI, the three-mode switcher, the daily retrieval session surface, and eventually the mobile companion app. Performance matters: the app needs to feel instantaneous at diff sizes up to 5,000 lines.

What you'll build

Tauri v2 desktop app: multi-window, IPC-connected to the teaching kernel
VS Code extension (Language Server Protocol plus webview panels)
Diff viewer with inline concept annotations and merge gating
Spaced-retrieval session UI: daily card queue, answer input, verdict display
Skill graph visualisation (React + D3 or similar)

We're looking for

TypeScript and React fluency; our frontend stack throughout
Experience with Tauri, Electron, or native desktop apps (IPC, window management)
VS Code extension experience or deep familiarity with Language Server Protocol
Strong product taste; you notice when transitions are 10ms too slow
Willingness to work in Rust when the performance case is clear

Nice to have

Experience building diff viewers or code-review UIs
React Native or Flutter experience (the mobile companion ships Q4 2026)
WebGL or canvas animation experience for graph visualisations

Apply now → careers@aiworklab.com

ML engineer · concept tagger & LLM judge

Engineering · ML · SF preferred

The quality of the entire product depends on two inference pipelines: the concept tagger that reads a diff and outputs a list of programming concepts with confidence scores, and the LLM judge that evaluates a user's plain-English explanation against the diff. If the tagger mislabels, users get explain-to-merge checks on concepts they've mastered, and the product feels like a chore. If the judge is mis-calibrated, it fails people who understand and passes people who don't. Both failures undermine trust.

This role owns both. You'll build the eval harness, curate the labelled dataset for the tagger, design the judge prompt with calibration criteria, run experiments on model choice and inference cost versus quality trade-offs, and maintain the offline test suite that flags regressions before they ship.
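As a rough illustration of the kind of check an eval harness for the tagger might run, the sketch below scores tagger output against hand-labelled concepts, per language, at a chosen confidence threshold. The function name, the input shape, and the 0.5 default are assumptions made for this example, not a description of our pipeline.

```python
from collections import defaultdict


def precision_recall_by_language(examples, threshold=0.5):
    """Score tagger output against gold labels, grouped by language.

    examples: iterable of (language, predicted, gold), where `predicted`
    maps concept name -> confidence score and `gold` is the set of
    concepts a human annotator assigned to the same diff.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for language, predicted, gold in examples:
        # Keep only the concepts the tagger is confident enough about.
        kept = {c for c, score in predicted.items() if score >= threshold}
        gold = set(gold)
        c = counts[language]
        c["tp"] += len(kept & gold)   # tagged and correct
        c["fp"] += len(kept - gold)   # tagged but wrong
        c["fn"] += len(gold - kept)   # missed by the tagger

    report = {}
    for language, c in counts.items():
        predicted_total = c["tp"] + c["fp"]
        gold_total = c["tp"] + c["fn"]
        report[language] = {
            # Report 0.0 when there is nothing to divide by.
            "precision": c["tp"] / predicted_total if predicted_total else 0.0,
            "recall": c["tp"] / gold_total if gold_total else 0.0,
        }
    return report
```

Sweeping `threshold` is one simple way to see the trade-off described above: a higher threshold drops low-confidence tags, which tends to raise precision at the cost of recall.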

What you'll own

Tagger pipeline: tree-sitter rules plus LLM hybrid; precision and recall targets per language
Judge prompt engineering: structured criteria, rubric design, confidence scoring
Evaluation framework: automated evals, human-in-the-loop annotation tooling
Model selection and latency optimisation for on-device inference paths
Dataset curation and labelling guidelines for concept annotation at scale

We're looking for

Strong prompting and LLM evaluation background; you've shipped judge pipelines before
Python proficiency; comfortable with data pipelines (pandas, duckdb, or similar)
Experience designing structured annotation tasks and inter-annotator agreement frameworks
Comfort reasoning about precision and recall trade-offs in the context of user experience
Enough software engineering to build and maintain production services independently

Nice to have

Background in code understanding, program synthesis, or static analysis
Familiarity with tree-sitter and writing grammar queries
Research background in natural language understanding or code intelligence

Apply now → careers@aiworklab.com

Founding designer

Design · SF / Remote (PT)

You'll own design end-to-end: the desktop app, the VS Code extension UI, the org dashboard, and the marketing site. The design language of the product is editorial and typographically driven. It should feel like working inside a well-designed technical journal, not a generic SaaS dashboard. You'll have real influence over that direction and the space to execute it properly.

The hardest design problems here are all about friction: when to interrupt the user and when to stay invisible; how to make a merge-gate feel like an invitation to learn rather than an obstacle; how to visualise a skill graph in a way that's honest and motivating at the same time. These are interesting problems and we want someone who finds them genuinely interesting.

What you'll own

Full visual and interaction design system across the desktop app and extensions
Org dashboard: data visualisation design for concept coverage, retention curves, bus-factor
Marketing site iteration and campaign assets
Motion: micro-interaction spec, animation library decisions
Design infrastructure: tokens, component documentation, handoff patterns

We're looking for

A portfolio that shows real typographic sensibility and attention to detail in dense interfaces
Experience designing developer tools or technical products, not just consumer apps
Comfort with motion and interaction design, not just static screens
Ability to write HTML/CSS at a level that lets you prototype and review implementation quality
Used to owning design end-to-end on a small team; no specialists to hand off to

Nice to have

Experience designing data visualisation for complex graphs or skill systems
Framer or Rive animation experience
A prior founding-designer role

Apply now → careers@aiworklab.com

Founding devrel

GTM · SF / Remote (PT)

The category we're building, teaching-layer AI tooling, doesn't have an established name yet. Part of your job is naming it, shaping the discourse around it, and becoming the voice that senior engineers trust when they want a clear-eyed take on what AI is doing to their craft. This is a writing-first role. Most devrel is demo-and-tweet; we want long-form essays, technical deep-dives, and original research that engineers share because it has earned it.

You'll run the blog, the YouTube channel, the newsletter, and the HN presence. You'll run the Discord and be the first line of contact for early users. You'll organise a quarterly "skill telemetry" report: public data from our anonymised user base on concept retention trends across the industry. That report is the artefact that earns press.

What you'll own

Blog: 2 to 4 long-form pieces per month; research essays, tutorials, field notes
YouTube: product deep-dives and a "skill telemetry" explainer series
Quarterly industry skill report: methodology, analysis, media outreach
Discord and GitHub Discussions: first response, escalation triage, feedback synthesis
Conference talks, podcast appearances, academic research partnerships

We're looking for

Writing samples that make technical engineers stop and read, not marketing copy
An engineering background strong enough to write credibly about tree-sitter, FSRS, and LLM judges
A distribution track record: HN front page, a substantial following, or a newsletter engineers actually open
Comfort with quantitative research: you know how to avoid p-hacking a survey
Opinions about the craft of software engineering that you're willing to defend publicly

Nice to have

Video production chops (the YouTube audience for serious engineering content is underserved)
Prior devrel at a developer tools company
An academic or research background in cognitive science or learning

Apply now → careers@aiworklab.com

Open application

Don't see your role?

If you think you'd be a force-multiplier but none of the above fits, write to us with how you'd spend your first 90 days at aiworklab. We read every application personally and respond.

careers@aiworklab.com →

Still reading

Then write to us.

A short note on how you'd spend your first 90 days at aiworklab is worth more to us than a 5-page CV. We respond personally and quickly.

careers@aiworklab.com →

Read about us first