GiantSled

World's First Agentic Enterprise

What we are

GiantSled is an Agentic Enterprise: a company where autonomous AI agents are the operating and strategic core, not a tool that people wield. The agents hold roles across executive, marketing, technology, and financial operations. They build and run real products that earn revenue. People are accountable for what the agents do and serve as the bridge to the physical world.

We've been running this way since December 2025, quietly documenting our progress in public the whole time. If someone was doing this before us, we'd genuinely like to know; six months in, we haven't found them.

What an Agentic Enterprise is

Plenty of companies deploy AI agents, and plenty of platforms help them do it. In nearly all of them, the agents sit in execution: they handle tasks, while people hold the strategy and direct the work. The agents are a tool the company uses.

An Agentic Enterprise is the harder version. Here the agents don't just execute; they occupy the functional roles, coordinate with each other, and carry the operating and strategic work of the company within defined boundaries. People are the bridge to the physical world and the ones accountable for the outcomes. That inversion, agents as the core rather than the tool, is what makes it an Agentic Enterprise rather than a company that uses agents.

It's the difference between the telegraph and the telephone. The telegraph automated something that already existed: letter delivery, sped up but still one-way and turn-based. The telephone was a different kind of thing: real-time conversation over distance, a capability that hadn't existed before. Automation makes existing work faster. Agency is a new capability: agents that reason, adapt, and decide. An Agentic Enterprise is built on the second.

Why now

In November 2025, Anthropic launched Claude Opus 4.5. Within days it was clear this wasn't just a better tool. It could reason about complex problems and produce sophisticated analysis well enough to hold a role, not merely execute tasks. That was the moment the idea became worth testing. GiantSled was founded weeks later, in December 2025, to find out whether an Agentic Enterprise could actually work: real products, real revenue, documented all the way.

We're not built on any single model. The company is built on the expectation that models at this level of capability will keep improving and become more widely available over time, not on a bet that one provider's latest release is the only way forward.

What we've built

EndpointEvaluator — our first paid product

LLMs are powerful but unpredictable. A model update, a prompt tweak, a provider swap, and suddenly your system is handing customers confident, wrong answers, with nothing in your tests catching it.

The root problem is that LLM output isn't deterministic the way ordinary code is. You can't assert that a response equals an expected string, because the wording shifts every run even when nothing's broken. So most teams either skip testing their LLM features entirely or eyeball outputs by hand, and silent drift slips through to production.

EndpointEvaluator solves that. Give it two texts, a known-good reference and a fresh response from your app, and it scores how consistent they are using up to three comparison methods. You're measuring whether the meaning held, not whether the text matched character for character, and you set the threshold for what counts as consistent.
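To make the idea concrete, here is a minimal sketch of threshold-based consistency scoring. It is not EndpointEvaluator's code or API, whose comparison methods aren't described on this page. The names (`consistency`, `jaccard`, `cosine`, `sequence_ratio`, `METHODS`) are hypothetical, and the three methods are simple lexical stand-ins built from Python's standard library: they measure word and character overlap, not meaning. A real semantic check would swap in something like embedding similarity; what the sketch shows is the shape of the approach: up to three scores, averaged, compared against a threshold you choose.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical stand-in methods: each returns a similarity in [0, 1].
# They compare surface wording, not meaning; a semantic method would replace them.

def _tokens(text: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

def jaccard(a: str, b: str) -> float:
    # Shared vocabulary divided by combined vocabulary.
    sa, sb = set(_tokens(a)), set(_tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine(a: str, b: str) -> float:
    # Cosine similarity between bag-of-words count vectors.
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    if not ca and not cb:
        return 1.0
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    # Clamp to guard against floating-point results a hair above 1.0.
    return min(1.0, dot / norm) if norm else 0.0

def sequence_ratio(a: str, b: str) -> float:
    # Character-level similarity from difflib; sensitive to word order.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

METHODS: dict[str, Callable[[str, str], float]] = {
    "jaccard": jaccard,
    "cosine": cosine,
    "sequence": sequence_ratio,
}

def consistency(reference: str, response: str, threshold: float,
                methods: tuple[str, ...] = ("jaccard", "cosine", "sequence"),
                ) -> tuple[bool, dict[str, float]]:
    """Score `response` against `reference` with one to three methods.

    Returns (passed, scores); passed is True when the mean score meets `threshold`.
    """
    if not 1 <= len(methods) <= 3:
        raise ValueError("choose between one and three methods")
    scores = {name: METHODS[name](reference, response) for name in methods}
    mean = sum(scores.values()) / len(scores)
    return mean >= threshold, scores
```

Raising the threshold makes the check stricter. With lexical stand-ins like these, a correct paraphrase ("Paris is the capital of France." against "The capital of France is Paris.") scores perfectly on word overlap but lower on the order-sensitive character method, which is exactly the gap a meaning-based comparison is meant to close.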

It's built to live in your CI pipeline. Wire it into your build and every commit gets checked, so a quiet change in model behavior, whether from a provider-side update, a prompt edit, or a model swap you're evaluating, gets flagged before it ships rather than after a customer finds it. That makes it useful in two ways: as a safety net catching regressions on every deploy, and as the tool that tells you whether switching models actually held your quality bar.
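A CI gate around a consistency check can be very small. The sketch below is hypothetical: `Case` and `run_gate` are illustrative names, not EndpointEvaluator's interface, and the scoring function is passed in as a parameter so any comparison method can be plugged in. The only CI convention it relies on is the common one that a non-zero process exit code fails the build step.

```python
import sys
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Case:
    """One regression check: a stored known-good output and the current build's output."""
    name: str
    reference: str
    response: str

def run_gate(cases: Iterable[Case], scorer: Callable[[str, str], float],
             threshold: float) -> int:
    """Score every case and return an exit code: 0 if all pass, 1 if any drifted."""
    failures = []
    checked = 0
    for case in cases:
        checked += 1
        score = scorer(case.reference, case.response)
        passed = score >= threshold
        print(f"{'ok' if passed else 'DRIFT':5} {case.name}: {score:.2f}")
        if not passed:
            failures.append(case.name)
    if checked == 0:
        # An empty suite should not silently pass the build.
        print("no cases were checked", file=sys.stderr)
        return 1
    if failures:
        print(f"{len(failures)} case(s) below {threshold}: {', '.join(failures)}",
              file=sys.stderr)
        return 1
    return 0
```

In a pipeline step, a short script would load the stored references, collect fresh responses from the build under test, and finish with `sys.exit(run_gate(cases, scorer, threshold))`. Treating an empty suite as a failure is a deliberate choice here: a misconfigured step that checks nothing should not look like a clean pass.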

How we operate

The company is run, to the largest extent possible, by LLM-backed agents we call actors. Each actor holds a defined role and operates with autonomy within it: reasoning about problems, coordinating with other actors, and pursuing the company's objectives, rather than executing a fixed script. The patterns by which they work with each other and with people have evolved since the company was founded.

People are the bridge between the actors and the physical world. They handle what the agents can't, and, by design, remain accountable for the company's decisions. This isn't temporary scaffolding waiting for the agents to catch up; keeping people accountable is a deliberate and permanent part of how GiantSled is built.

We run lean by choice. Constraints force clarity, and every limit we've hit has pushed us to improve how the company operates.

Our story

We're publishing the inside account of building this, one chapter a week, on a deliberate delay of about six months so we never reveal what we're working on now. It's the real record: the resets that didn't work, the day the platform fell over, the launch that nearly turned into a crisis over three failed deploys and three days of runway, and the decisions the actors got right as well as the ones they didn't.

The first chapter is live now: how the company was founded on the morning we stopped tearing it down.