Bespoke Labs - Business Technology Consulting Services | Advanced Development Solutions

Overview

Bespoke Labs built the foundation for reliable AI agents. The company researched agent environments, data curation, and evaluation for enterprises and frontier AI labs. As a
creator on the team, I authored the high-quality tasks that measured how well those agents actually performed.

Project Context

Agents only improved when someone tested them on hard, realistic problems. Therefore, my work focused on building exactly those problems. Specifically, I designed DevOps and SRE
scenarios that mirrored real production failures. Each task dropped an AI agent into a live, containerized infrastructure environment. The agent then had to diagnose the fault and
fix it on its own, just like an on-call engineer.

What I Built

I built each task as a self-contained, reproducible environment. First, I wrote the scenario and the agent’s prompt. Next, I injected the broken state into a Kubernetes-based
microservices stack. After that, I wrote a genuine, end-to-end solution. Finally, I built an automated grader that scored the agent with functional tests. Moreover, every grader
had to score zero before the fix and full marks after it.
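
The zero-before, full-marks-after rule can be sketched as a small sanity check. This is a minimal illustration, not the actual grader code: the names `validate_grader` and `example_grader`, the state dictionary, and its keys are all hypothetical, and a real grader would probe a live cluster rather than read a dictionary.

```python
from typing import Callable, Dict

State = Dict[str, bool]


def example_grader(state: State) -> float:
    """Hypothetical grader: the fraction of functional checks that pass."""
    checks = [
        state.get("service_responds", False),  # assumed check name
        state.get("pods_ready", False),        # assumed check name
    ]
    return sum(checks) / len(checks)


def validate_grader(grade: Callable[[State], float],
                    broken_state: State,
                    fixed_state: State) -> bool:
    """Accept a grader only if it scores 0.0 before the fix and 1.0 after it."""
    return grade(broken_state) == 0.0 and grade(fixed_state) == 1.0
```

A grader that awards partial credit to the untouched broken environment would fail this check, which is the point: the score should reflect the fix and nothing else.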

Key Achievement

Quality meant calibrated difficulty, not just a task that ran. Consequently, I tuned each scenario so even strong agents failed more often than they succeeded. I verified this
with repeated evaluation runs and strict variance thresholds. As a result, my accepted tasks became reliable benchmarks that pushed frontier agents to improve.
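
One way to picture that calibration check is below. It is a sketch under stated assumptions: the runs are grouped into batches, the pass-rate ceiling of 0.5 encodes "failed more often than they succeeded", and the variance limit of 0.02 is an invented placeholder, not the threshold actually used. The function name `is_calibrated` is hypothetical.

```python
from statistics import mean, pvariance
from typing import List


def is_calibrated(batches: List[List[bool]],
                  max_pass_rate: float = 0.5,   # assumed ceiling
                  max_variance: float = 0.02) -> bool:  # assumed limit
    """Check that agents mostly fail and that results are stable across batches.

    Each inner list holds pass/fail outcomes for one batch of evaluation runs.
    """
    # Pass rate of each batch, as a float between 0.0 and 1.0.
    rates = [mean(1.0 if ok else 0.0 for ok in batch) for batch in batches]
    # Hard enough: average pass rate stays below the ceiling.
    # Stable enough: batch-to-batch variance stays within the limit.
    return mean(rates) < max_pass_rate and pvariance(rates) <= max_variance
```

A task that an agent solves in every run of one batch and never in the next would be rejected by the variance term even though its average pass rate looks acceptable.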

Might interest you

Weka

WEKA built high-performance storage infrastructure for AI and HPC workloads.
The company helped leading AI teams keep their GPUs fed with data at massive scale.
I worked directly with the CTO to turn that complex hardware into simple sales tools.

Talk To The CTO

TalkToTheCTO helps freelancers and contractors stop sending CVs into the void.
Instead, it connects them directly with the people who actually hire: CTOs, VPs, and Heads of Engineering.
Behind the scenes, an AI engine runs around the clock to find this work and package it into ready-to-use leads.

Cypago

Compliance teams needed to ask complex, nuanced questions of their organizational data.
I designed and maintained a decision tree engine.
It translated these layered conditions into executable database searches.
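
The general idea of compiling layered conditions into a query can be sketched as follows. This is an illustration only and not the Cypago engine: it assumes a SQL-style target, and the `Leaf` and `Branch` types, the `compile_node` function, and the field names in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    field: str     # column name (hypothetical)
    op: str        # comparison operator
    value: object  # bound as a query parameter, never interpolated


@dataclass
class Branch:
    combinator: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Leaf, Branch]
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def compile_node(node: Node) -> Tuple[str, list]:
    """Recursively turn a condition tree into a parameterized WHERE clause."""
    if isinstance(node, Leaf):
        if node.op not in ALLOWED_OPS or not node.field.isidentifier():
            raise ValueError(f"unsupported condition: {node}")
        return f"{node.field} {node.op} ?", [node.value]
    if node.combinator not in ("AND", "OR") or not node.children:
        raise ValueError(f"invalid branch: {node}")
    parts, params = [], []
    for child in node.children:
        sql, child_params = compile_node(child)
        parts.append(sql)
        params.extend(child_params)
    return "(" + f" {node.combinator} ".join(parts) + ")", params
```

For example, a tree that asks for accounts without MFA that are either admins or inactive for over 90 days, `Branch("AND", [Leaf("mfa_enabled", "=", False), Branch("OR", [Leaf("role", "=", "admin"), Leaf("last_login_days", ">", 90)])])`, compiles to a single nested clause with three bound parameters.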

Radvision

The task was to develop a framework in which customers could configure their own algorithms for incoming video calls. It included an SDK for non-technical users, a sandbox for testing apps, and a management backend.

Healarium

The goal was to develop a framework that connected a company, its employees, and physicians. Its purpose was to improve employees' health through personal goals and plans. The product was divided into three main parts, one for each target audience.

PeerTV

PeerTV developed set-top boxes for home internet TV. We built a backend from scratch for managing the set-top boxes' content, including theming and customization for each content provider. It also included a panel for managing the boxes' service status.

Accept Software

The company's main product was a framework for tracking the production lifecycle. The task was to develop a web tool that the QA department could use to test APIs written in different languages and backed by different databases.

Ekkli

We built a collaboration tool for decision making. The idea was to hear every single person in a discussion, even when it involves a lot of people. A picture is worth a thousand words, so we developed multiple visualizations for every type of discussion.
