How do you define and set SLAs with engineers?
I would recommend first building a relationship with your technical lead or engineering counterpart. Have them show you how your e-commerce platform (or product area) works end to end, from the backend perspective. Make sure you understand the end-to-end flow first, and specifically the systems design, which is critical in any e-commerce platform. Once you understand how the customer's journey maps to the systems design, look into each customer interaction with the site and make sure your team is tracking metrics for each one. That will eventually lead you to the checkout rates. If you are comfortable with SQL or with pulling and analyzing data, you could probably do the error rate comparison on your own. If not, work with your engineering lead (or a data analyst) to dig into those numbers. Build a report or dashboard that you can review on a regular basis. This will give you the background to ask informed questions and share opinions.
Service Level Agreements (SLAs) are driven by three factors: (1) industry-standard expectations from customers, (2) differentiating your product when marketing, and (3) direct correlation with improving KPIs.
For checkout, you'll have uptime as an industry standard, but it's insufficient because subsystems of a checkout can malfunction without the checkout process outright failing. You could treat latency or throughput as market differentiators, which would require instrumentation on APIs and client response. Payment failures or shipping calculation failures directly hurt conversion rates and erode trust (hurting repeat buying), which are likely KPIs you care about. So your SLAs need to combine measures that account for all of the above, and your engineering counterparts have to see the evidence that these matter in conjunction.
Of the three types, the one that's most difficult to compare objectively is the third. In your question, you mention 1.5% error rates. You could go on a hunt to find evidence that convinces your engineering counterparts that these are elevated vs. competition, or that they're hurting the business. What's more likely to succeed is running A/B tests that attempt to improve error rates and demonstrating a direct correlation with improving a KPI you care about. That's a more timeboxed exercise, and with evidence, you can change hearts and minds. That's what can lead to more rigorous setting of SLAs and investment in rituals to uphold them.
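To make the A/B comparison concrete, here is a minimal Python sketch of a two-proportion z-test on checkout error rates. The function name and the traffic numbers are made up for illustration. This is one common way to check whether a difference between two arms is likely noise, not necessarily how your analytics team would do it.

```python
import math


def two_proportion_z_test(errors_a, n_a, errors_b, n_b):
    """Return (z, two_sided_p) for H0: both arms share one error rate."""
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    # Pooled rate under the null hypothesis that the arms are identical.
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control at 1.5% errors, variant at 1.2%.
z, p = two_proportion_z_test(errors_a=300, n_a=20_000, errors_b=240, n_b=20_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same function works for a KPI such as conversion rate if you pass order counts instead of error counts, which is how you would tie the error-rate change back to a business metric.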
This is fundamentally a question about driving technical quality improvements through data-driven decision making. Let me break down the approach...
First, let's clarify terminology: what you're looking to establish is an SLO (Service Level Objective), not an SLA. An SLA is a commitment made to customers, usually with contractual consequences if it is missed. SLOs are internal targets for service quality, which is exactly what we need here.
To make the case for a checkout success rate SLO:
Ground Your Argument in Data
- Start with industry benchmarks: a quick Google search tells me that eCommerce checkout success rates typically range from 98% to 99.5%.
  - You can do deeper research here and probably find more relevant data for your specific vertical.
  - I wouldn't rely too heavily on benchmarks. They are aggregates, after all, and every business is different, but they give you a good litmus test for your intuition.
- Do your best to calculate business impact with some back-of-the-napkin math (or get as precise as you can if it helps build your case):
  - Direct Revenue Loss = Checkout Attempts × Average Order Value × Error Rate
  - Indirect Loss = Failed Checkouts × % Customer Loss × Lifetime Value
  - This gives you a clear dollar-value impact per 0.1-percentage-point improvement in the error rate.
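As a rough illustration of the two formulas above, here is a small Python sketch. Every input is a hypothetical placeholder (attempt volume, order value, loss rate, lifetime value), and the function names are mine. Note that the direct-loss formula treats every failed checkout as a lost order, so it is an upper bound if some customers retry successfully.

```python
def direct_revenue_loss(checkout_attempts, average_order_value, error_rate):
    """Checkout Attempts x Average Order Value x Error Rate."""
    return checkout_attempts * average_order_value * error_rate


def indirect_loss(failed_checkouts, customer_loss_rate, lifetime_value):
    """Failed Checkouts x % Customer Loss x Lifetime Value."""
    return failed_checkouts * customer_loss_rate * lifetime_value


def impact_per_tenth_point(checkout_attempts, average_order_value,
                           customer_loss_rate, lifetime_value):
    """Dollar impact of moving the error rate by 0.1 percentage points.

    Both formulas are linear in the error rate, so the value of a
    0.1-point change is the same at any starting rate.
    """
    step = 0.001  # 0.1 percentage points, expressed as a fraction
    failed = checkout_attempts * step
    return (direct_revenue_loss(checkout_attempts, average_order_value, step)
            + indirect_loss(failed, customer_loss_rate, lifetime_value))


# Hypothetical monthly inputs, for illustration only.
attempts = 100_000
aov = 80.0
error_rate = 0.015   # the 1.5% from the question
loss_rate = 0.10     # assumed share of failed customers who never return
ltv = 300.0

direct = direct_revenue_loss(attempts, aov, error_rate)
indirect = indirect_loss(attempts * error_rate, loss_rate, ltv)
per_step = impact_per_tenth_point(attempts, aov, loss_rate, ltv)
print(f"direct: ${direct:,.0f}  indirect: ${indirect:,.0f}  "
      f"per 0.1 point: ${per_step:,.0f}")
```

With these made-up inputs, each 0.1 point of error rate is worth roughly $11,000 a month. Swap in your own numbers before showing this to anyone.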
Build Engineering Partnership
- Before you pick a target and chuck it over the fence, work with engineering to understand the technical problems and constraints.
- In this case, I might start with error logging and categorization.
- Break down the current 1.5% error rate by type:
  - What portion is under your control (e.g., validation errors)?
  - What's external (e.g., payment processor issues)?
- Set targeted SLOs for what you can control.
  - Example: "Reduce validation-related checkout errors from 0.5% to 0.2%"
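The breakdown by error type could look something like the Python sketch below. The error type names and the controllable/external split are hypothetical. Your engineering team would define the real categories from the error logs.

```python
from collections import Counter

# Hypothetical ownership mapping; replace with your own categories.
CONTROLLABLE = {"validation", "session_timeout"}
EXTERNAL = {"payment_processor", "shipping_api"}


def classify(error_type):
    """Label an error type by who can fix it."""
    if error_type in CONTROLLABLE:
        return "controllable"
    if error_type in EXTERNAL:
        return "external"
    return "unclassified"


def breakdown(error_types, total_attempts):
    """Return (error_type, owner, rate) rows, most frequent first."""
    counts = Counter(error_types)
    return [(etype, classify(etype), count / total_attempts)
            for etype, count in counts.most_common()]


def rate_by_owner(rows):
    """Sum error rates per owner label."""
    totals = {}
    for _, owner, rate in rows:
        totals[owner] = totals.get(owner, 0.0) + rate
    return totals


# Made-up log sample: 150 failures out of 10,000 attempts (1.5%).
errors = (["payment_processor"] * 60 + ["validation"] * 50
          + ["shipping_api"] * 40)
rows = breakdown(errors, total_attempts=10_000)
for etype, owner, rate in rows:
    print(f"{etype:<20} {owner:<14} {rate:.2%}")
print(rate_by_owner(rows))
```

In this made-up sample, only the validation slice is something the team can fix directly, which is why the SLO example above targets it and not the full 1.5%.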
Propose a Phased Implementation Approach that allows the team to tackle it incrementally
- Phase 1: Add detailed error tracking
- Phase 2: Set baseline SLOs for controllable errors
- Phase 3: Implement monitoring and regular review cycles
- Phase 4: Iterate targets based on learnings
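For the monitoring and review phase, a check like the following Python sketch could run over each review window. The names and numbers are hypothetical, and the "budget consumed" figure borrows the error budget idea from SRE practice, which is my addition and not something the phases above require.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    observed_error_rate: float
    target_error_rate: float
    budget_consumed: float  # 1.0 means the allowed errors are all used up
    met: bool


def evaluate_slo(errors, attempts, target_error_rate):
    """Compare one window's error count against an error-rate SLO."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if target_error_rate <= 0:
        raise ValueError("target_error_rate must be positive")
    observed = errors / attempts
    allowed_errors = target_error_rate * attempts
    return SloReport(
        observed_error_rate=observed,
        target_error_rate=target_error_rate,
        budget_consumed=errors / allowed_errors,
        met=observed <= target_error_rate,
    )


# Hypothetical weekly window: 15 validation errors in 10,000 attempts,
# measured against the 0.2% target from the example above.
report = evaluate_slo(errors=15, attempts=10_000, target_error_rate=0.002)
print(report)
```

Reviewing the budget figure in each cycle gives the team an early warning before the target is actually missed, which feeds naturally into Phase 4.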
The key is to focus on what you can measure, what you can control, and what delivers clear business value. This transforms the conversation from 'I think 1.5% is too high' to 'Here's the impact of each 0.1% improvement, and here's how we can get there together.'