TrustedForm: Increased Error Rate
Incident Report for ActiveProspect
Postmortem

Incident: Degradation of Claiming Services

Beginning at approximately 3:30am (CT) on Monday, October 19, TrustedForm experienced degraded service that impaired the ability to create claims. The degradation continued until we resolved the issue at approximately 11:15pm (CT) that night.

Once the issue was resolved, all applications and services promptly returned to normal.

Incident: Partial Outage

During two periods on October 19, from 1:55pm to 2:55pm (CT) and from 5:30pm to 5:50pm (CT), our services degraded to the point that we could not capture certificate and event data from pages running the TrustedForm script. Certificate information from these periods will be either incomplete or missing entirely.

What We're Doing

Our engineering staff has resolved both incidents, which had the same cause: a customer using our product in an unorthodox way. This usage had a cascading effect across the TrustedForm services, increasing the processing time needed to retrieve data from our database and present it to the requester.

We are now monitoring for this type of usage and are considering limits that would prevent it.

On Tuesday, October 20, we reprocessed the claims that came through LeadConduit during this time. If you submitted claims directly through the API, our system has no record of those submissions. A small number of claims could not be resubmitted.

We know incidents like this are as stressful for you as they were for our team. That is why we take them seriously, continually look for ways to improve the stability of our applications, and make sure that when problems do arise, we communicate them to you as soon as we are able.

We apologize for last week’s disruption and for the delay in publishing this postmortem while we isolated the cause. Thank you for your patience.

Posted Oct 28, 2020 - 09:19 CDT

Resolved
This incident was resolved last night, as noted in the previous update. We monitored overnight and have seen no further issues. We will follow up today with a postmortem.
Posted Oct 20, 2020 - 09:21 CDT
Update
We've identified the issue and applied a patch, and we have a plan for a more permanent solution. More details to come soon. Thank you for your patience!
Posted Oct 19, 2020 - 23:44 CDT
Update
We are continuing to investigate the source of the issue. Our team is actively pursuing a permanent solution to the elevated resource usage across our servers.
Posted Oct 19, 2020 - 20:49 CDT
Investigating
The elevated error rates have returned. Our team is continuing to investigate the root cause.
Posted Oct 19, 2020 - 17:49 CDT
Monitoring
All TrustedForm services have been restarted and service has been restored. We are continuing to investigate the root cause of the errors and are monitoring the TrustedForm service.

If you are calling the TrustedForm claim API directly from your system, you will need to retry any claim calls that did not return HTTP 201. If you are processing leads through LeadConduit, failed claim calls will be retried for you automatically, and no further action is needed on your part.
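For API callers, the retry step above might look like the minimal sketch below. It assumes a hypothetical `send_claim` callable that makes one claim request against the TrustedForm claim API and returns the HTTP status code. The attempt limit and exponential backoff schedule are illustrative assumptions, not ActiveProspect guidance.

```python
import time
from typing import Callable


def claim_with_retry(
    send_claim: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeat a claim call until it returns HTTP 201 or attempts run out.

    send_claim is a hypothetical callable (not part of any TrustedForm SDK)
    that makes one claim request and returns its HTTP status code.
    Returns the status code of the last attempt.
    """
    status = 0
    for attempt in range(max_attempts):
        status = send_claim()
        if status == 201:  # 201 Created: treat the claim as successful
            return status
        if attempt < max_attempts - 1:
            # Illustrative exponential backoff between attempts: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
    return status
```

A return value other than 201 after the final attempt means the claim still has not succeeded and needs follow-up. The `sleep` parameter is injectable so the retry loop can be tested without real delays.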
Posted Oct 19, 2020 - 15:47 CDT
Update
To resolve the ongoing increase in errors, our team is restarting the TrustedForm services. This may cause additional latency and errors while the services restart, including errors when attempting to claim or view certificates.
Posted Oct 19, 2020 - 14:39 CDT
Update
We are continuing to investigate the root cause of the increased error rates and latency.
Posted Oct 19, 2020 - 11:14 CDT
Investigating
We are currently investigating an increase in error responses in TrustedForm.
Posted Oct 19, 2020 - 08:44 CDT
This incident affected: TrustedForm Application and TrustedForm Certify.