Helpful tips to stay up-to-date on iThenticate's system status:
iThenticate experienced a service disruption from 7:48 PM to 9:30 PM Pacific Time. Our initial understanding of the cause is that when we brought our new data center live, a flood of new routes was propagated throughout our network. The iThenticate service was unavailable during the time frame listed above, until we were able to control the routing of traffic to the service.
iThenticate experienced an issue processing doc and docx files when the machines that create PDFs from those file types became stuck in an error state. We do have monitoring in place for when we begin to see errors in the logs for these machines, but the way our system was failing those file types put the submission test monitoring into a "not supported" failure mode, which did not send out an alert.
We resolved the issue by restarting the machines that were stuck in an error state. Once they rebooted, they processed doc and docx files as expected.
Although we do have monitoring and safeguards in place during the extraction process, we did not have an alert built in for this specific error case. This was an oversight on our part, and we are in the process of enabling more robust monitoring for this part of the system so that our engineers will be alerted when these types of errors occur in the future.
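The gap described above, where a "not supported" result from the submission test did not raise an alert, can be illustrated with a minimal sketch. This is not iThenticate's monitoring code: the `ProbeResult` type, the status strings, and the `PROBE_FILE_TYPES` set are all hypothetical names chosen for illustration. The idea is that a test submission of a file type known to be supported should alert on any result other than success, including "not supported".

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one synthetic test submission (hypothetical structure)."""
    file_type: str  # e.g. "docx"
    status: str     # e.g. "ok", "error", "not supported"


# Assumption: the file types the test submission uses, all known to be supported.
PROBE_FILE_TYPES = {"doc", "docx", "pdf"}


def should_alert(result: ProbeResult) -> bool:
    """Alert on any non-"ok" status for a file type known to be supported."""
    if result.file_type not in PROBE_FILE_TYPES:
        # A genuinely unsupported type is not a failure of the service.
        return False
    # "not supported" for a supported type is itself a symptom, so it alerts.
    return result.status != "ok"
```

Under these assumptions, `should_alert(ProbeResult("docx", "not supported"))` is true, so the failure mode seen in this incident would have reached the engineers.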
A number of large documents submitted by iThenticate users caused a temporary breakdown of the ‘workers’ responsible for managing the initial stages of our document submission process. Unfortunately, a bug in the code of these programs exacerbated the issue: the problem documents would become stuck at the extraction stage and never time out. More and more ‘workers’ were gradually tied up by these problem submissions until all available resources were in use, creating a backlog for other users.
While we do have monitoring and reporting procedures in place for the later stages of the submission process (e.g. when running the actual similarity checks and generating reports), the initial extraction stage did not have such safeguards. This was an oversight on our part and has now been corrected, along with the bug in the code. The engineers will now be alerted if a similar problem occurs at this early stage of our submission process.
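The missing timeout at the extraction stage can be sketched as follows. This is an illustration only, not the fix that was deployed: it assumes extraction runs as a separate command, and the `run_extraction` function, its return values, and `HANGING_CMD` are hypothetical. It relies on the standard `subprocess.run` `timeout` parameter, which kills the child process and raises `TimeoutExpired` when the limit is exceeded, so a stuck document frees its worker instead of holding it indefinitely.

```python
import subprocess
import sys
from typing import Sequence


def run_extraction(cmd: Sequence[str], timeout_s: float) -> str:
    """Run an extraction command; return "done", "failed", or "timed_out"."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child, so the worker is free
        # to pick up the next submission (and an alert could be raised here).
        return "timed_out"
    return "done" if completed.returncode == 0 else "failed"


# Stand-in for a document that never finishes extracting (illustrative only).
HANGING_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]
```

For example, `run_extraction(HANGING_CMD, timeout_s=0.5)` returns `"timed_out"` after about half a second rather than blocking for the full 30 seconds.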