Degraded

MachShip Performance / Access Issues

Sep 11 at 09:04am AEST
Affected services
MachShip Live API
MachShip Carriers
FusedShip

Resolved
Sep 11 at 09:19pm AEST

Hi All,

This is Mike McKay, the CEO at MachShip.

Firstly, can I thank you all, our valuable customers, for your patience today as we worked to resolve the issues with the "slowness" of the system.

As you know, MachShip is generally a very stable and performant platform and we constantly strive to have our uptime and reliability exceed our customers' expectations. Unfortunately, today we have fallen short with the issues we faced this morning.

Over the weekend, we moved our hosting from one provider to another, much larger provider, to get us ready for future scalability and security requirements.

When moving systems between hosting providers, often there are small differences in the way things work that are not easily identified, and often do not present themselves until there is a significant load on the platform.

As MachShip is B2B software servicing the Australian market, our significant loads are seen primarily in the mornings, between 6am and midday, Melbourne time, as that is when most of the bookings and such are being added by you, our customers.

We moved the platform over to the new hosting on Friday night, and over the weekend we fixed the small issues we could identify and did as much testing as we could to ensure there would be no issues this morning when the load picked up. Unfortunately, one issue remained that we were not aware of, and it caused the slowness this morning and into this evening.

Long story short, the issues we faced today were related to our database and, in particular, its ability to handle the number of requests it needs to during our busy periods. Our new hosting required that we tweak some fairly obscure settings to allow our database to handle that load. Unfortunately, computers (and in particular, virtualised servers) being the extremely technical and at times challenging beasts that they are, even for experts, it took us many hours of trial and error to finally find a fix.

Once we found that fix and rolled it out, we saw the performance of the platform return to its usual speedy nature, and at that point we considered the issue fixed.

We sincerely apologise to you, our customers, and to our extended user base for the inconvenience that this no doubt caused to you today.

Please rest assured that we will continue to closely monitor our systems, and whilst we don't anticipate further issues, we will be right on top of them should they pop up.

I would like to take the opportunity to provide some further background as to why we're making these changes and what we've changed; here are some of the major ones:

  • We have moved our hosting to Equinix, one of the largest datacenter providers in the world, to ensure that we can scale our services effectively without limit into the future, from both a volume and geographic perspective.
  • We have moved to more advanced storage that will allow us to speed up our systems going forward, and to allow us to have more control and visibility over our underlying infrastructure.
  • We have moved to using BGP for routing traffic into our network, moving us closer to the core internet infrastructure and allowing us to host our services in different places without the need to change our IP addresses going forward.
  • We have placed Cloudflare's enterprise protection services in front of our systems to provide more security and protection from attacks such as DDoS.
  • We moved our services from a Melbourne location to Equinix's Sydney datacenter, as step one of our move.
  • We will be, in the near future, having a full copy of our systems and data running in Equinix's Melbourne datacenter, to provide physical redundancy.
  • We have different internet providers in both sites to ensure that no problems with a single internet provider can bring MachShip down, and so that we can route around those issues if that ever does happen.

There are many more changes coming in the near future, especially focussed on redundancy and security, to ensure that we can provide you, our customers, with a product that you can continue to trust.

Should you have any further questions relating to the outage or what we're planning to do in the future, please feel free to reach out to our support team.

Once again, please accept our sincerest apologies for the inconvenience caused to you and your customers this morning.

Michael McKay

CEO

MachShip

Updated
Sep 11 at 09:13pm AEST

The team wishes to advise that the issues affecting the performance and stability of the platform have been resolved.

Clients should now be seeing regular performance on the platform.

A post-mortem for this issue will be posted shortly.

Updated
Sep 11 at 08:41pm AEST

Update.

Work is continuing in the effort to resolve the issues that have caused performance and stability problems on the MachShip platform since early Monday, September 11th, 2023.

We will continue to provide updates as we work towards a permanent fix.

Updated
Sep 11 at 04:47pm AEST

Update:

The issue causing performance and stability problems within the MachShip platform still exists within the system.
The teams working on a solution will continue their efforts to find a fix and restore stability.

We will provide further updates when available.

Updated
Sep 11 at 02:32pm AEST

As a further update - 2:32pm

Work is continuing to apply a fix to the issue.

At this time, the MachShip team is unable to provide an ETA on when clients will begin to see restored performance and stability.

When information is made available to our support team, they will be sure to advise.

Updated
Sep 11 at 01:23pm AEST

The work to fix the database performance issue is continuing.

Users will still be experiencing poor performance and stability from the MachShip platform.

As advised previously, this issue is being worked on by the entirety of the MachShip backend teams, and we hope to have a resolution for the customer base soon.

Updated
Sep 11 at 11:53am AEST

As a further update, the issue has been identified and isolated to MachShip's database and how it is performing.

Currently, our senior team, as well as core members of the infrastructure and database teams, are working to further isolate and fix the issue.

Updated
Sep 11 at 11:23am AEST

As an update, the issues affecting performance on the MachShip platform, which have been felt since early this morning, are still being seen.

The senior team is in the process of working through a resolution plan to restore the stability of the system.

We will continue to provide relevant updates as they become available.

MachShip understands the impact this incident is having on users and appreciates your patience as we work to return to normal performance.

Updated
Sep 11 at 09:26am AEST

As an update, this P1 incident is being investigated by all senior members of the MachShip team.

We will continue to provide updates as they are available.

Created
Sep 11 at 09:04am AEST

There are reports of performance and access issues with the MachShip application.

The team is investigating this, and we will provide advice as it becomes available.