We Are Back!


Posted by Webmaster on October 20, 2025 at 6:43 pm
StudioVeena.Com was down briefly today as part of a much larger, worldwide outage, affecting millions of sites, connected to failures at Amazon’s Web Services business.

While we are not hosted on Amazon Web Services, the license server for one of the pieces of software we use was. Once we determined that this was the root cause of the failure, I immediately began implementing a workaround to eliminate our dependency on the offending software. The fix was implemented and we were back up within two and a half hours of going down, while many larger sites were still fighting to get themselves up and running again.

I thought it would be a fun exercise to tell y’all in a very concise manner what an emergency like this looks like and how we resolve it.

– It all started at about 10:30 am EST when our site monitoring solution told us StudioVeena.Com had been down for two minutes. At this point I hadn’t heard about AWS’s failure, which technically had already been resolved.

– I immediately logged in and began evaluating the issue. The first thought when you run a website is an attack. It was unlikely, as we have all kinds of countermeasures in place, but I checked. There were no strong signs of attack, but on the off chance I tightened down controls a bit and expanded our server capacity, hoping to get some users through.

– Connections were stacking up without delivering web pages to our users. Generally this will lead a technician to the database next. If you have a failed read node or a corrupt table, database queries can stack up, never to be resolved. Oddly, the database was showing no connections whatsoever.

– So now I’m showing nothing out of the ordinary for traffic and a database that isn’t even receiving any connections. I duplicated our environment and began testing and profiling each component. It didn’t take long to find that one of the pieces of software integral to StudioVeena.Com was attempting to verify our license and the licensing server was returning nothing.

– At this point I have determined what we call “root cause”. Now to decide on a plan. I knew that particular piece of software had just released an update, the only purpose of which was to implement their new licensing system.

– My plan was set: roll back to a previous version of the software that used the old licensing system. The only problem was that the software provider was completely down too, because they use their own software.

– The next option was to restore a previous version of StudioVeena.Com and copy just this software out of it into our current version, so we don’t introduce any security issues. NOTE: This is always top of mind for a good software engineer. We used to always say “Don’t burn down the house to fix a leaky pipe”, the idea being that rolling an entire system back can introduce security issues that can destroy your whole business, even though you might get your system up quickly.

– After replacing the appropriate bits of the system and restarting everything we were off to the races. No security issues were introduced and everything was up and running again.

– Cleanup involved documenting the changes in our knowledgebase and submitting tickets with our vendor.

– It wasn’t until about an hour after everything was resolved, while I was getting lunch for Veena and me, that Veena sent me a TikTok about the AWS failure. A little more digging and I knew the exact failure point and why an issue that had supposedly been resolved was affecting us. The AWS failure had led to broad corruption of databases they were hosting.

– At StudioVeena.Com we take both your pole journey and your privacy seriously. I, Webmaster, leverage every ounce of my 30 years of experience in the software engineering and infrastructure world to apply best practices that go above and beyond what most businesses a hundred times our size do.
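
For readers who want to see the failure mode in code: the stall described above came from a license check that waited on a server returning nothing. Below is a minimal Python sketch of one common way to guard against that, using a short timeout plus a grace period based on the last successful check. It is an illustration only, not the vendor’s code or the fix applied here; `LicenseCache`, `license_is_valid`, and the `fetch` callable are hypothetical names invented for this example.

```python
import time
from typing import Callable, Optional


class LicenseCache:
    """Remembers when the license was last confirmed (hypothetical helper)."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self.last_ok: Optional[float] = None  # timestamp of last successful check

    def record_success(self, now: float) -> None:
        self.last_ok = now

    def within_grace(self, now: float) -> bool:
        # True only if a check succeeded recently enough to still be trusted.
        return self.last_ok is not None and (now - self.last_ok) <= self.grace_seconds


def license_is_valid(
    fetch: Callable[[float], bool],
    cache: LicenseCache,
    timeout: float = 2.0,
    now: Optional[float] = None,
) -> bool:
    """Ask the license server, but never wait longer than `timeout` seconds.

    `fetch` is an assumed callable that contacts the license server with the
    given timeout and returns True/False, raising OSError (which includes
    TimeoutError) when the server is unreachable or silent.
    """
    now = time.time() if now is None else now
    try:
        ok = fetch(timeout)
    except OSError:
        # Server is down or not answering: fall back to the last known result
        # instead of letting web requests pile up behind the check.
        return cache.within_grace(now)
    if ok:
        cache.record_success(now)
    return ok
```

With something like this in place, an outage at the license server would only matter once the grace period runs out, rather than the moment the server stops answering. How long that grace period should be is a business decision for the vendor, not something this sketch can settle.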

If you made it this far, I hope this was informative. If you have any questions, let me know. I love to further a safer and more secure internet by sharing best practices for security, software, and infrastructure.
- This discussion was modified 2 months ago by Webmaster.

Togame

Member
October 21, 2025 at 9:26 am

That was interesting to read. It is surprising how fragile the modern internet sometimes is.
Webmaster

Administrator
October 21, 2025 at 12:01 pm

In the tech world we view Amazon with great fear. They have somehow gotten themselves into very key positions in supporting our communications infrastructure, yet they have failed to decentralize and create redundancy around key components the way other similar companies have.

If, for example, Azure or Google Cloud Platform held the same sway as AWS, these issues would not happen. All their services are decentralized; there is no one service that is housed in a single data center or that doesn’t have geographical redundancy.
