It’s disappointing how sometimes a “reputable” product, something that you have faith in, lets you down. Like Solaris. I spent an enormous amount of time, together with a colleague, troubleshooting a variety of interoperability and performance problems with a Sun Thumper running Solaris 10. Really very silly things, like an SMF script that calls an undefined shell function. This is the script that starts up the iSCSI target service, so the service cannot even start. I wonder how this bug escaped even the most basic QA testing.
I’ve used Linux for a long time and I’m quite familiar with it. But I’ve also used Solaris for a long time, since the days when it was still SunOS 4. Linux is more “fun”. It evolves really quickly to accommodate new features, things that have all-around appeal in notebook, desktop, and server environments. Solaris is a not-so-friendly monster, but it does have some very powerful features that are useful in enterprise production deployments.
So one of the things we wanted to do with the Sun Thumper is video storage. Sun sells video surveillance as one of the key applications of their storage server. We tried CIFS (Windows file sharing). It failed to work with multiple video surveillance software packages. We tried NFS. It also failed to work with multiple video surveillance software packages. We’re talking about mainstream, well-supported, and widely used video surveillance software, not something strange with peculiar requirements that is rarely used in the industry. It was rather disappointing.
Never mind. More recently, we decided to give iSCSI a shot. It seemed to work better. Finally, we thought we had a solution. Except that it worked for only 4 to 5 hours. After that, the performance dropped so badly that it was just unworkable. I can understand that a 7TB partition is quite sizable, but waiting over a day for a “quick format” that still had not completed is too much.
It turns out that an OS upgrade was required to fix a bunch of iSCSI issues. (Yeah, so we had a version of Solaris that was good enough for demonstration purposes, but not quite for long-term production use.) So we did the upgrade. Then we found that the iSCSI target would not even start. There was a stupid bug in the SMF script, which called functions like smf_zonename without them being defined. Even after that was fixed, the iSCSI target daemon died. That, we fixed by deleting its configuration file.
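For what it’s worth, a startup script can at least fail loudly when a function it depends on is missing, instead of dying halfway through. Here is a minimal sketch of that idea in plain shell. It is not what Sun’s script does, and the helper name require_functions is something I made up for illustration; only smf_zonename comes from the actual bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of SMF): check that every named
# function or command is defined before the script relies on it.
# Returns 0 if all names resolve, 1 if any is missing.
require_functions() {
    missing=0
    for name in "$@"; do
        # "command -v" succeeds for shell functions, builtins, and
        # executables on PATH; it fails for undefined names.
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "missing function: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use: only call smf_zonename if something has defined it.
if require_functions smf_zonename; then
    echo "zone: $(smf_zonename)"
else
    echo "not starting: required functions are undefined" >&2
fi
```

In a real method script the else branch would presumably exit with an error status so that the service is marked as failed with a readable message in its log, rather than just printing a warning.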
We’re keeping our fingers crossed, hoping that we do have a working storage box now.
We seem to have bad luck with “big name” products. Many years ago, DEC (Digital Equipment Corporation, which was bought by Compaq, which was then bought by HP) ran advertisements on our local TV channels about how reliable their servers were. They listed customers such as Amazon, which at that time was like the online e-commerce company of the world. DEC servers must be pretty solid to run the Amazon website non-stop 24×7! Yeah. We had the same kind of DEC servers. Ours crashed up to 6 times in one day. Even my Windows 95 at that time did not crash that many times in one day.
Then, we also had a “bad experience” with an F5 load balancer many years ago. It is a web accelerator, load balancer, and high-availability solution for running websites (among other applications). We had a pair of them so that they could fail over to each other, making the box itself highly available too. It worked well, except that they crashed and rebooted a few times an hour. But they worked… box 1 crashes, fail over to box 2, box 1 boots up, then box 2 crashes, so fail over to box 1, then box 2 boots up, and the cycle keeps repeating. I don’t quite think this is an acceptable operating norm.
Sometimes, it seems that although cheapo, open-source, and/or DIY solutions may have certain “problems” in enterprise production environments, they may not actually be any worse than the “enterprise solutions”.