On Wednesday, 10 April 2013, we will be switching the beach cluster to a ROCKS cluster. Beach will likely be down Wednesday through Friday. If you have any questions or concerns, please email firstname.lastname@example.org
Status Report (2 May 2012)
Beach has been up and running and ready to use since May 2nd, 11:45am (Mountain Time), so feel free to log in and start your processes. Keep in mind, however, that beach will be slow over the coming day. One of the maintenance requirements was to rebuild directories. That rebuild will continue in the background for the next 22-24 hours, but you should be able to access all of your data in the meantime.
Please direct all questions and concerns to email@example.com and I'll respond to them as quickly as possible.
Sorry for the inconvenience.
Status Report (7 April 2011)
As many of you know, beach began to behave badly about two weeks ago.
We replaced numerous hardware components in the server, all to no avail. Beach continued to crash, sometimes within two hours of being brought back up.
In the midst of these crashes we did notice that /data was having issues, as well as /home. /home was continually rebuilding its mirror, and /data would go into read-only mode with input/output errors. We reported this to SGI and they insisted that the problems we'd been having were software related, not hardware related. They also indicated that the xfs errors and issues we'd been seeing on those two partitions could be cleared up by running file system checks on the system. I have to say that we have never seen the xfs filesystem fail without an underlying hardware problem, and I indicated that to SGI, but they insisted that the filesystem errors were the root of the problem.
We were able to run an xfs check on /home, but /data was damaged badly enough that we had to run an xfs repair. Unfortunately, the repair essentially unlinked the root directories under /data and threw everything into lost+found. There is no way to recover from that, and, as everyone knows, /data is not backed up.
Beach has been completely stable since the filesystem check and repair. We have stressed the system as much as possible without seeing any other errors. Although the xfs repair has made things usable, we do expect that there is still an underlying hardware problem. Please use beach normally, putting new data into /data, but please be aware that even though we've attempted to build in as much redundancy as possible, there is no substitute for keeping backups of your important data.
Now, there is really nothing to be done but move forward. There is still 18TB of data sitting in lost+found and we must do something about that. I would like anyone who can write off their data to come forward and let me know. I could then remove any files owned by that person from lost+found and it would make looking for any other files a lot more manageable. For those out there who really need something, or would like to look for their data, we can work on it. We must clean out lost+found - it is taking up over half of the usable disk space in /data.
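To get a sense of whose files are taking up that space, a per-owner tally is a reasonable first step. The following is only a minimal sketch in Python, not the tool actually used on beach; the function name `usage_by_owner` is made up for illustration, and the location of lost+found (presumably /data/lost+found) is an assumption.

```python
import os
import stat
from collections import defaultdict


def usage_by_owner(root):
    """Return {uid: total_bytes} for regular files found under root.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # count regular files only
                totals[st.st_uid] += st.st_size
    return dict(totals)
```

Calling `usage_by_owner("/data/lost+found")` (path assumed) would give numeric user IDs mapped to byte counts, which could then be sorted to see which owners account for most of the space.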
If all of the above was too much detail, I can boil it down to this: if we do nothing, all that data will sit in lost+found forever. So I'm setting a deadline: on June 1 I will remove anything left in lost+found. I'd like anyone who can live with the loss of their data to come forward so I can remove whatever I can find of theirs immediately. If you want me to help you look for your files, I'd be happy to. I know that I'll be working with just about each and every user on this, and I'm happy to do whatever I can to make this easier and free up /data for use. Beach is ready to be used; just think of /data as being empty unless and until we work to find files that are owned by you in lost+found. If you do want to look through your files, let me know your username and that you want me to move all files owned by you out of lost+found into /data/<your_username>, and we can get started.
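For the curious, the per-user move described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the exact procedure used on beach: the function name `move_user_files` and its parameters are hypothetical, it only looks at top-level entries of lost+found (where repair tools generally leave orphaned files and directories), and it defaults to a dry run so nothing is moved until the plan has been reviewed.

```python
import os
import shutil


def move_user_files(lost_found, dest_dir, uid, dry_run=True):
    """Move top-level entries of lost_found owned by uid into dest_dir.

    Hypothetical helper for illustration only. Returns a list of
    (source, destination) pairs; with dry_run=True nothing is moved.
    """
    planned = []
    for name in sorted(os.listdir(lost_found)):
        src = os.path.join(lost_found, name)
        # lstat so a symlink is judged by its own owner, not its target's
        if os.lstat(src).st_uid != uid:
            continue
        dst = os.path.join(dest_dir, name)
        if os.path.lexists(dst):
            continue  # never overwrite something already in the destination
        if not dry_run:
            os.makedirs(dest_dir, exist_ok=True)
            shutil.move(src, dst)
        planned.append((src, dst))
    return planned
```

A call such as `move_user_files("/data/lost+found", "/data/<your_username>", uid)` (paths assumed, with `uid` being the numeric user ID) would first list what would be moved; passing `dry_run=False` would then perform the move. Matching on the numeric ID rather than the username is a deliberate choice here, since file ownership is stored on disk as a number.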
If there are ANY questions about this, I am happy to answer them. Please direct all questions and concerns to firstname.lastname@example.org and I'll respond to them as quickly as possible.