Archive for the 'Disruptions & Downtime' Category

24
Jul
10

***GRID IS ONLINE***

GRID IS ONLINE – MySQL repairs are complete, with no data loss. We apologize for the outage, but it was ultimately necessary.

20
Jul
10

Grid Back Online

Hi everyone.  OSGrid is back online.  We took advantage of the downtime today to do some grid maintenance we were originally going to schedule for a few weeks from now.  We moved the database backend for the central grid services (except assets) to a new pair of db servers.  Our backend dbs had grown so large, and the disks on the machine serving them were sufficiently slow, that running the backups would effectively take down the inventory server for up to an hour.  So we had disabled the automatic, nightly db backups because they were causing inventory problems for users.  But that meant we didn’t have automatic, nightly db backups anymore.  We had to back up the dbs by hand when an admin was available to babysit the grid servers while the backup ran and kill it if it started causing problems.

The fallout from that is that the most recent inventory backup we have is from June 11.  While it appears most users were unaffected by the inventory problems yesterday, some were.  If your inventory is missing and you have an Inventory ARchive (IAR), you can just load it up in a region you’re in and your inventory will be restored.  If you don’t have an IAR to restore from, we can restore your inventory to its state as of the most recent backup, June 11.  We apologize for the inconvenience we know this will cause.  To have your inventory restored from June 11, please send your full avatar name via email to info@osgrid.org.  Depending on the number of people affected, the restorations may take us a day or two to complete.

The good news is that now that we’ve migrated our db services to a new pair of hosts, we’re able to make multiple db backups per day, and have already started doing so.  This will allow us to minimize the impact of any sort of catastrophic failure in the future, and help us to recover more quickly.

It’s not clear at this point what caused the inventory erasure for some users, and we may never really know the answer.  OpenSim is still alpha quality software, and unexpected things can and will happen.  That’s not to say we couldn’t have done a better job with backups; because OSGrid is 100% donation driven we try to be as thrifty as possible, but in this case we should’ve spent the money earlier, when we disabled automatic backups, to quickly get us to a place where we could turn them back on.  But today we have the benefit of hindsight.

We apologize for the downtime and for the inventory troubles for those affected, thank you for your patience, and thank you for using OSGrid.

-Dave

19
Jul
10

**GRID OFFLINE – UPDATES!

Hello everyone,

First, let me apologize for the grid downtime.  For those who are not aware of what’s happening: late on Sunday evening it came to our attention that a handful of people’s inventories were reset.  Since we were not sure what was happening, we immediately downed the grid in case bad things were still occurring.  We began a database restore for comparison, but because of our current backup server configuration (i.e. software RAID), this machine’s ability to restore the database is amazingly slow.

On another note, some good news: this hardware is scheduled to begin phase 1 of the hardware upgrade this week, to fancy new high-power hardware RAID adapters.  Sadly, this would have been done a few weeks ago, but we ordered the wrong 1U riser boards for our servers and the upgrade had to be rescheduled until we found the proper parts.  Not so easy!!

Anyway, back to the real issue.  Because we found out about the issue so late on Sunday night, our volunteer staff stayed as long as we possibly could without being so tired that we would likely do more harm than good to the databases.  So I again want to apologize to everyone that we could not stay up all night and fix the servers as quickly as we would like, but you can rest assured that we will be back early on Monday morning trying to make things as right as we can for everyone who is having issues.  Until the database restore is complete we will not have any good explanation as to what really happened, but as soon as we do know, I promise you all will know as well.

Thank you for your understanding; it is a great pleasure to work with the great citizens of this grid.  If you have any questions or would like to report that you had inventory issues before the grid went down, please post on the following forum thread: [Grid Downtime Discussion]

Michael Emory Cerquoni (Nebadon Izumi) President, OSgrid Inc.

08
May
10

Upcoming asset service db maintenance

Update: This work has been completed.

~~~

Hello.  This Sunday beginning at 11pm US/Eastern (that’s UTC-4) the OSGrid asset service will be taken offline for database maintenance.  The downtime is expected to be no more than 30 minutes.  The grid will remain up during this time, but item uploads will fail.  No changes will be required by region operators or users.

When the work has been completed I’ll update this post.  I’ll also post to Twitter and in the #osgrid IRC channel on freenode.

Apologies for the inconvenience, and thanks for using OSGrid.

-Dave Coyle

[Comment on this post here.]

02
Mar
10

The BIG Refactor..

Hello everyone,

I wanted to give everyone some notice on the upcoming server refactoring everyone has been hearing about. According to Melanie and Diva, the coding is complete on their server changes. This means several things for OSgrid, not all of which I will discuss right now; we are currently formulating plans on how to proceed. We ask that you all bear with us while we do some testing behind the scenes to prepare for all of these changes. Some are drastic, some you won’t even notice at all. In the coming weeks you will likely see changes to the website and the simulator software, but we will do our best to help everyone prepare and step through it.

That is about all I can say for right now, but in the meantime it’s best not to update past the current OSgrid release version, or past whatever you are running today, provided it is prior to GIT REV a9580ebb496637323548b75c2bda605790b18a6b (r/12335), as that revision is no longer compatible with OSgrid until we can complete the back end updates.

If you have any questions you can visit our web chat or IRC channel at irc.freenode.net #osgrid

~Nebadon Izumi

15
Jan
10

Asset server migration this Saturday/Sunday

~~~~~~~~~~
EDIT: This maintenance was successfully completed. –coyled
~~~~~~~~~~

Hello. As Adam mentioned a couple of posts ago, thanks to generous user donations OSGrid has a shiny new asset server ready to go into service. This will bring our asset storage capacity from 1TB to 3.5TB, and decrease the grid’s recurring monthly operational costs.

Most of the assets have already been sync’d to the new machine, but we’ll have to take the asset server offline to do one final sync before flipping the switch. This maintenance window is now scheduled for:

    Sunday, 2010-01-17 03:00-06:00 UTC
    a.k.a.
    Saturday, 2010-01-16 7pm-10pm PST

You can also go to http://tr.im/KwqN and convert that time into your local time zone.
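For anyone who would rather do the conversion locally, here is a minimal Python sketch. It assumes the window falls in January 2010 (the year this post was published) and uses fixed UTC offsets rather than a time zone database; the function name and the extra example offsets are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Maintenance window from the announcement, expressed in UTC.
# Assumption: the year is 2010, matching the date of this post.
WINDOW_START_UTC = datetime(2010, 1, 17, 3, 0, tzinfo=timezone.utc)
WINDOW_END_UTC = datetime(2010, 1, 17, 6, 0, tzinfo=timezone.utc)


def window_at_offset(offset_hours, label):
    """Return (start, end) of the window shifted to a fixed UTC offset."""
    zone = timezone(timedelta(hours=offset_hours), label)
    return WINDOW_START_UTC.astimezone(zone), WINDOW_END_UTC.astimezone(zone)


if __name__ == "__main__":
    # PST is UTC-8; the other offsets are just illustrative examples.
    for offset, label in [(-8, "PST"), (-5, "EST"), (1, "CET")]:
        start, end = window_at_offset(offset, label)
        print(f"{label}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```

Fixed offsets ignore daylight saving time, which is fine for a January date in these zones but would need a real time zone database for other dates.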

All other grid services will remain up during this time, though the user experience will be degraded as sims will be unable to upload or retrieve uncached assets. After the conclusion of the maintenance, it’s recommended that region operators restart their regions.

Further status information will be posted to http://twitter.com/osgrid

No configuration change is required on your end.

Apologies for the inconvenience, thank you for your patience during this upgrade, and thanks for using OSGrid.

-coyled

05
Jan
10

**GRID ONLINE: THE BIG “SNAFU” take 2..

First I want to apologize to everyone for today’s and this week’s downtimes and inventory issues. It appears that time has caught up with us here at OSgrid. After 2.5 years of virtually no cleanup or maintenance on our inventory tables, it seems the cruft just finally built up too much.

Today’s downtime was a result of much of that build up bringing things to a grinding halt, so after all this time the big clean up finally took place. Thanks go to Melanie, who used her awesome ability to peer into the database and see what was wrong, and spent the last few hours fixing and cleaning the entire table.

So please, everyone, thank Melanie Milland and everyone who helped her and the rest of the team get through this ordeal. I would also like to thank the grid admins and devs, including Adam Frisby, Hiro Protagonist, Adelle Fitzgerald and Dave Coyle, for helping and contributing to getting through this insanity. Thanks to WhiteStar Magic and others on the IRC and everyone who helped out on Lbsa Plaza. There are just too many to mention everyone; you are all great people and this grid could not function without any of you. (I am sorry if I forgot anyone; it’s only because my brain is so scrambled right now :) )

Please log back into the grid and check your inventory and report anything that seems odd to our team. Thanks again everyone.

OK folks, sorry about take 2 on the outage, but pretty quickly after we opened the grid we realized that something was still wrong: it seemed people still had some duplicate folders. So Melanie once again plunged into the database to save the day. Unfortunately the queries were massive and took much longer to process than we anticipated. We are sorry for this turn of events, but it was ultimately necessary.  Again I thank everyone for their patience in this matter, and hopefully this is the last outage for a while.

Also, if you find you are missing items from your root folder, be sure to check in your “Lost and Found”. Anything we could not properly place back should have ended up there.  If you have any questions, please let us know.

05
Jan
10

**GRID OFFLINE: Inventory Server Troubles.

The grid is currently offline for some maintenance to the Inventory server. We took it offline to prevent any damage from occurring while we do the work. Please hang in there and check back soon for more updates.

06
Oct
09

OpenSim/OSGrid Status Report

Concerning network protocol versioning, and the recent osgrid bump in protocol version:

Late last week, Thursday or Friday depending on where you are in the world, osgrid updated its grid backend services and plazas to a ROBUST service paradigm (a project-local acronym, don’t go pester Google about it). ** Note: in retrospect, that timeframe is probably a day early. ** As there are technically significant differences in the backend protocols implemented by the new server(s), an additional change was to increment the network protocol version number, forcing a cut-off of regions running prior versions of the software. This has serious implications for region operators: they are forced to update to a version of the software from which they cannot roll back. If things go wrong, it’s a rough ride until the kinks get ironed out. The bad news is, things went wrong this time. What went wrong? Several things. A few were fairly significant and a good many were minor; no one thing caused the problems we’ve seen since the update. The good news is, a lot, and I mean a lot, of good is to come of it. We’re seeing memory footprints cut in half, really quick, really reliable on-the-fly texture decoding (which spells the end of blurry textures once and for all), and greatly increased capacity to endure loads in the release candidates. But issues remain, and addressing them is an incremental, iterative process.
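To illustrate why a protocol bump cuts off older regions, here is a rough Python sketch. It is not the actual OpenSim check: the version numbers, the function name, and the rule itself (a region must report at least the grid's protocol version) are all assumptions made for illustration.

```python
# Hypothetical protocol version numbers; the real values are not given here.
OLD_PROTOCOL_VERSION = 6
NEW_PROTOCOL_VERSION = 7


def grid_accepts_region(region_version, grid_version=NEW_PROTOCOL_VERSION):
    """Assumed rule: a region may connect only if its protocol version
    is at least the version the grid services now require."""
    return region_version >= grid_version


if __name__ == "__main__":
    for version in (OLD_PROTOCOL_VERSION, NEW_PROTOCOL_VERSION):
        status = "accepted" if grid_accepts_region(version) else "refused"
        print(f"region speaking protocol v{version}: {status}")
```

Under a rule like this, a region that rolls back to older software drops below the required version and is refused again, which is why the update is effectively one-way.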

Which brings us to The Current State of the Code

There is a release on the website at OSGrid.org. It may, in fact, be a different revision from what is recommended. This state of affairs is no accident or oversight – it’s a consequence of the dedication of the development and testing teams to keeping the best possible code available at any given time. Many incremental improvements and hotfixes have been applied in the last 72 hours or so – if you need to download the binaries, you can have faith that what is there is the best code currently available.

You can expect that releases will be coming fast and furious in the days ahead. Work continues in the interest of producing some very dramatic improvements in opensim, in the broadest of senses. Refactoring projects that have long been under way are nearing completion, code has been cleaned up, new architectures have been implemented, and many optimizations of memory and other resource use are aimed at delivering these benefits in the short term – so please bear with us as we labor to produce what will be nothing less than the most game-changing release of opensim we have ever produced.

03
Oct
09

**ANNOUNCEMENT – Protocol Bump [Forced update] 10/03/2009

This is the official announcement for the protocol bump forced update. It will be occurring today, Saturday October 3rd, between 6-8pm PST. You can prepare your regions now by downloading the latest release on the website (OSgrid OpenSimulator 0.6.6.36c8d558 – [zip] [23.0mb] 10-02-2009) at the following link – [Download Here]. Please note there are also crucial changes to the GridCommon.ini file; you can view the example here for comparison: GridCommon.ini

If you have any questions or need guidance with upgrading, please be sure to visit the web chat IRC channel and ask for some help. You can also post questions on the forums if you have trouble connecting. Thanks for everyone’s patience on this. I am really sorry for all the flip-flopping on times about this update, but I think you are going to find it was well worth the wait. Good luck to everyone!





Copyright © 2007-2010 OSGrid, Inc. - A California Nonprofit Public Benefit Corporation. All rights reserved, except where noted.

The OSgrid Logo, and the word 'OSgrid' are trademarks of OSGrid, Inc. Usage of these terms elsewhere is allowed under certain conditions.