09 Mar 2015

OSgrid Update 2015-03-09

REOPENING PROGRESS
Touch wood, but so far, the reopening of OSgrid has gone extremely well.

There has also been a 30% increase in concurrency since return to service.

SERVICE STATUS
While there were a handful of bumps settling in, the new asset cluster has performed extremely well.

Along the way we have found occasional bad assets, nicknamed “chicken bones”, which require an admin to intervene.

The admin team has been very quick to find and remove these bad assets as they are encountered.

Investigation shows that these assets reflect long-standing corruption in the asset data, which existed well before the RAID crash and which the new asset service has uncovered for us to clear up.

The problem occurs when one of these assets is rezzed. Once an admin clears the bad asset from disk, the uploaded replacement (good) assets match the asset hash and repair the damage.
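As a rough illustration of why a re-uploaded good copy can repair the damage, here is a minimal sketch of a hash-addressed store. It is not the FSAssets code: the class name, the use of SHA-256, and the one-file-per-hash layout are all assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


class HashStore:
    """Toy store (hypothetical): each blob is saved under the hash of its bytes."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 digest; an existing file is never overwritten."""
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():
            path.write_bytes(data)
        return digest

    def is_intact(self, digest: str) -> bool:
        """True if the file exists and its content still hashes to its own name."""
        path = self.root / digest
        return path.exists() and hashlib.sha256(path.read_bytes()).hexdigest() == digest

    def remove(self, digest: str) -> None:
        """What an admin does in this sketch: clear the bad file from disk."""
        (self.root / digest).unlink(missing_ok=True)


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        store = HashStore(Path(tmp))
        good = b"<asset>good data</asset>"
        key = store.put(good)
        # Simulate on-disk corruption of the stored file.
        (store.root / key).write_bytes(b"corrupted")
        intact_while_damaged = store.is_intact(key)
        # Clear the bad file, then upload a good copy again.
        store.remove(key)
        same_key = store.put(good) == key
        repaired = store.is_intact(key)
        return intact_while_damaged, same_key, repaired
```

In this sketch an upload does not overwrite a file that already exists under the same hash, which is one plausible reason the bad file has to be cleared before a good upload can take its place.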

VEHICLE CODE AND COMMUNITIES
Improvements in the base OpenSimulator vehicle code have led car, boat, and airplane enthusiasts to begin collecting together in their respective communities.

SANDBOX IMPROVEMENTS
Sandbox Plaza III has been updated to provide water space for boat developers, and NPC support is coming soon.

DONATIONS
OSgrid needs your support!

Please visit the donation page and help keep OSgrid strong!

http://www.osgrid.org/index.php/donate

01 Mar 2015

OSgrid Logins Reopen

Melanie from Avination investigated the stuck asset service process and found an old SRAS asset was giving the asset code fits.
The offending asset was corrected and OSgrid logins have been reopened.
We now know what to look for and where if there are any future recurrences.
Thankfully, the problem looks to be very rare, based on the grid’s behavior so far.
As reopening tests continue we are working on how to safely scan the entire set of recovered SRAS assets to see if any more of these surprises are hiding.
This is exactly like finding one kind of needle in a stack of millions of similar needles.
Please bear with us as the reopening continues.

And, again, we are sorry for the inconvenience.

01 Mar 2015

OSgrid Logins Temporarily Closed for Maintenance

We are sorry for the inconvenience, but we are temporarily shutting off logins for maintenance on the grid.

We have a stuck asset server process and have shut off logins as a precaution to prevent any possible asset damage.

Also, until we give the all clear, please do not load or save IARs or OARs.

Check back with us Sunday March 1 for additional updates.

25 Feb 2015

OSgrid ONLINE!

OSgrid is back online and open!

We know it’s been a long, painful, frustrating outage, and we do appreciate your patience, support, and encouragement through some rather bleak months.

But the wait is over – logins to OSgrid are now open again and the grid is back online!

Many people today have noticed our initial testing. Initial admin logins, plaza and region restarts, and hypergrid logins have all been working as expected.

Given the large number of people and the amount of testing, it was decided to try to hold an office hours meeting on Wright Plaza, and despite a funky garbage collector, 17 or more people were able to join.

With these initial successes, it was decided that OSgrid was healthy enough to open up for wider testing.

We’re not anticipating any further major outages or issues, but there may still be an occasional short outage or downtime over the coming weeks as OSgrid settles back into normal usage, so that we can adjust a setting here or there with the new asset services or updated plazas.

REGION RESTARTS
As you begin reconnecting regions, you may notice an initial failure when trying to teleport to them.

You may need to purge the region using your login to the osgrid.org website, then restart the region itself in order to fully reconnect it after the outage.

INVENTORY AND ASSETS
While we have done a large amount of testing already, there may still be an issue here or there with specific assets.

In our testing so far, the few asset glitches found do not indicate a problem with the asset recovery, new asset cluster, or new FSassets service.

Initial investigation points to small data changes or format issues that were present in specific assets before the outage and are now caught by the XML serializer.
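To give a feel for how a strict XML parser surfaces format problems that older code may have tolerated, here is a minimal sketch. It is not the OpenSimulator serializer: the function name and sample payloads are hypothetical, and it assumes the assets being checked are XML documents, which is not true of every asset type.

```python
import xml.etree.ElementTree as ET


def find_unparseable(assets: dict[str, bytes]) -> list[str]:
    """Return the IDs of assets whose payload is not well-formed XML."""
    bad = []
    for asset_id, payload in assets.items():
        try:
            ET.fromstring(payload)
        except ET.ParseError:
            # Malformed or empty payloads end up here.
            bad.append(asset_id)
    return sorted(bad)


# Hypothetical sample data: one good asset, one with a mismatched tag, one empty.
sample = {
    "a": b"<SceneObjectPart><Name>ok</Name></SceneObjectPart>",
    "b": b"<SceneObjectPart><Name>broken</SceneObjectPart>",
    "c": b"",
}
bad_ids = find_unparseable(sample)
```

A check like this only reads the data, so it reports suspect assets without changing anything.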

If you see widespread issues, please report them – but also try to have patience as we may be swamped for a while getting everything fully rolling for everyone again.

In closing, many thanks again to everyone who helped OSgrid through this, and continued to support and encourage us.

Let me be the first to say … welcome back!

OSgrid is now open!

23 Feb 2015

OSgrid Update for 2015-02-23

ASSET SERVER STATUS
Melanie Thielker of Avination donated considerable code, design, time, and effort to build a new asset cluster for OSgrid, based on her FSasset service which she is also contributing to OpenSim core.
Following the additional datacenter networking changes requested by Hiro, and Melanie’s final cluster configuration and testing, both cluster nodes are up and running with the recovered assets.
Melanie pronounced her initial asset requests and the new cluster a complete success!

OSGRID RESTART
The final steps to re-opening are finally underway.
The inventory table was cleared as part of an initial attempt to return the grid to service with a new asset and inventory database when it looked like the recovery service was not able to restore the drives.
With the recovered assets in place, the matching inventory tables are being loaded, so that your login will have everything it needs to find your assets again.
Once the inventory tables complete their restore, return-to-service testing will begin, with Hiro announcing an ETA for reopening shortly after!

SPECIAL THANKS
OSgrid would like to offer thanks to Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
OSgrid would like to extend an extra thanks to Melanie from Avination for her stellar asset code contribution and cluster design and configuration to pull OSgrid back from the abyss.
OSgrid’s return would not have been possible without her continued support!

Next Update 2015-03-01 or 2015-03-02

16 Feb 2015

OSgrid Update for 2015-02-16

DRIVE RECOVERY STATUS
The recovered drive assets have been fully copied back over to the new asset servers without further NIC driver weirdness.

ASSET SERVER STATUS
Now that the recovery loading is done, tickets are open with the data center to reconfigure the underlying network connectivity to finish the asset server cluster setup.
We had to hold off reconfiguration until the asset recovery from the drives was complete to avoid a possible outage during the loading.

OSGRID RESTART
Still no ETA yet, sorry! Getting closer, step by step, though! Keep checking this space!

OSGRID RESTRUCTURING
Background discussions on OSG restructuring and service changes continue, but there are no definite plans yet. Other than removing the jump regions, most discussions revolve around changing how existing spaces are managed, with an eye to consolidation and some modernization, rather than something more drastic.
More news here once plans are definitive.

SPECIAL THANKS
OSgrid would like to offer thanks to Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
The assistance, the patience, and the good wishes are all very much the rays of light we need to keep pushing forward in an otherwise very awful time.

Thank you all! Next Update 2015-02-22 or 2015-02-23

09 Feb 2015

OSgrid Update for 2015-02-09

DRIVE RECOVERY STATUS
The recovered drive could not be copied directly into the database because USB weirdness caused the disk to go offline.
However, once the drive was moved to a different server and exported over the network from there, the restore proceeded pretty smoothly until yesterday…
The database restore was well past 90% before a pair of ethernet NIC outages struck.

ASSET SERVER STATUS
The constant load of the recovery of the assets from disk to database cluster over the network exposed an ethernet driver bug, delaying the completion of the copy.
We do not expect that the bug would have impacted production use; we simply hit the NIC hard enough during recovery to uncover it.
BIOS settings to disable power management for the NIC and an updated OS NIC driver were put in place today and will hopefully permanently resolve that NIC error.
The recovery to database has been restarted and we’re all watching to see if there are further NIC driver issues.

OSGRID RESTART
Everyone is getting pretty hopeful that we’re close enough to announce an actual ETA, but we’re still just not quite there yet.
But, thanks to the constant, patient help of Melanie from Avination, the recovery and restart are looking very good.

OSGRID RESTRUCTURING
Background discussions on OSG restructuring and service changes continue, but there are no definite plans yet other than to remove the jump regions once the grid is live and otherwise consolidate and shuffle regions as needed.
More news here once plans are more definite.

SPECIAL THANKS
OSgrid would like to offer thanks to Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
The assistance, the patience, and the good wishes are all very much the rays of light we need to keep pushing forward in an otherwise very awful time.
Thank you all!

Next Update 2015-02-15 or 2015-02-16





Copyright © 2007-2010 OSGrid, Inc. - All rights reserved, except where noted.

The OSgrid Logo, and the word 'OSgrid' are trademarks of OSGrid, Inc. Usage of these terms elsewhere is allowed under certain conditions.

