01
Mar
15

OSgrid Logins Reopen

Melanie from Avination investigated the stuck asset service process and found an old SRAS asset was giving the asset code fits.
The offending asset was corrected and OSgrid logins have been reopened.
We now know what to look for and where if there are any future recurrences.
Thankfully, the problem appears to be very rare, based on the grid’s behavior so far.
As reopening tests continue, we are working out how to safely scan the entire set of recovered SRAS assets to see if any more of these surprises are hiding.
This is much like finding one kind of needle in a stack of millions of similar needles.
Please bear with us as the reopening continues.

And, again, we are sorry for the inconvenience.

01
Mar
15

OSgrid Logins Temporarily Closed for Maintenance

We are sorry for the inconvenience, but we are temporarily shutting off logins for maintenance on the grid.

We have a stuck asset server process and have shut off logins as a precaution to prevent any possible asset damage.

Also, until we give the all clear, please do not load or save IARs or OARs.

Check back with us Sunday March 1 for additional updates.

25
Feb
15

OSgrid ONLINE!

OSgrid is back online and open!

We know it’s been a long, painful, frustrating outage, and we do appreciate your patience, support, and encouragement through some rather bleak months.

But the wait is over – logins to OSgrid are now open again and the grid is back online!

Many people noticed our initial testing today. Initial admin logins, plaza and region restarts, and hypergrid logins have all been working as expected.

Given the large number of people and the amount of testing, it was decided to try to hold an office hours meeting on Wright Plaza, and despite a funky garbage collector, 17 or more people were able to join.

With these initial successes, it was decided that OSgrid was healthy enough to open up for wider testing.

We’re not anticipating any further major outages or issues, but there may still be an occasional short outage or downtime over the coming weeks as OSgrid settles back into normal usage, so that we can adjust a setting here or there with the new asset services or updated plazas.

REGION RESTARTS
As you begin reconnecting regions, you may notice an initial failure when trying to teleport to them.

You may need to purge the region using your login to the osgrid.org website, then restart the region itself in order to fully reconnect it after the outage.

INVENTORY AND ASSETS
While we have done a large amount of testing already, there may still be an issue here or there with specific assets.

In our testing so far, the few asset glitches found do not indicate a problem with the asset recovery, new asset cluster, or new FSassets service.

Initial investigation points to small data changes or format issues that were already present in specific assets before the outage and are now being caught by the XML serializer.
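
For the technically curious, here is a rough sketch of the general idea of flagging assets whose XML no longer parses cleanly. This is not the actual OSgrid, FSassets, or OpenSim code. The function name, the dictionary of asset IDs to payloads, and the sample element names are all hypothetical, and it assumes only XML-type assets are passed in (binary assets such as textures would need different handling).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def find_malformed_xml(assets: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Return (asset_id, error message) for each payload that is not well-formed XML.

    `assets` is a hypothetical mapping of asset ID to raw payload bytes;
    a real asset store would be read in batches rather than held in memory.
    """
    problems = []
    for asset_id, payload in assets.items():
        try:
            # Parsing is read-only, so the stored asset is never modified.
            ET.fromstring(payload)
        except ET.ParseError as err:
            problems.append((asset_id, str(err)))
    return problems


if __name__ == "__main__":
    # Made-up sample data: one well-formed payload, one with a missing close tag.
    sample = {
        "asset-good": b"<object><name>box</name></object>",
        "asset-bad": b"<object><name>box</object>",
    }
    for asset_id, message in find_malformed_xml(sample):
        print(asset_id, "failed to parse:", message)
```

A check like this only reports well-formedness problems. It would not catch an asset that parses fine but contains unexpected values.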

If you see widespread issues, please report them – but also try to have patience as we may be swamped for a while getting everything fully rolling for everyone again.

In closing, many thanks again to everyone who helped OSgrid through this, and continued to support and encourage us.

Let me be the first to say … welcome back!

OSgrid is now open!

23
Feb
15

OSgrid Update for 2015-02-23

ASSET SERVER STATUS
Melanie Thielker of Avination donated considerable code, design, time, and effort to build a new asset cluster for OSgrid, based on her FSasset service, which she is also contributing to OpenSim core.
Following additional datacenter networking changes requested by Hiro, and Melanie’s final cluster configuration and testing, both cluster nodes are up and running with the recovered assets.
Melanie pronounced her initial asset requests and the new cluster a complete success!

OSGRID RESTART
The final steps to reopening are now underway.
The inventory table was cleared earlier, as part of an initial attempt to return the grid to service with a new asset and inventory database, when it looked like the recovery service would not be able to restore the drives.
With the recovered assets in place, the matching inventory tables are being loaded, so that your login will have everything it needs to find your assets again.
Once the inventory tables complete their restore, return-to-service testing will begin, with Hiro announcing an ETA for reopening shortly after!

SPECIAL THANKS
OSgrid would like to thank Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
OSgrid would like to extend extra thanks to Melanie from Avination for her stellar asset code contribution and her cluster design and configuration, which pulled OSgrid back from the abyss.
OSgrid’s return would not have been possible without her continued support!

Next Update 2015-03-01 or 2015-03-02

16
Feb
15

OSgrid Update for 2015-02-16

DRIVE RECOVERY STATUS
The recovered drive assets have been fully copied back over to the new asset servers without further NIC driver weirdness.

ASSET SERVER STATUS
Now that the recovery loading is done, tickets are open with the data center to reconfigure the underlying network connectivity to finish the asset server cluster setup.
We had to hold off on reconfiguration until the asset recovery from the drives was complete, to avoid a possible outage during the loading.

OSGRID RESTART
Still no ETA yet, sorry! Getting closer, step by step, though! Keep checking this space!

OSGRID RESTRUCTURING
Background discussions on OSG restructuring and service changes continue, but there are no definite plans yet. Other than removing the jump regions, most discussions revolve around changing how existing spaces are managed, with an eye to consolidation and some modernization, rather than something more drastic.
More news here once plans are definitive.

SPECIAL THANKS
OSgrid would like to thank Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
The assistance, the patience, and the good wishes are all very much the rays of light we need to keep pushing forward in an otherwise very awful time.

Thank you all!

Next Update 2015-02-22 or 2015-02-23

09
Feb
15

OSgrid Update for 2015-02-09

DRIVE RECOVERY STATUS
The recovered drive could not be copied directly into the database because USB weirdness caused the disk to go offline.
However, once the drive was moved to a different server and exported over the network from there, the restore proceeded pretty smoothly until yesterday…
The database restore was well past 90% before a pair of Ethernet NIC outages struck.

ASSET SERVER STATUS
The constant load of recovering the assets from disk to the database cluster over the network exposed an Ethernet driver bug, delaying the completion of the copy.
We do not expect that the bug would have affected production use; we simply hit it hard enough during recovery to uncover it.
BIOS settings to disable power management for the NIC and an updated OS NIC driver were put in place today, and will hopefully resolve that NIC error permanently.
The recovery to database has been restarted and we’re all watching to see if there are further NIC driver issues.

OSGRID RESTART
Everyone is getting pretty hopeful that we’re close enough to announce an actual ETA, but we’re still just not quite there yet.
But, thanks to the constant, patient help of Melanie from Avination, the recovery and restart are looking very good.

OSGRID RESTRUCTURING
Background discussions on OSG restructuring and service changes continue, but there are no definite plans yet other than removing the jump regions once the grid is live and otherwise consolidating and shuffling regions as needed.
More news here once plans are more definite.

SPECIAL THANKS
OSgrid would like to thank Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
The assistance, the patience, and the good wishes are all very much the rays of light we need to keep pushing forward in an otherwise very awful time.
Thank you all!

Next Update 2015-02-15 or 2015-02-16

02
Feb
15

OSG Update for 2015-02-01

DRIVE RECOVERY STATUS
Asset import from the recovered disk has begun and is making great progress. It is currently over 25% complete and running twice as fast as anticipated.
Melanie from Avination and the OSG team continue to monitor the asset restore from the recovered drives to the new asset cluster.

ASSET SERVER STATUS
The new asset server cluster is getting a workout from the asset recovery and doing well.
With Melanie’s cluster replication design, the assets are importing to both asset servers at once and will not require an additional, later replication step.
Melanie and OSG continue to monitor the new asset server cluster during this recovery.

OSGRID RESTART
OSgrid restart is getting closer but there is still no specific date or ETA.
Once the current asset recovery and testing are complete, there may be additional testing required.
However, initial indications have everyone feeling positive.

OSGRID RESTRUCTURING
This week focused on the asset recovery procedures without discussing restructuring.
More news once additional planning and testing have been done and final decisions can be made.

SPECIAL THANKS
OSgrid would like to thank Melanie, Justin, Diva, our supporters, and everyone who has gotten behind OSgrid during this catastrophe.
The assistance, the patience, and the good wishes are all very much the rays of light we need to keep pushing forward in an otherwise very awful time.

Thank you all!

Next Update 2015-02-08 or 2015-02-09




Copyright © 2007-2010 OSGrid, Inc. - All rights reserved, except where noted.

The OSgrid Logo, and the word 'OSgrid' are trademarks of OSGrid, Inc. Usage of these terms elsewhere is allowed under certain conditions.

