This Sunday I depart for Arizona, where I plan to spend a month alternating between cold A/C and 110+ degree temperatures. It’s not exactly a sabbatical and not exactly work, but something in between.
As you can tell from my old blog entries, there have been significant gaps between posts. That’s due to the lack of time between wrangling kids under 3 and wrangling a small business. Fortunately, after 9 years, the business demands less of me, which frees up time to take the vacation days that have been sitting like rollover minutes for the characters of Lost. This gives me some time to mull over some topics and hopefully share a few of them with you.
As we slowly approach the 10-year anniversary of Geocaching and Groundspeak, it’s a good time to step back and take a look at a project that has taken just shy of a quarter of my lifetime. It’s mind-boggling to know that we’ll have a million active caches in the world by 2010.
We’re also going to try and have a heck of a celebration next year. GeoWoodstock VIII will be in our own backyard, so we hope to coordinate our celebrations with all of the new visitors to the Emerald City.
Wish me luck in the heat of Arizona. It’s a dry heat, right?
Thursday, July 09, 2009
Tuesday, July 07, 2009
Fisher Plaza Kicks into (new) Gear
This is primarily a re-posting of information sent to the customers at Internap.
Fisher Plaza has taken a good look at their infrastructure and has a pretty robust plan to ensure that there is redundancy with their electrical systems. Even though our web sites are happily chugging along, they are running on gas while a more long term solution is being worked out. This latest update outlines both the short term and long term plans to get back to normal (or super-normal, since they are adding an additional layer of protection). The plans are fluid and may change, but you can tell the facilities are going to come out stronger from this plan.
(Reposted with permission from Internap)
Near Term:
Proceeding immediately, the landlord will 1) obtain two additional generators and paralleling gear needed to provide redundancy across the 3 central risers affected by the outage. This will provide N+1 until we can return to full commercial power and allow required maintenance on the generators (required every 10 days) without any downtime. The city has approved closing the east end of John Street for the location of the additional generators. The landlord is also working with 2) Seattle City Light to get commercial power from the vault to the Life Safety System and Retail switchboard to provide a more reliable and effective solution to providing a safe work environment.
Long Term Utility Restoration:
The restoration plan involves building a completely new switchgear room outside the current vault and Switchgear Room. This would separate the Main 1 and Main 2 systems providing more reliability, new equipment, and the ability to begin construction while they are clearing out the damaged gear from the damaged Switchgear Room. All of the old Main 1 & 2 switchgear from the damaged switchgear room will be replaced with new equipment needed to provide a new Main1 service and the paralleling gear to bring up all four of the building's 1.5 MW generators on line together.
The landlord stressed that this is a very fluid plan and that suppliers are aggressively sourcing equipment, design, build, and shipping times. Once they are able to develop a specific timeline as well as Method of Operation (MOP) they will share it with Internap, and we will of course share it with our customers. They are expecting several months until full restoration.
Saturday, July 04, 2009
Colotastrophe: The Day After
Our servers are a bunch of prima donnas. They demand to be pampered in the greatest colocation facility in the world (if you agree with the video of Fisher Plaza touting that fact), resting on pillows of AC and fed power in Waterford crystal goblets. We literally pay more for the 5 cabinets that house the servers* than we do for our entire Groundspeak office - and then some.
Around 5am Pacific today, all of our grumpy but lucid Groundspeak servers woke from their slumber to greet geocachers** who were, as one user wrote, scratching their arms in search for their next geocaching fix. Most were just happy to have the servers back online but others were asking questions about disaster recovery and communication in a crisis. Instead of finger pointing, although cathartic, I'd like to focus on what worked, what didn't, and how we can try to avert some issues if (and when) this happens again.
To set the stage, we have been hosted at Internap in the Fisher Plaza since 2002 and in that time have only had 2 significant events that related directly to facility issues. The last issue lasted around 8 hours while this one is, by far, the most significant downtime in the history of the web site. In total we had 29 hours of downtime. Unfortunately the 29 hours were during the geocaching peak season on the busiest weekend of the year and, to compound things, a day off from work for many. The Fates were definitely conspiring to pick the worst day to bring the Geocaching.com site down.
What Worked
The usefulness of Twitter and Facebook became obvious during this crisis. Our web servers and email servers were all located at Fisher Plaza. We had very few options for posting updates, so we had to rely on outside systems to communicate with our community and our partners. I switched from Groundspeak emails to my Gmail account, and my iPhone running Tweetie helped me to get information out as I was "on the scene." By the end of the day I had added 800+ followers on Twitter, an account which, in the past, was used as a toy for logging geocaching finds with my family and for the random Groundspeak update.
Also, although we didn't have the need for backups this time, we have daily backups of all our systems. Since this happened before our nightly backups occurred it was close to the worst time for a data failure. At the most we would have lost a day of data. In a catastrophic event this isn't a total Fail. It just sucks.
What Didn't Work
Although I won't finger point at the cause of this issue, I will point out that Fisher Plaza people lacked any official communication with the first responders at the scene. Many clients of the building were in the dark, both figuratively and literally, while we were waiting outside for news of what really happened. Instead we had to join in on Twitter to figure out what happened. Was it a fire? (yes) Did the sprinklers turn on? (yes) OMG! Our machines are fried! (no. just the generator) If someone walked out of the building with some authority and told us what they knew - we could have passed that information on to our customers. Internap did a relatively good job at giving status updates though they were sparse and sometimes repeated. I'd give Internap a C and Fisher Plaza an F for communication.
I'll be just as hard on us and say that we should get an F for communication preparedness. Although I think we did a good job at working around our own issues with Facebook and Twitter (and this blog), we were unable to make updates available on our web pages and our iPhone application. The reason why some sites could do this and others could not is that our entire server infrastructure was in the Fisher Plaza basket. The other companies likely had better ways to switch over to a new location. Our only alternative, pointing DNS to another server, would have made it harder to get back online since many people would continue to point to the wrong machine when the servers were back with power. Since we only anticipated a ~12hr outage it made no sense to do something that could take another 24 hours to correct for some users.
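The DNS trade-off described above can be sketched in a few lines. This is only an illustration of the reasoning, not anything we actually ran: the function names are made up, and it assumes the simple worst case where a resolver caches a record for its full TTL right before the record changes.

```python
def hours_until_all_users_switch(ttl_hours: float) -> float:
    """Worst case: a resolver cached the old record just before the change,
    so it keeps pointing at the wrong machine for one full TTL."""
    return ttl_hours


def dns_failover_worthwhile(expected_outage_hours: float, ttl_hours: float) -> bool:
    """Hypothetical rule of thumb for illustration only.

    Pointing DNS at a temporary server only pays off if the outage is
    expected to outlast the time some users would stay stuck on that
    temporary server once the real servers are back.
    """
    stuck_after_recovery = hours_until_all_users_switch(ttl_hours)
    return expected_outage_hours > stuck_after_recovery


# The situation in this post: ~12hr expected outage, ~24 hours to correct.
print(dns_failover_worthwhile(12, 24))
# A long outage with a short (assumed) 1-hour TTL looks very different.
print(dns_failover_worthwhile(29, 1))
```

With the numbers we had at the time, the first call comes out against switching, which matches the decision we made. A shorter TTL set well before an outage is what changes the answer.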
What Next?
There are some obvious things to do to correct what didn't work, and some solutions that will require some thought. I'll highlight a couple of high level things we'll consider and implement.
We're not a bank, so although 29 hours is a long time to be down, we do not plan to duplicate our infrastructure so we are completely redundant. It is just too expensive to make fiscal sense. Instead, we'll ensure that in the case of a catastrophic event we have the best backups and the best steps for restoring those backups to a new system. We already have a good system but we'll make it even better.
We'll have a better system for communicating with our customers, so these systems will be the focus for redundancy planning. This includes rerouting web servers and email. Even streaming my Twitter account on the front page of Geocaching.com would have been helpful for letting people know what is happening.
Lastly, we're going to create an official disaster recovery plan so everyone knows what to do at Groundspeak in the situation where there is a catastrophic event. We should always understand the worst case scenario and how to recover from it. We owe this to our customers.
For those in the US, have a Happy 4th of July! And thanks to everyone for your ongoing support of Groundspeak and the geocaching activity. From the Tweets and Facebook posts you definitely enjoy geocaching. Now go out and find a cache!
* we're not using all of the cabinets at Internap yet but we're still paying for them
** although we also run Waymarking.com and Wherigo.com, the geocaching community is easily the largest and most vocal, so I'm focusing on them for the blog. I know everyone else is just as excited to see our other sites back online.
Friday, July 03, 2009
FAIL - Fisher Plaza Outage
When I get a phone call at 3am it is one of two things - either a relative has passed away or a server has gone down. Looking at those two options I would always take the latter. Bryan was on Caller ID, so I knew it was the latter. What I didn't realize was that this wasn't the ordinary check, reboot and "all's good" kind of server issue.
I tried to VPN into the machines and nothing was resolving, which is generally a Bad Thing. I dusted off my cardkey (unused for at least a year), grabbed my car keys and headed out the door in that daze that only a mom or dad knows during a 4am feeding. The usual fearful thoughts went through my head about the worst case scenarios.
Once I arrived at the Fisher Plaza building I circled the block and drove to the entrance to the parking garage, passing my keycard and receiving a satisfying beep and green light, but the gate didn't rise to let me enter. Through my dozy haze I realized that the parking garage was a dark maw with only the peek of a car in the shadows. It dawned on me that this was no longer just My Problem but Someone Else's Problem. This was one part frustrating and one part relieving. Although a power outage was a shock to the machines, we've handled this before at Fisher Plaza* and the machines were restarted with some light caresses. So I settled into the parking lot across the street and walked over to the entrance.
Arriving at the entrance I validated my initial observation that, yes, the power was out, so I struck up a conversation with some folks loitering near the entrance. "Is the power out?" I inquired, more as a conversation starter to get some additional information about what happened.
"Duh," a woman responded with a linguistic flourish. Apparently this was a KOMO employee Who Doesn't Speak with Tech Guys, but I persisted and got some general information. Yes, the power was out. It was due to a fire and one of the generators was fried. And it had been down since around 11pm. And they were here since 3 and they were sooo bored.
I realized that I had spent whatever welcome I had and emptied the account, so I looked for a person just as dazed and techie as me and struck up a conversation with him. Andrew was from PopCap Games and was also called out to check on his servers. Although most of their sites were up, their Facebook games were hosted exclusively at Internap in the Fisher Plaza Building.
After a while, more tech folks appeared and congregated, sharing any emails received from Internap, which were sparse and devoid of any real content, but we'd get wisps of information from other folks who "know people" - generally the guys who actually have to get the work done. What I heard was this:
- An electrical cabinet set itself on fire between 11pm and midnight
- There were 6 inches of water in the parking garage where the generators are. This came from putting out the fire.
- No one was injured (thanks for asking)
- One generator is fried and the other is sitting in the water so it is unusable
- The Fire Marshal wasn't letting anyone in (until around 8am when I got in)
- Rumor had it that the sprinklers did turn on but, as alarming as that sounds, not in the colocation facility - so no wet machines
- KOMO stations like KOMO Channel 4 in the building weren't broadcasting their own programming - instead they were broadcasting content from sister stations.
Realizing I couldn't get the word out on the site outages via the normal channels, like email and the web sites, I resorted to posting to Twitter, which in turn posted to Facebook. Fortunately the geocachers continued to retweet and repost the updates throughout the day, so hopefully the information is getting out there. We even updated the text for the iPhone application but that won't satisfy many folks who are experiencing network errors when trying to look up caches.
Fortunately we're in good company. Authorize.Net and Bing Travel and other larger clients are at the facility so Internap is being very serious about getting this resolved. Sometimes it is good to ride on the coattails of the bigger guys when you have a big problem to solve.
I'll update when I have more information. In the meantime I'm monitoring emails and Twitter. When the machines come back online I'll get a flurry of emails so I can head down to make sure that the machines get back online. And I'll continue to update via Twitter and Facebook until this gets resolved. You can follow the latest developments with the hashtag #fisherfire or search for Fisher Plaza or follow me @locuslingua on Twitter.
*Around 2006 Fisher Plaza lost power allegedly due to a safety switch meant to shut down power if someone was electrocuted. It's a nice safety feature unless some knucklehead hits it by mistake.
Edited: It was KOMO, Not KIRO. Thanks for the comments!
Labels:
colocation,
fail,
fisherplaza,
geocaching,
groundspeak,
servers
Thursday, April 06, 2006
GPS Helps Predators Find your Kids
The title sounds scary, huh? It makes for great headlines which is what I initially thought was the intent of Robin Raskin's syndicated column topic called "Worst Hi Tech Gifts" last November. I initially discovered the topic from John Matarese's column called "Don't waste your money" and kicked off an email letting him know of the correction regarding the now-defunct Gizmondo device:
The Gizmondo does not have a feature for predators to find your child. GPS receivers are receivers and not transmitters. I'm sure any gaming element within the unit will protect the child from broadcasting his or her location to others. The only game I'm aware of that even uses GPS is Colors and it hasn't been released yet. There is a GPS element but it is not in real time and most likely does not broadcast your child's position to others. That would be a stupid thing for a company to do.
In fact, the Gizmondo device can be set up by parents to allow them to monitor their children. So in essence it provides a feature for parents to protect their children - not the other way around.
Hope this correction helps. It's kind of damaging material for a new concept like GPS in handheld games, especially with misinformation flowing about.
Mr. Matarese was kind enough to respond the same day:
I am going to refer you to robinraskin.com. She is the Computer Columnist who was the source for that story.
She and DS Simon Productions (the video feed service) offered that story to HUNDREDS of TV stations in every city in the USA. My guess is that dozens of TV stations are airing the clip saying the Gizmondo could lure sexual predators.
Since it came from a feed, and not from us, I'm not in a position to contradict what came from the feed service. You may be 100% correct....but I wish you would contact Robin Raskin.
I kicked off a similar email to Robin and never received a response. Mr. Matarese also adjusted his text on his own site noted on the "Worst Hi Tech Gifts" link, much to his credit.
It's not unusual to be ignored via email so I shrugged it off and assumed it was just ignorance, or laziness, or both that caused such a poorly penned article. Today, however, the Google news alert kicked out a newer article about the very same program. Coming from a biased news source you have to take this with a grain of salt, but it does sound enticingly juicy:
By itself, this VNR is little more than a tri-company infomercial that plugs numerous products while trashing its competitors. And yet when laundered through credibility of TV journalism, viewers are deceived into thinking they're watching an independent news report with an impartial consumer expert.
Based on the comments about GPS technology in the Gizmondo device being used by sexual predators, it's outright fraud, in my opinion. The damage done by articles like these sets back the widespread adoption of GPS technology. Sadly I doubt this kind of material would be picked up by the larger media organizations.
Wednesday, March 22, 2006
Sony Keynote Mentions Geocaching
In the Playstation 3 keynote address at the Game Developer Conference in San Jose today, Phil Harrison mentioned geocaching:
Harrison also mentioned that GPS functionality will soon come on board, noting that, “Although geocaching is as underground as it gets, who can't see potential?”
Geocaching isn't exactly an underground activity, though admittedly it has been called "the most popular outdoor activity no one knows about." There have been literally thousands of articles written about the activity, radio interviews with geocachers, and it's even listed as one of the top "treasures" on a repeating show on the Discovery Channel. So what does Harrison, the president of Sony Computer Entertainment's Worldwide Studios, mean exactly?
My interpretation is that if it isn't mainstream it is underground, at least for the larger game development companies. Geocaching doesn't fit well into any of the standard genres like first person shooters, adventure games, sport or activities like golf or kayaking. It combines technology and the outdoors and is hard to put your finger on.
It does have a worldwide following but no real statistics have been reported on how popular it really is. Perhaps if that information was available it would change some minds on the subject. Or maybe it is still underground. I kind of like it that way.
Tuesday, March 21, 2006
Augmented Reality Gaming
In the office we've been playing with names to call the new activity of Augmented Reality Gaming (or until recently, Geolocational Gaming). What came out of it was a fun term called auging or ogging. I prefer ogging though it does go a bit abstract.
What is auging? (pronounced ogging)
Auging is short for Augmented Reality Gaming. Unlike virtual reality that immerses your senses in an alternate world, augmented reality takes virtual information and overlays it on top of real space. This means instead of being in some kind of Matrix-style goo with implants stuck in your head you are using your own eyes and ears to interact in the real world. The only difference is you have additional sounds and images that you can view and interact with via headsets and/or handheld devices. No goo is necessary.
What is a typical auging experience?
There's nothing really typical about an auging experience, but the best comparison would be an adventure game in the real world. This genre started with games like Zork from Infocom and continued more recently through various LucasArts games like Secret of Monkey Island and Sam and Max Hit the Road. Remember Myst? Riven? These are all good examples of adventure games.
In the real world you can go on your own adventures while auging. Using a handheld device and GPS technology (which pinpoints your location) the device creates a gaming environment that reacts to your movements. For example, at a zoo, if you're at the monkey cage you can "talk" to a monkey with your handheld computer because the handheld knows where you are and the game has a virtual monkey that "lives" at that location. In your handheld you may even have a virtual banana in your inventory. Or, you may have to visit somewhere within the zoo to find the virtual banana that "lives" at another location. Although the monkey and banana don't exist in the real world you can easily imagine they do as part of your game.
What makes auging so exciting is that you are no longer chained to your home entertainment system or computer to play adventure games.
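For the technically curious, the zoo example above boils down to a distance check between the player's GPS position and the spot where each virtual item "lives". Here is a minimal sketch of that idea. Everything in it is made up for illustration: the names, the coordinates, and the 25 meter reach are assumptions, not part of any real game.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class VirtualObject:
    """A hypothetical game item pinned to a real-world location."""
    name: str
    lat: float
    lon: float
    radius_m: float  # how close the player must be to interact


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def objects_in_reach(player_lat: float, player_lon: float, objects: list) -> list:
    """Return the virtual objects the player is close enough to interact with."""
    return [
        obj for obj in objects
        if distance_m(player_lat, player_lon, obj.lat, obj.lon) <= obj.radius_m
    ]


# Made-up coordinates: the monkey is at the cage, the banana is elsewhere in the zoo.
zoo = [
    VirtualObject("monkey", 47.6685, -122.3540, radius_m=25),
    VirtualObject("banana", 47.6700, -122.3500, radius_m=25),
]

# The player is standing a meter or two from the monkey cage.
nearby = objects_in_reach(47.66851, -122.35401, zoo)
print([obj.name for obj in nearby])
```

Standing at the cage, only the monkey is in reach; the player has to walk to the banana's location before the game will hand it over.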
Is this Alternate Reality Gaming (ARG) like I Love Bees?
I Love Bees was an online promotion for Halo 2 that involved going to phone booths in the real world to answer phone calls from in-game characters. Though there are similarities between ARGs and auging, auging involves far less online interaction and more physical exertion. As auging grows in popularity, it becomes more likely that you will see the two activities blend together.
That's what I have so far.