Detailed Explanation of Tonight's RPG Downtime (May 28th)

We sincerely apologize for tonight's downtime. It lasted almost a full two hours, which is much longer than any previous unplanned downtime. As announced on the RolePlayGateway Twitter account (thankfully we had one, so we could post updates while the server was down; follow us now if you haven't already), the outage was completely unexpected. We have decided to apologize to the community with a fully detailed explanation of what happened, why it happened, and what we've done to prevent it from happening in the future.

Warning: Complex and mind-numbing code-fu and server leetness condensed into simple, readable sentences with silly similes and morbid metaphors ahead. Read at your own risk.

Enter the RolePlayGateway server, a complex quad-core beast with several interacting components. As many of you are aware, we have previously had a lot of issues with the egosearch (or "View Your Posts") feature, including but not limited to blank pages and empty results. This feature is one of the most heavily used tools on the site, as it allows quick and easy access to all of the topics in which you've posted.

However, this tool is not without its flaws: because it runs a very extensive set of queries on the server, it often puts a heavy load on our SQL database. The SQL database is where every single bit of RolePlayGateway's data is stored, and when posts are made on the forum, tiny little robots scan through this database and update and retrieve all of the information necessary to serve your request. Users with a large number of small posts may notice more problems than users with a smaller number of bigger posts, because the overhead for finding these posts is based on the total number of posts. Of course, when many users are on the site at the same time, there are literally hundreds of these little robots (called "pointers") running around and updating the database.

The problem arises when one robot is updating a chunk of data that another robot is waiting for. To make sure no one loses any data, only one robot can access any part of the database at a time. This amount of time is usually very, very small (during optimal performance, it should take no longer than 0.001s), but when a robot needs to check hundreds of other parts of the database before updating another, and several robots are waiting for those parts, the result is a veritable traffic jam. This traffic jam, if not resolved quickly, often cascades into much longer wait times (during our worst times, up to around 5.000s) for each query.
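For the more technically curious, the traffic jam can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the site's actual database code: every query is assumed to arrive at the same moment and to be served strictly in order, and `wait_times` is a made-up helper name. Times are in milliseconds.

```python
def wait_times(hold_times_ms):
    """Return how long each queued query waits before it gets the lock.

    hold_times_ms[i] is how long query i holds the lock once it has it.
    Assumption: all queries arrive at time 0 and are served in order.
    """
    waits = []
    elapsed = 0
    for hold in hold_times_ms:
        waits.append(elapsed)   # this query waits for everyone ahead of it
        elapsed += hold         # then holds the lock itself
    return waits


# Five quick queries: nobody waits long.
print(wait_times([1, 1, 1, 1, 1]))    # [0, 1, 2, 3, 4]

# One slow query at the front: everyone behind it inherits the delay.
print(wait_times([500, 1, 1, 1, 1]))  # [0, 500, 501, 502, 503]
```

The second call shows the cascade: a single slow robot at the front of the line delays every robot queued behind it, even though their own work is still quick.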

To resolve this, we decided to eliminate a lot of our robots in favor of one big robot that keeps its own copy of the database for search queries only. To do this, we implemented a new search engine on the back end of our site called Sphinx. This search engine plugs directly into our site's server and does the same work that the little robots used to do, in one big chunk, but in a separate building, if you will. This frees up the hallways in the database for more little robots to do other work, which ultimately makes the whole site run much faster. There is one small drawback, in that occasionally your searches will come back with no results, but this is an issue we are working on and hope to have an update for you in the future.
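The "one big robot with its own copy" idea can be illustrated with a small Python sketch. It is not how Sphinx works internally and it does not use the Sphinx API; `PostStore` and `SearchCopy` are hypothetical names invented for this example. It only shows the general pattern of answering searches from a separate copy that is rebuilt from the main data.

```python
from collections import defaultdict


class PostStore:
    """Stand-in for the main database: holds posts and accepts writes."""

    def __init__(self):
        self.posts = {}  # post_id -> (author, text)

    def add(self, post_id, author, text):
        self.posts[post_id] = (author, text)


class SearchCopy:
    """Separate search-only copy, rebuilt from a snapshot of the store."""

    def __init__(self):
        self.by_author = defaultdict(list)

    def rebuild(self, store):
        fresh = defaultdict(list)
        for post_id, (author, _text) in sorted(store.posts.items()):
            fresh[author].append(post_id)
        self.by_author = fresh  # swap in the new copy in one step

    def posts_by(self, author):
        # Reads only the copy; the main store is never touched here.
        return list(self.by_author.get(author, []))


store = PostStore()
index = SearchCopy()
store.add(1, "alice", "first post")
index.rebuild(store)
store.add(2, "alice", "second post")
print(index.posts_by("alice"))  # [1]  (post 2 is not visible until the next rebuild)
index.rebuild(store)
print(index.posts_by("alice"))  # [1, 2]
```

Searches never compete with writes for the main store, which is the point of the design. The sketch also shows that a copy can lag behind the original until it is rebuilt; whether that has anything to do with the empty results mentioned above is not something this post establishes.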

When Sphinx communicates with the server, it does so over a so-called "tube", or a port. All communication between our HTTP server (Apache) and Sphinx is sent through this port, which is exactly what frees up other tubes for the rest of the work around the site. Apache is who your browser talks to (be it Google Chrome, Firefox, or Internet Explorer) when it wants to visit RolePlayGateway. Apache listens to you, tells the robots to do work, then sends you exactly what you requested within a matter of milliseconds. If all of those little SQL robots get their job done very quickly, you can usually get your page back in under 0.100s. However, it is important that this tube (the port between Sphinx and Apache) is created in the proper order, or disastrous situations arise.

Such was the case tonight. Under certain conditions (currently, after every 4000 requests), Apache needs to restart itself so it can make sure it has a clean work area and can quickly handle everything that the wonderful users of RolePlayGateway ask of it. Unfortunately, Apache isn't the smartest worker in our roleplaying factory of awesomesauce, and it occasionally forgets that it needs to communicate with Sphinx before starting up. When it tries to restart without telling Sphinx, it will go into a "Stopped" state (or flat-out crash), and will simply fail to restart. When this happens, the robots running the database finish their work and sit idle, because Apache isn't telling them what to do.
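The ordering problem described above can be sketched in Python as "check that the dependency's port answers before starting the service that needs it". This is a hedged illustration only, not our actual startup script: `port_is_open` and `start_in_order` are hypothetical helpers, and the port number in the comment is an assumption, since this post does not say which port Sphinx listens on.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_in_order(dependency_ready, start_service,
                   attempts=5, delay=1.0, sleep=time.sleep):
    """Start the service only once the dependency reports ready.

    Returns True if the service was started, False if we gave up.
    """
    for _ in range(attempts):
        if dependency_ready():
            start_service()
            return True
        sleep(delay)
    return False


# Demo with stand-ins instead of real servers: the dependency becomes
# ready on the third check, and only then is the service started.
answers = iter([False, False, True])
started = []
ok = start_in_order(lambda: next(answers),
                    lambda: started.append("apache"),
                    sleep=lambda _seconds: None)
print(ok, started)  # True ['apache']

# In real use the readiness check might look like this (port is an assumption):
#   start_in_order(lambda: port_is_open("127.0.0.1", 9312), start_apache)
```

Passing the readiness check and the start action in as functions keeps the sketch testable without touching a real server.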

What we did to fix it...
This particular problem has happened three times in the past couple of weeks, but since we have such an amazing group of RolePlayGateway Administrators, it was usually caught and fixed in a matter of minutes. Unfortunately, we missed it this time. We've updated our monit configuration to do this work automatically if it detects that the server has been down for more than a few minutes. (Monit is like another big robot that monitors things on a server.)
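To give a rough idea of what "another big robot that monitors things" means, here is a minimal Python sketch of a watchdog that restarts a service once it has been down for longer than a threshold. It is not Monit's configuration syntax or API, and the 180-second threshold is an assumption standing in for "a few minutes"; `Watchdog` is a made-up class for illustration.

```python
class Watchdog:
    """Restart a service once it has been down for too long."""

    def __init__(self, is_up, restart, max_down_seconds=180):
        self.is_up = is_up                      # function: is the service answering?
        self.restart = restart                  # function: bring the service back
        self.max_down_seconds = max_down_seconds
        self.down_since = None                  # time the outage was first seen

    def check(self, now):
        """Run one check at time `now` (seconds). True if a restart was triggered."""
        if self.is_up():
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now               # first time we notice the outage
            return False
        if now - self.down_since >= self.max_down_seconds:
            self.restart()
            self.down_since = None
            return True
        return False


state = {"up": False}
restarts = []
dog = Watchdog(lambda: state["up"], lambda: restarts.append("restart"))
print([dog.check(t) for t in (0, 60, 120, 180)])  # [False, False, False, True]
```

Waiting for the outage to persist before acting avoids restarting the server over a single missed check.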

And so, long story short - we're very sorry about the inconvenience, and we've worked hard to make sure this problem doesn't happen again. If it does, ping us on Twitter or shoot us an email (we're admin [at] roleplaygateway (dot) com), and we'll get the problem resolved as quickly as possible!

Now, back to your regularly scheduled roleplay...
Remæus
Creator and Owner
Member for 7 years



Eric, no offense, but your comprehensive explanation of what happened seems lost upon this site.

Roughly 100 views, from a potential userbase of 13,000 or so people, means that only about 0.77% of the total userbase that could have seen this post actually did. Yep.
Kronos
Member for 4 years


To be honest, us Roleplayers don't really read the notices regarding site maintenance and all unless it bothers us. 2 hours is hardly inconvenient, taking into account that the said 13,000 live in all parts of the world. As long as we get to post and see posts to our posts, we are a rather contented lot.

Which means whether it's 100 views or 1000 views, it doesn't make this post any less effective. It's not lost or misplaced, it's just right where it is.

A smilie costs nothing, but in the internet, it is EVERYTHING.

Hear ye, hear ye! These are some Roleplays I think are worth looking at, read all about it!


Thread advertisement:
flickery
Member for 4 years


Eric, this could've been summed up in a few words

Server go sploosh, Eric fix, Server get better, Users happy

Now, Yami go lurk, hurgh
Yami-Dokuro
Member for 5 years


Lol @ Yami.

I found it a rather helpful explanation. I probably could have understood slightly more technical jargon, but I'm glad I could learn what the heck went wrong.

Now, what about what happened over Saturday and Sunday? It seems I and most other RPers I know could not find more than 1- or 2-hour windows to log on throughout the whole weekend. What happened there?


O heart, lament not, for this world is only metaphorical.
O soul, grieve not, for this abode is only transient.

StandardFiend
Scholar
Member for 3 years


Okay because I was wondering why after every post I make, the system logs me out.
It's very annoying, but I understand. :3
Don't take life too seriously, no one makes it out alive anyway.
Pandar
Member for 3 years


StandardFiend wrote:Lol @ Yami.

I found it a rather helpful explanation. I probably could have understood slightly more technical jargon, but I'm glad I could learn what the heck went wrong.

Now, what about what happened over Saturday and Sunday? It seems I and most other RPers I know could not find more than 1- or 2-hour windows to log on throughout the whole weekend. What happened there?


This indeed, Remæus. Over the course of the past weekend, the RPGateway site wouldn't load during certain hours; it just appeared as if it were loading, and when it finally did (half an hour later, I believe) it was a blank page. Then on another occasion, it read something about an 'XML' error; frankly, I didn't pay much attention to it. Has there been any more downtime recently? Or should I be mailing angry letters to my ISP?
Safisan
Member for 3 years


Remæus wrote:There is one small drawback, in that occasionally your searches will come back with no results, but this is an issue we are working on and hope to have an update for you in the future.


"Occasionally"? Not to sound rude or anything, but I'm sorta annoyed. It's been over 3 hours and I still can't view my posts (or anyone else's :roll: ). Anyone else have this problem? :evil:
Mercurial
Member for 3 years


Yeh... my dad operates a couple of Apache servers so I know what you're talking about. Our MySQL databases are nowhere near as large as the ones for RPG probably though... And we're running on 4 or 5 year old processors... XD

Like Standard, more techie jargon probably would not have fallen on deaf ears with me.
"All animals are equal but some animals are more equal than others."
-George Orwell
KungFu.Chinaman
Member for 4 years


I appreciate you letting us know what happened, Eric, and the fact that you were able to figure out the problem and solve it as quickly as you did makes me proud that I chose to roleplay here. I do realize that any server will sometimes encounter problems, so I am never angry when something comes up on my computer screen that is not what I intended. If nothing else, you have me backing you, so I quote the words of a far greater roleplayer than I:

Yami-Dokuro wrote:Server go sploosh, Eric fix, Server get better, Users happy


Keep up the good work!
pioneercadet
Member for 4 years


Mercurial wrote:
Remæus wrote:There is one small drawback, in that occasionally your searches will come back with no results, but this is an issue we are working on and hope to have an update for you in the future.


"Occasionally"? Not to sound rude or anything, but I'm sorta annoyed. It's been over 3 hours and I still can't view my posts (or anyone else's :roll: ). Anyone else have this problem? :evil:


Yup, me too. It's not bringing up the "Information" page that says "No results could be found" or anything like that. It brings up the "StandardFiend - Posts" page and says there are no posts. I didn't bother bookmarking all of my RPs, so I've been completely left behind in most of them.
StandardFiend
Scholar
Member for 3 years


I thought it'd get better overnight, but it still doesn't work. Can't view my posts. D:
Mercurial
Member for 3 years


Mercurial wrote:I thought it'd get better overnight, but it still doesn't work. Can't view my posts. D:


Well, just try to understand that Rem is working on the site, adding things and taking things away to make this a better place. Just calm down, and let him work, so he can fix everything. You won't regret it, trust me.
Lamentations
Member for 4 years


Rem, maybe you could mass-pm the site about this? Like five topics about this D:
Bai Bai bby
Mid
Member for 5 years


I wanna know why the server keeps giving me general errors randomly, and network timeouts. ~> : (
“Man can live with about forty days without food, about three days without water, about eight minutes without air, but only for one second without hope.”

Zenatsu
Member for 4 years


Yeah, I'd like to know why I keep getting network timeouts recently. Also, whenever I click on the 'view your posts' link, I can no longer view my posts and instead get an information notice that says, "No suitable matches were found."
"Never take life seriously. Nobody gets out alive anyway."~ Anonymous

"When I die, I want to go peacefully like my Grandfather did, in his sleep -- not screaming, like the passengers in his car"~ Anonymous
NightLady
Member for 4 years


flickery wrote:To be honest, us Roleplayers don't really read the notices regarding site maintenance and all unless it bothers us. 2 hours is hardly inconvenient, taking into account that the said 13,000 live in all parts of the world. As long as we get to post and see posts to our posts, we are a rather contented lot.

Which means whether it's 100 views or 1000 views, it doesn't make this post any less effective. It's not lost or misplaced, it's just right where it is.


I quite enjoyed Eric's explanation xD

Especially the warning. That made me chuckle :]
There's Something About...
Tamara Hale... And It Reeks Of

Revenge


tigerz-peace
Member for 5 years


Zenatsu wrote:I wanna know why the server keeps giving me general errors randomly, and network timeouts. ~> : (


Seriously, I've been getting consistent General Errors and blank pages for the last 12 hours. Didn't they say they had fixed this site up?
StandardFiend
Scholar
Member for 3 years


