
Let's talk about the BE kicks


bamf


3 hours ago, Haych said:

@bamf @Olio I recommend putting this announcement in a banner on the front page if you want it to be more visible to the whole community; you lot should use that IPBoard feature more often. If no one bumps it, no one reads it. Most announcement posts get buried quickly because Change Logs are always more popular, and for the same reason Montages get more views than important announcements. +1 for more visibility.

For example:

[Image: 2dXXRhm.png]

I'll take a look at it tonight. Thanks

On 12/14/2016 at 6:27 PM, bamf said:

There has been much discussion around the fact that there isn't a dedicated post to this topic, so I thought I would go ahead and start one to let the community know what is going on - as well as to begin a dialog on the subject.  

First and foremost, let me say this: no number of BE "mass kicks" is acceptable to @Paratus or myself. We consider one in a day to be too many. Also, I define a "mass kick" as 20 or more players being kicked from a single server within a 30-second interval with the dreaded "client not responding" message. With that out of the way, let me give you a little background on the number and scope of the kicks (as I'll call them from here on).
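To make that definition concrete, here is a minimal Python sketch of how mass kicks could be counted from kick logs. The `Kick` record and its fields, and the assumption that each log entry can be parsed into a server name, a timestamp in seconds, and a reason string, are hypothetical; only the 20-player, 30-second, "client not responding" criteria come from the post.

```python
from dataclasses import dataclass

# Thresholds from the definition above: 20+ kicks on one server within 30 s.
MASS_KICK_PLAYERS = 20
WINDOW_SECONDS = 30.0
KICK_MESSAGE = "client not responding"


@dataclass
class Kick:
    """One kick event parsed from a log (hypothetical record format)."""
    server: str        # e.g. "S1"
    timestamp: float   # seconds since any fixed reference point
    reason: str


def find_mass_kicks(kicks):
    """Return (server, first_kick_time) for each distinct mass kick."""
    by_server = {}
    for k in kicks:
        # Only "client not responding" kicks count toward a mass kick.
        if KICK_MESSAGE in k.reason.lower():
            by_server.setdefault(k.server, []).append(k.timestamp)

    events = []
    for server, times in sorted(by_server.items()):
        times.sort()
        left = right = 0
        while left < len(times):
            # Grow the window to every kick within 30 s of times[left].
            while right < len(times) and times[right] - times[left] <= WINDOW_SECONDS:
                right += 1
            if right - left >= MASS_KICK_PLAYERS:
                events.append((server, times[left]))
                left = right  # skip past this burst so it is counted once
            else:
                left += 1
    return events


kicks = [Kick("S1", float(t), "Client not responding") for t in range(25)]
print(find_mass_kicks(kicks))
```

Skipping past a detected burst keeps one large burst from being reported as several overlapping mass kicks.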

Here are the numbers of kicks we have had since 12/9/16 (with the count per server, S1-S4, in parentheses):

9th:  9 (5, 2, 2, 0)
10th: 10 (3, 2, 1, 4)
11th: 11 (4, 1, 2, 4)
12th: 1 (S4)
13th: 1 (S1)
14th: 1 (S1)
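As a quick sanity check on the table, this sketch expands the single-server entries for the 12th-14th into per-server tuples and re-adds the posted numbers; it introduces no data beyond what is listed above.

```python
# Mass-kick counts from the table above: day -> kicks on (S1, S2, S3, S4).
# The 12th-14th each name a single server, expanded here into tuples.
COUNTS = {
    9: (5, 2, 2, 0),
    10: (3, 2, 1, 4),
    11: (4, 1, 2, 4),
    12: (0, 0, 0, 1),
    13: (1, 0, 0, 0),
    14: (1, 0, 0, 0),
}
WEEKEND = {9, 10, 11}  # per the post, the 9th-11th were the weekend


def per_server_totals(counts):
    """Sum each server's column across all days."""
    return tuple(sum(day[i] for day in counts.values()) for i in range(4))


def weekend_share(counts, weekend):
    """Fraction of all mass kicks that fell on weekend days."""
    total = sum(sum(day) for day in counts.values())
    on_weekend = sum(sum(counts[d]) for d in weekend)
    return on_weekend / total


print(per_server_totals(COUNTS))                   # (14, 5, 5, 9)
print(round(weekend_share(COUNTS, WEEKEND), 3))    # 0.909
```

By this tally, 30 of the 33 mass kicks (about 91%) fell on the 9th-11th.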

The 9th-11th are the weekend, and I think we can all agree that the weekends are where we see more kicks. I have a theory as to why that is the case, and I'll get to that in a minute, but let's take a look at the numbers first. I hear all the time that the mass kicks happen constantly, and while they are certainly prevalent on the weekends, I do think that the perception of the number of kicks greatly outweighs the reality. Three weeks ago I would have agreed with you that the kicks were more frequent: on just S4 there were 16 kicks in a single Saturday. We have made incremental improvements since then, though, so let's talk about what we have done and what we are continuing to do.

First of all, we implemented kicks based on network activity. One thing we have noticed is that a single lagging player can affect network connectivity for all players in the game. The ping-based kicks we implemented have been very helpful in ensuring that such a player is removed from the server faster, which allows network connectivity to continue for the rest of the players on the server. There are additional network kicks we could put in place (desync and packet loss, for instance), but for now we are only logging those metrics so that we can find a good balance before enforcing kicks based on them.
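A minimal sketch of the policy described above: kick on high ping, but only log desync and packet loss while thresholds are tuned. The 350 ms ping limit is the figure given later in this thread; the desync and packet-loss limits, the metric names, and the `evaluate`/`should_kick` helpers are hypothetical placeholders, not the servers' actual configuration.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    LOG = "log"      # over the limit, recorded for later tuning
    KICK = "kick"    # over the limit and enforced


# 350 ms comes from this thread; the other two limits are placeholders.
THRESHOLDS = {"ping_ms": 350, "desync": 150, "packet_loss_pct": 50}
# Only ping is enforced; desync and packet loss are log-only for now.
ENFORCED = {"ping_ms"}


def evaluate(sample):
    """Map each metric in one player's network sample to an Action."""
    actions = {}
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is None or value <= limit:
            actions[metric] = Action.OK
        elif metric in ENFORCED:
            actions[metric] = Action.KICK
        else:
            actions[metric] = Action.LOG
    return actions


def should_kick(sample):
    """True if any enforced metric is over its limit."""
    return Action.KICK in evaluate(sample).values()
```

Moving desync or packet loss from logging to enforcement would then be a one-line change to `ENFORCED` once suitable limits are found.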

Second, we have been monitoring the memory utilization of the Arma server process more aggressively. There is a disconnect between what the in-game #monitor command reports as utilized memory and what the OS sees as utilized by Arma. I've seen this discrepancy exceed 30% at times, which may point to a memory leak somewhere in the server executable. That's not something we can fix, but we can try to minimize the leaks by using different memory allocators (mallocs). We changed the malloc on S1 on Monday, but we are still seeing memory that does not appear to be released to the OS. Perhaps that's how Arma is coded (to hold onto the memory it allocates as it goes), but once the process passes ~3GB the game gets desynced for everyone and the BE kicks start to happen.
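One way to quantify the #monitor-vs-OS gap is as a share of the memory the OS reports. The post doesn't say which base its 30% figure uses, so this sketch assumes the OS figure; the function names are hypothetical, and the two inputs would come from #monitor and from an OS view of the process (for example its working-set size).

```python
def memory_discrepancy_pct(reported_mb, os_mb):
    """Share of OS-visible memory not accounted for by the in-game figure, in %."""
    if os_mb <= 0:
        raise ValueError("os_mb must be positive")
    return (os_mb - reported_mb) / os_mb * 100.0


def discrepancy_exceeds(reported_mb, os_mb, limit_pct=30.0):
    """Flag readings whose gap is larger than limit_pct (30% per the post)."""
    return memory_discrepancy_pct(reported_mb, os_mb) > limit_pct


print(round(memory_discrepancy_pct(2000, 3000), 1))  # 33.3
```

Measured against the in-game figure instead, the same readings would give 50%, so it matters which base is used when comparing numbers over time.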

Third, we are looking at how we access the database from inside the game. Those of you familiar with the backend of a Life server will likely have heard of extDB. That's the layer that sits between Arma and the database. Typically this role is filled by an Open Database Connectivity (ODBC) driver, and I guess you could call extDB a makeshift one for Arma. Since Arma does not work directly with the ODBC driver provided by the database vendor, you are left with extDB (or Arma2Net) to connect to the database and keep all your houses, vehicles, items, player data, etc. in order. This layer also has a malloc that uses memory inside the Arma application. There is a chance that the memory we see as unused inside the game but used according to the OS is being leaked by extDB's malloc. We are investigating this, and we are also looking to update extDB to a more recent version. We run a customized version of extDB, so we will need to redo those customizations as part of the upgrade.

Finally, we continue to analyze the logs we have to see what correlations we can find inside them.  Here is a summary of those findings:  

  1. The kicks are more frequent when the servers are full, but a full server does not mean that a kick is about to (or even likely to) happen.
  2. The kicks tend to happen more when the OS sees ~3GB of RAM being used by the Arma server application. Some of this is anecdotal, since I need to be made aware of a kick and then check the RAM utilization in the OS. I will say this, though: this last weekend I checked the RAM on the server processes when I could, and once it got high I asked the admins to hard restart the server. I think that made a difference in the overall number of kicks for the weekend, but it's not a tenable solution going forward.
  3. The kicks tend to happen more on the weekend, which is when the servers are not only full but also have higher turnover, resulting in more unique users (UUs, as @Gnashes likes to say) on the servers throughout those days. As more players come and go, more RAM is utilized by the game. The servers are full during prime time on weeknights, but we just don't see the same number of kicks during the week. We also have fewer UUs during the week.
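To move finding 2 beyond anecdote, one could log RAM alongside kicks at fixed intervals and compare kick rates above and below ~3GB. This sketch assumes such a log exists as hypothetical `(ram_gb, kicked)` pairs, and the sample data is invented for illustration. A higher rate above the threshold would show correlation, not cause.

```python
def kick_rate_by_ram(samples, threshold_gb=3.0):
    """Return (rate_below, rate_above) for samples split at threshold_gb.

    samples: iterable of (ram_gb, kicked) pairs, one per sampling interval,
    where kicked is True if a mass kick happened in that interval.
    A rate is None when no samples fall on that side of the threshold.
    """
    below, above = [], []
    for ram_gb, kicked in samples:
        (above if ram_gb >= threshold_gb else below).append(kicked)

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(below), rate(above)


# Illustrative numbers only, not real server data.
demo = [(2.1, False), (2.5, False), (2.8, True), (3.0, True), (3.1, False), (3.2, True)]
print(kick_rate_by_ram(demo))
```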

Given all of that, I believe the Arma server process starts to lag once the RAM utilization gets above some unknown threshold.  The game and OS allocate memory as players come and go, as well as during the normal course of a server session.  As the server begins to lag, the network communication begins to lag as well - and at some point we get players receiving "client not responding" kicks.  

Some of you may have seen that 64-bit versions of the game hit the Arma development branch today. I am hopeful that a 64-bit server version will alleviate the RAM constraints and allow the server to stop lagging (and therefore stop kicking players). Of course, all of this is still in development and could be subject to other bugs and issues. Further, we will have to make sure that all the external DLLs we use with the Arma server are updated to 64-bit and free of bugs as well, but a 64-bit server will finally allow us to use more of the horsepower we have on each and every server in the data center.

I'll talk to Paratus in the morning, but perhaps we can be a tad more aggressive with the hard restarts we do.  Currently we do a hard restart after 3 server sessions, but perhaps we should move that to after 2 server sessions.  We will also continue to monitor the logs, RAM utilization, and player counts to see if we can try to really narrow down when we get into situations where the kicks begin to happen.  
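The cadence change amounts to a modulo rule on a session counter. A minimal sketch, assuming sessions are numbered from 1 and every Nth restart is a hard one; the function name is hypothetical.

```python
def restart_kind(session_number, hard_every):
    """Return "hard" or "soft" for the restart that ends session_number.

    Assumes sessions are numbered 1, 2, 3, ... and that every
    hard_every-th restart is a hard one.
    """
    if session_number < 1 or hard_every < 1:
        raise ValueError("session_number and hard_every must be >= 1")
    return "hard" if session_number % hard_every == 0 else "soft"


print([restart_kind(n, 3) for n in range(1, 7)])  # current: hard after 3 sessions
print([restart_kind(n, 2) for n in range(1, 7)])  # proposed: hard after 2 sessions
```

With `hard_every=2`, every other restart is a hard one.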

I hope this was informative for you all, and I'm happy to hear your feedback and suggestions.  

Bamf

P.S.  I'm going to ask that everyone stay on topic and not turn this into a flaming or finger pointing thread.  Posts that are off topic or just generally unhelpful will be heavily moderated so that we can keep this as an open discussion with the community.  

Edit:  One final thing of note is that this does happen in other communities as well, so it is not isolated to just Asylum.  

Can't you guys just contact Olympus and ask them how they have a server with a bigger mission file (roughly 35 MB) that runs flawlessly? I'm not trying to say you guys don't know what you're doing by any means; Asylum is far better than Olympus. But they have frames like liquid gold. Are they paying a ridiculously higher cost than Asylum? Is it because they only have 3 servers? Is it fewer players? I don't understand how I can get such amazing frames all the time there and never have disconnects or any issues whatsoever. Olympus is literally flawless as far as connection and frames go.

Edited by Eric916
3 hours ago, Eric916 said:

BI is to blame then! 

But that doesn't explain the frames. 54/100 players on Asylum and 54/111 on Olympus look completely different.

Probably because Asylum is more script heavy and possibly uses weaker servers than Olympus. I'd like it if Paratus spent a few days and gutted some scripts which are query heavy.

It would be awesome if we got rid of the scroll wheel menu (AddAction) and replaced it with something else. Right now there's a lot of bloat on Asylum that could be removed, and I just see more being tacked on, like criminal records, which are extremely slow and laggy. I get that they want to increase RP, but it shouldn't come at the expense of gameplay and stability.

Edited by Hanzo/Dirty Scrubz
1 hour ago, Hanzo/Dirty Scrubz said:

possibly uses weaker servers than Olympus

I always giggle when people post that assumption. The servers we have were the highest-end servers for the Arma workload available when we ordered them (in terms of processor). They are designed specifically to run as game servers as well.

Arma uses only 3.2GB of RAM, yet we have much more installed in the servers as well.  

The hardware is not the issue.

1 hour ago, bamf said:

I always giggle when people post that assumption. The servers we have were the highest-end servers for the Arma workload available when we ordered them (in terms of processor). They are designed specifically to run as game servers as well.

Arma uses only 3.2GB of RAM, yet we have much more installed in the servers as well.  

The hardware is not the issue.

Well, that's why I prefaced it with "possibly", but to be more specific, I was referring to the CPU itself (single-thread clock speed). What CPUs are the Asylum servers using right now, and what are the clocks?

5 hours ago, Hanzo/Dirty Scrubz said:

Well, that's why I prefaced it with "possibly", but to be more specific, I was referring to the CPU itself (single-thread clock speed). What CPUs are the Asylum servers using right now, and what are the clocks?

The clock is at 4 GHz, and it's the current generation of processor.

7 hours ago, henky said:

hey @bamf are there any new updates?

Yesterday it was crazy on S2, S3 and S4: people lagging and getting kicked while bounty hunting.

The memory was out of hand yesterday, and I think that manual hard restarts were not used as much as they should have been.  We had at least 1 kick on all 5 servers yesterday - and those happen when the memory gets too high.  

The admins have a new tool to watch the memory on the servers, but since I don't have an automated way (yet) to let the servers know to restart we have to manually restart them when the memory gets too high (which I think is due to a memory leak still, but I'm actively trying to find ways to reduce overall memory footprint from our mission).  
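Since there is no automated restart yet, here is a minimal sketch of what one could look like: poll memory, and request a restart once it crosses a limit. `read_memory_gb` and `request_restart` are hypothetical hooks (for example, an OS process query and whatever mechanism the admins use to restart a server). The 3.0 GB default echoes the ~3GB figure from earlier in the thread; in practice the limit would likely sit lower to leave room for a restart countdown.

```python
import time


def watch_memory(read_memory_gb, request_restart, limit_gb=3.0,
                 interval_s=60.0, max_checks=None, sleep=time.sleep):
    """Poll server memory and request a restart once it reaches limit_gb.

    read_memory_gb: callable returning the process's current memory in GB
    request_restart: callable that triggers the restart (hypothetical hook)
    max_checks: stop after this many polls (None means run until triggered)
    Returns True if a restart was requested, False if max_checks ran out.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        used = read_memory_gb()
        checks += 1
        if used >= limit_gb:
            request_restart()
            return True
        sleep(interval_s)
    return False
```

Injecting `sleep` and `max_checks` keeps the loop testable without real waiting; a deployment would run one watcher per server process.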

I'm not sure whether I've said this here yet, but we have been working with a large German community that is also having this issue across all of their servers. We provided some information for them to look at memory as a potential cause, and we continue to collaborate with them to hopefully find a solution. Of course, I think all of this *should* go away when the 64-bit server comes out, but who knows what other issues may (read: will) crop up with that change.

2 minutes ago, bamf said:

The memory was out of hand yesterday, and I think that manual hard restarts were not used as much as they should have been.  We had at least 1 kick on all 5 servers yesterday - and those happen when the memory gets too high.  

The admins have a new tool to watch the memory on the servers, but since I don't have an automated way (yet) to let the servers know to restart we have to manually restart them when the memory gets too high (which I think is due to a memory leak still, but I'm actively trying to find ways to reduce overall memory footprint from our mission).  

I'm not sure whether I've said this here yet, but we have been working with a large German community that is also having this issue across all of their servers. We provided some information for them to look at memory as a potential cause, and we continue to collaborate with them to hopefully find a solution. Of course, I think all of this *should* go away when the 64-bit server comes out, but who knows what other issues may (read: will) crop up with that change.

Thank you for the update. I hope you guys can find a fix for this!

2 minutes ago, Rated349 said:

How come the issue only happens on Asylum? Other servers never give me this issue, period. My forum post got locked and this was quoted, so I'm posting it here. @HapHazard

It's not just Asylum.

We've asked several other communities (including Olympus and Zero-One, the 250-slot German servers). Many Altis Life communities are having this issue.

If you're trying to compare Altis Life server stability to, say, KOTH or Wasteland, you're comparing a Monster Truck to a Prius in terms of the fuel (memory) used just to make the thing operate as intended.

This week has been bad because we're seeing as many as 40% more unique players connecting compared to the last 3-4 months. More players exacerbate the issue, even with hard restarts every other restart.


@bamf , @Gnashes 

Let me ask you this, just out of pure curiosity for how things work

1. How long is a hard restart compared to a soft restart?

2. Is there a cost to restarts other than time?

3. What do you think would happen if you decided to push the server restarts to every 6 hours + hard restart EVERY TIME?

4. Do you think a lot of things like floating lights may be adding to the memory problems?

5. Would you ever consider setting a max ping to stop players who are already lagging from affecting the server negatively?

Thank you

Just now, Sugarfoot said:

@bamf , @Gnashes 

Let me ask you this, just out of pure curiosity for how things work

1. How long is a hard restart compared to a soft restart?

2. Is there a cost to restarts other than time?

3. What do you think would happen if you decided to push the server restarts to every 6 hours + hard restart EVERY TIME?

4. Do you think a lot of things like floating lights may be adding to the memory problems?

5. Would you ever consider setting a max ping to stop players who are already lagging from affecting the server negatively?

Thank you

1. Minimally longer

2. People leave and don't come back on hards

3. It'd likely kick at 4-5 hours

4. No, actual assets aren't the issue.

5. Already exists.

7 minutes ago, Gnashes said:

1. Minimally longer

2. People leave and don't come back on hards

3. It'd likely kick at 4-5 hours

4. No, actual assets aren't the issue.

5. Already exists.

Alright, follow-up:

1. So is the reason more or less an attempt to keep players on the server? If so, why? Are the numbers hurting, or the income?

2. Is there an in-between option for the restarts, maybe a better cleanup script, or is that not really an issue?

3. Maybe decrease the ping kick limit to make it cleaner? Or is that a bad idea, population-wise?

30 minutes ago, Sugarfoot said:

Alright, follow-up:

1. So is the reason more or less an attempt to keep players on the server? If so, why? Are the numbers hurting, or the income?

2. Is there an in-between option for the restarts, maybe a better cleanup script, or is that not really an issue?

3. Maybe decrease the ping kick limit to make it cleaner? Or is that a bad idea, population-wise?

1.  No, but full servers tend to stay full. Hard restarts necessarily boot everyone off the server, so hard restarting every other time is where we'd like to be, worst case.

2.  We're looking to clean things up better, but Arma seems to leave things in memory that should be cleaned up when a player leaves the server.  

3.  Ping is at 350 now.  That seems to be pretty decent for people in the EU and NA.

1 hour ago, bamf said:

1.  No, but full servers tend to stay full. Hard restarts necessarily boot everyone off the server, so hard restarting every other time is where we'd like to be, worst case.

2.  We're looking to clean things up better, but Arma seems to leave things in memory that should be cleaned up when a player leaves the server.  

3.  Ping is at 350 now.  That seems to be pretty decent for people in the EU and NA.

Now, I don't mean this disrespectfully in any way, as I do believe Asylum is the best thing to happen to Arma 3.

With that being said, I've noticed this:

A lot of popular servers ONLY do hard restarts, and it doesn't seem to affect the player base. Maybe do only hard restarts instead of soft ones on one server and see not only the performance changes but also the opinion of the people on that server. Maybe people won't actually care and it'll run better. Maybe it won't.

Again, I don't play anymore, but I still appreciate it when people take their time to improve a game that's free for everyone to enjoy.

Edited by Sugarfoot
21 hours ago, bamf said:

1.  No, but full servers tend to stay full. Hard restarts necessarily boot everyone off the server, so hard restarting every other time is where we'd like to be, worst case.

2.  We're looking to clean things up better, but Arma seems to leave things in memory that should be cleaned up when a player leaves the server.  

3.  Ping is at 350 now.  That seems to be pretty decent for people in the EU and NA.

Yesterday I was playing with a bunch of friends on server 4. We got a 10-minute restart message because the server seemed to be starting to kick people off (yes, it was full). After the hard restart, it was full again within a few minutes (5 or 10). Then it kicked everyone. Everybody came back. A few minutes passed and we all got kicked again. Then the server got hard restarted again (1-minute restart message). That didn't seem to solve the problem, because moments later the same thing happened. That's when we all decided to take a break.

If the problem is a memory leak, shouldn't it take more than a few minutes after a hard restart to show up? Or is the memory leak related to the number of people connected to a single server? Are you guys sure it's not about server infrastructure?

@bamf

Edited by BlackShot
1 hour ago, Crasher2003 said:

All lies, those numbers are wayyyyy higher now. Lmao, I've been kicked 6 times in the past hour at least.

I can vouch for this. I mainly play P4, and today there were at least 5 kicks that booted 65+ people off. I believe one kicked almost 90 off at one point.

:FeelsBadMan:

On 12/29/2016 at 3:06 AM, Sugarfoot said:

Now, I don't mean this disrespectfully in any way, as I do believe Asylum is the best thing to happen to Arma 3.

With that being said, I've noticed this:

A lot of popular servers ONLY do hard restarts, and it doesn't seem to affect the player base. Maybe do only hard restarts instead of soft ones on one server and see not only the performance changes but also the opinion of the people on that server. Maybe people won't actually care and it'll run better. Maybe it won't.

Again, I don't play anymore, but I still appreciate it when people take their time to improve a game that's free for everyone to enjoy.

Please just try it @bamf

On 12/14/2016 at 8:27 PM, bamf said:

Here are the numbers of kicks we have had since 12/9/16 (with the count per server, S1-S4, in parentheses):

9th:  9 (5, 2, 2, 0)
10th: 10 (3, 2, 1, 4)
11th: 11 (4, 1, 2, 4)
12th: 1 (S4)
13th: 1 (S1)
14th: 1 (S1)

The 9th-11th are the weekend, and I think we can all agree that the weekends are where we see more kicks. I have a theory as to why that is the case, and I'll get to that in a minute, but let's take a look at the numbers first. I hear all the time that the mass kicks happen constantly, and while they are certainly prevalent on the weekends, I do think that the perception of the number of kicks greatly outweighs the reality. Three weeks ago I would have agreed with you that the kicks were more frequent: on just S4 there were 16 kicks in a single Saturday. We have made incremental improvements since then, though, so let's talk about what we have done and what we are continuing to do.

17 hours ago, Crasher2003 said:

All lies, those numbers are wayyyyy higher now. Lmao, I've been kicked 6 times in the past hour at least.

Well, he never gave the numbers/statistics for the future.......

Soooooooooo, I wouldn't really say they were lies........


39 minutes ago, henky said:

@bamf any new updates you can share with us ?

Sure.  The new change requiring you to "Show Storage" in your houses has had a very positive impact on overall memory performance.  Over the course of this past weekend, the number of BE mass kicks was down significantly.  Server 4 in particular has been much more stable since the update.  

As I've said all along, one mass kick is too many for me - so we will continue to do things to try to bring overall memory utilization down.  We have a few other things in mind that should be completely on the back end, but could be enormously successful in bringing down what the server executable stores in memory.  Also, we are hopeful that the 64-bit server application will alleviate the need to do more as well, but I'm sure it will have additional issues that we will need to resolve.  
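The post doesn't describe how the "Show Storage" change works internally. Assuming it means a house's storage contents are only loaded when the owner asks for them, rather than being held for every house at all times, the general lazy-loading idea looks like this minimal sketch; `House`, `load_storage`, and `hide_storage` are hypothetical names, not the mission's actual code.

```python
class House:
    """A player house whose storage is only held in memory on demand."""

    def __init__(self, house_id, load_storage):
        self.house_id = house_id
        self._load_storage = load_storage  # hypothetical database fetch
        self._storage = None               # nothing loaded at startup

    @property
    def storage_loaded(self):
        return self._storage is not None

    def show_storage(self):
        # Load from the database only the first time it is requested.
        if self._storage is None:
            self._storage = self._load_storage(self.house_id)
        return self._storage

    def hide_storage(self):
        # Drop the in-memory copy; the database still holds the contents.
        self._storage = None
```

Under that assumption, only houses whose storage has actually been opened hold their contents in memory, instead of every house on the map.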

1 minute ago, bamf said:

Sure.  The new change requiring you to "Show Storage" in your houses has had a very positive impact on overall memory performance.  Over the course of this past weekend, the number of BE mass kicks was down significantly.  Server 4 in particular has been much more stable since the update.  

As I've said all along, one mass kick is too many for me - so we will continue to do things to try to bring overall memory utilization down.  We have a few other things in mind that should be completely on the back end, but could be enormously successful in bringing down what the server executable stores in memory.  Also, we are hopeful that the 64-bit server application will alleviate the need to do more as well, but I'm sure it will have additional issues that we will need to resolve.  

I'm sorry to be late to the troubleshooting, but before you do more things on the backend:

Have you tried turning it off and on again? 

1 hour ago, bamf said:

Sure.  The new change requiring you to "Show Storage" in your houses has had a very positive impact on overall memory performance.  Over the course of this past weekend, the number of BE mass kicks was down significantly.  Server 4 in particular has been much more stable since the update.  

As I've said all along, one mass kick is too many for me - so we will continue to do things to try to bring overall memory utilization down.  We have a few other things in mind that should be completely on the back end, but could be enormously successful in bringing down what the server executable stores in memory.  Also, we are hopeful that the 64-bit server application will alleviate the need to do more as well, but I'm sure it will have additional issues that we will need to resolve.  

Let's just start over...

Everyone will come back and play and there will be fights all day.

Erase the code and delete it and go back to 1.0

This topic is now closed to further replies.