Hey, Sponge community!
A few days ago, after successfully moving from Spigot to SpongeForge and reopening my modest, almost-vanilla server, I ran into an annoying problem. Roughly 3 to 6 hours of intensive gameplay after each scheduled restart, the server starts to freeze. The console says “Can’t keep up …”. I understand that this means the server could not finish its work in time, but that confuses me even more: TPS always stays at 20, and RAM usage never comes close to its limit.
I suspect it has something to do with high CPU load (it jumps to a huge value for a second or two during the lag, then immediately drops again), though I can’t think of anything that would cause this.
- Sometimes the spikes follow a pattern (one every 15 seconds); sometimes they don’t.
- CPU load is extreme during the lag (500-700%), versus 50-70% without lag, about ten times higher.
- Entities and TileEntities take up most of the server tick; nothing else stands out.
- Because of the world size there are clearly a lot of entities (as the timings reports show), but removing them is not an option, since players wouldn’t be happy about it.
- The server typically falls 40 to 80 ticks behind (2-4 seconds at 20 ticks per second).
- “TPS Loss” does occur.
- I could not reproduce this on any copy of the server, so I think it is somehow connected to the player count.
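A steady 20 TPS reading and a multi-second freeze are not necessarily contradictory. A minimal sketch, assuming a simplified fixed-timestep loop rather than Forge's actual code: a tick is due every 50 ms, missed ticks are replayed back-to-back, and if the loop falls more than a hypothetical 2000 ms behind, it counts a "Can't keep up"-style warning and drops the oldest missed ticks. It simulates one minute of time (nothing really sleeps) with and without a single 3-second stall; `TickLoopModel`, `simulate` and all tick costs are illustrative assumptions.

```java
import java.util.Locale;

// Simplified model of a fixed-timestep server loop, run in simulated time
// (nothing actually sleeps). All numbers are assumptions for illustration,
// not values taken from Forge or Sponge source.
final class TickLoopModel {
    static final long TICK_MS = 50;          // 20 ticks per second target
    static final long MAX_BACKLOG_MS = 2000; // hypothetical catch-up limit

    static final class Result {
        final long ticks;    // ticks completed inside the window
        final int warnings;  // times the backlog limit was hit ("Can't keep up")
        Result(long ticks, int warnings) { this.ticks = ticks; this.warnings = warnings; }
    }

    /**
     * Simulates windowMs of wall-clock time. Each tick costs normalCostMs,
     * except the first tick starting at or after stallAtMs, which costs stallMs.
     */
    static Result simulate(long windowMs, long normalCostMs, long stallAtMs, long stallMs) {
        long now = 0;         // simulated wall clock
        long nextTickAt = 0;  // when the next tick is due
        long ticks = 0;
        int warnings = 0;
        boolean stallDone = false;

        while (true) {
            if (now < nextTickAt) {
                now = nextTickAt;              // idle until the next tick is due
            }
            if (now >= windowMs) {
                break;
            }
            if (now - nextTickAt > MAX_BACKLOG_MS) {
                // Too far behind: drop the oldest missed ticks instead of replaying all of them.
                warnings++;
                nextTickAt = now - MAX_BACKLOG_MS;
            }
            long cost = normalCostMs;
            if (!stallDone && now >= stallAtMs) {
                cost = stallMs;                // the single slow tick
                stallDone = true;
            }
            now += cost;
            ticks++;
            nextTickAt += TICK_MS;             // while behind, ticks run back-to-back
        }
        return new Result(ticks, warnings);
    }

    public static void main(String[] args) {
        long window = 60_000; // one minute
        Result calm = simulate(window, 10, Long.MAX_VALUE, 0);
        Result spike = simulate(window, 10, 30_000, 3_000);
        for (Result r : new Result[] {calm, spike}) {
            System.out.println(String.format(Locale.ROOT,
                    "%d ticks, %.2f TPS average, %d warning(s)",
                    r.ticks, r.ticks * 1000.0 / window, r.warnings));
        }
        // prints:
        // 1200 ticks, 20.00 TPS average, 0 warning(s)
        // 1181 ticks, 19.68 TPS average, 1 warning(s)
    }
}
```

Under these assumptions the 3-second stall costs only 19 ticks, so the one-minute average stays near 19.7 TPS, which a rounded or longer-window TPS counter could easily report as 20.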
What I’ve tried so far (obviously, none of that helped):
- Lowering entity spawns / trigger distances.
- Lowering the TileEntity tick rate.
- Disabling entity spawns completely.
- Removing suspicious plugins.
- Tweaking launch options and lowering / raising the -Xms / -Xmx values. This is what I ended up with:
-Xms5G -Xmx5G -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=100 -XX:+DisableExplicitGC -XX:TargetSurvivorRatio=90 -XX:G1NewSizePercent=50 -XX:G1MaxNewSizePercent=80 -XX:InitiatingHeapOccupancyPercent=10 -XX:G1MixedGCLiveThresholdPercent=50 -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+UseLargePagesInMetaspace -jar forge-1.12.2-188.8.131.5207-universal.jar
- Going through Timings / WarmRoast reports (as I said, nothing stood out besides entities, but I could have missed something).
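One cause the list above does not rule out is garbage collection: G1 does part of its work on several threads in parallel, so a long collection could plausibly show up as several hundred percent CPU for a second or two (a hypothesis, not something the reports here confirm). A minimal sketch for checking whether the spikes line up with GC activity, using the standard `java.lang.management` API. It reads the MXBeans of the JVM it runs in, so to describe the server it would have to be started inside the server process (for example from a plugin, which is an assumption here); `GcWatch` and `start` are hypothetical names, not part of Sponge or Forge.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.LocalTime;

// Minimal GC sampler: every interval, print how many collections ran and how
// much collection time the JVM reports since the previous sample.
// It only observes the JVM it runs in.
final class GcWatch {
    /** Total collections across all collectors (collectors reporting -1 are skipped). */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Accumulated collection time in ms across all collectors (-1 values are skipped). */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) total += time;
        }
        return total;
    }

    /** Starts a daemon thread that logs GC activity once per intervalMs. */
    static Thread start(long intervalMs) {
        Thread t = new Thread(() -> {
            long prevCount = totalGcCount();
            long prevTime = totalGcTimeMillis();
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(intervalMs);
                    long count = totalGcCount();
                    long time = totalGcTimeMillis();
                    if (count != prevCount) { // only log intervals that had GC activity
                        System.out.printf("%s GC: +%d collection(s), +%d ms%n",
                                LocalTime.now(), count - prevCount, time - prevTime);
                    }
                    prevCount = count;
                    prevTime = time;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "gc-watch");
        t.setDaemon(true); // never keeps the JVM alive on shutdown
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        start(1000);
        // Demo only: allocate short-lived garbage so there is GC activity to report.
        for (int i = 0; i < 50; i++) {
            byte[][] junk = new byte[1000][];
            for (int j = 0; j < junk.length; j++) junk[j] = new byte[10_000];
            Thread.sleep(100);
        }
    }
}
```

If the logged timestamps match the lag spikes, GC is worth a closer look; if not, it can probably be set aside. On Java 8 a comparable record is also available without any code by adding `-Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps` to the launch options.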
So the lag itself is easy to describe, but I can’t find the exact cause, which is why I’m asking for your help.
https://timings.aikar.co/?id=c493d83adc4c4feaacdc8207ab824867 (~16h uptime)
All-threads dump (.html file): http://dropmefiles.com/VsqeC (~16h uptime)
Thanks for your attention; I hope we can find a solution soon.