Minecraft Server Monitoring: How I Keep My World Online After the Build Event Disaster
Last summer I organized a community build event on our Minecraft SMP. About forty people showed up. We had voice chat going on Discord, everyone was hyped, and the build theme was "floating islands." Two hours in, right when someone was finishing this wild redstone contraption suspended over the void, the server crashed. No warning, no slowdown, just gone.
I didn't find out for almost two hours. I'd stepped away to make food, came back to Discord blowing up with "is the server dead?" messages. Half the builders had already left. A few lost progress because the last save hadn't flushed. The crash log pointed to a Paper plugin that had a memory leak nobody caught.
That was the night I started taking server monitoring seriously. Not "I'll check on it when I remember" monitoring. Real, automated, "text me at 3 AM if something breaks" monitoring. If you run a Minecraft server for any number of people, you probably need this too.
Why Minecraft Servers Crash (And Why You're Usually the Last to Know)
Here's the thing about Minecraft servers. They're not like a website where Apache or Nginx handles requests and you move on. A Minecraft server is a living simulation. It's tracking every block, every entity, every chunk, every player inventory, every redstone circuit, every hopper chain, all in real time. The number of things that can go wrong is honestly kind of staggering.
In my years running servers, here are the crashes I've personally dealt with:
- Paper plugins conflicting after an update (the build event disaster)
- A player building an iron farm so massive it tanked TPS to single digits
- Running out of RAM because I allocated 4GB and loaded too many chunks
- World corruption after a hard crash during chunk generation
- My VPS host doing "emergency maintenance" at peak hours with zero notice
- A DDoS attack from someone who got banned and took it personally
- Forgetting to update the server JAR and getting hit by an exploit
Every single one of these happened while I was either asleep, at work, or AFK. I found out from Discord messages. Every. Single. Time.
The worst part isn't the crash itself. It's the gap between the crash and when you find out. Two hours of downtime during a build event. Four hours overnight where nobody's online to notice but the server is sitting there dead, not saving backups, not running scheduled tasks. That gap is what monitoring eliminates.
What Minecraft Server Monitoring Actually Means
When I say "monitoring," I don't mean opening your server list in the Minecraft client and refreshing it every ten minutes. That's what I used to do, and it's as unreliable as it sounds.
Real monitoring is an external service that checks your server at regular intervals and alerts you when something is wrong. The key word is external. Something completely separate from your server, running on different hardware, in a different location. Because if your server goes down, any monitoring that lives on that same server goes down with it.
What good monitoring checks
A simple ICMP ping to your server IP isn't enough. Your VPS can respond to pings perfectly fine while the Minecraft Java process has crashed and nobody can connect. I learned this the hard way when my monitoring showed 100% uptime for a week and my players told me the server had been down three times.
You need monitoring that checks what actually matters:
- The game port (25565 for Java, 19132 for Bedrock) is accepting connections
- The server responds to a Minecraft protocol handshake, not just a generic TCP connection
- Response time is reasonable (a server that takes 10 seconds to respond is effectively down)
- The check handles brief network glitches without firing false alerts
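To make the "protocol handshake, not just a TCP connection" point concrete, here is a minimal sketch of a Java Edition status ping using only the Python standard library. It is an illustration of the publicly documented Server List Ping exchange, not how UptyBots or any other service implements its checks. The function names (`encode_varint`, `build_handshake`, `query_status`) and the 5-second timeout are my own choices for the example.

```python
import json
import socket
import struct
import time


def encode_varint(value: int) -> bytes:
    """Encode an int as a Minecraft protocol VarInt (32-bit, two's complement)."""
    value &= 0xFFFFFFFF  # lets -1 encode as the 5-byte form instead of looping forever
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(sock: socket.socket) -> int:
    """Read one VarInt from the socket, one byte at a time."""
    result = 0
    for shift in range(0, 35, 7):
        data = sock.recv(1)
        if not data:
            raise ConnectionError("connection closed while reading VarInt")
        result |= (data[0] & 0x7F) << shift
        if not data[0] & 0x80:
            return result
    raise ValueError("VarInt is longer than 5 bytes")


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Keep reading until exactly `count` bytes have arrived."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-packet")
        buf.extend(chunk)
    return bytes(buf)


def build_handshake(host: str, port: int, protocol_version: int = -1) -> bytes:
    """Handshake packet asking to enter the 'status' state.

    -1 is the conventional protocol version for "I'm only pinging".
    """
    host_bytes = host.encode("utf-8")
    body = (
        encode_varint(0x00)                       # packet id: handshake
        + encode_varint(protocol_version)
        + encode_varint(len(host_bytes)) + host_bytes
        + struct.pack(">H", port)                 # unsigned short, big-endian
        + encode_varint(1)                        # next state: 1 = status
    )
    return encode_varint(len(body)) + body


def query_status(host: str, port: int = 25565, timeout: float = 5.0):
    """Return (status_dict, seconds_elapsed); raises on timeout or bad reply."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Handshake, then an empty status request (length 1, packet id 0).
        sock.sendall(build_handshake(host, port) + b"\x01\x00")
        read_varint(sock)                         # total packet length (unused)
        if read_varint(sock) != 0x00:
            raise ValueError("unexpected packet id in status response")
        payload = recv_exact(sock, read_varint(sock))
    return json.loads(payload.decode("utf-8")), time.monotonic() - start
```

A check built on this would treat any exception as a failed probe and would also compare the elapsed time against a threshold, since a reply that takes 10 seconds is effectively an outage.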
Java vs Bedrock: Two Different Monitoring Challenges
If you run both Java and Bedrock (maybe through GeyserMC or separate instances), you need to understand that these are fundamentally different from a monitoring perspective.
Java Edition uses TCP on port 25565 by default. This is straightforward to monitor. A TCP port check will confirm the port is open and accepting connections, which catches a dead Java process, though only a protocol-level status ping proves the game itself is answering. UptyBots can hit that port every minute and alert you the moment it stops responding.
Bedrock Edition uses UDP on port 19132. UDP monitoring is trickier because UDP is connectionless. There's no transport-level handshake to verify, so an open port and a dead port can look identical unless the checker sends a packet the server will actually answer. Some monitoring tools can't even check UDP ports properly. You need a service that actually understands how to probe UDP services.
I run both on my server through GeyserMC, and I've got separate monitors for each. There have been times where Java was fine but Bedrock players couldn't connect because the GeyserMC bridge had silently died. Without separate monitors, I'd have never caught that.
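For the curious, this is roughly what "probing a UDP service" means for Bedrock. The sketch below sends a RakNet unconnected ping and parses the pong, using only the standard library. It is an illustration under assumptions, not the probe any particular monitoring service uses: the function names are mine, and the field order in the pong string (edition, MOTD, protocol, version, players online, max players) follows community protocol documentation and may gain extra trailing fields between versions.

```python
import os
import socket
import struct
import time

# Fixed 16-byte marker that RakNet expects in offline (unconnected) packets.
RAKNET_MAGIC = bytes.fromhex("00ffff00fefefefefdfdfdfd12345678")


def build_unconnected_ping(client_guid: int, timestamp_ms: int) -> bytes:
    """Packet 0x01: timestamp, magic, then a client GUID (both 8 bytes)."""
    return (
        b"\x01"
        + struct.pack(">Q", timestamp_ms)
        + RAKNET_MAGIC
        + struct.pack(">Q", client_guid)
    )


def parse_unconnected_pong(data: bytes) -> dict:
    """Packet 0x1c: timestamp, server GUID, magic, then a ';'-separated string."""
    if len(data) < 35 or data[0] != 0x1C or data[17:33] != RAKNET_MAGIC:
        raise ValueError("not a RakNet unconnected pong")
    (name_len,) = struct.unpack(">H", data[33:35])
    fields = data[35:35 + name_len].decode("utf-8", errors="replace").split(";")
    # Assumed field order; zip() silently ignores any extra trailing fields.
    keys = ["edition", "motd", "protocol", "version", "players_online", "players_max"]
    return dict(zip(keys, fields))


def ping_bedrock(host: str, port: int = 19132, timeout: float = 3.0) -> dict:
    """Send one ping and wait for one pong; raises on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        guid = int.from_bytes(os.urandom(8), "big")
        ping = build_unconnected_ping(guid, int(time.time() * 1000))
        sock.sendto(ping, (host, port))
        data, _addr = sock.recvfrom(2048)
    return parse_unconnected_pong(data)
```

Because UDP gives no connection error, silence is the failure signal here: a timeout on `recvfrom` is what "Bedrock is down" looks like, which is why this needs its own monitor next to the Java one.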
The Plugin Problem: Paper, Spigot, and the Art of Breaking Things
Let me tell you about plugins, because they're responsible for probably 70% of the crashes I've dealt with.
I run Paper (switched from Spigot about two years ago for the performance improvements). Paper is great. But the plugin ecosystem is a minefield. You've got plugins written by one person three years ago that haven't been updated since 1.18. You've got plugins that work fine alone but conflict with each other in ways nobody predicted. You've got plugins that slowly leak memory until your server falls over after 72 hours of uptime.
That build event crash? It was a permissions plugin that had been updated the week before. The update introduced a memory leak that got worse with every additional player online. With our usual 10-15 players, it was slow enough that nothing visibly broke. The moment we hit 40 for the event, RAM usage climbed steadily until the JVM ran out of heap space and died.
Here's what monitoring would have caught, and eventually did catch once I set it up:
- Port checks would have detected the crash within 60 seconds
- I would have gotten a Telegram notification on my phone immediately
- The crash log would have pointed me to the OutOfMemoryError
- I could have restarted the server and investigated the plugin later
- Total downtime: 2-3 minutes instead of 2 hours
TPS Drops: When Your Server Is "Up" But Unplayable
TPS. Ticks per second. The heartbeat of your Minecraft server. A healthy server runs at 20 TPS. Below 18, you start noticing lag. Below 15, it's rough. Below 10, the game is basically unplayable. Mobs freeze, blocks take forever to break, items don't drop, and your players start asking what's wrong.
The tricky part: your server is still "up" during all of this. Port checks pass. HTTP checks (if you've got a web panel) return 200. But the gameplay experience is terrible.
I've had TPS drops caused by:
- A player who loaded 200+ chunks by flying across the map with elytra
- Somebody's automated sorting system with 300 hoppers all running at once
- A mob farm that accumulated 2000+ entities in a single chunk
- A Spigot-era plugin that ran synchronous database queries on the main thread
- World pre-generation running while players were online
External monitoring can't directly check TPS (that's internal to the game engine). But you can set up a clever workaround. Expose a simple HTTP endpoint on your server that reports TPS (a plugin like Plan tracks it, and panels like Pterodactyl expose resource stats such as memory and CPU through their API). Then monitor that endpoint. If TPS drops below your threshold, you get alerted.
Combined with port monitoring for actual crashes, you end up with pretty solid coverage of both "server is dead" and "server is alive but suffering" scenarios.
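As a rough sketch of the TPS workaround, the snippet below polls an HTTP endpoint and maps the reading onto the thresholds described above (18, 15, 10). Everything about the endpoint is an assumption for illustration: the URL is whatever your own setup exposes, and the JSON shape `{"tps": 19.8}` is hypothetical, not the actual response format of Plan or Pterodactyl.

```python
import json
import urllib.request


def classify_tps(tps: float) -> str:
    """Map a TPS reading onto the rough bands used in this article."""
    if tps < 10:
        return "unplayable"
    if tps < 15:
        return "rough"
    if tps < 18:
        return "laggy"
    return "healthy"


def fetch_tps(url: str, timeout: float = 5.0) -> float:
    """Fetch TPS from a (hypothetical) JSON endpoint shaped like {"tps": 19.8}."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return float(payload["tps"])


def check_tps(url: str) -> str:
    """Return a status string; an unreachable endpoint counts as its own state."""
    try:
        return classify_tps(fetch_tps(url))
    except (OSError, ValueError, KeyError):
        return "endpoint-unreachable"
```

Treating "endpoint unreachable" separately from "low TPS" matters in practice: the first usually means a crash that the port monitor will also report, while the second is the "up but suffering" case that only this check can see.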
Memory Leaks: The Silent Killer
My server runs on a VPS with 8GB of RAM, 6GB allocated to the JVM. On a fresh restart, it uses about 2.5GB. Over time, that number climbs. With well-behaved plugins, it stabilizes around 4-4.5GB and the garbage collector keeps things manageable.
With badly-behaved plugins, it just keeps climbing. 5GB. 5.5GB. 6GB. Then the garbage collector starts thrashing, TPS drops, and eventually you get the OutOfMemoryError and the server crashes.
This is exactly what happened during the build event. The leak was slow enough that with normal player counts, the server would crash after about 4 days. Nobody noticed because I was restarting the server every day or two anyway for updates. But during the event, with 40 players generating more objects in memory, it hit the ceiling in about 2 hours.
After that incident, I added two things to my monitoring setup:
- A port monitor on 25565 that checks every minute (catches crashes)
- An HTTP check against a Plan analytics endpoint that includes memory usage (catches the slow climb before it becomes a crash)
Now I can see memory trending upward over days and do a clean restart before it becomes a problem. Proactive beats reactive every time.
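"Seeing memory trend upward" can be turned into a number. The sketch below fits a straight line through a handful of (hours since restart, heap GB) samples and estimates how many hours remain before the line reaches the heap limit. It is a deliberately simple least-squares fit and an assumption about how you might do this yourself. Real heap usage is a sawtooth because of garbage collection, so sampling after a GC cycle, or sampling daily peaks, gives a cleaner signal.

```python
from typing import List, Optional, Tuple


def hours_until_limit(
    samples: List[Tuple[float, float]], limit_gb: float
) -> Optional[float]:
    """Estimate hours until memory reaches limit_gb.

    samples: (hours_since_restart, heap_used_gb) pairs, oldest first.
    Returns None when there is no upward trend to extrapolate.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None  # all samples taken at the same time
    slope = sum((t - mean_t) * (m - mean_m) for t, m in samples) / denom
    if slope <= 0:
        return None  # flat or falling: nothing to worry about
    intercept = mean_m - slope * mean_t
    hit_time = (limit_gb - intercept) / slope
    # Time remaining, measured from the most recent sample.
    return max(0.0, hit_time - samples[-1][0])


if __name__ == "__main__":
    # Illustrative numbers only: 2.5 GB at restart, +1 GB per day, 6 GB heap.
    readings = [(0.0, 2.5), (24.0, 3.5), (48.0, 4.5)]
    print(hours_until_limit(readings, limit_gb=6.0))
```

With an estimate like this, "restart before it becomes a problem" becomes a rule you can alert on, for example whenever the projected time to the limit drops below a day.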
Setting Up Monitoring: What I Actually Did
After the build event disaster, here's exactly how I set up monitoring. This took me about 15 minutes.
- Signed up for UptyBots. Free tier was enough to start with. No credit card needed.
- Added a TCP port monitor. Pointed it at my server IP on port 25565 (Java). Set checks to every 1 minute.
- Added a second port monitor for Bedrock. Port 19132. Same 1-minute interval.
- Set up notifications. I added a Discord webhook to our admin channel and a Telegram notification to my personal account. Discord for the team, Telegram for when I'm away from the PC.
- Tested it. I stopped the server process manually and waited. Within 90 seconds I had alerts on both Discord and Telegram. Started the server back up, got recovery notifications. Perfect.
- Added an HTTP monitor. I run Pterodactyl, so I added a check against the panel's status API endpoint to catch cases where the VPS is up but the panel (and by extension the game server) has issues.
- Added more monitors over time. Our voting site, the Dynmap page, and the BungeeCord proxy all got their own monitors as I realized each one could fail independently.
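If you are wondering what those two notification channels look like under the hood, here is a minimal sketch of the HTTP requests involved, in case you ever want a homemade script to post to the same places. A hosted monitoring service does this for you; this is not its implementation. The webhook URL, bot token, and chat ID are placeholders you would supply, and the `User-Agent` string is an arbitrary value I made up (Discord has been known to reject the default Python one).

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def discord_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Discord webhooks accept a JSON body with a 'content' field."""
    body = json.dumps({"content": message}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "mc-monitor-sketch/0.1",  # arbitrary placeholder value
    }
    return urllib.request.Request(webhook_url, data=body, headers=headers, method="POST")


def telegram_request(bot_token: str, chat_id: str, message: str) -> urllib.request.Request:
    """Telegram Bot API: POST form fields to the sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": message}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_all(requests: Iterable[urllib.request.Request], timeout: float = 10.0) -> List[bool]:
    """Try every channel independently so one dead channel can't block the rest."""
    results = []
    for req in requests:
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results.append(200 <= resp.status < 300)
        except OSError:  # covers URLError, HTTPError, and timeouts
            results.append(False)
    return results
```

Sending to each channel in its own try block is the point of the design: if the Telegram token has expired, the Discord message still goes out, and the `False` in the result list tells you which channel needs fixing.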
Beyond Port Checks: What Else You Should Monitor
Once I got the basics working, I kept adding layers. Here's what my monitoring looks like now:
- Port 25565 (Java) and 19132 (Bedrock). The essentials. Checks every minute.
- Server panel HTTP endpoint. Catches panel crashes and underlying issues.
- Voting site. If the vote page is down, you lose server list rankings. Separate monitor.
- Dynmap. Players love the live map. When it crashes (and it does crash separately from the main server), I want to know.
- BungeeCord/Velocity proxy. If you run a network with multiple servers, the proxy is the single point of failure. Monitor that port separately.
- Donation store. Runs on a different platform (Tebex). Separate monitor. If it's down, you're losing revenue.
Each one of these has failed independently of the others at some point. If I only monitored the main game port, I'd miss half the problems.
Common Crashes and What Your Logs Will Tell You
When monitoring catches a crash, you need to know what happened so you can fix it. Here are the crashes I see most often, and what to look for in your logs:
- OutOfMemoryError. The server ran out of heap space. Look for which plugin or system was allocating memory before the crash. Increase your -Xmx if you have headroom, or find the leak.
- Crash on startup after update. You updated a plugin or the server JAR and now it won't start. The crash report will name the offending class. Roll back the update.
- Chunk load failure. World corruption, usually after a hard crash during chunk writing. The log will show the region file and chunk coordinates. Restore from backup or delete the region file (you'll lose those chunks).
- Watchdog kill. Paper/Spigot's watchdog detected the server was frozen (no ticks for 60+ seconds) and killed it. The thread dump in the log shows what was happening when it froze. Usually a plugin doing something blocking on the main thread.
- Connection refused. The Java process isn't running. Check if it crashed (look for crash reports) or if it was killed by the OS (check dmesg for OOM killer).
- TPS death spiral. Server is technically running but processing ticks so slowly it's unresponsive. Usually entity counts, poorly optimized plugins, or insufficient hardware. The timings report (or a spark profile on newer Paper builds, where spark has replaced Timings) will show you where the time is going.
- Network issues on the host. Server process is fine but players can't reach it. Monitoring from an external location catches this instantly because it can't reach the port either.
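Two of the cases above are easy to tell apart mechanically: the JVM running out of heap versus the operating system killing the process. The sketch below is a small triage helper under stated assumptions. It only looks for the `java.lang.OutOfMemoryError` class name in the server log and the Linux kernel's "Out of memory: Killed process" line in `dmesg` output. Exact wording varies by kernel and server version, so treat both patterns as starting points rather than a complete classifier.

```python
import re

# The JVM exception class name mentioned in the list above.
JVM_OOM = re.compile(r"java\.lang\.OutOfMemoryError")

# Kernel OOM-killer line; older kernels say "Kill process", newer "Killed process".
KERNEL_OOM = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def triage(server_log: str, kernel_log: str = "") -> str:
    """Return a coarse label for why the Java process is gone."""
    if JVM_OOM.search(server_log):
        # Heap exhausted inside the JVM: look for a leak or raise -Xmx.
        return "jvm-heap-exhausted"
    match = KERNEL_OOM.search(kernel_log)
    if match and match.group(2) == "java":
        # The host ran out of RAM and the kernel picked the server to kill.
        return "killed-by-oom-killer"
    return "unknown"
```

The distinction changes the fix. A JVM heap error points at a plugin leak or a too-small `-Xmx`, while an OOM-killer hit usually means the JVM was given more memory than the VPS can actually spare.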
Real Situations From My Server (And How Monitoring Helped)
These all happened after I set up monitoring. The difference in response time compared to the pre-monitoring era was night and day.
- Friday night plugin crash. We had about 25 players online. An economy plugin threw an unhandled exception during a shop transaction. Server crashed. I got a Telegram alert within 60 seconds, SSH'd in from my phone, restarted the server. Total downtime: about 3 minutes. Before monitoring, this would have been a "someone pings me in Discord 20 minutes later" situation.
- Slow memory leak over four days. A new chat plugin was leaking memory. My HTTP monitor against the Plan endpoint showed memory usage climbing steadily. I did a clean restart at day 3, investigated the plugin, found it was caching message histories without ever clearing them. Removed the plugin before it caused a crash. Without the trend data from monitoring, I'd have just experienced a random crash on day 4.
- Hosting network outage. My VPS provider had a routing issue that made the server unreachable from Europe. North American players were fine. UptyBots's multi-location checks caught it immediately. I couldn't fix the network, but I could post in Discord that it was a host issue and that European players should use a VPN temporarily. Communication matters even when you can't fix the root cause.
- BungeeCord proxy crash. The proxy went down but the backend servers stayed up. Without a separate monitor on the proxy port, I'd have thought everything was fine because the Minecraft port check on the backend was passing. The proxy monitor caught it in under a minute.
- Map corruption from a mod conflict. I'd added a new world gen mod that conflicted with existing chunks. The server crashed every time a player entered certain chunks. Monitoring caught each crash, I saw the pattern in the logs (always the same region file), and removed the problematic region data. Fixed within 20 minutes of the first crash.
Who Actually Needs This?
Honestly, if more than two people depend on your server being online, you need monitoring. That includes:
- Private SMPs with 5-10 friends. Your friend group's weekend gaming session shouldn't be ruined because the server crashed at noon and nobody noticed until evening.
- Public survival or hardcore servers. Players expect 24/7 availability. Downtime means they go play somewhere else.
- PvP and minigame networks. These often run BungeeCord or Velocity with multiple backend servers. Each one can fail independently.
- SkyBlock, Anarchy, Lifesteal servers. Heavy plugin use means more crash potential.
- Modded servers (Forge, Fabric, NeoForge). Mods are even less stable than plugins in my experience. More moving parts, more things that break.
The free tier on UptyBots covers small servers. There's genuinely no reason not to set it up. It takes less time to configure monitoring than it takes to deal with one undetected crash.
Best Practices I've Learned the Hard Way
- Monitor the game port, not just the host. Your VPS responding to ping means nothing if the Minecraft process is dead.
- Use retries before alerting. A single failed check can be a network blip. Two or three consecutive failures from multiple locations means something is actually wrong. Configure your alerts with confirmation requirements so you're not waking up at 3 AM for a false positive.
- Back up before you update plugins. Most of my crashes happen within 48 hours of a plugin update. Always have a rollback plan.
- Set up multiple notification channels. Discord webhook for the admin team, Telegram or email for you personally. Redundancy matters because if your notification method is down, you're back to the old days of not knowing.
- Restart your server on a schedule. Even with good plugins, a weekly restart keeps memory clean. I do mine at 5 AM on Monday when nobody's online.
- Track your uptime over time. UptyBots gives you uptime history and response time graphs. After a month, you'll see patterns. Maybe your server is always slow on Saturday evenings (peak players). Maybe it crashes every Tuesday (coincidence with a cron job). The data tells you things you'd never notice otherwise.
- Test your alerts. Stop your server on purpose once a month and make sure you actually get the notification. I've had a Telegram bot token expire on me and didn't realize until I needed it. Never again.
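The "use retries before alerting" practice boils down to a small state machine: count consecutive failures, alert once when the count crosses a threshold, and send a single recovery notice when checks pass again. Here is a minimal sketch. The class name `AlertGate` and the default of three failures are my own choices for illustration, and a hosted service will have its own version of this built in.

```python
from typing import Optional


class AlertGate:
    """Suppress alerts until N consecutive checks have failed."""

    def __init__(self, failures_to_alert: int = 3) -> None:
        self.failures_to_alert = failures_to_alert
        self.consecutive_failures = 0
        self.alerting = False

    def record(self, check_ok: bool) -> Optional[str]:
        """Feed in one check result; returns 'down', 'recovered', or None."""
        if check_ok:
            self.consecutive_failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"  # one recovery notice per incident
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.failures_to_alert:
            self.alerting = True
            return "down"  # fires once, not on every later failure
        return None


if __name__ == "__main__":
    gate = AlertGate(failures_to_alert=3)
    for ok in [True, False, True, False, False, False, False, True]:
        event = gate.record(ok)
        if event:
            print(event)
```

With 1-minute checks and a threshold of three, a single dropped packet never wakes you up, while a real crash still reaches your phone within about three minutes. That is the trade-off to tune.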
Your Server Website Matters Too
If you run a server website (vote page, rules, store, live map), its uptime affects your server list rankings and revenue. Some Minecraft server lists check whether your website is up when ranking you. A website that's frequently offline can drop in rankings, which means fewer new players finding your server.
Add an HTTP monitor for your website in addition to the game port monitors. It takes two minutes and covers a completely separate failure mode. I've had my game server running perfectly while my website was down because of an expired SSL certificate. Monitoring caught it. Without it, I'd have lost server list votes for days.
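The expired-certificate failure is one you can see coming. As a minimal sketch using Python's standard `ssl` module, the snippet below reads a site's certificate and computes how many days remain before it expires. The function names and the hostname in the usage note are placeholders, and this is an illustration rather than how any monitoring service performs its certificate checks.

```python
import socket
import ssl
import time


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a cert's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect over TLS and return the certificate's notAfter field.

    If the certificate has already expired, the handshake itself raises
    ssl.SSLCertVerificationError, which is also a useful alert signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

A daily job could call `days_left(fetch_not_after("example.com"), time.time())` with your own hostname in place of `example.com` and warn when the result drops below something like 14, which leaves plenty of time to renew before players ever see a browser warning.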
Frequently Asked Questions
What is the default Minecraft Java port?
TCP 25565. Custom servers may use other ports.
What is the default Minecraft Bedrock port?
UDP 19132. Bedrock uses UDP, not TCP.
How often should I check my Minecraft server?
Every 1-5 minutes for active community servers. I use 1-minute checks because the difference between "caught it in 1 minute" and "caught it in 5 minutes" is four minutes of players trying to connect and failing.
Can monitoring catch plugin-specific issues?
External port monitoring catches full crashes caused by plugins. For plugin-specific issues that don't crash the server (like a command not working), you need internal logging or a health endpoint that checks plugin status. I combine both.
Is monitoring worth it for small private servers?
Yes. I started with a 5-person SMP. Even then, having the server crash at 2 PM and nobody noticing until 8 PM was frustrating. The free tier on UptyBots handles small servers perfectly. Set it up once and forget about it until you need it.
What I'd Tell Past Me
If I could go back to the night before that build event, I'd tell myself: spend 15 minutes setting up monitoring. That's it. Fifteen minutes of configuration would have saved two hours of downtime, a bunch of lost builds, and a very angry Discord server.
Your Minecraft server is going to crash eventually. Plugins break. RAM fills up. Hosts have outages. The question isn't whether it will happen, it's whether you'll know about it in 60 seconds or 60 minutes. UptyBots makes it 60 seconds. And the difference between those two numbers is the difference between your community trusting you and your community finding a new server.
Set up monitoring before your next build event. You'll thank yourself.