I no longer (maybe I never did?) understand my #linux system.
I have 12GB of RAM; roughly 6-8GB shows as used, over time ~4GB is counted as cache, and there are also roughly 6GB of swap that stay entirely free. Still, my system has been dying from OOM more and more often recently 👀
I already increased the memory pressure close to the maximum, so the cache should be dropped as soon as memory is needed, but nope. Still OOM. Help?
Oh! Which kernel version are you on?
@null0x0 I'm now running 5.7.10-201.fc32.x86_64 (after a reboot).
The failing kernel was 5.6.19-300.fc32.x86_64
We'll see what happens I guess :/
Oh, sure it might or might not cause this it's just a hunch.
run top to see which task blows up. Most likely a memory leak in one running program, or a miner trojan?
@sheogorath dmesg should tell you which process is OOMing. You can also use cgroups in creative ways to limit individual processes from gobbling all memory. Or add swap and keep an eye on the system monitor, but a memory leak will gobble all that too.
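For example (the grep pattern and the 2G cap are just illustrations, and the `systemd-run` line assumes a systemd-based distro):

```shell
# List recent OOM-killer events from the kernel log
# (may need root or kernel.dmesg_restrict=0 on some distros)
dmesg -T 2>/dev/null | grep -iE 'killed process|out of memory' || echo "no OOM events logged"

# To cap a single app's memory via cgroups instead (example limit):
#   systemd-run --user --scope -p MemoryMax=2G element-desktop
```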
@simon That's the thing, it's not consistent. It's element-desktop, Atom, firefox, evolution, … and in worst case gnome-session itself (which obviously takes everything else with it).
Nothing explicitly yells "memory-leak" when I look at the dumped memory usage after a process was killed.
@sheogorath not a fix for your issue but you can protect against gnome session being killed by oom killer by assigning an appropriate oom score value https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9
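The gist of that article, sketched here on the current shell so it's harmless to try (the gnome-session line at the end is the actual use case and needs root):

```shell
# oom_score_adj ranges from -1000 (never kill) to 1000 (kill first).
# Raising it needs no privileges; lowering it below 0 needs root.
cat /proc/$$/oom_score_adj          # current value for this shell
echo 500 > /proc/$$/oom_score_adj   # volunteer this process as an OOM victim
cat /proc/$$/oom_score_adj          # now 500

# To shield gnome-session instead (needs root; pgrep usage is an example):
#   echo -1000 | sudo tee /proc/$(pgrep -o gnome-session)/oom_score_adj
```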
If a memory leak or a trojan is filling memory and swap too fast, swapspace may buy you time while you investigate. It dynamically allocates swap files: it creates them when needed and deletes them once they're freed.
@sheogorath When you're getting OOM'd, what does dmesg show was killed?
@trini Depending on its mood: Firefox, Atom, Element, Evolution, gnome-software, packagekitd, and sometimes gnome-session itself (which is the most annoying).
And when I look at the process table dump that comes along with the oom kill, nothing is sticking out as a specific memory abuser.
@sheogorath The first thing would be: when it's picked that process, is it at least using a fair bit of memory? Was total system usage closer to 12G? The second thing would be to go back to the default memory pressure settings. Or, if the tweak has helped make this less of a problem, can you point me at what you've changed so I can get on the same level? Thanks!
@trini I mean the machine is running around with roughly 6-8GB used and 4GB considered cache/buffer but usually there are also a few hundred MB considered free.
I just decided to undo the memory pressure adjustments (vfs_cache_pressure). (When I set them, I tried various values ranging from 120 to 200, but it seems like it didn't solve anything.)
@trini oh and as mentioned the ~6GB swap usually stay pretty much untouched (I didn't adjust the swappiness setting at all).
@sheogorath So in modern times I think "buffer/cache" is a bad way for info to be grouped. cache can be flushed in many cases but buffers are less clearly immediately reclaimable. swap only matters if there's something that's not demanding usage. So, what can we do? One thing you can try is changing overcommit policy. https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/ seems a reasonable write-up at first glance and depending on how easy your problem is to trigger, you can experiment and see if that helps.
@trini Oh I actually came across that already and even wrote a quick TIL article based on it. For a while I used overcommit_ratio of 95 (I also tested 75 and 80) and an overcommit_memory setting of 2. That sadly caused more OOM crashes, not less.
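For reference, the settings being discussed (the `echo` lines mirror the values from the experiment above and need root):

```shell
# Kernel defaults: overcommit_memory=0 (heuristic), overcommit_ratio=50
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

# Strict accounting as tested above (needs root):
#   echo 2  | sudo tee /proc/sys/vm/overcommit_memory
#   echo 95 | sudo tee /proc/sys/vm/overcommit_ratio

# In mode 2, comparing CommitLimit vs Committed_AS shows the headroom:
grep -i commit /proc/meminfo
```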
Maybe that's Linux telling me that I just need to get more memory for my machine 😐
@sheogorath Ah, well damn, I was hoping that going that route would cause applications to get "sorry, no memory" and fail themselves before triggering OOM, which would in turn make it easier to see what's really killing the system. But what happens if you use the default ratio, or even lower? The goal I have in mind is to see what's asking for so much memory and in turn get it to fail directly rather than getting the last-resort OOM killer involved.
@trini Oh yes, they fail by themselves then, but that's also not really consistent. It actually caused more crashes of GNOME, but Firefox and Atom also died or refused to start. (Especially with the original overcommit_ratio of 50.) The OOM killer stopped being on a rampage by then, though.
So while the kernel was more relaxed it didn't really stabilize the system :D
Strange thing: applications reported memory allocation errors even when a few hundred MB were still around :/
@sheogorath Ah, see I would call that progress. If you keep the ratio at 50 or 75, and don't start up firefox or atom, can you keep the other going reliably?
@trini Would need to do some long-term tests. (In general this kind of unreliability appears after a few days of use.) But I'll see: the next time it happens, I'll leave both Atom and Firefox closed for a few hours. (Usually once it has run out of memory, there is more to come :D)
Maybe this will give you some information in which direction to search.
Your swap partition is too big, but it might be difficult to shrink it. You can lower the "swappiness" value from the default of 60 to something lower, like 5. Do a little research on how to do this.
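To make a lower value stick across reboots, a sysctl drop-in along these lines should work (the file name and the value 5 are just the example from above):

```
# /etc/sysctl.d/99-swappiness.conf
vm.swappiness = 5
```

Apply it without rebooting via `sudo sysctl --system`.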
I hope this helps.
@sheogorath Do you use Firefox?
@n8chz Yes, I do. Is there a known problem around?
@sheogorath Certainly in the past Firefox has had a reputation for memory leaks. Supposedly they've fixed it. Does your memory use look dramatically better when not using FF?