
Solved High GPU stress causing black screen crash

Status
Not open for further replies.
I've been trying to troubleshoot this issue on my own, but I'm at my wits' end and hoping someone here can help. Fingers crossed!

I built my rig about 9 months ago, and things have mostly gone smoothly ... except for very occasional crashes to a black screen when running high-performance games. In the past, I'd uninstall & update drivers, physically reinstall the GPU, reboot, and the problem would mysteriously seem to go away for a while. But in the last week, this has happened more and more, and now nothing seems to fix it.

Most recently, I've removed the GPU, completely uninstalled the drivers with DDU, reinstalled the card, updated to the current Nvidia drivers, and run Windows Update.

I can reproduce the issue by either running high-end games (Cyberpunk, Red Dead 2, AC Odyssey) or by stressing the GPU with 3DMark or FurMark. When this happens, the screen goes black, all the fans ramp up, and everything freezes until I hard reboot, at which point the system will run fine ... until the graphics are stressed again.

Other things to note:

I've run HWiNFO64 to look at temps, and nothing has ever jumped off the scales; it's a well-ventilated system. I've been very meticulous in building the machine; everything is connected well, it's clean, dust-free, and every component was brand new when I put it together.

My first thought was that this could be a power issue, but I think my 850-watt gold PSU is enough, especially since I'm actually not overclocking my i9-10850K at the moment. (Maybe not though?)
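A rough way to sanity-check that thought is to add up nominal power figures for the main parts. The sketch below uses 125 W for the i9-10850K (Intel's rated TDP) and 220 W for a reference RTX 3070. The 250 W short-burst CPU figure, the 75 W allowance for everything else, and the 2x GPU transient multiplier are assumptions for illustration only, not measurements of this system.

```python
# Rough PSU headroom estimate. All figures are nominal or assumed, not measured.
PSU_WATTS = 850

CPU_TDP_W = 125          # Intel's rated TDP for the i9-10850K
CPU_BURST_W = 250        # assumption: short-term boost power limit
GPU_BOARD_POWER_W = 220  # reference RTX 3070 board power; a Zotac card may differ
REST_OF_SYSTEM_W = 75    # assumption: drives, RAM, fans, pump, RGB
GPU_SPIKE_FACTOR = 2.0   # assumption: brief transient spikes above rated power


def headroom(psu_w, *loads_w):
    """Return the watts left over after subtracting every load from the PSU rating."""
    return psu_w - sum(loads_w)


sustained = headroom(PSU_WATTS, CPU_TDP_W, GPU_BOARD_POWER_W, REST_OF_SYSTEM_W)
worst_case = headroom(
    PSU_WATTS, CPU_BURST_W, GPU_BOARD_POWER_W * GPU_SPIKE_FACTOR, REST_OF_SYSTEM_W
)

print(f"Sustained headroom:  {sustained:.0f} W")
print(f"Worst-case headroom: {worst_case:.0f} W")
```

On paper the total stays under 850 W even with these pessimistic assumptions, but a paper budget only covers the PSU's rating; it cannot rule out a unit that is faulty.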

Looking at the Event Viewer, it seems the error "Faulting application name: dwm.exe..." occurs several times right before every crash. Googling that error message led me here to this forum. So here I am! Does anyone have any ideas? Components are listed below along with a Speccy snapshot and MiniToolBox file from right after the last crash. Thanks in advance! :)


i9-10850K
ZOTAC GeForce RTX 3070
ASUS Prime Z490-A
Corsair Vengeance RGB PRO - 32GB DDR4 3600
Corsair RMX Series, 850 Watt, 80+ Gold

Samsung 970 Evo - 500GB M.2/PCIe (for the OS)
Mushkin Enhanced Pilot-E - 2TB 2280 M.2/PCIe
Intel 660p M.2 2280 - 1TB PCIe 3.0 x4
Crucial BX500 - 2TB 3D SATA

Cooler Master MasterLiquid ML240L RGB v2
Cooler Master SickleFlow 120mm V2 ARGB

 
As a starting point;

You have the wrong RAM for your CPU. Intel states here that it supports up to 2933MHz, and if you have XMP enabled the RAM will get auto OCd past what the CPU can handle and the PC will start to have issues.

Go into the BIOS, disable XMP and then manually set the RAM to run at 2933MHz and the voltage to 1.35V.


Power Profile
Active power scheme: High performance

Change the Windows Power Plan to Balanced. The Ultimate Performance and High performance plans are a form of overclocking that is known to cause stability and overheating issues; they should only be used for gaming type notebooks that have a discrete GPU that needs the extra power.

Make the above changes, fully shut down the computer, restart and test by using the computer as you normally would then post back with an update for us.
 
phillpower2, thank you sooo much! I cannot believe I missed this somehow when I was picking the parts for my build.

And yes, I believe I did enable XMP when I originally set things up because I wanted to get the speeds up to 3600MHz (even though that's too fast for my mobo, apparently). I'm going to do both of these suggestions this evening and run a few more stress tests, and I'll report back afterward. Thank you again! :)
 
I believe I did enable XMP when I originally set things up because I wanted to get the speeds up to 3600mhz (even though that's too fast for my mobo, apparently)

Not too fast for the MB but for the CPU, and unless you closely read the MB specs before making the purchase you would not know. The board specs say it is compatible with RAM up to 4800MHz (OC) but then go on to say the below;

* 10th Gen Intel® Core™i9/i7 CPUs support 2933/2800/2666/2400/2133 natively, Refer to www.asus.com for the Memory QVL (Qualified Vendors Lists).

Not quite sure how they got to the 4800MHz (OC) because as far as I am aware the fastest RAM that any Intel CPU can handle is an 11th gen i9 that maxes out at 3200MHz.

You are welcome btw :)
 
Not too fast for the MB but the CPU and unless you closely read the MB specs before making the purchase you would not know
Ah ok, well that would explain why I probably went ahead and got the 3600 RAM when I was collecting parts. I have a feeling I would have caught a pretty obvious problem like my mobo not being able to handle that speed, but I might not have read the super fine print.

Anywayz....

Unfortunately, the changes you suggested didn't fix the black screen crash. :( I actually first set my BIOS to Default so there were no other possible tweaks getting in the way, then I rebooted and went back to set it to Manual, 2933MHz, and 1.35 volts. I also changed the active power scheme to Balanced. The crashes happened several times in a row when I was running a 3DMark test. What do you think I should try now? (I updated the MTB report and Speccy snapshot.) Thanks!

 
We need to leave Speccy for now.

Unfortunately, the changes you suggested didn't fix the black screen crash.

Software such as Windows can crash, and when it does you get a BSOD and, when enabled, a crash dmp is generated. Programs or games can on occasion crash and close to the desktop, but the computer will still be 100% functional.

Hardware failures such as a weak power supply and/or overheating are not software related. When a computer suddenly turns off, freezes or the screen goes black etc, the behaviour should be described as the "computer shut down unexpectedly" or "froze" etc and not as having crashed, as the latter implies a software issue as opposed to an obvious hardware issue.

Having the correct info means that helpers will not be looking for a software issue when the problem is clearly hardware related.

What happened exactly, and how were you able to get back into Windows?

What is suggested;

To rule out any possible bad settings, restore the MBs default factory settings in the BIOS, save the settings, exit the BIOS, restart the PC then do the below for us;

Download MiniToolBox and save the file to the Desktop.

Close the browser and run the tool, check the following options;

List last 10 Event Viewer Errors
List Installed Programs
List Devices (Only Problems)
List Users, Partitions and Memory size

Click on Go.

Post the resulting log in your next reply for us if you will.
 
Sorry if this is my fault here, but it doesn't appear that the MTB file shows up in the post even though I'm attaching it to my reply. And strangely, when I try to cut & paste the text in the message box itself, I get the message "Oops! We ran into some problems. Please try again later. More error details may be in the browser console." Do I need privileges to either post attachments or make long posts?
 
Oh I'm sorry about that! I guess I didn't quite understand what you were asking. :confused: Given what's been happening with my system, I suppose I should just say the "computer shuts down unexpectedly" as opposed to "crashes" since I'm not quite sure of the culprit yet.

Here's exactly what happens: When I play a high-performance game or I test the GPU using Furmark or 3DMark, the screen will go completely black and the fans spin at their maximum. It will stay like this until I hard reboot it by pressing the power button. But once it reboots, the system appears to be completely normal; it boots into Windows 11 just fine, and everything seems to work ... that is, until the GPU is stressed again (with a game or GPU benchmark tool).

As I said in my original post, this used to happen infrequently when I built it months ago, but now it happens every time. The temps of all the components seem to be within the normal range according to HWiNFO, and the GPU never gets above ~68 C.

Hope that makes sense and answers your question better. Here is the MiniToolBox report, which I ran after I completely reset the BIOS and then manually changed the RAM speed and voltage as you suggested earlier.

Thanks in advance for the help! :) Apologies again for the confusion.
 
Classic signs of something overheating, and looking at the MTB log there is nothing to suggest otherwise: there are no problem devices listed, and the Windows errors that we can see would not cause a black screen and sudden shutdown.

Have you made sure that crash dmps are enabled on the computer? See info here for configuring small memory dmps.

We need to check the temps and voltages;

Download Speedfan and install it. Once it's installed, run the program and post here the information it shows. The information I want you to post is the stuff that is circled in the example picture I have attached, but don't worry if it does not display the same.

speedfan.png


So that we have a comparison to Speedfan, download, run and grab a screenshot of HWMonitor (free).

To capture and post a screenshot;

Press the ALT key + PRT SCR key (it's on the top row, right hand side).
Click on Start, All Programs, Accessories, Paint.
Left click in the white area and press CTRL + V.
Click on File, then Save, and save it to your desktop, naming it something related to the screen you're capturing.
BE SURE TO SAVE IT AS A .JPG, otherwise it may be too big to upload.
After typing in any response you have, click on Upload a File to add the screenshot.

Screenshot instructions are provided to assist those that may read this topic but are not yet aware of the “how to”.
 
Have you made sure that crash dmps are enabled on the computer? See info here for configuring small memory dmps.

It looks like these crashes aren't actually creating dump files. Windows is configured correctly to do this, and there are a few dump files from several weeks/months ago. But none are being written as a result of the black screen shutdowns I've been having recently. Could this be further evidence that it's likely a hardware problem?

Download Speedfan and install it. Once it's installed, run the program and post here the information it shows. The information I want you to post is the stuff that is circled in the example picture I have attached, but don't worry if it does not display the same.

SpeedFan seems to be missing the fan speed section and the voltage section. I'm not sure why though, especially since it looks like it's recognizing all of the components in my system. (According to the text box right above "CPU usage", it's discovered all the hardware.)

Speedfan.jpg


So that we have a comparison to Speedfan, download, run and grab a screenshot of HWMonitor (free).

The HWMonitor shows everything, I believe. (Here's a text file of the exported monitoring data in case there are other #'s you need.) I'm definitely not the expert here, but I think those numbers look more or less okay, no? I wish I could get these readings when I run a GPU stress test, but it shuts down pretty quickly before I can tell how much the system is being taxed.

HWMonitor 1.jpg
HWMonitor 2.jpg


I can tell you, however, that several months ago when things were working smoothly, the temp readings of the major components seemed pretty normal. I used HWiNFO64 when I was testing the stability of the system. Nothing went off the charts, the GPU got to about 70 C, and then fans would kick in and cool things down. When the system wasn't doing much besides running a browser (basically as it is in these screenshots), the GPU would hover between ~40-50 C. But alas, that was back when it was working well and I could monitor things while pushing the performance. :(
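One possible way to capture readings right up to the moment of shutdown is to log GPU telemetry to disk once per second and force every row out of the write cache. The sketch below is only an illustration: it assumes `nvidia-smi` (installed with the Nvidia driver) is on the PATH, and the file name `gpu_log.csv`, the chosen fields, and the function names are all hypothetical.

```python
import csv
import os
import subprocess
import time

# Fields requested from nvidia-smi; the order here defines the CSV column order.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "clocks.gr", "utilization.gpu"]


def parse_sample(line):
    """Split one 'csv,noheader,nounits' line from nvidia-smi into a field dict."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_gpu():
    """Query the first GPU once via nvidia-smi (assumed to be on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits", "--id=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one sample per interval, forcing each row to disk immediately."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        while True:
            writer.writerow(read_gpu())
            f.flush()
            os.fsync(f.fileno())  # ask the OS to write the row out now
            time.sleep(interval_s)
```

Calling `log_forever()` in a separate window before starting 3DMark or FurMark should leave the last temperature, power draw, and clock readings in the CSV after the hard reboot. The very last row may still be lost if power cuts mid-write.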

Let me know what you think!
 
As per my replies #6 and #12, Rufus;

Software such as Windows can crash, and when it does you get a BSOD and, when enabled, a crash dmp is generated. Programs or games can on occasion crash and close to the desktop, but the computer will still be 100% functional.

Hardware failures such as a weak power supply and/or overheating are not software related. When a computer suddenly turns off, freezes or the screen goes black etc, the behaviour should be described as the "computer shut down unexpectedly" or "froze" etc and not as having crashed, as the latter implies a software issue as opposed to an obvious hardware issue.

Having the correct info means that helpers will not be looking for a software issue when the problem is clearly hardware related.

Classic signs of something overheating, and looking at the MTB log there is nothing to suggest otherwise: there are no problem devices listed, and the Windows errors that we can see would not cause a black screen and sudden shutdown.

No worries about Speedfan, we just get what we can from it, which is why I said don't worry if it does not display the same. HWMonitor still gets updated whereas Speedfan does not, I'm afraid.

Could be something and nothing but HWMonitor shows that the maximum speed of the CPU cooling fan is/was 663rpm, check the maximum CPU fan speed setting in the BIOS along with the thermal shutdown setting.

Everything else in both programs looks ok.
 
Could be something and nothing but HWMonitor shows that the maximum speed of the CPU cooling fan is/was 663rpm, check the maximum CPU fan speed setting in the BIOS along with the thermal shutdown setting.

I think the max CPU fan was low in the last post because I had basically just powered the system up and was only using Chrome for a few minutes. But here's a snapshot of the system after doing several more demanding tasks. I'm pretty sure the fans get going once there's a demand.

1.jpg


Anyway, so what you're saying is that it's probably something overheating (as opposed to a weak power supply), correct? And considering this only happens when the GPU is stressed, is it safe to say the video card is likely the problem? I didn't mention this before, but I've run both CPU and RAM stress tests with Cinebench & MemTest86 several times, and there were never any obvious issues.

Also, is there a way to drill this down further with other diagnostic tools/methods? Obviously, I'm not able to monitor the temps or voltages at the moment since the system will always shut down when the GPU is pushed. But could I, for example, under-volt the GPU and see if there's a breaking point perhaps? (I'm probably out of my depth here, btw.)
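One hedged way to look for such a breaking point, short of true under-volting, is to cap the card's power limit in steps and rerun the stress test at each step. Note that this swaps in power limiting for under-volting. `nvidia-smi -pl` sets a power limit in watts, but it needs administrator rights and may not be supported on every GeForce card or driver. The 220 W and 130 W bounds and the 30 W step below are placeholders; the card's real range should come from `nvidia-smi -q -d POWER`. The helper name `power_limit_steps` is hypothetical.

```python
def power_limit_steps(max_w, min_w, step_w):
    """Return power limits from max_w down to min_w (inclusive), step_w apart."""
    if step_w <= 0 or min_w > max_w:
        raise ValueError("need step_w > 0 and min_w <= max_w")
    steps = list(range(max_w, min_w - 1, -step_w))
    if steps[-1] != min_w:
        steps.append(min_w)  # always finish on the lowest allowed limit
    return steps


# Placeholder bounds; read the card's real min/max with: nvidia-smi -q -d POWER
for watts in power_limit_steps(max_w=220, min_w=130, step_w=30):
    # Printed rather than executed, so each step is applied deliberately.
    print(f"nvidia-smi -pl {watts}   # then rerun the stress test")
```

If the shutdowns stop below some limit, that would point toward power delivery under load, though it would not by itself say whether the card or the PSU is at fault.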

Last thing: I have this old GTX 1660 from my previous build, and as of two years ago, it worked just fine. If I swapped that into the current system and everything ran well during 3DMark & FurMark tests, would that almost certainly mean my current RTX 3070 from Zotac is faulty? Maybe I'm just being optimistic, but I feel like defective hardware is a lot less common than people think, but I dunno, perhaps I should face that possibility ... even though I got it brand new directly from Zotac, and it hasn't really gotten a ton of use. :(
 
CPU fan speeds ruled out, but did you also check what the thermal shut down setting was in the BIOS?

Swapping in the GTX 1660 is an idea but being that it draws a lot less power than the RTX 3070 any outcome would not be conclusive, try it and see how it goes in any event.

Also, is there a way to drill this down further with other diagnostic tools/methods?

We are nearly out of options and you are fast approaching the need for a local tech to do a proper bench test.

Maybe I'm just being optimistic, but I feel like defective hardware is a lot less common than people think,

Sorry but you are so far off the mark. The mass producers from the east have flooded the world over the years with not only sub standard goods but also out and out counterfeit items, and this is not just small items like PSUs and RAM but whole computers made to look exactly the same, just with no Dell or HP etc logo on them.
 
CPU fan speeds ruled out, but did you also check what the thermal shut down setting was in the BIOS?

I'm not 100% sure if this is what I should be looking for (see pic below), but for my ASUS mobo it seems the "Maximum CPU Core Temperature" is set to "Auto," as are a few other related settings on this page. There doesn't seem to be an actual "thermal shut down" setting though, at least not that I'm seeing.

Except for the RAM tweak I made above, all the BIOS settings are at their defaults, but it doesn't say what that threshold currently is. Is this something I should adjust, do you think? Shouldn't I be looking for GPU-related settings since it seems the CPU stress tests don't cause the shutdowns? If I do need to adjust this, what should I set it to?

bios.jpg


Swapping in the GTX 1660 is an idea but being that it draws a lot less power than the RTX 3070 any outcome would not be conclusive, try it and see how it goes in any event.

Well, it seems the old GTX 1660 works just fine in this system. I ran 3DMark several times and Furmark for 15+ minutes, and although the benchmarks/scores were worse than the newer RTX 3070 of course, no shutdowns happened and everything ran smoothly. But as you said, this may not be telling since it pulls less power. :/

Sorry but you are so far off the mark. The mass producers from the east have flooded the world over the years with not only sub standard goods but also out and out counterfeit items, and this is not just small items like PSUs and RAM but whole computers made to look exactly the same, just with no Dell or HP etc logo on them.

Sigh. I guess I was just hoping there'd be something I was overlooking ... rather than have to face the possibility of needing to find a new GPU during this never-ending shortage. :(
 
That's not it I'm afraid. You are looking for something that tells you at what temperature the MB will shut down to protect the CPU from frying; check under the Advanced tab. This is just something to tick off on the check list, and restoring the MB's default settings would make sure that the CPU will shut down at a safe temperature.

The cause of the black screen could still be either the GPU or the PSU, so you are at the stage now where you need to either get the PC tested by a local tech using their PSU and your RTX 3070, or RMA the GPU, which was not released until Oct 2020 and so should still be covered by the standard Zotac 2 year warranty.
 