Troubleshooting Ambiguous Errors

  • tomargent75
    PCHF Member
    • Jul 2023
    • 8

    #1

    Hey all, first post, but I’m at my wit’s end. First off, here’s my build:
    AMD Ryzen 9 7950X
    ASUS ROG Strix X670E-F Gaming Wifi
    2 sticks of Kingston Technology Fury Beast 16GB 6000MT/s DDR5 CL36 AMD Expo RAM
    ASUS GeForce RTX Nvidia 4070 Ti TUF Gaming
    Corsair RMX Series RM1000x, 1000W Gold Power Supply
    Samsung 980 Pro SSD 1TB PCIe NVMe Gen 4 M.2 for OS and odds and ends storage
    Samsung 980 Pro SSD 2TB PCIe NVMe Gen 4 M.2 for programs and multimedia storage

    I’m going to be as thorough as possible with my troubleshooting; if you’d prefer to avoid the long read, a summary will be provided at the end.
    Ran fine for 4 months, then suddenly, three days ago, every time I loaded into Diablo 4 it would crash to desktop without giving a reason. Figured it was probably a Diablo thing, so I loaded up another game. Played for around 6 minutes, boom, crash to desktop. Assumed a GPU driver issue, so I booted up Nvidia’s GeForce Experience and tried to update drivers, but the download kept failing, saying either that I had no hard drive space (I have 600GB left on the 1TB, which is my C: drive) or that the download was corrupted. I decided to try a manual install, but that failed too. Thinking it could be a hard drive issue at this point, I tried downloading something other than drivers to both drives, and both installs were successful (and all of my existing files were still accessible, which in my experience means it most likely isn’t a hard drive problem).
    At this point in my research, I realized that Chrome had been giving me a lot of “Aw snap” pages during my browsing, which I had ignored at first as Chrome being Chrome, but I started to think they might be an indicator of the true issue. It was around this time, while I was browsing, that I had my first reboot. No errors, nothing, just a black screen and a reboot. Loaded back in, started digging some more, and this time, pretty quickly afterwards, I got a BSOD with no error code to dig into.
    With the quick crash, I began to suspect overheating, so I gave it a break and booted back in later. After monitoring for a good twenty to thirty minutes across a range of non-gaming activities, it never got above 70 degrees Celsius under processing-intensive loads, and it idled at around 34 degrees. Overheating was out of the question, to my mind.
    I began to dig around Device Manager and realized that my GPU had an error flag next to it. I uninstalled the drivers and GeForce Experience and reinstalled them (successfully this time) and the error cleared up, so I tried booting back into Diablo 4, which promptly crashed again with no error log, but not before I noticed a strange warbling effect on the right side of the screen. I decided to boot up an EA game to see if maybe it’d give a different error message, which it finally did, crashing mere moments after I loaded in. The message read something along the lines of “Crash reason GPUDeviceRemoved.” Must be the GPU, right? But I realized that, for a good portion of the crashes, the GPU in use was the onboard graphics, not the Nvidia card, due to the errors I fixed with the fresh driver installs. So I’m blanking. It could be RAM; a good majority of the issues with the crashes, reboots, etc. seem like RAM issues, but then the GPU is acting squirrelly too. It could be the entire mobo, too, with how widespread the issues are. The final things I tested were Cinebench, which ran for a good while before crashing, and the Windows Memory Diagnostic tool, which didn’t finish either of the two times I ran it.

    Currently, my gaming PC has been relegated to the role of glorified Netflix machine, and even that crashes from time to time (not to desktop; the Chrome page crashes and I have to refresh to make it continue).
    Side Note: A week before these issues popped up, I cleaned my PC by blowing the dust out. To make sure I hadn’t somehow wiggled or blown something loose, I reseated my components and tested things again, to no avail.

    TL;DR
    -Chrome tabs throwing errors after being open for a bit (YouTube usually throws “Aw Snap” error on initial load too)
    -Intensive Games crash almost as soon as I get beyond the opening menus (one game gave a “GPUDeviceRemoved” error)
    -Random Reboots (When the GPU was active and inactive, due to an error in device manager that forced the onboard graphics to become active)
    -BSODs without error codes
    -Cinebench not finishing its tests
    -Windows Mem Diagnostic Tool not finishing its tests before crashes
    -The system is not overheating, according to several separate monitors I ran
    -Couldn’t update Nvidia Drivers

    If I think of anything else, or if anyone has more questions, I’ll add to this. I’m just trying to get to the root of the problem. Any ideas on what to troubleshoot, what the issue sounds like, etc.? Like I said, I’m stumped…
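
The symptoms listed above (downloads reported as corrupted, browser tabs crashing, a memory diagnostic that cannot finish) are the kind of thing a memory test is designed to catch. As a rough illustration only, here is a minimal Python sketch of the write-then-verify idea behind such tests. The function name `pattern_check` and its parameters are made up for this example, and a user-space script like this cannot choose which physical RAM it touches, so it is not a substitute for a bootable memory tester or the Windows diagnostic mentioned in the post.

```python
def pattern_check(block_size: int = 1 << 20, blocks: int = 64, passes: int = 2) -> int:
    """Fill buffers with known bit patterns, read them back, and count mismatches.

    Hypothetical helper for illustration: a result of 0 does NOT prove the RAM
    is healthy, because the OS decides which physical memory backs the buffers.
    """
    # Classic test patterns: all zeros, all ones, and alternating bits.
    patterns = (0x00, 0xFF, 0xAA, 0x55)
    mismatches = 0
    for _ in range(passes):
        for value in patterns:
            expected = bytes([value]) * block_size
            # Allocate several independent copies so more memory gets exercised.
            buffers = [bytearray(expected) for _ in range(blocks)]
            # Read each copy back and compare it with what was written.
            mismatches += sum(1 for buf in buffers if buf != expected)
    return mismatches


if __name__ == "__main__":
    print("mismatched buffers:", pattern_check())
```

Any non-zero count from a check like this would be a strong hint to move on to a proper memory test; a zero count says very little either way.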
  • veeg
    PCHF Director
    • Jul 2016
    • 8982

    #2
    Hello

    Assuming it will run long enough for this.

    Download Speccy and post a snapshot: Download Speccy | Find your computer specs, free!

    How to post it: CCleaner Support Community

    • tomargent75
      PCHF Member
      • Jul 2023
      • 8

      #3
      Alrighty. I’ll upload the snapshot when I get back home to my pc

      • xrobwx71
        PCHF Moderator
        • Mar 2023
        • 1067

        #4
        After following @veeg’s instructions:

        I have used DDU Display Driver Uninstaller to fix some serious issues with video drivers (Nvidia or AMD) and some audio (Realtek) drivers in the past. What it does is remove every remnant of the current drivers and gives you a completely clean slate regarding video or audio drivers.

        Some direction from the DDU site:

        Recommended usage

        - You MUST disconnect your internet or completely block Windows Update when running DDU until you have re-installed your new drivers. Remember to turn Updates back on!
        - DDU should be used when having a problem uninstalling/installing a driver or when switching the GPU brand.
        - DDU should not be used every time you install a new driver unless you know what you are doing.
        - DDU will not work on a network drive. Please install in a local drive (C:, D: or else).
        - The tool can be used in normal mode but for absolute stability when using DDU, safe mode is always the best.
        - If you are using DDU in normal mode, clean, reboot, clean again, reboot.
        - Make a backup or a system restore
        - It is best to exclude the DDU folder completely from any security software to avoid issues.

        • tomargent75
          PCHF Member
          • Jul 2023
          • 8

          #5

          • tomargent75
            PCHF Member
            • Jul 2023
            • 8

            #6
            Thanks for the input, but unfortunately this didn’t work. I ran through the process, installed from the clean slate (manually) and, while I got a bit further than I had been getting, I still got this crash error less than a minute into a game:
            [Attachment 12365: image of the crash error]

            • Bruce
              PCHF Moderator
              • Oct 2017
              • 10702

              #7
              while this may not be the problem at all, some things from the Speccy report…

              your RAM is not showing up, what is the make/model?
              uninstall BitDefender. Windows Defender will automatically kick in and you can reload BitDefender later.
              disconnect the 1TB Apricorn E:\ drive. it’s showing a SMART error.
              and to isolate a potential hardware fault - run the rig on only one drive, one memory stick, and remove the GPU and use the onboard graphics ports.
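
The isolation approach in the last suggestion (strip the system down, then test one part at a time) can be written down as a simple loop. The sketch below is only a model of that procedure: `find_suspects` and the `is_stable_with` callback are hypothetical names, and in practice the "test" is a person swapping a part and watching whether the machine crashes.

```python
from typing import Callable, Iterable, List


def find_suspects(parts: Iterable[str], is_stable_with: Callable[[str], bool]) -> List[str]:
    """Test each part on its own and return the ones the system was unstable with."""
    suspects = []
    for part in parts:
        # Run the stripped-down system with only this part installed.
        if not is_stable_with(part):
            suspects.append(part)
    return suspects


if __name__ == "__main__":
    # Made-up observations for illustration: True means the system stayed stable.
    observed = {"stick A": True, "stick B": False}
    print(find_suspects(observed, lambda part: observed[part]))
```

Testing one part at a time costs more reboots than testing in groups, but each result points at exactly one component, which matters when several symptoms overlap.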

              • tomargent75
                PCHF Member
                • Jul 2023
                • 8

                #8
                RAM = Kingston Technology Fury Beast 32GB (2x16GB) 6000MT/s DDR5 CL36 Desktop Memory Kit of 2 | AMD Expo | Plug N Play | KF560C36BBEK2-32, Black (New) [I went back and just copied the Amazon sale name from my email for exact info]

                Crossing my fingers it isn’t hardware.
                I’ll gut her here in a minute and report any findings/if anything changes.

                • tomargent75
                  PCHF Member
                  • Jul 2023
                  • 8

                  #9
                  Oh, finally got the Chrome crash again:

                  • tomargent75
                    PCHF Member
                    • Jul 2023
                    • 8

                    #10
                    Welp, took the RAM out and slotted one back in. Yellow LED. RAM issue according to the Strix’s manual on the Q-LEDs. Went ahead and slotted the second one in too, and still got the Yellow.
                    So, now I can’t boot up.
                    Got a pretty gnarly lightning storm going on, so gonna wait and mess with the seating a bit afterwards to see if I just didn’t get them seated properly, somehow (The case I have is pretty compact and awkward to move around in, though the air flow is pretty great). I just made sure they were lined up and pushed till they clicked, as per usual, but maybe they didn’t get all the way in.
                    If not, I think they may both be dead. Luckily I have a warranty on the pair.

                    • tomargent75
                      PCHF Member
                      • Jul 2023
                      • 8

                      #11
                      Finally got it to boot again. The second stick of RAM was the issue, it seems. Thanks for the suggestions. I booted up Diablo with no issues, cycled through some Steam games, browsed Chrome, and have yet to have another issue. Luckily my RAM is under warranty, so I can hopefully get a replacement stick.
                      Any ideas why it went bad in less than 6 months’ time? Or was it most likely just a faulty stick to begin with?

                      • veeg
                        PCHF Director
                        • Jul 2016
                        • 8982

                        #12
                        It seems to be a bad stick of RAM from the manufacturer.

                        • Bruce
                          PCHF Moderator
                          • Oct 2017
                          • 10702

                          #13
                          sometimes there is no rhyme or reason - that’s why they do warranty - it’s even out of their hands sometimes.

                          faulty from the get-go, or power surge, would be the logical choices.

                          either go through the retail outlet you got it from, or get onto the manufacturer’s website and there should be a support page where you can RMA it.
                          this should get you started: https://www.kingston.com/en/support/rma-enduser

                          • Bruce
                            PCHF Moderator
                            • Oct 2017
                            • 10702

                            #14
                            Closing as solved.
                            To request a re-open, go to Members > Staff Members, click a Staffer then Start Conversation and quote thread name.
