Brand new GPU with multiple strange issues

  • AGDeveloper
    PCHF Member
    • Oct 2023
    • 16

    #1

    Recently I decided to upgrade my PC, going from a GTX 980 to an RTX 4070 Ti. In doing so, I seem to have unleashed some form of ancient evil. After multiple days of trying to fix it, I’m completely stumped on what to do, and fixing it is well above my knowledge level.

    The 3 big issues are:
    1. Constant loss of video
    2. Weird and intermittent visual glitch, almost like artifacting
    3. Occasionally unable to POST with VGA issue which resolves itself for no apparent reason

    My specs are:
    Mobo - MSI X370 Gaming Pro Carbon (MS-7A32, BIOS ver E7A32AMS.1L0)
    CPU - Ryzen 7 1700 @ 3.0GHz, rolled back from 3.8GHz
    RAM - 2x 8GB Corsair Vengeance LPX DDR4, in slot 1 and 3 (DIMMB2 & DIMMA2)
    GPU - Gigabyte GeForce RTX 4070 Ti ( https://www.overclockers.co.uk/gigab...gx-1g0-gi.html )
    PSU - Phanteks Revolt 1000w ( https://www.overclockers.co.uk/phant...ca-0cp-pt.html ) + Phanteks Revolt Cable Starter Kit
    Boot drive: Samsung 980 Evo M.2 NVMe, 250GB. Running Windows 10 on latest version & updates

    Here’s what’s happened when and all the troubleshooting steps I’ve taken:

    Before I got the new GPU and PSU, the PC ran fine for years with an MSI GTX 980 and a Novatech Power Station 750w Black Edition.

    Upon installing the new components, everything was seemingly fine and I was able to get into the BIOS.

    After moving the PC, I ran into issue 3: the PC refusing to POST with a VGA issue. This was confirmed by the “EZ Debug” LED, which was stuck on VGA, and a series of beeps (1 long, 2 short) which corresponded to a video card error.

    I found that to get around this issue I simply needed to jiggle the card around a bit, pushing it into the PCIe slot and slowly releasing pressure, which eventually worked. This makes me believe that, due to its sheer size and weight, the card isn’t sitting in the PCIe slot correctly, despite being all the way in and mounted correctly, including with the anti-sag bracket. However, this issue does seem to randomly come back every so often, and simply powering the PC down and back up magically resolves it.

    Once I was properly into Windows after that nightmare, I proceeded to do a driver update via GeForce Experience. It worked fine for an hour or two before suddenly, out of nowhere and while idling, both my monitors went black and all the fans in the PC ramped up for about a second and a half before returning to what I’d expect at idle. The video signal never came back, and remoting into the PC via Google Remote Desktop greeted me with a blank screen with only the cursor visible, which is how it stayed until I rebooted. The issue would then reappear 5 to 10 minutes after logging in.

    To try and remedy this, I uninstalled the driver in Device Manager, to which Windows responded by downloading it again. This seemed to solve the issue for about a day, after which it returned. In response, I did the same thing, then followed it by nuking the drivers from orbit with the Display Driver Uninstaller utility found online and letting Windows fetch the driver again. This solved it for a day and a half before it returned again.

    I decided to dig deeper. In Event Viewer I did my best to track down what exactly was causing this, and found that Desktop Window Manager had crashed before every occurrence. Reliability Monitor told the same story. I also ran a check to see if Windows is in any way corrupted (“sfc /scannow”), which found no issues. I tried safe mode, to no avail, and a clean boot (disabling all services), also with no luck. Upon confirming it wasn’t changing a thing, I re-enabled all the services and rebooted.

    Eventually the issue happened again and on rebooting, the issue just decided not to happen… I really cannot explain this as I had done nothing different. It just decided on its own to work… Which lasted all of 2 hours at which point it got fed up and decided to have a snooze with the same issue yet again.

    This time, however, it’s become more severe, as I can’t even get to the login screen. It POSTs, Windows loads, and right after that it fades to black and stays there. The monitor is clearly active, but nothing is being displayed.

    As for the strange artifacting-like glitch, it was noticeable during the periods where I could use the PC before it had a mental breakdown. It only ever occurred on the second monitor, on both DP and HDMI. It’s definitely not the monitor, as plugging in anything else, such as a laptop, or viewing content such as YouTube on it does not reproduce the issue, and it’s only been present since the 4070 Ti was installed. There’s no way to properly test whether it happens with just the one monitor plugged in, because the second my second monitor is unplugged, regardless of DP or HDMI, the same blank screen issue happens. I can only use both or the second, never the first alone, which is another issue that only appeared once the 4070 Ti was installed. I have attached a video showing the glitch when it was about halfway down the monitor; it travels from the top to the bottom, and only appears every 3 to 5 minutes.

    The biggest issue here is the constant loss of video, which has gotten so bad that the PC is unusable to the point where I can’t access anything. I have tried everything to get it to work, but at this point I’m at the mercy of whatever mood it’s in at that moment. Posting here is truly my last resort, other than calling in an army of priests to perform an exorcism on it.

    I do think I’ve done a lot more than I’ve mentioned already to fix it, however I’m so mentally exhausted I can’t remember what exactly, so if I’ve missed anything my apologies!

    Thanks in advance
  • phillpower2
    PCHF Administrator
    • Sep 2016
    • 15209

    #2
    While troubleshooting, stick with just the one screen connected to the GPU.
    Originally posted by AGDeveloper
    CPU - Ryzen 7 1700 @ 3.0GHz, rolled back from 3.8GHz
    I suggest you restore the MB’s default factory settings in the BIOS. They are sometimes listed as one of the following: “factory defaults”, “most stable” or, on newer boards, “optimized”. Please note that if you have both the “most stable” and the “optimized” options in the BIOS, you should choose the “most stable” option, as in this instance the “optimized” settings are a form of overclocking that can cause instability.

    Save the new settings, exit the BIOS, restart the computer and test by using the computer as you normally would. Post back with an update once you have done this.

    • AGDeveloper
      PCHF Member
      • Oct 2023
      • 16

      #3
      Originally posted by phillpower2
      ~trimmed~
      Okay so I’ve reset the BIOS settings back to factory. I also forgot to mention this (knew there was something), but I have done this before as part of troubleshooting: I removed the CMOS battery and waited 30 minutes, which did not help as the issue reoccurred not long after. I’ve now done so again after resetting back to factory, to really make sure.

      As shown in attached image, it successfully cleared. The only option I changed was turning post beep back on, and confirmed that was the only setting changed. I have been able to boot back into windows successfully after doing so.

      Usually it can take anywhere from 5 minutes to up to 24hrs before the issue reoccurs, so unless there’s things that I should be trying I’ll report back when it happens again, or when 24hrs has passed without issue.

      Thanks.

      • phillpower2
        PCHF Administrator
        • Sep 2016
        • 15209

        #4
        Not what was suggested and for a couple of very good reasons, restoring the settings in the BIOS following the steps provided does not put anyone or their hardware at risk of harm so keeps people safe + removing the CMOS battery does not restore the MBs best settings.
        Originally posted by AGDeveloper
        The only option I changed was turning post beep back on
        Good call, and it is a good idea to always have this enabled for troubleshooting purposes.

        Stick with not changing anything at all for the next couple of days at least and use the PC under load as much as possible to see how you get on.

        • AGDeveloper
          PCHF Member
          • Oct 2023
          • 16

          #5
          Originally posted by phillpower2
          Not what was suggested and for a couple of very good reasons, restoring the settings in the BIOS following the steps provided does not put anyone or their hardware at risk of harm so keeps people safe + removing the CMOS battery does not restore the MBs best settings.
          Ah that’s my bad then. I always assumed they were one and the same. It’s good to know for the future!

          And will do, and I’ll report back if it happens again.

          • phillpower2
            PCHF Administrator
            • Sep 2016
            • 15209

            #6
            No worries and fwiw, you are comfortable working inside the case, whereas someone not so savvy could take it upon themselves to do the same thing and have a mishap.

            • AGDeveloper
              PCHF Member
              • Oct 2023
              • 16

              #7
              Yeah that’s understandable. I quite like to get hands on, and I’ve stripped down and rebuilt a PC quite a few times over the years for things like cleaning, troubleshooting etc. While installing the components I did strip it down fully to clean all the fans, case, stuff like that, and tried to get the spare PC running with my old GPU as my partner’s laptop is on its way out. Thankfully I tested it before building it back into the case, as I found the mobo had fried itself, the second one that PC has claimed, and I’m done spending more money on bringing that old dog back to life. Though apart from the “basics”, I don’t know much else; I’m more comfortable with the software side of things.

              Anyways, the issue has happened yet again. I wasn’t at my PC when it happened, however it couldn’t have been more than 30 minutes between the last check in and when I found it. Same exact issue - Black screen, monitors have signal, unresponsive to even the power button. I wasn’t able to remote in to see if it’s identical as Virgin Media has gone for a nap leaving me without Internet, however it looked identical to what I’m unfortunately used to by now.

              The PC has been idling all day apart from a short 10 minute load from an early afternoon session of Beat Saber. The only loads it’s experienced since clearing CMOS have been the same games I played during the 2 stints of it working - Cyberpunk 2077 (Raytracing Ultra preset), Final Fantasy XV (Ultra + 4K Texture DLC) and Star Citizen (Ultra). The only thing I did differently was monitor GPU temps using HWiNFO64 on the off chance that’s what’s causing it, but the GPU has stayed at around 46C idle and 52C on heavy loads.

              I’ve managed to boot in again just fine this time round and am monitoring it for any more occurrences. Strangely, this time round nothing in Event Viewer or Reliability Monitor hints at dwm crashing. The last occurrence recorded by both was the event that led me to post here. But I’ll keep an eye on it.

              • phillpower2
                PCHF Administrator
                • Sep 2016
                • 15209

                #8
                Download MiniToolBox and save the file to the Desktop.

                Close the browser and run the tool, check the following options;

                List last 10 Event Viewer Errors
                List Installed Programs
                List Devices (Only Problems)
                List Users, Partitions and Memory size

                Click on Go.

                Post the resulting log in your next reply for us if you will.

                • AGDeveloper
                  PCHF Member
                  • Oct 2023
                  • 16

                  #9
                  Here’s the report, attached to the bottom of this message.

                  I should stress that at the time of the log no issue has occurred since the last message, and the PC has been on all day idling to try and repeat the crash conditions from my last post (24/10/23 11:27, time now 25/10/23 04:23). The last error was also irregular - no dwm/Desktop Window Manager crash was reported in the Event/Reliability log, unlike most times.

                  I also have Speccy logs from the first usable boot just after my first post and from now. I can provide them if needed.

                  • phillpower2
                    PCHF Administrator
                    • Sep 2016
                    • 15209

                    #10
                    Couple of problems there but will wait and see how you get on before elaborating.

                    • AGDeveloper
                      PCHF Member
                      • Oct 2023
                      • 16

                      #11
                      Since the last bootup in my last reply, I haven’t had the time to stress the PC in the same way due to IRL issues taking priority. However, I did manage to leave it idling all night Thursday with no issues, which led me to plug my second monitor in for further testing, and all day Friday without issues (bar one strange recurring one, I’ll touch on that later). All day yesterday into today (Saturday to Sunday) I have been stressing it with a few abnormal but demanding tasks, to no avail. I cannot recreate the issue whatsoever anymore.

                      This is strange - I used to get it regularly, but ever since my forum post here, I cannot get it to happen fully even once. I fear I may be dealing with a case of “call tech support because the issue will always fix itself before you get connected”. It’s the only way I can explain this abnormal behaviour, even more so as, bar running MiniToolBox, I have not done anything different and you have not suggested anything that I had not already tried beforehand. I can almost guarantee that once this topic is closed and my support from here ends it’ll re-occur, most likely out of spite more than anything.

                      In regards to the other issue, I have found that as of recently, the GPU is sending some form of “kill” signal to my monitor, causing it to shut down. Before, even while the issue was happening, my monitor could fully detect the signal coming from my GPU and even turn on and display properly if it detected it during sleep mode - its default status. It could also go into auto-standby mode just fine, and reawaken to have the PC go back from 1 into 2 monitor mode with signal. Now, however, it seems to shut down randomly without cause or reason, and it can only detect the signal if either the PC is powered on and the monitor is on before POST, or the cable is unplugged and replugged. When it does go into the random “sleep”, it shuts down fully, something that should not be possible as I have disabled it. It also cannot detect the signal upon its bootup unless the cable is unplugged and reinserted and, best of all, the PC acts like it has a second display despite it being completely off. I have tested it on both HDMI and DP connections, and no dice. Only my second, older and much more basic monitor seems to have luck keeping the GPU’s attention, also on DP and HDMI, and with the reverse applied to both. I truly cannot explain this, and the behaviour it’s exhibiting is so left field I’m still recovering from the whiplash. However, I’ll throw it in with the other issues to be fixed, like the artifacting, as it’s not a priority.

                      Attached below are the screenshots of the last dwm crash from before I made this post - from Reliability Monitor and Event Viewer.

                      I apologise for taking this long to solve this issue, but it’s out of my hands and now it’s truly proven to be well and above anything I could ever solve on my own. I hope at least the pain this computer is having me endure is causing enough curiosity out of the sheer perplexity it’s causing to make this worthwhile.

                      • phillpower2
                        PCHF Administrator
                        • Sep 2016
                        • 15209

                        #12
                        No worries how long this takes but it is important that we stick to one thing at a time, going off topic is both unadvisable and frowned upon.

                        Fingers crossed but you may have resolved the original issue by restoring the MBs default factory settings, give it another couple of days and then post back.

                        • veeg
                          PCHF Director
                          • Jul 2016
                          • 8982

                          #13
                          Any updates ?

                          • AGDeveloper
                            PCHF Member
                            • Oct 2023
                            • 16

                            #14
                            Looks like I was too quick to call it done. It has happened again. To make sure it was not a fluke, after bootup I went straight back to what I was doing before and managed to get it to happen again, much faster for whatever reason. Attached are the 2 MiniToolBox logs taken right after bootup after each issue.

                            What I was doing during both crashes is playing Cities Skylines 2. I’ve been playing it for 2 days now without issue, and when these 2 crashes occurred I had only just turned the PC on again today since the end of my last session from the late evening into today’s very early morning.

                            The issues happened just like before - screens go black, all fans spin up, and the PC is accessible via remote desktop but nothing except a blank black screen is viewable. When the first one happened, I immediately recognised it and forced a shutdown. On the second one, however, I decided to wait and see what happens if I let it try to shut down on its own (after pressing the power button). It worked, but I encountered a related issue I haven’t seen since making the initial post here - being unable to get to the logon screen. Every time the PC does the usual load after POST it goes blank and stays that way. I managed to resolve this by forcing a shutdown at the blank screen, after which the next bootup worked just fine, so I seem to have found a way to replicate and “resolve” that at least. Both issues seem to be directly related, as this behaviour has not happened without a crash prior.

                            I have checked Event Viewer and Reliability Monitor after both cases. After the first nothing looked out of the ordinary. After the second however, the familiar dwm crash is present. I have included the screenshot of Reliability Monitor to the post as well.

                            Edit: After the first crash I also began to monitor temps to see if that may be related while trying again, however it is not. Temps looked just like they have in the times I was checking them well before the crashes.

                            • phillpower2
                              PCHF Administrator
                              • Sep 2016
                              • 15209

                              #15
                              Originally posted by AGDeveloper
                              What I was doing during both crashes is playing Cities Skylines 2.
                              How many monitors did you have connected?

                              Just looking at your first MTB log we can see a major problem, and one that has possibly caused Windows to become corrupt. The other errors could be a result of this, so Windows needs to be addressed first.
                              1 Drive c: (WHOA, DUDE!) (Fixed) (Total:232.34 GB) (Free:26.88 GB) NTFS

                              See my canned info below;
                              For Windows to be able to run efficiently and to be able to update you need to have between 20 and 25% of the partition or drive available on a HDD and an SSD between 10 and 15% as free storage space at all times, if you don’t you risk Windows becoming corrupt or not being able to update which puts you at risk of malware attack.

                              Data only storage devices should not be allowed to get any lower than 10% of free storage space of the full capacity of the drive/partition on the drive, this also to avoid data corruption.

                              Please note that storage devices can physically fail if the amount of free storage space is allowed to drop below the required 10 or 20/25% minimum.

                              Uninstall as many unused programs, games, videos and music files as you can and get yourself another means of backing up to, post back when you have between 20 and 25% free storage space on the C: drive/partition and we can go from there.
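For anyone following along, the free-space figures quoted above can be checked with a few lines of Python. This is only a rough sketch, not part of the original advice: the function names are made up for illustration, the 20% and 25% targets are simply the ones requested in this post, and the total and free sizes are copied from the MiniToolBox line above.

```python
import os
import shutil


def free_percent(free: float, total: float) -> float:
    """Return free space as a percentage of total capacity (same unit for both)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


def gb_needed_for_target(free_gb: float, total_gb: float, target_percent: float) -> float:
    """Return how many GB must be freed to reach target_percent free (0.0 if already met)."""
    needed = total_gb * target_percent / 100.0 - free_gb
    return max(0.0, needed)


def live_free_percent(path: str) -> float:
    """Return the free-space percentage of the drive that holds `path`, as reported by the OS."""
    usage = shutil.disk_usage(path)
    return free_percent(usage.free, usage.total)


if __name__ == "__main__":
    # Figures taken from the MiniToolBox log line quoted in this post.
    total_gb, free_gb = 232.34, 26.88

    print(f"Logged free space: {free_percent(free_gb, total_gb):.1f}%")
    for target in (20, 25):  # targets requested in this post
        extra = gb_needed_for_target(free_gb, total_gb, target)
        print(f"To reach {target}% free, clear about {extra:.1f} GB more")

    # Live check of the drive the script runs from (the root of the current drive).
    root = os.path.abspath(os.sep)
    print(f"Current free space on {root}: {live_free_percent(root):.1f}%")
```

With the logged figures this works out to roughly 11.6% free, so about 19.6 GB would need to be cleared to reach the 20% mark and about 31.2 GB to reach 25%.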
