GPU Weirdness

Earlier this week I ran into a very strange problem with the 1050 Ti in my desktop TS140 - the same setup I posted about here earlier. This is what happened. The machine had just completed a Windows update and prompted me to reboot. I was working on the other TS140, goonie, at the time and needed local video on it, so while the desktop TS140 rebooted I pulled its DisplayPort cable and moved it to goonie. I finished what I needed to do and moved the cable back, but got no video. So I reset the machine - the BIOS splash displayed as normal, but at the point where the loading screen normally transitions to the Windows login screen, the display went dark. And not just black - no video signal was being received at all.

I figured a Windows update had nuked something, so I booted off my Veeam recovery disk and restored the previous day's backup (yay for taking regular backups), then rebooted, but hit the same behavior: as soon as Windows started, the display went dark. So I grabbed my Windows 10 install drive and did a fresh install - the install process went just fine, but as soon as it loaded the desktop and Windows started automatically downloading and enabling drivers - boom, the display went dark again. At this point I didn't know what to think. I tested the HDMI and DVI outputs as well, but saw the same behavior.

Wanting to test on more than Windows, I grabbed my Ubuntu 18.04 drive and did a live boot off the install media. It did display at full resolution, but I didn't think the video was being accelerated. At this point it was getting late, so I pulled the card and switched back to the onboard video, then once again restored the previous day's backup. Windows booted up just fine, and after I updated the driver for the iGPU (which most certainly made me nervous) it kept working just fine. By then it was about 1 AM, so I packed up and went to bed.

On the way home from work the next day I had to swing by the storage unit to drop off some more stuff, and while I was there I decided to pull out the T3500 - my only other full-size PC with PCIe. I do have the other TS140 at the house, but I wanted to test on something totally different. Once I got it all set up and the card installed, I did a fresh install of Ubuntu 18.04. When it booted up the display was again at full resolution, but after running some GPU tests it was most certainly not accelerated. I installed the latest stable NVIDIA drivers for Ubuntu and rebooted again, fingers crossed. As soon as the machine came back up and the display manager started... you guessed it... the video signal dropped out. Every time the GPU was initialized with drivers, it crashed. I ran the Windows 10 install on this machine too and again saw the same behavior - no problem during the install and first boot, but as soon as Windows grabbed and started the GPU driver, the display dropped. Frustrated and certain that the card was not only dead, but not covered under warranty, I went to bed.
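For anyone retracing the Linux side of this, here is a minimal sketch of one way to see which GPU kernel module is loaded. It is not the GPU test that was run above. It assumes a Linux system that exposes /proc/modules, and the function names are made up for illustration. A loaded module does not prove rendering is accelerated; it only shows which driver the kernel picked up.

```python
from pathlib import Path

# Kernel module names for common GPU drivers (a short, non-exhaustive list).
GPU_MODULES = {"nvidia", "nouveau", "i915", "amdgpu", "radeon"}


def loaded_gpu_modules(proc_modules_text):
    """Return the known GPU driver modules named in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, so only the
    first whitespace-separated field of each line is inspected.
    """
    names = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(names & GPU_MODULES)


def describe(modules):
    """Turn the module list into a rough, human-readable guess."""
    if "nvidia" in modules:
        return "proprietary NVIDIA kernel module is loaded"
    if "nouveau" in modules:
        return "open-source nouveau module is loaded"
    return "no NVIDIA-related module loaded (possibly a basic fallback display)"


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        found = loaded_gpu_modules(path.read_text())
        print(found, "-", describe(found))
    else:
        print("/proc/modules not found; this sketch only applies to Linux")
```

Running it once before and once after installing the vendor driver would show whether the loaded module actually changed between the two boots.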

The following evening (yesterday evening, if we're counting), I decided to pop off the card's cooler and take a look at the thermal situation. I didn't think that this was a heat-related issue, but still - might as well take a look at the card and the die while I could. The thermal paste was plentiful and very dry, so I grabbed some alcohol and Q-tips and went to town cleaning everything up. I re-applied the paste and put the card back together, and then wondered if it was worth testing again. I figured it was, but wanted to take a slightly different approach.

I powered down and grabbed the other TS140, goonie, and put a spare SSD in it. I made a Windows 7 install drive, and after mounting the card in the machine, began the process of installing and updating 7 on this box. Once it was all up and running, I made the fateful trip to the NVIDIA download page and grabbed the latest installer. Unlike in Windows 10, the GPU driver wasn't initialized right away, so I rebooted, expecting the worst. It came right back up and displayed a full-resolution login screen, so after I logged in I checked Device Manager right away, and the 1050 Ti was listed with no errors.

So far I have run benchmarks with Geekbench, Cinebench, and FurMark - including a 10-minute stress test with FurMark. The card got rather toasty, about 60°C, but the fans never ramped up, so I guess it was doing just fine. In any case, the system never crashed - it just kept chugging along. I don't know what to think. I really doubt that replacing the thermal paste did anything, and I would be surprised if both the T3500 and the other TS140 were messed up somehow. I have another 1050 Ti (a short card this time...) on order that should get here tomorrow, and I'm really hoping that it doesn't exhibit any weirdness. I am going to test this card in at least one more system, a friend's Optiplex 790, and until then I'm going to essentially expect it to die at any time. Right now the only test I have left that I'd really like to do is Ubuntu 18.04 on goonie, to see how that compares. If I have to use this card on this machine in Windows 7 that's hardly the worst thing in the world, but it certainly is confusing...