We encountered a series of WMI timeout alerts from one server. The error message in these cases is: “WMI request timed out unrecoverable. If the problem persists with this sensor, consider pausing or deleting it. (code: PE051)”
The Local Probe disconnected and restarted automatically some time after the WMI errors occurred. After that, all WMI errors disappeared from this server. I would like to understand the reason for this behavior so that I can troubleshoot such issues more effectively.
If a WMI sensor is down, does PRTG keep trying to communicate with the device by opening new threads? What happens to these threads once the device is reachable again? Do they all time out, or do they stay active?
This article applies to PRTG Network Monitor 13 or later
WMI Timeouts and Probe Restarts
While Windows Management Instrumentation (WMI) sensors are showing timeouts, PRTG’s Local Probe continues to send WMI requests to the target machine and opens new threads at the same time. Because the number of available threads is limited to 500, PRTG restarts the probe service by design if the number of sensors that are in a timeout state reaches 40. This helps resolve connection issues in many cases. You can check the number of timeouts in PRTG’s system log files.
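To make the restart rule concrete, here is a simplified Python model of the decision described above. It is an illustration under assumptions, not PRTG’s implementation: only the threshold of 40 timed-out sensors comes from this article, while `ProbeModel`, its methods, and the idea of tracking timed-out sensors in a set are hypothetical.

```python
from dataclasses import dataclass, field

# Threshold taken from the article; everything else is a simplified, hypothetical model.
RESTART_THRESHOLD = 40  # timed-out sensors that trigger a probe restart


@dataclass
class ProbeModel:
    """Hypothetical model of the probe's restart decision (not PRTG code)."""

    timed_out_sensors: set[str] = field(default_factory=set)
    restarts: int = 0

    def record_timeout(self, sensor_id: str) -> bool:
        """Mark a sensor as timed out; return True if this triggers a restart."""
        self.timed_out_sensors.add(sensor_id)  # each sensor is counted only once
        if len(self.timed_out_sensors) >= RESTART_THRESHOLD:
            self.restart()
            return True
        return False

    def record_success(self, sensor_id: str) -> None:
        """A sensor that gets an answer again leaves the timeout state."""
        self.timed_out_sensors.discard(sensor_id)

    def restart(self) -> None:
        """Simulate a probe restart: the fresh process starts with a clean slate."""
        self.restarts += 1
        self.timed_out_sensors.clear()


if __name__ == "__main__":
    probe = ProbeModel()
    triggered = [probe.record_timeout(f"wmi-{i}") for i in range(RESTART_THRESHOLD)]
    print(triggered.index(True), probe.restarts)  # prints "39 1"
```

In this model, the 40th distinct sensor entering the timeout state triggers the restart, and sensors that recover beforehand are no longer counted.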
Regarding Open and Killed Threads
The PRTG probe starts a thread for each sensor scan. In this thread, the WMI sensor code sends WMI API calls to the probe computer’s WMI system. In most cases, these API calls return quickly, but sometimes they never return and simply get lost somewhere in the Windows system. The thread then cannot continue processing and has to wait until the function call returns.
After waiting for 45 minutes, PRTG gives up on the thread and marks it as unusable (this is similar to killing a thread, although no actual “killing” takes place). The sensor then gets a new thread assigned and tries to process again. In most cases this works; sometimes, however, the new thread also gets stuck waiting for a response from Windows.
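The same pattern can be reproduced in Python, which, like the probe, cannot forcibly kill a thread that is blocked in a call. The sketch below is hypothetical: `run_with_timeout`, `scan`, and `hung_wmi_call` are made-up names, the 45-minute wait is shortened to 0.2 seconds so the demo runs quickly, and the single retry on a fresh thread is an assumption about what “tries to process again” might look like.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Stands in for PRTG's 45-minute wait; shortened so the demo finishes quickly.
SCAN_TIMEOUT_S = 0.2


@dataclass
class Attempt:
    finished: bool  # did the call return within the timeout?
    value: Any  # the call's result, if it finished
    worker: threading.Thread


def run_with_timeout(call: Callable[[], Any], timeout_s: float) -> Attempt:
    """Run a blocking call on its own thread and wait at most timeout_s seconds."""
    box: dict[str, Any] = {}

    def worker() -> None:
        box["value"] = call()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    # A thread that is still alive has not returned from the call. Python has no
    # API to kill it, so it is abandoned ("marked unusable") and keeps its resources.
    return Attempt(finished=not t.is_alive(), value=box.get("value"), worker=t)


def scan(
    call: Callable[[], Any], retries: int = 1, timeout_s: float = SCAN_TIMEOUT_S
) -> tuple[Optional[Any], list[threading.Thread]]:
    """Try the call; after each timeout, abandon the thread and retry on a new one."""
    abandoned: list[threading.Thread] = []
    for _ in range(retries + 1):
        attempt = run_with_timeout(call, timeout_s)
        if attempt.finished:
            return attempt.value, abandoned
        abandoned.append(attempt.worker)
    return None, abandoned


if __name__ == "__main__":
    release = threading.Event()

    def hung_wmi_call() -> str:
        # Hypothetical stand-in for a WMI API call that never returns.
        release.wait()
        return "late"

    print(scan(lambda: 42))  # prints "(42, [])"
    value, stuck = scan(hung_wmi_call)
    print(value, [t.is_alive() for t in stuck])  # prints "None [True, True]"
    release.set()  # unblock the demo threads so nothing is left waiting
```

In this model, the abandoned threads only finish once `release` is set; until then, each one keeps occupying a thread, which is the kind of leak that only a process restart clears.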
Unfortunately, there is no way to recover the resources of unusable (killed) threads. Because of this, PRTG restarts the whole probe process after 40 killed threads.
Causes
After all, no single device causes this issue on its own, neither the PRTG server nor the target Windows server. Usually, the target computer’s WMI system is causing some trouble, which results in hung function calls on the probe computer’s WMI system. In short, this issue is a problem of the overall system rather than of one machine.
Workaround
If you detect recurring entries of certain target devices in the server log, consider installing a remote probe on the affected machines. This approach avoids DCOM communication between the probe and the target computer and thus resolves these issues in many cases.
Aug, 2013