...
Documentation
The "as built" MES environment should be documented and available to Support personnel.
It should include topics such as:
Server name,
IP addresses,
Domain,
Purpose,
Main applications and versions installed,
Backup Strategy
Notes
Operating System
Inventory of Drives and sizes,
Free space alerts (email)
Memory (RAM)
CPUs/Cores
For Support/Troubleshooting
VPN Access information
Contact information for key personnel
Server monitoring
Server monitoring is intended to determine whether each server is appropriately sized for the load it is running. The tools available for this monitoring vary by I.T. department, so this document does not assume any particular tool.
All servers in the MES environment should be monitored but especially the Applications, Process Historian and SQL servers. The OPC servers should be monitored for a short period after installing and configuring tags to determine if they are sized correctly.
Monitoring can be performed manually by using Task Manager. Task Manager provides information about the applications currently running on the system, the processes and their memory usage or other data, and statistics about memory and processor performance. Although it is useful as a quick reference for system operation and performance, Task Manager lacks logging and alerting capabilities, so it should be used only as a real-time view of how the server is operating "now".
The first time you open Task Manager on Windows Server, it presents a minimal display.
...
Disk: The graphs show information about the disk drives connected to the server.
The first graph shows the percentage of disk activity over the last 60 seconds that Task Manager has been open.
The second graph shows the rate at which data is read from or written to the disk, in KB or MB per second.
Network: The graph shows the network throughput over the last 60 seconds that Task Manager has been open. It also shows the current upload and download speeds, along with the connection type and the IPv4 and IPv6 addresses.
...
...
Event Viewer provides historical information that can help you troubleshoot and track down system and security issues. The following Windows Logs categories are available:
...
Application Log: Records events logged by applications. For example, a SQL Database might record a database connection error.
Security Log: Records events such as valid and invalid logon attempts and the creation or deletion of files or other objects. It also records the events that you have set for auditing with local or global Group Policy Objects (GPOs).
...
Property Name | Description |
Source | |
Event ID | |
Level | |
User | |
Option code | |
Log | |
Task Category | |
Keyword | |
Computer | |
Date and time |
If a server runs out of disk space, its performance will suffer. All data files (e.g., SQL Server databases) and log files should be configured to reside on the largest drive on the server.
It is important to regularly monitor the available free disk space on:
Process Historian: The drive(s) where the active archives reside and the drive(s) where the daily backups reside.
SQL: The drives where the .mdf and .ldf files reside and the drives where the tempdb files reside.
Application Server: The drive where the log files reside.
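A free-space check along these lines can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for the I.T. department's monitoring tools: the function name `check_free_space`, the 15% default threshold, and the drive labels are assumptions made for the example.

```python
import shutil
from typing import Dict, List


def check_free_space(drives: Dict[str, str], min_free_fraction: float = 0.15) -> List[str]:
    """Return one warning string per drive whose free space is below the threshold.

    drives maps a descriptive label to a path on the drive to check.
    min_free_fraction is an assumed threshold (0.15 = 15% free).
    """
    warnings = []
    for label, path in drives.items():
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
        # Treat a drive that reports zero capacity as having no free space.
        free_fraction = usage.free / usage.total if usage.total else 0.0
        if free_fraction < min_free_fraction:
            warnings.append(
                f"{label} ({path}): {free_fraction:.1%} free, "
                f"{usage.free / 1024**3:.1f} GiB remaining"
            )
    return warnings


# Example: check the drive that holds the current working directory.
for message in check_free_space({"Current drive": "."}):
    print(message)
```

In practice the dictionary would list the drives recorded in the "as built" documentation, and the returned warnings could be forwarded by email or to whatever alerting tool is in use.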
How to Troubleshoot Disk Space Usage
Check the following:
...
The following command checks for and purges Historian log files that are older than 30 days in the path "C:\Proficy Historian Data\LogFiles". You can save it in a batch file and run it weekly or monthly using Windows Task Scheduler. PowerShell options are also available.
:: Delete Historian .log files older than 30 days, including those in subfolders
forfiles /P "C:\Proficy Historian Data\LogFiles" /S /M *.log /D -30 /C "cmd /c del @path"
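The same clean-up can be sketched in Python for sites that prefer a script with a dry-run mode. This is an illustrative alternative, not part of the Historian tooling: the function name `purge_old_logs` and its parameters are assumptions, and the age test uses the file modification time in seconds, so it may differ slightly from the date-based comparison that forfiles performs.

```python
import time
from pathlib import Path
from typing import List


def purge_old_logs(root: str, max_age_days: int = 30, dry_run: bool = True) -> List[Path]:
    """Find *.log files under root (recursively) older than max_age_days.

    With dry_run=True (the default) nothing is deleted; the matching files
    are only returned so they can be reviewed first.
    """
    cutoff = time.time() - max_age_days * 86400  # age limit as a Unix timestamp
    # Collect the matches first so that deleting does not disturb the directory walk.
    matched = sorted(
        path for path in Path(root).rglob("*.log")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in matched:
            path.unlink()
    return matched
```

Calling `purge_old_logs(r"C:\Proficy Historian Data\LogFiles")` lists the files that would be removed; passing `dry_run=False` deletes them. Either call can be scheduled with Windows Task Scheduler in the same way as the batch file.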
IT teams have various tools available for checking and alarming on minimum free-space thresholds for hard drives. A good preventative maintenance plan includes monitoring and alarming on the key servers. We have experienced several production outages over the years because SQL, Application, Historian, or OPC servers failed after running out of hard drive space.
Configure weekly, nightly, or hourly Snapshot schedules of Servers.
...
Specify the number of Snapshot copies to retain and how long to retain them.
Generate Syslog messages for server actions, and have automatic alarms sent to the appropriate persons when a snapshot fails.
There is some debate about whether servers need to be rebooted periodically. Reboots are especially relevant for SQL servers, which can experience port exhaustion.
Port exhaustion can cause all kinds of problems for your servers. Here's a list of some symptoms:
– Users won't be able to connect to file shares on a remote server
– DNS name registration might fail
– Authentication might fail
– Trust operations might fail between domain controllers
– Replication might fail between domain controllers
– MMC consoles won't work or won't be able to connect to remote servers
Suffice it to say that it would be a good idea to reboot servers at least once or twice a year. This may occur naturally with the implementation of Updates or Hotfixes. To determine the last time a server was rebooted or server uptime you can run the following command.
Go to "Start" -> "Run".
Type "CMD" and press the Enter key.
Type the command "net statistics server" and press the Enter key.
The line that starts with "Statistics since …" shows the date and time since which the server has been up.
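When the check has to be repeated across several servers, the steps above can be scripted. The Python sketch below extracts the "Statistics since" line from the command output; the function names are hypothetical, the match assumes English-language output, and the timestamp is returned as text because its format depends on the server's regional settings.

```python
import re
import subprocess
from typing import Optional


def statistics_since(net_output: str) -> Optional[str]:
    """Return the text that follows 'Statistics since', or None if the line is absent."""
    for line in net_output.splitlines():
        match = re.match(r"\s*Statistics since\s+(.*\S)", line)
        if match:
            return match.group(1)
    return None


def query_statistics_since() -> Optional[str]:
    """Run 'net statistics server' (Windows only) and extract the timestamp text."""
    result = subprocess.run(
        ["net", "statistics", "server"],
        capture_output=True, text=True, check=True,
    )
    return statistics_since(result.stdout)
```

On a Windows server, `query_statistics_since()` returns the same value that the manual steps display.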
Historian System Statistics
...
GE Hist 55_Using_Historian_Administrator
The System Statistics screen, as shown in the following figure, displays current system status and performance statistics. It presents an overall view of system health. The screen has three sections:
System Statistics Section
Collectors Panel
Alerts Panel
System Statistics
...
NOTE: The statistics displayed on this screen are computed independently on various time scales and schedules. As a result, they may update at different times.
...
The Field | Display |
Active Tags | Number of tags in your configuration. |
Licensed Tags | The number of tags authorized for this Historian installation by the Software Key and License. If this field displays 100 tags and the Licensed Users field displays 1 client, you are likely running in demonstration mode and you may have incorrectly installed your hardware key. |
Active Users | The number of users currently accessing the Historian system. |
Licensed Users | The number of users authorized to access the Historian application by the Software Key and License. The number of users that are authorized to access Historian is strictly based on the Software Key and License. However, if you have used all of your available Client Access Licenses (CALs) and need an additional one to administer the system in an emergency, you have the option to reserve a CAL. This reserved CAL allows you to access the server. To use it, provide the reserved CAL to the system administrators and add them to the iH Security Admins group. A system administrator will then be able to connect to Historian in an emergency. This facility is optional and does not provide a guaranteed connection. It only covers emergency situations in which a lack of CALs is preventing you from accessing the system, and it may not work under other conditions. For example, if Historian is busy, you will not be able to connect using this feature. If this field displays 1 client and the Licensed Tags field displays 100 tags, you are likely running in demonstration mode and you may have incorrectly installed your hardware key. Refer to Installing the Hardware Key for more information. |
Alarm Rate | Displays the rate at which Historian is receiving alarm and event data. |
SCADA Tags | Displays the number of Proficy Cimplicity or Proficy iFIX tags. |
Tags Consumed by Arrays | Indicates the total number of Array tags consumed by Proficy Historian. |
...
Collectors panel Statistics section
The Collectors panel shows current statistics on the operation of all connected data collectors in the system. For more information on a particular data collector, click the name of the Data Collector you want to examine. The Collector Maintenance Screen for that collector then appears. You can also display the Collector Maintenance screen by clicking on the Collector link in the top line of the System Statistics screen.
To automatically refresh the Collectors panel statistics, select the Auto option in the Collectors panel. With Auto selected, the statistics refresh every 45 seconds.
You can also refresh the statistics manually by clicking the Refresh button on the Collectors panel.
The Collectors panel of the System Statistics screen displays data described in the following table.
The Field | Display |
Collector | The collector ID, which is used to identify the collector in a Historian system. |
Status |
|
Computer | The name of the computer the collector is running on. |
Report Rate | The current rate in a number of samples/minute at which the server is receiving data from the collector. It is a measure of the collection rate and also of data compression activity. A value equal to the data acquisition rate, when Collector Compression Percent is zero, indicates that every data value received from the data source is being reported to the server. This means that the collector is not performing any data compression. You can lower the report rate, and make the system more efficient, by increasing the data compression at the collector. To do this, widen the collection compression deadbands for selected tags. |
Overruns | The overruns in relation to the total events collected since startup. This value is calculated by using the following equation: OVERRUN_PCT = OVERRUNS / ( OVERRUNS + TOTAL_EVENTS_COLLECTED ). Overruns are a count of the total number of data events not collected on their scheduled polling cycle. In normal operation, this value should be zero. You may be able to reduce the number of overruns on the collector by increasing the tag collection intervals (per tag). |
Compression % | How effective compression currently is for the specific collector, as a percentage, since collector startup. A value of zero indicates that compression is either turned off or not effective. To increase the value, enable compression on the collector's associated tags and increase the width of the compression deadband on selected tags. The collector keeps track of how many samples it collected from the data source (an OPC Server, for example) and how many samples it reported to the Historian data archiver (after collector compression is complete). A low number or zero means that almost everything coming from the data source is being sent to the Historian data archiver, either because too many samples exceed the compression deadband or because you are not using collector compression. A high number or 100 means that you are collecting a lot of samples, but they are not exceeding the compression deadband and are therefore not being sent to the server. |
Out of Order | The number of samples, within a series of timestamped data values normally transmitted in sequence, that have been received out of sequence since collector startup. This field applies to all collectors. Even though the events are still stored, a steadily increasing number of out-of-order events indicates a problem with data transmission that you should investigate. For instance, a steadily increasing number of out-of-order events when you are using the OPC Collector means that data is arriving out of order between the OPC Server and the OPC Collector. This may also cause out-of-order data between the OPC Collector and the data archiver, but that is not what this statistic indicates. |
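As a worked example of the equation in the Overruns row, the Python sketch below evaluates it exactly as written. The function name `overrun_ratio` is hypothetical. Although the variable in the equation is named OVERRUN_PCT, the equation yields a fraction between 0 and 1; whether the screen multiplies it by 100 before display is an assumption the reader should verify.

```python
def overrun_ratio(overruns: int, total_events_collected: int) -> float:
    """Evaluate OVERRUNS / (OVERRUNS + TOTAL_EVENTS_COLLECTED) as a fraction."""
    if overruns < 0 or total_events_collected < 0:
        raise ValueError("counts must be non-negative")
    denominator = overruns + total_events_collected
    if denominator == 0:
        # Nothing has been collected or missed yet, so report no overruns.
        return 0.0
    return overruns / denominator


# 50 missed events against 950 collected events.
print(overrun_ratio(50, 950))
```

With 50 overruns and 950 collected events the ratio is 50 / 1000, or 5% when expressed as a percentage. In normal operation the result should be zero.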
The Alerts panel displays all alerts and warnings received or generated by the system. You can scan through these messages by using the scroll bar at the right of the window. It displays the system timestamps and records of each message in this window.
To stop automatic updating of the display in the Historian Non-Web Administrator, clear the Show Alerts check box. This setting will be reset when you restart the Non-Web Administrator.
To automatically refresh the Alerts panel statistics, select the Auto option in the Alerts panel. With Auto selected, the Alerts panel statistics for the last five seconds refresh every 25 seconds.
You can also refresh the statistics manually by clicking the Refresh button on the Alerts panel.
The Alerts panel of the System Statistics screen displays data described in the following table:
The Field | Display |
Timestamp | The timestamp associated with the message or alert. |
Topic | The type of alert message. Only the Services and Performance alerts appear here. A total of up to 250 of the most recent messages will be displayed. |
Message | The content of the message or alert. |
The Message Search screen, shown in the following figure, lets you enter search parameters, such as start and end times, and limit the search to alerts only or messages only. You can further refine the search by topic and by a text mask.
...