During a routine new installation of SCOM2022 I encountered a very peculiar error, not during the setup process, but during the normal operation of these services outside the set up window.
SCOM - what is that ?
SCOM is part of the System Centre suite, this particular particle focuses on the operations manager side of the equation, However, it would not surprise me if the same Issue would apply for every product in this suite.
SQL is the backbone for System Centre
Obviously, it goes without saying for these products to work you need an SQL server and management servers - once you set up your SQL server with the required features, in this particular example, this was an SQL server that was dedicated for SCOM.
SQL Requirements
SCOM 2022 ideally requires SQL 2019, however, please ensure you refer to the image below to make sure you install the correct options, once you have installed SQL 2019 with the options required as below:
Ensure the latest CU is installed
Then the next order of business is to get the latest Cumulative update (CU) Before you start installing SCOM, I have always used this website to get the latest CU.
Setup Management Server (RMS)
Once your SQL has the latest CU It’s now time To run the set up and deploy your first management server, when you install this product for the first time, it will automatically create the databases, and then install the rest of the management server role automatically, this is the part of the setup that does work seamlessly after you give it the SQL connection string.
Problem after setup
Once set up has completed, you can confirm the databases are present on your SQL server and you should have an active management server, not particularly complicated, right?
Well, unfortunately, no, it wasn’t that simple for my installation, the moment the Services tried to start. I had a unique situation that was a bit confusing, this is what I observed:
The "System Center Data Access" service was in the running state and the event log confirmed it was successfully connected to the SQL database, and it had restored it to connectivity
The "System Centre Configuration Service" while was running, was not connecting to the SQL server and by definition to database, and this is the error I noticed continually in the event log:
The error text was as below:
OpsMgr Management Configuration Service failed to execute bootstrap work item 'ConfigurationDataProviderInitializeWorkItem' due to the following exception System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)
That error clearly says it can’t find the instance specified, that’s very peculiar because set up didn’t have a problem with this and neither does the data access service - So why is the configuration service not happy with the database connection?
SQL Management Studio : Successfully Connects
If you check your SQL server with the management studio, you can see your management server is connecting, you get no errors in the SQL logs, there are also no errors in the event log that lead you to where the problem is, when I got to this particular situation, I have three other management servers all doing exactly the same thing.
SQL is not the problem, directly
This would point you to a problem being with the SQL server, however, if you look at this logically the setup did not encounter a problem and the Data access service is also not complaining about errors so it’s only one of the surfaces that can’t talk to SQL
ODBC Test - to check connectivity
I then tried an OBDC connection to the database, using the connection string specified in the registry and this is where it starts to make some sense, if I use the connection string in the registry I did indeed get an error connecting to the database:
Checking Connection String
This got me looking at connection, string values and the usual connection string of specifying an SQL server is like this:
\\server.bear.local\instance
This format means you need to specify the server name first, then followed by the instance name if you do not specify a port It’s assumes you’re using TCP:1433 which is the default for SQL
If you have customized your port, and please don’t do this from a security point of view because a quick sports scan will immediately unveil your alternate port, then you also need to specify the port after a comma at the end of the connection string like this:
\\server.bear.local\instance,5600
This format for the connection string did not work when I tested it in ODBC, but remember, it didn’t seem to bother the set up process or the data access service. only the configuration service seems to be affected by this connection string, why would that be?
If using a Default Instance name : You do not need the specify that name!
Well, interestingly, when you set up your SQL server, you get to a certain stage in the set up where you are asked for your instance name as below:
If you leave these values as default, which essentially means you have the instance name of “MSSQLSERVER” you do not need to specify the instance name in the connections string.
This means if I go back to the affected management server and try the connection string in ODBC as below, remember, in this example, we don’t need to specify the ports because it will assume the default port:
\\server.bear.local
This is now successfully connects to the SQL server, this means the set up has been run with the incorrect SQL configuration settings that only seems to be affecting the configuration service, Yes, that will be a human error, but now we need to fix it.
Re-Install or Fault Find?
Obviously, the first thing that comes into your mind is that simple this is a new installation.
We can just delete the databases and run the set up process again with the correct connection strings, well, yes, that would work, but sometimes it’s nicer to go a bit further down the rabbit hole you’ve just created to see if you can fix it without a rebuild.
Obviously, if you choose to reinstall the software from scratch that’s fantastic, but if you’re doing an upgrade of an already well established installation that may not be an option.
🐰 Down the rabbit hole
If you’ve got to this point You have decided that you do not want to uninstall and reinstall, but rather fix the problems and learned a lot more about how these products work.
🚀 Fixing the Issue
We need to tackle this in two different places. firstly in the Registry and secondly in a single configuration file
If you just update the Registry locations this does not fix the problem (you’re more than welcome to test) the registry is the base location and the configuration file contains the running configuration.
Warning : This will involve editing Registry so it recommended to take a backup and it’s also worth mentioning if you incorrectly modify the registry then you may unleash a fire breathing dragon on your issue that can’t be easily fixed.
With the disclaimers out the way, Open up your registry editor and there are two keys. We need to focus on let’s go through them one by one.
The first location we need to check out is the one shown below:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup
When you navigate to this location you should see something like this:
We need to make sure in this particular example, we have set the name of the server without the instance name and then ending with the SQL server port in this registry key
The second location we need to check is the one shown below:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\Database
Again, as before, once you have navigated to this location you need to update the DatabaseServerName with the correct value:
Exactly like before, we need to set the correct database connection string, ending with the port of the SQL server in the registry key
We have now updated our Registry keys to the correct location, now we now need to restart some Services with the command below:
Restart-Service OMSDK
Restart-Service cshost
Obviously, after you restart the services, this won’t fix the problem at the moment we have one more configuration file to check and update.
The configuration file is stored in the following location:
C:\Program Files\Microsoft System Center\Operations Manager\Server
Once you were navigated to this folder, the file we are looking for is shown in the screenshot below, but the file named will be labelled ConfigService.config:
We now need to edit this file, but remember, you need to run Notepad as an administrator otherwise you won’t have access to edit the file.
Then when you open the file you are looking for the connecting string in SQL that is set incorrectly there are two locations in this file these are first under the CMDB section where you need to replace ServerName to the FQDN name of the SQL server:
<Category Name="Cmdb">
<Setting Name="ServerName" Value="<SERVER_FQDN>" />
<Setting Name="DatabaseName" Value="OperationsManager" />
Then you the ConfigStore section where you will need to update the same ServerName key to the FQDN of the SQL server:
Restart-Service OMSDK
Restart-Service cshost