MicroStrategy - Clustering


Important:
https://community.microstrategy.com/t5/Server/TN6022-MicroStrategy-Intelligence-Server-Cluster-Configuration/ta-p/167093 - need to read this again
https://community.microstrategy.com/t5/Server/TN5866-Tuning-MicroStrategy-Intelligence-Server-for-Memory-Usage/ta-p/166951 - need to read this again
https://community.microstrategy.com/t5/Server/TN16018-MicroStrategy-Intelligence-Server-Clustering-FAQ/ta-p/176386 - done reading
https://community.microstrategy.com/t5/Server/TN30728-MicroStrategy-Intelligence-Server-Cluster-Failover-Guide/ta-p/181663 - done reading
https://community.microstrategy.com/t5/Server/TN45867-In-a-MicroStrategy-9-2-x-9-3-x-clustered-environment/ta-p/195536 - done reading
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/4584 - done reading, SkipClusterChecking
https://community.microstrategy.com/t5/Server/TN240525-Improving-responsiveness-of-MicroStrategy-Intelligence/ta-p/240525 - done reading, UseClusterSynchThread
https://community.microstrategy.com/t5/Administration/I-server-crashed-due-to-large-reports/td-p/32564 - done reading

https://community.microstrategy.com/t5/Environment-Installation/Disaster-Recovery-for-MSTR/td-p/59399 - done reading
https://community.microstrategy.com/t5/Server/TN36359-After-a-restart-some-nodes-do-not-re-join-the/ta-p/186705 - done reading
https://community.microstrategy.com/t5/Web/TN17309-How-a-cluster-of-MicroStrategy-Intelligence-Servers/ta-p/177625 - done reading
https://community.microstrategy.com/t5/Server/TN12428-How-to-configure-the-MicroStrategy-Intelligence-Server/ta-p/173019 - done reading
https://community.microstrategy.com/t5/Administration/how-check-microstrategy-web-server/td-p/84104 - done reading
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/5066 - done reading
https://community.microstrategy.com/t5/Server/TN47195-In-a-MicroStrategy-9-2-x-9-4-x-clustered-environment/ta-p/196833 - done reading
https://community.microstrategy.com/t5/Administration/Email-Subscription-not-sending-to-All-Receipients/td-p/115282 - done reading
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/1286 - done reading
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/2024 - TN30775: How are time-based and event-based schedules
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/2178 - TN31628: How does MicroStrategy 9.x handle subscriptions
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/1215 - TN16606: How to schedule a time-based History List
https://community.microstrategy.com/t5/Server/TN35205-How-does-time-based-scheduling-work-in-a-MicroStrategy/ta-p/185696 - TN35205: How does time based scheduling work in a MicroStrategy Intelligence Server 9.2.x and later when the Intelligence Servers are on different time zones?
https://community.microstrategy.com/t5/Server/TN14393-What-role-do-Project-Failover-Latency-and-Configuration/ta-p/174872 - TN14393: What role do Project Failover Latency and Configuration Recovery Latency play in MicroStrategy Intelligence Server configuration
https://community.microstrategy.com/t5/Architect/TN19583-Advanced-settings-in-the-Projects-Tab-in-the/ta-p/179785 - TN19583: Advanced settings in the Projects Tab in the MicroStrategy Intelligence Server Configuration are grayed out
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/2634 - TN34137: The default project failover latency setting cannot
https://community.microstrategy.com/t5/Server/TN6058-How-and-when-does-MicroStrategy-Intelligence-Server/ta-p/167124 - TN6058: How and when does MicroStrategy Intelligence Server cluster synchronize user History List or Inbox
https://community.microstrategy.com/t5/Server/TN30700-In-MicroStrategy-Intelligence-Server-9-0-history-list/ta-p/181634 - TN30700: In MicroStrategy Intelligence Server 9.0, history list jobs executed through MicroStrategy Web do not fail over to the surviving node when the other node in the cluster is not available
https://community.microstrategy.com/t5/Server/TN4759-MicroStrategy-Intelligence-Server-cluster-load-balancing/ta-p/166017 - TN4759: MicroStrategy Intelligence Server cluster load balancing FAQ
https://community.microstrategy.com/t5/Server/TN36020-New-Statistics-and-Enterprise-Manager-tables-designed-to/ta-p/186415
https://community.microstrategy.com/t5/Administration/Cube-refresh-report/td-p/124911
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/server/article-id/1741
https://community.microstrategy.com/t5/tkb/articleprintpage/tkb-id/architect/article-id/6318
http://www.bryanbrandow.com/2014/10/trigger.html
https://lw.microstrategy.com/MSDZ/MSDL/940/docs/mergedProjects/mobilesdk/topics/iOS/samples/iPh_Auto_Refreshing_a_Document.htm

How do people really monitor MSTR servers using a professional monitoring tool?

RefreshHostStatus
SkipClusterChecking
[Error] CDSSServerMessage::DropResultID(): RptCacheAdmin->ReleaseInboxRefCount return error 0x225
[Error] ClusterManager::SendAndReceiveMessage() failed: HResult = -2147202924.
[Warning] ClusterManager::Validate(): Server <QSCMSPAPP02> not responding (failed): HResult = -2147202924. Try one more time.
[Info] MSIClusterManager::hThreadSendAndReceiveMessage(): Info - sending a messages <104> to node <QSCMSPAPP02> is not completed before <20000>
[Error] MSIScheduler::RefreshPrimaryList(): Project internal id: 0 from non-primary to primary
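For monitoring, one rough option is to scan the server log for the cluster-related lines above and alert on them. The sketch below is only an illustration based on the sample lines quoted here: the regular expressions and function names are my own, and real log lines may carry prefixes (timestamps, process IDs) that this does not handle. It also converts the signed HResult into the usual unsigned hex form, which is easier to search for.

```python
import re

# Patterns derived only from the sample lines quoted on this page (assumption).
HRESULT_RE = re.compile(r"HResult = (-?\d+)")
NODE_RE = re.compile(r"Server <([^>]+)> not responding")

def hresult_to_hex(value: int) -> str:
    """Render a signed 32-bit HRESULT as unsigned hex."""
    return f"0x{value & 0xFFFFFFFF:08X}"

def scan_cluster_lines(lines):
    """Return (severity, node or None, hresult hex or None) for ClusterManager lines."""
    findings = []
    for line in lines:
        if "ClusterManager" not in line:
            continue
        # Severity is the text between the first pair of square brackets.
        severity = line[line.find("[") + 1 : line.find("]")]
        node = NODE_RE.search(line)
        hres = HRESULT_RE.search(line)
        findings.append((
            severity,
            node.group(1) if node else None,
            hresult_to_hex(int(hres.group(1))) if hres else None,
        ))
    return findings

SAMPLE = [
    "[Error] ClusterManager::SendAndReceiveMessage() failed: HResult = -2147202924.",
    "[Warning] ClusterManager::Validate(): Server <QSCMSPAPP02> not responding (failed): HResult = -2147202924. Try one more time.",
    "[Error] MSIScheduler::RefreshPrimaryList(): Project internal id: 0 from non-primary to primary",
]
findings = scan_cluster_lines(SAMPLE)
for item in findings:
    print(item)
```

With the sample lines, the HResult -2147202924 comes out as 0x80044894, and the scheduler line is skipped because it does not mention ClusterManager.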

What are the benefits of clustering?

  1. higher availability through failover
  2. better performance through load distribution
  3. greater scalability through an infrastructure for adding new resources into the system as system demands grow.

How many Intelligence Servers can be part of a cluster?

4.

How can we enable Automatic session re-routing?

For failover to be seamless for the end user, automatic session re-routing is provided: user sessions on failed cluster nodes are automatically re-routed to other available nodes. If a MicroStrategy Intelligence Server cluster node fails while running a request, end users will receive an error message, but subsequent requests will run seamlessly through an available cluster node. To ensure that session re-routing does not ask end users to provide their login credentials again, enable the "Allow automatic login if session is lost" setting found under the Security section on the Web Administration Page.

TN30728_TN30728_6.jpg

What is project failover?

Project failover support within a cluster is similar to system failover support. However, project failover only works in an asymmetrical cluster. For example, one server in a cluster is hosting project A and another server in the cluster is running projects B and C. If the first server becomes unavailable, the other server can begin running all three projects. Project failover support ensures that projects remain available even if hardware or an application fails.

Project failover is triggered when the number of nodes running a given project reaches zero due to node failure. At that point, the system automatically loads any projects that were on the failed system onto another server in the cluster to maintain the availability of those projects. Once the failed server recovers, the system reloads the original project onto the recovered server. It also removes the project from the server that had temporarily taken over.
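The trigger condition above (the number of nodes running a project reaches zero) can be sketched as a small check. This is a toy model for my own understanding, not how Intelligence Server implements it; the function name and the dictionary layout are made up for illustration.

```python
def projects_to_fail_over(project_nodes, failed_node):
    """Return projects left with zero running nodes once failed_node is lost.

    project_nodes maps a project name to the set of node names running it
    (hypothetical layout, for illustration only).
    """
    orphaned = []
    for project, nodes in project_nodes.items():
        survivors = nodes - {failed_node}
        # Failover applies only if the project was running and now has no node.
        if nodes and not survivors:
            orphaned.append(project)
    return sorted(orphaned)

# Mirrors the example in the text: one server hosts A, another hosts B and C.
layout = {"A": {"node1"}, "B": {"node2"}, "C": {"node2"}}
print(projects_to_fail_over(layout, "node1"))  # ['A']
```

A project that is also loaded on a surviving node is not reported, since its node count never reaches zero.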

What is project failover latency?

The amount of time that elapses before the project is loaded on one of the surviving nodes of the cluster to maintain a minimum level of availability. Latency parameters can be set via the Intelligence Server Configuration Editor (under Projects -> General). The latency parameters only take effect in case of an unexpected shutdown of a node. If the node is manually shut down, there will not be any project failover and therefore the parameters will not apply.

Will project failover be effective if the node is manually shut down?

No. The latency parameters only take effect in case of an unexpected shutdown of a node. If the node is manually shut down, there will not be any project failover and therefore the parameters will not apply.

How does MicroStrategy Intelligence Server failover History Lists?

MicroStrategy Intelligence Server creates a backup of history lists on disk. MicroStrategy Intelligence Server provides History List (Inbox) synchronization, but to be able to synchronize history lists, each cluster node must be able to access the history list folders on all other nodes. History Lists are synchronized across all cluster nodes in a manner similar to report caches. Each MicroStrategy Intelligence Server node retains in memory the location of all history list messages of users with active sessions on that node. When a history list message is created on one cluster node, the existence and location of this message are broadcast to the other cluster nodes. The other cluster nodes then update their memory with this new information. When a cluster node receives a request for an inbox message that resides on the hard disk of a remote cluster node, the cluster node will access the remote hard disk to retrieve the inbox message.

See TN6058.

If a MicroStrategy Intelligence Server node failure occurs in the middle of a job execution, the surviving node will proceed to run the unfinished history list job. This job failover is possible only because the history lists are synchronized across all nodes in the cluster. Additionally, the surviving node will append the history list messages from the failed node to its own list of messages. Each node in the cluster has a partial list of History List messages for each user based on where the message was created. When a user logs in, the user's History List is updated with messages from all the nodes in the cluster. For clarity, the following sequence of events occurs during history list failover:

  1. A report is sent to history list on cluster Node A.
  2. Before the report finishes executing, Node A fails.
  3. If File Based History List is configured, then Node B moves all history list backups from Node A's disk to Node B's disk. If Database Based History List is configured (possible starting from MicroStrategy 9), no history list backups are moved from one node to the other. All nodes in the cluster share the same History List Repository.
  4. Node B will notice that the history list message created in Step 1 is not yet in 'Ready' status. Therefore, Node B will re-execute the job associated with the history list message. In other words, for the job to be kicked off on the surviving node, the History List message status should be "Execution Error".

What happens to a non-History List (Interactive) job when a MicroStrategy Intelligence Server node fails?

MicroStrategy Web will automatically re-route the user session to an available node within the cluster but the job will not be re-executed. It will basically be canceled. https://community.microstrategy.com/t5/Server/TN30728-MicroStrategy-Intelligence-Server-Cluster-Failover-Guide/ta-p/181663

What happens to time-based subscriptions when a MicroStrategy Intelligence Server node fails?

In MicroStrategy 9.x, time-based subscriptions are load balanced, so they are not limited to the primary node. If the node running a time-based subscription fails, the jobs that belong to upcoming time triggers will be balanced across the remaining nodes seamlessly. The jobs already running on the failed node will only be re-executed on one of the surviving nodes if the subscription has a history list association with it; otherwise they will be canceled.

In MicroStrategy 8.x, time-based subscriptions only run on the primary node of the cluster. When this node fails, one of the other nodes in the cluster takes over the responsibility of the primary node and executes the remaining subscriptions. If the primary node fails during an in-flight time-based subscription request, the job will fail over to the new primary node only if the subscription has a history list association with it.

What happens to event-based subscriptions when a MicroStrategy Intelligence Server node fails?

Event-based subscriptions run on the node where the event is triggered. So if this node fails, the job will be re-executed if there is a history list association with it; otherwise it will be canceled.

In MicroStrategy 9.x, event-based subscriptions are also load balanced, so they are not limited to the triggered node. If the node running the event-based subscription fails, the job will fail over to whichever node MicroStrategy Web re-routes the user session to, and will only be re-executed if the subscription has a history list association with it.

What does MicroStrategy software do when there is a node failure?

In the event of a MicroStrategy Intelligence Server failure, job re-execution on the surviving node is based on whether or not that job has a history list message associated to it. Failover is only supported for an unexpected Intelligence Server failure and not for a manual shutdown of the Intelligence Server. This is a summary. See elsewhere on this page for details.
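The summary above boils down to two yes/no questions. The following sketch just encodes that rule as I read it from these notes; the function name and the returned labels are mine, not MicroStrategy terminology.

```python
def job_outcome(has_history_list_message: bool, unexpected_failure: bool) -> str:
    """Toy decision rule for what happens to an in-flight job on a lost node."""
    if not unexpected_failure:
        # Manual shutdown: failover is not supported at all.
        return "no failover"
    # Unexpected failure: only jobs tied to a history list message are re-run.
    return "re-executed" if has_history_list_message else "canceled"

print(job_outcome(True, True))    # re-executed
print(job_outcome(False, True))   # canceled
print(job_outcome(True, False))   # no failover
```

This is why an interactive (non-History List) job is simply canceled, while a subscription that delivers to the History List is picked up by a surviving node.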

Why is the performance of one server impacted when the other server in the cluster is shut down?

This issue can be seen in environments where a response to a connect call (to connect to the Intelligence Server on the machine that was shut down) can take a long time. This call can be especially slow to respond on UNIX/Linux operating systems. The surviving nodes try to connect to the down Intelligence Server because of the auto-join feature, which is enabled by default. This feature allows a MicroStrategy Intelligence Server that shut down unexpectedly to automatically rejoin the cluster when it starts up. The node that was shut down should re-establish the connection to the cluster when it is started again, but the surviving nodes probably also have reasons to try to connect to the down node, such as to get caches and to load balance jobs.

What is the purpose of the SkipClusterChecking registry setting?

Starting with MicroStrategy Intelligence Server 9.3.1, a new registry setting has been added to control the auto-join feature. In cases where auto-join is causing unresponsiveness of the surviving nodes, this feature can be disabled. Follow the steps below to apply this registry setting:

  1. Shut down MicroStrategy Intelligence Server 9.2.x - 9.4.x.
  2. Make a backup of the MSIReg.reg file found in the home folder of the MicroStrategy installation.
  3. Find the following entry in the file: [HKEY_LOCAL_MACHINE\SOFTWARE\MicroStrategy\DSS Server\Castor]
  4. Add the following entry under this location: "SkipClusterChecking"=dword:00000001
  5. Save and close the MSIReg.reg file
  6. Repeat steps 1-5 on all the MicroStrategy Intelligence Servers in the cluster.
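Steps 3 and 4 can be scripted when there are several nodes to update. The sketch below works on the file content as a string and assumes MSIReg.reg is plain text with each key header on its own line; the helper name `add_reg_value` is hypothetical, and reading or writing the actual file (and its encoding) is deliberately left out. The server should still be stopped and the file backed up first, as in steps 1 and 2.

```python
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\MicroStrategy\DSS Server\Castor"
ENTRY = '"SkipClusterChecking"=dword:00000001'

def add_reg_value(reg_text: str, key: str, entry: str) -> str:
    """Add or update `entry` under the `[key]` header of .reg-style text."""
    lines = reg_text.splitlines()
    header = f"[{key}]"
    if header not in lines:
        raise ValueError(f"key not found: {key}")
    name = entry.split("=", 1)[0]
    start = lines.index(header)
    # Walk the section; if the value already exists, overwrite it in place.
    i = start + 1
    while i < len(lines) and not lines[i].startswith("["):
        if lines[i].startswith(name + "="):
            lines[i] = entry
            return "\n".join(lines) + "\n"
        i += 1
    # Otherwise insert the value directly under the key header.
    lines.insert(start + 1, entry)
    return "\n".join(lines) + "\n"

sample = "[" + KEY + "]\n" + '"SomeOtherValue"=dword:00000000\n'
updated = add_reg_value(sample, KEY, ENTRY)
print(updated)
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to repeat across nodes.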

After disabling the automatic join feature by adding the above registry setting, the cluster will only re-form automatically after a MicroStrategy Intelligence Server restart (whether the shutdown was normal or unexpected) if all the servers are selected on the server definition under MicroStrategy Intelligence Server Configuration > Clustering, as shown below. If the servers are not selected, the cluster is not automatically formed upon restart.

TN47195_TN47195_2.JPG

To avoid seeing this issue, shut down MicroStrategy Intelligence Server before shutting down the physical machine. With this, the other nodes in the cluster will not try to communicate with the down node.

What is the purpose of the UseClusterSynchThread setting?

MicroStrategy Intelligence Server nodes in a cluster can become unresponsive when one of the nodes is doing report cache, document cache or Intelligent Cube related activity. The unresponsiveness results in significant performance degradation, and users notice a significant slowdown in job execution. This behavior is most noticeable when the number of caches is in the thousands, multiple projects are loaded, and the environment has a high number of clustered nodes. The unresponsiveness is most prominent during server startup, cluster maintenance or cache synchronization.

The Intelligence Server unresponsiveness in the cluster can be attributed to the number of threads used for cache synchronization. In 9.4.x versions, by default only one thread is used for synchronizing caches (Report/Document/Cube) between nodes in a cluster and for many other cluster related actions, including login. When the number of caches and projects to be synchronized becomes large, the MicroStrategy Intelligence Server may become unresponsive. When synchronization is slow, it slows down all the other cluster related activity.

In MicroStrategy Intelligence Server 9.4.x, the default behavior can be changed by having 3 threads per project per node. Three different threads will be used for synchronizing report caches, document caches and cube caches within a project on a node. The total number of threads that will be available equals – Number of Projects * Number of Clustered Nodes * 3. For example, in a 4 node cluster with 20 projects, we would have 20*4*3 = 240 threads for synchronization of caches across the cluster as compared to default value of 1 thread. Cache synchronization across the cluster becomes faster by increasing the number of threads. In most cases, faster cache synchronization leads to faster Intelligence Server responsiveness and better performance.
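The thread count formula above is easy to check for other cluster sizes. A minimal sketch (the function name is mine):

```python
def sync_thread_count(projects: int, nodes: int, cache_types: int = 3) -> int:
    """Threads used for cache synchronization when UseClusterSynchThread is on.

    cache_types defaults to 3: report caches, document caches, cube caches.
    """
    return projects * nodes * cache_types

print(sync_thread_count(20, 4))  # 240, the example from the text
print(sync_thread_count(5, 2))   # 30
```

The count grows with both projects and nodes, which is worth keeping in mind given the extra context-switch overhead that comes with more threads.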

With this setting, new threads are spawned specifically to do the synchronization. Therefore, other actions do not hang even if synchronization takes a long time. The trade-off is that enabling this setting results in more working threads and more overhead due to context switches.

By default, the “UseClusterSynchThread” setting is disabled in MicroStrategy 9.4.x. Without this entry or if its value is 0, the old way of cluster synchronization is used. The default behavior can be changed by enabling the “UseClusterSynchThread” setting in the Windows registry or in Intelligence Server Registry file (MSIReg.reg).

  1. Launch regedit
  2. Browse to the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\MicroStrategy\DSS Server\Castor
  3. Create a new 'DWORD (32-bit) Value' named UseClusterSynchThread and give it a non-zero value (for example 1) to enable the setting.

After enabling this setting, the Intelligence Server will have one thread per node, per project and per synchronization type (3 types – report cache/document cache/cube cache) and the synchronization process will be parallel.

Which node in the cluster handles time-based schedules?

In a MicroStrategy Intelligence Server 8.1.x clustered environment, time-based schedules are run only on the primary node in the cluster. If the cluster primary node becomes unavailable for any reason (a server crash, a shutdown, or being temporarily unreachable due to network issues), another server in the cluster will switch over to be the primary node. Any time-based schedules will then become the responsibility of the new primary node in the cluster. Because of asymmetrical clustering, the check of whether a node is the primary node is performed at the project level, on the node that is taking over primary status for that project.

What are the requirements for setting up a cluster?

  1. Hardware should be identical although this is not a hard requirement.
  2. Each node of the cluster must be set up to point to the same metadata repository, and use the same server definition. Using the same server definition greatly reduces administrative overhead because changes need to be made only to the server definition instead of on every server. Depending on the type of change, we may still need to make it individually on each server, and we still have to manually restart the Intelligence Servers.
  3. When Intelligence Server is installed, the last step is to choose a user identity under which the service will run. In order to run in a clustered configuration this user must be a domain account that has a trust relationship with each of the computers in the cluster. This is needed because resources will be shared across the network.

How can we configure the Intelligence servers to automatically rejoin the cluster after being restarted?

  1. Launch Desktop / Developer
  2. Drill down on the right project source
  3. Right click on the project source and select Configure MicroStrategy Intelligence Server
  4. Expand Clustering > General
  5. Select which servers currently in the cluster will automatically rejoin upon startup (check the check box next to each server)
  6. Restart all of the servers in the cluster to apply the changes.

How can we check to see if the cluster has been broken?

  1. Check the MSTR Web Admin page. If the cluster is functioning, all the nodes within the cluster should be grouped together as one single unit.

It should be noted that the properties of each MicroStrategy Intelligence Server will be set independently of the cluster. This can be beneficial if the Administrator would like to set the Load Balance for one or more Nodes to zero and use those machines to run Schedules. Therefore, if it is desirable to make a change to the properties of all nodes in the cluster then the properties will need to be modified for each node in the cluster.

How can we rejoin a node to a cluster if the cluster becomes broken?

If the servers were rebooted simultaneously, they may not rejoin the cluster appropriately (even though we configured the servers to automatically rejoin the cluster). In such cases, users will receive multiple emails for their scheduled reports. To rejoin the cluster:

  1. Launch Desktop
  2. Drill down on appropriate project source
  3. Expand Administration -> System Administration -> Cluster Nodes
  4. Right click on the right side
  5. Select Join Cluster
  6. Specify the name of the other machine in the cluster and click OK.

This will display a message that this node is already part of the cluster, and then update the screen to show both machines in the cluster. To verify that the node rejoined the cluster successfully, close Desktop, launch it again, and expand "Administration -> System Administration -> Cluster Nodes". It should display both machines. Check the other server as well, and check the mstrWebAdmin page for each MSTR Web instance.

How can we deliberately break a cluster?

  1. Launch Desktop
  2. Drill down on appropriate project source
  3. Expand Administration -> System Administration -> Cluster Nodes
  4. Right click on a server on the right side and select Leave Cluster.
  5. Stop the Intelligence Server on the machine that we just removed from the cluster, and make sure that the MicroStrategy Intelligence Server service is not automatically restarted when the machine is rebooted.

How can we configure the cluster so that scheduled reports are processed by a particular Intelligence server?

NEED CONCRETE STEPS.

How can we configure asymmetric clustering?

  1. Launch Desktop / Developer
  2. Drill down on the right project source
  3. Right click on the right project source and select Configure MicroStrategy Intelligence Server
  4. Drill down on Projects -> General
  5. Check appropriate check boxes

How does MSTR Web select the node to process the request?

  1. The MicroStrategy Web product load balancers on each Web Server collect load information from each cluster node, and then connect the users to the nodes that carry the lightest loads and run the project that the user logged into. All report requests are then processed by the nodes to which the users are connected.
  2. The MicroStrategy Intelligence Server nodes receive the requests and process them. In addition, the nodes communicate with each other to maintain metadata synchronization and cache accessibility across nodes.
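The node selection described in step 1 (lightest load among the nodes that run the user's project) can be sketched as below. This is only a toy model: the `load` number is a placeholder, since these notes do not say which load metric MicroStrategy Web actually collects, and the function and field names are made up.

```python
def pick_node(nodes, project):
    """Return the name of the least-loaded node running `project`, or None."""
    candidates = [n for n in nodes if project in n["projects"]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n["load"])["name"]

# Hypothetical asymmetric cluster: not every node runs every project.
nodes = [
    {"name": "node1", "load": 12, "projects": {"A", "B"}},
    {"name": "node2", "load": 5, "projects": {"B"}},
    {"name": "node3", "load": 1, "projects": {"C"}},
]
print(pick_node(nodes, "B"))  # node2
print(pick_node(nodes, "A"))  # node1
```

Note that node3 is never chosen for project B even though it has the lightest load, because it does not run that project.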

What information is shared across MicroStrategy Intelligence Server nodes?

In a clustered environment, each node must share information with the other nodes so that the information users see is consistent regardless of the node to which they are connected when running reports. The nodes synchronize:

  1. Metadata information - object caches
  2. History Lists
  3. Report caches

How can the user tell whether their report hits a cache in a clustered environment?

If the cache is available on the local node, the Cache Monitor will increment the hit count. If the cache is retrieved from another node, the speed of the response can indicate whether a cache was hit. The SQL view of the report will also indicate whether Cache used = Yes or No. Statistics tables can provide additional data on cache hits as well.

What methods can be used to guarantee availability of the MicroStrategy Intelligence Server report cache?

To prevent the loss of a MicroStrategy Intelligence Server cluster node from affecting report cache availability, the cluster can be configured such that a separate file server is used as a common report cache repository. In order to maintain cache availability, this separate file server can be configured for failover with third-party clustering software.

If a report cache is created by a MicroStrategy Intelligence Server cluster node, will that report cache be seen in the Cache Monitor of another cluster node?

No. Although the new report cache will be available for use by other cluster nodes, the cache will not appear in the Cache Monitor of other cluster nodes. In order to see all report caches within a cluster, the administrator will need to create a separate data source within MicroStrategy Desktop for each cluster node. Then, the report caches within each node can be administered separately, using the same instance of the MicroStrategy Desktop application. https://community.microstrategy.com/t5/Server/TN16018-MicroStrategy-Intelligence-Server-Clustering-FAQ/ta-p/176386

What is asymmetric clustering?

With Asymmetric clustering, different projects can be loaded in each node of a MicroStrategy Intelligence Server cluster. Using different server definitions on different nodes in the cluster also compromises some of the benefits found in asymmetric project distribution across nodes, as well as the automatic reformation of the cluster at node startup.

Is it possible for different nodes of a MicroStrategy Intelligence Server cluster to run against different metadata repositories?

No, all the nodes in the same cluster must run against the same metadata.

Is it possible for different nodes of a MicroStrategy Intelligence Server cluster to run with different configuration settings under the same metadata repository?

Yes, this is possible, but use caution when configuring different nodes with different settings. For example, differences in memory allocation for the cache, timeout settings, etc. can result in uneven performance across cluster nodes.

What can I do about the 'CDSSServerMessage::DropResultID(): RptCacheAdmin->ReleaseInboxRefCount return error 0x225' error message?

In my environment, I've seen this message preceded by another message, 'ClusterManager::SendAndReceiveMessage() failed: HResult = -2147202924', which indicates that the cluster is unreliable during the time when the scheduled reports are run. I had about 700 subscriptions, and they all ran at the same time. To avoid this issue:

  1. Expire all cache instead of deleting all cache.