Configuring Hot Standby with API Gateway on OpenShift

Hi all

I am trying to set up Hot Standby for API Gateway on OpenShift.
I have 2 separate OC clusters, each with its own API Gateway (and API Datastore) installed. However, I am unable to achieve connectivity between the 2 clusters while configuring Hot Standby.

I am trying to expose the syncport over a Service+Route to the other cluster but connectivity keeps failing:

Exception : RuntimeException occurred while writing key {multicast,{GOSSIP_DATA,broadcast}} to remote: sync-API-mgt-dc1.ocp.endpoint:443, with error message: UNAVAILABLE: Network closed for unknown reason

Error occurred during gossip with vector: [[ad193d47-3c51-4059-8e54-060062bcec84,688639]] with error message: UNAVAILABLE: Network closed for unknown reason

Does anyone here have experience with this? I’m not sure how to configure the listener/ring since it’s exposed with a Service+Route…

Thanks!

Please share the following:

  • Definition of the OpenShift routes and services
  • REST API calls made in both clusters to configure and activate the ring (calls to /dataspace/*)
  • Confirm you’ve set up TLS to secure the replication

Have you tried to rsh into one of the pod containers in cluster 1 and make a curl call to cluster 2 (pointing to its route)? This is to confirm cluster 2 is reachable from cluster 1. The same test should also be done from a pod container in cluster 2.
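
If curl is not available in the container, the same reachability test can be scripted. Below is a minimal Python sketch that only checks whether a TCP connection can be opened to a host and port. It assumes Python is present in the container, which may not be the case, and the host name in the usage note is a placeholder.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, `can_connect("route-of-cluster-2.example.com", 443)` run from a pod in cluster 1, then the mirror call from a pod in cluster 2. A True result only proves the TCP path is open; it says nothing about TLS or HTTP/2.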

Hi Stephane

steps.txt (4.4 KB)
I included the YAML services and routes as well as the REST API calls made to configure the clusters.

Currently I have not set up TLS as part of the replication config; I wanted to get it working first before adding that. However, I am not sure what the best approach is, since the Route itself uses TLS. I have also tried passthrough and simply no TLS… Also, it is not clear to me what exactly the “insecureTrustManager” flag does.

Connectivity through the Routes themselves does not work: I always get a 502 Bad Gateway when trying to connect to the syncPort via the Route. However, directly testing the Service works:

curl API-mgt-dc2-API-gateway-is-sync.API-mgt-dc2.svc.cluster.local:4440
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.

Thanks for your help!

Kind regards

The DC1 route is in namespace API-mgt, while the underlying service is in API-mgt-dc1. In the same route, the service name is wrong: API-mgt-dc1--API-gateway-is-sync instead of API-mgt-dc1-API-gateway-is-sync.
You seem to have rewritten the OCP descriptors, probably for confidentiality reasons, so what I report here could simply be due to manual rewriting errors.

I suspect the issue is more related to the configuration of OCP to support HTTP/2, and possibly to the TLS termination at the edge.
See for example https://access.redhat.com/solutions/2274201 and https://docs.openshift.com/container-platform/4.16/networking/ingress-operator.html

The very first thing I’d do is try a route configured with no TLS at all (you’d also need to reconfigure your replication ring).
Then, if this plain HTTP route works, you could try setting up end-to-end TLS using a termination of type passthrough. This way, the OCP route just treats the traffic as binary data and does not touch the connection.
If you really want TLS termination at the edge, then you will probably need to do advanced tuning of the HAProxy and route configuration.
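
To illustrate the passthrough option, here is a small Python sketch that builds a Route manifest as JSON, a format `oc apply -f` accepts. The route name, namespace, host and service name are placeholders, not your actual values; the target port 4440 is the sync port from your curl test. Treat it as a starting point rather than a validated configuration.

```python
import json


def passthrough_route(name, namespace, host, service, target_port):
    """Build an OpenShift Route that forwards TLS traffic untouched."""
    return {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "host": host,
            # A Route can only reference a Service in its own namespace.
            "to": {"kind": "Service", "name": service},
            "port": {"targetPort": target_port},
            # passthrough: the router does not terminate TLS, so the pod
            # must present its own certificate on target_port.
            "tls": {"termination": "passthrough"},
        },
    }


if __name__ == "__main__":
    route = passthrough_route(
        name="sync-dc1",                    # placeholder
        namespace="my-namespace",           # placeholder
        host="sync-dc1.apps.example.com",   # placeholder
        service="my-sync-service",          # placeholder
        target_port=4440,
    )
    print(json.dumps(route, indent=2))
```

With passthrough, clients still connect to the route on port 443, and the listener behind it has to speak TLS itself.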

Indeed, those are my mistakes from trying to obscure the data before posting here. But good eye!

Apparently it will not be possible at this time to enable HTTP/2 on our OCP… So I will try to set the Routes to TLS passthrough and configure the listeners with TLS as a workaround.

Thanks again for your help, understanding it better now!
I will get back to you with an update once I have tested with TLS passthrough.

Kind regards

Hi Stephane

I configured the gRPC channel to be secured with TLS (see notes.txt (2.4 KB)).
The Routes are also configured for TLS passthrough.

I confirm both datacenters are correctly listening with gRPC and see the connection is working (in both directions):

curl https://sync-api-mgt-dc1.dc1.ocp.somedns -k -v
...
> GET / HTTP/2
> Host: sync-api-mgt-dc1.dc1.ocp.somedns
> User-Agent: curl/7.61.1
> Accept: */*
>
...
< HTTP/2 415
< content-type: text/plain; encoding=utf-8
< grpc-status: 13
< grpc-message: Content-Type is missing from the request
<
Content-Type is missing from the request

But unfortunately I am still seeing this error in the logs when promoting both to STANDBY mode:

UNAVAILABLE: Network closed for unknown reason

I am close to getting it working, but not sure what the cause could be at this point…

EDIT: could you confirm for me if the Hot Standby mode uses streaming with gRPC?
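
For reference when reading the curl output above: the response carries a grpc-status header, which suggests a gRPC server answered over HTTP/2. The numeric value maps to the standard gRPC status names. The Python sketch below is generic gRPC and not specific to API Gateway; it shows that 13 is INTERNAL, while the UNAVAILABLE from the logs is code 14.

```python
# Standard gRPC status codes, as defined by the gRPC project.
GRPC_STATUS_NAMES = {
    0: "OK",
    1: "CANCELLED",
    2: "UNKNOWN",
    3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED",
    5: "NOT_FOUND",
    6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED",
    8: "RESOURCE_EXHAUSTED",
    9: "FAILED_PRECONDITION",
    10: "ABORTED",
    11: "OUT_OF_RANGE",
    12: "UNIMPLEMENTED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
    15: "DATA_LOSS",
    16: "UNAUTHENTICATED",
}


def grpc_status_name(code) -> str:
    """Map a grpc-status header value (int or numeric string) to its name."""
    try:
        return GRPC_STATUS_NAMES[int(code)]
    except (KeyError, ValueError, TypeError):
        return f"unrecognized status {code!r}"
```

So `grpc_status_name("13")` gives the name for the header value in the curl trace, and `grpc_status_name(14)` the one for the error in the logs.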

The curl command points to port 443, but your listeners are configured with port 4440. Have you configured any port translation here?

Similarly, you’ve set up TLS passthrough with nodes listening to port 4440, but your ring specifies port 443.

In the listeners config, insecureTrustManager should be set to false (since you’ve got TLS.)

You have created two root CAs (one for each DC) but you have a single truststore that seems to contain only one root certificate. Your truststore should contain the two CA certs.
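
One quick way to check how many certificates a truststore holds, once it has been exported to PEM, is to count the certificate blocks. A minimal Python sketch, assuming a PEM export; the file name in the usage note is a placeholder.

```python
import re

# One match per BEGIN/END CERTIFICATE pair; DOTALL lets "." span newlines.
_CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_pem_certificates(pem_text: str) -> int:
    """Return the number of certificate blocks found in PEM text."""
    return len(_CERT_BLOCK.findall(pem_text))
```

For example, `count_pem_certificates(open("truststore.pem").read())`. This only counts blocks; it does not tell you which CA each certificate belongs to.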

Retry the curl command without the “-k” switch and specify the truststore converted into PEM format instead (curl’s --cacert option).
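
The same check can be scripted. The Python sketch below performs a TLS handshake with certificate and host name verification enabled against a CA file in PEM format, and reports the protocol negotiated through ALPN ("h2" is what gRPC needs). The function names are made up for this example, and the host and file name in the usage note are placeholders.

```python
import socket
import ssl
from typing import Optional


def build_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server and offers HTTP/2."""
    # create_default_context turns on certificate and host name checks;
    # with ca_file=None it falls back to the system trust store.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Offer HTTP/2 via ALPN, since gRPC runs over HTTP/2.
    ctx.set_alpn_protocols(["h2"])
    return ctx


def check_tls(host: str, port: int, ca_file: Optional[str] = None,
              timeout: float = 5.0) -> Optional[str]:
    """Handshake with verification on; return the negotiated ALPN protocol.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match the host name.
    """
    ctx = build_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname sets SNI, which a passthrough route uses to
        # pick the backend.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

For example, `check_tls("sync-route.example.com", 443, "truststore.pem")`. A return value of "h2" means the certificate chain verified and HTTP/2 was negotiated end to end; None means the server did not agree on HTTP/2.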

Hot standby uses gRPC, yes.

Hi Stephane

Thanks for your feedback, to address your points:

  • Yes, there is route translation: since I am setting this up on OpenShift, there is a Service and a Route on top of the pods listening on 4440. The Route is on 443 with TLS passthrough. Could this translation be causing some issue?
  • I tried with insecureTrustManager set to false, but got the same error (network closed for unknown reason)
  • I reused the same CA to sign both certificates in both DCs (and tested the truststore in PEM format successfully)

Thanks again for your input on this very “vague” issue I am having…
Kind regards