The main problem is that each microservice is exposed through a single route only. So successive calls to the /health and /prometheus endpoints of the same microservice might be answered by different instances when multiple instances are configured (replicas > 1 or auto-scaling).
Also, under load you may have multiple instances while at idle they are scaled down to just one, which again makes it very hard to monitor and compare.
The only option I see is to add extra attributes to the Prometheus endpoint, such as the hostname or anything else that identifies the current instance. The response of a Prometheus request might then vary each time you call it.
Still, you cannot control which instance is polled. So if you configure Prometheus to scrape every 60 seconds, it might query instance1, instance2 … instanceN in turn, with no guarantee that all available instances are covered. It could just as well always hit instance1.
I haven’t tried it, but adding this information (if not already present) could be a solution to your problem.
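As a sketch of what that could look like, assuming the microservice is a Spring Boot app using Micrometer (which typically backs the /prometheus endpoint): the hypothetical `InstanceTagConfig` below tags every metric with the container hostname. Inside Kubernetes that hostname defaults to the pod name, so it identifies the instance that served the scrape:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class InstanceTagConfig {

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> instanceTag() {
        // Inside Kubernetes the container hostname defaults to the pod name,
        // so it uniquely identifies this instance.
        String hostname;
        try {
            hostname = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            hostname = "unknown";
        }
        final String pod = hostname;
        // Attach the tag to every meter, so each /prometheus response
        // reveals which instance actually answered the scrape.
        return registry -> registry.config().commonTags("pod", pod);
    }
}
```

With that tag in place you can at least attribute each scrape to an instance and group by `pod` in your dashboards, even though you still can’t control which instance answers a given poll.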
Also, I would question the approach itself: do you really need multiple instances of your microservice? What is its main purpose? Would it be sufficient to deploy additional microservices on demand, or to scale vertically only?
We are trying to obtain those per-instance KPIs ourselves, the way your backend monitoring does, instead of requesting them from the support and R&D teams. So there must be a way, as evident in the plots we got from SwAG. We are trying to isolate a problem with a few microservices.
The main difference is that Operations + R&D support have direct access to the Kubernetes cluster of the whole environment, which gives them much more monitoring insight than you can get when you have to rely on the Prometheus endpoints only.
And of course it’s not possible to grant you Kubernetes-level access on shared environments.
I would recommend starting with one instance per microservice when monitoring microservices on C8Y. You can increase the CPU limit or even turn off auto-scaling to ensure that only one instance of the microservice is running. This lets you concentrate on monitoring the resource usage of that single instance and then find out why the memory consumption is accumulating. I don’t think monitoring multiple instances helps with this issue: you can see in the graph that the instance that always survives shows increasing memory consumption, and the presence of other instances didn’t change its rising trend.
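In case it helps, this is roughly what that looks like in the microservice manifest (cumulocity.json). Treat the snippet as a sketch only; the values are examples and the fields should be checked against the Cumulocity microservice manifest reference. `"scale": "NONE"` disables auto-scaling, `"replicas": 1` keeps a single pod, and `"resources"` raises the limits for that one instance:

```json
{
  "apiVersion": "2",
  "version": "1.0.0",
  "provider": { "name": "example-provider" },
  "isolation": "MULTI_TENANT",
  "scale": "NONE",
  "replicas": 1,
  "resources": {
    "cpu": "2",
    "memory": "2G"
  }
}
```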