A major challenge when deploying data pipelines on Kubernetes is handling the Kerberos principals and keytabs that pipelines need when they write to secure Hadoop.
What Are Kerberos Principals?
Kerberos principals are identifiers that represent either users or service daemons. User principals are often constructed as name@realm and are not bound to a specific host. Principals for service daemons are typically of the form name/host@realm.
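To make the two forms concrete, here is a minimal Python sketch that splits a principal string into its name, optional host, and realm. The `Principal` class and `parse_principal` function are hypothetical names used for illustration only; they are not part of any StreamSets or Kerberos library, and the sketch ignores escaped characters and multi-component principals that real Kerberos allows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    name: str
    host: Optional[str]  # None for user-style principals such as name@realm
    realm: str

    @property
    def is_host_qualified(self) -> bool:
        # True only for principals of the form name/host@realm
        return self.host is not None


def parse_principal(text: str) -> Principal:
    """Split 'name@realm' or 'name/host@realm' into its parts."""
    primary, at, realm = text.rpartition("@")
    if not at or not primary or not realm:
        raise ValueError(f"not a principal: {text!r}")
    name, slash, host = primary.partition("/")
    if not name or (slash and not host):
        raise ValueError(f"not a principal: {text!r}")
    return Principal(name=name, host=host if slash else None, realm=realm)
```

With this sketch, `parse_principal("marketing@EXAMPLE.COM").is_host_qualified` is false, while a principal such as `sdc/node1.example.com@EXAMPLE.COM` (a made-up host and realm) is reported as host-qualified.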
What Is A Kerberos Keytab?
Kerberos keytabs are files that associate encrypted keys with principals and serve as a basis for authentication.
Using Kerberos keytabs for principals of the form name@realm (without a host field) incurs security risks, because a keytab for such a principal could be used on any host in the enterprise. Best practice is for Kerberos principals to be of the form name/host@realm.
But how can host-qualified principals and keytabs be generated automatically for Kubernetes-based deployments, which can be dynamic, ephemeral, and auto-scaling, with host names not necessarily known beforehand? The answer is the StreamSets Provisioning Agent.
The image below depicts multiple StreamSets Data Collector Engine deployments, each associated with a different Kerberos user and Kerberos keytab.
Here is how the StreamSets Provisioning Agent, in conjunction with StreamSets Control Hub, automates the Kerberos aspects of the deployment process:
Step 1: The Provisioning Agent polls Control Hub looking for tasks to perform.
Step 2: When there is a deployment request (for example, “create two Data Collectors for the Marketing Department with the Kerberos user ‘marketing’”), the Provisioning Agent interacts with the Kerberos KDC and creates Kerberos principals of the form marketing/@ and generates keytabs for those principals.
Step 3: The Provisioning Agent injects the Kerberos principal name and keytab into each Data Collector’s configuration.
Step 4: Multiple deployments can have unique principals associated with their own set of Data Collectors.
Step 5: The Provisioning Agent dynamically provisions new Kerberos principals and keytabs as needed in response to horizontal pod autoscaling events. For example, if a third Data Collector is spawned for a given deployment under load, the new Data Collector automatically gets the Kerberos credentials it needs, tied to the new host.
An additional service performed by the Provisioning Agent is the automatic cleanup of the KDC when Kubernetes deployments terminate or pods are bounced. This prevents the KDC from being littered with no-longer-needed principals.
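The provisioning and cleanup behavior described above can be thought of as a reconciliation between the hosts currently running for a deployment and the principals already present in the KDC. The following Python sketch illustrates that idea only; it is not the Provisioning Agent's actual implementation, it does not contact a KDC, and the function names, host names, and realm are all hypothetical.

```python
from typing import Iterable, Set, Tuple


def principal_for(user: str, host: str, realm: str) -> str:
    """Build a host-qualified principal of the form name/host@realm."""
    return f"{user}/{host}@{realm}"


def reconcile(
    user: str,
    realm: str,
    running_hosts: Iterable[str],
    existing_principals: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (to_create, to_delete) for one deployment.

    existing_principals is assumed to contain only principals that
    belong to this deployment, so anything not matching a running
    host is treated as stale.
    """
    desired = {principal_for(user, host, realm) for host in running_hosts}
    existing = set(existing_principals)
    to_create = desired - existing  # new pods, e.g. after a scale-up
    to_delete = existing - desired  # pods that terminated or were bounced
    return to_create, to_delete
```

For example, if a deployment scales from two to three Data Collectors, `to_create` holds the single principal for the new host; when a pod goes away, its principal shows up in `to_delete`.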
Provisioning Agent Configuration
In the Provisioning Agent Helm Chart (aka “Control Agents”), specify the Kerberos configuration in the Chart’s values.yaml file as follows:
krb:
  enabled: false
  encryptionTypes:
  containerDn:
  ldapUrl:
  adminPrincipal:
  adminKey:
  realm:
  kdcType: < AD | MIT >
With such a configuration, a Provisioning Agent will be able to interact with a Kerberos KDC.
Note: All credentials and Kerberos configuration details provided in the Chart are managed as Kubernetes Secrets.
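Before installing the Chart, it can help to sanity-check the krb block. The Python sketch below assumes the krb section of values.yaml has already been loaded into a dictionary. It flags keys that are left empty when Kerberos is enabled and checks that kdcType is one of the two values shown above. Treating every key as needed is an assumption made for illustration; the Chart itself may not require all of them for every KDC type. `review_krb` is a hypothetical helper, not part of the Chart.

```python
from typing import Any, Dict, List

# Keys taken from the krb block shown above.
KRB_KEYS = (
    "encryptionTypes",
    "containerDn",
    "ldapUrl",
    "adminPrincipal",
    "adminKey",
    "realm",
    "kdcType",
)
KDC_TYPES = {"AD", "MIT"}


def review_krb(krb: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for a krb config dictionary."""
    findings: List[str] = []
    if not krb.get("enabled", False):
        return findings  # Kerberos disabled: nothing to check
    for key in KRB_KEYS:
        if krb.get(key) in (None, ""):
            # Assumption: an empty value is worth flagging for review.
            findings.append(f"krb.{key} is not set")
    kdc_type = krb.get("kdcType")
    if kdc_type not in (None, "") and kdc_type not in KDC_TYPES:
        findings.append(f"krb.kdcType must be AD or MIT, got {kdc_type!r}")
    return findings
```

An empty list means the sketch found nothing to flag; it does not confirm that the values are correct for a given KDC.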