Hadoop can essentially run in two modes: unsecure and secure. Unfortunately, "secure" means many different things to different people. For Hadoop, there are multiple levels of security that can be enabled, starting with perimeter security based on authentication via Kerberos. Then there is over-the-wire and at-rest security, both based on pluggable encryption algorithms, with a central key management service (KMS). On top of that comes authorization (e.g. Apache Sentry or Ranger), and eventually special access logs to provide complete governance and auditability. For the remainder of this post we are only concerned with the first level, authentication.
When running in unsecure mode nothing needs to be done, as no checks are performed to verify user accounts. You can pretend to be anyone you want to be, and Hadoop will take your word for it. This can be accomplished by setting the optional environment variable HADOOP_USER_NAME before invoking a CLI command such as $ hadoop fs -ls.
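As a minimal sketch of that mechanism, the helper below sets HADOOP_USER_NAME for a single command only, so the name does not linger in the shell session. The function name run_as is made up for this illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): run one command with
# HADOOP_USER_NAME set only in that command's environment.
run_as() {
  local user="$1"
  shift
  # env passes the variable to the child process without exporting it here.
  env HADOOP_USER_NAME="$user" "$@"
}

# Example (needs a Hadoop client pointed at an unsecured cluster):
#   run_as foobar hadoop fs -ls /
```

Because the variable is only handed to the child process, later commands in the same shell still run under your own name.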
For real production workloads, you will want to at least enable perimeter security to provide proper authentication. For Hadoop this is based on the Kerberos protocol, which uses a central database of known and trusted users and services, so-called principals, to issue tickets that allow full user authentication, even over a non-secured network (for example, one not using SSL/TLS).
Once Kerberos is enabled you have to generate and deploy keytabs on each server for every service it offers - that is, a so-called service principal (SPN) per shared service, dedicated to each server hostname. Luckily, this is automated by the various Hadoop distributions these days, and in the end you should have a running cluster that is ready for you to connect to without too much effort.
When using the Hadoop command line tools, you now need to provide a user ID, either in the form of a simple name for unsecure clusters, or in the form of a user principal (UPN) for secured, kerberized clusters. Here be dragons, and they cause a lot of grief in practice. How can you check what is going on, and are there any tools to debug which user ID the command line interface actually uses? Why, yes, there is one: the Swiss Army knife of classes, UserGroupInformation (UGI), comes to the rescue.
This class is shared by practically every Hadoop project that uses the native Hadoop APIs, including the core projects as well as all the auxiliary ones building on top of the stack. A little-known fact is that it comes with a handy main() method, which dumps the environment it determines. Since all other command line tools use the same class internally to build the user environment, this can be used to determine exactly as which user any command would be executed.
Let's assume that we have the following core-site.xml content in our local Hadoop configuration directory, set using HADOOP_CONF_DIR=/etc/opt/hadoop/conf:
$ cat /etc/opt/hadoop/conf/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
...
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master-1.internal.larsgeorge.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
You can see that the authentication method has been set to kerberos via the parameter hadoop.security.authentication. To connect to the cluster, a user first needs to authenticate with Kerberos by using the kinit command. Afterwards, klist can be used to verify that the ticket is valid:
$ kinit
Password for larsgeorge@INTERNAL.LARSGEORGE.COM: <PASSWORD> <ENTER>
$ klist
Ticket cache: FILE:/tmp/krb5cc_500
Default principal: larsgeorge@INTERNAL.LARSGEORGE.COM
Valid starting     Expires            Service principal
04/24/16 18:43:19  04/25/16 18:43:19  krbtgt/INTERNAL.LARSGEORGE.COM@INTERNAL.LARSGEORGE.COM
        renew until 05/01/16 18:43:19
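If you want to do the same check from a script rather than by reading the klist output, a small sketch could look like the following. It assumes the MIT Kerberos klist, whose -s flag prints nothing and reports through the exit status whether the cache holds a valid ticket. The function name has_ticket is made up for this example.

```shell
# Hypothetical helper: succeed only if the default ticket cache holds a
# valid (readable, non-expired) ticket.
# Assumption: MIT Kerberos klist, where -s means "silent, use exit status".
has_ticket() {
  klist -s 2>/dev/null
}

# Example:
#   has_ticket || kinit
```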
If we now call the UGI class with no parameters at all, we get the following:
$ hadoop org.apache.hadoop.security.UserGroupInformation
Getting UGI for current user
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab false
============================================================
The output shows my current user and group names. The UGI instance was set up using the Kerberos credentials (Auth method KERBEROS), but no keytab was used (Keytab false).
Now let's do the same again, but this time try to explicitly pass a username to Hadoop by setting the environment variable HADOOP_USER_NAME:
$ HADOOP_USER_NAME=foobar hadoop org.apache.hadoop.security.UserGroupInformation
Getting UGI for current user
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab false
============================================================
The result stays the same, i.e. the Kerberos ticket trumps any local setting using a variable, as opposed to an unsecured cluster.
By the way, the ticket is retrieved from the local KRB5 ticket cache. You can see where that resides by, for example, first issuing a $ kdestroy (not shown, as it does not return any command line feedback), and then:
$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_500)
The kdestroy removed the local cache file altogether, and from here on out we no longer have a Kerberos ticket. For demonstration purposes, let's trick the local Hadoop client into believing the cluster is not secured. We point the configuration environment variable at a non-existent path, which makes Hadoop fall back to its default values, then set the name and invoke the UGI main method:
$ export HADOOP_CONF_DIR=/etc/opt/hadoop/conf.BOGUS
$ HADOOP_USER_NAME=foobar hadoop org.apache.hadoop.security.UserGroupInformation
Getting UGI for current user
User: foobar
Group Ids:
2016-04-04 08:17:02,501 WARN [main] security.UserGroupInformation \
  (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user foobar
Groups:
UGI: foobar (auth:SIMPLE)
Auth method SIMPLE
Keytab false
============================================================
A few things have changed. We now see the name we handed in set as the user name, and the authentication method has changed to SIMPLE, which is the default. A side effect of using a bogus name is that it does not exist locally, so UGI emits a warning message: it could not determine the groups foobar belongs to, as there are none.
Let's unset the non-existent configuration path and try again, this time using the core-site.xml with Kerberos enabled again:
$ unset HADOOP_CONF_DIR
$ HADOOP_USER_NAME=foobar hadoop org.apache.hadoop.security.UserGroupInformation
Getting UGI for current user
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:SIMPLE)
Auth method SIMPLE
Keytab false
============================================================
Again, the username we passed into Hadoop is ignored and the local username is used instead.
The UGI class has optional parameters, which let you specify a principal and keytab to perform the login with. So, instead of issuing a kinit, we can log in with an SPN, given we have the principal and a valid keytab (use $ klist -kt <keytab> to get the content of a keytab, which lists the contained principals). Let's first try the bogus configuration again, but using a keytab:
$ export HADOOP_CONF_DIR=/etc/opt/hadoop/conf.BOGUS
$ hadoop org.apache.hadoop.security.UserGroupInformation \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM \
  hdfs-master-2.internal.larsgeorge.com.keytab
Getting UGI for current user
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:SIMPLE)
Auth method SIMPLE
Keytab false
============================================================
Getting UGI from keytab....
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
Keytab: larsgeorge (auth:SIMPLE)
Auth method SIMPLE
Keytab false
We now see an additional section after the divider, since we tried to log in using a keytab. But, alas, it failed, because the default configuration in use again reverts to SIMPLE as the authentication method. UGI ignores the keytab we have given and once more uses the local configuration instead.
Just for good measure, here is the same call again, this time adding the local user name variable in an attempt to override the name:
$ HADOOP_USER_NAME=foobar hadoop org.apache.hadoop.security.UserGroupInformation \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM \
  hdfs-master-2.internal.larsgeorge.com.keytab
Getting UGI for current user
User: foobar
Group Ids:
2016-04-04 08:40:45,313 WARN [main] security.UserGroupInformation \
  (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user foobar
Groups:
UGI: foobar (auth:SIMPLE)
Auth method SIMPLE
Keytab false
============================================================
Getting UGI from keytab....
User: foobar
Group Ids:
2016-04-04 08:40:45,316 WARN [main] security.UserGroupInformation \
  (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user foobar
Groups:
Keytab: foobar (auth:SIMPLE)
Auth method SIMPLE
Keytab false
As expected, it uses the variable and sets the name to foobar - and cannot resolve the group, as seen above. OK, now for the real keytab attempt, unsetting the configuration path to revert to the one that states KERBEROS:
$ unset HADOOP_CONF_DIR
$ HADOOP_USER_NAME=foobar hadoop org.apache.hadoop.security.UserGroupInformation \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM \
  hdfs-master-2.internal.larsgeorge.com.keytab
Getting UGI for current user
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab false
============================================================
Getting UGI from keytab....
16/04/04 08:40:58 INFO security.UserGroupInformation: Login successful for user \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM using keytab file \
  hdfs-master-2.internal.larsgeorge.com.keytab
User: hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Group Ids:
Groups: hadoop
Keytab: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab true
This time part two succeeded, i.e. the login from keytab worked, using the given principal name (hdfs at a specific node and realm). The group resolved to hadoop correctly, as is set in my example cluster. Also interesting is the top part, stating larsgeorge as the user name. Although I had destroyed the local ticket cache, the UGI output makes it look as if some Kerberos mechanism was used to log me in. But that is not the case, as Auth method KERBEROS really just says that the cluster configuration is set to Kerberos.
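To make that distinction easier to spot, here is a small sketch that condenses captured UGI output into one line per section, pairing the reported auth method with the keytab flag. The function name ugi_summary is made up, and the parsing assumes the line layout printed by the UGI main() method in the runs above (one "Auth method ..." line followed by one "Keytab true|false" line per section).

```shell
# Hypothetical helper: read UGI main() output on stdin and print one
# "auth=... keytab=..." line per section.
# Assumption: each section ends with "Auth method X" then "Keytab true|false".
ugi_summary() {
  awk '
    /^Auth method /         { auth = $3 }
    /^Keytab (true|false)$/ { printf "auth=%s keytab=%s\n", auth, $2 }
  '
}

# Example (needs a Hadoop client):
#   hadoop org.apache.hadoop.security.UserGroupInformation 2>/dev/null | ugi_summary
```

A line reading auth=KERBEROS keytab=false only tells you what the configuration says. Whether a ticket is actually present still has to be checked separately, for example with klist.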
Finally, let's take a look under the hood for a bit. There are two things you can enable to debug the login process on the command line: the debug log of the Hadoop class itself, and that of the underlying Kerberos libraries. We start with the logger inside the Hadoop Java class, running the login from keytab again:
$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ hadoop org.apache.hadoop.security.UserGroupInformation \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM \
  hdfs-master-2.internal.larsgeorge.com.keytab
16/04/04 13:51:48 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of successful kerberos logins \
  and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
16/04/04 13:51:48 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed kerberos logins and \
  latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
16/04/04 13:51:48 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], about=, valueName=Time, \
  type=DEFAULT, always=false, sampleName=Ops)
16/04/04 13:51:48 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
Getting UGI for current user
16/04/04 13:51:48 DEBUG util.Shell: setsid exited with exit code 0
16/04/04 13:51:48 DEBUG security.Groups: Creating new Groups object
16/04/04 13:51:48 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/04/04 13:51:48 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 13:51:48 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
16/04/04 13:51:48 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping \
  impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
16/04/04 13:51:49 DEBUG security.Groups: Group mapping \
  impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; \
  warningDeltaMs=5000
16/04/04 13:51:49 DEBUG security.UserGroupInformation: hadoop login
16/04/04 13:51:49 DEBUG security.UserGroupInformation: hadoop login commit
16/04/04 13:51:49 DEBUG security.UserGroupInformation: using kerberos user:null
16/04/04 13:51:49 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: larsgeorge
16/04/04 13:51:49 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: \
  larsgeorge" with name larsgeorge
16/04/04 13:51:49 DEBUG security.UserGroupInformation: User entry: "larsgeorge"
16/04/04 13:51:49 DEBUG security.UserGroupInformation: UGI loginUser:larsgeorge (auth:KERBEROS)
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab false
============================================================
Getting UGI from keytab....
16/04/04 13:51:49 DEBUG security.UserGroupInformation: hadoop login
16/04/04 13:51:49 DEBUG security.UserGroupInformation: hadoop login commit
16/04/04 13:51:49 DEBUG security.UserGroupInformation: using kerberos \
  user:hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
16/04/04 13:51:49 DEBUG security.UserGroupInformation: Using user: \
  "hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM" with name \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
16/04/04 13:51:49 DEBUG security.UserGroupInformation: User entry: \
  "hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM"
16/04/04 13:51:49 INFO security.UserGroupInformation: Login successful for user \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM using keytab file \
  hdfs-master-2.internal.larsgeorge.com.keytab
User: hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Group Ids:
Groups: hadoop
Keytab: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab true
Second, we add the library debugging, and I apologize for the lengthy output that follows now:
$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"
$ hadoop org.apache.hadoop.security.UserGroupInformation \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM \
  hdfs-master-2.internal.larsgeorge.com.keytab
16/04/04 13:51:59 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos \
  logins and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/04 13:51:59 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins \
  and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/04 13:51:59 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate \
  org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation \
  @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, \
  always=false, type=DEFAULT, sampleName=Ops)
16/04/04 13:51:59 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
Getting UGI for current user
16/04/04 13:52:00 DEBUG util.Shell: setsid exited with exit code 0
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
16/04/04 13:52:00 DEBUG security.Groups: Creating new Groups object
16/04/04 13:52:00 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/04/04 13:52:00 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 13:52:00 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for \
  Group resolution
16/04/04 13:52:00 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping \
  impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
16/04/04 13:52:00 DEBUG security.Groups: Group mapping \
  impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; \
  warningDeltaMs=5000
>> Look up native default credential cache
>>>KinitOptions cache name is /tmp/krb5cc_500
16/04/04 13:52:00 DEBUG security.UserGroupInformation: hadoop login
16/04/04 13:52:00 DEBUG security.UserGroupInformation: hadoop login commit
16/04/04 13:52:00 DEBUG security.UserGroupInformation: using kerberos user:null
16/04/04 13:52:00 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: larsgeorge
16/04/04 13:52:00 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: \
  larsgeorge" with name larsgeorge
16/04/04 13:52:00 DEBUG security.UserGroupInformation: User entry: "larsgeorge"
16/04/04 13:52:00 DEBUG security.UserGroupInformation: UGI loginUser:larsgeorge (auth:KERBEROS)
User: larsgeorge
Group Ids:
Groups: larsgeorge wheel
UGI: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab false
============================================================
Getting UGI from keytab....
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>> KdcAccessibility: reset
>>> KdcAccessibility: reset
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 112; type: 18
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 96; type: 17
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 104; type: 16
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 96; type: 23
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 88; type: 8
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): hdfs
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 88; type: 3
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 112; type: 18
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 96; type: 17
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 104; type: 16
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 96; type: 23
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 88; type: 8
>>> KeyTabInputStream, readName(): INTERNAL.LARSGEORGE.COM
>>> KeyTabInputStream, readName(): HTTP
>>> KeyTabInputStream, readName(): master-2.internal.larsgeorge.com
>>> KeyTab: load() entry length: 88; type: 3
Added key: 3version: 1
Found unsupported keytype (8) for hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Added key: 23version: 1
Added key: 16version: 1
Added key: 17version: 1
Added key: 18version: 1
Ordering keys wrt default_tkt_enctypes list
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23 1 3.
Added key: 3version: 1
Found unsupported keytype (8) for hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Added key: 23version: 1
Added key: 16version: 1
Added key: 17version: 1
Added key: 18version: 1
Ordering keys wrt default_tkt_enctypes list
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23 1 3.
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23 1 3.
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=master-2.internal.larsgeorge.com UDP:88, timeout=30000, \
  number of retries =3, #bytes=206
>>> KDCCommunication: kdc=master-2.internal.larsgeorge.com UDP:88, timeout=30000,Attempt =1, \
  #bytes=206
>>> KrbKdcReq send: #bytes read=772
>>> KdcAccessibility: remove master-2.internal.larsgeorge.com
Added key: 3version: 1
Found unsupported keytype (8) for hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Added key: 23version: 1
Added key: 16version: 1
Added key: 17version: 1
Added key: 18version: 1
Ordering keys wrt default_tkt_enctypes list
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23 1 3.
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply hdfs/master-2.internal.larsgeorge.com
16/04/04 13:52:00 DEBUG security.UserGroupInformation: hadoop login
16/04/04 13:52:00 DEBUG security.UserGroupInformation: hadoop login commit
16/04/04 13:52:00 DEBUG security.UserGroupInformation: using kerberos \
  user:hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
16/04/04 13:52:00 DEBUG security.UserGroupInformation: Using user: \
  "hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM" with name \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
16/04/04 13:52:00 DEBUG security.UserGroupInformation: User entry: \
  "hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM"
16/04/04 13:52:00 INFO security.UserGroupInformation: Login successful for user \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM using keytab file \
  hdfs-master-2.internal.larsgeorge.com.keytab
User: hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Group Ids:
Groups: hadoop
Keytab: larsgeorge (auth:KERBEROS)
Auth method KERBEROS
Keytab true
In this listing, the lines containing DEBUG are the ones from the Hadoop class, while the lines prefixed with >>> are written by the lower-level libraries. You can see how it reads the keytab and dumps the contained principals. For your own fun, you could add strace to the call (prefix the entire command line with it) to have the Linux process watched and all operations logged. This would show you which configuration is read and so on.
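If you would rather not leave both debug switches exported in your shell, a sketch along these lines enables them for a single UGI run only. The function name ugi_debug is made up for this example. The environment variables and the JVM property are the ones used in the listings above.

```shell
# Hypothetical wrapper: turn on Hadoop debug logging and the JDK Kerberos
# debug output for one UGI invocation, without exporting anything.
# Note: this replaces any HADOOP_OPTS you may already have set for that run.
ugi_debug() {
  HADOOP_ROOT_LOGGER=DEBUG,console \
  HADOOP_OPTS="-Dsun.security.krb5.debug=true" \
  hadoop org.apache.hadoop.security.UserGroupInformation "$@"
}

# Example (needs a Hadoop client; principal and keytab are placeholders):
#   ugi_debug <principal> <keytab>
```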
Lastly, there is another Java class with a main() method that is useful to know, which is HadoopKerberosName:
$ hadoop org.apache.hadoop.security.HadoopKerberosName
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
16/04/04 13:52:25 DEBUG util.Shell: setsid exited with exit code 0
Without setting any parameters, this does not tell us much yet (note though that debugging is still enabled, hence the DEBUG line). We need to add the user name we want to have resolved by the class, for example:
$ /opt/hadoop/bin/hadoop org.apache.hadoop.security.HadoopKerberosName \
  larsgeorge@INTERNAL.LARSGEORGE.COM
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
16/04/04 13:52:32 DEBUG util.Shell: setsid exited with exit code 0
Name: larsgeorge@INTERNAL.LARSGEORGE.COM to larsgeorge
Mkay, it mapped my principal name to my local user name, since they match in my test system. Here is a different example with a more complex principal, which is our hdfs SPN from above:
$ hadoop org.apache.hadoop.security.HadoopKerberosName \
  hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
16/04/04 13:53:06 DEBUG util.Shell: setsid exited with exit code 0
Name: hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM to hdfs
The output (the last line) shows how the Hadoop classes translate a principal name into a local user account. If you do not want to maintain matching local users on every node, you will need a helper that resolves accounts from a central authority, such as Active Directory or LDAP. For that, system administrators usually use sssd, or a commercial tool like Centrify or VAS (now owned by Dell).
The HadoopKerberosName class is useful to see how the Hadoop classes translate a complex principal into a local username, in case the login fails due to inconsistencies. Especially when you start to use the auth_to_local configuration parameter to specify special rules, you may want to use this helper class to check that all is well.
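For a quick sanity check without a Hadoop client at hand, the mapping seen in the two runs above can be approximated in plain shell. This is only a sketch of the visible behavior (strip the realm, then strip the host part), not Hadoop's implementation. In particular it ignores any custom auth_to_local rules, and it skips the default-realm check that, as far as I know, the default rule applies. The function name short_name is made up.

```shell
# Hypothetical approximation of the default principal-to-user mapping:
# drop the realm (everything from "@"), then drop the host part
# (everything from the first "/").
short_name() {
  local principal="${1%%@*}"
  printf '%s\n' "${principal%%/*}"
}

short_name 'larsgeorge@INTERNAL.LARSGEORGE.COM'
short_name 'hdfs/master-2.internal.larsgeorge.com@INTERNAL.LARSGEORGE.COM'
```

If HadoopKerberosName prints something different for the same principal, that is a hint that an auth_to_local rule is in play.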
That is it for now, happy debugging!