Exception in thread "HiveServer2-Handler-Pool: Thread-76" java.lang.ExceptionInInitializerError #23
Comments
Does the Spark community support Hive 3.1 as of Spark 2.3?
I could interact with the Hive metastore via the Spark Thrift Server (spark-beeline shell) with this setup before adding the spark-ranger plugin, so I assumed that Spark 2.3 can connect to Hive metastore v3.1.2. I replaced the spark-ranger jar with the original-spark-ranger jar, and now I am facing another issue: Exception in thread "HiveServer2-Handler-Pool: Thread-99" java.lang.NoClassDefFoundError: org/apache/ranger/plugin/policyengine/RangerAccessResultProcessor. Any thoughts on this? One more question: does spark-authorizer work in this use case, i.e. connecting to an external Hive metastore? Apologies for multiple questions. Thanks in advance :)
spark-authorizer is limited to Ranger 0.5.3 and Spark with its built-in Hive (1.2.1.spark2). The ExceptionInInitializerError is most likely caused by another NoClassDefFoundError. You could try the Apache Submarine spark-security plugin mentioned in the README; this project is about to be archived and many fixes here are far behind. If you have further questions, the Apache Submarine JIRA or the user mailing list will be a better choice.
You do not need to replace spark-ranger.jar; leave it in place. Did you copy spark-ranger.jar to all machines? By the way, that is not your root exception. Please provide the full ERROR-level exception from the Thrift Server logs.
Hi, thanks for your reply. When I build this project, I see two jar files: spark-ranger.jar and original-spark-ranger.jar. I tried both of them. I am attaching my Thrift Server logs; this log is for original-spark-ranger.jar. My first comment was for spark-ranger.jar, and you can find the entire exception there. Kindly let me know if there is anything else you need from my end. Also, I started the Hive metastore before starting the Thrift Server. Thanks in advance.
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-1.fc31.x86_64/jre/bin/java -cp /home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/conf/:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal
2020-04-09 12:23:02 WARN Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.0.128 instead (on interface wlp4s0)
Did you try spark-ranger.jar? You do not need original-spark-ranger.jar; it is just 188K, while spark-ranger.jar should be about 16MB. Copy it to all machines, see what happens, and attach your logs if available.
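As a quick sanity check, one can verify that the jar on the classpath actually bundles the shaded Ranger classes; the jar path below is an assumption based on the build output discussed in this thread:

```
# list the jar's contents and look for the shaded Ranger class (path assumed)
unzip -l $SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar | grep RangerAccessResultProcessor
```

If nothing is found, the thin (unshaded) jar is on the classpath, which would explain the NoClassDefFoundError above.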
Hi, yes, I did try spark-ranger.jar, and I have attached the Thrift Server logs for your reference. Steps I have done so far: before starting Spark, I built spark-ranger, placed the jar under SPARK_HOME/jars, and set the param spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension in the spark-defaults.conf file. I placed both XML files with the appropriate params under the SPARK_HOME/conf directory, and also addressed the add-on issues mentioned in the spark-ranger GitHub README. Then I started the ranger-admin service. Now, when I try to access beeline (Spark), the exception below pops up. I have attached the Thrift Server log file for your reference. Kindly let me know if there is anything else I need to provide.
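For reference, a minimal check of the layout described above (file names assumed; ranger-spark-audit.xml is the usual companion of ranger-spark-security.xml in Ranger plugin setups):

```
# all of these should exist and be readable by the user running the Thrift Server
ls -l $SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar
ls -l $SPARK_HOME/conf/ranger-spark-security.xml $SPARK_HOME/conf/ranger-spark-audit.xml
```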
Well, in addition, you have put "spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension" in spark-defaults.conf, which is invalid. The correct form is like below:
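```
# spark-defaults.conf: separate the key and the value with whitespace, not "="
spark.sql.extensions org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension
```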
Try adding "spark.driver.extraClassPath YOUR_SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar" to spark-defaults.conf as well. By the way, your ranger-spark-security.xml is not readable (there may be a permission problem), based on this error in the logs: "ERROR RangerConfiguration:103 - addResourceIfReadable(ranger-spark-security.xml): couldn't find resource file location". Ranger will fall back to the DB audit store by default and continue, so this is not actually related to the error, but I had to mention it so you can fix it.
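Putting the two settings together, the relevant part of spark-defaults.conf would look roughly like this (the jar path is assumed from the build output mentioned earlier in the thread):

```
spark.sql.extensions        org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension
spark.driver.extraClassPath /YOUR_SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar
```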
Hi, I am running Spark on a single node. While trying to rectify this, I ran the Thrift Server with sudo, in which case I get a different error stating "user [root] does not have [use] privilege on default". If my understanding is right, can only root be used as the user to connect Ranger and Spark? Or is there another way to solve this?
Hi, I would also like to mention that I have installed Spark in my home directory, whereas Ranger is in /usr/local/. Previously, when I integrated Ranger with HiveServer2, I had Hive installed in my home directory, but the Hive Ranger plugin was in my /usr/local directory, so there wasn't any issue accessing the credential files. In this case, however, the plugin setup adds a jar file and some configuration inside the Spark directory itself; I guess that is the reason for this. Can you please let me know if you have any inputs to resolve this? Thanks in advance.
Hi, when you enable security, it is not recommended to install all services under your $HOME. For example, install Spark in /usr/local and chown it to the spark user; other users can still interact with it. There is no need to run the Thrift Server with sudo; set the correct owner for each service instead. When I checked your logs again, I found that you have a permission problem. Two errors appear in your logs:
The first error is related to the policycache folder's permissions. The second is related to the access permissions of the spark-ranger.jar file, which is not readable. (The NullPointerException from ConcurrentHashMap.put in the trace below is consistent with this: put throws on a null key or value, and an unreadable configuration leaves a required Ranger property unset.) Check both and tell me the result.
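A sketch of the corresponding fixes, assuming the default Ranger policy cache location (/etc/ranger/&lt;service&gt;/policycache; the service name "sparkServer" here is hypothetical) and the jar path used earlier in the thread:

```
# make the policy cache directory owned by the user running the Thrift Server
sudo mkdir -p /etc/ranger/sparkServer/policycache
sudo chown -R "$USER" /etc/ranger/sparkServer/policycache

# make the plugin jar and the security config readable
chmod 644 $SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar
chmod 644 $SPARK_HOME/conf/ranger-spark-security.xml
```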
Hi,
I have installed spark-2.3.0-bin-hadoop2.7, Hive 3.1.2, and Ranger 2.0.0 locally. I had established a connection between the Spark Thrift Server and the Hive metastore before enabling the spark-ranger plugin. I then followed the steps in the README, along with the known issues for Ranger 2.0.0. But after starting the Thrift Server, when I run the Spark beeline shell, I see an exception in beeline:
2020-04-09 11:17:46 ERROR HiveConnection:593 - Error opening session
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:192)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:142)
at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:207)
at org.apache.hive.beeline.Commands.connect(Commands.java:1149)
at org.apache.hive.beeline.Commands.connect(Commands.java:1070)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:970)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)
Error: Could not establish connection to jdbc:hive2://localhost:10000: null (state=08S01,code=0)
0: jdbc:hive2://localhost:10000 (closed)>
After digging into the Thrift Server logs, I found this exception:
enableTagEnricherWithLocalRefresher: false, disableTrieLookupPrefilter: false, optimizeTrieForRetrieval: false, cacheAuditResult: false }
2020-04-09 11:17:46 INFO AuditProviderFactory:493 - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
Exception in thread "HiveServer2-Handler-Pool: Thread-76" java.lang.ExceptionInInitializerError
at org.apache.spark.sql.catalyst.optimizer.RangerSparkAuthorizerExtension.apply(RangerSparkAuthorizerExtension.scala:62)
at org.apache.spark.sql.catalyst.optimizer.RangerSparkAuthorizerExtension.apply(RangerSparkAuthorizerExtension.scala:36)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:68)
at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:202)
at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:351)
at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:246)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
at org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:223)
at org.apache.ranger.authorization.spark.authorizer.RangerSparkPlugin.init(RangerSparkPlugin.scala:39)
at org.apache.ranger.authorization.spark.authorizer.RangerSparkPlugin$Builder.getOrCreate(RangerSparkPlugin.scala:71)
at org.apache.ranger.authorization.spark.authorizer.RangerSparkAuthorizer$.<init>(RangerSparkAuthorizer.scala:43)
at org.apache.ranger.authorization.spark.authorizer.RangerSparkAuthorizer$.<clinit>(RangerSparkAuthorizer.scala)
... 34 more
Any thoughts on this? Any help would be much appreciated. Thanks in advance.
Note: I have installed the ranger-hive plugin (Ranger and HiveServer2) before with the same setup, and it works perfectly fine. Now I want Ranger to work with the Spark Thrift Server on top of the underlying Hive metastore.