Integrating DolphinScheduler with Sqoop can streamline data synchronization across systems. But beginners often run into frustrating errors during setup and execution.
This guide walks you through common pitfalls, complete with real-world error messages, solutions, and configuration tips. Whether you’re struggling with environment variables, classpath issues, or malformed Sqoop commands, this article will help you get data flowing smoothly in no time.
1. Error when creating a tenant in DolphinScheduler: Permission denied
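If the logs report insufficient permissions when you create a tenant in DolphinScheduler, you can open up HDFS permissions with the following command. Note that it gives every user read, write, and execute access to the HDFS root directory, so it is best kept to test environments: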
hdfs dfs -chmod 777 /
2. Error: Sqoop command not found
Sqoop: command can’t be found
Possible causes:
- Sqoop is not installed; or
- Sqoop is installed but not configured in the environment variables of DolphinScheduler.
Could the full path to Sqoop be used instead?
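Let’s check how DolphinScheduler integrates with Sqoop. Tasks pick up the variables exported in DolphinScheduler’s environment configuration file (typically bin/env/dolphinscheduler_env.sh in a 3.x installation; check your own version).
At the end of that configuration file, add the following lines: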
export SQOOP_HOME=/opt/installs/sqoop
export PATH=$SQOOP_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
Then restart DolphinScheduler:
# Check status
bash ./bin/dolphinscheduler-daemon.sh status standalone-server
# Stop DolphinScheduler
bash ./bin/dolphinscheduler-daemon.sh stop standalone-server
# Start DolphinScheduler
bash ./bin/dolphinscheduler-daemon.sh start standalone-server
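After the restart, it can be worth confirming that sqoop actually resolves with the variables from the environment file. The sketch below is one way to do that and is not part of DolphinScheduler itself: resolves_with_env is a hypothetical helper, and the default ENV_FILE path is an assumption that you should replace with the file you edited.

```shell
# Hypothetical helper (not a DolphinScheduler tool): succeed if <command>
# can be found on PATH after sourcing <env_file>.
resolves_with_env() {
  env_file=$1
  cmd=$2
  # Status 2 means the env file itself is missing or unreadable.
  [ -r "$env_file" ] || return 2
  (
    # Run in a subshell so the exports do not leak into the caller.
    . "$env_file" >/dev/null 2>&1
    command -v "$cmd" >/dev/null 2>&1
  )
}

# Assumed path; point this at the file you added SQOOP_HOME to.
ENV_FILE=${ENV_FILE:-./bin/env/dolphinscheduler_env.sh}

if resolves_with_env "$ENV_FILE" sqoop; then
  echo "sqoop resolves with $ENV_FILE"
else
  echo "sqoop not resolvable with $ENV_FILE (check SQOOP_HOME and PATH)"
fi
```

Sourcing the file in a subshell keeps the check free of side effects, so it can be rerun after each configuration change.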
3. Error: Class QueryResult not found
Caused by: java.lang.ClassNotFoundException: Class QueryResult not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2571)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2665)
... 12 more
Temporary solution:
Copy the QueryResult.jar file to the lib directory of Sqoop.
[INFO] 2024-09-25 06:19:16.083 +0000 - -> Note: /tmp/sqoop-root/compile/46c0c4b3def5aba0c202ae9664234de6/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
# Based on the log, go to this path:
cd /tmp/sqoop-root/compile/46c0c4b3def5aba0c202ae9664234de6
# Copy the jar file
cp /tmp/sqoop-root/compile/46c0c4b3def5aba0c202ae9664234de6/QueryResult.jar /opt/installs/sqoop/lib/
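The hash-named directory under /tmp/sqoop-root/compile typically differs from run to run, so the path in your log will not match the one shown here. As a rough sketch, the hypothetical helper newest_file below prints the most recently modified QueryResult.jar under that directory. It assumes a shell whose test command supports -nt (bash and dash do).

```shell
# Hypothetical helper: print the most recently modified file called <name>
# anywhere under <dir>; return 1 if none exists.
newest_file() {
  dir=$1
  name=$2
  newest=""
  while IFS= read -r candidate; do
    [ -n "$candidate" ] || continue
    if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
      newest=$candidate
    fi
  done <<EOF
$(find "$dir" -type f -name "$name" 2>/dev/null)
EOF
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Directory and file name taken from the log in this section.
if jar=$(newest_file /tmp/sqoop-root/compile QueryResult.jar); then
  echo "newest jar: $jar"
else
  echo "no QueryResult.jar found under /tmp/sqoop-root/compile"
fi
```

The printed path can then be passed to the cp command in place of the hard-coded hash directory.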
Permanent solution:
Add the following line at the bottom of DolphinScheduler’s configuration file:
export HADOOP_CONF_DIR=/opt/installs/hadoop/etc/hadoop
After configuration, restart DolphinScheduler:
# Check status
bash ./bin/dolphinscheduler-daemon.sh status standalone-server
# Stop
bash ./bin/dolphinscheduler-daemon.sh stop standalone-server
# Start
bash ./bin/dolphinscheduler-daemon.sh start standalone-server
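If the error persists after the restart, one thing worth ruling out is a HADOOP_CONF_DIR that points at the wrong directory. Here is a minimal sketch, assuming a standard Hadoop layout in which the configuration directory holds core-site.xml and hdfs-site.xml. The name conf_dir_ok is a hypothetical helper, not a Hadoop or DolphinScheduler command.

```shell
# Hypothetical helper: succeed if <dir> contains the two site files a
# standard Hadoop configuration directory is expected to hold.
conf_dir_ok() {
  [ -f "$1/core-site.xml" ] && [ -f "$1/hdfs-site.xml" ]
}

# Falls back to the path used in this article when the variable is unset.
if conf_dir_ok "${HADOOP_CONF_DIR:-/opt/installs/hadoop/etc/hadoop}"; then
  echo "Hadoop configuration directory looks usable"
else
  echo "core-site.xml or hdfs-site.xml missing; check HADOOP_CONF_DIR"
fi
```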
4. Errors in the Sqoop command
[INFO] 2024-09-25 06:27:53.628 +0000 - -> 2024-09-25 14:27:52,757 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(96)) - Running Sqoop version: 1.4.7
2024-09-25 14:27:52,824 ERROR [main] tool.BaseSqoopTool - Error parsing arguments for import:
2024-09-25 14:27:52,825 ERROR [main] tool.BaseSqoopTool - Unrecognized argument: dt
2024-09-25 14:27:52,825 ERROR [main] tool.BaseSqoopTool - Unrecognized argument: 2024-09-24
...
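In a long task log the rejected tokens are easy to miss. As a small sketch, the hypothetical helper unrecognized_args below reads a log on standard input and prints only the tokens that Sqoop reported as unrecognized. It assumes the message format shown in the log above.

```shell
# Hypothetical helper: print the token from every
# "Unrecognized argument: <token>" line read on standard input.
unrecognized_args() {
  sed -n 's/^.*Unrecognized argument: //p'
}

# Demo using the log lines from this section.
unrecognized_args <<'EOF'
2024-09-25 14:27:52,824 ERROR [main] tool.BaseSqoopTool - Error parsing arguments for import:
2024-09-25 14:27:52,825 ERROR [main] tool.BaseSqoopTool - Unrecognized argument: dt
2024-09-25 14:27:52,825 ERROR [main] tool.BaseSqoopTool - Unrecognized argument: 2024-09-24
EOF
# prints:
# dt
# 2024-09-24
```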
These “Unrecognized argument” errors indicate a syntax error in the Sqoop command used in the DolphinScheduler workflow. Double-check your workflow’s Sqoop command and delete the offending Sqoop parameter. The next run may then fail with a different error:
[INFO] 2024-09-25 06:34:34.639 +0000 - -> Sqoop version: 1.4.7
WARN - Setting password on command-line is insecure. Consider using -P instead.
ERROR - Must specify destination with --target-dir. Try --help for usage instructions.
If this happens, modify the command so that it specifies a destination, for example:
Target path of Hive: /tmp/user_orclog
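For reference, here is a sketch of what an import command with an explicit destination might look like. The function only prints the command and does not run Sqoop. The JDBC URL, user name, password file, and table name are placeholders invented for illustration, while --target-dir and --password-file are standard Sqoop 1.4.x import options. Using --password-file is one way to avoid the command-line password warning shown in the log.

```shell
# Build (but do not run) a Sqoop import command that sets --target-dir.
# Connection details and table name are placeholders, not real values.
build_import_cmd() {
  target_dir=$1
  printf '%s ' sqoop import \
    --connect "jdbc:mysql://db.example.com:3306/demo_db" \
    --username demo_user \
    --password-file /user/demo/.sqoop.pwd \
    --table demo_table \
    --target-dir "$target_dir"
  printf '\n'
}

# Target path taken from this article.
build_import_cmd /tmp/user_orclog
```

Printing the command first makes it easy to compare against what DolphinScheduler generates in the task log before running anything.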
If you see the following SQL syntax error:
Error executing statement: java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%Y-%m-%d) = 2024-09-24 AND (1 = 0)' at line 1
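Solution:
Replace the double quotes inside your SQL statement with single quotes. In the error message, the date format and the date value reach MySQL without any quotes, which is what typically happens when double quotes are nested inside a string that is itself double-quoted.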
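The effect is easy to reproduce in a plain shell, independent of Sqoop. In the sketch below, the table name demo_table and the column dt are placeholders. The point is only to show how the shell treats nested quotes.

```shell
# Double quotes nested inside a double-quoted string end the string early,
# so the shell drops them before MySQL ever sees the statement.
broken="SELECT * FROM demo_table WHERE DATE_FORMAT(dt, "%Y-%m-%d") = "2024-09-24""

# Single quotes inside the double-quoted string are passed through unchanged.
fixed="SELECT * FROM demo_table WHERE DATE_FORMAT(dt, '%Y-%m-%d') = '2024-09-24'"

printf '%s\n' "$broken"
# prints: SELECT * FROM demo_table WHERE DATE_FORMAT(dt, %Y-%m-%d) = 2024-09-24
printf '%s\n' "$fixed"
# prints: SELECT * FROM demo_table WHERE DATE_FORMAT(dt, '%Y-%m-%d') = '2024-09-24'
```

The first output matches the fragment quoted in the MySQL error, with the format string and date left bare.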
5. Hive Error
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 45:21 cannot recognize input near ';' '<EOF>' '<EOF>' in expression specification
Solution:
Use the SQL task type, and do not add a semicolon (;) at the end of the statement.
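If your SQL comes from files or variables that may already end in a semicolon, a small pre-processing step can remove it before the statement is pasted into the task. The sketch below uses only shell parameter expansion. The name strip_trailing_semicolon is a hypothetical helper and demo_table is a placeholder.

```shell
# Hypothetical helper: print <sql> without trailing whitespace and without
# a single trailing semicolon.
strip_trailing_semicolon() {
  sql=$1
  sql=${sql%"${sql##*[![:space:]]}"}   # trim trailing whitespace
  sql=${sql%;}                          # drop one trailing semicolon
  sql=${sql%"${sql##*[![:space:]]}"}   # trim whitespace that preceded it
  printf '%s\n' "$sql"
}

strip_trailing_semicolon "SELECT COUNT(*) FROM demo_table; "
# prints: SELECT COUNT(*) FROM demo_table
```

Only the final semicolon is removed, so semicolons inside string literals earlier in the statement are left alone.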