Old Version: 1.3.4 New Version: 3.1.2
1. Resource Center Kerberos Error After Upgrade
After the upgrade, using the resource center results in an error: IllegalArgumentException: Failed to specify server's Kerberos principal name.
Solution:
Edit dolphinscheduler/api-server/conf/hdfs-site.xml and add the following content:
<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>*</value>
</property>
2. Task Instance Log Loss
After the upgrade, viewing task instance logs results in an error indicating that the logs cannot be found. By checking the error message and the new version’s directory structure and log paths, it is found that the log path has changed.
Before the upgrade: the log path is under /logs/
After the upgrade: the log path is under /worker-server/logs/
Solution: Execute SQL to update the stored log paths, then copy the old log files to the new directory:
update t_ds_task_instance set log_path=replace(log_path,'/logs/','/worker-server/logs/');
cp -r {old_dolphinscheduler_directory}/logs/[1-9]* {new_dolphinscheduler_directory}/worker-server/logs/
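The effect of that replace() on a single stored path can be sketched in Java (the sample path below is hypothetical, not taken from a real installation):

```java
public class LogPathRewrite {
    // Mirrors what replace(log_path, '/logs/', '/worker-server/logs/')
    // does to one row's log_path value.
    static String rewrite(String logPath) {
        return logPath.replace("/logs/", "/worker-server/logs/");
    }

    public static void main(String[] args) {
        // Hypothetical stored path
        String oldPath = "/opt/dolphinscheduler/logs/20230101/123_456.log";
        System.out.println(rewrite(oldPath));
        // /opt/dolphinscheduler/worker-server/logs/20230101/123_456.log
    }
}
```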
3. Error Creating Workflow After Upgrade
The error message indicates that the primary-key auto-increment values of t_ds_process_definition_log and t_ds_process_definition are inconsistent. Synchronizing these values resolves the issue.
Solution: Execute the following SQL
-- Get the auto-increment value of the primary key
select AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'dolphinscheduler' AND TABLE_NAME = 't_ds_process_definition' limit 1;
-- Use the result of the above SQL to execute the following
alter table dolphinscheduler_bak1.t_ds_process_definition_log auto_increment = {max_id};
4. Empty Task Instance List After Upgrade
By checking the query in dolphinscheduler-dao/src/main/resources/org/apache/dolphinscheduler/dao/mapper/TaskInstanceMapper.xml, it is found that the SQL for listing task instances joins the t_ds_task_definition_log table, but the join condition define.code = instance.task_code does not match. Given the condition define.project_code = #{projectCode}, the join with t_ds_task_definition_log is mainly there to filter by projectCode, so the SQL can be modified as follows.
Solution:
select
<include refid="baseSqlV2">
<property name="alias" value="instance"/>
</include>
,
process.name as process_instance_name
from t_ds_task_instance instance
-- left join t_ds_task_definition_log define
-- on define.code = instance.task_code and
-- define.version = instance.task_definition_version
join t_ds_process_instance process
on process.id = instance.process_instance_id
join t_ds_process_definition define
on define.code = process.process_definition_code
where define.project_code = #{projectCode}
<if test="startTime != null">
and instance.start_time <![CDATA[ >= ]]> #{startTime}
</if>
By joining directly with t_ds_process_definition, which also contains the project_code field used for filtering, the query returns the correct data.
5. Null Pointer Exception During Upgrade Script Execution
(1) Analyze the log and locate the issue at line 517 in UpgradeDao.java
513 if (TASK_TYPE_SUB_PROCESS.equals(taskType)) {
514 JsonNode jsonNodeDefinitionId = param.get("processDefinitionId");
515 if (jsonNodeDefinitionId != null) {
516 param.put("processDefinitionCode",
517 processDefinitionMap.get(jsonNodeDefinitionId.asInt()).getCode());
518 param.remove("processDefinitionId");
519 }
520 }
The issue is that processDefinitionMap.get(jsonNodeDefinitionId.asInt()) returns null. Add a null check; if the lookup returns null, log the relevant information for later review.
Solution:
if (jsonNodeDefinitionId != null) {
    if (processDefinitionMap.get(jsonNodeDefinitionId.asInt()) != null) {
        param.put("processDefinitionCode",
                processDefinitionMap.get(jsonNodeDefinitionId.asInt()).getCode());
        param.remove("processDefinitionId");
    } else {
        // Log enough context to review the missing definition later
        logger.error("processDefinitionId not found in processDefinitionMap, param: {}, processDefinitionId: {}",
                param, jsonNodeDefinitionId);
    }
}
(2) Analyze the log and locate the issue at line 675 in UpgradeDao.java
669 if (mapEntry.isPresent()) {
670 Map.Entry<Long, Map<String, Long>> processCodeTaskNameCodeEntry = mapEntry.get();
671 dependItem.put("definitionCode", processCodeTaskNameCodeEntry.getKey());
672 String depTasks = dependItem.get("depTasks").asText();
673 long taskCode =
674 "ALL".equals(depTasks) || processCodeTaskNameCodeEntry.getValue() == null ? 0L
675 : processCodeTaskNameCodeEntry.getValue().get(depTasks);
676 dependItem.put("depTaskCode", taskCode);
677 }
The issue is that processCodeTaskNameCodeEntry.getValue().get(depTasks) returns null. Check for null before assigning the value, and log the relevant information.
Solution:
long taskCode = 0L;
if (processCodeTaskNameCodeEntry.getValue() != null
        && processCodeTaskNameCodeEntry.getValue().get(depTasks) != null) {
    taskCode = processCodeTaskNameCodeEntry.getValue().get(depTasks);
} else {
    // Log enough context to review the missing task code later
    logger.error("depTasks {} not found in {}", depTasks,
            JSONUtils.toJsonString(processCodeTaskNameCodeEntry));
}
dependItem.put("depTaskCode", taskCode);
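The same null-safe lookup can be expressed more compactly with Map.getOrDefault; a minimal sketch, with taskNameCodeMap standing in for processCodeTaskNameCodeEntry.getValue() (the names here are illustrative, not from the DolphinScheduler source):

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeLookup {
    // Returns the dependent task's code, or 0L when depTasks is "ALL",
    // the map is missing, or the task name has no mapping.
    static long resolveTaskCode(Map<String, Long> taskNameCodeMap, String depTasks) {
        if ("ALL".equals(depTasks) || taskNameCodeMap == null) {
            return 0L;
        }
        return taskNameCodeMap.getOrDefault(depTasks, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> taskNameCodeMap = new HashMap<>();
        taskNameCodeMap.put("taskA", 123L);
        System.out.println(resolveTaskCode(taskNameCodeMap, "taskA"));   // 123
        System.out.println(resolveTaskCode(taskNameCodeMap, "missing")); // 0
        System.out.println(resolveTaskCode(null, "taskA"));              // 0
    }
}
```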
6. Login Failure After LDAP Integration, Uncertain Email Field Name
Configure LDAP integration in api-server/conf/application.yaml:
security:
authentication:
# Authentication types (supported types: PASSWORD,LDAP)
type: LDAP
# IF you set type `LDAP`, below config will be effective
ldap:
# ldap server config
urls: xxx
base-dn: xxx
username: xxx
password: xxx
user:
# admin userId when you use LDAP login
admin: xxx
identity-attribute: xxx
email-attribute: xxx
# action when ldap user is not exist (supported types: CREATE,DENY)
not-exist-action: CREATE
To integrate LDAP successfully, the fields urls, base-dn, username, password, identity-attribute, and email-attribute must be filled in correctly. If the email attribute name is unknown, leave it empty initially.
Solution: The LDAP authentication code is in the ldapLogin() method of dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/security/impl/ldap/LdapService.java:
ctx = new InitialLdapContext(searchEnv, null);
SearchControls sc = new SearchControls();
sc.setReturningAttributes(new String[]{ldapEmailAttribute});
sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
EqualsFilter filter = new EqualsFilter(ldapUserIdentifyingAttribute, userId);
NamingEnumeration<SearchResult> results = ctx.search(ldapBaseDn, filter.toString(), sc);
if (results.hasMore()) {
// get the users DN (distinguishedName) from the result
SearchResult result = results.next();
NamingEnumeration<? extends Attribute> attrs = result.getAttributes().getAll();
while (attrs.hasMore()) {
// Open another connection to the LDAP server with the found DN and the password
searchEnv.put(Context.SECURITY_PRINCIPAL, result.getNameInNamespace());
searchEnv.put(Context.SECURITY_CREDENTIALS, userPwd);
try {
new InitialDirContext(searchEnv);
} catch (Exception e) {
logger.warn("invalid ldap credentials or ldap search error", e);
return null;
}
Attribute attr = attrs.next();
if (attr.getID().equals(ldapEmailAttribute)) {
return (String) attr.get();
}
}
}
To discover the email attribute name, comment out the line sc.setReturningAttributes(new String[]{ldapEmailAttribute}); so the search returns all attributes, then inspect the enumeration from result.getAttributes().getAll() (in a debugger or via logging) to find which attribute holds the email address.
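A minimal sketch of that inspection step, using only JDK javax.naming classes; a BasicAttributes object stands in for a real search result (in ldapLogin() you would pass result.getAttributes() instead):

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttributes;

public class LdapAttributeProbe {
    // Collects every attribute ID from an LDAP entry's Attributes.
    // Logging this list reveals the server's email attribute name
    // (commonly "mail", but it varies by LDAP schema).
    static List<String> listAttributeIds(Attributes attrs) throws NamingException {
        List<String> ids = new ArrayList<>();
        NamingEnumeration<? extends Attribute> all = attrs.getAll();
        while (all.hasMore()) {
            ids.add(all.next().getID());
        }
        return ids;
    }

    public static void main(String[] args) throws NamingException {
        // Stand-in for result.getAttributes(); values are hypothetical
        BasicAttributes attrs = new BasicAttributes();
        attrs.put("uid", "alice");
        attrs.put("mail", "alice@example.com");
        System.out.println(listAttributeIds(attrs));
    }
}
```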
7. Admin's Resource File Authorization Not Effective for Regular Users
Solution:
In the listAuthorizedResource method of ResourcePermissionCheckServiceImpl.java, located at dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/permission/, change the returned collection to relationResources:
@Override
public Set<Integer> listAuthorizedResource(int userId, Logger logger) {
List<Resource> relationResources;
if (userId == 0) {
relationResources = new ArrayList<>();
} else {
// query resource relation
List<Integer> resIds = resourceUserMapper.queryResourcesIdListByUserIdAndPerm(userId, 0);
relationResources = CollectionUtils.isEmpty(resIds) ? new ArrayList<>() : resourceMapper.queryResourceListById(resIds);
}
List<Resource> ownResourceList = resourceMapper.queryResourceListAuthored(userId, -1);
relationResources.addAll(ownResourceList);
return relationResources.stream().map(Resource::getId).collect(toSet()); // Fixed the issue that the resource file authorization was invalid
// return ownResourceList.stream().map(Resource::getId).collect(toSet());
}
Checking the new version's changelog shows that this bug has already been fixed in version 3.1.3: GitHub PR #13318
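The shape of the fix, in isolation: the method must return the union of relation-granted and owned resource IDs, while the original code returned only the owned ones, which is why admin grants were invisible. A sketch using plain ID lists in place of the Resource entity:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AuthorizedResourceUnion {
    // Union of ids granted via resource relation and ids the user owns,
    // deduplicated, mirroring the fixed listAuthorizedResource().
    static Set<Integer> authorizedIds(List<Integer> relationIds, List<Integer> ownIds) {
        List<Integer> all = new ArrayList<>(relationIds);
        all.addAll(ownIds);
        return all.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // One id (2) appears in both lists; the union has three distinct ids
        System.out.println(authorizedIds(List.of(1, 2), List.of(2, 3)).size()); // 3
    }
}
```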
8. Kerberos Expiry Issues
Solution: Add the following method in CommonUtils.java, located at dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/utils/:
/**
* Periodically update credentials
*/
private static void startCheckKeytabTgtAndReloginJob() {
// Periodically update credentials daily
Executors.newScheduledThreadPool(1).scheduleWithFixedDelay(() -> {
try {
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
logger.warn("Check Kerberos Tgt And Relogin From Keytab Finish.");
} catch (IOException e) {
logger.error("Check Kerberos Tgt And Relogin From Keytab Error", e);
}
}, 0, 1, TimeUnit.DAYS);
logger.info("Start Check Keytab TGT And Relogin Job Success.");
}
Then call this method in the loadKerberosConf method before returning true:
public static boolean loadKerberosConf(String javaSecurityKrb5Conf, String loginUserKeytabUsername,
String loginUserKeytabPath, Configuration configuration) throws IOException {
if (CommonUtils.getKerberosStartupState()) {
System.setProperty(Constants.JAVA_SECURITY_KRB5_CONF, StringUtils.defaultIfBlank(javaSecurityKrb5Conf,
PropertyUtils.getString(Constants.JAVA_SECURITY_KRB5_CONF_PATH)));
configuration.set(Constants.HADOOP_SECURITY_AUTHENTICATION, Constants.KERBEROS);
UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(
StringUtils.defaultIfBlank(loginUserKeytabUsername,
PropertyUtils.getString(Constants.LOGIN_USER_KEY_TAB_USERNAME)),
StringUtils.defaultIfBlank(loginUserKeytabPath,
PropertyUtils.getString(Constants.LOGIN_USER_KEY_TAB_PATH)));
startCheckKeytabTgtAndReloginJob(); // Call here
return true;
}
return false;
}
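The scheduling pattern used above can be exercised in isolation; a sketch that shrinks the daily delay to milliseconds so the repetition is observable, with the relogin call replaced by a counter (names here are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicRefreshSketch {
    // Same scheduleWithFixedDelay pattern as startCheckKeytabTgtAndReloginJob(),
    // but parameterized with a short delay so repeated execution can be seen.
    static int runJob(long delayMs, long observeMs) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        // In the real job this Runnable calls checkTGTAndReloginFromKeytab()
        pool.scheduleWithFixedDelay(runs::incrementAndGet, 0, delayMs, TimeUnit.MILLISECONDS);
        Thread.sleep(observeMs);
        pool.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With a 50 ms delay observed for 400 ms, the job runs several times
        System.out.println(runJob(50, 400) >= 2); // true
    }
}
```

Note that the real job uses TimeUnit.DAYS, so the relogin check fires once per day after an immediate first run.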