feat: upgrade timeline service to 1.5 #612

Closed
wants to merge 1 commit into from


leopaul36
Member

Which issue(s) this PR fixes

Fixes #27

Additional comments

I'm trying to apply the instructions at https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/timeline_server_1_5_overview.html.

I feel like everything is in place, and the Timeline server seems to be working fine. I validated it by running `yarn application -status XXX` after deleting the corresponding znode and restarting the RMs. The information is retrieved successfully from the Timeline LevelDB store.
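The validation described above (delete the znode, restart the RMs, check that the application report still resolves) could be scripted. A minimal Python sketch, assuming the CLI prints one `Key : Value` pair per line; `parse_app_report` and the sample report, including its application ID, are hypothetical. In practice the text would come from `yarn application -status <appId>`.

```python
# Hypothetical helper: turn the text printed by `yarn application -status <appId>`
# into a dict, assuming the report prints one "Key : Value" pair per line.
def parse_app_report(text: str) -> dict[str, str]:
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # lines without " : " carry no key/value pair and are skipped
            report[key.strip()] = value.strip()
    return report


# Illustrative input only; the application ID is a made-up placeholder.
SAMPLE = """Application Report :
\tApplication-Id : application_1672700000000_0001
\tState : FINISHED
\tFinal-State : SUCCEEDED
"""

report = parse_app_report(SAMPLE)
# If this still resolves after the RM znode was deleted and the RMs restarted,
# the report was not served from the ZooKeeper state store.
print(report["State"], report["Final-State"])
```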

One thing I don't understand yet is why the /ats/* folders remain empty. I tried with MapReduce and Tez, but neither actually uploaded metrics to HDFS.

Marking as draft until we figure out what's going on.

Agreements

@leopaul36
Member Author

Would you have time to look at this @mehdibn? Particularly the issue with /ats/done and /ats/active remaining empty.

@mehdibn
Contributor

mehdibn commented Jan 2, 2023

@leopaul36 yes, and I will keep you posted.

@mehdibn
Contributor

mehdibn commented Jan 3, 2023

I got this error when I tried to use this branch:

TASK [tosit.tdp.apptimelineserver : Create a symbolic link to necessary tez jars] ****************************************************************
failed: [mehdi-master-03] (item=/opt/tdp/tez/tez-yarn-timeline-cache-plugin-0.9.1-TDP-0.1.0-SNAPSHOT.jar) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": "/opt/tdp/tez/tez-yarn-timeline-cache-plugin-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "path": "/opt/tdp/hadoop/share/hadoop/yarn/tez-yarn-timeline-cache-plugin-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "src": "/opt/tdp/tez/tez-yarn-timeline-cache-plugin-0.9.1-TDP-0.1.0-SNAPSHOT.jar"
}

MSG:

src file does not exist, use "force=yes" if you really want to create the link: /opt/tdp/tez/tez-yarn-timeline-cache-plugin-0.9.1-TDP-0.1.0-SNAPSHOT.jar
failed: [mehdi-master-03] (item=/opt/tdp/tez/tez-api-0.9.1-TDP-0.1.0-SNAPSHOT.jar) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": "/opt/tdp/tez/tez-api-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "path": "/opt/tdp/hadoop/share/hadoop/yarn/tez-api-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "src": "/opt/tdp/tez/tez-api-0.9.1-TDP-0.1.0-SNAPSHOT.jar"
}

MSG:

src file does not exist, use "force=yes" if you really want to create the link: /opt/tdp/tez/tez-api-0.9.1-TDP-0.1.0-SNAPSHOT.jar
failed: [mehdi-master-03] (item=/opt/tdp/tez/tez-common-0.9.1-TDP-0.1.0-SNAPSHOT.jar) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": "/opt/tdp/tez/tez-common-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "path": "/opt/tdp/hadoop/share/hadoop/yarn/tez-common-0.9.1-TDP-0.1.0-SNAPSHOT.jar",
    "src": "/opt/tdp/tez/tez-common-0.9.1-TDP-0.1.0-SNAPSHOT.jar"
}

MSG:

src file does not exist, use "force=yes" if you really want to create the link: /opt/tdp/tez/tez-common-0.9.1-TDP-0.1.0-SNAPSHOT.jar

@leopaul36
Member Author

leopaul36 commented Jan 4, 2023

@mehdibn my bad, I forgot to add the Hive client as a dependency of YARN ATS. It works on my side because my ATS runs on master-3, which is also an HS2 node, and the cluster was already running.
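The task fails because the Tez jars it links to are not installed on that node yet. As a hypothetical pre-flight check (not part of the `tosit.tdp` collection), a small Python helper can link only the jars that exist and report the rest, which points straight at the missing dependency. `link_jars`, the jar names, and the temporary directories in the usage are placeholders.

```python
import os
import tempfile
from pathlib import Path


def link_jars(jars: list[str], dest_dir: str) -> list[str]:
    """Symlink each existing jar into dest_dir and return the jars that are missing.

    Like the failing Ansible task (run without force), no link is created for a
    missing source; the jar is reported instead so the missing dependency is obvious.
    """
    missing = []
    for jar in jars:
        src = Path(jar)
        if not src.exists():
            missing.append(jar)
            continue
        link = Path(dest_dir) / src.name
        if not (link.is_symlink() or link.exists()):  # leave existing entries alone
            link.symlink_to(src)
    return missing


# Throwaway directories standing in for the Tez install dir and the YARN lib dir.
with tempfile.TemporaryDirectory() as tez_dir, tempfile.TemporaryDirectory() as yarn_dir:
    present = os.path.join(tez_dir, "tez-api.jar")
    Path(present).touch()
    absent = os.path.join(tez_dir, "tez-common.jar")
    print("missing:", link_jars([present, absent], yarn_dir))
```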

@mehdibn
Contributor

mehdibn commented Jan 4, 2023

@leopaul36 no problem, I resolved this dependency in my case.

After this deployment, I noticed the following behavior:

  • with the current YARN configuration, the YARN ResourceManager uses only ZooKeeper for application recovery. I changed the ZooKeeper znode and lost all applications. I think the timeline server can't be used for RM recovery.

  • with this timeline configuration, the timeline server still uses LevelDB as its backend. To check this, compare the size of the LevelDB store (/var/lib/yarn/ats/leveldb/leveldb-timeline-store.ldb) before and after submitting an application.
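A minimal sketch of the size check in the second point, assuming the LevelDB store at that path is a directory of files, as LevelDB stores usually are. `store_size` is a hypothetical helper; because LevelDB compacts its files in the background, a size change is only a rough signal that the timeline server wrote to LevelDB.

```python
import os


def store_size(path: str) -> int:
    """Total size in bytes of the files under path (0 if the path does not exist)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk yields nothing for a missing path
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                continue  # a compaction may delete a file while we are walking
    return total


STORE = "/var/lib/yarn/ats/leveldb/leveldb-timeline-store.ldb"
before = store_size(STORE)
# ... submit an application here (e.g. a MapReduce or Tez job) ...
after = store_size(STORE)
print(f"LevelDB store size changed by {after - before} bytes")
```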

@leopaul36
Member Author

I agree with both observations. What I'm struggling to understand is this: according to Hortonworks' documentation, it is expected that the local LevelDB path is used, since the docs require configuring it. In that case, I don't understand what is supposed to be uploaded to the HDFS path /ats/done.

@mehdibn
Contributor

mehdibn commented Jan 4, 2023

It is the HDFS path where the timeline data of completed applications is stored when HDFS is properly configured as a timeline backend. That is not currently the case: we are probably missing one or more properties, or maybe we need to replace LevelDB with HDFS (we can't have both). I checked the timeline logs and there are no connections to HDFS.
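To test the "missing properties" hypothesis, the Apache Hadoop documentation for Timeline Server v1.5 lists the server-side properties below. A hedged Python sketch that reads a `*-site.xml` document and reports which of them are absent or different; `TIMELINE_V15`, `parse_site_xml` and `timeline_v15_gaps` are hypothetical names, and the `/ats` directory values mirror this thread rather than any TDP default. In this layout the summary store points at LevelDB, which is consistent with LevelDB staying in use alongside the HDFS directories.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Timeline v1.5 server-side properties as described in the Apache Hadoop
# "YARN Timeline Server" docs; the directory values follow this thread's /ats layout.
TIMELINE_V15 = {
    "yarn.timeline-service.version": "1.5",
    "yarn.timeline-service.store-class":
        "org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore",
    "yarn.timeline-service.entity-group-fs-store.active-dir": "/ats/active/",
    "yarn.timeline-service.entity-group-fs-store.done-dir": "/ats/done/",
    # The summary store is LevelDB, so LevelDB and the HDFS directories coexist.
    "yarn.timeline-service.entity-group-fs-store.summary-store":
        "org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore",
}


def parse_site_xml(xml_text: str) -> dict[str, str]:
    """Read <property><name/><value/></property> pairs from a *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value", default="")
        for prop in root.iter("property")
    }


def _norm(value: Optional[str]) -> Optional[str]:
    # Ignore surrounding whitespace and a trailing slash when comparing values.
    return None if value is None else value.strip().rstrip("/")


def timeline_v15_gaps(actual: dict[str, str]) -> dict[str, tuple[Optional[str], str]]:
    """Map each v1.5 property that is missing or different to (actual, expected)."""
    return {
        key: (actual.get(key), expected)
        for key, expected in TIMELINE_V15.items()
        if _norm(actual.get(key)) != _norm(expected)
    }


# Hypothetical yarn-site.xml excerpt that only bumps the version.
SITE = """<configuration>
  <property><name>yarn.timeline-service.version</name><value>1.5</value></property>
</configuration>"""

for name, (have, want) in timeline_v15_gaps(parse_site_xml(SITE)).items():
    print(f"{name}: have {have!r}, expected {want!r}")
```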

@mehdibn
Contributor

mehdibn commented Jan 5, 2023

@leopaul36, after testing, I conclude that:

  • in Timeline v1.5, HDFS is only used by technologies that support this feature, notably Tez via its UI and the ATSV15Plugin, or by adding this code in the YARN app
  • for more details about Tez, see this doc
  • the HDP doc confirms this by only mentioning Tez: "The writer part involves configuring Tez with the ATSV15Plugin", which is not yet configured in this branch
  • as the doc shows, LevelDB is the main timeline backend, and HDFS is only used when the application supports it
  • in our case, even if we configure the ATSV15Plugin, this feature will not be useful since the Tez UI is not used
  • if we want a fully scalable backend supported by all processing technologies, the solution is to migrate to Timeline v2
  • in any case, the JobHistory Server and the Spark History Server will keep all details for MR and Spark jobs
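The Tez-specific configuration mentioned above can be sketched as two small configuration fragments, rendered here with Python's `xml.etree.ElementTree`. The class names are the ones documented for Apache Tez; treating the "ATSV15Plugin" from the HDP doc as Tez's `ATSV15HistoryLoggingService` is an assumption, and `render_site_xml` is a hypothetical helper. As noted above, this only pays off if Tez jobs (and ideally the Tez UI) are actually in use.

```python
import xml.etree.ElementTree as ET

# Writer side (tez-site.xml): Tez logs its history events for Timeline v1.5.
# Assumption: this is what the HDP doc calls the "ATSV15Plugin".
TEZ_SITE = {
    "tez.history.logging.service.class":
        "org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService",
}
# Server side (yarn-site.xml): plugin the timeline server uses to group Tez entities.
YARN_SITE_EXTRA = {
    "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes":
        "org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl",
}


def render_site_xml(props: dict[str, str]) -> str:
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_site_xml(TEZ_SITE))
print(render_site_xml(YARN_SITE_EXTRA))
```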

@leopaul36
Member Author

Thanks for testing and your explanations @mehdibn

Let's mark #27 as not planned and create an issue for timeline V2 (not a priority IMO).

@leopaul36
Member Author

Closed as #27 is not planned

@leopaul36 leopaul36 closed this Jan 5, 2023
@leopaul36 leopaul36 deleted the 27-timeline-service-1.5 branch January 5, 2023 10:50
Development

Successfully merging this pull request may close these issues.

hadoop: change YARN Timeline Service to v1.5
2 participants