
Issue when running HdfsTest with Spark3.3+ on K8S and data on TDP1.1/HDFS #94

Open
befteam2022 opened this issue Feb 12, 2024 · 1 comment

befteam2022 commented Feb 12, 2024

Hello,

When running HdfsTest from Spark 3.3/3.4/3.5 on Kubernetes against TDP1.1/HDFS, we are facing the following error: "No live nodes contain current block Block locations".

Here is an excerpt of the log stack:

24/02/10 10:55:35 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (XXX.XX.X.XXX executor 1): org.apache.spark.SparkException: Encountered error while reading file hdfs://path/to/monfichier. Details:
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1040252842-XX.XXX.XXX.XX-1687882122273:blk_1074022325_282875 file=/path/to/monfichier
No live nodes contain current block Block locations:
DatanodeInfoWithStorage[XX.XXX.XXX.XX:9866,DS-67d022d0-c1db-4d0e-9604-0573a9e83e0f,DISK]
DatanodeInfoWithStorage[XXX.XXX.XXX.XXX:9866,DS-2a18a93d-9ffe-454d-b106-f02680cbebdd,DISK]
DatanodeInfoWithStorage[XX.XXX.XXX.XX:9866,DS-4deeb2d6-1a39-45a9-a8a4-520026026552,DISK]
Dead nodes:
DatanodeInfoWithStorage[XX.XXX.XXX.XX:9866,DS-4deeb2d6-1a39-45a9-a8a4-520026026552,DISK]
DatanodeInfoWithStorage[XX.XXX.XXX.XX:9866,DS-67d022d0-c1db-4d0e-9604-0573a9e83e0f,DISK]
DatanodeInfoWithStorage[XXX.XXX.XXX.XXX:9866,DS-2a18a93d-9ffe-454d-b106-f02680cbebdd,DISK]
	at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1007)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:990)
	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:969)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:677)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
	at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:227)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:185)
	at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:158)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:198)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.hasNext(HadoopFileLinesReader.scala:67)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:297)
	... 22 more
24/02/10 10:55:35 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1) (XXX.XX.X.XXX, executor 1, partition 0, ANY, 7962 bytes)
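One detail worth noting: the same three datanodes appear both under "Block locations" and under "Dead nodes", so the client seems to exhaust every replica. A quick way to check whether the block is genuinely missing on the HDFS side, as opposed to the datanodes being unreachable from the executor pods, is to run fsck against the file (sketch only; the path is the placeholder from the log above, and this assumes shell access to an HDFS client node):

```shell
# Report block health, block IDs and replica locations for the file.
# /path/to/monfichier is the placeholder path from the log above.
hdfs fsck /path/to/monfichier -files -blocks -locations
```

If fsck reports the file as HEALTHY, the replicas exist and the "Dead nodes" list would instead point at a connectivity problem between the Spark executor pods on Kubernetes and the datanode data-transfer port (9866).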

Could you please investigate this issue?

Regards,

BEFTEAM2022

PS: attached is the manifest YAML file to help you reproduce the issue. Just replace everything between <>.
spark_3_3_k8s_on_tdp.yaml.txt
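For readers without access to the attachment, the submission has roughly the following shape (a minimal sketch, not the exact manifest: every value between <> is a placeholder, and the jar path/ports assume a stock Spark image and default HDFS setup):

```shell
# Sketch of submitting the HdfsTest example to Kubernetes in cluster mode.
# Everything between <> is a placeholder, as in the attached manifest.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name hdfs-test \
  --class org.apache.spark.examples.HdfsTest \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.namespace=<namespace> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account> \
  local:///opt/spark/examples/jars/spark-examples_2.12-<spark-version>.jar \
  hdfs://<namenode>:8020/path/to/monfichier
```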

@befteam2022 befteam2022 changed the title Issue when running HdfsTest with Spark3.3+ on K8S and data on TDP3.1/HDFS Issue when running HdfsTest with Spark3.3+ on K8S and data on TDP1.1/HDFS Feb 14, 2024
Edouard-R (Contributor) commented

Thanks for the issue!
