2023-Q1-W2 oncall 记录

Issue #1 问题描述 使用 StreamingFileSink,报错 fileNotFoundException File does not exist: /user/xxxx/xxx_realtime_sample/tfrecord/2023011202/.part-32-474.inprogress.e9b18821-adf3-4e45-b5e9-dacd1abe32c3 原因分析 Flink 作业在写入过程中, 内部发生了重启, 但下次恢复时, 由于其他定时任务, 对应时刻的文件刚好被 mv 到了其他的目录,导致出现了问题 解决方式 mv 文件时,要避开正在写入的文件, 只处理已经 commit 的文件即可 Issue #2 问题描述 使用 StreamingFileSink 时, 报错 DFSClient_NONMAPREDUCE_-xxxx is already the current lease holder. Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /user/xxxx/xxx_realtime_sample/tfrecord/2023011202/.part-220-474.inprogress.6746e5ad-1f94-4345-82c8-8d7251394cc5 for DFSClient_NONMAPREDUCE_-640776598_106_application_1667456877234_4942861 on [xxxx](http://xxxxx) because DFSClient_NONMAPREDUCE_-640776598_106_application_1667456877234_4942861 is already the current lease holder. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2783) at org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:125) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2867) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:914) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:562) at org....

January 15, 2023