在 Hadoop 生态系统中将文件复制到 HDFS 时出错

在 Hadoop 生态系统中将文件复制到 HDFS 时出错

在 Hadoop 3.0 中,在终端上发出将文件从本地文件系统复制到 HDFS 的命令时,显示错误

hadoop-3.0.0/hadoop2_data/hdfs/datanode': No such file or directory: 
`hdfs://localhost:9000/user/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanode.

不过我已经检查过目录hadoop-3.0.0/hadoop2_data/hdfs/数据节点具有适当的访问权限。我尝试从 Web 浏览器上传文件,但显示以下错误。

"Couldn't find datanode to write file. Forbidden"

请帮助解决该问题。附加核心站点.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/Amit/hadoop-3.0.0/hadoop2_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/home/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanode</value>
</property>
</configuration>

检查 Hadoop 安装目录中的 Datanode 日志文件,显示成功消息如下

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/tmp/hadoop-Amit/dfs/data
INFO org.apache.commons.beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is DESKTOP-JIUFBOR.localdomain
INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:9866
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwidth is 10485760 bytes/s
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
INFO org.eclipse.jetty.util.log: Logging initialized @146677ms
INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 49833
INFO org.eclipse.jetty.server.Server: jetty-9.3.19.v20170502
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@3a0baae5{/logs,file:///home/Amit/hadoop-3.0.0/logs/,AVAILABLE}
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@289710d9{/static,file:///home/Amit/hadoop-3.0.0/share/hadoop/hdfs/webapps/static/,AVAILABLE}
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@3016fd5e{/,file:///home/Amit/hadoop-3.0.0/share/hadoop/hdfs/webapps/datanode/,AVAILABLE}{/datanode}
INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@178213b{HTTP/1.1,[http/1.1]}{localhost:49833}
INFO org.eclipse.jetty.server.Server: Started @151790ms
INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864
INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = Amit
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 1000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 9867
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:9867
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000 starting to offer service
INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9867: starting
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode during handshakeBlock pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-Amit/dfs/data/in_use.lock acquired by nodename [email protected]
INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /tmp/hadoop-Amit/dfs/data/current/BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1436602813;bpid=BP-1751678544-127.0.1.1-1518974872649;lv=-57;nsInfo=lv=-64;cid=CID-b7086125-1e01-4cf4-94d0-f8b6b1d4db25;nsid=1436602813;c=1518974872649;bpid=BP-1751678544-127.0.1.1-1518974872649;dnuuid=f132f3ae-7f95-424d-b4d0-729602fc80dd
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-ba9d49d2-87cb-4dff-ae80-d7f11382644f
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - [DISK]file:/tmp/hadoop-Amit/dfs/data, StorageType: DISK
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /tmp/hadoop-Amit/dfs/data
INFO org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker: Scheduled health check for volume /tmp/hadoop-Amit/dfs/data
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data...
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1751678544-127.0.1.1-1518974872649 on /tmp/hadoop-Amit/dfs/data: 1552ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-1751678544-127.0.1.1-1518974872649: 1597ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data...
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice: Replica Cache file: /tmp/hadoop-Amit/dfs/data/current/BP-1751678544-127.0.1.1-1518974872649/current/replicas doesn't exist 
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data: 1ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 4ms
INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-Amit/dfs/data, DS-ba9d49d2-87cb-4dff-ae80-d7f11382644f): no suitable block pools found to scan.  Waiting 1811581849 ms.
INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 2/18/18 8:15 PM with interval of 21600000ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1751678544-127.0.1.1-1518974872649 (Datanode Uuid f132f3ae-7f95-424d-b4d0-729602fc80dd) service to localhost/127.0.0.1:9000 beginning handshake with NN
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1751678544-127.0.1.1-1518974872649 (Datanode Uuid f132f3ae-7f95-424d-b4d0-729602fc80dd) service to localhost/127.0.0.1:9000 successfully registered with NN
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode localhost/127.0.0.1:9000 using BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0xe646383a22bd4be5,  containing 1 storage report(s), of which we sent 1. The reports had 0 total blocks and used 1 RPC(s). This took 9 msec to generate and 834 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.

答案1

执行摘要:

似乎您正在尝试复制不存在的 HDFS 文件夹中的文件


详细解答:

HDFS文件系统和常规文件系统之间存在巨大差异。

HDFS 文件系统- Hadoop 分布式文件系统 (HDFS) 旨在可靠地在大型集群中的机器之间存储非常大的文件。

文件系统分布在多台机器上,只能通过 HDFS 命令(或等效命令)访问。

假设您正在使用/user/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanodeHDFS 目标文件夹 - 我怀疑该文件夹不存在。

您可以运行以下命令来测试我的假设:

  • 将文件复制到HDFS /tmp文件夹中

    hadoop fs -put <LocalFileSystem_Path> /tmp
    
  • 将文件复制到HDFS默认文件夹(.

    hadoop fs -put <LocalFileSystem_Path> .
    

之后您可以执行ls(列出文件)命令 - 查看文件是否存在:

  • 列出HDFS /tmp文件夹中的文件

    hadoop dfs -ls /tmp
    
  • 列出HDFS默认文件夹中的文件 ( .)

    hadoop dfs -ls .
    

更多信息 -HDFS 文件系统外壳可以被找寻到这里


更新:

为了检查 HDFS 文件夹是否/user/Amit确实存在,可以执行以下命令:

hadoop dfs -ls /user/Amit

如果该文件夹存在,您可以使用以下命令将文件复制到那里:

hadoop fs -put <LocalFileSystem_Path> /user/Amit

如果该文件夹不存在,您需要调查 HDFS 文件系统,例如使用:

hadoop dfs -ls /

然后ls对子目录进行执行。

如果您拥有相关权限,您将能够创建子文件夹,例如如果该文件夹/user/Amit存在,您可能能够执行:

hdfs dfs -mkdir /user/Amit/newsubfolder

注意,您可以尝试使用该mkdir -p选项,这也会使父文件夹(如果您具有所需的权限)看到下面的两个示例(我猜您具有第一个命令所需的权限):

hdfs dfs -mkdir /tmp/Amit/fold1/fold2
hdfs dfs -mkdir /user/Amit/fold1/fold2

相关内容