[ACCEPTED] HDFS error: could only be replicated to 0 nodes, instead of 1 (hadoop)
WARNING: The following will destroy ALL data on HDFS. Do not execute the steps in this answer unless you do not care about destroying existing data!!
You should do this:
- stop all hadoop services
- delete dfs/name and dfs/data directories
- hdfs namenode -format (answer with a capital Y when prompted)
- start hadoop services
Also, check the disk space on your system and make sure the logs are not warning you about it.
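For reference, the whole sequence looks roughly like this. Treat it as a sketch, not exact commands: the script names vary between releases (stop-all.sh/start-all.sh on older ones), and the dfs/name and dfs/data paths depend on dfs.namenode.name.dir / dfs.datanode.data.dir (dfs.name.dir / dfs.data.dir on 1.x) in your hdfs-site.xml.
# stop all hadoop services
stop-dfs.sh
stop-yarn.sh
# delete the old namenode and datanode directories (paths are placeholders)
rm -rf /path/to/dfs/name /path/to/dfs/data
# reformat the namenode -- this destroys all HDFS data; answer with a capital Y when prompted
hdfs namenode -format
# start the services again
start-dfs.sh
start-yarn.sh
# and check local disk space while you are at it
df -h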
This is your issue - the client can't communicate with the Datanode, because the IP that the client received for the Datanode is an internal IP and not the public IP. Take a look at this
http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/
Look at the source code from DFSClient$DFSOutputStream (Hadoop 1.2.1):
//
// Connect to first DataNode in the list.
//
success = createBlockOutputStream(nodes, clientName, false);
if (!success) {
  LOG.info("Abandoning " + block);
  namenode.abandonBlock(block, src, clientName);
  if (errorIndex < nodes.length) {
    LOG.info("Excluding datanode " + nodes[errorIndex]);
    excludedNodes.add(nodes[errorIndex]);
  }
  // Connection failed. Let's wait a little bit and retry
  retry = true;
}
The key thing to understand here is that the Namenode only provides the list of Datanodes on which to store the blocks. The Namenode does not write the data to the Datanodes. It is the job of the Client to write the data to the Datanodes using the DFSOutputStream. Before any write can begin, the above code makes sure that the Client can communicate with the Datanode(s), and if the communication to a Datanode fails, that Datanode is added to the excludedNodes.
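A quick, hedged way to confirm this from the client side is to test whether the datanode's data-transfer port is reachable at the address the namenode hands out. The host and port below are placeholders; 50010 is the usual dfs.datanode.address default on 1.x/2.x, so check yours.
# from the client machine, test the address:port the namenode returned for the datanode
nc -zv datanode-internal.example.com 50010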
Look at the following:
By seeing this exception (could only be replicated to 0 nodes, instead of 1), the datanode is not available to the Name Node.
These are the cases in which a Data Node may not be available to the Name Node:
- Data Node disk is full
- Data Node is busy with block report and block scanning
- Block Size is a negative value (dfs.block.size in hdfs-site.xml)
- While a write is in progress, the primary datanode goes down (any n/w fluctuations b/w the Name Node and Data Node machines)
Whenever we append any partial chunk and call sync, for subsequent partial chunk appends the client should store the previous data in a buffer.
For example, after appending "a" I have called sync, and when I am trying the next append the buffer should have "ab".
And on the server side, when the chunk is not a multiple of 512, it will try to do a CRC comparison for the data present in the block file as well as the CRC present in the metafile. But while constructing the CRC for the data present in the block, it is always comparing till the initial Offset. For more analysis, please see the data node logs.
Reference: http://www.mail-archive.com/hdfs-user@hadoop.apache.org/msg01374.html
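If you want to rule these causes out one by one, a few quick checks help. The paths and property names below are common defaults and are assumptions about your layout:
# disk full on the datanode? (data directory path is a placeholder; see dfs.data.dir / dfs.datanode.data.dir)
df -h /path/to/dfs/data
# block size misconfigured? make sure dfs.block.size (dfs.blocksize on newer releases) is not negative
grep -A1 "dfs.block" /path/to/hadoop/conf/hdfs-site.xml
# datanode busy or dropping out mid-write? watch its log while the write is in progress
tail -f /path/to/hadoop/logs/hadoop-*-datanode-*.log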
I had a similar problem setting up a single node cluster. I realized that I hadn't configured any datanode. I added my hostname to conf/slaves, and then it worked out. Hope it helps.
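For a single-node setup that amounts to making sure your host is listed in the slaves file, roughly like this (the file lives at conf/slaves on Hadoop 1.x and etc/hadoop/slaves on 2.x, so adjust the path):
# register this machine as a datanode host
echo "$(hostname)" >> conf/slaves
# restart HDFS so a datanode gets started on that host
stop-dfs.sh && start-dfs.sh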
I'll try to describe my setup & solution:
My setup: RHEL 7, hadoop-2.7.3
I tried to set up standalone operation first and then pseudo-distributed operation, where the latter failed with the same issue.
Although, when I started hadoop with:
sbin/start-dfs.sh
I got the following:
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/<user>/hadoop-2.7.3/logs/hadoop-<user>-secondarynamenode-localhost.localdomain.out
which looks promising (starting datanode... with no failures) - but the datanode did not actually exist.
Another indication was seeing that there was no datanode in operation (the snapshot below shows the fixed, working state):
I fixed that issue by doing:
rm -rf /tmp/hadoop-<user>/dfs/name
rm -rf /tmp/hadoop-<user>/dfs/data
and then start again:
sbin/start-dfs.sh
...
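To confirm the datanode really came up this time, rather than just being reported as "starting", something like the following works:
# a DataNode process should appear in the JVM process list
jps | grep -i datanode
# and it should be counted as a live node in the report
bin/hdfs dfsadmin -report | grep -i "live datanodes"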
I had the same error on MacOS X 10.7 (hadoop-0.20.2-cdh3u0) due to the data node not starting.
start-all.sh
produced the following output:
starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused
After enabling ssh login via System Preferences -> Sharing -> Remote Login
it started to work.
start-all.sh
output changed to the following (note the start of the datanode):
starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting datanode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting secondarynamenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting tasktracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
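If you would rather not type the password for every daemon, the usual companion step is passwordless ssh to localhost; these are standard OpenSSH commands, nothing Hadoop-specific:
# generate a key without a passphrase and authorize it for localhost logins
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# this should now log in without prompting
ssh localhost exit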
And I think you should make sure all the datanodes are up when you copy to dfs. In some cases, it takes a while. I think that's why the "checking the health status" solution works: you go to the health status webpage and wait for everything to come up. My five cents.
It took me a week to figure out the problem in my situation.
When the client (your program) asks the nameNode for a data operation, the nameNode picks a dataNode and directs the client to it by giving the dataNode's IP to the client.
But when the dataNode host is configured with multiple IPs and the nameNode gives you one that your client CAN'T ACCESS, the client adds the dataNode to the exclude list and asks the nameNode for a new one; eventually all dataNodes are excluded and you get this error.
So check the nodes' IP settings before you try everything else!!!
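One way to see which address the nameNode is handing out for each dataNode is the dfsadmin report; compare those addresses against what the client machine can actually route to. (On multi-homed hosts, the dfs.client.use.datanode.hostname setting in the client's hdfs-site.xml is sometimes used to make clients connect by hostname instead of the advertised IP; verify that option against your Hadoop version.)
# the Name:/Hostname: lines show what the namenode will give to clients for each datanode
hdfs dfsadmin -report | grep -E "Name:|Hostname:"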
If all data nodes are running, one more thing to check is whether HDFS has enough space for your data. I could upload a small file but failed to upload a big file (30GB) to HDFS. 'bin/hdfs dfsadmin -report' showed that each data node only had a few GB available.
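Keep in mind that the space needed is the file size times the replication factor, so a rough capacity check looks like this (the grep pattern assumes the usual report wording, and the figures in the comment are only an example):
# remaining DFS capacity per datanode and for the cluster
bin/hdfs dfsadmin -report | grep -i remaining
# replication factor in effect
hdfs getconf -confKey dfs.replication
# e.g. a 30GB file with dfs.replication=3 needs roughly 90GB of free DFS space across the datanodes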
Have you tried the recommendation from the wiki http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment ?
I was getting this error when putting data into the dfs. The solution is strange and probably inconsistent: I erased all temporary data along with the namenode, reformatted the namenode, started everything up, and visited my "cluster's" dfs health page (http://your_host:50070/dfshealth.jsp). The last step, visiting the health page, is the only way I can get around the error. Once I've visited the page, putting and getting files in and out of the dfs works great!
Reformatting the node is not the solution. You will have to edit start-all.sh: start the dfs, wait for it to start completely, and then start mapred. You can do this using a sleep. Waiting for 1 second worked for me. See the complete solution here: http://sonalgoyal.blogspot.com/2009/06/hadoop-on-ubuntu.html.
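On those old releases start-all.sh simply runs start-dfs.sh followed by start-mapred.sh, so the edit described above boils down to something like this (the 1-second sleep is what worked for that author; a slower cluster may need more):
# start HDFS first and give the datanodes a moment to register with the namenode
bin/start-dfs.sh
sleep 1
bin/start-mapred.sh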
I realize I'm a little late to the party, but I wanted to post this for future visitors of this page. I was having a very similar problem when I was copying files from local to hdfs, and reformatting the namenode did not fix the problem for me. It turned out that my namenode logs had the following error message:
2012-07-11 03:55:43,479 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-920118459-192.168.3.229-50010-1341506209533, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:491)
at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.createTmpFile(FSDataset.java:462)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTmpFile(FSDataset.java:1628)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1514)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:113)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:381)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
Apparently, this is a relatively common problem on hadoop clusters, and Cloudera suggests increasing the nofile and epoll limits (if on kernel 2.6.27) to work around it. The tricky thing is that setting nofile and epoll limits is highly system dependent. My Ubuntu 10.04 server required a slightly different configuration for this to work properly, so you may need to alter your approach accordingly.
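On most Linux systems the nofile limit is raised in /etc/security/limits.conf for the users that run the daemons. The user names and the 32768 value below are assumptions based on common Cloudera-style setups; on a single-node install the daemons may simply run as your own user:
# append higher open-file limits for the daemon users (adjust names and values to your setup)
echo "hdfs   - nofile 32768" | sudo tee -a /etc/security/limits.conf
echo "mapred - nofile 32768" | sudo tee -a /etc/security/limits.conf
# verify from a fresh shell running as the daemon user
ulimit -n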
Don't format the name node immediately. Try stop-all.sh and start it using start-all.sh. If the problem persists, go for formatting the name node.
Follow the below steps:
1. Stop dfs and yarn.
2. Remove the datanode and namenode directories as specified in core-site.xml (a minimal sketch of steps 1 and 2 follows the commands below).
3. Start dfs and yarn as follows:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
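A minimal sketch of steps 1 and 2, assuming the name and data directories sit under the default hadoop.tmp.dir location (/tmp/hadoop-<user>); substitute whatever paths your core-site.xml actually points to:
stop-dfs.sh
stop-yarn.sh
rm -rf /tmp/hadoop-<user>/dfs/name
rm -rf /tmp/hadoop-<user>/dfs/data
# after wiping the namenode directory, HDFS needs a reformat before it will start cleanly
hdfs namenode -format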