Pyhdfs copy_from_local causing nodename nor servname provided, or not known error
I am using the following Python code:

from pyhdfs import HdfsClient
client = HdfsClient(hosts='1.1.1.1', user_name='root')
client.mkdirs('/jarvis')
client.copy_from_local('/my/local/file', '/hdfs/path')

I am running Python 3.5.
Hadoop is running on its default port: 50070.
1.1.1.1 is the address of my remote Hadoop host.
Creating the directory "jarvis" works fine, but copying the file does not. I get the following error:
Traceback (most recent call last):
  File "test_hdfs_upload.py", line 14, in <module>
    client.copy_from_local('/tmp/data.json','/test.json')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 753, in copy_from_local
    self.create(dest, f, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 426, in create
    metadata_response.headers['location'], data=data, **self._requests_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 99, in put
    return request('put', url, data=data, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-1-1-1-1', port=50075): Max retries exceeded with url: /webhdfs/v1/test.json?op=CREATE&user.name=root&namenoderpcaddress=ip-1-1-1-1:9000&overwrite=false (Caused by : [Errno 8] nodename nor servname provided, or not known)
First, check whether WebHDFS is enabled for your HDFS cluster.
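One quick way to check is to call the WebHDFS REST endpoint directly and see whether it answers. This is a minimal sketch; 1.1.1.1:50070 and user.name=root are taken from your setup and may need adjusting:

import requests

# Ask the NameNode's WebHDFS endpoint to list the HDFS root directory.
# A 200 response with a JSON listing means WebHDFS is reachable and enabled.
resp = requests.get("http://1.1.1.1:50070/webhdfs/v1/?op=LISTSTATUS&user.name=root")
print(resp.status_code)
print(resp.json())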
To enable WebHDFS, modify hdfs-site.xml as shown below:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/path/to/namenode/dir/</value>
  </property>
  <property>
    <name>dfs.checkpoint.dir</name>
    <value>file:/path/to/checkpoint/dir/</value>
  </property>
  <property>
    <name>dfs.checkpoints.edits.dir</name>
    <value>file:/path/to/checkpoints-ed/dir/</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/path/to/datanode/dir/</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Also, when copy_from_local() is called from the PyHDFS library, the NameNode replies with a redirect that contains the hostname of a DataNode (ip-1-1-1-1 in your traceback), and the client then attempts an HTTP connection to that hostname to perform the operation. The call fails because your host cannot understand (cannot resolve) that hostname.
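To make that concrete, this is roughly the two-step WebHDFS CREATE exchange that happens under the hood (a simplified sketch, not PyHDFS's exact code; the host, path, and user are placeholders taken from your example):

import requests

# Step 1: ask the NameNode to create the file. It does not take the file data
# itself; it replies with a 307 redirect whose Location header points at a
# DataNode, identified by hostname (e.g. http://ip-1-1-1-1:50075/...).
resp = requests.put(
    "http://1.1.1.1:50070/webhdfs/v1/test.json?op=CREATE&user.name=root",
    allow_redirects=False,
)
datanode_url = resp.headers["Location"]

# Step 2: send the file contents to that DataNode URL. This is the request
# that fails with "nodename nor servname provided, or not known" when the
# client machine cannot resolve the DataNode's hostname.
with open("/tmp/data.json", "rb") as f:
    requests.put(datanode_url, data=f)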
To resolve those hostnames, you need to add the corresponding host-to-IP mappings to the /etc/hosts file on the machine you run the client from.
For example, if you have an HDFS cluster with one NameNode and 2 DataNodes, with the following IP addresses and hostnames:
- 192.168.0.1 (NameNode1)
- 192.168.0.2 (DataNode1)
- 192.168.0.3 (DataNode2)
then you need to update your /etc/hosts file as follows:

127.0.0.1    localhost
::1          localhost
192.168.0.1  NameNode1
192.168.0.2  DataNode1
192.168.0.3  DataNode2
This enables name resolution from your host to the HDFS cluster, and you will be able to make WebHDFS API calls through PyHDFS.
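If you want to confirm the mappings took effect before re-running the upload, here is a quick resolution check (using the example hostnames above):

import socket

# Each hostname should now resolve to the address you put in /etc/hosts.
for host in ("NameNode1", "DataNode1", "DataNode2"):
    print(host, "->", socket.gethostbyname(host))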