HDFS

This notebook was prepared by Donne Martin. Source and license info is on GitHub.

Run the hdfs command with no arguments to print its usage and the available subcommands:

python
!hdfs

Run a file system shell (FsShell) command; with no arguments, hdfs dfs prints the available options:

python
!hdfs dfs

List the user's home directory:

python
!hdfs dfs -ls

List the HDFS root directory:

python
!hdfs dfs -ls /
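When a listing needs to be processed in Python rather than just read, its lines can be split into fields. The sketch below assumes the usual eight-column layout of the -ls output (permissions, replication, owner, group, size, date, time, path) preceded by a "Found N items" header; the layout may differ between Hadoop versions. LsEntry and parse_ls_line are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    permissions: str
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int                   # size in bytes
    modified: str               # "YYYY-MM-DD HH:MM"
    path: str


def parse_ls_line(line: str) -> Optional[LsEntry]:
    """Parse one line of `hdfs dfs -ls` output (assumed eight-column layout)."""
    # Split into at most 8 fields so that paths containing spaces stay intact.
    parts = line.strip().split(None, 7)
    if len(parts) < 8 or not parts[4].isdigit():
        return None  # header such as "Found 2 items", blank line, or unknown format
    perms, repl, owner, group, size, date, time, path = parts
    replication = None if repl == "-" else int(repl)
    return LsEntry(perms, replication, owner, group, int(size), f"{date} {time}", path)
```

In a notebook, the lines could come from `listing = !hdfs dfs -ls /`, with each element passed to parse_ls_line and the None results dropped.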

Copy a local file to the user's home directory on HDFS (a relative destination path is resolved against that directory):

python
!hdfs dfs -put file.txt file.txt
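By default, -put fails if the destination already exists; the -f option overwrites it. The following is a minimal sketch of the same copy driven from Python with subprocess, assuming the hdfs executable is on the PATH. put_command and run_put are hypothetical helper names, not part of any Hadoop API.

```python
import subprocess
from typing import List


def put_command(local_path: str, hdfs_path: str, overwrite: bool = False) -> List[str]:
    """Build the argument list for `hdfs dfs -put`."""
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")  # replace the destination if it already exists
    cmd += [local_path, hdfs_path]
    return cmd


def run_put(local_path: str, hdfs_path: str, overwrite: bool = False) -> None:
    """Run the copy; assumes `hdfs` is installed and on the PATH."""
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(put_command(local_path, hdfs_path, overwrite), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.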

Display the contents of the specified HDFS file:

python
!hdfs dfs -cat file.txt

Print the last 10 lines of the file to the terminal:

python
!hdfs dfs -cat file.txt | tail -n 10
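FsShell also has a -tail option, but it prints the last kilobyte of the file rather than a fixed number of lines, which is why the command above pipes through tail. The same idea can be sketched in Python by streaming the output of -cat and keeping only the last n lines in a bounded deque. last_lines and hdfs_tail are hypothetical helpers, and hdfs_tail assumes the hdfs executable is on the PATH.

```python
import subprocess
from collections import deque
from typing import Iterable, List


def last_lines(lines: Iterable[str], n: int = 10) -> List[str]:
    """Return the last n items of an iterable, holding at most n in memory."""
    return list(deque(lines, maxlen=n))


def hdfs_tail(path: str, n: int = 10) -> List[str]:
    """Stream `hdfs dfs -cat path` and keep only its last n lines."""
    with subprocess.Popen(
        ["hdfs", "dfs", "-cat", path], stdout=subprocess.PIPE, text=True
    ) as proc:
        # Iterating over proc.stdout reads the output line by line.
        return last_lines(proc.stdout, n)
```

The whole file is still transferred from HDFS, exactly as with the shell pipeline; only the memory held on the client is bounded.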

Display the contents of all files in a directory, paged through less:

python
!hdfs dfs -cat dir/* | less

Copy an HDFS file to local:

python
!hdfs dfs -get file.txt file.txt

Create a directory on HDFS:

python
!hdfs dfs -mkdir dir

Recursively delete the specified directory and all of its contents:

python
!hdfs dfs -rm -r dir

Specify an HDFS file in Spark by its full URI (a relative path such as file.txt is instead resolved against the user's HDFS home directory when HDFS is the default file system):

python
data = sc.textFile("hdfs://hdfs-host:port/path/file.txt")
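The hdfs-host and port placeholders above stand for the NameNode address of the cluster. As a small sketch, the URI can be assembled from its parts so that a relative path is caught early. hdfs_uri is a hypothetical helper, and the host name and port in the usage comment are assumptions for illustration (8020 is a commonly used NameNode RPC port, but the value depends on the cluster configuration).

```python
def hdfs_uri(host: str, port: int, path: str) -> str:
    """Build a fully qualified HDFS URI from a NameNode host, port, and absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /user/alice/file.txt")
    return f"hdfs://{host}:{port}{path}"


# Usage inside a Spark session, where `sc` is the existing SparkContext
# (host, port, and path below are assumed values, not taken from the notebook):
# data = sc.textFile(hdfs_uri("namenode.example.com", 8020, "/user/alice/file.txt"))
```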