Ruby Notes - Working With Files
My next “Ruby Notes” post was going to be on arrays, but in my last couple of mini projects (the Text Munger problem on RubyQuiz and building an image editor command line app) I had to work a lot with files and directories and realized I didn’t have a great handle on them. To remedy this I did what I always do: a bunch of reading and practice problems, then put it all down in my notebook. There is still a lot for me to learn, but I think this lays a good foundation for understanding and working with files in Ruby.
IO Class
The IO class is the parent class of the File class, which is where File gets a ton of its methods, such as readlines and readline. IO stands for input/output, specifically input/output streams, which are sequences of data that allow you to do things like play sound on your speakers and print output to a screen. The IO class allows you to initialize streams and do things with them.
Standard Output, Input, and Error
STDOUT, STDIN, and STDERR are Ruby constants that are IO objects pointing to your program’s output, input, and error streams. You can access these streams through the terminal without opening any files.
When you do something like call puts, output is sent to the IO object that STDOUT points to. Conversely, when you call gets, input is normally read from the IO object that STDIN points to.
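Here’s a small sketch of these constants in action. The messages are just placeholders, and I’ve left STDIN out so the script doesn’t sit waiting for input:

```ruby
# A bare puts writes to standard output by default...
puts "Hello from puts"

# ...which is the same stream you can reach through the STDOUT constant.
STDOUT.puts "Hello from STDOUT"

# Error messages can go to the separate error stream instead.
STDERR.puts "Something went wrong"

# Each of these constants is an IO object.
puts STDOUT.class # IO
```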
Further Reading: https://rubymonk.com/learning/books/1-ruby-primer/chapters/42-introduction-to-i-o/lessons/89-streams
File Class
According to the Ruby docs, a File is an abstraction of any file object accessible by the program and is closely associated with the class IO (it’s a subclass of IO).
You use the File class to create files, read them, and write to them. When you open a file you can give it a mode that tells Ruby what you’re allowed to do with it, i.e. read it, write to it, or both. These modes are inherited from the IO class and are listed below.
Modes
Mode | Meaning |
---|---|
“r” | Read-only, starts at beginning of file (default mode). |
“r+” | Read-write, starts at beginning of file. |
“w” | Write-only, truncates existing file to zero length or creates a new file for writing. |
“w+” | Read-write, truncates existing file to zero length or creates a new file for reading and writing. |
“a” | Write-only, starts at end of file if file exists, otherwise creates a new file for writing. |
“a+” | Read-write, starts at end of file if file exists, otherwise creates a new file for reading and writing. |
“b” | Binary file mode (may appear with any of the key letters listed above). Suppresses EOL <-> CRLF conversion on Windows and sets the external encoding to ASCII-8BIT unless explicitly specified. |
“t” | Text file mode (may appear with any of the key letters listed above except “b”). |
Writing to a File
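Here’s a minimal sketch of writing to a file. The file name text.txt and the text being written are just placeholders:

```ruby
# Open text.txt for writing ("w" creates it, or empties it if it exists).
file = File.open("text.txt", "w")

# Write a line of text to the file; puts adds the trailing newline.
file.puts "Hello, file!"

# Close the file so the text is flushed to disk.
file.close
```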
On the first line I’m calling the .open method on the File class and passing it the file text.txt and the mode I want the file to use, “w”. Next I’m using the .puts method to write to the file, passing it the text I want written. Note that if we didn’t have a file text.txt in our directory, this script would have created it.
Using Block Notation
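A sketch of writing with a block, again assuming a placeholder file name of text.txt:

```ruby
# The file is handed to the block and closed automatically when the block ends.
File.open("text.txt", "w") { |file| file.puts "Hello from a block!" }
```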
Note that when you pass a block to File.open you don’t have to close the file yourself, because the file is closed for you when the block exits.
Reading a File
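Here’s a minimal sketch of reading a file. The setup line and the sample text are my own additions so the example has something to read:

```ruby
# Setup for the example: make sure text.txt exists with some content.
File.write("text.txt", "first line\nsecond line\n")

file = File.open("text.txt", "r") # "r" is the default mode
contents = file.read              # read the whole file into a string
puts contents
file.close
```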
This is pretty simple. We’re opening the file we want to read with the .open method and storing it in the file variable. Then we call the .read method on file, store the result in contents, and puts the contents.
.read starts reading from the place where the last .read operation stopped. Here we’ve read the entire file, so if we tried to read the file again below puts contents there would be nothing left to read because we’re already at the end of the file.
Reading a File Block Notation
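A sketch of reading with a block. The setup line and sample text are assumptions added so the example runs on its own:

```ruby
# Setup for the example: make sure text.txt exists with some content.
File.write("text.txt", "first line\nsecond line\n")

# File.open returns whatever the block returns, here the file's contents.
contents = File.open("text.txt", "r") { |file| file.read }
puts contents
```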
Closing Files
If you open a file, make sure you close it, unless you’re passing File.open a block, in which case the file is closed when the block ends.
The reason you need to close files is that closing forces a “flush”, which pushes any buffered data-to-be-written out to the file. It also releases the file handle, which frees up resources for the rest of your program and ensures the file is available for other processes to access.
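A small sketch of checking whether a file is still open, using the closed? method. The file name and text are placeholders:

```ruby
file = File.open("text.txt", "w")
file.puts "written before close"
puts file.closed? # false, the file is still open

file.close
puts file.closed? # true

# After closing, the text is guaranteed to be in the file.
puts File.read("text.txt")
```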
Further Reading: http://ruby.bastardsbook.com/chapters/io/
More File Methods
We’ve already seen some file methods like .open and .close, but here are some more useful ones. Check out the Ruby docs for File and IO for the rest of them.
.readlines & .readline
These two methods can be very handy when you want to read one line at a time. This would be useful, for instance, if you are reading a comma-delimited file.
.readlines - takes in all the content of the file and returns an array with each line as an element. From here you can iterate over the lines using each.
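A minimal sketch of .readlines, with a setup line and sample text that I’ve added so there is something to read:

```ruby
# Setup for the example: make sure text.txt exists with some content.
File.write("text.txt", "first line\nsecond line\n")

# Each line (including its newline) becomes one element of the array.
lines = File.readlines("text.txt")
lines.each { |line| puts line }
```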
.readline - is a bit different in that it only reads one line at a time, so you need to keep advancing through the file yourself, which can be done with a while or until loop.
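A sketch of .readline in an until loop. The setup line, the sample text, and the lines_seen counter are my own additions for illustration:

```ruby
# Setup for the example: make sure text.txt exists with some content.
File.write("text.txt", "first line\nsecond line\n")

file = File.open("text.txt", "r")
lines_seen = 0

# Keep reading one line at a time until we reach the end of the file.
until file.eof?
  line = file.readline
  puts line
  lines_seen += 1
end

file.close
```

Checking eof? first matters here, since calling .readline at the end of the file raises an EOFError.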
The reason you would want to use .readline vs .readlines is that .readlines loads the entire contents of the file into memory. For a small script working with small files this isn’t a problem, but if you are working with large files and/or have multiple users this is bad.
.exist? - checks for the existence of the file. Older code spells this .exists?, but that alias was deprecated and then removed in Ruby 3.2.
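A quick sketch of checking for a file. text.txt is a placeholder, and no_such_file.txt is a made-up name assumed not to exist:

```ruby
# Setup for the example: make sure text.txt exists.
File.write("text.txt", "hello\n")

# File.exist? is the spelling that works on current Ruby versions.
puts File.exist?("text.txt")         # true
puts File.exist?("no_such_file.txt") # false, assuming no such file
```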
.absolute_path - gets the absolute path for the file name it is passed.
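A one-line sketch, using text.txt as a placeholder name. The file doesn’t have to exist, since the path is simply built from your current directory:

```ruby
# Prints the full path from the root of the file system; yours will differ.
path = File.absolute_path("text.txt")
puts path
```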
.basename - gives you just the filename.
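A short sketch with a made-up path. It doesn’t need to exist, because .basename only works on the string:

```ruby
# Strip the directories and keep the file name.
puts File.basename("/home/user/notes/text.txt")         # text.txt

# Pass the extension as a second argument to strip that too.
puts File.basename("/home/user/notes/text.txt", ".txt") # text
```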
.directory? - returns true if the path passed to it is a directory.
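A quick sketch comparing a directory with a regular file. text.txt is a placeholder created by the setup line:

```ruby
# Setup for the example: make sure text.txt exists.
File.write("text.txt", "hello\n")

puts File.directory?(".")        # true, "." is the current directory
puts File.directory?("text.txt") # false, it's a regular file
```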
Dir Class
The Dir class allows you to work with directories as you’d expect. Most of the methods you can use on the Dir class have the same names as the commands you use in the console.
Some Dir Methods
.pwd - tells you what directory you’re in.
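A one-line sketch; the output depends on where you run the script from:

```ruby
# Prints the absolute path of the directory the script is running in.
puts Dir.pwd
```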
.chdir - allows you to change to a new directory.
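A small sketch that moves up one directory and then back again. Using ".." is just a convenient choice that works anywhere:

```ruby
original = Dir.pwd

Dir.chdir("..")     # move up to the parent directory
puts Dir.pwd

Dir.chdir(original) # and back to where we started
puts Dir.pwd
```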
.mkdir - makes a new directory with the name given by the string it is passed.
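A sketch using a placeholder name, new_dir. The guard is my addition, since Dir.mkdir raises an error if the directory already exists:

```ruby
# Only create the directory if it isn't already there.
Dir.mkdir("new_dir") unless Dir.exist?("new_dir")

puts File.directory?("new_dir") # true
```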
.rmdir - removes an empty directory but raises an error if it contains files. To remove a directory with files you must use the FileUtils module.
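A sketch of both cases. The directory and file names are made up for the example, and I rescue SystemCallError because the exact error class can vary by platform:

```ruby
# An empty directory is removed without complaint.
Dir.mkdir("empty_dir") unless Dir.exist?("empty_dir")
Dir.rmdir("empty_dir")
puts Dir.exist?("empty_dir") # false

# A directory with a file in it can't be removed this way.
Dir.mkdir("full_dir") unless Dir.exist?("full_dir")
File.write("full_dir/note.txt", "hi")
begin
  Dir.rmdir("full_dir")
rescue SystemCallError => e
  puts "Could not remove full_dir: #{e.class}"
end

# Clean up by hand: delete the file first, then the now-empty directory.
File.delete("full_dir/note.txt")
Dir.rmdir("full_dir")
```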
Accessing Directory Content
There are two ways to grab content from directories, using .entries and .glob.
.entries - returns an array with every single entry inside the directory, including “.”, “..”, and hidden files.
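A sketch using a scratch directory, entries_demo, that I made up so the result is predictable. I sort the array because the order of the entries isn’t guaranteed:

```ruby
# Setup for the example: a directory with one visible and one hidden file.
Dir.mkdir("entries_demo") unless Dir.exist?("entries_demo")
File.write("entries_demo/visible.txt", "")
File.write("entries_demo/.hidden", "")

entries = Dir.entries("entries_demo").sort
p entries # [".", "..", ".hidden", "visible.txt"]
```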
.glob - can be passed a directory name or a pattern such as *.txt and returns an array of just the visible files that match.
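A sketch of .glob with the * pattern. The glob_demo directory is made up, and I step into it with a Dir.chdir block so the result is predictable:

```ruby
# Setup for the example: a directory with one visible and one hidden file.
Dir.mkdir("glob_demo") unless Dir.exist?("glob_demo")
File.write("glob_demo/visible.txt", "")
File.write("glob_demo/.hidden", "")

# Inside the block glob_demo is the current directory; afterwards we're back.
visible = Dir.chdir("glob_demo") { Dir.glob("*") }
p visible # ["visible.txt"]
```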
Gives us the files in the current directory.
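A sketch of a recursive search for .txt files. The search_demo directory and the files in it are made up for the example, and the result is sorted because glob order can vary:

```ruby
# Setup for the example: a directory with a sub directory and a few files.
Dir.mkdir("search_demo") unless Dir.exist?("search_demo")
Dir.mkdir("search_demo/sub") unless Dir.exist?("search_demo/sub")
File.write("search_demo/top.txt", "")
File.write("search_demo/sub/nested.txt", "")
File.write("search_demo/sub/image.png", "")

# Step into search_demo so the pattern is relative to the current directory.
found = Dir.chdir("search_demo") { Dir.glob("**/*.txt").sort }
p found # ["sub/nested.txt", "top.txt"]
```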
Here we use **/*.txt to search the current directory and all its sub directories for any .txt files. The ** part makes the search recursive and *.txt is the pattern each file name has to match.
FileUtils Module
I’m not going to go into FileUtils too much, but it gives you more control over files and mimics a lot of the command line commands and flags you can use, such as rm -rf for removing directories that contain files.
Some Methods
.mkdir - makes a directory
.touch - makes a file
.rm_rf - removes a directory whether it contains other files and directories or not
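A sketch that uses all three. The names fu_demo and notes.txt are made up, and the guard on mkdir is my addition because it raises an error if the directory already exists:

```ruby
require "fileutils"

# Make a directory and an empty file inside it.
FileUtils.mkdir("fu_demo") unless Dir.exist?("fu_demo")
FileUtils.touch("fu_demo/notes.txt")

# Remove the directory even though it still contains a file.
FileUtils.rm_rf("fu_demo")
puts Dir.exist?("fu_demo") # false
```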
Note: you need to load FileUtils in your files with require 'fileutils'.