Applying TDD to Classes Accessing Files

We start with unit tests that read a file line by line with QFile. Tests accessing the file system are not considered unit tests, as they may be slow and may depend on each other in surprising ways (see A Set of Unit Testing Rules by Michael Feathers). In a first step, we encapsulate QFile in a class TextFile and provide a fake implementation that represents the file as a list of strings kept in main memory. In a second step, we introduce an interface from which the product and fake implementations derive. We can then apply TDD to classes accessing files.

Bad Unit Tests Read from Files

# files/libffi/recipeinfo (Commit cdb19980)
LICENSE: MIT
PR: r0
PV: 3.2.1

The package scanner in the running example is part of a license compliance checker for Linux images built with Yocto or Buildroot. It reads the recipeinfo files for all packages in the Linux image.

// sources/package_scanner.cpp (Commit cdb19980)
PackageInfo PackageScanner::readRecipeInfo(QString packageName)
{
    QFile recipeInfo{QString{"files/%1/recipeinfo"}.arg(packageName)};  // A1
    if (!recipeInfo.open(QFile::ReadOnly))                             // A2
    {
        qWarning().noquote().nospace()
                << "Cannot read file \'" << recipeInfo.fileName() << "\'.";
        return {};
    }
    QString licStr;
    QString version;
    QString revision;
    QTextStream is{&recipeInfo};
    while (!is.atEnd())                                                 // A3
    {
        auto line = is.readLine();                                      // A4
        if (line.startsWith("LICENSE"))
        {
            licStr = line.split(':')[1].trimmed();
        }
        // Similar for the other lines ...
    }
    return PackageInfo{packageName, licStr, version, revision};
}

The function under test, PackageScanner::readRecipeInfo, reads the file files/<packageName>/recipeinfo (e.g., files/libffi/recipeinfo for the package libffi), splits each line key: value into a key and its value, and stores the result in a PackageInfo object.
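The expression line.split(':')[1] in the listing assumes that every line contains a colon. For a line such as LICENSE MIT, split returns a single-element list, so index 1 is out of range, which is undefined behaviour (Qt asserts in debug builds). Here is a Qt-free sketch of a more defensive parse. It uses std::string instead of QString so that it compiles without Qt, and the helper names trim and parseKeyValue are hypothetical, not part of the original code. It splits at the first colon only, so a value that itself contains a colon stays intact.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: removes spaces and tabs from both ends of s.
inline std::string trim(const std::string &s)
{
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
    {
        return {};  // only whitespace (or empty)
    }
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: splits "key: value" at the FIRST colon.
// Returns std::nullopt for lines without a colon, e.g. "LICENSE MIT".
inline std::optional<std::pair<std::string, std::string>>
parseKeyValue(const std::string &line)
{
    const auto colon = line.find(':');
    if (colon == std::string::npos)
    {
        return std::nullopt;
    }
    return std::make_pair(trim(line.substr(0, colon)), trim(line.substr(colon + 1)));
}
```

With such a helper, a missing colon becomes an ordinary case that the scanner can report as an error or a warning instead of indexing out of range.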

// tests/package_scanner_with_file_io/test_package_scanner_with_file_io.cpp
void TestPackageScannerWithFileIO::testReadRecipeInfo()
{
    PackageScanner scanner;
    auto package = scanner.readRecipeInfo("libffi");
    QCOMPARE(package.name(), "libffi");
    QCOMPARE(package.licenseString(), "MIT");
    QCOMPARE(package.version(), "3.2.1");
    QCOMPARE(package.revision(), "r0");
}

The test testReadRecipeInfo checks that the values from the recipeinfo file were correctly entered into the PackageInfo object package.

# tests/package_scanner_with_file_io/CMakeLists.txt
project(package_scanner_with_file_io)
find_package(Qt6 REQUIRED COMPONENTS Test Core)
file(COPY files/ DESTINATION files)                 # C1
...

When CMake configures the project, the CMakeLists.txt file copies the directory tree rooted at files in the current source directory to the directory files in the current build directory (line C1).

The original package scanner has more tests. Each test has its own recipeinfo file. Among others, the tests check the following cases:

The scanner flags an error if the license string is missing, if the recipeinfo file does not exist, or if the recipeinfo file is not readable.
The scanner flags a warning if the version or revision is missing.

The section heading claims that unit tests are bad when they read from or write to files. Why?

Before we can answer this question, we must understand what characterises good unit tests. Tim Ottinger and Jeff Langr explain that good Unit Tests Are FIRST: Fast, Isolated, Repeatable, Self-Verifying, and Timely.

Fast. Test suites that run longer than 3-5 seconds are too slow for TDD, where we may run unit tests several times per minute. Reading from or writing to files is much slower than accessing main memory, and it gets slower the bigger the files become and the more often they are accessed.
Isolated. Unit tests should only have a single reason to fail. The recipeinfo files are only copied (line C1) when CMake configures the project, not on every build. If we change a recipeinfo file without re-running CMake, the tests work with out-of-date copies and may produce wrong results, a second reason to fail that has nothing to do with the code under test.
Repeatable. No matter how often or in which order we run tests, they should always produce the same result. In particular, tests should not depend on each other. They should not share data, e.g., through static or global variables or through files.
Self-verifying. A test that requires a human to check the output in the console is not self-verifying. Some developers use such “tests” to feign higher test coverage.
Timely. We write tests before the code and not after the code.

The package scanner tests are not fast, as they access the file system. They are not isolated, as they may fail if CMake was not run at the right time. Tests accessing files can easily become non-repeatable if we don’t take special care. The tests are self-verifying, as they contain checks. They are also timely, as I write my tests first.

Good Unit Tests Read from File Doubles

We create a file double that holds the file contents in main memory. A text file can be represented as a list of strings: QStringList. My first idea was to derive a QStringList-based subclass from QFile, QFileDevice or even QIODevice. This approach would come with considerable effort, as I knew from creating a Mock QCanBusDevice for TDD. The effort would be out of proportion for reading or writing text files line by line.

This approach would also change the interface of PackageScanner from

PackageInfo readRecipeInfo(QString packageName);

to

PackageInfo readRecipeInfo(QString packageName, QFileDevice *file);

Production code would pass a QFile object to readRecipeInfo, whereas test code would pass an InMemoryFile object. Going down this path would mean a few steps with big changes. I prefer many small steps with small changes. So, what now? That’s when I stumbled upon this tweet from Michael “GeePaw” Hill.

I usually encapsulate the native collection classes — List, Set, Map, et al — within minutes of using them, and sometimes even *before* I use them. I recommend the practice to others, especially juniors.
Tweet by @GeePawHill

My idea was to encapsulate the file access, that is, the class QFile (lines A1 and A2) and the QTextStream reading from it (lines A3 and A4), in the class TextFile. The implementation of readRecipeInfo would change a little, but its interface would not.

// sources/package_scanner.cpp (Commit 60387030)
PackageInfo PackageScanner::readRecipeInfo(QString packageName)
{
    TextFile recipeInfo{QString{"files/%1/recipeinfo"}.arg(packageName)}; // D1
    QString licStr;
    QString version;
    QString revision;
    while (!recipeInfo.isAtEnd())                                        // D2
    {
        auto line = recipeInfo.readLine();                               // D3
        if (line.startsWith("LICENSE"))
        {
            licStr = line.split(':')[1].trimmed();
        }
        ...
    }
    return PackageInfo{packageName, licStr, version, revision};
}

The lines A1 and A2 from the original implementation are replaced by the line D1. Lines A3 and A4 are replaced by lines D2 and D3, respectively.

The function readRecipeInfo does not know how file access is implemented. It doesn’t know whether files are accessed through QFile and whether files are stored on the hard disk or in a QStringList. TextFile hides the implementation details from its clients. This makes it easy to change the implementation of TextFile for testing. Here is the slightly abridged production version of TextFile. It uses QFile to read from the file system.

// sources/text_file.cpp (Commit 60387030)

struct TextFile::Impl
{
    Impl(QString filePath);
    ~Impl();
    QFile m_file;
    QTextStream m_inStream;
};

TextFile::Impl::Impl(QString filePath)
    : m_file{filePath}
{
    if (!m_file.open(QFile::ReadOnly))
    {
        throw std::runtime_error(
            QString{"Cannot read file \'%1\'."}.arg(filePath).toStdString());
    }
    m_inStream.setDevice(&m_file);
}

TextFile::Impl::~Impl() { m_file.close(); }

TextFile::TextFile(QString filePath)
    : m_impl{new Impl{filePath}} {}

bool TextFile::isAtEnd() const { return m_impl->m_inStream.atEnd(); }

QString TextFile::readLine() { return m_impl->m_inStream.readLine(); }

And yes, the production version of readRecipeInfo must handle the exception thrown in the constructor. I left it out for brevity. Throwing the exception ensures that no TextFile object exists when an error condition occurs, and in particular no partially constructed one.
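The header of TextFile is not shown above. The following standard-C++ sketch shows one way header and source could fit together; it is an analogue under assumptions, not the post’s code. std::ifstream and std::getline stand in for QFile and QTextStream, std::unique_ptr is assumed to own Impl, and header and source are merged into one listing. The destructor is declared in the header but defined after Impl is complete, because std::unique_ptr needs the complete type to delete it.

```cpp
#include <fstream>
#include <memory>
#include <stdexcept>
#include <string>

// --- text_file.h (sketch): no trace of std::ifstream in the interface ---
class TextFile
{
public:
    explicit TextFile(const std::string &filePath);
    ~TextFile();                 // defined below, where Impl is complete
    bool isAtEnd() const;
    std::string readLine();

private:
    struct Impl;                 // only declared in the header
    std::unique_ptr<Impl> m_impl;
};

// --- text_file.cpp (sketch) ---
struct TextFile::Impl
{
    std::ifstream m_in;
};

TextFile::TextFile(const std::string &filePath)
    : m_impl{std::make_unique<Impl>()}
{
    m_impl->m_in.open(filePath);
    if (!m_impl->m_in.is_open())
    {
        // Leaving the constructor with an exception means that no TextFile
        // object exists. The already constructed m_impl is destroyed automatically.
        throw std::runtime_error("Cannot read file '" + filePath + "'.");
    }
}

TextFile::~TextFile() = default;

bool TextFile::isAtEnd() const
{
    // The end is reached when no further character can be read.
    return m_impl->m_in.peek() == std::char_traits<char>::eof();
}

std::string TextFile::readLine()
{
    std::string line;
    std::getline(m_impl->m_in, line);
    return line;
}
```

Clients only see the header part, so replacing Impl with, say, a list of strings requires no change in client code.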

We use the pimpl pattern so that the TextFile header does not contain any traces of QFile or of any other implementation details. Our next step is to replace the QFile-based implementation of TextFile by a QStringList-based one.

// tests/doubles/fake_text_file.cpp (Commit ecaf965c)
struct TextFile::Impl
{
    Impl(QString filePath);
    ~Impl();
    bool m_isOpen{true};
    QStringList m_lines{
        "LICENSE: MIT",
        "PR: r0",
        "PV: 3.2.1"
    };
    int m_currentLine{0};
};

TextFile::Impl::Impl(QString filePath)
{
    if (!m_isOpen)
    {
        throw std::runtime_error(
            QString{"Cannot read file \'%1\'."}.arg(filePath).toStdString());
    }
}

TextFile::Impl::~Impl() {}

TextFile::TextFile(QString filePath)
    : m_impl{new Impl{filePath}} {}

bool TextFile::isAtEnd() const
{
    return m_impl->m_currentLine == m_impl->m_lines.count();
}

QString TextFile::readLine()
{
    auto line = m_impl->m_lines[m_impl->m_currentLine];
    ++m_impl->m_currentLine;
    return line;
}

We store the recipeinfo file in the member variable m_lines, which is a QStringList. The member variable m_currentLine is the index of the line returned by the next call to readLine. The function readLine increments m_currentLine at each call. The end of the “file” is reached when m_currentLine is equal to the number of lines in m_lines. “Opening” a file always succeeds, as m_isOpen is initialised with true. The exception is never thrown.
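Stripped of the Qt types, the fake boils down to a cursor over a list of lines. Here is a minimal standard-C++ sketch with std::vector<std::string> standing in for QStringList; the class name InMemoryLines is hypothetical. Unlike the fake above, whose readLine relies on the caller checking isAtEnd first, this sketch returns an empty string when called past the end instead of indexing out of range.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fake TextFile: a cursor over in-memory lines.
class InMemoryLines
{
public:
    explicit InMemoryLines(std::vector<std::string> lines)
        : m_lines{std::move(lines)} {}

    // The "file" ends when the cursor has moved past the last line.
    bool isAtEnd() const { return m_currentLine == m_lines.size(); }

    // Returns the line under the cursor and advances the cursor.
    // Past the end, it returns an empty string.
    std::string readLine()
    {
        if (isAtEnd())
        {
            return {};
        }
        return m_lines[m_currentLine++];
    }

private:
    std::vector<std::string> m_lines;
    std::size_t m_currentLine{0};
};
```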

The fake implementation only works for a single recipeinfo file. That’s OK. We will grow the implementation test by test in the next section. Proceeding in small steps is the gist of TDD after all.

The tests using the fake TextFile are in the project tests/package_scanner_without_file_io. The test function TestPackageScannerWithoutFileIO::testReadRecipeInfo is identical to TestPackageScannerWithFileIO::testReadRecipeInfo above. The CMakeLists.txt file does not copy any recipeinfo files around but simply adds fake_text_file.cpp instead of sources/text_file.cpp to the executable.

# tests/package_scanner_without_file_io/CMakeLists.txt
project(package_scanner_without_file_io)
find_package(Qt6 REQUIRED COMPONENTS Test Core)
add_executable(
    ${PROJECT_NAME}
    test_package_scanner_without_file_io.cpp
    ../../sources/package_scanner.cpp
    ../../sources/package_info.cpp
    ../doubles/fake_text_file.cpp
)

The test TestPackageScannerWithoutFileIO::testReadRecipeInfo is fast and isolated. It satisfies all FIRST criteria. That’s a big step, but we are not finished yet. We need the fake TextFile implementation to work with different file contents.

Growing the File Double Test by Test

The fake implementation of TextFile (in fake_text_file.cpp) is a bit simplistic. It works only with a single hard-wired recipeinfo file. Let us change this.

Cannot Open File

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
// (Commit: 50b1f5f2)
void TestPackageScannerWithoutFileIO::testCannotOpenRecipeInfo()
{
    PackageScanner scanner;
    auto package = scanner.readRecipeInfo(u"cannot-open"_qs);
    QVERIFY(!package.isValid());
}

The next test checks whether readRecipeInfo returns an invalid package if it cannot open the text file, that is, if the TextFile constructor throws an exception. readRecipeInfo now has a catch block that returns a default-constructed PackageInfo object, which is always invalid.

TextFile::Impl::Impl(QString filePath)
{
    if (!m_isOpen)
    {
        throw std::runtime_error(
            QString{"Cannot read file \'%1\'."}.arg(filePath).toStdString());
    }
}

We make the fake implementation throw the exception by forcing m_isOpen to false. When TextFile is created with a different file path, say files/cannot-open/recipeinfo, the fake implementation initialises m_isOpen with false and m_lines with an arbitrary string list (e.g., the empty string list).

struct TextFileData
{
    bool m_isOpen{false};
    QStringList m_lines;
};

struct TextFile::Impl
{
    Impl(QString filePath);
    ~Impl();
    QHash<QString, TextFileData> m_fileSystem{            // E1
        {u"files/libffi/recipeinfo"_qs,
            {true, {u"LICENSE: MIT"_qs,
                    u"PR: r0"_qs,
                    u"PV: 3.2.1"_qs}}},
        {u"files/cannot-open/recipeinfo"_qs,
            {false, {}}},
    };

    bool m_isOpen{false};
    QStringList m_lines;
    int m_currentLine{0};
};

TextFile::Impl::Impl(QString filePath)
{
    auto textFileData = m_fileSystem.value(filePath);     // E2
    m_isOpen = textFileData.m_isOpen;                   // E3
    m_lines = textFileData.m_lines;                     // E4
    if (!m_isOpen)                                      // E5
    {
        throw std::runtime_error(QString{"Cannot read file \'%1\'."}.arg(filePath).toStdString());
    }
}

We are moving from a single file to multiple files. The hash map m_fileSystem from file paths to TextFileData (file contents and attributes) reflects this (see line E1). The first entry of the hash map provides a proper recipeinfo file, which can be opened, to the test testReadRecipeInfo. The second entry provides an empty file (key: u"files/cannot-open/recipeinfo"_qs), which cannot be opened (value: {false, {}}), to the test testCannotOpenRecipeInfo.

Line E2 retrieves the TextFileData for a given filePath from the hash map m_fileSystem. Lines E3 and E4 initialise m_isOpen and m_lines with the values from the hash map. For the file path u"files/cannot-open/recipeinfo"_qs, m_isOpen is false and condition E5 evaluates to true. The TextFile::Impl constructor throws the exception. readRecipeInfo catches the exception and returns an invalid PackageInfo object. The test testCannotOpenRecipeInfo passes.
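Line E2 relies on QHash::value returning a default-constructed value for an unknown key. Because TextFileData initialises m_isOpen with false, any path missing from the map behaves like a file that cannot be opened. The following standard-C++ sketch reproduces this lookup. It uses std::unordered_map with an explicit find, because, unlike QHash::value, its operator[] would insert a new entry. The names fakeFileSystem and openFake are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct TextFileData
{
    bool m_isOpen{false};              // default: the file cannot be opened
    std::vector<std::string> m_lines;
};

// Hypothetical file system double: file path -> attributes and contents.
inline const std::unordered_map<std::string, TextFileData> &fakeFileSystem()
{
    static const std::unordered_map<std::string, TextFileData> fs{
        {"files/libffi/recipeinfo", {true, {"LICENSE: MIT", "PR: r0", "PV: 3.2.1"}}},
        {"files/cannot-open/recipeinfo", {false, {}}},
    };
    return fs;
}

// Mirrors lines E2 to E5: look up the path, fall back to a default
// TextFileData (not open) for unknown paths, and throw if not open.
inline std::vector<std::string> openFake(const std::string &filePath)
{
    const auto &fs = fakeFileSystem();
    const auto it = fs.find(filePath);
    const TextFileData data = (it != fs.end()) ? it->second : TextFileData{};
    if (!data.m_isOpen)
    {
        throw std::runtime_error("Cannot read file '" + filePath + "'.");
    }
    return data.m_lines;
}
```

A consequence of the default value is that the explicit cannot-open entry mainly documents intent: any path not listed in the map throws as well.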

Missing License

Thanks to the file system double, writing tests becomes easy. We just add another TextFileData entry to the file system map.

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
// Commit: 520b8d5e0
void TestPackageScannerWithoutFileIO::testLicenseMissingInRecipeInfo()
{
    PackageScanner scanner;
    auto package = scanner.readRecipeInfo(u"missing-license"_qs);
    QVERIFY(package.isValid());
    QCOMPARE(package.name(), u"missing-license"_qs);
    QVERIFY(package.licenseString().isEmpty());
}

The test checks that the license string is empty if the recipeinfo file doesn’t give a value for LICENSE. We add the following entry to the file system map for the file path files/missing-license/recipeinfo.

// tests/doubles/fake_text_file.cpp    
QHash<QString, TextFileData> m_fileSystem{
    ...,
    {u"files/missing-license/recipeinfo"_qs,
        {true, {u"LICENSE: "_qs,
                u"PR: r4"_qs,
                u"PV: 6.3.2"_qs}}},
};

The test testLicenseMissingInRecipeInfo and the other two tests pass.

File System Double for Different Test Cases

The fake TextFile still has a problem. The file system double is specific to the test case TestPackageScannerWithoutFileIO. We need different file system doubles for different test cases.

// tests/doubles/fake_text_file.cpp (Commit: be9f55b3)
#include "file_system_double.h"
#include "text_file.h"

struct TextFile::Impl
{
    Impl(QString filePath);
    ~Impl();
    QHash<QString, TextFileData> m_fileSystem = fileSystemDouble();
    // As before ...
};

We extract the definition of the file system double from fake_text_file.cpp and move it into the header file file_system_double.h.

// tests/package_scanner_without_file_io/file_system_double.h
struct TextFileData
{
    bool m_isOpen{false};
    QStringList m_lines;
};

inline static QHash<QString, TextFileData> fileSystemDouble()
{
    return QHash<QString, TextFileData>{
        {u"files/libffi/recipeinfo"_qs,
            {true, {u"LICENSE: MIT"_qs,
                    u"PR: r0"_qs,
                    u"PV: 3.2.1"_qs}}},
        // More files ...
    };
}

The header file_system_double.h is located in the same directory – tests/package_scanner_without_file_io – as the test case test_package_scanner_without_file_io.cpp, as the file system double is specific to this test case. The fake TextFile, fake_text_file.cpp, is in the directory tests/doubles so that multiple test cases can use it. Each test case provides its own file_system_double.h and reuses fake_text_file.cpp.

Splendid: Abstract TextFile with Product and Test Implementations

So far, we have made the class PackageScanner testable according to the FIRST principles without changing the interface of the class under test. This wasn’t easy, because we can’t set the text file through the interface of PackageScanner. The implementation hides whether the file is read from the real file system or from its double. We must provide the text files through the back door by including the file_system_double.h header that fits the respective test case.

// sources/package_scanner.cpp (Commit: 788eecdf)
PackageInfo PackageScanner::readRecipeInfo(QString packageName)
{
    try
    {
        TextFile recipeInfo{QString{"files/%1/recipeinfo"}.arg(packageName)};
        return readRecipeInfo(packageName, recipeInfo);
    }
    catch (...)
    {
        return {};
    }
}

PackageInfo PackageScanner::readRecipeInfo(QString packageName, TextFile &recipeInfo)
{
    QString licStr;
    QString version;
    QString revision;
    while (!recipeInfo.isAtEnd()) { ... }
    return {packageName, licStr, version, revision};
}

We temporarily duplicate the readRecipeInfo function in the class PackageScanner. The new function takes an additional second parameter: a reference to a TextFile object. This enables clients of PackageScanner to pass a TextFile object explicitly. The original readRecipeInfo function creates a TextFile object, recipeInfo, and calls the new function with packageName and recipeInfo.

At this point, we haven’t changed any client code yet. The clients still call the old version of readRecipeInfo, which calls the new version. The tests pass and give us high confidence that the new version works correctly. We can now make one client after the other call the new version of readRecipeInfo.

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
void TestPackageScannerWithoutFileIO::testReadRecipeInfo()
{
    PackageScanner scanner;
    TextFile recipeInfo{u"files/libffi/recipeinfo"_qs};
    auto package = scanner.readRecipeInfo(u"libffi"_qs, recipeInfo);
    QVERIFY(package.isValid());
    // More checks ...
}

Calling the new version works well for testReadRecipeInfo and testLicenseMissingInRecipeInfo. It crashes for testCannotOpenRecipeInfo, because the test doesn’t catch the exception thrown by the TextFile constructor.

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
void TestPackageScannerWithoutFileIO::testCannotOpenRecipeInfo()
{
    PackageScanner scanner;
    QVERIFY_EXCEPTION_THROWN(
        TextFile recipeInfo{u"files/cannot-open/recipeinfo"_qs};
        auto package = scanner.readRecipeInfo(u"cannot-open"_qs, recipeInfo),
        std::runtime_error
    );
}

The QVERIFY_EXCEPTION_THROWN macro passes if the expression in its first argument throws an exception of the type given in its second argument. Here, the TextFile constructor already throws, so readRecipeInfo is never called, and we could omit the call.
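Conceptually, QVERIFY_EXCEPTION_THROWN runs its first argument inside a try block and checks which exception, if any, comes out. The sketch below is not Qt’s implementation but a plain C++ function template with the hypothetical name throwsException. It distinguishes the same three outcomes: the expected exception type, no exception at all, or a different exception. UnopenableFile is a hypothetical stand-in for a TextFile whose file cannot be opened.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical helper: returns true only if callable throws an exception
// of type Exception (or of a type derived from it).
template <typename Exception, typename Callable>
bool throwsException(Callable &&callable)
{
    try
    {
        std::forward<Callable>(callable)();
    }
    catch (const Exception &)
    {
        return true;   // expected exception type
    }
    catch (...)
    {
        return false;  // some other exception
    }
    return false;      // no exception at all
}

// Hypothetical stand-in for a TextFile whose file cannot be opened.
struct UnopenableFile
{
    UnopenableFile() { throw std::runtime_error("Cannot read file."); }
};
```

A flag set after the throwing constructor stays false, which illustrates why the call to readRecipeInfo in the test above is never reached.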

We can replace the old version of readRecipeInfo by the new one in TestPackageScannerWithFileIO. It will read from real files, because it uses sources/text_file.cpp instead of tests/doubles/fake_text_file.cpp.

We remove the old version of readRecipeInfo from PackageScanner. We also remove PackageInfo::isValid, as readRecipeInfo does not catch the TextFile exception and hence does not return an invalid package any more.

We are now ready to get rid of the crutch file_system_double.h. We turn TextFile into an interface AbstractTextFile and pass an AbstractTextFile reference to readRecipeInfo. We derive TextFile and FakeTextFile from AbstractTextFile. TextFile is the product version reading from real files and FakeTextFile is the version for unit testing reading from a string list. Let us do this in small steps. Here is the interface AbstractTextFile.

// sources/abstract_text_file.h (Commit: c7530ab2)
class AbstractTextFile
{
public:
    virtual ~AbstractTextFile() = default;

    virtual bool isAtEnd() const = 0;
    virtual QString readLine() = 0;
};

The class TextFile derives from the interface AbstractTextFile, and its header includes abstract_text_file.h.

#include "abstract_text_file.h"

class TextFile : public AbstractTextFile
{
    // As before ...
};

The function readRecipeInfo takes an AbstractTextFile reference as its second argument instead of the concrete TextFile reference. The header file forward declares AbstractTextFile instead of TextFile and the source file includes abstract_text_file.h instead of text_file.h.

// sources/package_scanner.h
class AbstractTextFile;

class PackageScanner
{
public:
    PackageInfo readRecipeInfo(QString packageName, AbstractTextFile &recipeInfo);
    // ...
};

The tests pass after these changes. We create the header tests/doubles/fake_text_file.h for the class FakeTextFile, which derives from AbstractTextFile. The header looks exactly the same as sources/text_file.h except that TextFile is replaced by FakeTextFile. Similarly, we replace TextFile by FakeTextFile and text_file.h by fake_text_file.h in tests/doubles/fake_text_file.cpp.

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
// Commit: 48d62561
void TestPackageScannerWithoutFileIO::testReadRecipeInfo()
{
    PackageScanner scanner;
    FakeTextFile recipeInfo{u"files/libffi/recipeinfo"_qs};
    auto package = scanner.readRecipeInfo(u"libffi"_qs, recipeInfo);
    QCOMPARE(package.name(), u"libffi"_qs);
    // ...
}

The test case TestPackageScannerWithoutFileIO uses FakeTextFile for QStringList-based files, whereas the test case TestPackageScannerWithFileIO uses TextFile for real files. The time has come to sunset the crutch file_system_double.h used by TestPackageScannerWithoutFileIO.

// tests/doubles/fake_text_file.cpp (Commit: 358ff6c0)
#include "fake_text_file.h"

struct FakeTextFile::Impl
{
    bool m_isOpen{false};
    QStringList m_lines;
    int m_currentLine{0};
};

FakeTextFile::FakeTextFile(QString filePath, bool isOpen, QStringList lines)
    : m_impl{new Impl{isOpen, lines, 0}}
{
    if (!m_impl->m_isOpen)
    {
        throw std::runtime_error(
            QString{"Cannot read file \'%1\'."}.arg(filePath).toStdString());
    }
}

We pass the file attribute isOpen and the file contents lines to the FakeTextFile constructor, which stores these arguments in the structure FakeTextFile::Impl. The two new constructor arguments mirror one TextFileData entry of the hash map m_fileSystem. Instead of adding entries to the hash map in file_system_double.h, every FakeTextFile constructor call sets the file attribute and the file contents.
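To see the final design without the Qt build setup, here is a standard-C++ sketch of the same structure: the interface, a fake that takes the file attribute and contents as constructor arguments, and a scanner function that depends only on the interface. std::string and std::vector<std::string> replace QString and QStringList. The PackageInfo struct and the free function readRecipeInfo are simplified, hypothetical stand-ins for the classes in the post, and the parsing in the sketch simply skips lines without a colon.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class AbstractTextFile
{
public:
    virtual ~AbstractTextFile() = default;
    virtual bool isAtEnd() const = 0;
    virtual std::string readLine() = 0;
};

// Test double: the file attribute and the contents come from the constructor.
class FakeTextFile : public AbstractTextFile
{
public:
    FakeTextFile(const std::string &filePath, bool isOpen, std::vector<std::string> lines)
        : m_lines{std::move(lines)}
    {
        if (!isOpen)
        {
            throw std::runtime_error("Cannot read file '" + filePath + "'.");
        }
    }
    bool isAtEnd() const override { return m_currentLine == m_lines.size(); }
    std::string readLine() override
    {
        if (isAtEnd())
        {
            throw std::out_of_range("readLine() called past the end");
        }
        return m_lines[m_currentLine++];
    }

private:
    std::vector<std::string> m_lines;
    std::size_t m_currentLine{0};
};

// Hypothetical, simplified PackageInfo.
struct PackageInfo
{
    std::string name;
    std::string licenseString;
    std::string version;
    std::string revision;
};

// Depends only on the interface, so real and fake files are interchangeable.
inline PackageInfo readRecipeInfo(const std::string &packageName, AbstractTextFile &recipeInfo)
{
    PackageInfo info{packageName, {}, {}, {}};
    while (!recipeInfo.isAtEnd())
    {
        const auto line = recipeInfo.readLine();
        const auto colon = line.find(':');
        if (colon == std::string::npos)
        {
            continue;  // skip malformed lines in this sketch
        }
        const auto key = line.substr(0, colon);
        auto value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(' '));  // strip leading spaces
        if (key == "LICENSE") { info.licenseString = value; }
        else if (key == "PV") { info.version = value; }
        else if (key == "PR") { info.revision = value; }
    }
    return info;
}
```

Production code would pass a file-backed implementation of the same interface; only the tests construct FakeTextFile objects.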

Each test contains all the information needed to understand it at a glance. That’s a lot better than having to look up the file contents in the extra header file file_system_double.h.

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
void TestPackageScannerWithoutFileIO::testReadRecipeInfo()
{
    PackageScanner scanner;
    FakeTextFile recipeInfo{
        u"files/libffi/recipeinfo"_qs, true,
        {u"LICENSE: MIT"_qs,
         u"PR: r0"_qs,
         u"PV: 3.2.1"_qs}
    };
    auto package = scanner.readRecipeInfo(u"libffi"_qs, recipeInfo);
    QCOMPARE(package.name(), u"libffi"_qs);
    // More checks ...
}

The macro QVERIFY_EXCEPTION_THROWN gets confused by the brace initialisation of the FakeTextFile constructor. The preprocessor splits macro arguments at every comma that is not enclosed in parentheses, so the two commas inside the braces make it see 4 instead of 2 arguments. We can fix this by using parentheses instead of braces. The compiler is happy with this. Qt Creator keeps flagging the line as an error, but wrongly, so we ignore Qt Creator.
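Why do braces cause trouble while parentheses do not? The preprocessor only treats parentheses as grouping; braces are ordinary tokens to it. The hypothetical COUNT_ARGS macro below makes this visible by expanding to the number of arguments the preprocessor sees. The argument tokens are discarded, so the undeclared names in them never reach the compiler. If your project uses Qt 6.3 or later, QVERIFY_THROWS_EXCEPTION(exceptiontype, ...) may be worth checking in the Qt Test documentation: it takes the exception type first and the expression as variadic arguments, so commas inside braces no longer split the expression.

```cpp
// Hypothetical macros: COUNT_ARGS expands to the number of arguments (1 to 5)
// that the preprocessor sees after splitting at top-level commas.
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)
#define COUNT_ARGS_IMPL(P1, P2, P3, P4, P5, N, ...) N

// Braces do not group: the commas inside {...} split the argument.
constexpr int withBraces =
    COUNT_ARGS(FakeTextFile recipeInfo{path, false, {}}; read(recipeInfo), std::runtime_error);

// Parentheses group: the commas inside (...) stay part of one argument.
constexpr int withParentheses =
    COUNT_ARGS(FakeTextFile recipeInfo(path, false, {}); read(recipeInfo), std::runtime_error);

static_assert(withBraces == 4, "braces: the preprocessor sees 4 arguments");
static_assert(withParentheses == 2, "parentheses: the preprocessor sees 2 arguments");
```

The same rule explains why the call readRecipeInfo(u"cannot-open"_qs, recipeInfo) inside the macro is harmless: its comma sits inside parentheses.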

// tests/package_scanner_without_file_io/test_package_scanner_without_file_io.cpp
void TestPackageScannerWithoutFileIO::testCannotOpenRecipeInfo()
{
    PackageScanner scanner;
    QVERIFY_EXCEPTION_THROWN(
        FakeTextFile recipeInfo(u"files/cannot-open/recipeinfo"_qs, false, {});
        auto package = scanner.readRecipeInfo(u"cannot-open"_qs, recipeInfo),
        std::runtime_error
    );
}

All tests pass. We have successfully converted integration tests accessing the file system into unit tests accessing in-memory “files”. Hard-to-test code becomes easy to test. We did this using TDD in many small steps.

How to Follow Along

You can find the example code in the directory BlogPosts/TDDonClassesWithFileIO of the GitHub repository embeddeduse. Check out the commit SHAs given in the code snippets to follow along step by step.

4 thoughts on “Applying TDD to Classes Accessing Files”

Aurélien 2022/09/25 at 16:09


Hello,
What about moving the file read operation out of the scanning code ? This is a common practice to separate data read from processing.

The scanner class would be a simple function :

using FileContent = QStringList ; // to be improved
PackageInfo scanPackageRecipeInfo(const FileContent &);

Usage:

const auto info = scanPackageRecipeInfo (mustReadFileContent("…"));

Then your tests can provide simple raw text that you split in QStringList which is IMHO simpler to maintain.

What do you think ?
1. Burkhard Stubert 2022/10/15 at 14:35
  
  
  Hi Aurélien,
  
  That is certainly an option, when I refactor the legacy code further. However, the focus of the post was on file I/O.
  
  Cheers,
  Burkhard
David 2022/09/25 at 23:38


Hi,

Thx for the write up.

You may add an extra test where one line is missing the colon character. Example:
LICENSE MIT
PR: r0
PV: 3.2.1
1. Burkhard Stubert 2022/10/15 at 14:26
  
  
  Hi David,
  
  Absolutely. I use this test and many others for the original code. For the post, all these tests would be a bit too much.
  
  Cheers,
  Burkhard