• Articles
  • Improving type safety and security with
Published by
Mar 14, 2010 (last update: Jul 13, 2011)

Improving type safety and security with sequence containers

Score: 3.0/5 (9 votes)
*****
Organization
Part 1) Passing arrays to functions

Part 2) Returning arrays from functions

If you have any trouble understanding any of the examples, consider reviewing the arrays and templates tutorials. For the templates tutorial you only need to read the first section on function templates.

http://www.cplusplus.com/doc/tutorial/arrays/
http://www.cplusplus.com/doc/tutorial/templates/
Terminology
The use of the word array will immediately result in some confusion for a number of reasons.

1) Other languages have built in smart array types which do not work the same way as C/C++ arrays. The term array is defined in numerous dictionaries and so there is a generalized concept of an array which leads to confusion when discussing specific kinds of array types that are defined by C++ or some other language.

2) The std::vector is described by the C++ standard as a sequence container but C++ programmers commonly refer to the std::vector as a dynamic array. In fact any of the standard sequence containers that provide random access will fit into a more general definition of the term array.

For instance, consider these definitions:
dictionary.com
Computers. a block of related data elements, each of which is usually identified by one or more subscripts.

Merriam Webster
(1) : a number of mathematical elements arranged in rows and columns (2) : a data structure in which similar elements of data are arranged in a table b : a series of statistical data arranged in classes in order of magnitude

When I use the term array I am talking about the more general definition of the concept that you would find in any dictionary. When referring to the "data structure" described by section 8.3.4 of the C++ std I will use the term C-Array. The following example shows an example of a C-Array. This type of a data structure existed in the C language and must be supported by C++ compilers. I'll use numerous examples to explain why it is sometimes better to consider using one of the standard sequence containers.

1
2
3
const int SIZE(5);
int data[SIZE];
std::generate(data, data + SIZE, rand);


PART I - Passing arrays to functions
Compile and execute the program. It contains a defect which will be evident when you analyze the output.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <iostream>

void printArray(int data[])
{
    for(int i = 0, length = sizeof(data); i < length; ++i)
    {
        std::cout << data[i] << ' ';
    }
    std::cout << std::endl;
}

int main()
{
    int data[] = { 5, 7, 8, 9, 1, 2 };
    printArray(data);
    return 0;
}


You will see that only the first 4 elements of the C-Array are printed. The call to sizeof(data) returns a value of 4! That happens to be the size of the pointer used to pass the C-Array to printArray. That has a couple of implications. First, the array does not get copied. The pointer to the first element of the array is copied. C-Arrays do not have copy constructors, assignment operators, or functional interfaces. In the following examples you will see examples using std::vector, std::deque, and std::list which are dynamic sequence containers provided by the C++ std template library. This is not a full tutorial on those containers but they are used in order to show the flexibility in the proposed improvements to the flawed program.

Let's look at another example. In it, I have created numerous printArray functions that are overloaded so that I can show a number of solutions. I will then analyze each of those solutions and explain the pros and cons of each.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
#include <iostream>
#include <vector>
#include <deque>
#include <list>

// Method 1: works but very little security.  It is impossible to validate
// the inputs since the size of data still cannot be validated. If length is too large
// undefined behavior will occur.
void printArray(int data[], int length)
{
    for(int i(0); i < length; ++i)
    {
        std::cout << data<i> << ' ';
    }
    std::cout << std::endl;
}

// Method 2: Type safe and more generic.  Works with any container that supports forward iterators.
// Limitation - cannot validate iterators so caller could pass null or invalid pointers.  Typesafe - won't
// allow you to pass inconsistent iterator types.  Allows you to pass any valid range of a container.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// Method 3 - This implementation is as typesafe and secure as you can get but
// does not allow a subrange since the entire container is expected.  It could
// be useful if you want that extra security and know that you want to operate
// on the entire container.
template <class ContainerType> 
void printArray(const ContainerType& container)
{
    ContainerType::const_iterator current(container.begin()), end(container.end());
    for( ; 
        current != end; 
        ++current)
    {
        std::cout << *current << ' ';
    }
    std::cout << std::endl;
}

int main()
{
    // Method 1.
    const int LENGTH(6);
    int data[LENGTH] = { 5, 7, 8, 9, 1, 2 };
    printArray(data, LENGTH);

    // Method 2.
    printArray(data, data + LENGTH);
    std::vector<int> vData(data, data + LENGTH);
    printArray(vData.begin(), vData.end());
    std::list<int> lData(data, data + LENGTH);
    printArray(lData.begin(), lData.end());
    std::deque<int> dData(data, data + LENGTH);
    printArray(dData.begin(), dData.end());
    // won't compile if caller accidentally mixes iterator types.
    //printArray(dData.begin(), vData.end());

    // method 3.
    printArray(vData);
    printArray(dData);
    printArray(lData);
	return 0;
}


Method 2 is unique in that it allows you to specify any range of the array where method 1 and 2 accomplish the same goal of printing the entire container. If that is what your intention was all along then I submit to you that method 3 is the best. It is most secure and typesafe. There is very little if any chance that a caller could specify invalid parameters. An empty container would not cause any problem. The function simply wouldn't print any values.

It is important to note that a C-Array cannot be passed using method 3. Method 3 requires the use of a container such as the std::vector. C-Arrays are a holdover from the C language and do not have functional interfaces. Method 1 or 2 would need to be used if you are dealing with C-Arrays. I'm sure that there are other ways as well but it is up to you to determine which method is best for your project.

One could produce hundreds of example programs that demonstrate these points even further but I'll leave it up to the readers to copy the program and build other kinds of examples. The beauty of templates is that it reduces repetitive programming tasks. Define the function once so that the function can be called multiple times where each time you specify a different type. It is simply a matter of making sure that the type supports the minimim requirements of the function. The method 3 printArray function requires that the ContainerType has begin() and end() member functions that return forward iterators and that the objects within the container are instances of classes that support the operator<< function. The operator<< can be defined for user defined types as well so method 3 is not limited to containers of built in types.
Part II – Returning arrays from functions
What follows is an example containing two typical problems with returning arrays from functions. For the record, I do not believe that returning arrays from a function is necessary. It may seem natural to return the result of a function but it isn't necessary. You can provide out parameters to a function using pointers or references.

The following program produces this output using MS Visual Studio C++ express 2008.
13 8 9 10 11 12
-858993460 -858993460 -858993460 -858993460 -858993460 3537572
41 18467 6334 26500 19169 15724
41 18467 6334 26500 19169 15724
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <algorithm>
#include <iostream>

// Prints out array elements. Method 2 from PART I.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// This function is a poor design which will lead to undefined behavior when the caller
// tries to use the pointer that is returned.  data is allocated on the stack and destroyed
// after the function returns.  The pointer to the memory is returned but it is a dangling
// pointer to memory that has already been released.
{
    int data[6] = { 13, 8, 9, 10, 11, 12 };
    int* pointer = data;
    printArray(pointer, pointer + 6);
    return pointer;
}

// The *& symbol means reference to a pointer so that modification of the array 
// results in modification of lotteryNumbers back in main.  In this case the pointer
// updated back in main is valid but the caller has to remember to release the memory
// at some point.  Therefore this approach is error prone.
void generateArray(int *&array, int length)
{
    int* pointer = new int[length];
    // std::generate requires the <algorithm> header
    std::generate(pointer, pointer + length, rand);
    printArray(pointer, pointer + length);
    array = pointer;
}

int main()
{
    int* lotteryNumbers = generateArray();
    printArray(lotteryNumbers, lotteryNumbers + 6);

    const int LENGTH(6);
    generateArray(lotteryNumbers, LENGTH);
    printArray(lotteryNumbers, lotteryNumbers + 6);
    delete lotteryNumbers;
    return 0;
}


The first call to printArray occurred within the version of generateArray that returns a value. At that time the array named data was valid and had been allocated from stack memory since it was created local to the function. Once generateArray returns the memory is returned to the stack and available for the program to reuse for some other purpose. Therefore the pointer that is returned to main points to memory that can and will be overwritten and the second line of output is garbage. The behavior is undefined. There is no way to predict how a program like this will behave. The output that I witnessed may not be the output that you see with another compiler and/or run-time environment.

There is another problem with that same version of generateArray. The function can only return one value. How does main know how big the array is, even if it was properly constructed using heap memory? In this case the programmer who wrote both functions coded this assumption which is a bad design.

Notice that there is another version of generateArray that takes two parameters and has a void return type. The first argument is a reference to a pointer so that the lotteryNumbers pointer of main is modified . The second argument is the length which I require the caller to supply. Although the function can accomplish the task successfully is it the best approach? In a complicated, large scale application memory leaks can cause serious problems and it may not be so easy to manage memory yourself.

I think that we can do even better. One question that arises is why would you want a function that builds an array? You can easily instantiate an array in place. Let me create a function that reads console input, and fills an array for the user. The below example allows the array to be constructed by a function without the caller having to worry about memory leaks or stack vs. heap memory allocations. There are many ways to do this. In this case I chose to allow the caller to pass an array of any size and the function will simply add to it. It could start out empty but it doesn't have to. The std::vector is managing the memory so when the main function exits it is destroyed without the programmer having to worry about garbage collection.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <vector>
#include <iostream>
#include <limits>

// Prints out array elements. Method 2 from PART I.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// The caller must decide whether to pass an empty container.  This function will 
// add to it.  
void readScores(std::vector<int>& container)
{
    std::cout << "Type the list of scores followed by a non-numeric character and press enter when finished. " 
              << "For instance (22 25 26 f <enter> " << std::endl;
    int temp(0);
    while(std::cin >> temp)
    {
        container.push_back(temp);
    }
    // clear and discard any leftover data from the input stream.
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

int main()
{
    std::vector<int> scores; // uninitialized.  Let readScores fill it.
    readScores(scores);
    printArray(scores.begin(), scores.end());
    return 0;
}


I chose not to make readScores a template function this time. It doesn't have to be and I wanted to keep the example fairly simple. It could be changed to be more generic. Try it if you dare and see what happens when you run the program. The point is that the function doesn't really need to build the array. Building the array within the function and returning it is tricky. You will either have to deal with garbage collection or return a std container by value which could result in unnecessary copy construction.

Unfortunately return by value means that at the very least you are probably going to have an assignment that would result in the caller's vector allocating memory to hold the copied data. The best way is really to pass by reference with a void return as I did in the earlier example. That example is more flexible as well since the caller can decide whether to add to an existing array or fill a new array.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
std::vector<int> container readScores()
{
    std::vector<int> container;
    std::cout << "Type the list of scores followed by a non-numeric character and press enter when finished. " 
              << "For instance (22 25 26 f <enter> " << std::endl;
    int temp(0);
    while(std::cin >> temp)
    {
        container.push_back(temp);
    }
    // clear and discard any leftover data from the input stream.
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    
    // return by value. Container will be destroyed but data will be copied into callers vector instance which could result
    // in additional memory allocation.  
    return container;
}


I'll finish by stating that there are other ways of accomplishing these kinds of programming tasks and I'd like to encourage anyone to post some examples using template functions of their own or boost libraries.