Programming in C++
Lab 4 – Pointers, references, and how to use them
The C++ memory layout
Every running process has memory allocated to it by the operating system. In general, a process can only access its own memory. This memory is organized as follows:
If a program is executed, its variables are by default created on the stack, with the last function called being on top of the stack (hence it is similar to a stack data structure):
Notice how this enables visibility of variables and recursion, and what it means for call-by-value parameters. Also notice that memory becomes available again when a function (strictly speaking, a block of code) finishes; this is easy to do with a stack structure. It also explains why an array has a fixed size "for life".
Pointers
C++ allows programmers to access memory directly by using pointers. A pointer is the address of a memory location that contains some data or where some data starts. With a pointer it is necessary to think of two separate values: the address and the data it refers to. Note that no memory is allocated for the data by default when a pointer is defined. It is up to the programmer to deal with this. If a pointer refers to an arbitrary memory location, operating systems with memory protection will stop the program when it tries to read the data (with error messages such as segmentation fault, page fault, or general protection fault).
The following figure shows an example of memory. The first line contains the memory addresses and the second line the corresponding values (in hexadecimal). Each memory location stores one byte. For multibyte values, the order in which the individual bytes are read is important. In this example, we read the bytes from left to right, so the byte at the lower address is the least significant one. For example, if two consecutive memory slots contain the bytes 2 and 1, then together they hold the two-byte integer 258 (256 * 1 + 2). Read more about the order of bytes here: http://en.wikipedia.org/wiki/Little_endian.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
34 | 4A | 11 | B5 | FF | 3 | 23 | F0 | C0 | 20 | 6D | 30 | 83 | 0 | 7 | 1 | 34 | 22 | BA | D |
An asterisk is used to define a pointer. The asterisk is later also used to read data from the memory address the pointer holds. See the following code section for an example of how to define and use a pointer. NB! This code will not work on a real computer: the compiler rejects assigning a plain integer to a pointer, and memory address 15 is not accessible to an ordinary program anyway. Treat it as a thought experiment.
unsigned short *somenumber = 15; /* Saving memory address */
cout << somenumber;              /* Prints 15 */
cout << *somenumber;             /* Prints 263 (why?) */
somenumber = 14;                 /* New address. What is *somenumber now? */
*somenumber = 1;                 /* Changing the value. */
somenumber = 15;                 /* What is now the value of *somenumber? */
Pointer operations/operators are:
- Declaring a pointer variable (or parameter): int* a or int *a
- Getting the pointer to an existing variable: int b=0; &b
- Getting the value a pointer points to (dereferencing): *a
Note that pointers are a different type from the "base" variable - keep this in mind when doing assignments: a=&b and b=*a will work in those examples. a=b or a=12 will not.
When to use pointers?
Using pointers is more complicated than defining regular variables, so it is reasonable to use them only when they bring higher efficiency or a better design. Before using a pointer, one must check that it has a value and is not a null pointer. (C++11 adds the keyword nullptr; before that, the value 0 or the constant NULL was used.) Additionally, one must release all memory allocated through pointers, otherwise it "leaks" and reduces the available memory.
References
References were added to the C++ language to provide some pointer capabilities without the associated risks. A reference is essentially a new name through which an existing variable or object can be used. The reference itself has no content or address of its own; it only refers to an existing object.
An ampersand is used to define a reference. Here's a simple example of how to use it:
int a = 5, b = 9;                        // define variables a and b
int& refToA = a;                         // refToA is a new name for variable a
cout << a << "," << refToA << "," << b;  // prints 5,5,9
a = 7;                                   // change the value of a
cout << a << "," << refToA << "," << b;  // prints 7,7,9
refToA = 11;                             // change the value through refToA
cout << a << "," << refToA << "," << b;  // prints 11,11,9
When to use references?
If you need to refer to the same data or objects in several places, use references if possible. If they do not fit (e.g. you need to distinguish an existing object from one that has not yet been created), use pointers. Use references wherever they help to avoid excessive copying of values.
Allocating and releasing data for pointers: operators new and delete
The operator new calculates how much memory is needed for a given object, allocates it, and returns a pointer to the beginning of the memory slot:
int *a = new int;             // call without constructor
Vector2 *v = new Vector2 {};  // You can also add parameters to the constructor
This memory will be allocated on the heap (see initial figure).
If the pointer points to an object, then -> is used to access its data or methods:
cout << "Coordinates are " << v->x << " and " << v->y << "." << endl;
Allocating memory may fail, so the program must be prepared for that. By default, new reports failure by throwing std::bad_alloc; the form new (std::nothrow) returns a null pointer instead, which can then be checked. Values created with the keyword new must be released by using delete. NB! Memory allocated for several objects with operator new[] must be released with delete[].
delete a;
delete v;
Additional reading:
Pointer: http://www.cplusplus.com/doc/tutorial/pointers.html
Dynamic memory management: http://www.cplusplus.com/doc/tutorial/dynamic.html
Pointers and references in function and method parameters
When specifying the type of arguments to functions, you must decide whether to pass the argument as a pointer, a reference, or a normal value. The following examples may help you decide this.
Option A: Parameters passed as values
void process_values (int a, string b, Vector2 c);
When calling the function, the following things will happen:
- All values used in the call are automatically copied.
- Modifying these copies within a function does not change the values in the calling function.
- Copies are destroyed upon returning from the function
Option B: Parameters passed as references
void process_references (int &a, string &b, Vector2 &c);
When calling the function, the following things will happen:
- The data of the variables used in the call is not copied.
- Changing their values inside the function will affect them in the calling function.
- References are destroyed upon returning from the function, but variables remain intact.
Option C: Parameters passed as pointers
void process_pointers (int *a, string *b, Vector2 *c);
When calling the function, the following things will happen:
- The pointers are copied, but the copies point to the same values as before.
- Changing the address stored in an argument pointer leaves the original pointer unchanged.
- Changing the values pointed to by the argument pointers changes the original values as well.
- The pointer copies are destroyed upon returning from the function, but the original data remains intact.
If a parameter is large, such as an object containing a lot of data, pointers or references are recommended because less data is copied. Pointers or references are also needed when the function is expected to change the data given to it.
Pointers and references as function return values
Care must also be taken when a method or function returns a pointer or a reference. Again, we give illustrative examples, but first we introduce the concept of the scope of a variable. An ordinary variable is "alive" only within its scope, that is, inside the structure or block of code in which it was defined. When the structure is destroyed or the block ends, the variable is destroyed.
int arvuta_midagi () {
    ...
    int a = ...                                    |
    for (int i = 0; i < 0; i++) {    |             |
        cout << i << endl;           | scope of i  |
    }                                |             | scope of a
    return a;                                      |
}                                                  |
Variable a is an ordinary variable for which memory is allocated and released automatically. Upon returning from the function, a is destroyed, and upon leaving the loop, i is destroyed. The function arvuta_midagi returns the integer a. Since the function returns a value, a is copied for the calling function.
string& arvuta_teisiti () {
    string s = "C++";
    return s;
}
The function arvuta_teisiti tries to return a reference to a local variable. This is almost certainly a mistake, and the compiler usually alerts the programmer to it. The memory for the variable s is freed when the function ends, so the returned reference is left dangling. When returning a reference, make sure that it remains valid after the function has ended.
Vector2* tee_uus_vektor () {
    Vector2 v;
    return &v; // WRONG
}

Vector2* tee_uus_vektor2 () {
    Vector2* v = new Vector2 {};
    return v;
}
The function tee_uus_vektor creates a Vector2 as an ordinary variable and returns its address (note the operator &, which yields the memory address of the object written to its right). This brings the same danger as with references: the returned pointer refers to an object that no longer exists. Worse, the compiler is not obliged to warn us here, because a pointer that points nowhere is perfectly legal. The function tee_uus_vektor2 shows how to perform such a task correctly. Note that with this approach the programmer is responsible for releasing the memory. A suitable function for that is, for example:
void vabasta_vektor (Vector2 *v) {
    if (v)        // optional check: delete on a null pointer does nothing
        delete v;
}
An alternative is given in the following example. The pointer returned by the function is valid as long as the instance of Someclass on which get_something was called exists.
class Someclass {
    int a;
public: // without this, get_something would be private and could not be called from outside
    int* get_something () { return &a; }
};
The danger of memory leaks is very real in this context. Take this program:
void my_func() {
    int* valuePtr = new int(15);
    int x = 45;
    // ...
    if (x == 45)
        return; // here we have a memory leak, valuePtr is not deleted
    // ...
    delete valuePtr;
}
At first glance everything seems fine: there is a new and a corresponding delete. Unfortunately, if x == 45 holds, the function returns early and the delete is never executed. Keep exceptions in mind here as well: an exception thrown between new and delete skips the delete in the same way.
How does C++11 change the use of references and pointers?
C++11 extends the use of pointers and references in several ways. Not all of these additions are covered in the labs; the following material is for those who want to explore more of what C++ offers.
Move constructor and move assignment operator
In addition to the constructor, destructor, copy constructor and copy assignment operator, a class can now also define a move constructor and a move assignment operator, which transfer the contents of an object instead of copying them. This is useful when the source object is no longer needed afterwards, so moving is enough.
SomeObject a, b, c;
c = a + b;
If the SomeObject class had only an ordinary (copy) assignment operator, a temporary object would be created for the result of a + b and its value would then be copied into c. If the class has a move assignment operator, the contents of the temporary are moved into c instead, so the copy is avoided and less work is done. Read more:
http://stackoverflow.com/questions/4782757/rule-of-three-becomes-rule-of-five-with-c11
Moving objects in parameters and return values
Moving also helps to reduce copying in parameters and return values. Here it is important to observe what the function is doing and decide accordingly. Some good examples:
http://stackoverflow.com/questions/7592630/is-pass-by-value-a-reasonable-default-in-c11
Smart pointers
C++11 includes smart pointer support. A smart pointer keeps track of the object it manages and decides by itself when the memory should be released. Using smart pointers helps to program more safely. It is still useful to know how ordinary pointers work.
Remember our example of a program with a memory leak above. The same example using the unique_ptr<> template looks like this:
#include <memory>

void my_func() {
    std::unique_ptr<int> valuePtr(new int(15));
    int x = 45;
    // ...
    if (x == 45)
        return; // no memory leak anymore!
    // ...
}
An object managed by a unique_ptr can be owned by only one unique_ptr at a time. When that pointer goes out of scope, the object is deleted.
shared_ptr counts its instances: as long as at least one shared_ptr still points to an object, the object is kept.
auto_ptr is an outdated predecessor of unique_ptr; it was deprecated in C++11 and removed in C++17.
More on smart pointers here: https://en.cppreference.com/book/intro/smart_pointers