How Variables Exist in Computers

Preface:

This article takes a close look at how variables in a program are actually stored, and it touches on a fair amount of low-level computer knowledge.

If you happen to have a keen interest in this area, please read on!

However, if you’re not interested, you may choose to skip it, as reading this article may not significantly enhance your programming skills.

Introduction

Introduction 1

When we open a C source file and input the following code:

#include <stdio.h>

int main()
{
    int num;
    scanf("%d", &num);
    printf("%d", num);
    return 0;
}

The number you input will be stored in a variable called num.

Introduction 2

Type	Storage Size (bytes)	Value Range
int	4	-2^31 (-2,147,483,648) to 2^31 - 1 (2,147,483,647)
unsigned int	4	0 to 2^32 - 1 (4,294,967,295)

Thought: How is a number stored in a variable? How is the value range determined?

Basic Concept Introduction

Bit

A bit is the most basic unit of information in computing. Since computers work only with logical 0s and 1s, characters, actions, and numbers must all be represented as strings of binary digits, such as 1001 0000 1101. Each individual 0 or 1 is one bit; for example, the string 1000 1110 contains eight bits.

Byte

A byte is a unit made up of eight bits: 8 bits make 1 byte. The byte is the smallest unit of memory a computer can address, and one byte is enough to hold one ASCII character, that is, a letter or a common symbol. For example, the character A is stored as “0100 0001” (decimal 65).

Binary

Binary is the base-2 number system used in mathematics and digital circuits. It uses only two symbols: 0 (zero) and 1 (one). Digital electronic circuits implement their logic gates directly in binary, and modern computers, along with every device that relies on them, use binary internally. Each binary digit is called a bit.

Hexadecimal

Hexadecimal (abbreviated as hex or written with the subscript 16) is a base-16 positional number system, meaning it counts in groups of 16. It typically uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the letters A, B, C, D, E, F (or a, b, c, d, e, f), where A to F represent 10 to 15. These are called hexadecimal digits.

Is It Coincidence or Destiny?

As is well known, only “0” and “1” exist in the world of computers, that is, binary. But what does this have to do with storage?
First, the decimal number 1 is simply ‘1’ in binary.
Written with 4 bits as a unit, it becomes ‘0001’. Four bits can represent 16 different values, 0 to 15, which is exactly the range of one hexadecimal digit.
A byte consists of two such 4-bit groups, so the byte ‘0000 0001’ is written as 01 in hexadecimal, and
‘1111 1111’ -> FF

This way, we can perfectly represent binary using hexadecimal!

Sign-Magnitude, One’s Complement, and Two’s Complement

In computers, integers can be represented in three binary formats: sign-magnitude, one’s complement, and two’s complement.
All three consist of a sign bit and value bits. The sign bit uses 0 to represent “positive” and 1 to represent “negative.”
For positive numbers, the three representations are identical.
For negative integers, however, they differ:

Sign-Magnitude:
Set the sign bit according to the sign, and write the binary form of the absolute value in the value bits.
One’s Complement:
For a negative number, keep the sign bit unchanged and flip all the value bits of the sign-magnitude form.
Two’s Complement:
Add 1 to the one’s complement.

For integers, the data stored in memory is actually stored in two’s complement.

🤔 Why?

In computer systems, integer values are uniformly represented and stored in two’s complement. With two’s complement, the sign bit and the value bits can be processed in exactly the same way;
at the same time, subtraction can be carried out as the addition of a negative number, so a single adder circuit in the CPU handles both. In addition, converting from two’s complement back to sign-magnitude uses the same steps (flip the value bits, then add 1) as converting the other way, so no extra hardware circuitry is required. ✔️

🎈 How can we better understand this?
Let’s take an example 🌰: suppose we need the computer to compute 1 - 1. The CPU evaluates it as 1 + (-1).

Next, let’s take a look at the storage in memory.

Introduction to Endianness
What Are Big-Endian and Little-Endian? 🚩
Big-endian storage mode means that the low-order byte is stored at a higher memory address, while the high-order byte is stored at a lower memory address.

Little-endian storage mode means that the low-order byte is stored at a lower memory address, while the high-order byte is stored at a higher memory address.

🎮 Our computer’s memory is addressed from low to high. Whenever we define a variable, we request a block of memory to store its data. Suppose we want to store the number 0x11223344 (written in hexadecimal).

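Since memory is measured in bytes, this four-byte number occupies four consecutive addresses. Listed from the lowest address upward, big-endian stores it as 11 22 33 44, while little-endian stores it as 44 33 22 11.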

This is similar to how we write numbers in our daily lives (decimal), such as writing five hundred twenty as 520❤️.

We naturally write the high-order digit (5) on the left. Big-endian follows the same habit by placing the high-order byte at the lowest address, and little-endian does the opposite. In short, the two modes differ only in whether the high-order byte of a number is stored at the lower or the higher address. 🎈

The storage mode a device uses is determined by its hardware design, that is, by the CPU architecture; for example, x86 processors are little-endian.

Back to the Introduction
First, Discuss Signed int
For int c = -123;

We all know that computers store data in binary form.

An int typically occupies 4 bytes in a computer, which is 32 bits. In some older compilers, int may only occupy 16 bits.

Since int is signed, it can represent negative numbers, so one bit must be used as the sign bit. That leaves 31 bits for the actual value, which is why the range is -2^31 to 2^31 - 1 (the non-negative side ends one short because zero takes one of its patterns).

Conversion Process:

First, determine the sign bit. Since this is a negative number, the sign bit is 1 and placed at the front. Then convert 123 to binary: 1111011, which occupies 7 bits, leaving 24 bits filled with 0.

The resulting sign-magnitude representation is:

1000 0000 0000 0000 0000 0000 0111 1011

To obtain the one’s complement, keep the sign bit unchanged and flip every other bit: each 0 becomes 1 and each 1 becomes 0.

1111 1111 1111 1111 1111 1111 1000 0100

Finally, to obtain the two’s complement, simply add 1 to the one’s complement.

1111 1111 1111 1111 1111 1111 1000 0101

Thus, the variable c occupies four bytes in memory holding this two’s complement, which is FF FF FF 85 in hexadecimal.

Next, Discuss Unsigned int
For unsigned int c = 123;

To declare an unsigned int variable, simply add unsigned in front of int. Such a variable can only represent non-negative numbers, so no bit is needed for the sign. All 32 bits can hold the value, which lets unsigned int represent a larger range of positive numbers, up to 2^32 - 1.

Conversion Process:

Since this is an unsigned type, there is no need to determine the sign bit. Directly convert 123 to binary: 1111011, which occupies 7 bits, and fill the remaining 25 bits with 0.

The resulting representation is:

0000 0000 0000 0000 0000 0000 0111 1011

Note: An unsigned number has no sign bit, so no one’s-complement or two’s-complement conversion is involved; the plain binary value above is exactly what is stored in memory.

Let’s verify this with an example:

The variable test is an unsigned int variable, while test1 is a signed int variable, and test2 is also an unsigned int variable.

Then, when we add 12 and -13, it’s obvious that the result should be -1. But let’s take a look at the result:

The result is not -1, so let’s analyze this.

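Mathematically, 12 + (-13) is -1, and the 32-bit pattern the CPU produces for the sum is the two’s complement of -1. (Strictly speaking, C converts test1 to unsigned int before adding, but the resulting bit pattern is the same.)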

Thus, we need to determine the two’s complement of this number. As I mentioned earlier, signed numbers are stored in memory as their two’s complement.

First, the sign bit is 1 (the number is negative), the binary form of 1 is just 1, and the remaining 30 bits are filled with 0.

The resulting sign-magnitude representation is:

1000 0000 0000 0000 0000 0000 0000 0001

Next, we obtain the one’s complement (the process is described above, so I won’t repeat it):

1111 1111 1111 1111 1111 1111 1111 1110

Then, we derive the two’s complement from the one’s complement:

1111 1111 1111 1111 1111 1111 1111 1111

Since test2 is an unsigned int variable, these 32 bits, all 1s, are read directly as an unsigned binary number, which in decimal is:

4294967295 (2^32 - 1)

At this point, I believe you should understand how variables are represented in memory and the differences between signed and unsigned variables.

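Note: The flip-and-add-1 steps only change anything when the value is negative. For a non-negative value, the sign-magnitude, one’s complement, and two’s complement forms are identical, so what is stored looks like the plain binary value.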

For example: int i = 124;

Although i is defined as a signed int, it is positive, so its sign-magnitude representation (which is also its two’s complement) is exactly what is stored in memory:

0000 0000 0000 0000 0000 0000 0111 1100

The only difference from unsigned representation is that the highest bit is the sign bit and cannot be used to store the numerical value.

Let’s illustrate addition using binary to enhance understanding.

-15 + 10 = -5;

Here we use only 8 bits to keep things short.

First, convert -15 to two’s complement: 1111 0001. Since 10 is positive, its two’s complement is the same as its sign-magnitude form: 0000 1010.

Adding the two in binary yields:

1111 0001 + 0000 1010 = 1111 1011

Since the highest bit is 1, the result is negative, which means it’s in two’s complement form. We need to convert the two’s complement back to the sign-magnitude form to get the final result.

The conversion uses the same steps as described above: keep the sign bit, flip the value bits (1000 0100), then add 1, which gives 1000 0101. The sign bit is 1 and the value bits 000 0101 are 5 in decimal, so the result is -5.

Exercise

Given the execution statement char *p1, *p2; int a = -1; p1 = (char*)&a; p2 = p1 + 2; and assuming int type data occupies 4 bytes, which of the following options is correct?

(A) *p1 == *p2 is true
(B) *p1 == -1 is true
(C) *p2 == -1 is true
(D) A, B, and C are all true

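Answer: D (assuming plain char is signed, as it is with common compilers on x86; whether plain char is signed is implementation-defined, and if it is unsigned, *p1 and *p2 both equal 255 and only A holds)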

Explanation:

Note that this question relies on some knowledge of pointers.

The type of the pointer determines the unit of access (the size of memory accessed).

Dereferencing an int pointer accesses 4 bytes; to access only 1 byte, we cast the pointer to char* before dereferencing.

In other words, we view memory through a char, which covers one byte at a time: dereferencing a char pointer reads the single byte at the address it holds.

#include <stdio.h>

int main(void)
{
    char *p1, *p2;       // Define char pointers p1 and p2
    int a = -1;          // All four bytes of a are 0xFF: FF FF FF FF
    p1 = (char *)&a;     // p1 points to the first byte of a; a char* reads one byte at a time
    p2 = p1 + 2;         // p2 points to the third byte of a
    printf("%d\n", *p1); // Reads one byte, 0xFF: prints -1 if char is signed, 255 if unsigned
    printf("%d\n", *p2); // Reads the third byte, also 0xFF, so the output is the same
    return 0;
}