
Understanding the Binary System and Data Encoding

The binary system is at the heart of how computers store and process information. The absolute minimum needed to store any information is a choice between two values: yes or no, true or false. This binary logic is the basis of digital data encoding, which uses just two digits: 0 and 1.

The Binary Numeral System: 0 and 1

Just as the decimal system is based on ten digits (0-9), binary is based on two digits: 0 and 1. Counting works the same way in both systems: when the rightmost digit reaches its maximum, it wraps around to 0 and the next digit to the left is incremented. Thus, decimal numbers increase in this fashion:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …

In binary it would appear as:

0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 10000, 10001, …

This system enables large amounts of data to be encoded in a form that machines can easily read and process. Various technologies, including magnetic tapes, floppy disks, CDs, DVDs, and even flash memory, use binary encoding. To avoid confusion with decimal values, binary values are written with a “b” suffix, such as 1010b.

To convert the binary number 1111111111b to decimal, we can use the positional value of each bit. Starting from the rightmost bit (which is the least significant bit), each bit represents a power of 2:

1 * 2^9 + 1 * 2^8 + 1 * 2^7 + 1 * 2^6 + 1 * 2^5 + 1 * 2^4 + 1 * 2^3 + 1 * 2^2 + 1 * 2^1 + 1 * 2^0

Now, summing these values:

512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 1023

Thus, 1111111111b in decimal is 1023.
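The positional sum above can be sketched as a small Python helper (the function name is illustrative, not from any library):

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power of two, least significant bit first."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("1111111111"))  # 1023, matching the sum above
```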

Converting the decimal number 1020 to binary gives us 1111111100b.
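Going the other way, the standard repeated-division method collects the remainders of division by 2; a minimal sketch (the helper name is assumed):

```python
def decimal_to_binary(value: int) -> str:
    """Repeatedly divide by 2; the remainders, read in reverse, are the bits."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        digits.append(str(value % 2))
        value //= 2
    return "".join(reversed(digits))

print(decimal_to_binary(1020) + "b")  # 1111111100b
```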

Conversion to Hexadecimal

To convert the binary number 1111111111b to hexadecimal, we can group the binary digits into sets of four, starting from the right. Since 1111111111 has 10 bits, we add two leading zeros to make a full set of 12 bits:

0011 1111 1111

These groups translate to 3, F, and F, so 1111111111b is 3FFh in hexadecimal.

Grouping Binary Digits: The Basis for Octal and Hexadecimal Systems

When working with binary values, it often becomes useful to group them into sets of bits. For instance, a group of 3 binary digits can represent 8 possible values (since 2³ = 8), ranging from 000 to 111. A group of 4 binary digits (from 0000 to 1111) gives us 2⁴ = 16 possibilities. That is why there are such things as octal and hexadecimal systems—they simplify the representation of binary numbers.

  • Octal: Base 8, using digits from 0 to 7.
  • Hexadecimal: Base 16, using digits from 0 to 9 followed by A to F, where A=10, B=11, and so on up to F=15.
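Python's built-in oct() and hex() functions show the same value in both bases; a quick sketch:

```python
value = 0b111111  # six binary ones: 63 in decimal
print(oct(value))  # 0o77: two octal digits, one per group of 3 bits
print(hex(value))  # 0x3f: two hex digits, one per group of 4 bits
```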

Conversion Between Binary and Hexadecimal

Converting between binary and hexadecimal is quite straightforward. A binary number can be divided into sets of 4 digits, with every 4-digit set corresponding to a single hexadecimal digit.

For instance:

  • 0001b as binary is 1h as hexadecimal.
  • 00110001b can be divided into 0011b and 0001b, which translate to 3h and 1h, hence 31h as hexadecimal.
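The grouping rule can be sketched directly in Python (binary_to_hex is an assumed helper name):

```python
def binary_to_hex(bits: str) -> str:
    """Pad to a multiple of 4 bits, then map each 4-bit group to a hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(nibble, 2), "X") for nibble in nibbles)

print(binary_to_hex("00110001"))    # 31, i.e. 31h
print(binary_to_hex("1111111111"))  # 3FF, i.e. 3FFh
```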

Data Types and Their Encoding

Now that we understand the basic numeral systems, it is time to have a look at how different data types are encoded.

Bits and Bytes

  • A bit (abbreviation of binary digit) is the smallest unit of data, which can hold one of two possible values: 0 or 1.
  • A byte is a collection of 8 bits, which enables us to represent 256 different combinations, since 2⁸ = 256. A byte can be represented in hexadecimal from 0x00 to 0xFF.

Larger data units are also used in programming:

  • A word in most systems consists of 2 bytes, 16 bits.
  • A DWORD is made up of 4 bytes, which adds up to 32 bits.
  • A QWORD consists of 8 bytes, amounting to 64 bits.
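These widths can be checked with Python's struct module, using its standard-size (platform-independent) format codes:

```python
import struct

# Standard sizes with the "<" prefix: H = unsigned 16-bit,
# I = unsigned 32-bit, Q = unsigned 64-bit.
print(struct.calcsize("<H"))  # 2 bytes (WORD)
print(struct.calcsize("<I"))  # 4 bytes (DWORD)
print(struct.calcsize("<Q"))  # 8 bytes (QWORD)
```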

These sizes are especially relevant when programming system code and using APIs, like the Windows API, that use specific bit-widths for different data types, such as WORD being 16 bits.

Data Types

  • Boolean: This data type is able to store no more than two values: true or false.
  • Integer: An integer holds whole numbers and can differ in size: for example, int16, int32, int64.
  • Signed and Unsigned:
    • Unsigned integers use all of the bits to store the numeric value.
    • Signed integers use the most significant bit (MSB) for the sign of the number—0 for positive and 1 for negative. For instance, 0xFFFFFFFF would represent -1 in a signed 32-bit integer.
  • Floating Point: These types are used to represent real numbers with fractional parts, though they are less commonly used in certain fields like malware development.
  • Character and String:
    • Char: Stores a single character, typically 1 byte.
    • String: A sequence of characters, usually represented as a series of bytes, depending on the encoding.
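The signed reading of 0xFFFFFFFF mentioned above follows from two's complement: if the MSB of a 32-bit pattern is set, subtract 2^32 to get the signed value. A minimal sketch (the helper name is assumed):

```python
def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value

print(to_signed32(0xFFFFFFFF))  # -1
print(to_signed32(0x7FFFFFFF))  # 2147483647, the largest positive int32
```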

Character Encodings

Character encoding standards define how characters are represented in memory:

  • ASCII: A 7-bit encoding standard for representing characters. The extended version uses 8 bits, thus allowing for 256 possible characters.
  • UTF-8: This is an encoding that uses 1 to 4 bytes per character and is very common in Unix-based systems. The first 128 characters of UTF-8 are identical to ASCII.
  • UTF-16: This is an encoding that uses 2 or 4 bytes per character and is the native string encoding on Windows. Like UTF-8, it can represent the full range of Unicode characters.
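The per-character byte counts can be observed in Python, which supports both encodings directly:

```python
text = "Aé€"  # U+0041 (ASCII), U+00E9, U+20AC
print([len(ch.encode("utf-8")) for ch in text])      # [1, 2, 3] bytes each
print([len(ch.encode("utf-16-le")) for ch in text])  # [2, 2, 2] bytes each
# The first 128 code points encode identically in UTF-8 and ASCII:
print("A".encode("utf-8") == "A".encode("ascii"))    # True
```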

Endianness: The Order of Bytes in Memory

When storing multi-byte values, endianness, the order in which bytes are stored in memory, must be taken into account:

  • Little Endian: The least significant byte is stored at the lowest memory address.
  • Big Endian: The most significant byte is stored at the lowest memory address.

Most modern computers use Little Endian, but Big Endian is used over networks by convention (hence the name “network byte order”), which simplifies compatibility between systems.
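Python's struct module makes the byte order explicit; a sketch packing the same 32-bit value both ways:

```python
import struct

value = 0x11223344
little = struct.pack("<I", value)  # little endian: least significant byte first
big = struct.pack(">I", value)     # big endian (network order): most significant byte first
print(little.hex())  # 44332211
print(big.hex())     # 11223344
```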

Memory Paging and Allocation

Modern operating systems are page-based; that is, they divide memory into fixed-size blocks called pages, usually 4 KB in size. This lets the operating system work efficiently by loading and swapping pages in and out of physical memory as needed.
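Python exposes the platform's page size through the mmap module; on most systems this prints 4096 (4 KB), though the exact value is platform-dependent:

```python
import mmap

# The OS page size in bytes; commonly 4096, but not guaranteed on all platforms.
print(mmap.PAGESIZE)
```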

Conclusion

The binary, octal, and hexadecimal systems, data types, and memory structures form the foundation on which computer science and cybersecurity rest. From encoding small amounts of data to managing large-scale systems, these concepts play a critical role in how information is processed, stored, and transmitted.

This post is licensed under CC BY 4.0 by the author.