Thbop -> C For Python Programmers

Introduction

This tutorial assumes you have a decent grasp of the Python language.

C is a compiled low-level language first developed at Bell Labs by Dennis Ritchie in 1972 (source).

As a Python programmer, you are accustomed to a language developed by Guido van Rossum. The standard Python interpreter (CPython) is itself written in C (source).

Additionally, Python is an interpreted language: an interpreter reads and executes your program at runtime, as opposed to a compiled language, whose source code is translated into machine code ahead of time.

Thus, as previously implied, Python is a high-level programming language (far abstracted from raw machine code) while C is a low-level programming language. Furthermore, as a C programmer, you shouldn't expect high-level features such as Object Oriented Programming and lists.

In C, one primarily and fundamentally deals with raw numbers and memory addresses. High-level features such as strings, lists, dictionaries and the like do not exist and must be coded from scratch.

To compile C programs you, of course, will need a compiler. For general purpose projects I'd use mingw-w64 (specifically i686-w64-mingw32 packaged with MSYS2). Mingw-w64 is packaged with a ton of tools and installing it can be tricky for beginners, but it allows one to compile C and C++ programs.

For this tutorial, I'll be using the Tiny C Compiler due to its small size and ease of use. I specifically downloaded tcc-0.9.27-win32-bin since I am on Windows 10. Don't stress too much about 32 bit versus 64 bit (unless, of course, you are running a 32 bit operating system).

Since tcc (Tiny C Compiler) is much simpler than gcc (one of mingw-w64's compilers), the resulting file size is very tiny. We're talking 2KB versus 240KB just for a simple "Hello World!" program.

(Note: For Windows users, I'd recommend adding tcc to PATH. If you do not know what that means, you probably aren't experienced enough in Python to be following this tutorial.)

Hello World!

A basic "Hello World!" program looks like this:

#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}

Here's a line-by-line explanation:

#include <stdio.h> - preprocessor directive functioning similar to an import statement in Python code. "std" means standard, "io" means input/output, and ".h" means it is a header file. "stdio.h" contains functions for standard input/output, like printf(). Preprocessor statements do not end with semicolons.
int main() { - defines the program entry point. Unlike Python, to run any code in C you must define a main function; additionally, this function must return an integer (the int). The returned int signals to the operating system the success or failure of your program: 0 = success, while any nonzero value (conventionally 1) = failure.
printf("Hello World!\n"); - prints a character array to the console. The "f" in printf() stands for formatted (source). Notice also how the line is terminated with a semicolon; this simply tells the compiler this line is finished (unlike the invisible newline [or '\n'] character that Python uses).
return 0; - simply returns 0 denoting to the system that the program ran successfully.
} - end bracket to define the end of the main() function. Brackets are nested like indentation in Python. Furthermore, C does not rely on indentation at all (a valid program can consist of a single line).

To compile this "Hello World!" program, first create a text file named: "main.c". Unlike Java (where the file name must match the public class it contains), the name of the .c source file does not have to correspond to anything in the code. You can name your file whatever you want as long as it ends in ".c".

Using tcc in the console, run tcc main.c. A main.exe (on Windows) should be generated. Here's an example:

C:\Users\thbop\Desktop\C-For-Python>tcc main.c

C:\Users\thbop\Desktop\C-For-Python>main
Hello World!

C:\Users\thbop\Desktop\C-For-Python>main.exe
Hello World!

One benefit, made immediately apparent when using a compiled language, is that users downloading a program do not need a "compiler" (or "interpreter") to run it; their machine already knows how to run it.

Theory Dump: How It Works with cc65

C compilers basically convert C code into assembly, assemble it into an object file (".o"), and finally link it into a machine-code executable.

For this theory dump, I'll show the entire process, generating a program that can run on the 6502 Processor, an older and (relatively) simpler processor compared to modern processors. (6502 assembly is also much simpler to understand compared to modern assembly.)

First, you will have to understand the elementary concept of a Byte. (Simple explanation: 8 bits, 1 bit is an on or off state.)

For this walk-through, I'll be using this program:

// We do not #include <stdio.h> because the 6502 is just a processor; there is no defined standard input/output.
unsigned char main() {
    unsigned char i = 0;
    while ( i < 10 ) {
        i++;
    }
    return 0;
}

Note: unsigned char declares that variable's / function's type as an unsigned (no negative sign, only positive) character (1 byte). You can think of it as an integer variable that can only store values from 0 to 255 (range of 2^8=256).

Aside from that not-so-obvious (for some) note, this program is relatively simple and its Python "equivalent" is:

i = 0
while i < 10:
    i += 1

The abridged 6502 assembly source file (compiled by cc65) looks like this (with comments added):

.segment	"CODE"

	lda     #$00        ; Loads a 0 into the A register; the A register or Accumulator is general purpose single-byte storage in the CPU
	jsr     pusha       ; Pushes A onto the stack (I'll cover the stack later)
	jmp     L0004       ; Jumps to the while loop (the assembly line starting with "L0004:")
L0002:	ldy     #$00    ; This section defines the "i++" expression within the while loop
	ldx     #$00
	clc
	lda     #$01        ; Loads 1 into A so that we can add it to the C variable "i"
	adc     (sp),y      ; Adds A + i
	sta     (sp),y      ; Stores the result in i. After processing this line, the CPU will go to the next line (L0004)
L0004:	ldy     #$00    ; Start of the while loop line
	ldx     #$00
	lda     (sp),y      ; Loads the value of i into A
	cmp     #$0A        ; Compares A to 0A (hex representation of 10)
	jsr     boolult     ; Jumps to the boolult subroutine to handle the "<" sign
	jne     L0002       ; If the comparison is satisfied (i.e. A is less than 10), jump to L0002
	ldx     #$00        ; When the CPU is on this instruction, the while loop has finished and everything just needs to be cleaned up.
	lda     #$00
	jmp     L0001       ; Jump to L0001
L0001:	jsr     incsp1
	rts                 ; Return from program. This processor can only run one program at a time, so we need to be able to return.

Congratulations! You were reading assembly code!

When assembled, the object file's (".o") raw bytes look something like this:

00000000  55 7A 6E 61 11 00 00 00  60 00 00 00 0B 00 00 00  |Uzna....`.......|
00000010  6B 00 00 00 0F 00 00 00  7A 00 00 00 E5 00 00 00  |k.......z.......|
00000020  5F 01 00 00 20 00 00 00  7F 01 00 00 0C 00 00 00  |_...............|
00000030  8B 01 00 00 02 00 00 00  8E 01 00 00 9C 00 00 00  |................|

...

00000220  1F 01 00 00 00 2B 01 00  00 00 20 00 06 6D 61 69  |.....+.......mai| <--- Notice how the object file stores extra data about the program.
00000230  6E 2E 73 18 63 61 36 35  20 56 32 2E 31 39 20 2D  |n.s.ca65.V2.19.-| <--- This data will be used by the linker
00000240  20 47 69 74 20 38 63 33  32 39 64 66 19 63 63 36  |.Git.8c329df.cc6| <---
00000250  35 20 76 20 32 2E 31 39  20 2D 20 47 69 74 20 38  |5.v.2.19.-.Git.8| <---
00000260  63 33 32 39 64 66 02 73  70 04 73 72 65 67 07 72  |c329df.sp.sreg.r| <---
00000270  65 67 73 61 76 65 07 72  65 67 62 61 6E 6B 04 74  |egsave.regbank.t| <---
00000280  6D 70 31 04 74 6D 70 32  04 74 6D 70 33 04 74 6D  |mp1.tmp2.tmp3.tm| <---
00000290  70 34 04 70 74 72 31 04  70 74 72 32 04 70 74 72  |p4.ptr1.ptr2.ptr| <---
000002A0  33 04 70 74 72 34 0E 6C  6F 6E 67 62 72 61 6E 63  |3.ptr4.longbranc| <---
000002B0  68 2E 6D 61 63 0B 5F 5F  53 54 41 52 54 55 50 5F  |h.mac.__STARTUP_| <---
000002C0  5F 05 5F 6D 61 69 6E 05  70 75 73 68 61 05 4C 30  |_._main.pusha.L0| <---
000002D0  30 30 34 05 4C 30 30 30  32 05 2E 73 69 7A 65 07  |004.L0002..size.| <---
000002E0  62 6F 6F 6C 75 6C 74 05  4C 30 30 30 31 06 69 6E  |boolult.L0001.in| <---
000002F0  63 73 70 31 04 43 4F 44  45 06 52 4F 44 41 54 41  |csp1.CODE.RODATA| <---
00000300  03 42 53 53 04 44 41 54  41 08 5A 45 52 4F 50 41  |.BSS.DATA.ZEROPA| <---
00000310  47 45 04 4E 55 4C 4C 00  00  |GE.NULL..|

Now onto the final stage of "compilation," linking.

The linker basically inserts the program bytes from the main.o object file and its dependencies into one executable.

If you found this process and the 6502 interesting, check out the following projects / sources relating to the topic:

Basic Data Types and Variables

This section is going to be very simple compared to the last. So take a breath, and let's continue.

C has a few common data types:

int - 16 or 32 bit, or 2 or 4 byte representation of a signed integer. The signed integer can be negative or positive at the expense of a half-reduced max value.
short - 16 bit or 2 byte representation of a signed integer.
char - 8 bit or 1 byte representation of an integer (whether a plain char is signed or unsigned depends on the compiler). Commonly used to store a single character or an array/string of characters (more on that later).
float - 32 bit or 4 byte representation of a signed floating point number (decimal number).
double - 64 bit or 8 byte representation of a signed floating point number. A more precise "float" at the expense of memory and process time.

Any of these declarations can be prefixed with unsigned to make them unsigned (not negative):

unsigned int a = -10; // Does not store -10; the value wraps around to a very large positive number (4294967286 for a 32 bit int)

They can also be prefixed with const to define them as constant and immutable.

const int a = 76; // "a" cannot be modified

Additionally, ints can be prefixed with long to explicitly request an integer of at least 32 bits (4 bytes). On some systems, ints are 16 bit by default while on other systems they are 32 bit. The keyword long guarantees at least 32 bits (source).

long int a = -45; // at least 32 bit int

The sizeof operator, written sizeof(x), can be used to determine the size (in bytes) of a particular variable or type.

You might have already run into this issue while playing with numbers: "I can't print them!"

Don't worry! printf() is declared along these lines (the exact form varies between compilers' headers):

static inline int printf(const char *__format, ...) { ... }

"Thbop, that doesn't help."

The arguments for printf() are: "const char* __format" (a constant character pointer or array; I'll cover pointers and arrays later, but for now you can think of this as a string) and "..." (a variable number of additional arguments, one for each format specifier in the string).

The word "format" should get you thinking of the Python str.format() method. The concept is the same, but the execution is slightly different:

int a = 10;
float b = 65.0f; // float literals take an "f" on the end; without it the literal is a double
double c = 34.654; // as opposed to double literals, which take no suffix
printf("%d\n", a); // Prints "a" as 10; note %d is the same as %i, I just prefer %d
printf("%d %f %lf\n", a, b, c); // Outputs: 10 65.000000 34.654000

Some common type specifiers are shown below:

Symbol	Type
%d or %i	int
%x	int (hex representation)
%u	unsigned int
%lu	long unsigned int
%f or %F	float or double (a float argument is promoted to double)
%lf	double (in printf() this behaves the same as %f)
%c	char (actually prints a character)
%s	string (actually a char*)

(Sources: w3schools data types and w3schools sizeof)

Some Basic Operations

The for-loop in Python has the following structure:

for i in iterator:

In C, there is no iterator object; one constructs the loop from three parts like so:

for ( variable-definition; condition; runs-every-iteration ) {...}

or, for example:

for ( int i = 0; i < 10; i++ ) {...}

Miscellaneous note: whitespace does not affect the code in many instances, so:

for(int i=0;i<10;i++){...}//Hasthesameeffectasabove

Additionally, the variable i will only be defined within the scope between the squiggly brackets {}, so:

for ( int i = 0; i < 10; i++ ) {...}
int a = i; // Will result in an error; i is undefined in this scope.

Here are some other examples:

int a = 10;
if ( a == 10 || 5 == 5 ) { // || = or
	printf("yes\n");
} else if ( a < 10 && a > 0 ) { // && = and
	// do something
} else printf("hmm");

if ( a == 10 )
	printf("If there's only one statement after if, {} are not required\n");

switch ( a ) {
	case 1: {
		printf("a = %d\n", a);
		break; // case 1 is satisfied, no other cases will be evaluated
	} case 2:
		printf("Brackets also do not have to be used here"); // Note: there is subtle difference between having brackets and not having them
		break;
	default: printf("Neither are satisfied"); break;
}

Arrays and Pointers

Arrays and pointers are a source of much frustration for Python programmers because these features are handled more automatically in Python.

For example, in Python you can simply define a list, append, insert, etc to it to your heart's content. In C, it's not that simple. And what's a pointer?

First, let's discuss what an array is. Here's a simple array definition:

int arr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; // Define array with size of 10
for ( int i = 0; i < 10; i++ ) // "Iterate over the array," more like: "loop 10 times"
    printf("%d\n", arr[i]);    // Print the value at index i

Some notes:

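The size of a stack-defined array (what we have here) cannot be changed after it is defined.
For the purposes of this tutorial, the size of a stack-defined array should be known at compile time, not runtime. (C99 added variable-length arrays, but they cannot be given an initializer and not every compiler supports them.)
Python-style negative indexing does not exist; arr[-1] does not mean "the last element" but refers to memory before the start of the array.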

int a = 34;
int arr[a] = {0}; // "a" is a variable known at runtime, not compile time; this code would result in an error because a variable-sized array cannot be initialized like this
// The {0} would initialize the array's values as 0 if this example was valid

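To pass an array into a user defined function, a size can be written in the function parameters, although the compiler ignores it:

int array_process( int arr[11] ) { // Notice this array is declared with size 11; this still compiles because the size in a parameter is ignored, but reading index 10 of our 10-element array would be out of bounds (typically garbage data)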
    printf("%d\n", arr[9]); // Print the 10th element
    return 0;
}

int main() {
    int arr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    array_process(arr); // Pass array into function
    return 0;
}

What an interesting introduction into pointers! You might have noticed that I've been rather silent on the topic... until now.

The concept to grasp is this: arrays are (almost always treated as) pointers.

Here's a simple pointer declaration (to leave you even more confused):

int a = 45; // Define a variable; this exists at a particular memory address
int* b = &a; // Define b to point to a; "int*" denotes a pointer variable and "&a" references a (returns its memory address)
printf("%d at %p\n", *b, (void*)b); // "*b" dereferences b (returning the value of a); b's actual value is a memory address, so we print that here with %p

Result (the address, and how it is formatted, will differ on your machine): 45 at 19ff34

Now here's the trick: when you use an array's name in an expression, it is converted to a pointer to the first element in the array.

Thus:

int arr[4] = {45, 23, 73, 1}; // An array of ints; the name "arr" itself acts as the pointer

printf("%d ", *arr); // Dereference first value in array, 45
printf("%d\n", *(arr+3)); // Adding 3 to the address will return the 4th element (or index 3), 1

Result: 45 1

The benefit is that we can rewrite the array_process() function as so:

int array_process( int* arr ) { ... }

The downside to this approach is that the function has no idea how long the array is, and C performs no runtime checks when accessing and modifying array members:

int array_process( int* arr ) {
    printf("%d\n", arr[400]); // Out of bounds: undefined behavior; may print garbage or crash
    return 0;
}

I personally wouldn't recommend messing with out-of-bounds memory at all: it is undefined behavior and doesn't really have legitimate uses.

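Now to clear up strings: a string is just an array of chars that ends with a zero byte ('\0'), so a pointer to its first character is all we need:

const char* str = "Hello World!"; // const because the text of a string literal must not be modified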
printf("%s\n", str);

Functions and Void

Yes, I haven't properly covered functions yet. Here's some example code to cover them real quick:

void printintarr( int* arr, int length ) { // A void function returns nothing
    printf("{ ");
    for ( int i = 0; i < length; i++ ) printf("%d, ", arr[i]);
    printf("}\n");
}

int main() { // main() cannot return void, it must always return an int
    int a[] = {1, 5, 3, 7, 3, 7, 2, 5};
    printintarr(a, 8);
    return 0;
}

Structs and Typedef

The closest thing to a class, structs provide a sleek way to group variables in an "OOP" way.

The principal difference between a struct and a class is that a struct cannot have its own methods (or functions). You can, of course, create functions that modify a struct, but they must be external to the struct itself.

Here's an example of a simple struct declaration:

struct vec2 { // Declare struct
    float x, y;
};

void printvec2( struct vec2 p ) { // Example of a function that prints the vec2 by passing the struct in directly
    printf("vec2(%f, %f)\n", p.x, p.y);
}

void printvec2ptr( struct vec2* p ) { // Passing a pointer to the struct
    printf("vec2(%f, %f)\n", p->x, p->y); // Notice when dereferencing struct values we use the "->" symbol instead of "."
}

int main() {
    struct vec2 p = {4.0f, 1.0f}; // Declare p as type struct vec2; we supply it with an initializer list (x = 4.0, y = 1.0)

    printvec2(p);
    printvec2ptr(&p); // Reference p
    return 0;
}

As a simplification, we can use typedef:

typedef struct vec2 {
    float x, y;
} vec2; // Define a custom type for the struct

vec2 subtractv2( vec2 p, vec2 q ) {
    return (vec2){ p.x - q.x, p.y - q.y }; // "(vec2){ ... }" is a compound literal: it builds a temporary vec2 in place
}

float dot( vec2 p, vec2 q ) {
    return p.x * q.x + p.y * q.y;
}

int main() {
    vec2 p = {5.0f, 20.0f};
    vec2 q = {7.0f, 2.0f};

    printf( "%f\n",
        dot( subtractv2( p, q ), q )
    );
    return 0;
}

Looking further

This introductory "course" did not cover topics like files, stack vs heap, void pointers, and others. Just keep experimenting and researching.

Website last updated Tue Nov 19 15:52:15 2024 by Thbop.