Algorithms #1: Intro

The basis of all technology is that it speeds up a process we used to spend time on, freeing us to do higher-level things. Algorithms are the premier technology of information.

Algorithms are computational procedures. They take an input and process it to generate an output.

So to study this, I’ll be reading Introduction to Algorithms, better known as ‘CLRS’. This is by far my biggest conquest for data science. It’s a multidisciplinary, almost grad school-level textbook. Since I’m a beginner, I’ll have to break each discipline down to size and learn how they all tie into algorithms. It’s intimidating, but I’m eager to conquer this thing.

I’ve only read the first 2 intro chapters on analyzing and designing algorithms. Past those, I’ll be stepping away to educate myself on the other disciplines. So starting off, I’ll summarize what I learned in the first 2 chapters.

The most fundamental problem the book starts with is sorting. If I input a sequence of numbers into a sorting algorithm, it outputs those same numbers in a particular order, usually from least to greatest. This type of output is called a permutation, which is a reordering of the original sequence.
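Here's the input/output relationship in plain Python, just using the built-in sorted() for illustration (the book builds its own sorting algorithms rather than calling a library):

```python
# Input: a sequence of keys
numbers = [5, 7, 2, 1, 8]

# Output: a permutation of the input, ordered from least to greatest
print(sorted(numbers))  # [1, 2, 5, 7, 8]
```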

After an algorithm does its job, it halts, meaning it ceases operation and presents its output. Some algorithms are incorrect, meaning they might not halt at all, or they might halt with the wrong answer. Even so, an incorrect algorithm can be useful if we can control its error rate, but we'll get to that later.

In a perfect world, algorithms run when all the input data is available. In reality, input data arrives over time. The algorithm has to decide how to proceed without having all the data, and without knowing what new data will be coming.

If algorithms are cars, computing power is gas. Different algos have different mileages and we have to know when it’s appropriate to use more or less. So resourcefulness will be a big quality we have to practice.

And there's the idea of time complexity, which isn't just how long an algorithm takes to run, but how its running time grows as the input grows.

So again, sorting is the most basic type of algorithm. The sequence is made up of keys, usually numbers. The process works by taking a key and comparing it with another key to see whether it's less or greater, then moving keys into place accordingly. Each pass of this process is called an iteration. For each iteration, a loop invariant holds: a property that guarantees the portion of the sequence processed so far is correctly sorted.

There are 3 steps a loop invariant goes through:

  1. Initialization – It is true prior to the first iteration of the loop
  2. Maintenance – If it’s true before an iteration, it remains true before the next iteration
  3. Termination – The loop terminates, and when it terminates, the invariant gives a useful property stating why the algorithm is correct

As long as the first 2 hold, the loop invariant will be true prior to every iteration.

A loop-invariant proof is a form of mathematical induction, where to prove that a property holds, you prove a base case and an inductive step. Initialization is the base case, and maintenance is the inductive step.

Typically for the termination property, we use the loop invariant along with the condition that caused the loop to terminate. Ordinary induction applies its inductive step infinitely, but with a loop invariant, the induction stops when the loop terminates.
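To make this concrete, here's a minimal Python sketch of insertion sort, the book's first example (the book presents it in pseudocode), with the loop invariant noted in comments:

```python
def insertion_sort(a):
    """Sort the list a in place, least to greatest."""
    for j in range(1, len(a)):
        # Loop invariant: a[0..j-1] holds the original first j keys, in sorted order.
        key = a[j]
        i = j - 1
        # Shift larger keys one slot to the right to make room for key.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: j has run off the end, so the whole list is sorted.
    return a

print(insertion_sort([5, 7, 2, 1, 8]))  # [1, 2, 5, 7, 8]
```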

Analyzing Algorithms

Analyzing an algorithm mostly means determining the resources it needs: mainly memory, bandwidth, energy consumption, and computational time.

Outside the algorithm itself, the computer it's running on affects its computational time. And even with the same algorithm on the same computer, the input affects the time. The obvious factor is size: 100 numbers take longer to sort than 3 numbers. But even if two inputs are the same size, one can take longer if it's further from being sorted than the other.

For example, (5, 7, 2, 1, 8) would take longer to sort than (2, 5, 1, 7, 8), even though they contain the same keys, because the second sequence is closer to being sorted.
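One way to see this (my own quick check, not from the book) is to count how many key shifts insertion sort performs on each input, as a rough proxy for the work it does:

```python
def insertion_sort_shifts(a):
    """Insertion sort that also counts how many key shifts it performs."""
    a = list(a)  # work on a copy
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return a, shifts

print(insertion_sort_shifts([5, 7, 2, 1, 8]))  # ([1, 2, 5, 7, 8], 5)
print(insertion_sort_shifts([2, 5, 1, 7, 8]))  # ([1, 2, 5, 7, 8], 2)
```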

For measuring input size, outside of (obviously) counting the number of elements, there’s also the number of bits needed to represent the input in binary notation.
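For a single key, the number of bits is easy to check in Python:

```python
# Bits needed to write each key in binary notation
for n in [3, 100, 1000]:
    print(n, n.bit_length())  # 3 -> 2, 100 -> 7, 1000 -> 10
```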

Running Time

Running time is measured by the number of instructions and data accesses executed. Again, depending on the computer, the time per statement may not be the same: the same line of code can take a different amount of time on different machines, and different lines take different amounts of time. An algorithm's running time is the sum of the times for each statement executed.

For insertion sort, the best case (an already-sorted input) is a linear function of the input size, while the average and worst cases (the worst being a reverse-sorted input) are quadratic functions.

Order of growth is how the running time increases as the input size increases.

Designing Algorithms

There are a number of techniques for designing an algorithm. We’ll start with divide-and-conquer.

Many algorithms (though not most) are recursive, meaning they call themselves to break a larger problem into subproblems. Divide-and-conquer divides the problem into subproblems, conquers each one (recursively), then combines their solutions to solve the original problem.
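Merge sort is the book's first divide-and-conquer example; here's a minimal Python sketch of the idea:

```python
def merge_sort(a):
    """Sort a list by dividing it in half, conquering each half, and merging."""
    if len(a) <= 1:                # base case: 0 or 1 keys is already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])     # divide and conquer the left half
    right = merge_sort(a[mid:])    # divide and conquer the right half
    return merge(left, right)      # combine the two sorted halves

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 7, 2, 1, 8]))  # [1, 2, 5, 7, 8]
```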

For these subproblems, we can describe the running time with a recurrence equation: the running time on a problem of size n is written in terms of the running time on smaller subproblems. Solving the recurrence gives bounds on the algorithm's runtime and overall performance. There's also a recursion tree we can draw to illustrate how the recurrence unfolds.
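For example, merge sort on n keys satisfies the recurrence T(n) = 2T(n/2) + Θ(n): two subproblems of half the size, plus linear work to divide and merge. The book shows this recurrence solves to Θ(n log n).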

So that’s it for the first 2 chapters. I know that was very general. This is a multidisciplinary textbook so I’ll need to brush up on my other subjects before I move on. I had ChatGPT list the concepts I’ll need to know as a beginner.

  1. Foundational Concepts:
    • Basic Data Structures (Arrays, Linked Lists, Trees, Graphs)
    • Algorithm Analysis (Time Complexity, Space Complexity, Big O Notation)
    • Asymptotic Notations (Big O, Omega, Theta)
  2. Next Level Concepts:
    • Sorting Algorithms (Quicksort, Mergesort, Insertion Sort, etc.)
    • Searching Algorithms (Binary Search, Linear Search)
    • Recursion and Recursive Algorithms
    • Mathematical Foundations (Summations, Logarithms, Mathematical Induction)
  3. Algorithm Design Techniques:
    • Divide and Conquer (e.g., Merge Sort, Binary Search)
    • Greedy Algorithms (e.g., Huffman Coding, Dijkstra’s Algorithm)
    • Dynamic Programming (e.g., Fibonacci Sequence, Longest Common Subsequence)
    • Backtracking (e.g., N-Queens Problem, Sudoku Solver)
  4. Advanced Algorithm Analysis:
    • Amortized Analysis
    • Randomized Algorithms
    • Parallel Algorithms
  5. Complex Data Structures and Advanced Topics:
    • Advanced Tree Structures (AVL Trees, Red-Black Trees, B-Trees)
    • Graph Algorithms (Shortest Path Algorithms, Network Flow)
    • String Algorithms (Pattern Matching, String Compression)
  6. Advanced Mathematical Concepts:
    • Discrete Mathematics (Graph Theory, Combinatorics, Number Theory)
    • Linear Algebra (Matrix Operations, Eigenvalues, Singular Value Decomposition)
    • Probability and Statistics (Probability Distributions, Hypothesis Testing)

So next post, I’ll be starting with data structures.
