Introducing an efficient Big O analyzer

Introducing an efficient Big O analyzer: a premium, state-of-the-art AKDSFramework feature that analyzes the Big O of any function without human intervention.

As you already know, calculating Big O is a big part of how we approximate the worst-case running time of an algorithm, but most of the time it is done by hand, tracing through the code step by step. I'm going to propose an automated approach that computes Big O with respect to one expanding variable. Let's see what I mean with some examples:

  1. Sorting a sequence of numbers is \(O(n \log n)\) with respect to the sequence's length \(n\).
  2. Finding the maximum element of a given array is \(O(n)\) with respect to the length of the array.
  3. Finding the last element of a singly linked list is \(O(n)\) with respect to the length of the list.

So in the above cases I'm calling the input sequence, array, or list the "expanding variable", because we are calculating how the algorithm performs as this "expanding variable" grows towards large sizes.

Now let's create a Big O analyzer.

Designing the Big O analyzer system

Our Big O analyzer system has the following parts:

  • A function that builds a dictionary recording how much time the target function takes for different sizes of input.
  • A function that generates inputs of different sizes that can be fed into the target function. Let's call it a generator.
  • A function that interprets the execution times and fits them to a definitive time complexity with respect to the expanding variable.

Let's see an example to clarify this:

Let's say we are tasked with finding the complexity of the Python function sorted(). First, we identify our expanding variable.

The sorted() function sorts a sequence of data, so the expanding variable is the sequence of numbers. Now let's import a generator that is built into AKDSFramework:

from AKDSFramework.applications.complexity_analysis.data_generators import integer_sequence

Now, armed with integer_sequence, which can generate a sequence of integers of any length, we create a wrapper around integer_sequence like this:

int_generator = lambda n: integer_sequence(lb=1, ub=200000, size=n)

Now this int_generator can create a random sequence of length n whose individual elements range from 1 to 200000.
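
Under the hood a generator like this is conceptually simple. Here is a minimal sketch of what it might do, assuming integer_sequence just draws size random integers between lb and ub (this re-implementation is hypothetical, only to illustrate what the generator produces; it is not the framework's actual code):

# Hypothetical re-implementation, only to show what the generator produces
import random

def integer_sequence_sketch(lb, ub, size):
    """Return a list of `size` random integers drawn from the range [lb, ub]."""
    return [random.randint(lb, ub) for _ in range(size)]

# Used exactly like the built-in generator above
int_generator = lambda n: integer_sequence_sketch(lb=1, ub=200000, size=n)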

Now our job is to build a dictionary recording how much time the function takes for different sizes of input. From the previous piece of code we already know how to generate inputs of different sizes. Now it's time to create the dictionary. For that we need the following import:

from AKDSFramework.applications.complexity_analysis.analyser import runtimedict

Now we feed the function, along with any other keyword arguments it needs, into the runtimedict method.

The run-time dictionary

The runtimedict method takes in a few arguments; these are (a rough sketch of how they fit together follows this list):

  • func: The function for which you want to build the execution time dictionary.
  • pumping_lower_bound: The smallest size of the expanding variable, i.e. the array size at which the measurements start.
  • pumping_upper_bound: The largest size of the expanding variable, i.e. the array size at which the measurements stop.
  • total_measurements: How many measurements to take between the smallest and largest sizes.
  • pumping: Which keyword argument is the expanding variable, i.e. the one that needs to be pumped. Pass the name of that argument as a string.
  • **kwargs: All the keyword arguments of the function.
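
Here is a rough sketch of how these pieces might fit together, assuming the pumped keyword argument is a callable that produces an input of a given size (as int_generator does). This is an assumption based on how the examples below pass their generators, not the framework's actual implementation:

import time

def runtimedict_sketch(func, pumping_lower_bound, pumping_upper_bound,
                       total_measurements, pumping, **kwargs):
    """Return a dict mapping input size -> measured execution time of func."""
    step = max(1, (pumping_upper_bound - pumping_lower_bound) // total_measurements)
    execution_times = {}
    for size in range(pumping_lower_bound, pumping_upper_bound + 1, step):
        call_kwargs = dict(kwargs)
        # Replace the pumped argument with a concrete input of this size
        call_kwargs[pumping] = call_kwargs[pumping](size)
        start = time.perf_counter()
        func(**call_kwargs)
        execution_times[size] = time.perf_counter() - start
    return execution_times

# Example: time a sorting call for array sizes 1000 to 5000
# runtimedict_sketch(lambda array: sorted(array), 1000, 5000, 200,
#                    pumping='array', array=int_generator)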

Let's take the example of sorted. The sorted function receives the sequence of numbers through the iterable keyword argument, so to build the run-time dictionary we write the following:

  • We'll record 200 measurements.
  • Our array size will start at 1000.
  • Our array size will end at 5000.

So the code would be:

# The integer generator from before
int_generator = lambda n: integer_sequence(lb=1, ub=200000, size=n)

# And the Run time dictionary
rtdc = runtimedict(sorted, 1000, 5000, 200, pumping='iterable', iterable=int_generator)

Now to fit the complexity we need the following lines of code:

from AKDSFramework.applications.complexity_analysis.analyser import run_inference_on_complexity

run_inference_on_complexity(rtdc)

Output

Calculating complexity: 100%|██████████| 7/7 [00:00<00:00, 2618.63it/s]
O(N log N)

As you can see, our analysis finds that the sorted function is \(O(n \log n)\), which is indeed its true complexity.

Another example

Now let's take two more examples: bubble sort and insertion sort. Both are \(O(n^2)\).
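
As a quick reminder of why these algorithms are quadratic, here is a minimal insertion sort sketch (for illustration only; the framework's own implementations are imported next). The nested loops over the input are what produce the \(O(n^2)\) worst case we expect the analyzer to detect:

def insertion_sort_sketch(array):
    """Sort a copy of `array` with insertion sort."""
    array = list(array)
    for i in range(1, len(array)):           # outer loop: n - 1 passes
        key = array[i]
        j = i - 1
        while j >= 0 and array[j] > key:     # inner loop: up to i shifts
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key
    return array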

from AKDSFramework.applications.sorting import bubblesort, insertionsort

rtdc = runtimedict(insertionsort, 10, 2000, 200, pumping='array', array=int_generator, visualize=False, maintain_iter_dict=False)
run_inference_on_complexity(rtdc)

Output

Processing: 100%|██████████| 200/200 [00:12<00:00, 16.17it/s]
Calculating complexity: 100%|██████████| 7/7 [00:00<00:00, 2285.37it/s]
O(N^2)

Bubble sort

rtdc = runtimedict(bubblesort, 1000, 5000, 200, pumping='array', array=int_generator, visualize=False, maintain_iter_dict=False)

run_inference_on_complexity(rtdc)

Output

Processing: 100%|██████████| 200/200 [03:23<00:00,  1.02s/it]
Calculating complexity: 100%|██████████| 7/7 [00:00<00:00, 3031.82it/s]
O(N^2)

Inner workings

So we can say the Big O complexity analysis is working. Let's look at the inner workings of this module:

  1. First we record the runtime of the function for different input sizes.
  2. Next we fit size against time using the least-squares solution to a linear matrix equation. By fitting we mean that, in separate instances, we transform the sizes into \(n\), \(n^2\), \(n^3\), \(n \log n\), \(\log n\), and \(c^n\), and then check which transformation is best fitted by a straight line against the measured times. The best fit gives our Big O, because the running time grows in the same order as that transformation.
  3. We return the best-fitting complexity.

To see which transformation fits best we use numpy.linalg.lstsq, which returns the least-squares solution to a linear matrix equation. The transformation with the smallest returned residual is the one that best fits a straight line. More about this method can be found in the NumPy documentation.
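
Here is a minimal sketch of that fitting step, assuming we simply compare the lstsq residuals of a few candidate transformations (the function name and the candidate set are illustrative, not the framework's actual code):

import numpy as np

def best_fit_complexity(sizes, times):
    """Return the label of the transformation whose straight-line fit to the
    measured times leaves the smallest least-squares residual."""
    n = np.asarray(sizes, dtype=float)
    t = np.asarray(times, dtype=float)

    # Candidate transformations of the input size
    candidates = {
        'O(log N)':   np.log(n),
        'O(N)':       n,
        'O(N log N)': n * np.log(n),
        'O(N^2)':     n ** 2,
        'O(N^3)':     n ** 3,
    }

    best_label, best_residual = None, float('inf')
    for label, x in candidates.items():
        # Fit times ~ a * x + b and record the residual of the fit
        A = np.vstack([x, np.ones_like(x)]).T
        _, residuals, _, _ = np.linalg.lstsq(A, t, rcond=None)
        residual = residuals[0] if residuals.size else 0.0
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label

If rtdc maps sizes to execution times as described above, something like best_fit_complexity(list(rtdc.keys()), list(rtdc.values())) would reproduce the inference step.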

Installation

  • First, download/clone this repo: git clone https://github.com/theroyakash/AKDSFramework.git
  • Now uninstall any previously installed version: pip3 uninstall AKDSFramework
  • Now install fresh on your machine: pip3 install -e AKDSFramework

Alternate installation

This is easier to install but takes a bit longer: pip3 install https://github.com/theroyakash/AKDPRFramework/tarball/main

First code: check the version

Now, to check whether your installation completed without errors, import AKDSFramework:

import AKDSFramework
print('AKDSFramework Version is --> ' + AKDSFramework.__version__)

What you contribute is the only resource behind this material. Please support me on Gumroad.
