Decode Python List

When we learn any programming language, we always come across data structures used to store lists of items. They are used everywhere: for example, when you search for an item on Amazon, you get a list of items in the results 🛍️, and when you look for a YouTube video to watch, the search results show a list of videos 📺.

In my view, the core of any application is how it works with lists of items, and knowing how a given programming language handles them internally is one of the main things you should know as a software engineer.

There are many ways to optimize operations on a list based on how it stores the data. For example:

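Accessing an element by index is much faster in an array than in a linked list: an array computes the element's address directly in O(1), while a linked list has to walk node by node in O(n).
Inserting into a linked list (once you have the node to insert after) is much faster than inserting into a fixed-size array, which may require allocating a new, bigger array and copying every element over.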
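To make the two points above concrete, here is a small sketch. The `Node` class and the `push_front` and `get_at` helpers are hypothetical, written only for illustration; the Python list at the end stands in for an array.

```python
# Hypothetical minimal singly linked list, for illustration only.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def push_front(head, value):
    # O(1): the new node simply points at the old head; nothing is copied.
    return Node(value, head)


def get_at(head, index):
    # O(n): we have to follow `index` links to reach the node.
    node = head
    for _ in range(index):
        node = node.next
    return node.value


head = None
for v in [3, 2, 1]:
    head = push_front(head, v)  # builds 1 -> 2 -> 3

print(get_at(head, 2))  # 3

numbers = [1, 2, 3]
print(numbers[2])       # 3, found in O(1) straight from the index
numbers.insert(0, 0)    # O(n) for a Python list: every existing element shifts right
print(numbers)          # [0, 1, 2, 3]
```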

Note: This doesn't mean that every operation on lists and other data structures runs in O(1) time 😀.

In the end, knowing a bit about how a given language stores lists of values can help you get the most out of it. In this blog we are going to explore how Python (🐍) lists work internally, not just how to append, remove, and perform other operations on them.

What is a list in Python?

A Python list stores a sequence of items. It is mutable, and it can hold data of any type. Lists come with many built-in methods that are very useful for working with collections of items in your application.

A few key properties of lists:

It is mutable.
It is iterable.
It can store items of any data type, even another list.
It allows duplicates.
It preserves the order of elements.
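Each property in that list can be checked in a few lines. This is a small illustrative sketch; the `items` list is made up for the demo.

```python
items = [1, "two", 3.0, [4, 5], 1]   # mixed types, a nested list, and a duplicate 1

print(items.count(1))       # 2: duplicates are allowed
print(items[3][1])          # 5: a list can hold another list
print(items[0], items[-1])  # 1 1: order is preserved as written

items[1] = "TWO"            # mutable: replace an element in place
for item in items:          # iterable: loop over the elements in order
    print(type(item).__name__, item)
```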

Here is a code example showing how to create a Python list and perform some operations on it.

```python
# List with initial values
numbers = [1, 2, 3, 4, 5]

numbers[1] = 10          # Update the second element -> [1, 10, 3, 4, 5]
numbers.append(6)        # Add a new element to the end -> [1, 10, 3, 4, 5, 6]
numbers[1:3] = [20, 30]  # Replace the second and third elements -> [1, 20, 30, 4, 5, 6]

# List comprehension
squares = [x**2 for x in range(1, 6)]  # [1, 4, 9, 16, 25]

# Sorting (in place)
numbers.sort()           # [1, 4, 5, 6, 20, 30]

numbers[3] = "zero"      # Replace the fourth element with a string -> [1, 4, 5, 'zero', 20, 30]
```

How does Python list work?

Let’s explore how the list internally works 👀.

First, let's understand what defines a data structure. In short, a data structure defines how we store data according to some rules. For example, arrays store values of the same data type in a fixed-size block of contiguous memory, while linked lists store data in nodes that can live anywhere in memory, so they have no fixed size. There are many more complex data structures, such as graphs and trees, depending on the rules for how they store data.
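Python's standard `array` module is a convenient way to see the "same data type" rule from Python itself. This sketch contrasts it with a list; note that, unlike a classic fixed-size array, `array.array` can still grow, so it only illustrates the type restriction.

```python
from array import array

typed = array("i", [1, 2, 3])   # every element must be a C int
mixed = [1, "two", 3.0]         # a list accepts objects of any type

rejected = False
try:
    typed.append("four")        # a str is not an int, so this fails
except TypeError:
    rejected = True

mixed.append("four")            # fine: the list just stores another reference
print(rejected, mixed)          # True [1, 'two', 3.0, 'four']
print(typed.itemsize)           # bytes per element, platform dependent (often 4)
```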

Now to the main focus of this post: how Python lists work ☝️. First, let's understand how Python executes code. Python is usually described as an interpreted language: your code is executed by an interpreter rather than compiled ahead of time into a native executable. The most widely used interpreter is CPython.

CPython is written in C, and it is the official, default implementation of Python. There are other implementations too, such as PyPy.

Let's understand the process with a diagram.

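First, CPython compiles the Python source code into intermediate bytecode, and then its virtual machine executes that bytecode instruction by instruction to produce the output. For imported modules, CPython caches the compiled bytecode in .pyc files (inside a __pycache__ directory) so it can skip recompiling next time 💭.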
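You can look at that intermediate bytecode with the standard `dis` module. The function `build_list` below is just an example; the exact opcode names and layout differ between CPython versions, so treat the printed listing as illustrative.

```python
import dis


def build_list():
    x = [1]
    x.append("zeel")
    return x


# Print the bytecode CPython compiled for build_list.
dis.dis(build_list)

# Collect only the opcode names, e.g. to check for the list-building instruction.
opnames = [instr.opname for instr in dis.get_instructions(build_list)]
print("BUILD_LIST" in opnames)
```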

Let's look at how lists work in CPython. CPython defines a struct named PyObject, which is the base of every object in CPython.

As the comments in CPython's object header (Include/object.h) put it:

Objects are always accessed through pointers of the type ‘PyObject *’.
The type ‘PyObject’ is a structure that only contains the reference count
and the type pointer. The actual memory allocated for an object
contains other data that can only be accessed after casting the pointer
to a pointer to a longer structure type.

In other words, every Python object starts with a PyObject header (a reference count and a pointer to its type), and CPython always refers to objects through PyObject * pointers, whatever their actual type.
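Both header fields can be observed from Python, assuming you are running CPython: `type()` follows the type pointer, and `sys.getrefcount()` reads the reference count. That `id()` returns the object's memory address is a CPython implementation detail, not a language guarantee.

```python
import sys

data = [1, 2, 3]
print(type(data))               # <class 'list'>: found through the object's type pointer

before = sys.getrefcount(data)  # the count includes a temporary reference held by the call
alias = data                    # bind a second name to the same list object
after = sys.getrefcount(data)
print(after - before)           # 1: the new name added one reference

print(hex(id(data)))            # CPython detail: id() is the object's memory address
```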

Here is how a list is defined in CPython's C source (simplified, with comments condensed):

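```c
typedef struct {
    PyObject_VAR_HEAD      /* reference count, type pointer, and ob_size (the list's length) */
    PyObject **ob_item;    /* array of pointers to the elements; list[0] is ob_item[0] */
    Py_ssize_t allocated;  /* number of slots reserved in ob_item (capacity >= length) */
} PyListObject;
```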

ob_item is an array of pointers: each slot stores only the address of an item, not the item itself. For example, if you store 1 at index 0, ob_item[0] holds the address of the memory where the object 1 lives. This is how Python/CPython lets a list store any type of data, since an address can point to an object of any type 📍.
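Because the slots hold addresses rather than values, the same object can sit in several slots at once, and any object type fits. This small sketch shows both effects from Python; the variable names are arbitrary.

```python
inner = [1]
outer = [inner, inner]       # both slots store the address of the same list object

inner.append(2)              # mutate the object through one name...
print(outer)                 # [[1, 2], [1, 2]]: both slots see the change
print(outer[0] is outer[1])  # True: the same address is stored twice

mixed = [42, "zeel", 3.5, None]  # every slot is just a pointer, so any type fits
print([type(item).__name__ for item in mixed])  # ['int', 'str', 'float', 'NoneType']
```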

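The allocated field records how many pointer slots have been reserved, that is, the list's capacity; the list's actual length is stored separately in its object header. When an append finds no free slot left, CPython resizes the array and reserves extra slots beyond what is immediately needed. Because these resizes happen only occasionally, appending to a list is O(1) amortized. Inserting anywhere other than the end is still O(n), since the following pointers have to shift.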
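Here is a simplified model of that over-allocation strategy. `DynamicArray` is a hypothetical class, not CPython's code: a Python list stands in for the ob_item pointer array, and the growth rule (about 1/8 extra plus a small constant) is only loosely modeled on CPython's, whose exact formula differs between versions.

```python
class DynamicArray:
    """Hypothetical, simplified model of CPython's list storage (not the real C code)."""

    def __init__(self):
        self.size = 0        # slots in use (like the list's length)
        self.allocated = 0   # slots reserved (like PyListObject.allocated)
        self.slots = []      # stands in for the ob_item pointer array
        self.copies = 0      # total elements copied during resizes

    def _grow(self, min_size):
        # Over-allocate: roughly 1/8 extra plus a small constant.
        new_allocated = min_size + (min_size >> 3) + 6
        new_slots = [None] * new_allocated
        for i in range(self.size):   # copying is the O(n) part of a resize
            new_slots[i] = self.slots[i]
        self.copies += self.size
        self.slots = new_slots
        self.allocated = new_allocated

    def append(self, item):
        if self.size == self.allocated:  # no spare slot left: resize first
            self._grow(self.size + 1)
        self.slots[self.size] = item     # common case: just fill the next free slot
        self.size += 1

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.slots[index]


arr = DynamicArray()
for n in range(1000):
    arr.append(n)
print(arr.size, arr.allocated, arr.copies)
```

Even though each resize copies everything, the total number of copies stays proportional to the number of appends, which is what "O(1) amortized" means.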

Let’s understand the whole process with a diagram.

We have a list x = [1]. In memory, CPython creates the list object and a pointer array whose first slot holds the address of the object 1.

When we append a value to x, CPython grows the pointer array, reserving some spare slots, and stores the address of the new item (`zeel` in the diagram) at the next index.

As we keep appending, the spare slots fill up. Once every reserved slot is in use, the next append triggers another resize that gives the list a new batch of slots.
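You can watch these resizes happen in CPython with `sys.getsizeof`, which for a list counts the header plus one pointer per allocated slot. The exact byte sizes and resize points depend on the CPython version and platform, so this sketch prints them instead of assuming particular values.

```python
import sys

x = [1]
previous = sys.getsizeof(x)   # CPython: header plus one pointer per *allocated* slot
resizes = 0
print(f"len={len(x):>2}  size={previous} bytes")

for value in range(2, 33):
    x.append(value)
    current = sys.getsizeof(x)
    if current != previous:   # the size only changes when the pointer array is resized
        resizes += 1
        print(f"len={len(x):>2}  size={current} bytes  <- resized")
        previous = current

print(f"{len(x) - 1} appends, {resizes} resizes")
```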

This is how CPython lets a list store values of different data types (every slot is just a PyObject * pointer) and grow dynamically (spare capacity plus occasional resizing).

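Appending is O(1) amortized: most appends just fill a spare slot, but each resize may have to move every existing pointer to a new, larger block of memory, and that work grows with the list. Inserting or removing anywhere other than the end is O(n), because the following pointers must shift. So even though lists provide many useful and flexible features, speed can become an issue with very large amounts of data.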
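A quick way to feel the difference between the cheap and the expensive case is to time appending against inserting at the front. The helper names are hypothetical, and the timings depend entirely on your machine and Python version, so no numbers are shown.

```python
import timeit


def fill_by_append(n):
    items = []
    for i in range(n):
        items.append(i)      # amortized O(1): usually fills a spare slot
    return items


def fill_by_insert_front(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # O(n): every existing pointer shifts one slot right
    return items


n = 10_000
print("append:   ", timeit.timeit(lambda: fill_by_append(n), number=1))
print("insert(0):", timeit.timeit(lambda: fill_by_insert_front(n), number=1))
```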

Also, because element types are not fixed, Python has to check types at runtime before it can operate on the elements, for example when comparing them during a sort. So now you know how lists work internally in Python. The implementation of lists varies from interpreter to interpreter 🌚, but the final output is the same everywhere, and the basic trade-offs for working with huge data stay similar.
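The runtime type checks show up as soon as elements have to be compared. Here the list matches what the earlier example ends with after `numbers[3] = "zero"`; sorting it fails only when Python actually tries to compare an int with a str. Using `key=str` is just one possible workaround, shown for illustration.

```python
numbers = [1, 4, 5, "zero", 20, 30]

sort_failed = False
try:
    numbers.sort()
except TypeError as exc:
    # The int/str mismatch is only discovered at runtime, while comparing elements.
    sort_failed = True
    print("sort failed:", exc)

print(sorted(numbers, key=str))  # [1, 20, 30, 4, 5, 'zero']: every element compared as a string
```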

TLDR;

Everything in life comes with pros and cons, and it's up to you whether to accept them. The same goes for Python lists: you have to decide when to use them and when not to.

Bye 👋 …..