For decades, computer advertisements have touted the power of multithreading as the answer to customer demands for more power. The number of cores available to a computer has shot up in the last few years, from four to eight or ten. But what is multithreading, and how can a computer programmer access it? This article will take a hard look at multithreading and examine how it can be used for appreciable performance gains in two of the most popular programming languages available today—Java and Python.
What is multithreading?
Basically, multithreading is an efficient way to run multiple tasks at the same time, often assigned by the same program. These tasks are managed at the smallest level available to the programmer in order to ensure that conflicts between threads are minimal and that system resources are used for maximum benefit. This level of management is referred to as a thread. A thread is an active part of a process, which is an action that a program undertakes. So, in a word processing program, for example, saving your file would be considered a process, while the action of allocating the space to save the file would be considered a thread.
This article deals with user-level threads, which are created by a user or by a user-created program, such as an application. Kernel-level threads (and in Java, Daemon threads) are not covered here. Kernel-level threads are specific to a particular operating system and take a long time for a system to implement, while Daemon threads are automatically created by the Java Virtual Machine in order to serve user-level threads. Neither Daemon threads nor Kernel-level threads are typically created by the average coder, so they are not covered here.
Also, please note that multithreading is not the same term as multitasking. Multitasking happens on a system level and involves memory handling by the OS to ensure that two programs can be working at the same time. Multithreading, on the other hand, occurs at the thread level, which is much smaller, and while it still involves memory handling to an extent, is mostly concerned with assigning threads(and thereby processes) to multiple cores. While multithreading can still be used on a single core machine, the most significant benefits from the technique are gained from using it on multicore computers. If each thread in shared memory is running on a different core on a multicore machine, this technique is known as parallel execution, or more simply as parallelism.
If the programmer is running a multithreaded program on a single core machine, multiple threads are sent to the single processor at once. The processor completes the tasks contained in the threads as quickly as it is able while accepting as many threads as it can. According to oracle.com, this technique is known as concurrent execution, or more simply as concurrency.
Multithreading on a Single-Core Processor – Worth It?
While it seems pointless to create a multithreaded program for a single-core processor, this may not be the case. Modern processors are extremely fast, and often bottlenecks are caused by the speed of input/output buses and devices, not the processor itself. If one thread demands utilization of a relatively slow hard drive, for example, other threads can be completed in a multithreaded program while the “first” thread is still accessing the hard drive. In comparison, if traditional programming techniques are used, the entire program must wait until the thread accessing the hard drive has been completed.
There can be a downside to creating multithreaded programs for a single-core processor. If the multithreaded program has too many threads running for the processor to handle at once, the threads could conceivably slow down the processor, which of course could cause performance issues. Another issue to be aware of when coding a multithreaded program for a single-core machine is that the threading management process could take up too much memory, actually slowing down the entire computer. This risk varies depending on the type of program being coded as multithreaded, but it can be significant. Even so, the advantages of making a multithreaded program generally outweigh the risks.
How Complicated is Multithreading in Python?
There is another factor that keeps all programmers from creating multithreaded programs in any language of their preference. Quite simply, multithreaded programs are difficult to program and are not recommended for beginners. Coding multithreaded programs involves memory management techniques that can be seen as counter-intuitive, depending on the programming language the coder decides to use.
For example, in Python there are two ways to create threads in a program. The first way, the <thread> module, is a bit older and isn’t often used in modern versions of Python. However, it is popular in legacy code and so it should still be studied by a programmer who truly wants to understand multithreading. The <thread> module is treated as a function, and in modern versions of Python is referred to as the <_thread> module.
However, most modern Python coding uses the <threading> module, which looks superficially similar to the <thread> module, but actually encompasses quite a bit more flexibility than the <thread> module does. The <threading> module can use a function-based approach, as the <thread> module does, but can also take an object-oriented approach to multi-threading which is far easier. So already, the beginning Python thread programmer runs into a situation that can be tricky — namely, two different modules with similar names, but approaches that can be different, depending on the programming.
When creating a thread based on an object-oriented approach, the <threading> module uses a completely different syntax than the threading structure does. According to techbeamers.com, the first task which must be completed by the programmer in order to use multithreading in the threading module is to construct a subclass from the <Thread> class. Then, the coder must override the <__init__(self [,args])> method to supply arguments. Finally, the coder must then override the <run(self [,args])> method to code the business logic of the thread.
After the threads have been created, then the programmer must use some type of memory management to ensure that no conflicts between threads are created. Conflicts between threads(in which two threads compete for the same memory space) can lead to deadlocks, a situation in which two or more processes are waiting for the same resources and neither can progress until the other yields. This situation can result in highly diminished performance and can even cause an application to “freeze” or crash, depending on the situation.
Fortunately, the Python <threading> module does have some functionality to prevent this situation from occurring, notably the Lock() method. After using the Lock() method, the programmer can invoke the acquire(blocking) method, which forces threads to run synchronously. Synchronization is the procedure in which the programmer can control program flow and allow the program to have access to shared data. In other words, synchronization allows the programmer to decide which threads have priority, when threads can access memory, and even tell threads to wait until a particular condition has been met. This will obviously help keep conflicts to a minimum.
Unfortunately, the Python <thread> module has no such functionality, although certain programming techniques may assist the veteran coder in working with this method. A tutorial on how exactly to work with multithreading using the Python <thread> module is beyond the scope of this article. As can be seen by the information noted above, coding a multithreaded application in Python should only be attempted by an experienced programmer. But what about other programming languages? Are they any easier to work with?
Multithreading in Java
Let’s use Java as an example. Java is known for its multithreading capabilities and has been around long enough for libraries to be created for nearly every situation a programmer can encounter. According to tutorialspoint.com, to code a Java program that’s capable of running multithreaded processes, the programmer should follow the following steps.
As a first step, the coder needs to implement a run() method provided by a runnable interface. This will provide a starting basis for the program to run correctly. The second step would be to instantiate a Thread object using the following constructor −Thread(Runnable threadObj, String threadName). Finally, once the Thread object is created, the programmer may start it by using the Start() method. This is one technique, as detailed on tutorialspoint.com, which will allow the Java programmer to create multithreaded programs. There are other ways to create threads in Java programming, and some are admittedly easier—for example, the Java programmer may simply create threads by extending a thread class, which enables the programmer to create a number of threads at the same time.
There is an undeniable benefit to learning how to code multithreaded programs. Programmed correctly, multithreaded programs run tasks more quickly and efficiently than single-threaded programs. However, it cannot be credibly denied that creating multithreaded programs is a challenge most suitable for experienced programmers. Although it may be tempting to start coding multithreaded programs as a novice, the sensible advice is to wait before attempting a project that is so complicated. While the performance gains are substantial, the chance of programming frustration and possibly even causing your computer to freeze are high. Coding a multithreaded program is a task best left to experienced programmers.