Concurrency HOWTO

There are many outstanding resources, both online and in print, that do an excellent job of introducing concurrency. This HOWTO builds on them by walking you through how to apply that knowledge using Python.

Python supports the following concurrency models directly:

  • free-threading (stdlib, C-API)

  • isolated threads, AKA CSP/actor model (stdlib*, C-API)

  • coroutines, AKA async/await (language, stdlib, C-API)

  • multi-processing (stdlib)

  • distributed, e.g. SMP (stdlib (limited))

In this document, we’ll look at how to take advantage of Python’s concurrency support. The overall focus is on the following:

Note

Before you reach for concurrency to solve a problem, always make sure it is the right tool for the job. There are many cases where concurrency simply isn’t applicable or will only complicate the solution. An in-depth discussion of this point is outside the scope of this document.

Note

Free-threading is one of the oldest concurrency models, fundamental to operating systems, and widely supported in programming languages. However, it is generally considered perilous and not human-friendly. Other concurrency models have demonstrated better usability and newer programming languages typically avoid exposing threads directly. Take that into consideration before reaching for threads and look at the alternatives first.

Note

Python supports other concurrency models indirectly through community-maintained PyPI packages. One well-known example is dask, which supports “distributed” computing.

Quick reference

Terminology

We’ll be using the following terms and ideas throughout:

task (logical thread)
a cohesive linear sequence of abstract steps in a program;
effectively, a mini-program;
the logical equivalent of executed instructions corresponding to code;
also known as “logical process”
physical thread (OS thread)
where the actual code for a logical thread runs on the CPU (and operating system);
we avoid using plain “thread” for this, to avoid ambiguity
Python thread
the Python runtime running in a physical thread
particularly the portion of the runtime state active in the physical thread
concurrency (multitasking)
a program with multiple logical threads running simultaneously
(not necessarily in parallel)
parallelism (multi-core)

running a program’s multiple logical threads on multiple physical threads (CPU cores)

For convenience, here is a summary of what we’ll cover later.

Concurrency Primitives

primitive        used with        purpose

High-level App Examples

workload (app): grep
    per-request inputs:    N filenames (stdin); file bytes x N (disk)
    per-request outputs:   M matches (stdout)
    N core tasks:          1+ per file
    core task:             time: ~ file size; mem: small

Each has side-by-side implementations for the different models:

workload (app)    side-by-side examples
grep              by concurrency model



Python Concurrency Models

As mentioned, there are essentially five concurrency models that Python supports directly:

free threading (stdlib: threading)
    using multiple physical threads in the same process, with no isolation between them

isolated threads, i.e. multiple interpreters (stdlib: interpreters)
    threads, often physical, with strict isolation between them (e.g. CSP and actor model)

coroutines, AKA async/await (stdlib: asyncio)
    switching between logical threads is explicitly controlled by each one

multi-processing (stdlib: multiprocessing)
    using multiple isolated processes

distributed (stdlib: limited)
    multiprocessing across multiple computers

There are tradeoffs to each, whether in performance or complexity. We’ll take a look at those tradeoffs in detail later.

Before that, we’ll review various comparisons of the concurrency models, and we’ll briefly talk about critical caveats for specific models.

Comparison tables

The following tables provide a detailed look with side-by-side comparisons.

key characteristics

                        scale          multi-core   races     overhead
free-threading          small-medium   yes*         yes       very low
multiple interpreters   small-medium   yes          limited   low+
coroutines              small-medium   no           no        low
multi-processing        small          yes          limited   medium
distributed             large          yes          limited   medium

overhead details

                        memory     startup    cross-task    management   system
free threading          very low   very low   none          very low     none
multiple interpreters   low*       medium*    low           very low     none
coroutines              low        low        none          low          none
multi-processing        medium     medium     medium        medium       low
distributed             medium+    medium+    medium-high   medium       low-medium

complexity

                        parallel   shared mem   shared I/O   shared env   cross thread   sync                 tracking   compat   extra LOC
free-threading          yes*       all          all          yes          high           explicit                        yes      low?
multiple interpreters   yes        limited      all          yes          low            implicit             ???        yes      low?
coroutines              no         all          all          yes          low-med?       implicit             ???        no       low-med
multi-processing        yes        limited      no           no?          low            implicit +optional   ???        yes      low-med?
distributed             yes        limited      no           no?          low            implicit +optional   ???        yes      medium?

exposure

                          academic research   academic curriculum   industry      examples       Python history
free-threading            very high           high                  high          high           0.9?
isolated threads          high                low?                  low-medium?   low-medium?    2.2
  (multiple interpreters)
coroutines                medium-high?        medium?               medium?       medium-high?   3.3-3.5 (2.2)
multi-processing          ???                 low?                  low-medium?   low?           2.6
distributed               medium-high?        low?                  medium?       medium?        n/a

Critical caveats

Here are some important details to consider, specific to individual concurrency models in Python.

Data races and non-deterministic scheduling (free-threading)

The principal caveat for physical threads is that each thread shares the full memory of the process with all its other threads. Combined with their non-deterministic scheduling (and parallel execution), threads expose programs to a significant risk of races.

The potential consequences of a race are data corruption and invalidated expectations of data consistency. In each case, the non-deterministic scheduling of threads means it is both hard to reproduce races and to track down where a race happened. These qualities make these bugs especially frustrating and worth diligently avoiding.

Python threads are light wrappers around physical threads and thus have the same caveats. The majority of data in a Python program is mutable, and all of the program’s data is subject to modification by any thread at any moment. This requires extra effort to synchronize reads and writes. Furthermore, given the maximally-broad scope of the data involved, it’s difficult to be sure all possible races have been dealt with, especially as a code base changes over time.
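
To make the problem concrete, here is a minimal sketch (the names are illustrative) of the classic data race, several threads updating one shared counter, along with the usual fix of guarding the update with a lock:

import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # Racy: the read-modify-write of "counter" can interleave with
    # other threads, so some updates may be lost.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    # Holding the lock makes each read-modify-write atomic with
    # respect to the other threads.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000 here; with unsafe_increment() it may be less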

The other concurrency models essentially don’t have this problem. In the case of coroutines, explicit cooperative scheduling eliminates the risk of a simultaneous read-write or write-write. It also means program logic can rely on memory consistency between synchronization points (await).

With the remaining concurrency models, data is never shared between logical threads unless done explicitly (typically at the existing inherent points of synchronization). By default that shared data is either read-only or managed in a thread-safe way. Most notably, the opt-in sharing means the set of shared data to manage is explicitly defined (and often small) instead of covering all memory in the process.

The Global Interpreter Lock (GIL)

While physical threads are the direct route to multi-core parallelism, Python’s threads have always had an extra wrinkle that gets in the way: the global interpreter lock (GIL).

The GIL is a very efficient tool for keeping the Python implementation simple, which is an important constraint for the project. In fact, it protects Python’s maintainers and users from a large category of concurrency problems that one must normally face when threads are involved.

The big tradeoff is that the bytecode interpreter, which executes your Python code, only runs while holding the GIL. That means only one thread can be running Python code at a time. Threads will take short turns, so none have to wait too long, but it still prevents any actual parallelism of CPU-bound code.

That said, the Python runtime (and extension modules) can release the GIL when the thread is doing slow or long-running work unrelated to Python, like a blocking IO operation.
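
As a rough illustration of both points, here is a minimal sketch (the workloads are stand-ins) showing why threads still help with blocking I/O under the GIL, while pure Python CPU work gains nothing:

import threading
import time

def cpu_bound(n):
    # Pure Python work: the GIL is held the whole time, so running
    # this in several threads does not reduce the total time.
    return sum(i * i for i in range(n))

def io_bound(seconds):
    # time.sleep() stands in for a blocking I/O call; the GIL is
    # released while waiting, so the waits overlap across threads.
    time.sleep(seconds)

threads = [threading.Thread(target=io_bound, args=(1.0,)) for _ in range(4)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
# Roughly 1 second rather than 4: the blocking waits overlapped.
print(f'4 overlapping waits took {time.monotonic() - start:.1f}s')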

There is also an ongoing effort to eliminate the GIL: PEP 703. Any attempt to remove the GIL necessarily involves some slowdown to single-threaded performance and extra maintenance burden to the Python project and extension module maintainers. However, there is sufficient interest in unlocking full multi-core parallelism to justify the current experiment.

You can also move from free-threading to isolated threads by using multiple interpreters. Each interpreter has its own GIL. Thus, if you want multi-core parallelism, run a different interpreter in each thread. Their isolation means that each can run unblocked in that thread.

Thread isolation and multiple interpreters

As just noted, races effectively stop being a problem if the memory used by each physical thread is isolated from the others. That isolation can also help with the other caveats related to physical threads. In Python you can get this isolation by using multiple interpreters.

In this context, an “interpreter” represents nearly all the capability and state of the Python runtime, for its C-API and to execute Python code. The full runtime supports multiple interpreters and includes some state that all interpreters share. Most importantly, the state of each interpreter is effectively isolated from the others.

That isolation includes things like sys.modules. By default, interpreters mostly don’t share any data (including objects) at all. Anything that gets shared is done on a strictly opt-in basis. That means programmers wouldn’t need to worry about possible races with any data in the program. They would only need to worry about data that was explicitly shared.

Interpreters themselves are not specific to any thread, but instead each physical thread has (at most) one interpreter active at any given moment. Each interpreter can be associated in this way with any number of threads. Since each interpreter is isolated from the others, any thread using one interpreter is thus isolated from threads using any other interpreter.

Using multiple interpreters is fairly straightforward, as sketched below:

  1. create a new interpreter

  2. switch the current thread to use that interpreter

  3. call exec(), but targeting the new interpreter

  4. switch back

Note that no threads were involved in those steps; running code in an interpreter happens relative to the current thread, and new threads aren’t implicitly started.
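
Here is a minimal sketch of that sequence, assuming a PEP 734-style interpreters module (the exact import path depends on your Python version; see the stdlib note below):

# Assumes a PEP 734-style module is importable, e.g. the one shipped
# in CPython's sources as test.support.interpreters.
import test.support.interpreters as interpreters

# 1. create a new interpreter
interp = interpreters.create()

# 2-4. run code in that interpreter; the switch to the new
# interpreter and back happens inside exec(), in the current
# thread -- no new OS thread is started.
interp.exec("print('hello from an isolated interpreter')")

If you do want multi-core parallelism, run that same exec() call in its own threading.Thread, one interpreter per thread.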

Multi-processing and distributed computing provide similar isolation, though with some tradeoffs.

A stdlib module for using multiple interpreters

While use of multiple interpreters has been part of Python’s C-API for decades, the feature hasn’t been exposed to Python code through the stdlib. PEP 734 proposes changing that by adding a new interpreters module.

In the meantime, an implementation of that PEP is available for Python 3.13+ on PyPI: interpreters-pep-734.

Improving performance for multiple interpreters

The long-running effort to improve Python’s implementation of multiple interpreters has focused on isolation and stability; very little has been done to improve performance. This has the most impact on:

  • how much memory each interpreter uses (i.e. how many can run at the same time)

  • how long it takes to create a new interpreter

It also impacts how efficiently data/objects can be passed between interpreters, and how effectively objects can be shared.

As the work on isolation wraps up, improvements will shift to focus on performance and memory usage. Thus, the overhead of using multiple interpreters will drastically decrease over time.

Shared resources

Aside from memory, all physical threads in a process share the following resources:

  • command line arguments (“argv”)

  • env vars

  • current working directory

  • signals, IPC, etc.

  • open I/O resources (file descriptors, sockets, etc.)

When relevant, these must be managed in a thread-safe way.
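
For example, the working directory belongs to the whole process, so one thread changing it affects every other thread. A minimal sketch of serializing such a change with a lock (where it can’t simply be avoided):

import os
import threading

# The current working directory is process-wide state, so serialize
# any temporary changes to it.
_cwd_lock = threading.Lock()

def in_directory(dirname, func, *args):
    with _cwd_lock:
        old = os.getcwd()
        os.chdir(dirname)
        try:
            return func(*args)
        finally:
            os.chdir(old)

Where possible, passing absolute paths around is simpler than changing the directory at all.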

Tracing execution

TBD

Coroutines are contagious

Coroutines can be an effective mechanism for letting a program’s non-blocking code run while simultaneously waiting for blocking code to finish. The tricky part is that the underlying machinery (the event loop) relies on each coroutine explicitly yielding control at the appropriate moments.

Normal functions do not follow this pattern, so they cannot take advantage of that cooperative scheduling to avoid blocking the program. Thus, coroutines and non-coroutines don’t mix well. While there are tools for wrapping a normal function so it acts like a coroutine, such functions often end up converted into coroutines instead. At that point, if any non-async code relies on the function, then either you’ll need to convert that other code into a coroutine too, or you’ll need to keep the original non-async implementation around alongside the new, almost identical async one.

You can see how that can proliferate, leading to possible extra maintenance/development costs.
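
For instance, once one function becomes a coroutine, everything that calls it has to change as well; wrapping the original blocking function with asyncio.to_thread() is one way to bridge the two worlds without a full rewrite. A small sketch (the functions are illustrative):

import asyncio
import time

def fetch_blocking(url):
    # An existing, ordinary (blocking) function.
    time.sleep(1)  # stands in for slow I/O
    return f'data from {url}'

async def fetch(url):
    # The async wrapper; every caller must now be async too.
    return await asyncio.to_thread(fetch_blocking, url)

async def main():
    # main() had to become a coroutine just to await fetch().
    results = await asyncio.gather(fetch('a'), fetch('b'))
    print(results)

asyncio.run(main())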

Processes consume extra resources

When using multi-processing for concurrency, keep in mind that the operating system will assign a certain set of limited resources to each process. For example, each process has its own PID and handle to the executable. You can run only so many processes before you run out of these resources. Concurrency in a single process doesn’t have this problem, and a distributed program can work around it.

Using multiprocessing for distributed computing

Not only does the multiprocessing module support concurrency with multiple local processes, it can also support a distributed model using remote computers. That said, consider first looking into tools that have been designed specifically for distributed computing, like dask.
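
The relevant piece is multiprocessing.managers: a manager can listen on a network address and serve proxy objects to clients on other machines. A bare-bones sketch (the host, port, and authkey are placeholders):

# server.py -- shares a queue with remote clients
import queue
from multiprocessing.managers import BaseManager

tasks = queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register('get_tasks', callable=lambda: tasks)
manager = QueueManager(address=('', 50000), authkey=b'change-me')
manager.get_server().serve_forever()

# client.py -- runs on another machine
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

QueueManager.register('get_tasks')
manager = QueueManager(address=('server.example.com', 50000),
                       authkey=b'change-me')
manager.connect()
manager.get_tasks().put('a work item')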

Resilience to crashes

A process can crash if it does something it shouldn’t, like try to access memory outside what the OS has provided it. If your program is running in multiple processes (incl. distributed) then you can more easily recover from a crash in any one process. Recovering from a crash when using free-threading, multiple interpreters, or coroutines isn’t nearly so easy.
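
A minimal sketch of that kind of recovery with a child process (the crash is simulated here with os.abort()):

import multiprocessing as mp
import os

def risky_task():
    # Simulate a hard crash, e.g. a segfault in an extension module.
    os.abort()

if __name__ == '__main__':
    proc = mp.Process(target=risky_task)
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        # Only the worker died; the parent survives and can log,
        # retry, or move on.
        print('worker crashed with exit code', proc.exitcode)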

High-level APIs

Also note that Python’s stdlib provides various higher-level APIs that support these concurrency models in various contexts:

                        concurrent.futures   socketserver   http.server
free-threading          yes                  yes            yes
multiple interpreters   (pending)
coroutines              ???
multi-processing        yes
distributed             ???
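
As a quick taste of that layer, the same submit-and-collect pattern in concurrent.futures works for threads and processes (and, eventually, the pending interpreter support) just by swapping the executor class. A small sketch:

from concurrent.futures import ThreadPoolExecutor  # or ProcessPoolExecutor

def work(n):
    return n * n

if __name__ == '__main__':
    # Swapping in ProcessPoolExecutor changes the concurrency model
    # without changing the surrounding code.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for result in pool.map(work, range(10)):
            print(result)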



Designing A Program For Concurrency

Whether you are starting a new project using concurrency or refactoring an existing one to use it, it’s important to design for concurrency before taking one more step. Doing so will save you a lot of headache later.

  1. decide if your program might benefit from concurrency

  2. break down your *logical* program into distinct tasks

  3. determine which tasks could run at the same time

  4. identify the other concurrency-related characteristics of your program

  5. decide which concurrency model fits best

  6. go for it!

At each step you should be continuously asking yourself if concurrency is still a good fit for your program.

Some problems are obviously not solvable with concurrency. For the rest, even if you could use concurrency, it might not provide much value. Furthermore, even if it seems like it would provide meaningful value, the additional costs in performance, complexity, or maintainability might outweigh that benefit.

Thus, when you’re thinking of solving a problem using concurrency, it’s crucial that you understand the problem well.

Getting started

How can concurrency help?

TBD

How can concurrency hurt?

TBD

Analyze your problem

Identifying the logical tasks in your program

TBD

The concurrency characteristics of your program

TBD

Other considerations

TBD

Pick a concurrency model

TBD



Python Concurrency Primitives

TBD

Group A

primitive 1

TBD

Group B

primitive 1

TBD



Python Concurrency Workload Examples

Below, we have a series of examples of how to implement the most common Python workloads that take advantage of concurrency. For each workload, you will find an implementation for each of the concurrency models.

The implementations are meant to accurately demonstrate how best to solve the problem using the given concurrency model. The examples for the workload are presented side-by-side, for easier comparison. The examples for threads, multiprocessing, and multiple interpreters will use concurrent.futures when that is the better approach. Performance comparisons are not included here.

Here’s a summary of the examples, by workload:

workload: grep
    req in:         N filenames (stdin); file bytes x N (disk)
    req out:        M matches (stdout)
    N core tasks:   1+ per file
    core task:      time: ~ file size; mem: small


Note

Each example is implemented as a basic command line tool, but can be easily adapted to run as a web service.

Workload: grep

This is a basic Python implementation of the Unix grep tool. We read from one or more files and report the lines that match (or don’t match) the given regular expression.

This represents a workload involving a mix of moderate IO and CPU work.

For full example code see the side-by-side implementations below.

Design and analysis

Design steps from above:

  1. concurrency fits?

    Yes! There is potentially a bunch of work happening at the same time, and we want results as fast as possible.

  2. identify logical tasks

    At a high level, the application works like this:

    1. handle args (including compile regex)

    2. if recursive, walk tree to find filenames

    3. for each file, yield each match

    4. print each match

    5. exit with 0 if matched and 1 otherwise

    At step 3 we do the following for each file:

    a. open the file
    b. iterate over the lines
    c. apply the regex to each line
    d. yield each match
    e. close the file

  3. select concurrent tasks

    Concurrent work happens at step 3. Sub-steps a, b, and e are IO-intensive. Sub-step c is CPU-intensive. The simplest approach would be one concurrent worker per file. Relative to a strictly sequential approach, there’s extra complexity here in managing the workers, fanning out the work to them, and merging the results back into a single iterator.

    If we were worried about any particularly large file or sufficiently large regular expression, we could take things further. That would involve splitting up step 3 even further by breaking the file into chunks that are divided up among multiple workers. However, doing so would introduce extra complexity that might not pay for itself.

  4. concurrency-related characteristics

    TBD

  5. pick best model

    TBD

Here are additional key constraints and considerations:

  • there’s usually a limit to how many files can be open concurrently, so we’ll have to be careful not to process too many at once

  • the order of the yielded/printed matches must match the order of the requested files and the order of each file’s lines

High-level code

With the initial design and analysis done, let’s move on to code. We’ll start with the high-level code corresponding to the application’s five top-level tasks we identified earlier.

Most of the high-level code has nothing to do with concurrency. The part that does is the call to search().

 1def main(regex=regex, filenames=filenames):
 2    # step 1
 3    regex = re.compile(regex)
 4    # step 2
 5    filenames = resolve_filenames(filenames, recursive)
 6    # step 3
 7    matches = search(filenames, regex, opts)
 8
 9    # step 4
10
11    if hasattr(type(matches), '__aiter__'):
12        async def iter_and_show(matches=matches):
13            matches = type(matches).__aiter__(matches)
14
15            # Handle the first match.
16            async for filename, line in matches:
17                if opts.quiet:
18                    return 0
19                elif opts.filesonly:
20                    print(filename)
21                else:
22                    async for second in matches:
23                        print(f'{filename}: {line}')
24                        filename, line = second
25                        print(f'{filename}: {line}')
26                        break
27                    else:
28                        print(line)
29                break
30            else:
31                return 1
32
33            # Handle the remaining matches.
34            if opts.filesonly:
35                async for filename, _ in matches:
36                    print(filename)
37            else:
38                async for filename, line in matches:
39                    print(f'{filename}: {line}')
40
41            return 0
42        return asyncio.run(iter_and_show())
43    else:
44        matches = iter(matches)
45
46        # Handle the first match.
47        for filename, line in matches:
48            if opts.quiet:
49                return 0
50            elif opts.filesonly:
51                print(filename)
52            else:
53                for second in matches:
54                    print(f'{filename}: {line}')
55                    filename, line = second
56                    print(f'{filename}: {line}')
57                    break
58                else:
59                    print(line)
60            break
61        else:
62            return 1
63
64        # Handle the remaining matches.
65        if opts.filesonly:
66            for filename, _ in matches:
67                print(filename)
68        else:
69            for filename, line in matches:
70                print(f'{filename}: {line}')
71
72        return 0
73rc = main()
74
75# step 5
76sys.exit(rc)

The search() function that gets called returns an iterator (or async iterator) that yields the matches, which then get printed. In the code above, note each line that consumes that iterator (the loops over matches).

Here’s the search function for a non-concurrent implementation:

1def search_sequential(filenames, regex, opts):
2    for filename in filenames:
3        lines = iter_lines(filename)
4        yield from search_lines(lines, regex, opts, filename)

iter_lines() is a straightforward helper that opens the file and yields each line.
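
For reference, here is that helper as it appears in the full examples below (reading from stdin when the filename is '-'):

def iter_lines(filename):
    if filename == '-':
        yield from sys.stdin
    else:
        with open(filename) as infile:
            yield from infile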

search_lines() is a sequential-search helper used by all the example implementations here:

 1def search_lines(lines, regex, opts, filename):
 2    try:
 3        if opts.filesonly:
 4            if opts.invert:
 5                for line in lines:
 6                    m = regex.search(line)
 7                    if m:
 8                        break
 9                else:
10                    yield (filename, None)
11            else:
12                for line in lines:
13                    m = regex.search(line)
14                    if m:
15                        yield (filename, None)
16                        break
17        else:
18            assert not opts.invert, opts
19            for line in lines:
20                m = regex.search(line)
21                if not m:
22                    continue
23                if line.endswith(os.linesep):
24                    line = line[:-len(os.linesep)]
25                yield (filename, line)
26    except UnicodeDecodeError:
27        # It must be a binary file.
28        return

Concurrent Code

Now let’s look at how concurrency actually fits in. We’ll start with an example using threads. However, the pattern is essentially the same for all the concurrency models.

 1def search_using_threads(filenames, regex, opts):
 2    matches_by_file = queue.Queue()
 3
 4    def do_background():
 5        MAX_FILES = 10
 6        MAX_MATCHES = 100
 7
 8        # Make sure we don't have too many threads at once,
 9        # i.e. too many files open at once.
10        counter = threading.Semaphore(MAX_FILES)
11
12        def search_file(filename, matches):
13            lines = iter_lines(filename)
14            for match in search_lines(lines, regex, opts, filename):
15                matches.put(match)  # blocking
16            matches.put(None)  # blocking
17            # Let a new thread start.
18            counter.release()
19
20        for filename in filenames:
21            # Prepare for the file.
22            matches = queue.Queue(MAX_MATCHES)
23            matches_by_file.put(matches)
24
25            # Start a thread to process the file.
26            t = threading.Thread(target=search_file, args=(filename, matches))
27            counter.acquire()
28            t.start()
29        matches_by_file.put(None)
30
31    background = threading.Thread(target=do_background)
32    background.start()
33
34    # Yield the results as they are received, in order.
35    matches = matches_by_file.get()  # blocking
36    while matches is not None:
37        match = matches.get()  # blocking
38        while match is not None:
39            yield match
40            match = matches.get()  # blocking
41        matches = matches_by_file.get()  # blocking
42
43    background.join()

We loop over the filenames and start a thread for each one. Each thread sends the matches it finds back on a queue.

We want to start yielding matches as soon as possible, so we also use a background thread to run the code that loops over the filenames.

We use a queue of queues (matches_by_file) to make sure we get results back in the right order, regardless of when the worker threads provide them.

The operating system will only let us have so many files open at once, so we limit how many workers are running at a time (MAX_FILES).

If the workers find matches substantially faster than we can consume them, we may end up using more memory than we need to. To avoid any backlog, we limit how many matches can be queued up for any given file (MAX_MATCHES).

One notable point is that the actual files are not opened until we need to iterate over the lines. This is mostly so we can avoid passing an open file to a concurrency worker; instead we pass the filename, which is much simpler.

Finally, we have to manage the workers manually. If we used concurrent.futures, it would take care of that for us.
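
For a rough idea of what that looks like, here is a hypothetical sketch of the same fan-out using concurrent.futures.ThreadPoolExecutor. Note that, unlike the queue-based version above, it buffers each file’s matches in a list before yielding them:

from concurrent.futures import ThreadPoolExecutor

def search_using_futures(filenames, regex, opts):
    MAX_FILES = 10

    def search_file(filename):
        # Each worker opens its own file (via iter_lines()) and
        # collects that file's matches.
        lines = iter_lines(filename)
        return list(search_lines(lines, regex, opts, filename))

    # The pool caps how many files are processed (and open) at once,
    # and map() yields results in submission order, which preserves
    # the required file ordering.
    with ThreadPoolExecutor(max_workers=MAX_FILES) as pool:
        for file_matches in pool.map(search_file, filenames):
            yield from file_matches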

Here are some things we don’t do but might be worth doing:

  • stop iteration when requested (or for ctrl-C)

  • split up each file between multiple workers

Recall that the search() function returns an iterator that yields all the matches. Concurrency may be happening as long as that iterator hasn’t been exhausted. That means it is happening more or less the entire time we loop over the matches to print them in main() (in the high-level code above).

Side-by-side

Here are the implementations for the different concurrency models, side-by-side for easy comparison: sequential, threads, multiple interpreters, coroutines, multiple processes, and concurrent.futures.

sequential:
  1import os
  2import os.path
  3import re
  4import sys
  5
  6
  7def search(filenames, regex, opts):
  8    for filename in filenames:
  9        # iter_lines() opens the file too.
 10        lines = iter_lines(filename)
 11        yield from search_lines(
 12                            lines, regex, opts, filename)
 13
 14
 15def iter_lines(filename):
 16    if filename == '-':
 17        yield from sys.stdin
 18    else:
 19        with open(filename) as infile:
 20            yield from infile
 21
 22
 23def search_lines(lines, regex, opts, filename):
 24    try:
 25        if opts.filesonly:
 26            if opts.invert:
 27                for line in lines:
 28                    m = regex.search(line)
 29                    if m:
 30                        break
 31                else:
 32                    yield (filename, None)
 33            else:
 34                for line in lines:
 35                    m = regex.search(line)
 36                    if m:
 37                        yield (filename, None)
 38                        break
 39        else:
 40            assert not opts.invert, opts
 41            for line in lines:
 42                m = regex.search(line)
 43                if not m:
 44                    continue
 45                if line.endswith(os.linesep):
 46                    line = line[:-len(os.linesep)]
 47                yield (filename, line)
 48    except UnicodeDecodeError:
 49        # It must be a binary file.
 50        return
 51
 52
 53def resolve_filenames(filenames, recursive=False):
 54    for filename in filenames:
 55        assert isinstance(filename, str), repr(filename)
 56        if filename == '-':
 57            yield '-'
 58        elif not os.path.isdir(filename):
 59            yield filename
 60        elif recursive:
 61            for d, _, files in os.walk(filename):
 62                for base in files:
 63                    yield os.path.join(d, base)
 64
 65
 66if __name__ == '__main__':
 67    # Parse the args.
 68    import argparse
 69    ap = argparse.ArgumentParser(prog='grep')
 70
 71    ap.add_argument('-r', '--recursive',
 72                    action='store_true')
 73    ap.add_argument('-L', '--files-without-match',
 74                    dest='filesonly',
 75                    action='store_const', const='invert')
 76    ap.add_argument('-l', '--files-with-matches',
 77                    dest='filesonly',
 78                    action='store_const', const='match')
 79    ap.add_argument('-q', '--quiet', action='store_true')
 80    ap.set_defaults(invert=False)
 81
 82    reopts = ap.add_mutually_exclusive_group(required=True)
 83    reopts.add_argument('-e', '--regexp', dest='regex',
 84                        metavar='REGEX')
 85    reopts.add_argument('regex', nargs='?',
 86                        metavar='REGEX')
 87
 88    ap.add_argument('files', nargs='+', metavar='FILE')
 89
 90    opts = ap.parse_args()
 91    ns = vars(opts)
 92
 93    regex = ns.pop('regex')
 94    filenames = ns.pop('files')
 95    recursive = ns.pop('recursive')
 96    if opts.filesonly:
 97        if opts.filesonly == 'invert':
 98            opts.invert = True
 99        else:
100            assert opts.filesonly == 'match', opts
101            opts.invert = False
102    opts.filesonly = bool(opts.filesonly)
103
104    def main(regex=regex, filenames=filenames):
105        # step 1
106        regex = re.compile(regex)
107        # step 2
108        filenames = resolve_filenames(filenames, recursive)
109        # step 3
110        matches = search(filenames, regex, opts)
111        matches = iter(matches)
112
113        # step 4
114
115        # Handle the first match.
116        for filename, line in matches:
117            if opts.quiet:
118                return 0
119            elif opts.filesonly:
120                print(filename)
121            else:
122                for second in matches:
123                    print(f'{filename}: {line}')
124                    filename, line = second
125                    print(f'{filename}: {line}')
126                    break
127                else:
128                    print(line)
129            break
130        else:
131            return 1
132
133        # Handle the remaining matches.
134        if opts.filesonly:
135            for filename, _ in matches:
136                print(filename)
137        else:
138            for filename, line in matches:
139                print(f'{filename}: {line}')
140
141        return 0
142    rc = main()
143
144    # step 5
145    sys.exit(rc)
threads:
  1import os
  2import os.path
  3import re
  4import sys
  5
  6import queue
  7import threading
  8
  9
 10def search(filenames, regex, opts):
 11    matches_by_file = queue.Queue()
 12
 13    def do_background():
 14        MAX_FILES = 10
 15        MAX_MATCHES = 100
 16
 17        # Make sure we don't have too many threads at once,
 18        # i.e. too many files open at once.
 19        counter = threading.Semaphore(MAX_FILES)
 20
 21        def search_file(filename, matches):
 22            lines = iter_lines(filename)
 23            for match in search_lines(
 24                            lines, regex, opts, filename):
 25                matches.put(match)  # blocking
 26            matches.put(None)  # blocking
 27            # Let a new thread start.
 28            counter.release()
 29
 30        for filename in filenames:
 31            # Prepare for the file.
 32            matches = queue.Queue(MAX_MATCHES)
 33            matches_by_file.put(matches)
 34
 35            # Start a thread to process the file.
 36            t = threading.Thread(target=search_file,
 37                                 args=(filename, matches))
 38            counter.acquire()
 39            t.start()
 40        matches_by_file.put(None)
 41
 42    background = threading.Thread(target=do_background)
 43    background.start()
 44
 45    # Yield the results as they are received, in order.
 46    matches = matches_by_file.get()  # blocking
 47    while matches is not None:
 48        match = matches.get()  # blocking
 49        while match is not None:
 50            yield match
 51            match = matches.get()  # blocking
 52        matches = matches_by_file.get()  # blocking
 53
 54    background.join()
 55
 56
 57def iter_lines(filename):
 58    if filename == '-':
 59        yield from sys.stdin
 60    else:
 61        with open(filename) as infile:
 62            yield from infile
 63
 64
 65def search_lines(lines, regex, opts, filename):
 66    try:
 67        if opts.filesonly:
 68            if opts.invert:
 69                for line in lines:
 70                    m = regex.search(line)
 71                    if m:
 72                        break
 73                else:
 74                    yield (filename, None)
 75            else:
 76                for line in lines:
 77                    m = regex.search(line)
 78                    if m:
 79                        yield (filename, None)
 80                        break
 81        else:
 82            assert not opts.invert, opts
 83            for line in lines:
 84                m = regex.search(line)
 85                if not m:
 86                    continue
 87                if line.endswith(os.linesep):
 88                    line = line[:-len(os.linesep)]
 89                yield (filename, line)
 90    except UnicodeDecodeError:
 91        # It must be a binary file.
 92        return
 93
 94
 95def resolve_filenames(filenames, recursive=False):
 96    for filename in filenames:
 97        assert isinstance(filename, str), repr(filename)
 98        if filename == '-':
 99            yield '-'
100        elif not os.path.isdir(filename):
101            yield filename
102        elif recursive:
103            for d, _, files in os.walk(filename):
104                for base in files:
105                    yield os.path.join(d, base)
106
107
108if __name__ == '__main__':
109    # Parse the args.
110    import argparse
111    ap = argparse.ArgumentParser(prog='grep')
112
113    ap.add_argument('-r', '--recursive',
114                    action='store_true')
115    ap.add_argument('-L', '--files-without-match',
116                    dest='filesonly',
117                    action='store_const', const='invert')
118    ap.add_argument('-l', '--files-with-matches',
119                    dest='filesonly',
120                    action='store_const', const='match')
121    ap.add_argument('-q', '--quiet', action='store_true')
122    ap.set_defaults(invert=False)
123
124    reopts = ap.add_mutually_exclusive_group(required=True)
125    reopts.add_argument('-e', '--regexp', dest='regex',
126                        metavar='REGEX')
127    reopts.add_argument('regex', nargs='?',
128                        metavar='REGEX')
129
130    ap.add_argument('files', nargs='+', metavar='FILE')
131
132    opts = ap.parse_args()
133    ns = vars(opts)
134
135    regex = ns.pop('regex')
136    filenames = ns.pop('files')
137    recursive = ns.pop('recursive')
138    if opts.filesonly:
139        if opts.filesonly == 'invert':
140            opts.invert = True
141        else:
142            assert opts.filesonly == 'match', opts
143            opts.invert = False
144    opts.filesonly = bool(opts.filesonly)
145
146    def main(regex=regex, filenames=filenames):
147        # step 1
148        regex = re.compile(regex)
149        # step 2
150        filenames = resolve_filenames(filenames, recursive)
151        # step 3
152        matches = search(filenames, regex, opts)
153        matches = iter(matches)
154
155        # step 4
156
157        # Handle the first match.
158        for filename, line in matches:
159            if opts.quiet:
160                return 0
161            elif opts.filesonly:
162                print(filename)
163            else:
164                for second in matches:
165                    print(f'{filename}: {line}')
166                    filename, line = second
167                    print(f'{filename}: {line}')
168                    break
169                else:
170                    print(line)
171            break
172        else:
173            return 1
174
175        # Handle the remaining matches.
176        if opts.filesonly:
177            for filename, _ in matches:
178                print(filename)
179        else:
180            for filename, line in matches:
181                print(f'{filename}: {line}')
182
183        return 0
184    rc = main()
185
186    # step 5
187    sys.exit(rc)
multiple interpreters:
  1import os
  2import os.path
  3import re
  4import sys
  5
  6import test.support.interpreters as interpreters
  7import test.support.interpreters.queues as interp_queues
  8import types
  9import queue
 10import threading
 11
 12
 13def search(filenames, regex, opts):
 14    matches_by_file = queue.Queue()
 15
 16    def do_background():
 17        MAX_FILES = 10
 18        MAX_MATCHES = 100
 19        new_queue = interpreters.queues.create
 20
 21        def new_interpreter():
 22            interp = interpreters.create()
 23            interp.exec(f"""if True:
 24                with open({__file__!r}) as infile:
 25                    text = infile.read()
 26                ns = dict()
 27                exec(text, ns, ns)
 28                prep_interpreter = ns['prep_interpreter']
 29                del ns, text
 30
 31                search_file = prep_interpreter(
 32                    {regex.pattern!r},
 33                    {regex.flags},
 34                    {tuple(vars(opts).items())},
 35                )
 36                """)
 37            return interp
 38
 39        ready_workers = queue.Queue(MAX_FILES)
 40        workers = []
 41
 42        def next_worker():
 43            if len(workers) < MAX_FILES:
 44                interp = new_interpreter()
 45                workers.append(interp)
 46                ready_workers.put(interp)
 47            return ready_workers.get()  # blocking
 48
 49        def do_work(filename, matches, interp):
 50            interp.prepare_main(matches=matches)
 51            interp.exec(
 52                    f'search_file({filename!r}, matches)')
 53            # Let a new thread start.
 54            ready_workers.put(interp)
 55
 56        for filename in filenames:
 57            # Prepare for the file.
 58            matches = interp_queues.create(MAX_MATCHES)
 59            matches_by_file.put(matches)
 60            interp = next_worker()
 61
 62            # Start a thread to process the file.
 63            t = threading.Thread(
 64                target=do_work,
 65                args=(filename, matches, interp),
 66            )
 67            t.start()
 68        matches_by_file.put(None)
 69
 70    background = threading.Thread(target=do_background)
 71    background.start()
 72
 73    # Yield the results as they are received, in order.
 74    matches = matches_by_file.get()  # blocking
 75    while matches is not None:
 76        match = matches.get()  # blocking
 77        while match is not None:
 78            yield match
 79            match = matches.get()  # blocking
 80        matches = matches_by_file.get()  # blocking
 81
 82    background.join()
 83
 84
 85def prep_interpreter(regex_pat, regex_flags, opts):
 86    regex = re.compile(regex_pat, regex_flags)
 87    opts = types.SimpleNamespace(**dict(opts))
 88
 89    def search_file(filename, matches):
 90        lines = iter_lines(filename)
 91        for match in search_lines(
 92                            lines, regex, opts, filename):
 93            matches.put(match)  # blocking
 94        matches.put(None)  # blocking
 95    return search_file
 96
 97
 98def iter_lines(filename):
 99    if filename == '-':
100        yield from sys.stdin
101    else:
102        with open(filename) as infile:
103            yield from infile
104
105
106def search_lines(lines, regex, opts, filename):
107    try:
108        if opts.filesonly:
109            if opts.invert:
110                for line in lines:
111                    m = regex.search(line)
112                    if m:
113                        break
114                else:
115                    yield (filename, None)
116            else:
117                for line in lines:
118                    m = regex.search(line)
119                    if m:
120                        yield (filename, None)
121                        break
122        else:
123            assert not opts.invert, opts
124            for line in lines:
125                m = regex.search(line)
126                if not m:
127                    continue
128                if line.endswith(os.linesep):
129                    line = line[:-len(os.linesep)]
130                yield (filename, line)
131    except UnicodeDecodeError:
132        # It must be a binary file.
133        return
134
135
136def resolve_filenames(filenames, recursive=False):
137    for filename in filenames:
138        assert isinstance(filename, str), repr(filename)
139        if filename == '-':
140            yield '-'
141        elif not os.path.isdir(filename):
142            yield filename
143        elif recursive:
144            for d, _, files in os.walk(filename):
145                for base in files:
146                    yield os.path.join(d, base)
147
148
149if __name__ == '__main__':
150    # Parse the args.
151    import argparse
152    ap = argparse.ArgumentParser(prog='grep')
153
154    ap.add_argument('-r', '--recursive',
155                    action='store_true')
156    ap.add_argument('-L', '--files-without-match',
157                    dest='filesonly',
158                    action='store_const', const='invert')
159    ap.add_argument('-l', '--files-with-matches',
160                    dest='filesonly',
161                    action='store_const', const='match')
162    ap.add_argument('-q', '--quiet', action='store_true')
163    ap.set_defaults(invert=False)
164
165    reopts = ap.add_mutually_exclusive_group(required=True)
166    reopts.add_argument('-e', '--regexp', dest='regex',
167                        metavar='REGEX')
168    reopts.add_argument('regex', nargs='?',
169                        metavar='REGEX')
170
171    ap.add_argument('files', nargs='+', metavar='FILE')
172
173    opts = ap.parse_args()
174    ns = vars(opts)
175
176    regex = ns.pop('regex')
177    filenames = ns.pop('files')
178    recursive = ns.pop('recursive')
179    if opts.filesonly:
180        if opts.filesonly == 'invert':
181            opts.invert = True
182        else:
183            assert opts.filesonly == 'match', opts
184            opts.invert = False
185    opts.filesonly = bool(opts.filesonly)
186
187    def main(regex=regex, filenames=filenames):
188        # step 1
189        regex = re.compile(regex)
190        # step 2
191        filenames = resolve_filenames(filenames, recursive)
192        # step 3
193        matches = search(filenames, regex, opts)
194        matches = iter(matches)
195
196        # step 4
197
198        # Handle the first match.
199        for filename, line in matches:
200            if opts.quiet:
201                return 0
202            elif opts.filesonly:
203                print(filename)
204            else:
205                for second in matches:
206                    print(f'{filename}: {line}')
207                    filename, line = second
208                    print(f'{filename}: {line}')
209                    break
210                else:
211                    print(line)
212            break
213        else:
214            return 1
215
216        # Handle the remaining matches.
217        if opts.filesonly:
218            for filename, _ in matches:
219                print(filename)
220        else:
221            for filename, line in matches:
222                print(f'{filename}: {line}')
223
224        return 0
225    rc = main()
226
227    # step 5
228    sys.exit(rc)
coroutines:
  1import os
  2import os.path
  3import re
  4import sys
  5
  6import asyncio
  7
  8
  9async def search(filenames, regex, opts):
 10    matches_by_file = asyncio.Queue()
 11
 12    async def do_background():
 13        MAX_FILES = 10
 14        MAX_MATCHES = 100
 15
 16        # Make sure we don't have too many coros at once,
 17        # i.e. too many files open at once.
 18        counter = asyncio.Semaphore(MAX_FILES)
 19
 20        async def search_file(filename, matches):
 21            # iter_lines() opens the file too.
 22            lines = iter_lines(filename)
 23            async for match in search_lines(
 24                            lines, regex, opts, filename):
 25                await matches.put(match)
 26            await matches.put(None)
 27            # Let a new coroutine start.
 28            counter.release()
 29
 30        async with asyncio.TaskGroup() as tg:
 31            for filename in filenames:
 32                # Prepare for the file.
 33                matches = asyncio.Queue(MAX_MATCHES)
 34                await matches_by_file.put(matches)
 35
 36                # Start a coroutine to process the file.
 37                tg.create_task(
 38                    search_file(filename, matches),
 39                )
 40                await counter.acquire()
 41            await matches_by_file.put(None)
 42
 43    background = asyncio.create_task(do_background())
 44
 45    # Yield the results as they are received, in order.
 46    matches = await matches_by_file.get()  # blocking
 47    while matches is not None:
 48        match = await matches.get()  # blocking
 49        while match is not None:
 50            yield match
 51            match = await matches.get()  # blocking
 52        matches = await matches_by_file.get()  # blocking
 53
 54    await asyncio.wait([background])
 55
 56
 57async def iter_lines(filename):
 58    if filename == '-':
 59        infile = sys.stdin
 60        line = await read_line_async(infile)
 61        while line:
 62            yield line
 63            line = await read_line_async(infile)
 64    else:
 65        # XXX Open using async?
 66        with open(filename) as infile:
 67            line = await read_line_async(infile)
 68            while line:
 69                yield line
 70                line = await read_line_async(infile)
 71
 72
 73async def read_line_async(infile):
 74    # XXX Do this async!
 75    # maybe make use of asyncio.to_thread()
 76    # or loop.run_in_executor()?
 77    return infile.readline()
 78
 79
 80async def search_lines(lines, regex, opts, filename):
 81    try:
 82        if opts.filesonly:
 83            if opts.invert:
 84                async for line in lines:
 85                    m = regex.search(line)
 86                    if m:
 87                        break
 88                else:
 89                    yield (filename, None)
 90            else:
 91                async for line in lines:
 92                    m = regex.search(line)
 93                    if m:
 94                        yield (filename, None)
 95                        break
 96        else:
 97            assert not opts.invert, opts
 98            async for line in lines:
 99                m = regex.search(line)
100                if not m:
101                    continue
102                if line.endswith(os.linesep):
103                    line = line[:-len(os.linesep)]
104                yield (filename, line)
105    except UnicodeDecodeError:
106        # It must be a binary file.
107        return
108
109
110def resolve_filenames(filenames, recursive=False):
111    for filename in filenames:
112        assert isinstance(filename, str), repr(filename)
113        if filename == '-':
114            yield '-'
115        elif not os.path.isdir(filename):
116            yield filename
117        elif recursive:
118            for d, _, files in os.walk(filename):
119                for base in files:
120                    yield os.path.join(d, base)
121
122
123if __name__ == '__main__':
124    # Parse the args.
125    import argparse
126    ap = argparse.ArgumentParser(prog='grep')
127
128    ap.add_argument('-r', '--recursive',
129                    action='store_true')
130    ap.add_argument('-L', '--files-without-match',
131                    dest='filesonly',
132                    action='store_const', const='invert')
133    ap.add_argument('-l', '--files-with-matches',
134                    dest='filesonly',
135                    action='store_const', const='match')
136    ap.add_argument('-q', '--quiet', action='store_true')
137    ap.set_defaults(invert=False)
138
139    reopts = ap.add_mutually_exclusive_group(required=True)
140    reopts.add_argument('-e', '--regexp', dest='regex',
141                        metavar='REGEX')
142    reopts.add_argument('regex', nargs='?',
143                        metavar='REGEX')
144
145    ap.add_argument('files', nargs='+', metavar='FILE')
146
147    opts = ap.parse_args()
148    ns = vars(opts)
149
150    regex = ns.pop('regex')
151    filenames = ns.pop('files')
152    recursive = ns.pop('recursive')
153    if opts.filesonly:
154        if opts.filesonly == 'invert':
155            opts.invert = True
156        else:
157            assert opts.filesonly == 'match', opts
158            opts.invert = False
159    opts.filesonly = bool(opts.filesonly)
160
161    async def main(regex=regex, filenames=filenames):
162        # step 1
163        regex = re.compile(regex)
164        # step 2
165        filenames = resolve_filenames(filenames, recursive)
166        # step 3
167        matches = search(filenames, regex, opts)
168        matches = type(matches).__aiter__(matches)
169
170        # step 4
171
172        # Handle the first match.
173        async for filename, line in matches:
174            if opts.quiet:
175                return 0
176            elif opts.filesonly:
177                print(filename)
178            else:
179                async for second in matches:
180                    print(f'{filename}: {line}')
181                    filename, line = second
182                    print(f'{filename}: {line}')
183                    break
184                else:
185                    print(line)
186            break
187        else:
188            return 1
189
190        # Handle the remaining matches.
191        if opts.filesonly:
192            async for filename, _ in matches:
193                print(filename)
194        else:
195            async for filename, line in matches:
196                print(f'{filename}: {line}')
197
198        return 0
199    rc = asyncio.run(main())
200
201    # step 5
202    sys.exit(rc)
multiple processes:
import os
import os.path
import re
import sys

import multiprocessing
import queue
import threading


def search(filenames, regex, opts):
    matches_by_file = queue.Queue()

    def do_background():
        MAX_FILES = 10
        MAX_MATCHES = 100

        # Make sure we don't have too many procs at once,
        # i.e. too many files open at once.
        counter = threading.Semaphore(MAX_FILES)
        finished = multiprocessing.Queue()
        active = {}
        done = False

        def monitor_tasks():
            while not done:
                try:
                    index = finished.get(timeout=0.1)
                except queue.Empty:
                    continue
                proc = active.pop(index)
                proc.join(0.1)
                if proc.is_alive():
                    # It's taking too long to terminate.
                    # We can wait for it at the end.
                    active[index] = proc
                # Let a new process start.
                counter.release()
        monitor = threading.Thread(target=monitor_tasks)
        monitor.start()

        for index, filename in enumerate(filenames):
            # Prepare for the file.
            matches = multiprocessing.Queue(MAX_MATCHES)
            matches_by_file.put(matches)

            # Start a subprocess to process the file.
            proc = multiprocessing.Process(
                target=search_file,
                args=(filename, matches, regex, opts,
                      index, finished),
            )
            counter.acquire(blocking=True)
            active[index] = proc
            proc.start()
        matches_by_file.put(None)
        # Wait for all remaining tasks to finish.
        done = True
        monitor.join()
        for proc in active.values():
            proc.join()

    background = threading.Thread(target=do_background)
    background.start()

    # Yield the results as they are received, in order.
    matches = matches_by_file.get()  # blocking
    while matches is not None:
        match = matches.get()  # blocking
        while match is not None:
            yield match
            match = matches.get()  # blocking
        matches = matches_by_file.get()  # blocking

    background.join()


def search_file(filename, matches, regex, opts,
                index, finished):
    lines = iter_lines(filename)
    for match in search_lines(lines, regex, opts, filename):
        matches.put(match)  # blocking
    matches.put(None)  # blocking
    # Let a new process start.
    finished.put(index)


def iter_lines(filename):
    if filename == '-':
        yield from sys.stdin
    else:
        with open(filename) as infile:
            yield from infile


def search_lines(lines, regex, opts, filename):
    try:
        if opts.filesonly:
            if opts.invert:
                for line in lines:
                    m = regex.search(line)
                    if m:
                        break
                else:
                    yield (filename, None)
            else:
                for line in lines:
                    m = regex.search(line)
                    if m:
                        yield (filename, None)
                        break
        else:
            assert not opts.invert, opts
            for line in lines:
                m = regex.search(line)
                if not m:
                    continue
                if line.endswith(os.linesep):
                    line = line[:-len(os.linesep)]
                yield (filename, line)
    except UnicodeDecodeError:
        # It must be a binary file.
        return


def resolve_filenames(filenames, recursive=False):
    for filename in filenames:
        assert isinstance(filename, str), repr(filename)
        if filename == '-':
            yield '-'
        elif not os.path.isdir(filename):
            yield filename
        elif recursive:
            for d, _, files in os.walk(filename):
                for base in files:
                    yield os.path.join(d, base)


if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')

    # Parse the args.
    import argparse
    ap = argparse.ArgumentParser(prog='grep')

    ap.add_argument('-r', '--recursive',
                    action='store_true')
    ap.add_argument('-L', '--files-without-match',
                    dest='filesonly',
                    action='store_const', const='invert')
    ap.add_argument('-l', '--files-with-matches',
                    dest='filesonly',
                    action='store_const', const='match')
    ap.add_argument('-q', '--quiet', action='store_true')
    ap.set_defaults(invert=False)

    reopts = ap.add_mutually_exclusive_group(required=True)
    reopts.add_argument('-e', '--regexp', dest='regex',
                        metavar='REGEX')
    reopts.add_argument('regex', nargs='?',
                        metavar='REGEX')

    ap.add_argument('files', nargs='+', metavar='FILE')

    opts = ap.parse_args()
    ns = vars(opts)

    regex = ns.pop('regex')
    filenames = ns.pop('files')
    recursive = ns.pop('recursive')
    if opts.filesonly:
        if opts.filesonly == 'invert':
            opts.invert = True
        else:
            assert opts.filesonly == 'match', opts
            opts.invert = False
    opts.filesonly = bool(opts.filesonly)

    def main(regex=regex, filenames=filenames):
        # step 1
        regex = re.compile(regex)
        # step 2
        filenames = resolve_filenames(filenames, recursive)
        # step 3
        matches = search(filenames, regex, opts)
        matches = iter(matches)

        # step 4

        # Handle the first match.
        for filename, line in matches:
            if opts.quiet:
                return 0
            elif opts.filesonly:
                print(filename)
            else:
                for second in matches:
                    print(f'{filename}: {line}')
                    filename, line = second
                    print(f'{filename}: {line}')
                    break
                else:
                    print(line)
            break
        else:
            return 1

        # Handle the remaining matches.
        if opts.filesonly:
            for filename, _ in matches:
                print(filename)
        else:
            for filename, line in matches:
                print(f'{filename}: {line}')

        return 0
    rc = main()

    # step 5
    sys.exit(rc)
concurrent.futures
import os
import os.path
import re
import sys

from concurrent.futures import ThreadPoolExecutor
import queue
import threading


def search(filenames, regex, opts):
    matches_by_file = queue.Queue()

    def do_background():
        MAX_FILES = 10
        MAX_MATCHES = 100

        def search_file(filename, matches):
            lines = iter_lines(filename)
            for match in search_lines(
                            lines, regex, opts, filename):
                matches.put(match)  # blocking
            matches.put(None)  # blocking

        with ThreadPoolExecutor(MAX_FILES) as workers:
            for filename in filenames:
                # Prepare for the file.
                matches = queue.Queue(MAX_MATCHES)
                matches_by_file.put(matches)

                # Start a thread to process the file.
                workers.submit(
                            search_file, filename, matches)
            matches_by_file.put(None)

    background = threading.Thread(target=do_background)
    background.start()

    # Yield the results as they are received, in order.
    matches = matches_by_file.get()  # blocking
    while matches is not None:
        match = matches.get()  # blocking
        while match is not None:
            yield match
            match = matches.get()  # blocking
        matches = matches_by_file.get()  # blocking

    background.join()


def iter_lines(filename):
    if filename == '-':
        yield from sys.stdin
    else:
        with open(filename) as infile:
            yield from infile


def search_lines(lines, regex, opts, filename):
    try:
        if opts.filesonly:
            if opts.invert:
                for line in lines:
                    m = regex.search(line)
                    if m:
                        break
                else:
                    yield (filename, None)
            else:
                for line in lines:
                    m = regex.search(line)
                    if m:
                        yield (filename, None)
                        break
        else:
            assert not opts.invert, opts
            for line in lines:
                m = regex.search(line)
                if not m:
                    continue
                if line.endswith(os.linesep):
                    line = line[:-len(os.linesep)]
                yield (filename, line)
    except UnicodeDecodeError:
        # It must be a binary file.
        return


def resolve_filenames(filenames, recursive=False):
    for filename in filenames:
        assert isinstance(filename, str), repr(filename)
        if filename == '-':
            yield '-'
        elif not os.path.isdir(filename):
            yield filename
        elif recursive:
            for d, _, files in os.walk(filename):
                for base in files:
                    yield os.path.join(d, base)


if __name__ == '__main__':
    # Parse the args.
    import argparse
    ap = argparse.ArgumentParser(prog='grep')

    ap.add_argument('-r', '--recursive',
                    action='store_true')
    ap.add_argument('-L', '--files-without-match',
                    dest='filesonly',
                    action='store_const', const='invert')
    ap.add_argument('-l', '--files-with-matches',
                    dest='filesonly',
                    action='store_const', const='match')
    ap.add_argument('-q', '--quiet', action='store_true')
    ap.set_defaults(invert=False)

    reopts = ap.add_mutually_exclusive_group(required=True)
    reopts.add_argument('-e', '--regexp', dest='regex',
                        metavar='REGEX')
    reopts.add_argument('regex', nargs='?',
                        metavar='REGEX')

    ap.add_argument('files', nargs='+', metavar='FILE')

    opts = ap.parse_args()
    ns = vars(opts)

    regex = ns.pop('regex')
    filenames = ns.pop('files')
    recursive = ns.pop('recursive')
    if opts.filesonly:
        if opts.filesonly == 'invert':
            opts.invert = True
        else:
            assert opts.filesonly == 'match', opts
            opts.invert = False
    opts.filesonly = bool(opts.filesonly)

    def main(regex=regex, filenames=filenames):
        # step 1
        regex = re.compile(regex)
        # step 2
        filenames = resolve_filenames(filenames, recursive)
        # step 3
        matches = search(filenames, regex, opts)
        matches = iter(matches)

        # step 4

        # Handle the first match.
        for filename, line in matches:
            if opts.quiet:
                return 0
            elif opts.filesonly:
                print(filename)
            else:
                for second in matches:
                    print(f'{filename}: {line}')
                    filename, line = second
                    print(f'{filename}: {line}')
                    break
                else:
                    print(line)
            break
        else:
            return 1

        # Handle the remaining matches.
        if opts.filesonly:
            for filename, _ in matches:
                print(filename)
        else:
            for filename, line in matches:
                print(f'{filename}: {line}')

        return 0
    rc = main()

    # step 5
    sys.exit(rc)

Model-specific details

Here are some implementation-specific details we had to deal with.

threads:

interpreters:

multiprocessing:

asyncio:

concurrent.futures

For threads, multiprocessing, and multiple interpreters*, you can also use concurrent.futures:

def search_using_threads_cf(filenames, regex, opts):
    matches_by_file = queue.Queue()

    def do_background():
        MAX_FILES = 10
        MAX_MATCHES = 100

        def search_file(filename, matches):
            lines = iter_lines(filename)
            for match in search_lines(lines, regex, opts, filename):
                matches.put(match)  # blocking
            matches.put(None)  # blocking

        with ThreadPoolExecutor(MAX_FILES) as workers:
            for filename in filenames:
                # Prepare for the file.
                matches = queue.Queue(MAX_MATCHES)
                matches_by_file.put(matches)

                # Start a thread to process the file.
                workers.submit(search_file, filename, matches)
            matches_by_file.put(None)

    background = threading.Thread(target=do_background)
    background.start()

    # Yield the results as they are received, in order.
    matches = matches_by_file.get()  # blocking
    while matches is not None:
        match = matches.get()  # blocking
        while match is not None:
            yield match
            match = matches.get()  # blocking
        matches = matches_by_file.get()  # blocking

    background.join()

For processes, use concurrent.futures.ProcessPoolExecutor. For interpreters, use InterpreterPoolExecutor. In both cases you must use a queue type the workers can actually share (a plain queue.Queue only works between threads in the same interpreter), and there are a few other minor differences.
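
For example, here is a minimal sketch of what a process-based variant might look like. It is not one of the full examples above: it assumes a module-level search_file(filename, matches, regex, opts) helper (like the one in the multiprocessing example, minus the index/finished bookkeeping), and it uses a manager-backed queue, since a plain multiprocessing.Queue cannot be passed as an argument to ProcessPoolExecutor.submit().

from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import queue
import threading


def search_using_processes_cf(filenames, regex, opts):
    matches_by_file = queue.Queue()

    def do_background():
        MAX_FILES = 10
        MAX_MATCHES = 100

        # Worker processes can't see this process's queue.Queue, and a
        # multiprocessing.Queue can't be handed to submit(), so give
        # each worker a manager-backed queue instead.
        manager = multiprocessing.Manager()

        with ProcessPoolExecutor(MAX_FILES) as workers:
            for filename in filenames:
                # Prepare for the file.
                matches = manager.Queue(MAX_MATCHES)
                matches_by_file.put(matches)

                # Start a subprocess to process the file.
                workers.submit(search_file, filename, matches,
                               regex, opts)
            matches_by_file.put(None)

    background = threading.Thread(target=do_background)
    background.start()

    # Yield the results as they are received, in order.
    matches = matches_by_file.get()  # blocking
    while matches is not None:
        match = matches.get()  # blocking
        while match is not None:
            yield match
            match = matches.get()  # blocking
        matches = matches_by_file.get()  # blocking

    background.join()

Note that everything passed to submit() must be picklable; compiled regex patterns and argparse.Namespace objects are, so the compiled regex and opts can be sent as-is. The interpreters variant would follow the same shape, with InterpreterPoolExecutor and a queue that can be shared between interpreters.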


Workload 2: …

TBD

Design and analysis

Design steps from above:

  1. concurrency fits?

    TBD

  2. identify logical tasks

    TBD

  3. select concurrent tasks

    TBD

  4. concurrency-related characteristics

    TBD

  5. pick best model

    TBD

Here are additional key constraints and considerations:

High-level code

# …

Side-by-side

Here are the implementations for the different concurrency models, side-by-side for easy comparison:

sequential

# sequential 2
...

threads

import threading

def task():
    ...

t = threading.Thread(target=task)
t.start()

...

multiple interpreters

# subinterpreters 2
...

coroutines

# async 2
...

multiple processes

import multiprocessing

def task():
    ...

...

concurrent.futures

# concurrent.futures 2
...

Workload 3: …

TBD

Design and analysis

Design steps from above:

  1. concurrency fits?

    TBD

  2. identify logical tasks

    TBD

  3. select concurrent tasks

    TBD

  4. concurrency-related characteristics

    TBD

  5. pick best model

    TBD

Here are additional key constraints and considerations:

High-level code

# …

Side-by-side

Here are the implementations for the different concurrency models, side-by-side for easy comparison:

sequential

# sequential 3
...

threads

import threading

def task():
    ...

t = threading.Thread(target=task)
t.start()

...

multiple interpreters

# subinterpreters 3
...

coroutines

# async 3
...

multiple processes

import multiprocessing

def task():
    ...

...

concurrent.futures

# concurrent.futures 3
...