
PEP 583 – A Concurrency Memory Model for Python

Contents

  • A couple definitions
  • Sequential Consistency
  • Zombie values
  • Inconsistent Orderings
  • A happens-before race that’s not a sequentially-consistent race
  • Self-justifying values
  • Uninitialized values (direct)
  • Uninitialized values (flag)
  • Inconsistent guarantees from relying on data dependencies
  • Data-race-free programs are sequentially consistent
  • No security holes from out-of-thin-air reads
  • Restrict reorderings instead of defining happens-before
  • Atomic, unordered assignments
  • Two tiers of guarantees
  • Adapt the x86 model
  • Upgrading or downgrading to an alternate model
  • Acknowledgements

This PEP describes how Python programs may behave in the presence of concurrent reads and writes to shared variables from multiple threads. We use a happens-before relation to define when variable accesses are ordered or concurrent. Nearly all programs should simply use locks to guard their shared variables, and this PEP highlights some of the strange things that can happen when they don’t, but programmers often assume that it’s ok to do “simple” things without locking, and it’s somewhat unpythonic to let the language surprise them. Unfortunately, avoiding surprise often conflicts with making Python run quickly, so this PEP tries to find a good tradeoff between the two.

So far, we have 4 major Python implementations – CPython, Jython, IronPython, and PyPy – as well as lots of minor ones. Some of these already run on platforms that do aggressive optimizations. In general, these optimizations are invisible within a single thread of execution, but they can be visible to other threads executing concurrently. CPython currently uses a GIL to ensure that other threads see the results they expect, but this limits it to a single processor. Jython and IronPython run on Java’s or .NET’s threading system respectively, which allows them to take advantage of more cores but can also show surprising values to other threads.

So that threaded Python programs continue to be portable between implementations, implementers and library authors need to agree on some ground rules.

Two simple memory models

Before talking about the details of data races and the surprising behaviors they produce, I’ll present two simple memory models. The first is probably too strong for Python, and the second is probably too weak.

Sequential Consistency

In a sequentially-consistent concurrent execution, actions appear to happen in a global total order with each read of a particular variable seeing the value written by the last write that affected that variable. The total order for actions must be consistent with the program order. A program has a data race on a given input when one of its sequentially consistent executions puts two conflicting actions next to each other.

This is the easiest memory model for humans to understand, although it doesn’t eliminate all confusion, since operations can be split in odd places.

Happens-before consistency

The program contains a collection of synchronization actions, which in Python currently include lock acquires and releases and thread starts and joins. Synchronization actions happen in a global total order that is consistent with the program order (they don’t have to happen in a total order, but it simplifies the description of the model). A lock release synchronizes with all later acquires of the same lock. Similarly, given t = threading.Thread(target=worker):

  • A call to t.start() synchronizes with the first statement in worker().
  • The return from worker() synchronizes with the return from t.join().
  • If the return from t.start() happens before (see below) a call to t.isAlive() that returns False, the return from worker() synchronizes with that call.

We call the source of the synchronizes-with edge a release operation on the relevant variable, and we call the target an acquire operation.
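For example, under these rules the following sketch (not part of the original PEP) contains no data race: the write to data happens before the read in the main thread, because the return from worker() synchronizes with the return from t.join().

    import threading

    data = None

    def worker():
        global data
        data = [1, 2, 3]    # happens before worker() returns

    t = threading.Thread(target=worker)
    t.start()
    t.join()                # worker()'s return synchronizes with this join
    print(data)             # guaranteed to see [1, 2, 3]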

The happens-before order is the transitive closure of the program order with the synchronizes-with edges. That is, action A happens before action B if:

  • A falls before B in the program order (which means they run in the same thread),
  • A synchronizes with B, or
  • you can get to B by following happens-before edges from A.

An execution of a program is happens-before consistent if each read R sees the value of a write W to the same variable such that:

  • R does not happen before W, and
  • There is no other write V that overwrote W before R got a chance to see it. (That is, it can’t be the case that W happens before V happens before R.)

You have a data race if two conflicting actions aren’t related by happens-before.

Let’s use the rules from the happens-before model to prove that the following program prints “[7]”:
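    # (sketch consistent with the proof below; the exact method bodies are assumed)
    import threading

    class Queue:
        def __init__(self):
            self.l = []
            self.cond = threading.Condition()

        def put(self, x):
            self.cond.acquire()
            self.l.append(x)
            self.cond.notify()
            self.cond.release()

        def get(self):
            self.cond.acquire()
            while not self.l:
                self.cond.wait()   # releases and re-acquires self.cond
            x = self.l.pop(0)
            self.cond.release()
            return x

    myqueue = Queue()

    def worker1():
        x = [7]
        myqueue.put(x)

    def worker2():
        y = myqueue.get()
        print y

    thread1 = threading.Thread(target=worker1)
    thread2 = threading.Thread(target=worker2)
    thread2.start()
    thread1.start()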

  • Because myqueue is initialized in the main thread before thread1 or thread2 is started, that initialization happens before worker1 and worker2 begin running, so there’s no way for either to raise a NameError, and both myqueue.l and myqueue.cond are set to their final objects.
  • The initialization of x in worker1 happens before it calls myqueue.put(), which happens before it calls myqueue.l.append(x), which happens before the call to myqueue.cond.release(), all because they run in the same thread.
  • In worker2, myqueue.cond will be released and re-acquired until myqueue.l contains a value (x). The call to myqueue.cond.release() in worker1 happens before that last call to myqueue.cond.acquire() in worker2.
  • That last call to myqueue.cond.acquire() happens before myqueue.get() reads myqueue.l, which happens before myqueue.get() returns, which happens before print y, again all because they run in the same thread.
  • Because happens-before is transitive, the list initially stored in x in thread1 is initialized before it is printed in thread2.

Usually, we wouldn’t need to look all the way into a thread-safe queue’s implementation in order to prove that uses were safe. Its interface would specify that puts happen before gets, and we’d reason directly from that.

Surprising behaviors with races

Lots of strange things can happen when code has data races. It’s easy to avoid all of these problems by just protecting shared variables with locks. This is not a complete list of race hazards; it’s just a collection of examples that seem relevant to Python.

In all of these examples, variables starting with r are local variables, and other variables are shared between threads.

Zombie values

This example comes from the Java memory model:

Initially p is q and p.x == 0.

    Thread 1       Thread 2
    r1 = p         r6 = p
    r2 = r1.x      r6.x = 3
    r3 = q
    r4 = r3.x
    r5 = r1.x

Can produce r2 == r5 == 0 but r4 == 3, proving that p.x went from 0 to 3 and back to 0.

A good compiler would like to optimize out the redundant load of p.x in initializing r5 by just re-using the value already loaded into r2 . We get the strange result if thread 1 sees memory in this order:

    Evaluation     Computes    Why
    r1 = p
    r2 = r1.x      r2 == 0
    r3 = q         r3 is p
    p.x = 3                    Side-effect of thread 2
    r4 = r3.x      r4 == 3
    r5 = r2        r5 == 0     Optimized from r5 = r1.x because r2 == r1.x.

Inconsistent Orderings

From N2177: Sequential Consistency for Atomics, and also known as Independent Reads of Independent Writes (IRIW).

Initially, a == b == 0.

    Thread 1    Thread 2    Thread 3    Thread 4
    r1 = a      r3 = b      a = 1       b = 1
    r2 = b      r4 = a

We may get r1 == r3 == 1 and r2 == r4 == 0, proving both that a was written before b (thread 1’s data), and that b was written before a (thread 2’s data). See Special Relativity for a real-world example.

This can happen if thread 1 and thread 3 are running on processors that are close to each other, but far away from the processors that threads 2 and 4 are running on and the writes are not being transmitted all the way across the machine before becoming visible to nearby threads.

Neither acquire/release semantics nor explicit memory barriers can help with this. Making the orders consistent without locking requires detailed knowledge of the architecture’s memory model, but Java requires it for volatiles so we could use documentation aimed at its implementers.

A happens-before race that’s not a sequentially-consistent race

From the POPL paper about the Java memory model [#JMM-popl].

Initially, x == y == 0.

    Thread 1        Thread 2
    r1 = x          r2 = y
    if r1 != 0:     if r2 != 0:
        y = 42          x = 42

Can r1 == r2 == 42?

In a sequentially-consistent execution, there’s no way to get an adjacent read and write to the same variable, so the program should be considered correctly synchronized (albeit fragile), and should only produce r1 == r2 == 0 . However, the following execution is happens-before consistent:

    Statement      Value    Thread
    r1 = x         42       1
    if r1 != 0:    true     1
    y = 42                  1
    r2 = y         42       2
    if r2 != 0:    true     2
    x = 42                  2

WTF, you are asking yourself. Because there were no inter-thread happens-before edges in the original program, the read of x in thread 1 can see any of the writes from thread 2, even if they only happened because the read saw them. There are data races in the happens-before model.

We don’t want to allow this, so the happens-before model isn’t enough for Python. One rule we could add to happens-before that would prevent this execution is:

If there are no data races in any sequentially-consistent execution of a program, the program should have sequentially consistent semantics.

Java gets this rule as a theorem, but Python may not want all of the machinery you need to prove it.

Self-justifying values

Also from the POPL paper about the Java memory model [#JMM-popl].

Initially, x == y == 0.

    Thread 1    Thread 2
    r1 = x      r2 = y
    y = r1      x = r2

Can x == y == 42?

In a sequentially consistent execution, no. In a happens-before consistent execution, yes: The read of x in thread 1 is allowed to see the value written in thread 2 because there are no happens-before relations between the threads. This could happen if the compiler or processor transforms the code into:

    Thread 1        Thread 2
    y = 42          r2 = y
    r1 = x          x = r2
    if r1 != 42:
        y = r1

It can produce a security hole if the speculated value is a secret object, or points to the memory that an object used to occupy. Java cares a lot about such security holes, but Python may not.

Uninitialized values (direct)

From several classic double-checked locking examples.

Initially, d == None.

    Thread 1             Thread 2
    while not d: pass    d = [3, 4]
    assert d[1] == 4

This could raise an IndexError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.

Thread 2 may actually be implemented as something like:
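    r1 = list()
    r1.append(3)
    r1.append(4)
    d = r1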

Because the assignment to d and the item assignments are independent, the compiler and processor may optimize that to something like:
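    r1 = list()
    d = r1
    r1.append(3)
    r1.append(4)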

Which is obviously incorrect and explains the IndexError. If we then look deeper into the implementation of r1.append(3), we may find that it and d[1] cannot run concurrently without causing their own race conditions. In CPython (without the GIL), those race conditions would produce undefined behavior.

There’s also a subtle issue on the reading side that can cause the value of d[1] to be out of date. Somewhere in the implementation of list , it stores its contents as an array in memory. This array may happen to be in thread 1’s cache. If thread 1’s processor reloads d from main memory without reloading the memory that ought to contain the values 3 and 4, it could see stale values instead. As far as I know, this can only actually happen on Alphas and maybe Itaniums, and we probably have to prevent it anyway to avoid crashes.

Uninitialized values (flag)

From several more double-checked locking examples.

Initially, d == dict() and initialized == False.

    Thread 1                        Thread 2
    while not initialized: pass     d['a'] = 3
    r1 = d['a']                     initialized = True
    r2 = r1 == 3
    assert r2

This could raise a KeyError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.

Because d and initialized are independent (except in the programmer’s mind), the compiler and processor can rearrange these almost arbitrarily, except that thread 1’s assertion has to stay after the loop.

Inconsistent guarantees from relying on data dependencies

This is a problem with Java final variables and the proposed data-dependency ordering in C++0x.

First execute:

    g = []
    def Init():
        g.extend([1, 2, 3])
        return [1, 2, 3]
    h = None

Then in two threads:

    Thread 1                 Thread 2
    while not h: pass        r1 = Init()
    assert h == [1, 2, 3]    freeze(r1)
    assert h == g            h = r1

If h has semantics similar to a Java final variable (except for being write-once), then even though the first assertion is guaranteed to succeed, the second could fail.

Data-dependent guarantees like those final provides only work if the access is through the final variable. It’s not even safe to access the same object through a different route. Unfortunately, because of how processors work, final’s guarantees are only cheap when they’re weak.

The rules for Python

The first rule is that Python interpreters can’t crash due to race conditions in user code. For CPython, this means that race conditions can’t make it down into C. For Jython, it means that NullPointerExceptions can’t escape the interpreter.

Presumably we also want a model at least as strong as happens-before consistency because it lets us write a simple description of how concurrent queues and thread launching and joining work.

Other rules are more debatable, so I’ll present each one with pros and cons.

Data-race-free programs are sequentially consistent

We’d like programmers to be able to reason about their programs as if they were sequentially consistent. Since it’s hard to tell whether you’ve written a happens-before race, we only want to require programmers to prevent sequential races. The Java model does this through a complicated definition of causality, but if we don’t want to include that, we can just assert this property directly.

No security holes from out-of-thin-air reads

If the program produces a self-justifying value, it could expose access to an object that the user would rather the program not see. Again, Java’s model handles this with the causality definition. We might be able to prevent these security problems by banning speculative writes to shared variables, but I don’t have a proof of that, and Python may not need those security guarantees anyway.

Restrict reorderings instead of defining happens-before

The .NET [#CLR-msdn] and x86 [#x86-model] memory models are based on defining which reorderings compilers may allow. I think that it’s easier to program to a happens-before model than to reason about all of the possible reorderings of a program, and it’s easier to insert enough happens-before edges to make a program correct than to insert enough memory fences to do the same thing. So, although we could layer some reordering restrictions on top of the happens-before base, I don’t think Python’s memory model should be entirely reordering restrictions.

Atomic, unordered assignments

Assignments of primitive types are already atomic. If you assign 3<<72 + 5 to a variable, no thread can see only part of the value. Jeremy Manson suggested that we extend this to all objects. This allows compilers to reorder operations to optimize them, without allowing some of the more confusing uninitialized values. The basic idea here is that when you assign a shared variable, readers can’t see any changes made to the new value before the assignment, or to the old value after the assignment. So, if we have a program like:

Initially, (d.a, d.b) == (1, 2), and (e.c, e.d) == (3, 4). We also have class Obj(object): pass.

    Thread 1      Thread 2
    r1 = Obj()    r3 = d
    r1.a = 3      r4, r5 = r3.a, r3.b
    r1.b = 4      r6 = e
    d = r1        r7, r8 = r6.c, r6.d
    r2 = Obj()
    r2.c = 6
    r2.d = 7
    e = r2

(r4, r5) can be (1, 2) or (3, 4) but nothing else, and (r7, r8) can be either (3, 4) or (6, 7) but nothing else. Unlike if writes were releases and reads were acquires, it’s legal for thread 2 to see (e.c, e.d) == (6, 7) and (d.a, d.b) == (1, 2) (out of order).

This allows the compiler a lot of flexibility to optimize without allowing users to see some strange values. However, because it relies on data dependencies, it introduces some surprises of its own. For example, the compiler could freely optimize the above example to:

    Thread 1      Thread 2
    r1 = Obj()    r3 = d
    r2 = Obj()    r6 = e
    r1.a = 3      r4, r7 = r3.a, r6.c
    r2.c = 6      r5, r8 = r3.b, r6.d
    r2.d = 7
    e = r2
    r1.b = 4
    d = r1

As long as it didn’t let the initialization of e move above any of the initializations of members of r2, and similarly for d and r1.

This also helps to ground happens-before consistency. To see the problem, imagine that the user unsafely publishes a reference to an object as soon as she gets it. The model needs to constrain what values can be read through that reference. Java says that every field is initialized to 0 before anyone sees the object for the first time, but Python would have trouble defining “every field”. If instead we say that assignments to shared variables have to see a value at least as up to date as when the assignment happened, then we don’t run into any trouble with early publication.

Two tiers of guarantees

Most other languages with any guarantees for unlocked variables distinguish between ordinary variables and volatile/atomic variables. They provide many more guarantees for the volatile ones. Python can’t easily do this because we don’t declare variables. This may or may not matter, since Python locks aren’t significantly more expensive than ordinary Python code. If we want to get those tiers back, we could:

  • Introduce a set of atomic types similar to Java’s [5] or C++’s [6] (see the sketch after this list). Unfortunately, we couldn’t assign to them with =.
  • Without requiring variable declarations, we could also specify that all of the fields on a given object are atomic.
  • Extend the __slots__ mechanism [7] with a parallel __volatiles__ list, and maybe a __finals__ list.
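A minimal sketch of the first option, assuming a hypothetical AtomicReference type implemented with an internal lock (this API is illustrative, not part of the PEP):

    import threading

    class AtomicReference(object):
        # Hypothetical atomic box; get()/set() replace plain assignment.
        def __init__(self, value=None):
            self._lock = threading.Lock()
            self._value = value

        def get(self):
            with self._lock:
                return self._value

        def set(self, value):
            with self._lock:
                self._value = value

    shared = AtomicReference(0)
    shared.set(shared.get() + 1)   # note: still not an atomic increment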

Sequential Consistency

We could just adopt sequential consistency for Python. This avoids all of the hazards mentioned above, but it prohibits lots of optimizations too. As far as I know, this is the current model of CPython, but if CPython learned to optimize out some variable reads, it would lose this property.

If we adopt this, Jython’s dict implementation may no longer be able to use ConcurrentHashMap because that only promises to create appropriate happens-before edges, not to be sequentially consistent (although maybe the fact that Java volatiles are totally ordered carries over). Both Jython and IronPython would probably need to use AtomicReferenceArray or the equivalent for any __slots__ arrays.

Adapt the x86 model

The x86 model is:

  • Loads are not reordered with other loads.
  • Stores are not reordered with other stores.
  • Stores are not reordered with older loads.
  • Loads may be reordered with older stores to different locations but not with older stores to the same location.
  • In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
  • In a multiprocessor system, stores to the same location have a total order.
  • In a multiprocessor system, locked instructions have a total order.
  • Loads and stores are not reordered with locked instructions.

In acquire/release terminology, this appears to say that every store is a release and every load is an acquire. This is slightly weaker than sequential consistency, in that it allows inconsistent orderings, but it disallows zombie values and the compiler optimizations that produce them. We would probably want to weaken the model somehow to explicitly allow compilers to eliminate redundant variable reads. The x86 model may also be expensive to implement on other platforms, although because x86 is so common, that may not matter much.

Upgrading or downgrading to an alternate model

We can adopt an initial memory model without totally restricting future implementations. If we start with a weak model and want to get stronger later, we would only have to change the implementations, not programs. Individual implementations could also guarantee a stronger memory model than the language demands, although that could hurt interoperability. On the other hand, if we start with a strong model and want to weaken it later, we can add a from __future__ import weak_memory statement to declare that some modules are safe.

Implementation Details

The required model is weaker than any particular implementation. This section tries to document the actual guarantees each implementation provides, and should be updated as the implementations change.

CPython

Uses the GIL to guarantee that other threads don’t see funny reorderings, and does few enough optimizations that I believe it’s actually sequentially consistent at the bytecode level. Threads can switch between any two bytecodes (instead of only between statements), so two threads that concurrently execute an increment like:
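    i = i + 1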

with i initially 0 could easily end up with i==1 instead of the expected i==2. If they execute an augmented assignment such as:
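    i += 1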

instead, CPython 2.6 will always give the right answer, but it’s easy to imagine another implementation in which this statement won’t be atomic.

PyPy

Also uses a GIL, but probably does enough optimization to violate sequential consistency. I know very little about this implementation.

Jython

Provides true concurrency under the Java memory model and stores all object fields (except for those in __slots__?) in a ConcurrentHashMap, which provides fairly strong ordering guarantees. Local variables in a function may have fewer guarantees, which would become visible if they were captured into a closure that was then passed to another thread.

IronPython

Provides true concurrency under the CLR memory model, which probably protects it from uninitialized values. IronPython uses a locked map to store object fields, providing at least as many guarantees as Jython.

Acknowledgements

Thanks to Jeremy Manson and Alex Martelli for detailed discussions on what this PEP should look like.

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0583.rst


CSC148 Course Notes

1.2 The Python Memory Model: Functions and Parameters


Let’s use this simple example to review some terminology that should be familiar to you:
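    # (sketch; the exact body of mess_about and the argument values are assumed)
    def mess_about(n, s):
        message = s * n     # message is a local variable
        print(message)

    count = 3
    word = 'hee'
    mess_about(count, word)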

In the function declaration, each variable in the parentheses is called a parameter. Here, n and s are parameters of function mess_about. When we call a function, each expression in the parentheses is called an argument. The arguments in our one call to mess_about are count and word.

How function calls are tracked

Python must keep track of the function that is currently running, and any variables defined inside of it. It stores this information in something called a stack frame, or just “frame” for short.

Every time we call a function, the following happens:

  • A new frame is created and placed on top of any frames that may already exist. We call this pile of frames the call stack.
  • Each parameter is defined inside that frame.
  • The arguments in the function call are evaluated, in order from left to right. Each is an expression, and evaluating it yields the id of an object. Each of these ids is assigned to the corresponding parameter.
  • Then the body of the function is executed.

In the body of the function there may be assignment statements. We know that if the variable on the left-hand-side of the assignment doesn’t already exist, Python will create it. But with the awareness that there may be a stack of frames, we need a slightly more detailed rule:

If the variable on the left-hand-side of the assignment doesn’t already exist in the top stack frame, Python will create it in that top stack frame.

For example, if we stop our above sample code right before printing message , this is the state of memory:

A memory model diagram showing the state of memory before printing `message`.

Notice that the top stack frame, for our call to mess_about, includes the new variable message. We say that any new variables defined inside a function are local variables; they are local to a call to that function.

When a function returns, either due to executing a return statement or getting to the end of the function, the frame for that function call is deleted. All the variables defined in it—both parameters and local variables—disappear. If we try to refer to them after the function has returned, we get an error. For example, when we are about to execute the final line in this program,
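    # (sketch; the final line is the one that fails, the rest is assumed)
    def mess_about(n, s):
        message = s * n
        print(message)

    count = 3
    word = 'hee'
    mess_about(count, word)
    print(n)    # n was a parameter of mess_about; its frame is gone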

this is the state of memory,


which explains why the final line produces the error NameError: name 'n' is not defined.

Passing an argument creates an alias

What we often call “parameter passing” can be thought of as essentially variable assignment. In the example above, it is as if we wrote
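    n = count
    s = word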

before the body of the function.

If an argument to a function is a variable, what we assign to the function’s parameter is the id of the object that the variable references. This creates an alias. As you should expect, what the function can do with these aliases depends on whether or not the object is mutable.

Passing a reference to an immutable object

If we pass a reference to an immutable object, we can do whatever we want with the parameter and there will be no effect outside the function.

Here’s an example:
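    # (sketch; the exact body is assumed)
    def emphasize(s):
        s = s + s + '!'    # rebinds the local s to a new string

    word = 'moo'
    emphasize(word)
    print(word)            # prints: moo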

This code prints plain old moo . The reason is that, although we set up an alias, we don’t (and can’t) change the object that both word and s reference; we make a new object. Here’s the state of memory right before the function returns:


Once the function is over and the stack frame is gone, the string object we want (with moomoo!) will be inaccessible. The net effect of this function is nothing at all. It doesn’t change the object that s refers to, it doesn’t return anything, and it has no other effect such as taking user input or printing to the screen. The one thing it does do, making s refer to something new, doesn’t last beyond the function call.

If we want to use this function to change word , the solution is to return the new value and then, in the calling code, assign that value to word :
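    # (sketch)
    def emphasized(s):
        return s + s + '!'

    word = 'moo'
    word = emphasized(word)
    print(word)            # prints: moomoo!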

This code prints out moomoo!. Notice that we changed the function name from emphasize to emphasized. This makes sense when we consider the context of the function call:

Our function call is not merely performing some action, it is returning a value. So the expression on the right-hand side has a value: it is the emphasized word.

Passing a reference to a mutable object

If we wrote code analogous to the broken code in Example 3, but with a mutable type, it wouldn’t work either. For example:
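    # (sketch; analogous to the broken code in Example 3)
    def emphasize(lst):
        lst = lst + ['!']   # rebinds the local lst to a new list

    sentence = ['winter', 'is', 'coming']
    emphasize(sentence)
    print(sentence)         # prints: ['winter', 'is', 'coming']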

This code prints ['winter', 'is', 'coming'] for the same reason we saw in Example 3. Changing a reference (in this case, making lst refer to something new) is not the same as mutating a value (in this case, mutating the list object whose id was passed to the function). This model of memory illustrates:


The code below, however, correctly mutates the object:
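    # (sketch)
    def emphasize(lst):
        lst.append('!')     # mutates the shared list object

    sentence = ['winter', 'is', 'coming']
    emphasize(sentence)
    print(sentence)         # prints: ['winter', 'is', 'coming', '!']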

This is the state of memory immediately before function emphasize returns:


Here are some things to notice:

  • When we begin this program, we are executing the module as a whole. We make an initial frame to track its variables, and put the module name in the upper-left corner.
  • When we call emphasize, a new frame is added to the call stack. In the upper-left corner of the frame, we write the function name.
  • The parameter lst exists in the stack frame. It comes into being when the function is called. And when the function returns, this frame will be discarded, along with everything in it. At that point, lst no longer exists.
  • When we pass argument sentence to emphasize, we assign it to lst. In other words, we set lst to id2, which creates an alias.
  • id2 is a reference to a list object, which is mutable. When we use lst to access and change that object, the object that sentence references also changes. Of course it does: they are the same object!

Moral of the story

The situation gets trickier when we have objects that contain references to other objects, and you’ll see examples of this in the work you do this term. The bottom line is this: know whether your objects are mutable—at each level of their structure. Memory model diagrams offer a concise visual way to represent that.



December 2012

Volume 27 Number 12

C# - The C# Memory Model in Theory and Practice

By Igor Ostrovsky | December 2012

This is the first of a two-part series that will tell the long story of the C# memory model. The first part explains the guarantees the C# memory model makes and shows the code patterns that motivate the guarantees; the second part will detail how the guarantees are achieved on different hardware architectures in the Microsoft .NET Framework 4.5.

One source of complexity in multithreaded programming is that the compiler and the hardware can subtly transform a program’s memory operations in ways that don’t affect the single-threaded behavior, but might affect the multithreaded behavior. Consider the following method:
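    void Init() {   // sketch (reconstructed from the surrounding text);
                    // _data and _initialized are fields declared elsewhere
      _data = 42;            // write the data
      _initialized = true;   // then set the flag
    }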

If _data and _initialized are ordinary (that is, non-volatile) fields, the compiler and the processor are allowed to reorder the operations so that Init executes as if it were written like this:
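    void Init() {   // sketch (reconstructed)
      _initialized = true;   // the flag is set first...
      _data = 42;            // ...before the data is written
    }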

There are various optimizations in both compilers and processors that can result in this kind of reordering, as I’ll discuss in Part 2.

In a single-threaded program, the reordering of statements in Init makes no difference in the meaning of the program. As long as both _initialized and _data are updated before the method returns, the order of the assignments doesn’t matter. In a single-threaded program, there’s no second thread that could observe the state in between the updates.

In a multithreaded program, however, the order of the assignments may matter because another thread might read the fields while Init is in the middle of execution. Consequently, in the reordered version of Init, another thread may observe _initialized=true and _data=0.

The C# memory model is a set of rules that describes what kinds of memory-operation reordering are and are not allowed. All programs should be written against the guarantees defined in the specification.

However, even if the compiler and the processor are allowed to reorder memory operations, it doesn’t mean they always do so in practice. Many programs that contain a “bug” according to the abstract C# memory model will still execute correctly on particular hardware running a particular version of the .NET Framework. Notably, the x86 and x64 processors reorder operations only in certain narrow scenarios, and similarly the CLR just-in-time (JIT) compiler doesn’t perform many of the transformations it’s allowed to.

Although the abstract C# memory model is what you should have in mind when writing new code, it can be helpful to understand the actual implementation of the memory model on different architectures, in particular when trying to understand the behavior of existing code.

C# Memory Model According to ECMA-334

The authoritative definition of the C# memory model is in the Standard ECMA-334 C# Language Specification ( bit.ly/MXMCrN ). Let’s discuss the C# memory model as defined in the specification.

Memory Operation Reordering According to ECMA-334, when a thread reads a memory location in C# that was written to by a different thread, the reader might see a stale value. This problem is illustrated in Figure 1 .

Figure 1 Code at Risk of Memory Operation Reordering
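A sketch of the class the figure describes, with the reads and writes labeled as in the discussion below:

    public class DataInit {
      private int _data = 0;
      private bool _initialized = false;

      public void Init() {
        _data = 42;            // Write 1
        _initialized = true;   // Write 2
      }

      public void Print() {
        if (_initialized)                // Read 1
          Console.WriteLine(_data);      // Read 2
        else
          Console.WriteLine("Not initialized");
      }
    }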

Suppose Init and Print are called in parallel (that is, on different threads) on a new instance of DataInit. If you examine the code of Init and Print, it may seem that Print can only output “42” or “Not initialized.” However, Print can also output “0.”

The C# memory model permits reordering of memory operations in a method, as long as the behavior of single-threaded execution doesn’t change. For example, the compiler and the processor are free to reorder the Init method operations as follows:
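    void Init() {   // sketch (reconstructed)
      _initialized = true;   // Write 2 moved ahead of Write 1
      _data = 42;
    }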

This reordering wouldn’t change the behavior of the Init method in a single-threaded program. In a multithreaded program, however, another thread might read _initialized and _data fields after Init has modified one field but not the other, and then the reordering could change the behavior of the program. As a result, the Print method could end up outputting a “0.”

The reordering of Init isn’t the only possible source of trouble in this code sample. Even if the Init writes don’t end up reordered, the reads in the Print method could be transformed:
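    void Print() {   // sketch (reconstructed)
      int d = _data;   // Read 2 hoisted ahead of Read 1
      if (_initialized)
        Console.WriteLine(d);
      else
        Console.WriteLine("Not initialized");
    }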

Just as with the reordering of writes, this transformation has no effect in a single-threaded program, but might change the behavior of a multithreaded program. And, just like the reordering of writes, the reordering of reads can also result in a 0 printed to the output.

In Part 2 of this article, you’ll see how and why these transformations take place in practice when I look at different hardware architectures in detail.

Volatile Fields The C# programming language provides volatile fields that constrain how memory operations can be reordered. The ECMA specification states that volatile fields provide acquire-release semantics ( bit.ly/NArSlt ).

A read of a volatile field has acquire semantics, which means it can’t be reordered with subsequent operations. The volatile read forms a one-way fence: preceding operations can pass it, but subsequent operations can’t. Consider this example:
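    class AcquireSemanticsExample {   // sketch (reconstructed from the text below)
      int _a;
      volatile int _b;
      int _c;

      void Foo() {
        int a = _a;   // Read 1 (ordinary)
        int b = _b;   // Read 2 (volatile)
        int c = _c;   // Read 3 (ordinary)
        // ...
      }
    }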

Read 1 and Read 3 are non-volatile, while Read 2 is volatile. Read 2 can’t be reordered with Read 3, but it can be reordered with Read 1. Figure 2 shows the valid reorderings of the Foo body.

Figure 2 Valid Reordering of Reads in AcquireSemanticsExample

A write of a volatile field, on the other hand, has release semantics, and so it can’t be reordered with prior operations. A volatile write forms a one-way fence, as this example demonstrates:
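    class ReleaseSemanticsExample {   // sketch (reconstructed from the text below)
      int _a;
      volatile int _b;
      int _c;

      void Foo() {
        _a = 1;   // Write 1 (ordinary)
        _b = 1;   // Write 2 (volatile)
        _c = 1;   // Write 3 (ordinary)
      }
    }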

Write 1 and Write 3 are non-volatile, while Write 2 is volatile. Write 2 can’t be reordered with Write 1, but it can be reordered with Write 3. Figure 3 shows the valid reorderings of the Foo body.

Figure 3 Valid Reordering of Writes in ReleaseSemanticsExample

I’ll come back to the acquire-release semantics in the “Publication via Volatile Field” section later in this article.

Atomicity Another issue to be aware of is that in C#, values aren’t necessarily written atomically into memory. Consider this example:
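    class AtomicityExample {   // sketch (reconstructed); Guid is a 16-byte struct
      private Guid _value;

      public void SetValue(Guid value) { _value = value; }
      public Guid GetValue() { return _value; }
    }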

If one thread repeatedly calls SetValue and another thread calls GetValue, the getter thread might observe a value that was never written by the setter thread. For example, if the setter thread alternately calls SetValue with Guid values (0,0,0,0) and (5,5,5,5), GetValue could observe (0,0,0,5) or (0,0,5,5) or (5,5,0,0), even though none of those values was ever assigned using SetValue.

The reason behind the “tearing” is that the assignment “_value = value” doesn’t execute atomically at the hardware level. Similarly, the read of _value also doesn’t execute atomically.

The C# ECMA specification guarantees that the following types will be written atomically: reference types, bool, char, byte, sbyte, short, ushort, uint, int and float. Values of other types—including user-defined value types—could be written into memory in multiple atomic writes. As a result, a reading thread could observe a torn value consisting of pieces of different values.

One caveat is that even the types that are normally read and written atomically (such as int) could be read or written non-atomically if the value is not correctly aligned in memory. Normally, C# will ensure that values are correctly aligned, but the user is able to override the alignment using the StructLayoutAttribute class ( bit.ly/Tqa0MZ ).

Non-Reordering Optimizations Some compiler optimizations may introduce or eliminate certain memory operations. For example, the compiler might replace repeated reads of a field with a single read. Similarly, if code reads a field and stores the value in a local variable and then repeatedly reads the variable, the compiler could choose to repeatedly read the field instead.

Because the ECMA C# spec doesn’t rule out the non-reordering optimizations, they’re presumably allowed. In fact, as I’ll discuss in Part 2, the JIT compiler does perform these types of optimizations.

Thread Communication Patterns

The purpose of a memory model is to enable thread communication. When one thread writes values to memory and another thread reads from memory, the memory model dictates what values the reading thread might see.

Locking Locking is typically the easiest way to share data among threads. If you use locks correctly, you basically don’t have to worry about any of the memory model messiness.

Whenever a thread acquires a lock, the CLR ensures that the thread will see all updates made by the thread that held the lock earlier. Let’s add locking to the example from the beginning of this article, as shown in Figure 4 .

Figure 4 Thread Communication with Locking
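A sketch of the locked version (the name of the lock object is assumed):

    public class DataInit {
      private int _data = 0;
      private bool _initialized = false;
      private readonly object _sync = new object();

      public void Init() {
        lock (_sync) {          // updates happen inside the lock
          _data = 42;
          _initialized = true;
        }
      }

      public void Print() {
        lock (_sync) {          // reads take the same lock
          if (_initialized)
            Console.WriteLine(_data);
          else
            Console.WriteLine("Not initialized");
        }
      }
    }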

Adding a lock that Print and Set acquire provides a simple solution. Now, Set and Print execute atomically with respect to each other. The lock statement guarantees that the bodies of Print and Set will appear to execute in some sequential order, even if they’re called from multiple threads.

The diagram in Figure 5 shows one possible sequential order that could happen if Thread 1 calls Print three times, Thread 2 calls Set once and Thread 3 calls Print once.

Figure 5 Sequential Execution with Locking

When a locked block of code executes, it’s guaranteed to see all writes from blocks that precede the block in the sequential order of the lock. Also, it’s guaranteed not to see any of the writes from blocks that follow it in the sequential order of the lock.

In short, locks hide all of the unpredictability and complexity weirdness of the memory model: You don’t have to worry about the reordering of memory operations if you use locks correctly. However, note that the use of locking has to be correct. If only Print or Set uses the lock—or Print and Set acquire two different locks—memory operations can become reordered and the complexity of the memory model comes back.

Publication via Threading API Locking is a very general and powerful mechanism for sharing state among threads. Publication via threading API is another frequently used pattern of concurrent programming.

The easiest way to illustrate publication via threading API is by way of an example:
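    // sketch (reconstructed); uses System.Threading.Tasks
    static int s_value;   // ordinary, non-volatile field

    static void Main() {
      s_value = 42;
      Task t = Task.Factory.StartNew(() => {
        Console.WriteLine(s_value);   // guaranteed to print 42
      });
      t.Wait();
    }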

When you examine the preceding code sample, you’d probably expect “42” to be printed to the screen. And, in fact, your intuition would be correct. This code sample is guaranteed to print “42.”

It might be surprising that this case even needs to be mentioned, but in fact there are possible implementations of StartNew that would allow “0” to be printed instead of “42,” at least in theory. After all, there are two threads communicating via a non-volatile field, so memory operations can be reordered. The pattern is displayed in the diagram in Figure 6 .

Figure 6 Two Threads Communicating via a Non-Volatile Field

The StartNew implementation must ensure that the write to s_value on Thread 1 will not move after <start task t>, and the read from s_value on Thread 2 will not move before <task t starting>. And, in fact, the StartNew API really does guarantee this.

All other threading APIs in the .NET Framework, such as Thread.Start and ThreadPool.QueueUserWorkItem, also make a similar guarantee. In fact, nearly every threading API must have some barrier semantics in order to function correctly. These are almost never documented, but can usually be deduced simply by thinking about what the guarantees would have to be in order for the API to be useful.

Publication via Type Initialization Another way to safely publish a value to multiple threads is to write the value to a static field in a static initializer or a static constructor. Consider this example:
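A sketch of the kind of class under discussion (the exact field values in the static constructor, including its transient intermediate state, are assumed):

    class Test3 {
      private static int s_value;
      private static bool s_initialized;

      static Test3() {           // runs exactly once, before first use
        s_initialized = true;    // transient state a racing reader
        s_value = 42;            // could otherwise observe
        s_initialized = false;   // final values: 42 and false
      }

      public static void PrintValue() {
        Console.WriteLine(s_value);
        Console.WriteLine(s_initialized);
      }
    }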

If Test3.PrintValue is called from multiple threads concurrently, is it guaranteed that each PrintValue call will print “42” and “false”? Or, could one of the calls also print “0” or “true”? Just as in the previous case, you do get the behavior you’d expect: Each thread is guaranteed to print “42” and “false.”

The patterns discussed so far all behave as you’d expect. Now I’ll get to cases whose behavior may be surprising.

Publication via Volatile Field Many concurrent programs can be built using the three simple patterns discussed so far, used together with concurrency primitives in the .NET System.Threading and System.Collections.Concurrent namespaces.

The pattern I’m about to discuss is so important that the semantics of the volatile keyword were designed around it. In fact, the best way to remember the volatile keyword semantics is to remember this pattern, instead of trying to memorize the abstract rules explained earlier in this article.

Let’s start with the example code in Figure 7 . The DataInit class in Figure 7 has two methods, Init and Print; both may be called from multiple threads. If no memory operations are reordered, Print can only print “Not initialized” or “42,” but there are two possible cases when Print could print a “0”:

  • Write 1 and Write 2 were reordered.
  • Read 1 and Read 2 were reordered.

Figure 7 Using the Volatile Keyword
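A sketch of the figure’s code (reconstructed; only _initialized is volatile):

    public class DataInit {
      private int _data = 0;
      private volatile bool _initialized = false;

      public void Init() {
        _data = 42;            // Write 1 (ordinary)
        _initialized = true;   // Write 2 (volatile)
      }

      public void Print() {
        if (_initialized)                // Read 1 (volatile)
          Console.WriteLine(_data);      // Read 2 (ordinary)
        else
          Console.WriteLine("Not initialized");
      }
    }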

If _initialized were not marked as volatile, both reorderings would be permitted. However, when _initialized is marked as volatile, neither reordering is allowed! In the case of writes, you have an ordinary write followed by a volatile write, and a volatile write can’t be reordered with a prior memory operation. In the case of the reads, you have a volatile read followed by an ordinary read, and a volatile read can’t be reordered with a subsequent memory operation.

So, Print will never print “0,” even if called concurrently with Init on a new instance of DataInit.

Note that if the _data field is volatile but _initialized is not, both reorderings would be permitted. As a result, remembering this example is a good way to remember the volatile semantics.

Lazy Initialization One common variant of publication via volatile field is lazy initialization. The example in Figure 8 illustrates lazy initialization.

Figure 8 Lazy Initialization
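A sketch of the figure’s code (reconstructed; the class names and the BoxedInt constructor are assumed):

    class BoxedInt {
      public int Value;
      public BoxedInt(int value) { Value = value; }
    }

    class LazyInitExample {
      private volatile BoxedInt _box;

      public int LazyGet() {
        BoxedInt b = _box;      // volatile read
        if (b == null) {
          b = new BoxedInt(42);
          _box = b;             // volatile write publishes the box
        }
        return b.Value;         // always 42
      }
    }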

In this example, LazyGet is always guaranteed to return “42.” However, if the _box field were not volatile, LazyGet would be allowed to return “0” for two reasons: the reads could get reordered, or the writes could get reordered.

To further emphasize the point, consider this class:
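    class BoxedInt {   // sketch (reconstructed)
      private int _value = 42;   // set during construction

      public void PrintValue() {
        Console.WriteLine(_value);
      }
    }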

Now, it’s possible—at least in theory—that PrintValue will print “0” due to a memory-model issue. Here’s a usage example of BoxedInt that allows it:
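    class Tester {   // sketch (class and method names assumed)
      private BoxedInt _box;     // note: NOT volatile

      public void Set() {
        _box = new BoxedInt();   // incorrect publication
      }

      public void Print() {
        BoxedInt b = _box;
        if (b != null)
          b.PrintValue();        // may, in theory, print 0
      }
    }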

Because the BoxedInt instance was incorrectly published (through a non-volatile field, _box), the thread that calls Print may observe a partially constructed object! Again, making the _box field volatile would fix the issue.

Interlocked Operations and Memory Barriers Interlocked operations are atomic operations that can be used at times to reduce locking in a multithreaded program. Consider this simple thread-safe counter class:
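    class Counter {   // sketch (reconstructed; lock field name assumed)
      private int _value;
      private readonly object _sync = new object();

      public int Increment() {
        lock (_sync) {
          _value++;
          return _value;
        }
      }
    }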

Using Interlocked.Increment, you can rewrite the program like this:
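    class Counter {   // sketch (reconstructed)
      private int _value;

      public int Increment() {
        return Interlocked.Increment(ref _value);   // atomic, no lock needed
      }
    }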

As rewritten with Interlocked.Increment, the method should execute faster, at least on some architectures. In addition to the increment operations, the Interlocked class ( bit.ly/RksCMF ) exposes methods for various atomic operations: adding a value, conditionally replacing a value, replacing a value and returning the original value, and so forth.

All Interlocked methods have one very interesting property: They can’t be reordered with other memory operations. So no memory operation, whether before or after an Interlocked operation, can pass an Interlocked operation.

An operation that’s closely related to Interlocked methods is Thread.MemoryBarrier, which can be thought of as a dummy Interlocked operation. Just like an Interlocked method, Thread.MemoryBarrier can’t be reordered with any prior or subsequent memory operations. Unlike an Interlocked method, though, Thread.MemoryBarrier has no side effect; it simply constrains memory reorderings.

Polling Loop Polling loop is a pattern that’s generally not recommended but—somewhat unfortunately—frequently used in practice. Figure 9 shows a broken polling loop.

Figure 9 Broken Polling Loop
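A sketch of the figure’s code (reconstructed from the field and variable names used below; uses System.Threading):

    class PollingLoopExample {
      private bool _loop = true;   // note: NOT volatile

      public static void Main() {
        PollingLoopExample test1 = new PollingLoopExample();

        // Set _loop to false on another thread
        new Thread(() => { test1._loop = false; }).Start();

        // Poll the _loop field until it is set to false
        while (test1._loop) ;

        // The loop above may never terminate
      }
    }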

In this example, the main thread loops, polling a particular non-volatile field. A helper thread sets the field in the meantime, but the main thread may never see the updated value.

Now, what if the _loop field was marked volatile? Would that fix the program? The general expert consensus seems to be that the compiler isn’t allowed to hoist a volatile field read out of a loop, but it’s debatable whether the ECMA C# specification makes this guarantee.

On one hand, the specification states only that volatile fields obey the acquire-release semantics, which doesn’t seem sufficient to prevent hoisting of a volatile field. On the other hand, the example code in the specification does in fact poll a volatile field, implying that the volatile field read can’t be hoisted out of the loop.

On x86 and x64 architectures, PollingLoopExample.Main will typically hang. The JIT compiler will read test1._loop field just once, save the value in a register, and then loop until the register value changes, which will obviously never happen.

If the loop body contains some statements, however, the JIT compiler will probably need the register for some other purpose, so each iteration may end up rereading test1._loop. As a result, you may end up seeing loops in existing programs that poll a non-volatile field and yet happen to work.

Concurrency Primitives Much concurrent code can benefit from high-level concurrency primitives that became available in the .NET Framework 4. Figure 10 lists some of the .NET concurrency primitives.

Figure 10 Concurrency Primitives in the .NET Framework 4

By using these primitives, you can often avoid low-level code that depends on the memory model in intricate ways (via volatile and the like).

So far, I’ve described the C# memory model as defined in the ECMA C# specification, and discussed the most important patterns of thread communication that define the memory model.

The second part of this article will explain how the memory model is actually implemented on different architectures, which is helpful for understanding the behavior of programs in the real world.

Best Practices

  • All code you write should rely only on the guarantees made by the ECMA C# specification, and not on any of the implementation details explained in this article.
  • Avoid unnecessary use of volatile fields. Most of the time, locks or concurrent collections (System.Collections.Concurrent.*) are more appropriate for exchanging data between threads. In some cases, volatile fields can be used to optimize concurrent code, but you should use performance measurements to validate that the benefit outweighs the extra complexity.
  • Instead of implementing the lazy initialization pattern yourself using a volatile field, use the System.Lazy<T> and System.Threading.LazyInitializer types.
  • Avoid polling loops. Often, you can use a BlockingCollection<T>, Monitor.Wait/Pulse, events or asynchronous programming instead of a polling loop.
  • Whenever possible, use the standard .NET concurrency primitives instead of implementing equivalent functionality yourself.

Igor Ostrovsky is a senior software development engineer at Microsoft. He has worked on Parallel LINQ, the Task Parallel Library, and other parallel libraries and primitives in the Microsoft .NET Framework. Ostrovsky blogs about programming topics at igoro.com .

Thanks to the following technical expert for reviewing this article: Joe Duffy, Eric Eilebrecht, Joe Hoag, Emad Omara, Grant Richins, Jaroslav Sevcik and Stephen Toub


PCRS Homepage


Python Programming Modules

Project leads: Jennifer Campbell and Paul Gries.

This website contains materials for use by other instructors developing a blended or online course where Python programming skills are being developed. The materials are licensed under a Creative Commons Attribution 4.0 International license, so they can be modified with attribution to the authors.

The curriculum consists of twelve weeks of modules. Each module consists of video content that is available through the web via the links below. At the University of Toronto, the video content is supplemented with online, interactive exercises served through PCRS . The PCRS software is available for download , but the Python exercises used at U of T are not available for distribution.

  • Python as a Calculator (Video 1, 5:35)
  • Python as a Calculator (Video 2, 2:02)
  • Python as a Calculator (Video 3, 2:44)
  • Python and Computer Memory (2:42)
  • Variables (Video 1, 5:37)
  • Variables (Video 2, 0:27)
  • Variables (Video 3, 1:36)
  • Visualizing Assignment Statements (2:45)
  • Built-in Functions (Video 1, 2:08)
  • Built-in Functions (Video 2) (1:46)
  • Built-in Functions (Video 3)
  • Arithmetic Operators: mod (Exercise explanation, 1:31)
  • Arithmetic operators: precedence (Exercise explanation, 3:06)
  • Arithmetic Operators: division, modulo (Exercise explanation, 2:23)
  • Built-in Functions (Exercise explanation, 1:49)
  • Built-in Functions: help (Exercise explanation, 2:37)
  • Built-in Functions: round (Exercise explanation)
  • Variable Assignment Memory Model (Problem explanation, 2:02)
  • Variable Swap (Problem explanation, 3:03)
  • Changing variable values (Problem explanation, 4:18)
  • Defining Functions (Video 1, 2:39)
  • Defining Functions (Video 2, 4:02)
  • Type str: Strings in Python (Video 1, 2:26)
  • Type str: Strings in Python (Video 2, 2:28)
  • Input/Output and str formatting (Video 2, 2:51)
  • Input/Output and str formatting (Video 3, 3:07)
  • Input/Output and str formatting (Video 1, 4:17)
  • Docstrings and Function help (0:54)
  • Function Design Recipe (Video 1, 3:18)
  • Function Design Recipe (Video 2, 2:12)
  • Function Reuse (Video 1, 3:41)
  • Function Reuse (Video 2, 1:50)
  • Visualizing Function Calls (5:35)
  • Constants (2:09)
  • Function Definition (Problem explanation, 5:29)
  • Repeat Word (Problem explanation, 2:49)
  • Defining function distance (Worked example, 5:27)
  • calculate_tax (Problem explanation)
  • number_of_cents (Problem explanation)
  • Function Reuse (Problem explanation, 3:10)
  • Nested Function Calls (Problem explanation)
  • Import: Using Non-Builtin Functions (5:26)
  • Type Bool (Video 1, 2:19)
  • Type Bool (Video 2, 1:28)
  • Type Bool (Video 3, 3:35)
  • Type Bool (Video 4, 2:48)
  • Converting between int, str, and float (Video 1, 2:04)
  • Converting between int, str, and float (Video 2, 1:12)
  • If statements (Video 1, 3:40)
  • If Statements (Video 2, 5:08)
  • No if required (4:15)
  • Structuring If Statements (Video 1, 2:33)
  • Structuring If Statements (Video 2, 3:32)
  • More str Operators (2:12)
  • str: Indexing and Slicing (4:53)
  • A1 Checker and PyTA
  • earlier_name (0:49)
  • format_name (1:28)
  • ticket_price (2:09)
  • No if required (3:04)
  • Function is_right_triangle (1:20)
  • different_types (1:56)
  • same_abs (1:05)
  • print vs. return (Problem explanation, 1:55)
  • String indexing (Problem explanation, 2:16)
  • String slicing (Problem explanation, 1:25)
  • String: in operator and function len (Problem explanation, 4:11)
  • str methods: Functions inside Objects (3:41)
  • for loop over str (Video 1, 2:21)
  • for loop over str (Video 2, 4:36)
  • for loop over str (Video 3, 4:32)
  • Functions, Variables, and the Call Stack (4:31)
  • Wing's Debugger
  • while Loops (Video 1, 2:29)
  • while Loops (Video 2, 4:50)
  • while Loops (Video 3, 4:58)
  • str method: lower (Problem explanation, 3:07)
  • str methods: find (Problem explanation, 3:01)
  • str: alphanumeric methods (Problem explanation, 2:13)
  • Add Underscores (Problem explanation, 3:29)
  • All Fluffy (Problem explanation, 4:46)
  • Count Uppercase (Problem explanation, 2:15)
  • Is IP Address (Problem explanation, 8:01)
  • Enter a number (4:07)
  • every_nth_character (Problem explanation, 2:41)
  • find_letter_n_times (Problem explanation, 4:17)
  • Comments (4:56)
  • Type list (Video 1, 3:18)
  • Type list (Video 2, 1:03)
  • List Methods (Video 1, 2:44)
  • List Methods (Video 2, 2:54)
  • List Methods (Video 3, 2:23)
  • Module typing
  • Mutability and Aliasing (Video 1, 4:17)
  • Mutability and Aliasing (Video 2, 5:45)
  • range (3:34)
  • for Loops over Indices (Video 1, 7:40)
  • for Loops over Indices (Video 2, 6:39)
  • Parallel Lists and Strings (Video 1, 3:36)
  • Parallel Lists and Strings (Video 2, 2:22)
  • Collatz Conjecture (Problem explanation, 1:45)
  • List aliasing (Problem explanation, 3:35)
  • List methods (Problem explanation, 1:20)
  • List operations and methods (Problem explanation, 3:38)
  • Function range (Problem explanation, 4:47)
  • Collect Underperformers (Problem explanation, 2:24)
  • Scale midterm grades (Problem explanation, 3:43)
  • Calculate Triangle Areas (3:48)
  • Greatest Difference (Problem explanation, 2:54)
  • Stretch String (Problem explanation, 3:31)
  • Nested Lists (4:03)
  • Nested Loops (Video 1, 2:16)
  • Nested Loops (Video 2, 11:35)
  • Nested list: creation (Problem explanation, 3:14)
  • Nested list: indexing (Problem explanation, 2:07)
  • Nested list: len (Problem explanation, 2:40)
  • can_pay_with_two_coins (Problem explanation, 4:15)
  • digital_sums (Problem explanation, 2:41)
  • Reading Files (4:43)
  • Writing Files (5:18)
  • Tuples (2:07)
  • Type dict (Video 1, 3:21)
  • Type dict (Video 2, 6:32)
  • Inverting a Dictionary (4:29)
  • Populating a Dictionary (7:18)
  • File writing (Problem explanation, 1:47)
  • is_correct (Problem explanation, 2:52)
  • write_ascii_triangle (Problem explanation, 3:40)
  • File reading multiple-choice (Problem explanation, 2:47)
  • build_placements (Problem explanation, 2:31)
  • express_checkout (Problem explanation, 2:29)
  • get_team (2:20)
  • name_to_binomial (Problem explanation, 2:36)
  • animal_to_locomotion (Problem explanation, 2:58)
  • Palindrome: Approaching the Problem (Video 1, 0:49)
  • Palindrome: Approaching the Problem (Video 2, 4:26)
  • Palindrome: Approaching the Problem (Video 3, 1:40)
  • Palindrome: Algorithm 1 (5:44)
  • Palindrome: Algorithm 2 (Video 1, 4:12)
  • Palindrome: Algorithm 2 (Video 2, 2:03)
  • Palindrome: Algorithm 3 (5:08)
  • The Restaurant Recommendation Problem (2:44)
  • Restaurant Recommendations: Representing the Data (6:19)
  • Restaurant Recommendations: Planning the Program (Video 1, 7:23)
  • Restaurant Recommendations: Planning the Program (Video 2, 5:48)
  • Restaurant Recommendations: Planning the Program (Video 3, 3:36)
  • Random Story Generation 1 (Problem explanation, 0:16)
  • Random Story Generation 1 (Problem explanation, 0:38)
  • Random Story Generation 1 (Problem explanation, 0:41)
  • Random Story Generation 1 (Problem Explanation, 0:19)
  • Random Story Generation: representing the context and next words
  • Random Story Generation: Planning the program (5:48)
  • Testing Automatically using doctest (Video 1, 6:17)
  • Testing Automatically using doctest (Video 2, 0:27)
  • Writing a '__main__' Program (Video 1, 2:57)
  • Writing a '__main__' Program (Video 2, 1:45)
  • Intro to Classes (Video 1, 1:17)
  • Intro to Classes (Video 2, 1:45)
  • Intro to Classes (Video 3, 1:02)
  • Intro to Classes (Video 4, 2:11)
  • Testing Automatically using unittest (Video 1, 2:07)
  • Testing Automatically using unittest (Video 2, 2:32)
  • Choosing Test Cases (Video 1, 3:06)
  • Choosing Test Cases (Video 2, 3:56)
  • Testing Functions that Mutate Values (3:26)
  • Random Story Generation: conclusion (Problem explanation, 11:38)
  • Choosing tests for all_fluffy (Problem explanation, 5:02)
  • Choosing tests for is_teenager (Problem explanation, 3:09)
  • Choosing tests for same_abs (Problem explanation, 0:51)
  • Test most_popular (Problem explanation, 2:17)
  • Testing collect_underperformers mutation (Problem explanation, 3:10)
  • unittests for collect_underperformers (Problem explanation, 5:33)
  • Analyzing Algorithms (6:36)
  • Linear Search (Video 1, 1:34)
  • Linear Search (Video 2, 1:22)
  • Linear Search (Video 3, 3:44)
  • Binary Search (Video 1, 3:02)
  • Binary Search (Video 2, 1:24)
  • Binary Search (Video 3, 1:49)
  • Binary Search (Video 4, 0:25)
  • Comparing Search Algorithms (Video 1, 1:46)
  • Comparing Search Algorithms (Video 2, 6:09)
  • Comparing Search Algorithms (Video 3, 4:09)
  • Comparing Search Algorithms (Video 4, 0:22)
  • Bubble Sort (Video 1, 4:13)
  • Bubble Sort (Video 2, 1:39)
  • Selection Sort (Video 1, 2:18)
  • Selection Sort (Video 2, 1:23)
  • Insertion Sort (Video 1, 2:11)
  • Insertion Sort (Video 2, 2:42)
  • Runtime complexity (Problem explanation, video 1, 4:34)
  • Runtime complexity (Problem explanation, video 2, 2:05)
  • Identify Sorting Algorithms (Problem explanation, 5:11)
  • Insertion Sort: Best case (Problem explanation, 4:07)
  • Insertion Sort: Worst case (Problem explanation, 7:58)
  • Selection Sort: Analysis (Problem explanation, 4:22)
  • Creating a New Type (Video 1, 1:32)
  • Creating a New Type (Video 2, 2:13)
  • Creating a New Type (Video 3, 1:34)
  • Creating a New Type (Video 4, 2:21)
  • Creating a New Type (Video 5, 1:01)
  • Plugging Your Classes Into Python Syntax (Video 1, 2:02)
  • Plugging Your Classes Into Python Syntax (Video 2, 8:51)
  • Writing Special Method str (Video 1, 4:41)
  • Writing Special Method str (Video 2, 0:27)
  • Creating Classes That Interact (Video 1, 0:33)
  • Creating Classes That Interact (Video 2, 1:33)
  • Creating Classes That Interact (Video 3, 4:28)
  • Passing Functions as Arguments (2:29)
  • Class Event (Problem explanation, 7:32)
  • Class Day (Problem explanation, 6:03)
  • Assigning Parameters Default Value (Video 1, 1:38)
  • Assigning Parameters Default Value (Video 2, 1:10)
  • Assigning Parameters Default Value (Video 3, 1:31)
  • Dealing With Exceptional Situations (7:28)
  • Class Day: Improved (Problem explanation, 4:28)
  • Class Day: schedule_multiple_event method (Problem explanation, 2:28)


Memory management

Low-level languages like C have manual memory management primitives such as malloc() and free(). In contrast, JavaScript automatically allocates memory when objects are created and frees it when they are not used anymore (garbage collection). This automaticity is a potential source of confusion: it can give developers the false impression that they don't need to worry about memory management.

Memory life cycle

Regardless of the programming language, the memory life cycle is pretty much always the same:

  • Allocate the memory you need
  • Use the allocated memory (read, write)
  • Release the allocated memory when it is not needed anymore

The second part is explicit in all languages. The first and last parts are explicit in low-level languages but are mostly implicit in high-level languages like JavaScript.

Allocation in JavaScript

Value initialization

In order to not bother the programmer with allocations, JavaScript will automatically allocate memory when values are initially declared.
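
For instance (a minimal illustration; any declarations would do):

```js
const n = 123; // allocates memory for a number
const s = "a string"; // allocates memory for a string
const o = { a: 1, b: null }; // allocates memory for an object and its contained values
const arr = [1, null, "str"]; // allocates memory for an array and its contained values
function f(x) { // allocates a function (a callable object)
  return x * 2;
}
```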

Allocation via function calls

Some function calls result in object allocation.

Some methods allocate new values or objects:
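
For example (a sketch; document.createElement assumes a browser environment):

```js
const d = new Date(); // constructor call allocates a Date object
const div = document.createElement("div"); // allocates a DOM element
const s = "azerty";
const s2 = s.substring(0, 3); // a new string (engines may share the underlying storage)
const a = ["yes", "no"];
const a2 = a.concat(["maybe", "definitely"]); // a new array of 4 elements
```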

Using values

Using values basically means reading and writing in allocated memory. This can be done by reading or writing the value of a variable or an object property or even passing an argument to a function.

Release when the memory is not needed anymore

The majority of memory management issues occur at this phase. The most difficult aspect of this stage is determining when the allocated memory is no longer needed.

Low-level languages require the developer to manually determine at which point in the program the allocated memory is no longer needed and to release it.

Some high-level languages, such as JavaScript, utilize a form of automatic memory management known as garbage collection (GC). The purpose of a garbage collector is to monitor memory allocation and determine when a block of allocated memory is no longer needed and reclaim it. This automatic process is an approximation since the general problem of determining whether or not a specific piece of memory is still needed is undecidable .

Garbage collection

As stated above, the general problem of automatically finding whether some memory "is not needed anymore" is undecidable. As a consequence, garbage collectors implement a restriction of a solution to the general problem. This section will explain the concepts that are necessary for understanding the main garbage collection algorithms and their respective limitations.

The main concept that garbage collection algorithms rely on is the concept of reference . Within the context of memory management, an object is said to reference another object if the former has access to the latter (either implicitly or explicitly). For instance, a JavaScript object has a reference to its prototype (implicit reference) and to its properties values (explicit reference).

In this context, the notion of an "object" is extended to something broader than regular JavaScript objects; it also includes function scopes (and the global lexical scope).

Reference-counting garbage collection

Note: no modern JavaScript engine uses reference-counting for garbage collection anymore.

This is the most naïve garbage collection algorithm. It reduces the problem from determining whether or not an object is still needed to determining whether an object still has any other objects referencing it. An object is said to be "garbage", or collectible, if there are zero references pointing to it.

For example:
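
A sketch in the spirit of the usual reference-counting illustration:

```js
let x = {
  a: {
    b: 2,
  },
}; // two objects created; the inner one is referenced by the outer one's property 'a'

let y = x; // y is a second reference to the outer object

x = 1; // the outer object is now only referenced by y

y = "mozilla"; // zero references remain: the outer object is "garbage",
// and once it is collected, the inner object loses its last reference too
```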

There is a limitation when it comes to circular references. In the following example, two objects are created with properties that reference one another, thus creating a cycle. They will go out of scope after the function call has completed. At that point they become unneeded and their allocated memory should be reclaimed. However, the reference-counting algorithm will not consider them reclaimable since each of the two objects has at least one reference pointing to them, resulting in neither of them being marked for garbage collection. Circular references are a common cause of memory leaks.
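
For example (a sketch of the cycle just described):

```js
function f() {
  const x = {};
  const y = {};
  x.a = y; // x references y
  y.a = x; // y references x, forming a cycle

  return "azerty";
}

f();
// After the call returns, x and y are unreachable, but each still has
// one incoming reference, so a pure reference counter never frees them.
```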

Mark-and-sweep algorithm

This algorithm reduces the definition of "an object is no longer needed" to "an object is unreachable".

This algorithm assumes the knowledge of a set of objects called roots. In JavaScript, the root is the global object. Periodically, the garbage collector will start from these roots, find all objects that are referenced from these roots, then all objects referenced from these, etc. Starting from the roots, the garbage collector will thus find all reachable objects and collect all non-reachable objects.

This algorithm is an improvement over the previous one since an object having zero references is effectively unreachable. The opposite does not hold true as we have seen with circular references.

Currently, all modern engines ship a mark-and-sweep garbage collector. All improvements made in the field of JavaScript garbage collection (generational/incremental/concurrent/parallel garbage collection) over the last few years are implementation improvements of this algorithm, but not improvements over the garbage collection algorithm itself nor its reduction of the definition of when "an object is no longer needed".

The immediate benefit of this approach is that cycles are no longer a problem. In the circular-reference example above, after the function call returns, the two objects are no longer referenced by any resource that is reachable from the global object. Consequently, they will be found unreachable by the garbage collector and have their allocated memory reclaimed.

However, the inability to manually control garbage collection remains. There are times when it would be convenient to manually decide when and what memory is released. In order to release the memory of an object, it needs to be made explicitly unreachable. It is also not possible to programmatically trigger garbage collection in JavaScript — and will likely never be within the core language, although engines may expose APIs behind opt-in flags.

Configuring an engine's memory model

JavaScript engines typically offer flags that expose the memory model. For example, Node.js offers additional options and tools that expose the underlying V8 mechanisms for configuring and debugging memory issues. This configuration may not be available in browsers, and even less so for web pages (via HTTP headers, etc.).

The max amount of available heap memory can be increased with a flag:
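
For example, with Node.js (index.js stands in for your entry script; the value is in megabytes):

```bash
node --max-old-space-size=6000 index.js
```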

We can also expose the garbage collector for debugging memory issues using a flag and the Chrome Debugger :
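
For example (again with an assumed index.js entry script):

```bash
node --expose-gc --inspect index.js
```

With --expose-gc, a global gc() function becomes available so a collection can be triggered from the debugger console.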

Data structures aiding memory management

Although JavaScript does not directly expose the garbage collector API, the language offers several data structures that indirectly observe garbage collection and can be used to manage memory usage.

WeakMaps and WeakSets

WeakMap and WeakSet are data structures whose APIs closely mirror their non-weak counterparts: Map and Set . WeakMap allows you to maintain a collection of key-value pairs, while WeakSet allows you to maintain a collection of unique values, both with performant addition, deletion, and querying.

WeakMap and WeakSet got their names from the concept of weakly held values. If x is weakly held by y , it means that although you can access the value of x via y , the mark-and-sweep algorithm won't consider x as reachable if nothing else strongly holds to it. Most data structures, except the ones discussed here, strongly hold to the objects passed in so that you can retrieve them at any time. The keys of WeakMap and WeakSet can be garbage-collected (for WeakMap objects, the values would then be eligible for garbage collection as well) as long as nothing else in the program is referencing the key. This is ensured by two characteristics:

  • WeakMap and WeakSet can only store objects or symbols. This is because only objects are garbage collected — primitive values can always be forged (that is, 1 === 1 but {} !== {} ), making them stay in the collection forever. Registered symbols (like Symbol.for("key") ) can also be forged and thus not garbage collectable, but symbols created with Symbol("key") are garbage collectable. Well-known symbols like Symbol.iterator come in a fixed set and are unique throughout the lifetime of the program, similar to intrinsic objects such as Array.prototype , so they are also allowed as keys.
  • WeakMap and WeakSet are not iterable. This prevents you from using Array.from(map.keys()).length to observe the liveliness of objects, or get hold of an arbitrary key which should otherwise be eligible for garbage collection. (Garbage collection should be as invisible as possible.)

In typical explanations of WeakMap and WeakSet (such as the one above), it's often implied that the key is garbage-collected first, freeing the value for garbage collection as well. However, consider the case of the value referencing the key:
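
For example (a small sketch):

```js
const wm = new WeakMap();
const key = {};
wm.set(key, { key }); // the value object holds a reference back to its own key
```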

If key is stored as an actual reference, it would create a cyclic reference and make both the key and value ineligible for garbage collection, even when nothing else references key . (If key were garbage collected, then at some particular instant value.key would point to a non-existent address, which is not legal.) To fix this, the entries of WeakMap and WeakSet aren't actual references, but ephemerons , an enhancement to the mark-and-sweep mechanism. Barros et al. offer a good summary of the algorithm (page 4). To quote a paragraph:

Ephemerons are a refinement of weak pairs where neither the key nor the value can be classified as weak or strong. The connectivity of the key determines the connectivity of the value, but the connectivity of the value does not affect the connectivity of the key. […] when the garbage collection offers support to ephemerons, it occurs in three phases instead of two (mark and sweep).

As a rough mental model, think of a WeakMap as the following implementation:
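
```js
// Rough sketch only: per-map symbol metadata stands in for the real
// engine mechanism, which hooks into garbage collection itself.
class MyWeakMap {
  #marker = Symbol("MyWeakMapData");
  get(key) {
    return key[this.#marker];
  }
  set(key, value) {
    key[this.#marker] = value;
  }
  has(key) {
    return this.#marker in key;
  }
  delete(key) {
    delete key[this.#marker];
  }
}
```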

Warning: This is not a polyfill nor is anywhere close to how it's implemented in the engine (which hooks into the garbage collection mechanism).

As you can see, the MyWeakMap never actually holds a collection of keys. It simply adds metadata to each object being passed in. The object is then garbage-collectable via mark-and-sweep. Therefore, it's not possible to iterate over the keys in a WeakMap , nor clear the WeakMap (as that also relies on the knowledge of the entire key collection).

For more information on their APIs, see the keyed collections guide.

WeakRefs and FinalizationRegistry

Note: WeakRef and FinalizationRegistry offer direct introspection into the garbage collection machinery. Avoid using them where possible because the runtime semantics are almost completely unguaranteed.

All variables with an object as value are references to that object. However, such references are strong — their existence would prevent the garbage collector from marking the object as eligible for collection. A WeakRef is a weak reference to an object that allows the object to be garbage collected, while still retaining the ability to read the object's content during its lifetime.

One use case for WeakRef is a cache system which maps string URLs to large objects. We cannot use a WeakMap for this purpose, because WeakMap objects have their keys weakly held, but not their values — if you access a key, you would always deterministically get the value (since having access to the key means it's still alive). Here, we are okay to get undefined for a key (if the corresponding value is no longer alive) since we can just re-compute it, but we don't want unreachable objects to stay in the cache. In this case, we can use a normal Map , but with each value being a WeakRef of the object instead of the actual object value.
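
A sketch of such a cache (fetchLargeObject is a hypothetical expensive recomputation):

```js
const cache = new Map();

function getCached(url) {
  const ref = cache.get(url);
  if (ref !== undefined) {
    const cached = ref.deref();
    if (cached !== undefined) {
      return cached; // the object is still alive
    }
  }
  const value = fetchLargeObject(url); // hypothetical; recompute the large object
  cache.set(url, new WeakRef(value)); // hold the value only weakly
  return value;
}
```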

FinalizationRegistry provides an even stronger mechanism to observe garbage collection. It allows you to register objects and be notified when they are garbage collected. For example, for the cache system exemplified above, even when the blobs themselves are free for collection, the WeakRef objects that hold them are not — and over time, the Map may accumulate a lot of useless entries. Using a FinalizationRegistry allows one to perform cleanup in this case.
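
Continuing the sketch above, a FinalizationRegistry could prune entries whose values have been collected:

```js
const registry = new FinalizationRegistry((url) => {
  // Only delete if the entry has not been re-populated in the meantime.
  if (cache.get(url)?.deref() === undefined) {
    cache.delete(url);
  }
});

// In getCached, after cache.set(url, new WeakRef(value)):
// registry.register(value, url);
```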

Due to performance and security concerns, there is no guarantee of when the callback will be called, or if it will be called at all. It should only be used for cleanup — and non-critical cleanup. There are other ways for more deterministic resource management, such as try...finally , which will always execute the finally block. WeakRef and FinalizationRegistry exist solely for optimization of memory usage in long-running programs.

For more information on the API of WeakRef and FinalizationRegistry , see their reference pages.

6.6 The Full Python Memory Model: Function Calls

So far in this chapter, we have talked only about variables defined within the Python console. In 2.3 Local Variables and Function Scope , we saw how to represent function scope in the value-based memory model using separate “tables of values” for each function call. In this section, we’ll see how to represent function scope in the full Python memory model so that we can capture exactly how function scope works and impacts the variables we use throughout the lifetime of our programs.

Stack frames

Suppose we define the following function, and then call it in the Python console:
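
A minimal version consistent with the names used below (parameters n and s , local variable message , console variables count and word ):

```python
def repeat(n: int, s: str) -> str:
    """Return the string s repeated n times."""
    message = s * n
    return message

count = 3
word = 'hello'
repeat(count, word)
```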

Consider what the state of memory is when repeat(count, word) is called, immediately before the return message statement executes. Let’s first recall how we would draw the value-based memory model for this point:

This memory model shows two tables: one for the variables defined in the Python console ( count and word ), and one for the variables local to the function repeat ( n , s , and message ).

Here is how we would translate this into a full Python memory model diagram:

[memory model diagram]

As with the diagrams we saw in the previous sections of this chapter, our variables are on the left side of the diagram, and the objects on the right. The variables are separated into two separate boxes, one for the Python console and one for the function call for repeat . All variables, regardless of which box they’re in, store only ids that refer to objects on the right-hand side. Notice that count and n are aliases, as are word and s .

Now that we have this full diagram, we’ll introduce a more formal piece of terminology. Each “box” on the left-hand side of our diagram represents a stack frame (or just frame for short), which is a special data type used by the Python interpreter to keep track of the functions that have been called in a program, and the variables defined within each function. We call the collection of stack frames the function call stack .

Every time we call a function, the Python interpreter does the following:

  • Creates a new stack frame and adds it to the call stack.
  • Evaluates the arguments in the function call, yielding the ids of objects (one per argument). Each of these ids is assigned to the corresponding parameter, as an entry in the new stack frame.
  • Executes the body of the function.
  • When a return statement is executed in the function body, the id of the returned object is saved and the stack frame for the function call is removed from the call stack.

Argument passing and aliasing

What we often call “parameter passing” is a special form of variable assignment in the Python interpreter. In the example above, when we called repeat(count, word) , it is as if we wrote
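
```python
n = count  # the parameter n now refers to the same object as count
s = word   # the parameter s now refers to the same object as word
```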

before executing the body of the function.

This aliasing is what allows us to define functions that mutate their argument values, and have that effect persist after the function ends. Here is an example:
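
One plausible version, matching the parameter name words and the mutation described below:

```python
def emphasize(words: list) -> None:
    """Mutate words by appending emphasis words to the end."""
    words.extend(['believe', 'me!'])

sentence = ['winter', 'is', 'coming']
emphasize(sentence)
print(sentence)  # ['winter', 'is', 'coming', 'believe', 'me!']
```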

When emphasize(sentence) is called in the Python console, this is the state of memory:

[memory model diagram]

In this case, words and sentence are aliases, and so mutating words within the function causes a change to occur in __main__ as well.

On the other hand, consider what happens with this version of the function:
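
A version reconstructed from the two statements discussed below (the new_words assignment and the reassignment of words ):

```python
def emphasize_v2(words: list) -> None:
    new_words = ['believe', 'me!']
    words = words + new_words  # rebinds the local name words only
```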

After we call emphasize_v2 in the Python console, the value of sentence is unchanged! To understand why, let’s look at two memory model diagrams. The first shows the state of memory immediately after new_words = ['believe', 'me!'] is executed:

[memory model diagram]

The next statement to execute is words = words + new_words . The key to understanding the next diagram is to recall variable reassignment : the right-hand side ( words + new_words ) is evaluated, and then the resulting object id is assigned to words . List concatenation with + creates a new list object .

[memory model diagram]

Notice that in this diagram, words and sentence are no longer aliases! Instead, words has been assigned to a new list object, but sentence has remained unchanged. Remember the rule of variable reassignment: an assignment statement <name> = ... only changes what object the variable <name> refers to, but never changes any other variables. This illustrates the importance of keeping variable reassignment and object mutation as distinct concepts. Even though the bodies of emphasize and emphasize_v2 look very similar, the end result is very different: emphasize mutates its argument object, while emphasize_v2 actually leaves it unchanged!

The Go Memory Model

Version of June 6, 2022

Introduction

The Go memory model specifies the conditions under which reads of a variable in one goroutine can be guaranteed to observe values produced by writes to the same variable in a different goroutine.

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.

To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.

If you must read the rest of this document to understand the behavior of your program, you are being too clever.

Don't be clever.

Informal Overview

Go approaches its memory model in much the same way as the rest of the language, aiming to keep the semantics simple, understandable, and useful. This section gives a general overview of the approach and should suffice for most programmers. The memory model is specified more formally in the next section.

A data race is defined as a write to a memory location happening concurrently with another read or write to that same location, unless all the accesses involved are atomic data accesses as provided by the sync/atomic package. As noted already, programmers are strongly encouraged to use appropriate synchronization to avoid data races. In the absence of data races, Go programs behave as if all the goroutines were multiplexed onto a single processor. This property is sometimes referred to as DRF-SC: data-race-free programs execute in a sequentially consistent manner.

While programmers should write Go programs without data races, there are limitations to what a Go implementation can do in response to a data race. An implementation may always react to a data race by reporting the race and terminating the program. Otherwise, each read of a single-word-sized or sub-word-sized memory location must observe a value actually written to that location (perhaps by a concurrently executing goroutine) and not yet overwritten. These implementation constraints make Go more like Java or JavaScript, in that most races have a limited number of outcomes, and less like C and C++, where the meaning of any program with a race is entirely undefined, and the compiler may do anything at all. Go's approach aims to make errant programs more reliable and easier to debug, while still insisting that races are errors and that tools can diagnose and report them.

Memory Model

The following formal definition of Go's memory model closely follows the approach presented by Hans-J. Boehm and Sarita V. Adve in “ Foundations of the C++ Concurrency Memory Model ”, published in PLDI 2008. The definition of data-race-free programs and the guarantee of sequential consistency for race-free programs are equivalent to the ones in that work.

The memory model describes the requirements on program executions, which are made up of goroutine executions, which in turn are made up of memory operations.

A memory operation is modeled by four details:

  • its kind, indicating whether it is an ordinary data read, an ordinary data write, or a synchronizing operation such as an atomic data access, a mutex operation, or a channel operation,
  • its location in the program,
  • the memory location or variable being accessed, and
  • the values read or written by the operation.

Some memory operations are read-like , including read, atomic read, mutex lock, and channel receive. Other memory operations are write-like , including write, atomic write, mutex unlock, channel send, and channel close. Some, such as atomic compare-and-swap, are both read-like and write-like.

A goroutine execution is modeled as a set of memory operations executed by a single goroutine.

Requirement 1 : The memory operations in each goroutine must correspond to a correct sequential execution of that goroutine, given the values read from and written to memory. That execution must be consistent with the sequenced before relation, defined as the partial order requirements set out by the Go language specification for Go's control flow constructs as well as the order of evaluation for expressions .

A Go program execution is modeled as a set of goroutine executions, together with a mapping W that specifies the write-like operation that each read-like operation reads from. (Multiple executions of the same program can have different program executions.)

Requirement 2 : For a given program execution, the mapping W , when limited to synchronizing operations, must be explainable by some implicit total order of the synchronizing operations that is consistent with sequencing and the values read and written by those operations.

The synchronized before relation is a partial order on synchronizing memory operations, derived from W . If a synchronizing read-like memory operation r observes a synchronizing write-like memory operation w (that is, if W ( r ) = w ), then w is synchronized before r . Informally, the synchronized before relation is a subset of the implied total order mentioned in the previous paragraph, limited to the information that W directly observes.

The happens before relation is defined as the transitive closure of the union of the sequenced before and synchronized before relations.

Requirement 3 : For an ordinary (non-synchronizing) data read r on a memory location x , W ( r ) must be a write w that is visible to r , where visible means that both of the following hold:

  • w happens before r .
  • w does not happen before any other write w' (to x ) that happens before r .

A read-write data race on memory location x consists of a read-like memory operation r on x and a write-like memory operation w on x , at least one of which is non-synchronizing, which are unordered by happens before (that is, neither r happens before w nor w happens before r ).

A write-write data race on memory location x consists of two write-like memory operations w and w' on x , at least one of which is non-synchronizing, which are unordered by happens before.

Note that if there are no read-write or write-write data races on memory location x , then any read r on x has only one possible W ( r ): the single w that immediately precedes it in the happens before order.

More generally, it can be shown that any Go program that is data-race-free, meaning it has no program executions with read-write or write-write data races, can only have outcomes explained by some sequentially consistent interleaving of the goroutine executions. (The proof is the same as Section 7 of Boehm and Adve's paper cited above.) This property is called DRF-SC.

The intent of the formal definition is to match the DRF-SC guarantee provided to race-free programs by other languages, including C, C++, Java, JavaScript, Rust, and Swift.

Certain Go language operations such as goroutine creation and memory allocation act as synchronization operations. The effect of these operations on the synchronized-before partial order is documented in the “Synchronization” section below. Individual packages are responsible for providing similar documentation for their own operations.

Implementation Restrictions for Programs Containing Data Races

The preceding section gave a formal definition of data-race-free program execution. This section informally describes the semantics that implementations must provide for programs that do contain races.

Any implementation can, upon detecting a data race, report the race and halt execution of the program. Implementations using ThreadSanitizer (accessed with “ go build -race ”) do exactly this.

A read of an array, struct, or complex number may be implemented as a read of each individual sub-value (array element, struct field, or real/imaginary component), in any order. Similarly, a write of an array, struct, or complex number may be implemented as a write of each individual sub-value, in any order.

A read r of a memory location x holding a value that is not larger than a machine word must observe some write w such that r does not happen before w and there is no write w' such that w happens before w' and w' happens before r . That is, each read must observe a value written by a preceding or concurrent write.

Additionally, observation of acausal and “out of thin air” writes is disallowed.

Reads of memory locations larger than a single machine word are encouraged but not required to meet the same semantics as word-sized memory locations, observing a single allowed write w . For performance reasons, implementations may instead treat larger operations as a set of individual machine-word-sized operations in an unspecified order. This means that races on multiword data structures can lead to inconsistent values not corresponding to a single write. When the values depend on the consistency of internal (pointer, length) or (pointer, type) pairs, as can be the case for interface values, maps, slices, and strings in most Go implementations, such races can in turn lead to arbitrary memory corruption.

Examples of incorrect synchronization are given in the “Incorrect synchronization” section below.

Examples of the limitations on implementations are given in the “Incorrect compilation” section below.

Synchronization

Initialization

Program initialization runs in a single goroutine, but that goroutine may create other goroutines, which run concurrently.

If a package p imports package q , the completion of q 's init functions happens before the start of any of p 's.

The completion of all init functions is synchronized before the start of the function main.main .

Goroutine creation

The go statement that starts a new goroutine is synchronized before the start of the goroutine's execution.

For example, in this program:
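
(A sketch consistent with the description below; hello is named in the text, while a and f are assumed names.)

```go
var a string

func f() {
	print(a)
}

func hello() {
	a = "hello, world"
	go f()
}
```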

calling hello will print "hello, world" at some point in the future (perhaps after hello has returned).

Goroutine destruction

The exit of a goroutine is not guaranteed to be synchronized before any event in the program. For example, in this program:
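
(A sketch; the assignment to a is the one discussed below.)

```go
var a string

func hello() {
	go func() { a = "hello" }()
	print(a)
}
```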

the assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go statement.

If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.

Channel communication

Channel communication is the main method of synchronization between goroutines. Each send on a particular channel is matched to a corresponding receive from that channel, usually in a different goroutine.

A send on a channel is synchronized before the completion of the corresponding receive from that channel.

This program:
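
(A sketch matching the walkthrough below: a write to a , a send on c , then the receive and the print.)

```go
var c = make(chan int, 10)
var a string

func f() {
	a = "hello, world"
	c <- 0
}

func main() {
	go f()
	<-c
	print(a)
}
```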

is guaranteed to print "hello, world" . The write to a is sequenced before the send on c , which is synchronized before the corresponding receive on c completes, which is sequenced before the print .

The closing of a channel is synchronized before a receive that returns a zero value because the channel is closed.

In the previous example, replacing c <- 0 with close(c) yields a program with the same guaranteed behavior.

A receive from an unbuffered channel is synchronized before the completion of the corresponding send on that channel.

This program (as above, but with the send and receive statements swapped and using an unbuffered channel):
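
(A sketch with the send and receive swapped, and c now unbuffered.)

```go
var c = make(chan int)
var a string

func f() {
	a = "hello, world"
	<-c
}

func main() {
	go f()
	c <- 0
	print(a)
}
```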

is also guaranteed to print "hello, world" . The write to a is sequenced before the receive on c , which is synchronized before the corresponding send on c completes, which is sequenced before the print .

If the channel were buffered (e.g., c = make(chan int, 1) ) then the program would not be guaranteed to print "hello, world" . (It might print the empty string, crash, or do something else.)

The k th receive on a channel with capacity C is synchronized before the completion of the k + C th send on that channel.

This rule generalizes the previous rule to buffered channels. It allows a counting semaphore to be modeled by a buffered channel: the number of items in the channel corresponds to the number of active uses, the capacity of the channel corresponds to the maximum number of simultaneous uses, sending an item acquires the semaphore, and receiving an item releases the semaphore. This is a common idiom for limiting concurrency.

This program starts a goroutine for every entry in the work list, but the goroutines coordinate using the limit channel to ensure that at most three are running work functions at a time.
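
(A sketch; work is assumed to be a slice of func() defined elsewhere.)

```go
var limit = make(chan int, 3)

func main() {
	for _, w := range work {
		go func(w func()) {
			limit <- 1 // acquire one of the three slots
			w()
			<-limit // release the slot
		}(w)
	}
	select {}
}
```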

Locks

The sync package implements two lock data types, sync.Mutex and sync.RWMutex .

For any sync.Mutex or sync.RWMutex variable l and n < m , call n of l.Unlock() is synchronized before call m of l.Lock() returns.
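
For example, this program (a sketch consistent with the explanation that follows; the sync import is assumed):

```go
var l sync.Mutex
var a string

func f() {
	a = "hello, world"
	l.Unlock()
}

func main() {
	l.Lock()
	go f()
	l.Lock() // blocks until f calls Unlock
	print(a)
}
```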

is guaranteed to print "hello, world" . The first call to l.Unlock() (in f ) is synchronized before the second call to l.Lock() (in main ) returns, which is sequenced before the print .

For any call to l.RLock on a sync.RWMutex variable l , there is an n such that the n th call to l.Unlock is synchronized before the return from l.RLock , and the matching call to l.RUnlock is synchronized before the return from call n +1 to l.Lock .

A successful call to l.TryLock (or l.TryRLock ) is equivalent to a call to l.Lock (or l.RLock ). An unsuccessful call has no synchronizing effect at all. As far as the memory model is concerned, l.TryLock (or l.TryRLock ) may be considered to be able to return false even when the mutex l is unlocked.

Once

The sync package provides a safe mechanism for initialization in the presence of multiple goroutines through the use of the Once type. Multiple threads can execute once.Do(f) for a particular f , but only one will run f() , and the other calls block until f() has returned.

The completion of a single call of f() from once.Do(f) is synchronized before the return of any call of once.Do(f) .

In this program:
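
(A sketch; setup , doprint , and twoprint are the names used below, and the sync import is assumed.)

```go
var a string
var once sync.Once

func setup() {
	a = "hello, world"
}

func doprint() {
	once.Do(setup)
	print(a)
}

func twoprint() {
	go doprint()
	go doprint()
}
```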

calling twoprint will call setup exactly once. The setup function will complete before either call of print . The result will be that "hello, world" will be printed twice.

Atomic Values

The APIs in the sync/atomic package are collectively “atomic operations” that can be used to synchronize the execution of different goroutines. If the effect of an atomic operation A is observed by atomic operation B , then A is synchronized before B . All the atomic operations executed in a program behave as though executed in some sequentially consistent order.

The preceding definition has the same semantics as C++’s sequentially consistent atomics and Java’s volatile variables.

Finalizers

The runtime package provides a SetFinalizer function that adds a finalizer to be called when a particular object is no longer reachable by the program. A call to SetFinalizer(x, f) is synchronized before the finalization call f(x) .

Additional Mechanisms

The sync package provides additional synchronization abstractions, including condition variables , lock-free maps , allocation pools , and wait groups . The documentation for each of these specifies the guarantees it makes concerning synchronization.

Other packages that provide synchronization abstractions should document the guarantees they make too.

Incorrect synchronization

Programs with races are incorrect and can exhibit non-sequentially consistent executions. In particular, note that a read r may observe the value written by any write w that executes concurrently with r . Even if this occurs, it does not imply that reads happening after r will observe writes that happened before w .
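
In this program (a sketch consistent with the outcome described next):

```go
var a, b int

func f() {
	a = 1
	b = 2
}

func g() {
	print(b)
	print(a)
}

func main() {
	go f()
	g()
}
```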

it can happen that g prints 2 and then 0 .

This fact invalidates a few common idioms.

Double-checked locking is an attempt to avoid the overhead of synchronization. For example, the twoprint program might be incorrectly written as:
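
(A sketch; done and the once variable are assumptions consistent with the description below.)

```go
var a string
var done bool
var once sync.Once

func setup() {
	a = "hello, world"
	done = true
}

func doprint() {
	if !done { // unsynchronized check to skip the Do call
		once.Do(setup)
	}
	print(a)
}

func twoprint() {
	go doprint()
	go doprint()
}
```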

but there is no guarantee that, in doprint , observing the write to done implies observing the write to a . This version can (incorrectly) print an empty string instead of "hello, world" .

Another incorrect idiom is busy waiting for a value, as in:
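
(A sketch; main spins on the unsynchronized flag done .)

```go
var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func main() {
	go setup()
	for !done { // busy wait, no synchronization
	}
	print(a)
}
```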

As before, there is no guarantee that, in main , observing the write to done implies observing the write to a , so this program could print an empty string too. Worse, there is no guarantee that the write to done will ever be observed by main , since there are no synchronization events between the two threads. The loop in main is not guaranteed to finish.

There are subtler variants on this theme, such as this program:
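
(A sketch; g and g.msg are the names referenced below.)

```go
type T struct {
	msg string
}

var g *T

func setup() {
	t := new(T)
	t.msg = "hello, world"
	g = t
}

func main() {
	go setup()
	for g == nil {
	}
	print(g.msg)
}
```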

Even if main observes g != nil and exits its loop, there is no guarantee that it will observe the initialized value for g.msg .

In all these examples, the solution is the same: use explicit synchronization.

Incorrect compilation

The Go memory model restricts compiler optimizations as much as it does Go programs. Some compiler optimizations that would be valid in single-threaded programs are not valid in all Go programs. In particular, a compiler must not introduce writes that do not exist in the original program, it must not allow a single read to observe multiple values, and it must not allow a single write to write multiple values.

All the following examples assume that *p and *q refer to memory locations accessible to multiple goroutines.

Not introducing data races into race-free programs means not moving writes out of conditional statements in which they appear. For example, a compiler must not invert the conditional in this program:
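
(Sketch, using the *p and cond discussed below:)

```go
*p = 1
if cond {
	*p = 2
}
```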

That is, the compiler must not rewrite the program into this one:
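
```go
*p = 2 // inverted conditional: *p is now written unconditionally
if !cond {
	*p = 1
}
```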

If cond is false and another goroutine is reading *p , then in the original program, the other goroutine can only observe any prior value of *p and 1 . In the rewritten program, the other goroutine can observe 2 , which was previously impossible.

Not introducing data races also means not assuming that loops terminate. For example, a compiler must in general not move the accesses to *p or *q ahead of the loop in this program:
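
(Sketch; list , *p , and *q follow the discussion below.)

```go
n := 0
for e := list; e != nil; e = e.next {
	n++
}
i := *p
*q = 1
```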

If list pointed to a cyclic list, then the original program would never access *p or *q , but the rewritten program would. (Moving *p ahead would be safe if the compiler can prove *p will not panic; moving *q ahead would also require the compiler proving that no other goroutine can access *q .)

Not introducing data races also means not assuming that called functions always return or are free of synchronization operations. For example, a compiler must not move the accesses to *p or *q ahead of the function call in this program (at least not without direct knowledge of the precise behavior of f ):
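
(Sketch; f is the opaque function the text describes.)

```go
f() // could contain synchronization, or never return
i := *p
*q = 1
```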

If the call never returned, then once again the original program would never access *p or *q , but the rewritten program would. And if the call contained synchronizing operations, then the original program could establish happens before edges preceding the accesses to *p and *q , but the rewritten program would not.

Not allowing a single read to observe multiple values means not reloading local variables from shared memory. For example, a compiler must not discard i and reload it a second time from *p in this program:
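
(Sketch; funcs and the bounds check are assumptions consistent with the explanation below.)

```go
i := *p
if i < 0 || i >= len(funcs) {
	panic("invalid function index")
}
// ... complex code that needs many registers ...
// the compiler must not reload i = *p here
funcs[i]()
```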

If the complex code needs many registers, a compiler for single-threaded programs could discard i without saving a copy and then reload i = *p just before funcs[i]() . A Go compiler must not, because the value of *p may have changed. (Instead, the compiler could spill i to the stack.)

Not allowing a single write to write multiple values also means not using the memory where a local variable will be written as temporary storage before the write. For example, a compiler must not use *p as temporary storage in this program:
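
(Sketch matching the arithmetic discussed below:)

```go
*p = i + *p/2
```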

That is, it must not rewrite the program into this one:
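
```go
*p /= 2 // uses *p as temporary storage: a racing reader can see this intermediate value
*p += i
```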

If i and *p start equal to 2, the original code does *p = 3 , so a racing thread can read only 2 or 3 from *p . The rewritten code does *p = 1 and then *p = 3 , allowing a racing thread to read 1 as well.

Note that all these optimizations are permitted in C/C++ compilers: a Go compiler sharing a back end with a C/C++ compiler must take care to disable optimizations that are invalid for Go.

Note that the prohibition on introducing data races does not apply if the compiler can prove that the races do not affect correct execution on the target platform. For example, on essentially all CPUs, it is valid to rewrite
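
(A sketch; cond and *shared follow the discussion below.)

```go
i := 0
if cond {
	i = *shared
}
```

into

```go
i := *shared // the read of *shared now happens unconditionally
if !cond {
	i = 0
}
```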

provided it can be proved that *shared will not fault on access, because the potential added read will not affect any existing concurrent reads or writes. On the other hand, the rewrite would not be valid in a source-to-source translator.

Go programmers writing data-race-free programs can rely on sequentially consistent execution of those programs, just as in essentially all other modern programming languages.

When it comes to programs with races, both programmers and compilers should remember the advice: don't be clever.


The optimal way of creating memory model in VHDL

During the development of a block communicating with an external chip, a behavioral model of this device can be very helpful. Sometimes such a model can be found online, but often we need to create it ourselves. One type of external chip frequently connected to an FPGA is some kind of volatile or non-volatile memory. Unfortunately, in VHDL it is often implemented suboptimally.

A seemingly easy memory model implementation

The memory model implementation seems pretty straightforward: just create an interface, add an array of a given size to act as the memory cells (data storage), some read and write functionality, and any additional features that the actual chip has. Apart from the fact that such a memory module doesn't need to be synthesizable, it can be implemented just like any other VHDL design block.

When searching the web for an example of VHDL memory model, we can find many examples such as:

  • The one from Sabanci University
  • Or this one from Doulos
  • This one from Auburn University
  • And this one from GetMyUni
  • And many more

All of them have a similar structure:
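
For example, a generic sketch (the entity name, generics, and port names are placeholders):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity memory_model is
  generic (
    ADDR_WIDTH : natural := 16;
    DATA_WIDTH : natural := 8
  );
  port (
    clk      : in  std_logic;
    we       : in  std_logic;
    addr     : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    data_in  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    data_out : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity memory_model;

architecture behavioral of memory_model is
  type mem_array_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal mem : mem_array_t;  -- memory array declared as a signal
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= data_in;  -- write port
      end if;
      data_out <= mem(to_integer(unsigned(addr)));   -- read port
    end if;
  end process;
end architecture behavioral;
```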

And these generally work correctly and can be used as an example. However, there is a catch: these examples are not optimal . As long as you use them for relatively small memories (from my experience: up to a couple thousand memory words, so an address width in the low teens), they are fine. The problem appears when you need a memory model with a bigger array – let's say several megabits or larger. Once you try to simulate such a model, you may encounter an error regarding memory allocation (this time referring to the physical memory in your computer).

What is the cause of the problem?

In short, this is caused by implementing the memory array as a signal . Why?

VHDL simulators (same as Verilog simulators) use the Discrete Event Simulation (DES) technique – here you can find a general description of DES. Generally, during simulation each signal assignment creates an event . Each event contains at least the type of event (a signal transition from state A to state B ) and a timestamp of when the event will occur.

Why does it matter in this case? We can see that simulating a signal allocates more memory (physical memory in your computer) than the signal declaration suggests. In the case of a bigger memory model, this can become a real problem during simulation. Fortunately, there is a way to fix this.

Memory array model in VHDL – the optimal way

In order to optimally implement a memory model in VHDL, we need to eliminate signal from the array declaration. This can be done using variable . This way we can reduce the memory (physical memory in your computer) required to simulate the array. This is because of how variables work in VHDL compared to signals .

In VHDL, a variable value is assigned immediately (with no delay), unlike a signal , whose value is assigned at the next event (most often during the next clock edge). Therefore, a variable assignment does not require an event. So in total, a variable allocates noticeably less memory (physical memory in your computer) than a signal .

But there is a catch. By default, a variable is a local object, that’s accessible only in a single process block. So it may require a slightly different implementation:
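
For example (a sketch reusing the entity above; the array is now a process-local variable):

```vhdl
architecture behavioral_var of memory_model is
  type mem_array_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
begin
  -- All reads and writes live in this one process,
  -- because the variable is local to it.
  mem_proc : process (clk)
    variable mem : mem_array_t;  -- variable: no event queue entries
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) := data_in;  -- immediate assignment
      end if;
      data_out <= mem(to_integer(unsigned(addr)));
    end if;
  end process mem_proc;
end architecture behavioral_var;
```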

All read and write operations on the memory array must be implemented in a single process block, which may not be a big deal. Also, a multi-port memory implementation is a bit different and definitely less clear. What's even worse, it is more difficult to access the memory contents externally – in a testbench, or simply to view it on a simulator's waveform. Fortunately, this can also be fixed.

In the VHDL 1993 version, a shared variable was introduced. It allows us to define a variable that can be accessed in multiple different process blocks (just like signals), while still behaving like a variable (immediate assignment, so less physical memory allocation). It can be used as such:
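
(A sketch with the same entity; the shared variable is visible to both processes.)

```vhdl
architecture behavioral_shared of memory_model is
  type mem_array_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  shared variable mem : mem_array_t;  -- VHDL 1993 shared variable
begin
  write_proc : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) := data_in;
      end if;
    end if;
  end process write_proc;

  read_proc : process (clk)
  begin
    if rising_edge(clk) then
      data_out <= mem(to_integer(unsigned(addr)));
    end if;
  end process read_proc;
end architecture behavioral_shared;
```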

The VHDL 1993 standard is supported by the majority of HDL development tools, so in most cases it should be possible to use a shared variable to model a memory array.

There are several gains to the shared variable approach:

  • significantly less physical memory allocation (compared to signal )
  • reduced simulation times
  • easy transition from the signal implementation to shared variable implementation
  • similar behavior to signal implementation (multiple process blocks can be used)
  • similar access to memory array as in signal implementation

Generally there are no real disadvantages. The only two far-fetched ones that I can think of are:

  • VHDL 1993 requirement – while newer VHDL versions (like VHDL 2008 or especially VHDL 2019) may not be acceptable for a given project, VHDL 1993 is probably the most widely supported version
  • the immediate value assignment may cause different behavior – but simultaneous read and write operations to the same memory location should generally be avoided in a real application anyway.

References:

  • Modeling memory in VHDL
  • Discrete event modeling: VHDL by Peter Marwedel – Univ. Dortmund Germany
  • The Verilog Hardware Description Language by Professor Don Thomas – Carnegie Mellon University (CMU)
  • VHDL Simulation vs. Synthesis by Erkay Savaş – Sabanci University


