[ACCEPTED]-How is thread synchronization implemented, at the assembly language level?-x86
- In practice, these tend to be implemented with CAS and LL/SC. (...and some spinning before giving up the time slice of the thread - usually by calling into a kernel function that switches context.)
- If you only need a spinlock, wikipedia gives you an example which trades CAS for lock prefixed
xchgon x86/x64. So in a strict sense, a CAS is not needed for crafting a spinlock - but some kind of atomicity is still required. In this case, it makes use of an atomic operation that can write a register to memory and return the previous contents of that memory slot in a single step. (To clarify a bit more: the lock prefix asserts the #LOCK signal that ensures that the current CPU has exclusive access to the memory. On todays CPUs it is not necessarily carried out this way, but the effect is the same. By using
xchgwe make sure that we will not get preempted somewhere between reading and writing, since instructions will not be interrupted half-way. So if we had an imaginary lock mov reg0, mem / lock mov mem, reg1 pair (which we don't), that would not quite be the same - it could be preempted just between the two movs.)
- On current architectures, as pointed out in the comments, you mostly end up using the atomic primitives of the CPU and the coherency protocols provided by the memory subsystem.
- For this reason, you not only have to use these primitives, but also account for the cache/memory coherency guaranteed by the architecture.
- There may be implementation nuances as well. Considering e.g. a spinlock:
- instead of a naive implementation, you should probably use e.g. a TTAS spin-lock with some exponential backoff,
- on a Hyper-Threaded CPU, you should probably issue
pauseinstructions that serve as hints that you're spinning - so that the core you are running on can do something useful during this
- you should really give up on spinning and yield control to other threads after a while
- this is still user mode - if you are writing a kernel, you might have some other tools that you can use as well (since you are the one that schedules threads and handles/enables/disables interrupts).
The x86 architecture, has long had an instruction 22 called
xchg which will exchange the contents 21 of a register with a memory location. xchg 20 has always been atomic.
There has also always 19 been a
lock prefix that could be applied to 18
any a single instruction to make that instruction 17 atomic. Before there were multi processor 16 systems, all this really did was to prevent 15 an interrupt from being delivered in the 14 middle of a locked instruction. (xchg was 13 implicitly locked).
This article has some 12 sample code using xchg to implement a spinlock http://en.wikipedia.org/wiki/Spinlock
When 11 multi CPU and later multi Core systems began 10 to be built, more sophisticated systems 9 were needed to insure that lock and xchg 8 would synchronize all of the memory subsystems, including 7 l1 cache on all of the processors. About 6 this time, new research into locking and 5 lockless algorithms showed that atomic CompareAndSet 4 was a more flexible primitive to have, so 3 more modern CPUs have that as an instruction.
Addendum: In 2 comments andras supplied a "dusty old" list 1 of instructions which allow the
lock prefix. http://pdos.csail.mit.edu/6.828/2007/readings/i386/LOCK.htm
I like to think of thread synchronization 32 as a bottom up where processor and operating 31 system provide construct that are primitive 30 to more sophisticated
At the processor level 29 you have CAS and LL/SC which allow you to 28 perform a test and store in a single atomic 27 operation ... you also have other processor 26 constructs that allow you to disable and 25 enable interrupt (however they are considered 24 dangerous ... under certain circumstances 23 you have no other option but to use them)
operating 22 system provides the ability to context switch 21 between tasks which can happen every time 20 a thread has used its time slice ... or 19 it can happen due to otgher reasons (I will 18 come to that)
then there are higher level 17 constructs like mutexes which uses these 16 primitive mechanisms provided by processor 15 (think spinning mutex) ... which will continuously 14 wait for the condition to become true and 13 checks for that condition atomically
then 12 these spinning mutex can use the functionality 11 provided by OS (context switch and system 10 calls like yield which relinquishes the 9 control to another thread) and gives us 8 mutexes
these constructs are further utilized 7 by higher level constructs like conditional 6 variables (which can keep track of how many 5 threads are waiting for the mutex and which 4 thread to allow first when the mutex becomes 3 available)
These constructs than can be further 2 used to provide more sophisticated synchronization 1 constructs ... example : semaphores etc
More Related questions