Hello All,
I have moved to http://blog.csliu.com; see you there. The RSS feed will remain the same.
6/25/2011
2/26/2011
Memory Issues on Multicore Platform
On a multi-core platform, pure computation is cheap because there are many processing units, and memory capacity is rarely a problem because it keeps growing. Memory bandwidth, however, remains the bottleneck, because the memory bus is shared by all CPU cores. Efficient memory management is therefore critical for a scalable application on a multicore CPU.
In this article I will point out some memory-related problems on multicore architectures, along with some solutions.
Part I - Memory Contention
Memory contention means that different cores share a common data region (in main memory and cache) that needs to be synchronized among them. Synchronizing data among cores carries a big performance penalty because of bus traffic contention, locking cost and cache misses. There are two strategies for dealing with this problem:
1. Don't Share Writable State Among Cores
To minimize memory bus traffic, you should minimize core interactions by minimizing shared locations/data. This holds even if the shared data is not protected by a lock but by hardware-level atomic instructions, such as InterlockedExchangeAdd64 on the Win32 platform.
The patterns that tend to reduce lock contention also tend to reduce memory traffic, because it is the shared writable state that requires locks and generates contention. In practice, letting each thread work on its own local copy of the data and merging the data after all threads are done can be a very effective strategy.
Let's look at two parallel versions of a sum calculation program on an eight-core computer. One version uses a shared global variable, updated with InterlockedExchangeAdd64(), to track the intermediate results of all threads. The other version gives each thread a private partial sum variable that is not shared at all, and the final sum is computed as the sum of these partial sums.
The console output shows clearly that the private partial sum solution is roughly 20x faster than the shared one:
Use Global - Total Sum is:49999995000000, used ticket:904.
Use Local - Total Sum is:49999995000000, used ticket:47.
So, even if we share just one variable protected by hardware atomic instructions, the performance penalty can be very significant.
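The original test program is not reproduced here, so the following is only a minimal sketch of the two versions described above. It assumes standard C++11 and swaps in std::atomic's fetch_add as a portable stand-in for InterlockedExchangeAdd64(); the function names sum_shared and sum_private are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Version 1: every thread adds into one shared atomic counter.
// Each fetch_add makes the cache line holding `total` bounce between cores.
std::uint64_t sum_shared(std::uint64_t n, unsigned num_threads) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, n, num_threads, t] {
            for (std::uint64_t i = t; i < n; i += num_threads)
                total.fetch_add(i, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Version 2: each thread accumulates into a private local variable and
// publishes its partial sum once; the partial sums are merged after join().
std::uint64_t sum_private(std::uint64_t n, unsigned num_threads) {
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&partial, n, num_threads, t] {
            std::uint64_t local = 0;            // private to this thread
            for (std::uint64_t i = t; i < n; i += num_threads)
                local += i;
            partial[t] = local;                 // single write at the end
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

Both functions sum the integers 0..n-1; calling them with n = 10000000 and 8 threads reproduces the total shown in the console output. Timing them (for example with std::chrono) is left out, since the ratio depends heavily on the machine.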
The general rule for efficient execution on a single core is to pack data tightly, so that it has as small a footprint as possible. But on a multi-core processor, packing shared data can lead to a severe penalty from false sharing. Generally, the solution is to pack data tightly, give each thread its own private copy to work on, and merge results afterwards.
2. Avoid False Sharing introduced by Core Cache
Good performance depends on processors fetching most of their data from cache instead of main memory. For sequential programs, modern caches generally work well without too much thought, though a little tuning helps. The smallest unit of memory that two processors interchange is a cache line or cache sector.
Even if we follow strategy 1 and let each thread access only its private data/state, different threads on different cores may still share the same cache line. This is called "false sharing". Avoiding false sharing may require aligning variables or objects in memory on cache line boundaries.
Let's use a parallel number incrementer to see the performance penalty of false sharing. In the first version, each thread modifies its own thread-specific number variables, which are laid out contiguously (so they are packed into the same cache line). In the second version, those variables are placed at non-contiguous locations.
The measured times were:
Total Time:2012 (for first version)
Total Time:468 (for second version)
We can see that false sharing introduced more than a 4x performance penalty (2012 vs. 468). Avoiding false sharing may require aligning variables or objects in memory on cache line boundaries, so that each core accesses a private cache line that is not shared with others.
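As a rough illustration of the two layouts (not the original test program), the sketch below uses C++11 alignas to give each counter its own cache line. The 64-byte line size, the thread count and all names are assumptions; the counters are volatile only so that the compiler keeps the individual stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr std::size_t kCacheLine = 64;  // assumed line size, common on x86
constexpr unsigned kThreads = 4;        // assumed thread count

// Packed layout: neighbouring counters land in the same cache line.
struct PackedCounter {
    volatile std::uint64_t value = 0;
};

// Padded layout: alignas rounds sizeof up to kCacheLine, so every array
// element occupies a cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    volatile std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each thread increments only its own slot; returns the sum of all slots.
template <typename Counter>
std::uint64_t run_increments(std::uint64_t iterations) {
    Counter counters[kThreads];
    std::thread workers[kThreads];
    for (unsigned t = 0; t < kThreads; ++t) {
        workers[t] = std::thread([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i)
                counters[t].value = counters[t].value + 1;
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

run_increments<PackedCounter>(n) and run_increments<PaddedCounter>(n) compute the same result; only the memory layout differs, which is what a timing comparison would expose.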
Part II - Heap Contention
Most developers manage memory with the standard C library's malloc/free or the standard C++ library's new/delete; some use OS APIs such as HeapAlloc()/HeapFree() on the Windows platform.
The C/C++ standard memory management routines are implemented on top of platform-specific memory management APIs, usually based on the concept of a heap. These library routines (both the single-threaded and the multi-threaded versions) allocate and free memory on a single heap, usually called the CRT heap. It is a global resource that is shared, and contended for, by all threads within a process.
This heap contention is one of the bottlenecks of memory-intensive multi-threaded applications. The solution is to manage memory with a thread-local/private heap, which eliminates the resource contention. On the Windows platform, this means creating a dedicated heap with HeapCreate() for each thread and passing the returned heap handle to the HeapAlloc()/HeapFree() functions.
Let's look at a global heap vs. local heap example on the Windows platform.
On an 8-core system, the perf test results using 8 threads are:
8 core time:59282, use global heap? true.
8 core time:20112, use global heap? false.
Using private heaps gives around a 3x perf gain.
NOTE:
- On the Windows platform, the HEAP_NO_SERIALIZE flag can be set when creating a heap. It turns off the heap's internal locking, so there is no synchronization cost; this is safe only when a single thread uses the heap. However, it turns out that setting this flag on a thread-private heap makes it very slow on Vista and later operating systems.
- The reason is that in Vista, Microsoft refactored the heap manager code and removed some extra data structures and code paths that were no longer part of the common case for handling heap API calls.
- HEAP_NO_SERIALIZE and some debug scenarios disable the Low-Fragmentation Heap (LFH) feature, which is now the de facto default policy for heaps and is therefore highly optimized.
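The per-thread heap idea can be sketched as follows. This is an illustration under stated assumptions, not the original benchmark: ThreadHeap and local_heap are hypothetical names, the heap is created with flags 0 (that is, without HEAP_NO_SERIALIZE, in line with the note above), and the non-Windows branch is only a malloc/free stand-in so the sketch compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#endif

// One private heap per thread. Memory must be returned to the heap it came
// from, and before the owning thread exits (the heap is destroyed with it).
class ThreadHeap {
public:
    ThreadHeap(const ThreadHeap&) = delete;
    ThreadHeap& operator=(const ThreadHeap&) = delete;
#ifdef _WIN32
    ThreadHeap() : heap_(HeapCreate(0, 0, 0)) {}   // growable, default flags
    ~ThreadHeap() { if (heap_) HeapDestroy(heap_); }
    void* allocate(std::size_t bytes) {
        return heap_ ? HeapAlloc(heap_, 0, bytes) : nullptr;
    }
    void deallocate(void* p) { if (p) HeapFree(heap_, 0, p); }
private:
    HANDLE heap_;
#else
    // Stand-in for other platforms: falls back to the global C heap.
    ThreadHeap() = default;
    void* allocate(std::size_t bytes) { return std::malloc(bytes); }
    void deallocate(void* p) { std::free(p); }
#endif
};

// Hypothetical accessor: returns the calling thread's private heap.
inline ThreadHeap& local_heap() {
    thread_local ThreadHeap heap;
    return heap;
}
```

A worker thread would call local_heap().allocate(n) and local_heap().deallocate(p) instead of malloc/free. Handing a block to another thread to free would break the one-heap-per-thread assumption.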
Part III - Dynamic Creation/Free of C++ Object
operator new and operator delete are functions. They are the C++ counterparts of malloc/free and are responsible only for allocating and releasing memory. There is a global version, ::operator new, and a class-level version (a static member), class-name::operator new.
The new/delete operators (new and delete expressions), in contrast, handle object construction and destruction in addition to memory management. They are language operators, just like +, -, *, / and others. A new/delete expression calls the class-specific operator new/delete if the requested class defines them, and the global operator new/delete otherwise.
To fully parallelize an application that uses STL containers, you might need to write your own allocator that draws from a thread-private heap or a memory pool. That way your business logic stays the same as in the single-core version, while the contention bottleneck is eliminated.
Here is the example on writing your own operator new/delete and allocator.
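The linked example did not survive in this copy, so what follows is only a minimal sketch of both pieces, assuming C++11. pool_alloc and pool_free are hypothetical hooks that here simply forward to malloc/free; a real version would draw from a thread-private heap or a memory pool. Node and PoolAllocator are likewise illustrative names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical backing store; replace with a thread-private heap or pool.
inline void* pool_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void pool_free(void* p) { std::free(p); }

// Class-level operator new/delete: the expression `new Node(7)` first calls
// Node::operator new for raw memory, then runs the constructor; `delete`
// runs the destructor and then calls Node::operator delete.
struct Node {
    int value;
    explicit Node(int v) : value(v) {}

    static void* operator new(std::size_t bytes) {
        if (void* p = pool_alloc(bytes)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { pool_free(p); }
};

// Minimal C++11 allocator so that STL containers use the same backing store.
template <typename T>
struct PoolAllocator {
    using value_type = T;

    PoolAllocator() = default;
    template <typename U>
    PoolAllocator(const PoolAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (void* p = pool_alloc(n * sizeof(T))) return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) noexcept { pool_free(p); }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }
```

A container then opts in through its allocator parameter, for example std::vector<int, PoolAllocator<int>>, while the code that uses the container stays unchanged.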
[Reference]
Nehalem Architecture
http://arstechnica.com/hardware/news/2008/04/what-you-need-to-know-about-nehalem.ars
http://rolfed.com/nehalem/nehalemPaper.pdf
Cache Organization and Memory Management of the Intel Nehalem Computer Architecture
http://rolfed.com/nehalem/nehalemPaper.pdf
Cross-Platform Get Cache Line Size
http://strupat.ca/2010/10/cross-platform-function-to-get-the-line-size-of-your-cache/
Understanding and Avoiding Memory Issues with Multi-core Processors
http://www.drdobbs.com/high-performance-computing/212400410
Thread/Data placement for better/consistent performance on Multi-Core/NUMA Architecture
http://www.renci.org/wp-content/pub/techreports/TR-08-07.pdf
Parallel Memory Management(Allocate/Free) Intensive Applications on Multi-core system
English Version - http://www.codeproject.com/KB/cpp/rtl_scaling.aspx
Chinese Version - http://blog.csdn.net/arau_sh/archive/2010/02/22/5317919.aspx
Intel Guide for Developing Multithreaded Applications
http://software.intel.com/en-us/articles/intel-guide-for-developing-multithreaded-applications/
Windows Heap Management/Performance
http://stackoverflow.com/questions/1983563/reason-for-100x-slowdown-with-heap-memory-functions-using-heap-no-serialize-on-v
http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Marinescu.pdf
http://www.codeproject.com/KB/winsdk/HeapPerf.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2010/04/29/10004218.aspx
Memory Optimization for the entire C++ program
http://www.cantrip.org/wave12.html
C++ Dynamic Memory Management Techniques
http://www.cs.wustl.edu/~schmidt/PDF/C++-mem-mgnt4.pdf
Understanding Operator New and Operator Delete
http://www.codeproject.com/KB/cpp/Memory_Management.aspx
C++ Standard Allocator - Introduction and Implementation
http://www.codeproject.com/KB/cpp/allocator.aspx
http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c4079
Improve Performance by Allocator using Pooled Memory
http://www.drdobbs.com/cpp/184406243
Improving STL Allocators
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2045.html
Anatomy of the Linux slab allocator
http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/
1/16/2011
Tips for Smart Pointers in C++
Part I - Brief Summary for Various Smart Pointers
1. auto_ptr
- Based on RAII and transfer-of-ownership semantics, but no shared ownership
- The managed heap object is owned by one and only one pointer
- Assignment/copy construction transfers ownership
- Compiles with STL containers, but has the wrong semantics (copying an element transfers ownership)
2. scoped_ptr
- RAII semantic based, but no shared-ownership, nor transfer-of-ownership semantics
- Managed heap object will be owned by one and only one pointer
- Assignment/Copy Construction are forbidden
- Can't be compiled with STL containers
3. shared_ptr
- Reference count based
- Managed heap object could be owned by multiple smart pointers
- Assignment/Copy Construction will add ownership
- To avoid memory leaks, don't construct a temporary shared_ptr object inside a function-call argument list
- Can't safely construct a shared_ptr object from the this pointer (causes double deletion)
4. intrusive_ptr
- Basically the same as shared_ptr
- Shared ownership of objects with an embedded reference count
- Can be constructed from an arbitrary raw pointer of type T *
- Try shared_ptr first, if it isn't obvious whether intrusive_ptr better fits your needs
5. weak_ptr
- Just a reference: no ownership, no RAII, no shared ownership, no transfer of ownership
- Linked to a shared_ptr object and known by it
- The weak_ptr becomes expired when the last owning shared_ptr destroys the dynamic object
- It's a safe way (no need to worry about dangling references) to reference a dynamic object without owning it
- A nice feature of weak_ptr is that it can query the state of the corresponding shared_ptr object, for example through expired() and use_count()
6. unique_ptr
- C++0x introduced a new scoped_ptr-like pointer, unique_ptr, to replace auto_ptr
- It hides the copy-assignment operator and the copy constructor
- Transfer of ownership can be done explicitly using std::move()
These smart pointers are suitable only for a single dynamic object; for object arrays, use the smart pointers whose names end in "_array" (scoped_array, shared_array).
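The ownership rules summarized above can be checked with a small sketch. It assumes the C++11 std:: versions of these pointers rather than the Boost ones, and the Widget type and both function names are hypothetical.

```cpp
#include <memory>
#include <utility>

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

// unique_ptr: ownership moves only through an explicit std::move().
inline bool unique_transfer_demo() {
    std::unique_ptr<Widget> a(new Widget(1));
    std::unique_ptr<Widget> b = std::move(a);   // a gives up ownership
    return a == nullptr && b != nullptr && b->id == 1;
}

// shared_ptr / weak_ptr: copies share ownership; a weak_ptr observes
// without owning and reports expiry once the last owner is gone.
inline bool shared_weak_demo() {
    std::weak_ptr<Widget> observer;
    {
        std::shared_ptr<Widget> owner = std::make_shared<Widget>(2);
        std::shared_ptr<Widget> second = owner;   // second owner
        observer = owner;                         // does not add an owner
        if (owner.use_count() != 2) return false;
        if (observer.lock() == nullptr) return false;
    }
    // Both owners are out of scope, so the Widget has been destroyed.
    return observer.expired() && observer.lock() == nullptr;
}
```

Using std::make_shared also sidesteps the leak mentioned for temporary shared_ptr objects in function-call arguments, because allocation and ownership happen in one step.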
Part II - Tips for shared_ptr
1. shared_ptr VS weak_ptr
- shared_ptr owns some heap object
- weak_ptr points to some heap object without owning it
2. Handling this Pointer
It's safe to construct a shared_ptr object from a newly created heap object, since it's not yet managed by any other shared_ptr object. But when you want to pass the this pointer to a function that expects a shared_ptr object, you run into a tricky problem: most likely, the heap object has already been created and is managed by other shared_ptr objects.
The problem is that, in general, you can't create a shared_ptr from an existing raw pointer. The new shared_ptr won't "know" about the other instances that refer to the same object, and you'll get multiple deletes.
2.1. Use enable_shared_from_this from boost library
You can derive from enable_shared_from_this and then you can use "shared_from_this()" instead of "this" to spawn a shared pointer to your own self object.
How is it implemented?
- The base class adds a weak_ptr member that points to the existing shared_ptr object managing this object
- When a shared_ptr object is constructed from a raw pointer to this kind of object, it properly sets the weak_ptr inside that object
- shared_from_this() constructs a safe shared_ptr object from the weak_ptr member
- In the boost shared_ptr implementation, the "sp_enable_shared_from_this()" function is called in shared_ptr's constructor. In this function, if the passed-in dynamic object derives from enable_shared_from_this, the shared_ptr sets the object's weak_ptr member to refer to itself.
If you adopt this method, be careful not to create such objects on the stack. An object on the stack is not managed by any shared_ptr, so no shared_ptr constructor gets called and the corresponding weak_ptr member is never set properly.
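A minimal sketch of the mechanism described above, assuming the C++11 std::enable_shared_from_this (which follows the Boost design); Session and owners_after_self are hypothetical names.

```cpp
#include <memory>

struct Session : std::enable_shared_from_this<Session> {
    // Returns a shared_ptr that shares ownership with the existing owners,
    // instead of starting a second, independent reference count.
    std::shared_ptr<Session> self() { return shared_from_this(); }
};

// Creates a heap object owned by a shared_ptr, asks the object for another
// shared_ptr to itself, and reports how many owners exist.
inline long owners_after_self() {
    std::shared_ptr<Session> first = std::make_shared<Session>();
    std::shared_ptr<Session> second = first->self();
    return first.use_count();   // both handles share one reference count
}
```

Writing std::shared_ptr<Session>(this) inside self() would compile too, but it would create a second reference count and lead to the double deletion described earlier.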
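2.2 If you know that your object is long-lived (it outlives every shared_ptr that refers to it), you can do the following:
struct null_deleter
{
    template <typename T> void operator()(T *) const {}
};
Then in your code, just return shared_ptr<YourClass>(this, null_deleter()), where YourClass stands for the type of the object.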
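As a sketch of how this might look in use, assume a function-local static object that lives until program exit. Registry, handle and global_registry are hypothetical names, and std::shared_ptr stands in for the Boost one.

```cpp
#include <memory>

// Deleter that intentionally does nothing: the shared_ptr never frees
// the object it points to.
struct null_deleter {
    template <typename T>
    void operator()(T*) const {}
};

struct Registry {
    int hits = 0;

    // Safe only because the Registry outlives every handle handed out.
    std::shared_ptr<Registry> handle() {
        return std::shared_ptr<Registry>(this, null_deleter());
    }
};

inline Registry& global_registry() {
    static Registry instance;   // lives until program exit
    return instance;
}
```

Each call to handle() creates an independent reference count, which is harmless here precisely because none of them ever deletes the object.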
3. Handling Null-Valued shared_ptr Objects
When you use shared_ptr in your code, you sometimes need a NULL equivalent to represent a pointer that doesn't point to anything meaningful.
Generally speaking, you have the following choices:
- Return iterators, and the end iterator if not found
- boost::optional
- Silly return codes
Returning zero/null for smart pointers is acceptable in some cases too, when the other alternatives don't make sense. Consider the following code:
class some_class_name{
public:
template<typename T> operator shared_ptr<T>() { return shared_ptr<T>(); }
} nullPtr;
Use this nullPtr object wherever a null boost::shared_ptr<> of any type is needed.
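A usage sketch of the conversion trick, assuming std::shared_ptr in place of boost::shared_ptr; Item, NullPtrType and find_item are hypothetical names, and the conversion operator is marked const here so that the constant can itself be const.

```cpp
#include <memory>

struct Item {
    int value = 0;
};

// Converts to an empty shared_ptr of whatever type the context asks for.
struct NullPtrType {
    template <typename T>
    operator std::shared_ptr<T>() const { return std::shared_ptr<T>(); }
};
const NullPtrType nullPtr = {};

// The return type selects T, so the same constant works for any shared_ptr.
inline std::shared_ptr<Item> find_item(bool found) {
    if (!found) return nullPtr;        // implicit conversion to an empty pointer
    return std::make_shared<Item>();
}
```

With C++11 and later, returning nullptr or a default-constructed std::shared_ptr achieves the same thing without a helper type; the helper mainly matters for pre-C++11 Boost code.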
[Reference]
smart pointers overview
http://en.wikipedia.org/wiki/Smart_pointer
http://www.informit.com/articles/article.aspx?p=25264
http://www.drdobbs.com/184401507
http://dlugosz.com/Repertoire/refman/Classics/Smart%20Pointers%20Overview.html
http://ootips.org/yonat/4dev/smart-pointers.htm
unique_ptr
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=401
shared_ptr
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=239
weak_ptr
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=300
http://www.drdobbs.com/184402026;jsessionid=X2T3WUC5FRMSDQE1GHPSKHWATMY32JVN
shared_ptr for this pointer
http://stackoverflow.com/questions/142391/getting-a-boostshared-ptr-for-this
12/20/2010
Parallel Database for OLTP and OLAP
This is a survey of materials on parallel database products and technologies for OLTP/OLAP applications. It mainly covers the major commercial and academic efforts to develop parallel DBMSs that can cope with the ever-growing volume of relational data to be processed.
Part I - Parallel DBMSs
1.1 Parallel Database for OLAP (Shared-Nothing/MPP)
Teradata
- Teradata Home
- Teradata DBC/1012 Paper
- NCR Teradata VS Oracle Exadata
Vertica
- Vertica Home
- The original research project: C-Store
Paraccel
- Paraccel Home
- MPP Based Architecture
- Columnar Based Storage
- Flash Based Storage
DATAllegro (now MS Madison)
- Design Choices in MPP Data Warehousing Lessons from DATAllegro V3
- Microsoft SQL Server Parallel Data Warehousing
Netezza
- Netezza Home
- Acquired by IBM
- Hadoop & Netezza: Synergy in Data Analytics (Part 1, Part 2)
- Netezza Twinfin VS Oracle Exadata (eBook, Blog)
GreenPlum:
- GreenPlum Home
- Combined: PostgreSQL/ZFS/MapReduce
- Acquired by EMC
Oracle ExaData:
- ExaData Home
- OLTP & OLAP Hybrid Orientation
- 1 * RAC + N * Exadata Cells (Storage Node) + Infiniband Network
- Exadata Cell: Flash Cache + Disk Array + Data Filtering Logic (partial SQL execution)
- Exadata – the Sequel is a great Exadata study article
IBM DB2 Data Partitioning Feature (can work with both OLAP/OLTP)
- Formerly known as DB2 Parallel Edition (A Shorter Overview)
- DB2 At a Glance - Data Partitioning Feature
- Simulating Massively Parallel Database Processing on Linux
AsterData:
- Supercharging Analytics with SQL-MapReduce
- Aster Data brings Applications inside an MPP Database
Misc Articles:
- What's MPP?
- Comparison of Oracle to IBM DB2 UDB and NCR Teradata for Data Warehousing
- SMP or MPP for Data Warehouse
- Dividing the data Warehousing work among MPP Nodes
- SANs vs. DAS in MPP data Warehousing
- Three ways Oracle or Microsoft could go MPP
1.2 Parallel Database for OLTP (Shared-Disk/SMP)
Oracle Real Application Cluster
- Oracle RAC Concepts
- Oracle Parallel Database Server Concepts
- Oracle RAC Case Study on 16-Node Linux Cluster
IBM DB2 for z/OS (with Sysplex Technology)
- Share Disk and Share Nothing for IBM DB2
- What's DB2 Data Sharing?
IBM DB2 for LUW (with pureScale Technology)
- IBM DB2 pureScale: The Next Big Thing or a Solution Looking for a Problem?
- What is DB2 pureScale?
- DB2 pureScale Scalability (section 1, section 2)
Part II - Academic Readings
2.1 Overview
1). Parallel Database System: The Future of High Performance Database Processing
2). Survey of Architecture of Parallel Database System
3). The Case for Shared Nothing
4). Much Ado About Shared-Nothing
2.2 Research System
1). XPS: A High Performance Parallel Database Server
2). The Design of XPRS
3). Prototyping Bubba, A Highly Parallel Database System
4). The Gamma Database Machine Project
5). NonStop SQL, A Distributed, High-Performance, High-Availability Implementation of SQL
6). Parallel Query Processing in Shared Disk Database System
7). Architecture of SDC, the Super Database Computer
2.3 Commercial System
1). A Study of A Parallel Database Machine and Its Performance - The NCR/TERADATA DBC/1012
2). A Practical Implementation of the Database Machine - Teradata DBC/1012
3). DB2 Parallel Edition
4). Parallel SQL Execution in Oracle 10g
6). Shared Cache - The Future of Parallel Database
7). Cache Fusion: Extending Shared-Disk Clusters with Shared Caches
12/16/2010
Lecture Notes - AltaVista Indexing and Search Engine
On 01/18/2000, Michael Burrows gave a technical presentation at UW. In it, he talked about the design of the AltaVista indexing system and the search engine site. The presentation is brief, but it covers many core designs and concepts that are still used in today's commercial search engine systems.
The presentation video can be found at uwtv: http://uwtv.org/programs/displayevent.aspx?rid=2123
I have recreated the PPT used in his video for future reference. I tried my best to record the text and redraw the diagrams, but some errors may have crept in during the process. The copyright belongs to Mike.
I think the most interesting designs are the Location Space and ISR abstractions. The first makes it possible to store any kind of information using the inverted index mechanism, and the second solves the problem of interpreting complicated search query semantics.
That said, it's not easy to fully understand how the whole ISR system works to serve the various query semantics.
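The two abstractions can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not AltaVista's actual implementation: it assumes that every word occurrence in the corpus gets one integer location, that a marker word closes each document, and that a stream reader exposes the current location plus "next" and "seek" operations. All names here (END_DOC, build_index, WordISR, docs_with_all) are hypothetical and chosen only for this example.

```python
import bisect

END_DOC = "<enddoc>"  # hypothetical marker word indexed at the end of each document


def build_index(docs):
    """Map each word to the sorted list of locations where it occurs.

    Locations are consecutive integers over the whole corpus, so a
    document is simply a contiguous range closed by an END_DOC location.
    """
    index = {}
    loc = 0
    for text in docs:
        for word in text.lower().split() + [END_DOC]:
            index.setdefault(word, []).append(loc)
            loc += 1
    return index


class WordISR:
    """Stream reader over the locations of a single word (assumed interface)."""

    def __init__(self, locations):
        self.locs = locations
        self.pos = 0

    def loc(self):
        """Current location, or None once the stream is exhausted."""
        return self.locs[self.pos] if self.pos < len(self.locs) else None

    def next(self):
        """Step to the following location."""
        self.pos += 1
        return self.loc()

    def seek(self, k):
        """Move forward to the first location >= k (never backwards)."""
        self.pos = max(self.pos, bisect.bisect_left(self.locs, k))
        return self.loc()


def docs_with_all(index, words):
    """Return the ids of documents that contain every word in `words`."""
    ends = index.get(END_DOC, [])
    readers = [WordISR(index.get(w, [])) for w in words]
    end_reader = WordISR(ends)
    hits = []
    k = 0
    while True:
        locs = [r.seek(k) for r in readers]
        if not locs or None in locs:
            return hits  # some word has no further occurrences
        # Find the document that holds the farthest current match.
        end = end_reader.seek(max(locs))
        doc_id = end_reader.pos
        start = ends[doc_id - 1] + 1 if doc_id > 0 else 0
        if min(locs) >= start:
            hits.append(doc_id)  # all words fall inside [start, end)
            k = end + 1          # continue with the next document
        else:
            k = start            # pull lagging readers up to this document


docs = ["the quick brown fox", "the lazy dog", "quick dog and quick fox"]
index = build_index(docs)
print(docs_with_all(index, ["quick", "fox"]))
```

In this sketch an AND query is answered only through seek calls on location streams, so readers skip over documents that cannot match instead of scanning them. Other query semantics (phrases, OR, NOT) would need their own combining readers, which are not shown here.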
In the second part of his presentation, Mike covered many aspects of the AltaVista search engine web site. Many of these experiences and designs are still a good reference for today's Internet web applications.
[Reference]
1. http://www.searchenginehistory.com/
2. http://en.wikipedia.org/wiki/Search_engine
3. http://en.wikipedia.org/wiki/AltaVista