10/06/2009

Windows Debugging Facilities

Part I - System/Application Error Collection Tools

These tools are used to collection software data (especially when error occurs) that can be used to identify & fix software defects.

1. Dr. Watson

"Dr. Watson for Windows is a program error debugger that gathers information about your computer when an error (or user-mode fault) occurs with a program. The information obtained and logged by Dr. Watson is the information needed by technical support personnel to diagnose a program error for a computer running Windows."

A text file (Drwtsn32.log) is created whenever an error is detected, and can be delivered to support personnel by the method they prefer. A crash dump file can also be created, which is a binary file that a programmer can load into a debugger.

Starting with Windows Vista, Dr. Watson has been replaced with "Problem Reports and Solutions"

2. Windows Error Report

While Dr.Watson left the memory dump on the user's local machine for debugging, Windows Error Reporting offers to send the memory dump to Microsoft using the internet, more info can be found at - http://en.wikipedia.org/wiki/Windows_Error_Reporting

3. Adplus

ADPlus is console-based Microsoft Visual Basic script. It automates the Microsoft CDB debugger to produce memory dumps and log files that contain debug output from one or more processes. It has many switches to control what data to collect, more info can be found at Microsoft KB on Adplus tool

[Reference]
- Wiki on Dr. Watson Debugger
- Description of Dr. Watson Tool

Part II - Structured Exception Handling

SEH is usually known as a convenient error handling mechanism for windows native code programmers provided by windows operating system itself (with compiler support). But it is also a great system to enable applications talking to software debuggers.

1. Various Concepts

Structured exception handling
is a mechanism for handling both hardware and software exceptions. To full understand SEH mechanism, you should get familiar with the following concepts:
- guarded body of code
- exception handler
- termination handler
- filter expression, follows __except keyword,
evaluated when system conducting exception processing
- filter function
, can only be called in filter expression

filter expression & function can only return the following three values:
-
EXCEPTION_CONTINUE_SEARCH, The system continues its search for an exception handler.
-
EXCEPTION_CONTINUE_EXECUTION, The system stops its search for an exception handler, restores the machine state, and continues thread execution at the point at which the exception occurred.
-
EXCEPTION_EXECUTE_HANDLER, The system transfers control to the exception handler, and thread execution continues sequentially in the stack frame in which the exception handler is found.

2. Stack Unwinding

If the located exception handler is not in the stack frame in which the exception occurred, the system unwinds the stack, leaving the current stack frame and any other stack frames until it is back to the exception handler's stack frame.

3. Vectored Exception Handling

Vectored handlers are called in the order that they were added, after the debugger gets a first chance notification, but before the system begins unwinding the stack. Since they are not framed based, they will be called each time when an exception is raised.

4. Exception & Debugger

SEH is also a communication mechanism between windows application and debugger. The detailed description on the whole exception dispatching process can be found here and the debugger exception handling process can be found here.

The core concepts here are first-chance notification and second-chance(last-chance) notification.
- 1st chance notification is a mechanism to notify debugger the exception information before application get chance to process the exception.
- 2nd chance notification happens after the windows system finds that no application defined exception handler exists.

David Kline wrote a great article on first-chance & second-chance notification.

5. Functions and Keywords

- GetExceptionCode and GetExceptionInformation can be used to get detail information about current exception.
- The SEH compatible compiler recognize __try, __except, __finally, __leave as keywords.

- It will also interpret the GetExceptionCode, GetExceptionInformation, and AbnormalTermination functions as keywords, and their use outside the appropriate exception-handling syntax generates a compiler error.

[Reference]
- Crash Course on SEH
- Structured Exception Handling @ MSDN
- First and Second Chance Exception handling
- David Kline's Article

Part III - Dump File

In Part I, we introduced several tools to get diagnose data to enable offline debugging and analyzing. The most important diagnose data is dump file. There are two dump files:

1. Kernel-Mode Dump (system/core/memory dump)

This kind of dump happens when an stop error occurs in the windows system. The common phenomenon is that the blue screen shows up and at the same time, an core dump file is generated.

There are three kinds of core dump files:

- Complete Memory Dump

A Complete Memory Dump is the largest kernel-mode dump file. This file contains all the physical memory for the machine at the time of the fault.

The Complete Memory Dump file is written to %SystemRoot%\Memory.dmp by default.

- Small Memory Dump

A Small Memory Dump is much smaller than the other two kinds of kernel-mode crash dump files. It is exactly 64 KB in size, and requires only 64 KB of pagefile space on the boot drive.

- Kernel Memory Dump

A Kernel Memory Dump contains all the memory in use by the kernel at the time of the crash. This kind of dump file is significantly smaller than the Complete Memory Dump. Typically, the dump file will be around one-third the size of the physical memory on the system.

This dump file will not include unallocated memory, or any memory allocated to user-mode applications. It only includes memory allocated to the Windows kernel and hardware abstraction level (HAL), as well as memory allocated to kernel-mode drivers and other kernel-mode programs.

Here, you can find more detailed information on kernel-mode dump files.

An great Microsoft KB on system dump(core dump, blue screen) configuration.

2. User-Mode Dump (application/process dump)

This kind of dump file is from specific process, not from the windows system itself.

- Full Dump

A full user-mode dump is the basic user-mode dump file. This dump file includes the entire memory space of a process, the program's executable image itself, the handle table, and other information that will be useful to the debugger.

- Mini Dump

A mini user-mode dump includes only selected parts of the memory associated with a process. The size and contents of a minidump file varies depending on the program being dumped and the application doing the dumping.

The name "minidump" is misleading, because the largest minidump files actually contain more information than the "full" user-mode dump.

User mode dump files can be created using the following methods:
- using task manager
- using adplus
- using userdump

You can also using some debugging tools such as Visual Studio and Windbg to create dump files.

Manipulate Mini Dump Programmatically:
- MiniDumpReadDumpStream()
- MiniDumpWriteDump()
- MINIDUMP_TYPE ENUM

[Reference]
- Dump in Visual Studio
- Crash Dump Doc @ MSDN

9/26/2009

What's Good Research

Some senior people shared his opinions on "What's Good Research" a couple of weeks ago:

1. Like a movie, that has a good and compelling Story to tell;
[clear motivation/vision behind the various technologies]

2. Like Magic, about making seemingly impossible possible;
[non-trivial idea and implementation, should has some technical bars]

3. Emotional, touches people’s Life when you explain it.
[potential to has big impact on this world]

9/20/2009

Qizmt - MySpace's Open Source Mapreduce Framework

MySpace just open sourced it’s C# & Windows Based Map/Reduce framework – Qizmt (http://code.google.com/p/qizmt/)

C#/Windows world has an industrial-quality Hadoop counterpart. It’s a good news for both C# users and Microsoft’s server business.

One nice feature of Qizmt is it's built-in support for debugging distributed computing applications.

Qizmt currently supports .Net 3.5 SP1 on Windows 2003 Server, Windows 2008 Server and Windows Vista

9/10/2009

Edit Distance for String, Tree and Graph

  Edit Distance refers to the min steps to convert one object into another. It's a useful concept in searching, data mining and pattern recognition. It's not only limits to text String, but also applied to structured data such as Tree and Graph.

  1. String Edit Distance

  The basic idea is dynamic planning: we can solve the problem if a smaller scale problem can be solved, by choosing the least cost option. The algorithm and the proof can be found below:

[1] http://en.wikipedia.org/wiki/Levenshtein_distance
[2] http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html

  I wrote an implementation that can show the whole edit process, it can be found here.

  2. Tree Edit Distance
  
  This algorithm is also based on dynamic planning, but it's not that easy for understanding anymore. The detailed tutorial on this algorithm can be found at [1], from the presentation you can find that the recursive formula is rather simple. The challenging part is to understand how to compute the elements of the matrix in dynamic algorithm and why all the steps work. It may takes several days to fully understand the algorithm.

  Code for the algorithm implementation and test can be found under this folder. It is based on the algorithm propose in [2]. [1] is a very clear tutorial on this algorithm.

  
References can be found below:
[1] http://www.inf.unibz.it/dis/teaching/ATA  
[2]
Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245–1262, 1989
[3] A Survey on Tree Edit Distance and Related Problems

  3. Graph Edit Distance
  
  This problem is a even harder problem, related paper can be found at below:
[1] Bridging the Gap Between Graph Edit Distance and Kernel Machines
[2] http://www.springerlink.com/content/k21v4h4ur2r068w2/

  4. Visualization

  W
hen you dealing with Tree/Graph algorithm, visualization is very important for debug/diagnose purpose. You can use graphwiz tool to do this.

  Here you can find a lot of papers on graph visualization algorithms.


  More references listed below:
[1] http://en.wikipedia.org/wiki/Graph_drawing
[2] http://graphdrawing.org/index.html
[3] Graphviz and Dynagraph – Static and Dynamic Graph Drawing Tools
[4] Graph Visualization and Navigation in Information Visualization: a Survey
[5] Interactive Visualization of Large Graphs and Networks (PH.D Thesis)

  Update@09/27/2009 Microsoft has a great graph layout library called MSAGL(formerly known as: GLEE) that is available publicly. It beats Graphviz on many aspects, especially on the scalability perspective.

8/27/2009

Parallel DBMS V.S. Distributed DBMS

 Large Scale Data Intensive Computing is a hot topic today, many people starts to talk so called Parallel Database System and Distributed Database System technologies. But these two concepts seem very confusing, so I devoted sometime to try to make it clear.

Parallel Database System seeks to improve performance through parallelization of various operations, such as data loading, index building and query evaluating. Although data may be stored in a distributed fashion in such a system, the distribution is governed solely by performance considerations.

In Distributed Database System, data is physically stored across several sites, and each site is typically managed by a DBMS capable of running independent of the other sites. In contrast to parallel databases, the distribution of data is governed by factors such as local ownership and
increased availability.

PDB & DDB Comparison:

1. System Components
- Distributed DBMS consists of many Geo-distributed, low-bandwidth link connected, autonomic sites.
- Parallel DBMS consists of tightly coupled, high-bandwidth link connected, non-autonomic nodes.

2. Component Role
- Sites in Distributed DBMS can work independently to handle local transactions or work together to handle global transactions.
- Nodes in Parallel DBMS can only work together to handle global transactions.

3. Design Purposes
= Distributed DBMS is for:
 - Sharing Data
 - Local Autonomy
 - High Availability
= Parallel DBMS is for:
 - High Performance
 - High Availability

 But both PDB&DDB need to consider the following problems: 1. Data Distribution (Placement & Replicatioin); 2. Query Parallelization(Distributed Evaluation). And also, many parallel system consists of network of workstation, the difference between Parallel DB & Distributed DB is becoming smaller.

[Reference]
1. Great Paper on PDB&DDB Explanation Distributed and Parallel Database Systems
2. Great Paper by Jim Gray Parallel Database Systems
3. Textbook, Database Management System (3rd edition)
4. Textbook, Database System Concepts (5th edition)
5. Textbook, Principle of Distributed Database Systems (2nd edition)
6. DB Textbook List @ Amazon