Wednesday, February 25, 2009

Use complete memory dump to solve user-mode crash

The process dump is used to investigate a process crash or hang most of the time. However, sometimes it is very hard or impossible to collect the process dump. For instance, a user-mode service crashes during machine shutdown. In that case, a complete memory dump can be manually generated using the keyboard.

Windows allows users to manually generate a kernel/complete memory dump by using the keyboard. Here is the KB article on how to do it.

The following illustrates one case where a complete memory dump is used to solve a user-mode crash.

Several users have reported that FTP/SSH service crashes during machine shutdown. Except the "Runtime Error!" message box, there is no other trace for the crash. I duplicated the crash on one of Win2003 server boxes. However, I could not attach a debugger to the crashed service to generate the process dump during machine shutdown. Then I decided to manually generate a complete memory dump when "Runtime Error!" message box pops up and hoped the dump might contain some useful information regarding to the crash.

The dump was generated by using the keyboard when FTP service crashed during machine shutdown.

0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
...
PROCESS 89eee6f8 SessionId: 0 Cid: 079c Peb: 7ffdd000 ParentCid: 01bc
DirBase: 7f404380 ObjectTable: e2d03410 HandleCount: 129.
Image: intfysvc.exe

PROCESS 899f2bc8 SessionId: 0 Cid: 07f0 Peb: 7ffd8000 ParentCid: 01bc
DirBase: 7f4043a0 ObjectTable: e2cee100 HandleCount: 267.
Image: SSHServer.exe

PROCESS
8948fa08 SessionId: 0 Cid: 0a04 Peb: 7ffde000 ParentCid: 01bc
DirBase: 7f404520 ObjectTable: e2d831e0 HandleCount: 275.
Image:
iftpsvc.exe
...


After locating the FTP service, I set it as the current process context. One of its threads looks suspicious.

0: kd> .process /r /p 8948fa08
Implicit process is now 8948fa08
Loading User Symbols
................................................................

0: kd> !process 8948fa08
PROCESS 8948fa08 SessionId: 0 Cid: 0a04 Peb: 7ffde000 ParentCid: 01bc
DirBase: 7f404520 ObjectTable: e2d831e0 HandleCount: 275.
Image: iftpsvc.exe
VadRoot 8998ed30 Vads 233 Clone 0 Private 1698. Modified 42013. Locked 0.
DeviceMap e10018f0
Token e267fbf0
ElapsedTime 00:01:03.281
UserTime 00:00:00.546
KernelTime 00:00:00.281
QuotaPoolUsage[PagedPool] 154668
QuotaPoolUsage[NonPagedPool] 12236
Working Set Sizes (now,min,max) (5277, 50, 345) (21108KB, 200KB, 1380KB)
PeakWorkingSetSize 5298
VirtualSize 138 Mb
PeakVirtualSize 139 Mb
PageFaultCount 51692
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 4946
....


THREAD 89e3d418 Cid 0a04.0ae8 Teb: 7ffad000 Win32Thread: e26a6508 WAIT: (Unknown) UserMode Alertable
8948fcb0 NotificationEvent
898f1b88 SynchronizationEvent
89e3d490 NotificationTimer
Not impersonating
DeviceMap e10018f0
Owning Process 8948fa08 Image: iftpsvc.exe
Attached Process N/A Image: N/A
Wait Start TickCount 7198 Ticks: 20 (0:00:00:00.312)
Context Switch Count 24 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address MSVCR80!_endthreadex (0x781329e1)
Start Address KERNEL32!BaseThreadStartThunk (0x77e617ec)
Stack Init b8fdf000 Current b8fde900 Base b8fdf000 Limit b8fdb000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
b8fde918 80833485 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b8fde944 808294b9 nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
b8fde978 80938f68 nt!KeWaitForMultipleObjects+0x3d7 (FPO: [8,8,4])
b8fdebf4 809390ca nt!ObpWaitForMultipleObjects+0x202 (FPO: [SEH])
b8fded48 808897bc nt!NtWaitForMultipleObjects+0xc8 (FPO: [SEH])
b8fded48 7c8285ec nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b8fded64)
045b9f6c 7c827cfb ntdll!KiFastSystemCallRet (FPO: [0,0,0])
045b9f70 77e6202c ntdll!NtWaitForMultipleObjects+0xc (FPO: [5,0,0])
045ba018 7739bbd1 KERNEL32!WaitForMultipleObjectsEx+0x11a (FPO: [SEH])
045ba074 776c1ffa USER32!RealMsgWaitForMultipleObjectsEx+0x141 (FPO: [5,13,0])
045ba09c 776e2eff ole32!CCliModalLoop::BlockFn+0x7d (FPO: [3,0,4])
045ba10c 79f752f0 ole32!CoWaitForMultipleHandles+0xc9 (FPO: [5,20,4])
045ba12c 79f75232 mscorwks!NT5WaitRoutine+0x51 (FPO: [5,1,0])
045ba198 79f7519a mscorwks!MsgWaitHelper+0xa5 (FPO: [SEH])
045ba1b8 79ed9808 mscorwks!Thread::DoAppropriateAptStateWait+0x28 (FPO: [5,0,4])
045ba23c 79ed96c4 mscorwks!Thread::DoAppropriateWaitWorker+0x13c (FPO: [5,22,4])
045ba28c 79ed9a62 mscorwks!Thread::DoAppropriateWait+0x40 (FPO: [SEH])
045ba2e8 79e78944 mscorwks!CLREvent::WaitEx+0xf7 (FPO: [3,11,4])
045ba2fc 79f7515d mscorwks!CLREvent::Wait+0x17 (FPO: [3,0,0])
045ba37c 79f750b9 mscorwks!CorExitProcess+0x30f (FPO: [0,21,4])
045ba380 79f7509d mscorwks!WaitForEndOfShutdown+0x5 (FPO: [0,0,0])
045ba3ac 79f7478c mscorwks!EEShutDown+0xae (FPO: [1,5,4])
045ba448 79004fab mscorwks!DisableRuntime+0xeb (FPO: [0,0,0])
045ba458 7c348cfa mscoree!CorExitProcess+0x46 (FPO: [1,1,0])
045ba460 7c3476c8 MSVCR71!__crtExitProcess+0x25 (FPO: [1,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 463]
045ba490 7c348d22 MSVCR71!doexit+0xab (FPO: [Non-Fpo]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 414]
045ba4a0 7c34d2b1 MSVCR71!_exit+0xd (FPO: [1,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 311]
045ba4e8 7c34cf8a MSVCR71!raise+0xae (FPO: [Non-Fpo]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\winsig.c @ 508]
045ba524 77e761b7
MSVCR71!abort+0xe (FPO: [0,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\abort.c @ 48]
045ba77c 77e792a3
KERNEL32!UnhandledExceptionFilter+0x12a (FPO: [SEH])
045ba784 77e61ac1 KERNEL32!BaseThreadStart+0x4a (FPO: [SEH])
045ba7ac 7c828752 KERNEL32!_except_handler3+0x61 (FPO: [Uses EBP] [3,0,7])
045ba7d0 7c828723 ntdll!ExecuteHandler2+0x26
045ba878 7c82863c ntdll!ExecuteHandler+0x24
045bab58 77e4bee7 ntdll!RtlRaiseException+0x3d
045babb8 78158e89 KERNEL32!RaiseException+0x53 (FPO: [4,20,4])
045babf0 03403222 MSVCR80!_CxxThrowException+0x46 (FPO: [2,8,0])
045bac0c 03402f9d ftpsrvdata_pg!_com_raise_error+0x32 (FPO: [2,4,0]) (CONV: stdcall) [f:\sp\vctools\compiler\cxxfe\sl\vccom\comraise.cpp @ 18]
045bac18 033f87b5 ftpsrvdata_pg!_com_issue_error+0xd (FPO: [1,0,0]) (CONV: stdcall) [f:\sp\vctools\compiler\cxxfe\sl\vccom\comsupp.cpp @ 40]
045bcc58 033f9812 ftpsrvdata_pg!PrintProviderError+0x35 (FPO: [1,2058,0]) (CONV: cdecl) [d:\fts\ftpsrvdata_pg\src\idbcommand.cpp @ 860]
...


Then I switched to the register context of the suspicious thread.

0: kd> .thread /r /p 89e3d418
Implicit thread is now 89e3d418
Implicit process is now 8948fa08
Loading User Symbols
................................................................

0: kd> !thread 89e3d418
THREAD 89e3d418 Cid 0a04.0ae8 Teb: 7ffad000 Win32Thread: e26a6508 WAIT: (Unknown) UserMode Alertable
8948fcb0 NotificationEvent
898f1b88 SynchronizationEvent
89e3d490 NotificationTimer
Not impersonating
DeviceMap e10018f0
Owning Process 8948fa08 Image: iftpsvc.exe
Attached Process N/A Image: N/A
Wait Start TickCount 7198 Ticks: 20 (0:00:00:00.312)
Context Switch Count 24 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address MSVCR80!_endthreadex (0x781329e1)
Start Address KERNEL32!BaseThreadStartThunk (0x77e617ec)
Stack Init b8fdf000 Current b8fde900 Base b8fdf000 Limit b8fdb000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr Args to Child
b8fde918 80833485 89e3d418 f77275c8 00000001 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
b8fde944 808294b9 89e3d418 00000002 00000000 nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
b8fde978 80938f68 00000002 b8fdeaac 00000001 nt!KeWaitForMultipleObjects+0x3d7 (FPO: [8,8,4])
b8fdebf4 809390ca 00000002 b8fdec1c 00000001 nt!ObpWaitForMultipleObjects+0x202 (FPO: [SEH])
b8fded48 808897bc 00000002 045b9fbc 00000001 nt!NtWaitForMultipleObjects+0xc8 (FPO: [SEH])
b8fded48 7c8285ec 00000002 045b9fbc 00000001 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b8fded64)
045b9f6c 7c827cfb 77e6202c 00000002 045b9fbc ntdll!KiFastSystemCallRet (FPO: [0,0,0])
045b9f70 77e6202c 00000002 045b9fbc 00000001 ntdll!NtWaitForMultipleObjects+0xc (FPO: [5,0,0])
045ba018 7739bbd1 00000002 045ba040 00000000 KERNEL32!WaitForMultipleObjectsEx+0x11a (FPO: [SEH])
045ba074 776c1ffa 00000001 001514f8 00013880 USER32!RealMsgWaitForMultipleObjectsEx+0x141 (FPO: [5,13,0])
045ba09c 776e2eff 001514f8 00013880 045ba0c4 ole32!CCliModalLoop::BlockFn+0x7d (FPO: [3,0,4])
045ba10c 79f752f0 00000002 00013880 00000001 ole32!CoWaitForMultipleHandles+0xc9 (FPO: [5,20,4])
045ba12c 79f75232 00000000 00013880 00000001 mscorwks!NT5WaitRoutine+0x51 (FPO: [5,1,0])
045ba198 79f7519a 00000001 001514f8 00000000 mscorwks!MsgWaitHelper+0xa5 (FPO: [SEH])
045ba1b8 79ed9808 00000001 001514f8 00000000 mscorwks!Thread::DoAppropriateAptStateWait+0x28 (FPO: [5,0,4])
045ba23c 79ed96c4 00000001 001514f8 00000000 mscorwks!Thread::DoAppropriateWaitWorker+0x13c (FPO: [5,22,4])
045ba28c 79ed9a62 00000001 001514f8 00000000 mscorwks!Thread::DoAppropriateWait+0x40 (FPO: [SEH])
045ba2e8 79e78944 00013880 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7 (FPO: [3,11,4])
045ba2fc 79f7515d 00013880 00000001 00000000 mscorwks!CLREvent::Wait+0x17 (FPO: [3,0,0])
045ba37c 79f750b9 79f7509d 1e232538 00000000 mscorwks!CorExitProcess+0x30f (FPO: [0,21,4])
045ba380 79f7509d 1e232538 00000000 00000000 mscorwks!WaitForEndOfShutdown+0x5 (FPO: [0,0,0])
045ba3ac 79f7478c 00000e10 045ba3f0 79f7394a mscorwks!EEShutDown+0xae (FPO: [1,5,4])
045ba448 79004fab 00000003 79e70000 045ba490 mscorwks!DisableRuntime+0xeb (FPO: [0,0,0])
045ba458 7c348cfa 00000003 7c3476c9 00000003 mscoree!CorExitProcess+0x46 (FPO: [1,1,0])
045ba460 7c3476c8 00000003 00000000 7c391300 MSVCR71!__crtExitProcess+0x25 (FPO: [1,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 463]
045ba490 7c348d22 00000003 00000001 00000000 MSVCR71!doexit+0xab (FPO: [Non-Fpo]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 414]
045ba4a0 7c34d2b1 00000003 00000000 045ba7a4 MSVCR71!_exit+0xd (FPO: [1,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 311]
045ba4e8 7c34cf8a 00000016 0000000a 00000000 MSVCR71!raise+0xae (FPO: [Non-Fpo]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\winsig.c @ 508]
045ba524 77e761b7 045ba7a4 00000000 00000000 MSVCR71!abort+0xe (FPO: [0,0,0]) (CONV: cdecl) [f:\vs70builds\3052\vc\crtbld\crt\src\abort.c @ 48]
045ba77c 77e792a3 045ba7a4 77e61ac1 045ba7ac KERNEL32!UnhandledExceptionFilter+0x12a (FPO: [SEH])
045ba784 77e61ac1 045ba7ac 00000000 045ba7ac KERNEL32!BaseThreadStart+0x4a (FPO: [SEH])
045ba7ac 7c828752 045bab68 045bffdc 045ba888 KERNEL32!_except_handler3+0x61 (FPO: [Uses EBP] [3,0,7])
045ba7d0 7c828723 045bab68 045bffdc 045ba888 ntdll!ExecuteHandler2+0x26
045ba878 7c82863c 045b0000 045ba888 00010007 ntdll!ExecuteHandler+0x24
045bab58 77e4bee7 045bab68 00000000 e06d7363 ntdll!RtlRaiseException+0x3d
045babb8 78158e89 e06d7363 00000001 00000003 KERNEL32!RaiseException+0x53 (FPO: [4,20,4])
045babf0 03403222 045bac00 03440a54 0342dc6c MSVCR80!_CxxThrowException+0x46 (FPO: [2,8,0])
045bac0c 03402f9d 80004003 00000000 033f87b5 ftpsrvdata_pg!_com_raise_error+0x32 (FPO: [2,4,0]) (CONV: stdcall) [f:\sp\vctools\compiler\cxxfe\sl\vccom\comraise.cpp @ 18]
045bac18 033f87b5 80004003 045bf9b0 045bd1cc ftpsrvdata_pg!_com_issue_error+0xd (FPO: [1,0,0]) (CONV: stdcall) [f:\sp\vctools\compiler\cxxfe\sl\vccom\comsupp.cpp @ 40]
045bcc58 033f9812 00000000 78158ed7 045bf9b0 ftpsrvdata_pg!PrintProviderError+0x35 (FPO: [1,2058,0]) (CONV: cdecl) [d:\fts\ftpsrvdata_pg\src\idbcommand.cpp @ 860]

It seems that the thread had thrown an unhandled exception, which caused the service to crash.

Since KERNEL32!UnhandledExceptionFilter is on the stack, I could dereference its first parameter to retrieve the exception record and context record and thereby find the exception stack. Here is the KB article on how to do it.

0: kd> dd 045ba7a4 l2
045ba7a4 045bab68 045ba888
0: kd> .cxr 045ba888
eax=045bab68 ebx=033f97fe ecx=00000000 edx=045bac00 esi=045babf0 edi=00000000
eip=77e4bee7 esp=045bab64 ebp=045babb8 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!RaiseException+0x53:
001b:77e4bee7 5e pop esi
0: kd> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
045babb8 78158e89 e06d7363 00000001 00000003 KERNEL32!RaiseException+0x53
045babf0 03403222 045bac00 03440a54 0342dc6c MSVCR80!_CxxThrowException+0x46
045bac0c 03402f9d 80004003 00000000 033f87b5 ftpsrvdata_pg!_com_raise_error+0x32 [f:\sp\vctools\compiler\cxxfe\sl\vccom\comraise.cpp @ 18]
045bac18 033f87b5 80004003 045bf9b0 045bd1cc ftpsrvdata_pg!_com_issue_error+0xd [f:\sp\vctools\compiler\cxxfe\sl\vccom\comsupp.cpp @ 40]
045bcc58 033f9812 00000000 78158ed7 045bf9b0 ftpsrvdata_pg!PrintProviderError+0x35 [d:\fts\ftpsrvdata_pg\src\idbcommand.cpp @ 860]
045bf9bc 033dbf4a 045bf9f0 045bcc60 71f5cadb ftpsrvdata_pg!IDBCommand::ExecuteResults+0x2a2 [d:\fts\ftpsrvdata_pg\src\idbcommand.cpp @ 610]
045bfe08 00d66a6a 00000001 045bfe74 00000015 ftpsrvdata_pg!IFTPSystemStats_Impl::UpdateSystemStats+0xaa [d:\fts\ftpsrvdata_pg\src\ftpsystemstats_impl.cpp @ 288]
045bff78 781329bb 006ab558 1e2328f2 00000000 FTPParsers!SessionManagerThreadProc+0x19a [d:\fts\ftpparsers\ftpstats.cpp @ 413]
045bffb0 78132a47 00000000 77e64829 03266678 MSVCR80!_endthreadex+0x3b
045bffb8 77e64829 03266678 00000000 00000000 MSVCR80!_endthreadex+0xc7
045bffec 00000000 781329e1 03266678 00000000 KERNEL32!BaseThreadStart+0x34


ftpsrvdata_pg!PrintProviderError has the following signature,
void PrintProviderError(ADODB::_ConnectionPtr pConnection).

ftpsrvdata_pg!_com_raise_error has the following signature,
void __stdcall _com_raise_error(HRESULT hr, IErrorInfo* perrinfo = 0).

When I ran !error 80004003 (the first parameter of _com_raise_error), it returned "Invalid Pointer" error message. The only parameter of ftpsrvdata_pg!PrintProviderError is NULL, it seems that the crash was due to the NULL pointer. Since there is the line number for the source code, it would be pretty straightforward to verify the result -- the code does not check the NULL pointer.
 

Monday, February 23, 2009

Is a thread waiting?

The following dump indicates the thread is in wait state.

THREAD 893683b0 Cid 0d9c.0f38 Teb: 7ffad000 Win32Thread: e6be2850 WAIT: (UserRequest) UserMode Non-Alertable
895ae760 NotificationEvent
893684a0 NotificationTimer
IRP List:
895e3c68: (0006,0190) Flags: 00000000 Mdl: 895712c0
893c2858: (0006,0190) Flags: 00000000 Mdl: 8937c008
Not impersonating
DeviceMap e1009200
Owning Process 0 Image:
Attached Process 89573b98 Image: iftpsvc.exe
Wait Start TickCount 19645 Ticks: 92 (0:00:00:01.437)
Context Switch Count 309 LargeStack
UserTime 00:00:00.265
KernelTime 00:00:00.046
Win32 Start Address 0x79f9205f
Start Address 0x7c8106e9
Stack Init b1173000 Current b1172ca0 Base b1173000 Limit b116f000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr
b1172cb8 804e1bd2 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
b1172cc4 804e1c1e nt!KiSwapThread+0x8a (FPO: [0,0,0])
b1172cec 8056dff6 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
b1172d50 804dd99f nt!NtWaitForSingleObject+0x9a (FPO: [Non-Fpo])
b1172d50 7c90e4f4 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b1172d64)


However, I would like to confirm the thread is really in wait state. To do so, I first retrieve KeTickCount, which was saved when the dump was generated, then calculate the delta between KeTickCount and Wait Start TickCount for the thread. If the delta is equal to Ticks, then the thread is still in wait state. If the delta is larger than Ticks, the thread is no longer in wait state.

0: kd> dd KeTickCount l1
8055a000 00004d19

0: kd> ? 00004d19
Evaluate expression: 19737 = 00004d19

0: kd> ? 0n19737 - 0n19645
Evaluate expression: 92 = 0000005c


So the thread is still in wait state.

Wednesday, February 18, 2009

Ownership and signaled state of synchronization objects

Synchronization objects are indispensable in multithreaded programming. Except critical sections, which are user mode objects, others, such as events, mutexes, semaphores, ERESOURCEs, threads, or processes are all kernel mode objects. I include threads and processes as synchronization objects since in many cases they can be used in xxWaitForSingleObject, and I include ERESOURCEs since they provide similar functionality as critical sections conceptually.

Due to different functionality and role of each type of synchronization object in a multithreaded program, when it comes to debugging, the focus would differ based on the synchronization object investigated. For instance, besides which threads waiting on it, for a critical section, mutex, or ERESOURCE, we would like to know which thread owns it; for an event, thread, or process, we would like to know whether the object is signaled.

Since a critical section is a user mode object, a user mode command, such as !cs or !ntsdexts.locks can be used to find out which thread owns the critical section.

A mutex or ERESOURCE is a kernel mode object, we can use live kernel mode debugging or kernel/complete memory dump to find out its ownership.

0: kd> dt _KMUTANT
nt!_KMUTANT
+0x000 Header : _DISPATCHER_HEADER
+0x010 MutantListEntry : _LIST_ENTRY
+0x018 OwnerThread : Ptr32 _KTHREAD
+0x01c Abandoned : UChar
+0x01d ApcDisable : UChar

lkd> dt _ERESOURCE
nt!_ERESOURCE
+0x000 SystemResourcesList : _LIST_ENTRY
+0x008 OwnerTable : Ptr32 _OWNER_ENTRY
+0x00c ActiveCount : Int2B
+0x00e Flag : Uint2B
+0x010 SharedWaiters : Ptr32 _KSEMAPHORE
+0x014 ExclusiveWaiters : Ptr32 _KEVENT
+0x018 OwnerThreads : [2] _OWNER_ENTRY
+0x028 ContentionCount : Uint4B
+0x02c NumberOfSharedWaiters : Uint2B
+0x02e NumberOfExclusiveWaiters : Uint2B
+0x030 Address : Ptr32 Void
+0x030 CreatorBackTraceIndex : Uint4B
+0x034 SpinLock : Uint4B

lkd> dt _OWNER_ENTRY
nt!_OWNER_ENTRY
+0x000 OwnerThread : Uint4B
+0x004 OwnerCount : Int4B
+0x004 TableSize : Uint4B


Per MSDN documentation, a kernel ERESOURCE pointer is passed as the first paramter into nt!ExAcquireResourceSharedLite and nt!ExAcquireResourceExclusiveLite.

Since an event, thread, or process is a kernel mode object, we can use live kernel mode debugging or kernel/complete memory dump to find out whether it is signaled or not.

0: kd> dt -r1 _KEVENT
nt!_KEVENT
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : UChar
+0x001 Absolute : UChar
+0x002 Size : UChar
+0x003 Inserted : UChar
+0x004 SignalState : Int4B
+0x008 WaitListHead : _LIST_ENTRY

0: kd> dt -r1 _KTHREAD
nt!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : UChar
+0x001 Absolute : UChar
+0x002 Size : UChar
+0x003 Inserted : UChar
+0x004 SignalState : Int4B
+0x008 WaitListHead : _LIST_ENTRY
...

0: kd> dt -r1 _KPROCESS
nt!_KPROCESS
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : UChar
+0x001 Absolute : UChar
+0x002 Size : UChar
+0x003 Inserted : UChar
+0x004 SignalState : Int4B
+0x008 WaitListHead : _LIST_ENTRY
...


Signaled Object from Memory Dump Analysis Vol 2 by Dmitry Vostokov provides an in-depth look at the signaled object from kernel mode debugging perspective.
 

Tuesday, February 17, 2009

Display content of std::map

After reading STL AND WINDBG from Memory Dump Analysis Vol 2 by Dmitry Vostokov, I try to dump the content for std::map as following.

0:056> dt ftpaccess!IFTPLicense_Impl 0x0f7d0318
+0x000 __VFN_table : 0x01ba058c
+0x004 m_nRefCount : 1
+0x008 m_CS : IFTPCritSec
+0x020 m_pLicense : 0x00ffdfb8 CIpswitchLicense2
+0x024 m_FeatureMap : std::map,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> > >
+0x030 m_pszFeatureList : 0x0f802ed0 "IPLocking,SSH,ThinWeb"


It is a license object. Its memeber m_FeatureMap is std::map.

0:056> dt -n std::map,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> > > (0x0f7d0318+0x024)
ftpaccess!std::map,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> > >
+0x000 comp : std::less,std::allocator > >
+0x001 _Alnod : std::allocator,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node>
+0x002 _Alptr : std::allocator,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node *>
+0x003 _Alval : std::allocator,std::allocator > const ,FEATURE_DATA *> >
+0x004 _Myhead : 0x0f806c98 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x008 _Mysize : 3


There are three items in the map. _Myhead is the first node.

0:056> dt -n std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node 0x0f806c98
ftpaccess!std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x000 _Left : 0x0f806d28 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x004 _Parent : 0x00ffea70 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x008 _Right : 0x00ffeac8 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x00c _Myval : std::pair,std::allocator > const ,FEATURE_DATA *>
+0x02c _Color : 1 ''
+0x02d _Isnil : 1 ''

The key-value pair in the first node (_Myhead) is empty. However, it is linked to another three nodes. Let us dump the left node.

0:056> dt -n std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node 0x0f806d28
ftpaccess!std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x000 _Left : 0x0f806c98 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x004 _Parent : 0x00ffea70 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x008 _Right : 0x0f806c98 std::_Tree_nod,std::allocator >,FEATURE_DATA *,std::less,std::allocator > >,std::allocator,std::allocator > const ,FEATURE_DATA *> >,0> >::_Node
+0x00c _Myval : std::pair,std::allocator > const ,FEATURE_DATA *>
+0x02c _Color : 0 ''
+0x02d _Isnil : 0 ''


The key-value pair (_Myval) in the left node is not empty. Let us dump the pair to see what it is.

0:056> dt -n std::pair,std::allocator > const ,FEATURE_DATA *> (0x0f806d28+0x00c)
ftpaccess!std::pair,std::allocator > const ,FEATURE_DATA *>
+0x000 first : std::basic_string,std::allocator >
+0x01c second : 0x0f7d1bb0 FEATURE_DATA


The key-value pair consists of first and second. The first is key, which is std::string, the second is value, which is FEATURE_DATA.

0:056> dt -r -n std::basic_string,std::allocator > (0x0f806d28+0x00c)
ftpaccess!std::basic_string,std::allocator >
+0x000 _Alval : std::allocator
=01b90000 npos : 0x905a4d
+0x004 _Bx : std::basic_string,std::allocator >::_Bxty
+0x000 _Buf : [8] "ฐཽSSHEna"
+0x000 _Ptr : 0x0f7d0e10 "IPLocking"
+0x014 _Mysize : 9
+0x018 _Myres : 0xf


0:056> dt -r -n FEATURE_DATA 0x0f7d1bb0
ftpaccess!FEATURE_DATA
+0x000 nEnabled : 1
+0x004 nLicenseType : -1
+0x008 tmActivationDate : _SYSTEMTIME
+0x000 wYear : 0
+0x002 wMonth : 0
+0x004 wDayOfWeek : 0
+0x006 wDay : 0
+0x008 wHour : 0
+0x00a wMinute : 0
+0x00c wSecond : 0
+0x00e wMilliseconds : 0
+0x018 tmLicenseExpires : _SYSTEMTIME
+0x000 wYear : 0
+0x002 wMonth : 0
+0x004 wDayOfWeek : 0
+0x006 wDay : 0
+0x008 wHour : 0
+0x00a wMinute : 0
+0x00c wSecond : 0
+0x00e wMilliseconds : 0
+0x028 szSerialNumber : [260] ""


So the key-value pair indicates that IPLocking feature is enbled in the license. With the same approach, I can dump the content for the parent and right nodes.
 

Saturday, February 14, 2009

Run Win32 application in its own application pool

A handful of ThinClient customers periodically can not login to ThinClient . The server-side log indicates that an access violation occured when the back-end code tried to check whether ThinClient feature has been enabled in the license. We could not duplicate the issue in house until one of my colleague tried to login to ThinClient while running automation test against server webadmin ASP pages.

The following is a simplified version of architectural layers for ThinClient and WebAdmin ASP.


IIS is used to host WCF service, ASP .NET application, and classic ASP pages. The license object is a global singleton written in native C++ and resides in native C++ layer.

At the beginning, it was very hard for me to associate WCF service layer, ASP .NET application, and classic ASP pages together. Since in my opinion, ASP pages should definitely run in a different process from those .NET applications. So I attached WinDBG to w3wp and ran the test case again. WinDBG broke into access violation exception right away. From the stack trace, I could see COM layer code at the bottom of the stack and managed C++ code at the top of the stack. Since I tried to login to ThinClient when the exception occurred, it completely made sense to see managed C++ code on the stack. However, the COM layer code used by classic ASP also on the stack did not make any sense. Even more interesting, when WinDBG broke into the exception, WCF service, ASP .NET, and classic ASP pages all stopped. It seems to me that they were running in the same process. Then I checked IIS configuration and found out that all of them were running in the same application pool. With the help of my colleage, we found out the application pool only used one worker process. After increasing the number of worker process to 5, we ran the test case again. The issue happened much less. Then we moved WCF service and ASP .NET into a different application pool so that both .NET applications are in the same pool and classic ASP in another one. Everything works fine since then.

Confused by what I have seen, I looked for some explanation. Unfornatunely, I did not find any formal document on whether .NET and Win32 applications could run in the same application pool. However, from some blog articles, email threads, and the document on application pool, it seems that .NET applications benefit from the application pool. Multiple .NET applications running in the same application pool not only reduces fixed overhead on the memory, but also improves the performance and stability. To shield its variables from being visible to all other applications in the same pool, .NET applications are seperated by AppDomain. For a specific .NET application, its variables are only visible inside its AppDomain. However, for Win32 applications, since there is no such a boundary for them. A global or static variable in one Win32 application would be visible to all other applications in the same application pool. Therefore, a Win32 application should run in its own application pool.


Friday, February 6, 2009

Alias evaluation in WinDBG script

I copied and pasted the following commands from Dmitry Vostokov's Memory Dump Analysis Volume 2 into a script file.

aS /c FirstModule .printf "%mu", @@c++((*(ntdll!_LDR_DATA_TABLE_ENTRY**)&@$peb->Ldr->InLoadOrderModuleList.Flink)->BaseDllName.Buffer);
.dump /ma /u c:\${FirstModule}.dmp;


To see whether the script actually works, I launched a notepad, attached windbg to it, and ran the script. However, the script generated the output as following for every run,

c:\${FirstModule}_1008_2009-02-06_11-56-11-549_17f0.dmp - mini user dump
Dump successfully written


Somehow, the alias was not evaluated. However, as I entered those commands directly into the windbg command window, the alias was evaluated as expected. After trial and error, I changed the script as following,

.block
{
aS /c FirstModule .printf "%mu", @@c++((*(ntdll!_LDR_DATA_TABLE_ENTRY**)&@$peb->Ldr->InLoadOrderModuleList.Flink)->BaseDllName.Buffer);
.dump /ma /u c:\${FirstModule}.dmp;
}


As I ran the script for the first time, the alias still was not evaluated.

c:\${FirstModule}_1008_2009-02-06_12-12-47-359_17f0.dmp - mini user dump
Dump successfully written


However, as I ran the script the second time and more, the alias was evaluated correctly.

c:\notepad.exe_1008_2009-02-06_12-14-58-959_17f0.dmp - mini user dump
Dump successfully written


Then I looked up the .block from the windbg help. The comments section explains the reason.

When each block is entered, all aliases within the block are evaluated. If you alter the value of an alias at some point within a command block, commands subsequent to that point will not use the new alias value unless they are within a subordinate block.

Then I changed the script as following and everything works like a charm since then.

aS /c FirstModule .printf "%mu", @@c++((*(ntdll!_LDR_DATA_TABLE_ENTRY**)&@$peb->Ldr->InLoadOrderModuleList.Flink)->BaseDllName.Buffer);
.block
{
.dump /ma /u c:\${FirstModule}.dmp;
}