jepzilla.com


20 Nov

My Hot Chili

Ingredients

1 can of diced tomatoes
1 can of kidney beans
5 or 6 celery stalks
1 or 2 habanero peppers
1lb ground beef or steak

Canola oil

Black pepper
Chili powder
Ground cumin
Cayenne Pepper (optional)
Ground oregano (optional)
Ground cinnamon (optional)

Preparation

Empty kidney beans into a strainer, wash, drain and leave to dry.

Chop celery stalks into pieces 1/4 to 1/2 an inch long.

Finely chop the peppers. Remove all seeds, and chop into small square pieces approximately 1/16th of an inch to a side. Place pepper bits onto a plate so you don’t have to touch them with your hands again. Don’t touch anything while you’re doing this. WASH YOUR HANDS IN RUBBING ALCOHOL WHEN YOU’RE DONE. If you don’t have any rubbing alcohol, wash them in cooking oil or milk. If you don’t do this immediately, the oils from the peppers will soak into your hands and burn for hours, and they can’t be washed off. (First aid advice: if your hands start burning 15-30 minutes after working with the peppers, use Lanacane or any other topical burn medication that contains a local anesthetic, and the burning will stop after a couple of hours.)

If you’re using steak, chop it into 1in chunks.

Cooking

Put a splash of cooking oil into a pot (or pan), sprinkle it with black pepper and set the heat on high. You don’t want a lot of oil, just enough to cover the bottom of the pan when hot; too much and it’ll splash when you add the meat, which sucks.

Once the oil is hot enough (it’ll cover the bottom of the pan and will be rippling a little bit), put in your beef, and sprinkle with more pepper. Brown the meat.

Add your can of diced tomatoes, the cleaned beans, the celery stalks and the peppers (don’t forget to wash your hands if you’ve touched them again).

Add 5 tablespoons of chili powder, and 1 tablespoon of cumin. I also like to add a few teaspoons of cinnamon and oregano, and a little bit of cayenne pepper.

Mix everything well.

Bring to a boil, stir and lower the heat, then cover and let simmer for 1-2 hours. Check every 10-15 minutes to adjust spices to your liking. Making a chili isn’t an exact science. At least, it’s not for me. :-)

Other things you can add:
A tablespoon of lemon juice
A tablespoon of cocoa powder + 1/4 cup of corn flour or cornmeal.


19 Dec

Update to DPC Mystery

So some further investigation reveals that after a fresh boot, the ndis.sys driver is behaving normally. The DPC flood doesn’t begin until after the first suspend/wakeup cycle.


17 Dec

Why is Vista hogging my CPU?

As some of you know, I’ve been running Vista on one computer or another since its release. In general I don’t have too many issues with it. But Vista’s lacklustre performance is a mystery to me. The Sidebar is kind of a hog if you install some badly behaved widgets, but otherwise when I look at Windows’ task manager I don’t see any programs that could be the culprit. So it’s time to do some digging.

 

Our first stopover will be SysInternals’ Process Explorer (now owned by Microsoft), which will give us a lot more information about the system than the Task Manager will. And what do we find…

[Screenshot: VistaDPCs]

 

Holy schmoly! What’s going on with the DPCs? What are DPCs?

 

What is this DPC thing?

Well DPC stands for Deferred Procedure Call, and it’s how device drivers do most of their work. In Windows (and other x86 operating systems) when a device needs its device driver to do something, it generates an interrupt. Interrupts do exactly what they’re named: they interrupt the currently running program and make the computer run a special bit of code provided by the operating system. The operating system then goes to the driver for the device which generated the interrupt and runs its Interrupt Service Routine (ISR).

 

Interrupts get to preempt whatever else the computer is doing because they can be very time sensitive. For this reason there’s one thing interrupts aren’t allowed to… interrupt, and that’s other ISRs. But sometimes there’s also a lot of processing that needs to occur when an interrupt is received. For example, a network device might generate an interrupt when it receives some data. That data needs to be read from the card and then processed by the computer’s TCP/IP stack, which might take quite a while, during which time many other devices might generate interrupts (or the same device might generate more interrupts).

 

The solution to this problem is to break drivers into two parts. One part lives in the ISR and does any time-critical tasks that are needed. It then schedules the rest of the driver’s processing to be performed in a Deferred Procedure Call (DPC). DPCs run at an elevated priority, like ISRs, so they get to preempt any programs we’re running. However, they don’t have quite the same priority as ISRs, so a DPC can be interrupted by any new interrupts (which will then schedule their own DPCs).
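To make the ISR/DPC split a bit more concrete, here is a toy simulation in Python. It is only a sketch: `ToyKernel`, `interrupt` and `drain_dpcs` are names I made up for illustration, none of this is the Windows driver API, and new interrupts only arrive between DPCs here rather than in the middle of one.

```python
from collections import deque


class ToyKernel:
    """Toy model of the ISR/DPC split (hypothetical, not the Windows API)."""

    def __init__(self):
        self.dpc_queue = deque()  # deferred work waiting to run
        self.log = []             # order in which things actually ran

    def interrupt(self, device, payload):
        """ISR: do only the time-critical part, then defer the rest."""
        self.log.append(f"ISR {device}")
        self.dpc_queue.append((device, payload))

    def drain_dpcs(self, incoming=()):
        """Run queued DPCs; one pending interrupt fires after each DPC.

        Simplification: a real DPC can be interrupted part-way through.
        """
        incoming = deque(incoming)
        while self.dpc_queue:
            device, payload = self.dpc_queue.popleft()
            self.log.append(f"DPC {device} processed {payload}")
            if incoming:
                self.interrupt(*incoming.popleft())


kernel = ToyKernel()
kernel.interrupt("nic", "packet 1")
kernel.drain_dpcs(incoming=[("nic", "packet 2")])
for entry in kernel.log:
    print(entry)
```

The point of the design is that the `interrupt` method stays tiny, so the expensive processing never blocks the next interrupt from being serviced.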

 

So to cut a long story short, time spent in DPCs is basically time spent running device driver code.

Device driver road rage

So this pisses me off.

 

There’s no reason why Vista should be spending so much time running device drivers, no reason at all. The screenshot above was taken on my laptop, but I also installed Vista and XP on a desktop machine and compared the time spent in DPCs under each. Vista was spending about 5% of its CPU time in DPCs, whereas XP was spending a fraction of a percent.

 

I want to resolve this frustration in a natural way, i.e., by flinging poop at those responsible. I’m a reasonable person. So who’s to blame? Conveniently, Windows will tell us if we know how to ask.

> tracelog -start -f dec_17_2007_10_34_AM.etl -dpcisr -UsePerfCounter -b 64

… wait a little while …

> tracelog -stop

> tracerpt dec_17_2007_10_34_AM.etl -report dec_17_2007_10_34_AM.html -f HTML

Survey says…

[Screenshot: DPC processor utilization]

Interesting, ndis.sys is consuming a lot of CPU time: 13.09% of processor 1 over the test duration. And let’s take a look at ndis.sys…

[Screenshot: ndis.sys file properties]

 

So now the question is, why? What is ndis.sys doing that takes up so much CPU time at 0x806481C7?


27 Sep

Vision

The current word is vision. Machine vision. How do we make a computer “see”? But the process of vision isn’t all semantic analysis; that’s merely a last step. For now I’m more interested in lower level processes. One of the clever functions of the human visual system is that our eyes move constantly, picking up images of the environment from slightly different angles, and with different patterns of rods and cones capturing the same scene. The visual centers of the brain can then take these different images and reconstruct a higher resolution picture of the scene by combining information from a sequence of images. This reconstructed image is what we perceive, consciously. So, my goal is to do this: take a video source, find portions of the image in motion and construct high resolution images of those moving objects from the sequence of frames.

It’s not quite there yet, but I do have pretty pictures…

[Images: Still capture of video stream; Funky Book; Funky Table]

The image is generated by first converting each frame into HSV. The value component is then replaced by the edge strength (calculated as the norm of the horizontal and vertical Sobel operators), and then the RGB image is reconstituted. This is nothing particularly special.
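For the curious, the per-frame recipe above can be sketched on the CPU in plain Python. This is not the GPU code, just an illustration under a few assumptions of my own: the Sobel operators are applied to the value channel, border pixels reuse their nearest neighbour, and the edge strength is clamped to 1. The function name `edge_tint` is made up.

```python
import colorsys
import math


def edge_tint(image):
    """image: list of rows of (r, g, b) floats in [0, 1]; returns a new image."""
    h, w = len(image), len(image[0])
    hsv = [[colorsys.rgb_to_hsv(*px) for px in row] for row in image]

    def value(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        y = min(max(y, 0), h - 1)
        x = min(max(x, 0), w - 1)
        return hsv[y][x][2]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Horizontal and vertical Sobel responses on the value channel.
            gx = (value(y - 1, x + 1) + 2 * value(y, x + 1) + value(y + 1, x + 1)
                  - value(y - 1, x - 1) - 2 * value(y, x - 1) - value(y + 1, x - 1))
            gy = (value(y + 1, x - 1) + 2 * value(y + 1, x) + value(y + 1, x + 1)
                  - value(y - 1, x - 1) - 2 * value(y - 1, x) - value(y - 1, x + 1))
            strength = min(math.hypot(gx, gy), 1.0)  # clamping is an assumption
            hue, sat, _ = hsv[y][x]
            # Keep hue and saturation, swap value for edge strength.
            row.append(colorsys.hsv_to_rgb(hue, sat, strength))
        out.append(row)
    return out
```

Flat regions come out black (no edges, so zero value), while pixels on an edge keep their original hue and saturation at full brightness, which is where the funky colours come from.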

What’s cool is the calculation is actually done by the computer’s GPU, not the CPU. This is nice because the GPU is well suited for vision problems and is substantially faster than the CPU within its practical problem domain (300GFLOPS vs 30GFLOPS).


16 Jun

No posts, or how I learned to love the NDA

I originally intended to update this on a weekly basis. However, I’m under an NDA with regard to my GPU work, and prudence has led me to abandon my goal of semi-frequent posts. Work on GPUs is still progressing, although I have moved on from my original topic to something a little different. CUDA is pretty fantastic though, and I’ve picked up my own 8800 to experiment with the software outside of work.

But what do high-dimensional Hilbert spaces, topology, statistical learning theory, user interface design, Kohonen maps and machine translation have to do with each other? They’re all components of my new personal project. More to come. :)


30 Mar

On the importance of being cached

A curious feature of modern GPUs is the lack of implicit caching. In modern computer architectures there are many layers of cache interposed between the CPU and the system’s main memory, each one faster (and smaller) than the previous. Memory latency (the time it takes for the processor to fetch data back from main memory) has not kept up with processor speed increases. The practical impact of this is that we need to copy frequently used memory into smaller, faster memory buffers that the processor can access quickly. In fact, on your typical desktop processor the majority of die space is taken up by cache memory.

GPUs have similar problems to CPUs when dealing with memory latency. Your typical graphics card has a certain amount of on-board memory, which formerly acted as a cache for textures and other data that the cards use frequently but isn’t updated too often; with the advent of programmable rendering pipelines GPU memory has become a first class citizen in the memory world, rather than simply a cache. However, GPUs still employ multiple caches, much like a CPU, to speed access to their ever-growing on-board memory. One of the curiosities of GPU architectures, however, is that caching is far more explicit than it is with CPUs.

A CPU implicitly caches data. Whenever something is accessed from memory, the CPU caches that value, and any values nearby in memory. There are a number of different algorithms for choosing exactly what to cache, but mostly they are dependent on the assumption that memory accesses tend to have spatial and temporal locality. But this is all done transparently to the programmer and the program; you could completely remove the cache from a processor and the only noticeable change would be a significant drop in performance.
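Here is a toy model of why spatial locality matters, written as a small Python sketch. It simulates a direct-mapped cache, which is just one simple scheme I picked for illustration; the sizes (64 lines of 16 addresses each) and the name `hit_rate` are arbitrary assumptions and do not describe any real processor.

```python
def hit_rate(addresses, num_lines=64, line_size=16):
    """Fraction of accesses served by a toy direct-mapped cache."""
    lines = [None] * num_lines      # which memory block each cache line holds
    hits = 0
    for addr in addresses:
        block = addr // line_size   # neighbouring addresses share a block
        slot = block % num_lines    # each block maps to exactly one line
        if lines[slot] == block:
            hits += 1
        else:
            lines[slot] = block     # miss: fetch the whole block
    return hits / len(addresses)


sequential = range(4096)            # walk memory one address at a time
strided = range(0, 4096 * 16, 16)   # jump a full cache line every access

print(hit_rate(sequential))  # 0.9375
print(hit_rate(strided))     # 0.0
```

Both loops touch the same number of addresses and would compute the same results; the only difference the program could ever notice is how long it takes.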

When programming a GPU, we have to explicitly manage our caches. While array memory (i.e., textures) has implicit caching, general purpose linear memory does not. We can access data in array memory and rely on the same techniques as in traditional programming to ensure reasonable performance; however, to access data in linear memory we need to first copy it to the chip’s local shared memory (16 KB per multiprocessor on a G80). Furthermore, our performance is dependent on good memory access patterns, to spread operations out over the different memory banks.
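To show what a good or bad access pattern looks like, here is a small Python sketch that counts bank collisions. It assumes 16 shared memory banks of 32-bit words and accesses grouped by half-warps of 16 threads, which is my understanding of the G80 rather than something spelled out above; `conflict_degree` is a made-up helper name.

```python
from collections import Counter

NUM_BANKS = 16   # assumed number of shared memory banks per multiprocessor
HALF_WARP = 16   # assumed number of threads whose accesses are issued together


def conflict_degree(stride):
    """Largest number of threads in a half-warp that land on the same bank
    when thread t reads the 32-bit word at index t * stride."""
    banks = Counter((t * stride) % NUM_BANKS for t in range(HALF_WARP))
    return max(banks.values())


for stride in (1, 2, 3, 16):
    print(stride, conflict_degree(stride))
# 1 1
# 2 2
# 3 1
# 16 16
```

A degree of 1 means every thread hits its own bank; a degree of 16 means all sixteen reads queue up behind one bank. Odd strides are safe in this model because they share no common factor with the bank count.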

So, what are the practical implications of all this?

Well, the most obvious is that it makes programming a GPU a bit trickier than a CPU. We can’t just rely on the clever men and women at Intel and AMD and IBM to get our data to the processor as fast as possible. As programmers, it’s important to consider the lifecycle of our data and make sure we use our cache effectively; making sure our intermediate and output buffers are properly used and cached is of paramount importance. Fortunately, the texture cache means that we can load any read-only input data into array memory and not worry about it too much.

This also has some pretty significant implications for the way you design your program. For example, consider the simple operation of multiplying a vector by a matrix. One possible implementation would be to write a nice, fast dot-product operation on the GPU and implement the multiplication by looping over each element of the output array on the CPU and calculating it by calling our high-performance dot-product. But the proper way to do it is to implement the entire operation on the GPU without relying on the CPU at all, because the first approach messes up our caching! With current technology, we don’t get to preserve our multiprocessor caches between kernel calls, and that’s a problem; by dumping our cache after each element we have to reload the vector operand into the cache each time, which is a waste. By doing the entire operation as part of the same kernel we get the performance boost of keeping our data close to the GPU throughout the entire computation.
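The cost difference can be sketched with a toy Python model that just counts how many vector elements get loaded into the "cache". Nothing here runs on a GPU: each call to the inner dot product stands in for a kernel launch, and the function names are invented for the example.

```python
def matvec_per_element(matrix, vector):
    """One simulated kernel launch per output element; cache is lost each time."""
    loads = 0
    result = []
    for row in matrix:
        cached = list(vector)   # reload the vector operand for this launch
        loads += len(cached)
        result.append(sum(a * x for a, x in zip(row, cached)))
    return result, loads


def matvec_single_kernel(matrix, vector):
    """One simulated kernel launch for the whole product; vector loaded once."""
    cached = list(vector)
    loads = len(cached)
    result = [sum(a * x for a, x in zip(row, cached)) for row in matrix]
    return result, loads


matrix = [[1, 2], [3, 4]]
vector = [5, 6]
print(matvec_per_element(matrix, vector))    # ([17, 39], 4)
print(matvec_single_kernel(matrix, vector))  # ([17, 39], 2)
```

Both versions produce the same answer, but for an n-by-n matrix the first loads the vector n times and the second loads it once.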

Of course, we could also keep our vector operand in array memory and let the texture cache solve our woes. But there are reasons why we might want our input buffer to be in linear memory. :-)


23 Mar

Driver Frustrations

For the work I’m doing I was hoping to be able to use NVIDIA’s new CUDA library. CUDA is a rather clever toolkit that lets us dispense with all the graphics libraries (i.e., OpenGL and Direct3D) and program the graphics card directly, with no reference to frame buffers, or shaders or textures (well, it still has textures).

Now, I have two graphics cards that I use for this work: a GeForce 7950 GX2, which is a little out of date now, and a G80-875. The G80-875 is a beast of a card, and isn’t due for commercial release until the fourth quarter (I think it’s the 8950). Problem: the 7000 series can’t run CUDA programs, and the CUDA drivers don’t support the pre-release G80-875. Whoopsie!

Guess it’s back to OpenGL for me.


23 Mar

Background

I’m working at Dalhousie on a project for a company called Acceleware. I can describe what I’m doing in one sentence: “Implement a conjugate gradient solver for sparse matrices on a GPU.” So, there are three things that combine to make an interesting little problem:

  1. Conjugate gradient solver
  2. Sparse matrices
  3. GPU computation

The conjugate gradient method (more properly known as the Hestenes-Stiefel conjugate gradient method) is a technique for numerically solving certain types of systems of linear equations. To be precise, we can use it to solve symmetric, positive definite linear systems. Now, this is a pretty significant restriction, but symmetric, positive definite linear systems are quite common, especially in the realm of modeling and simulation when we’re trying to solve partial differential equations.

So how does a conjugate gradient solver work? Well, essentially it works by hill climbing. When we do hill climbing, we basically treat the solution space of the problem as a surface in space, where the answer we’re looking for is the highest point. A common way of doing this is called steepest ascent, where we follow the steepest path up; this is a greedy solution to the problem. The problem with greedy solutions is they aren’t particularly clever and often end up taking far too long to solve your problem; they fall prey to pathological situations and follow routes which seem appealing at first, but which lead to dead ends. With the conjugate gradient method, instead of following the steepest route, we take the steepest route that is conjugate to the last path we followed. Doing this lets us avoid most of the degenerate cases that catch a simple steepest ascent algorithm.
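For reference, here is a minimal sketch of the textbook conjugate gradient iteration in plain Python. It is not the GPU implementation from the project, just the standard algorithm as I understand it. Note that the usual formulation walks downhill on the quadratic form rather than uphill, which is the same idea with the sign flipped. The names `dot`, `conjugate_gradient` and `matvec` are my own, and the matrix is passed in as a function so that any storage format can be used.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x starting at zero
    p = list(r)              # first search direction is the steepest one
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual is small enough, we are done
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)          # how far to move along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: the residual, corrected to stay conjugate to p.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(lambda v: [dot(row, v) for row in A], b)
print(x)  # approximately [1/11, 7/11]
```

In exact arithmetic the method finishes in at most as many steps as there are unknowns, which is why the 2-by-2 example converges after two iterations.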

So what is a sparse matrix, and what is so important about them that we’re making big sacrifices in terms of generality in order to be good at solving them? A sparse matrix is a matrix that consists mostly of zeros. Oftentimes we have to solve problems in an extraordinarily large number of dimensions: hundreds of thousands, or millions of dimensions. Normally, storing matrices this large would be infeasible, but with a sparse matrix we can store only those elements with non-zero values; in other words, we can store a compressed representation of the matrix easily. And an algorithm like the conjugate gradient method lets us work directly on the compressed matrix. Suddenly all those intractable problems are solvable.
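As an illustration of what a compressed representation can look like, here is a Python sketch of one common layout, compressed sparse row (CSR). I am picking CSR only as an example; this post does not say which format the project actually uses, and `to_csr` and `csr_matvec` are hypothetical helper names.

```python
def to_csr(dense):
    """Store only the non-zero entries of a dense matrix, row by row."""
    values, col_index, row_start = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)        # the non-zero value itself
                col_index.append(j)     # which column it came from
        row_start.append(len(values))   # where the next row begins
    return values, col_index, row_start


def csr_matvec(csr, x):
    """Multiply a CSR matrix by a vector without ever expanding it."""
    values, col_index, row_start = csr
    return [
        sum(values[k] * x[col_index[k]]
            for k in range(row_start[i], row_start[i + 1]))
        for i in range(len(row_start) - 1)
    ]


dense = [[4, 0, 0],
         [0, 3, 1],
         [0, 1, 2]]
csr = to_csr(dense)
print(csr)                         # ([4, 3, 1, 1, 2], [0, 1, 2, 1, 2], [0, 1, 3, 5])
print(csr_matvec(csr, [1, 2, 3]))  # [4, 9, 8]
```

The matrix-vector product is the only operation the conjugate gradient method needs from the matrix, which is why it can work directly on the compressed form.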

Finally, we do this on a GPU. A GPU is a computer chip designed for processing large amounts of data in parallel. The CPU in a typical computer is essentially a serial processor; it performs its calculations very quickly on a single stream of data. CPUs these days are pretty fast, but they aren’t optimised for any particular kind of problem. On a single stream of data, a GPU is a lot slower than a CPU, but the GPU will perform its calculations on many streams at once.

So adding all this up, what do we get? Well, a program that can solve partial differential equations in millions of dimensions, hundreds of times faster than can be done on a CPU alone. Pretty cool, huh?


22 Mar

First Post

And thus jepzilla.com was born.
