So today is the end of my first week at the U.S. Census Bureau.
Monday and Tuesday, the Census Bureau was closed because of Martin Luther King Day and the inauguration. (Although apparently I still get paid for those days!)
Wednesday and Thursday were orientation. Yes, it lasted two full 8-hour days. Most of it was pretty mundane presentations, a lot of which didn't even apply to me because I am a part-time worker (like the presentations about health benefits, retirement plans, etc.). However, there were a couple of funny parts, like the presentation about "Government Ethics." Normally, of course, you can teach government ethics just by opening up the newspaper to a random page. However, since this was the day after the inauguration, all the newspapers were filled with inauguration news, so we learned about government ethics by watching a video entitled "The Battle for Avery Mann," which features a nondescript government worker in a nondescript office facing a series of ethical conflicts (whether to use the office copying machine for personal use, whether to accept gifts from a subordinate, what to do when asked to review a proposal from a company he also works for part-time) and two comical characters representing his good and evil impulses trying to tell him what to do.
On Friday, I started doing some actual work. I am working in the Statistical Research Division in the Disclosure Avoidance Research Group. The goal of this research group is to find ways of releasing census data to the public in a way that is useful to potential users of the data, but that does not enable anyone to find out any information about a particular respondent. Methods for doing this that are in use now, or are being explored for future use, include:
1. Providing only certain cross-tabulations of data, not the full data set.
2. Suppressing cells in cross-tabulations with fewer than a certain number of people in them.
3. Adding artificial "noise" to certain elements of the data set.
4. Providing a synthetic data set with similar statistical patterns to the real data set (say, the same values for all cross-tabulations with up to a certain number of variables) but that doesn't have any of the real data.
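To make methods 2 and 3 a bit more concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the made-up records, the function names, the threshold of 3, and the choice of small uniform integer noise. It is not how the Census Bureau actually does it, just the basic idea.

```python
import random
from collections import Counter

def cross_tab(records, keys):
    """Count how many records fall into each combination of the given keys."""
    return Counter(tuple(r[k] for k in keys) for r in records)

def suppress_small_cells(table, threshold):
    """Replace any count below the threshold with None (meaning 'suppressed')."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}

def add_noise(table, scale, rng):
    """Add integer noise drawn uniformly from [-scale, scale], never going below 0."""
    return {cell: max(0, n + rng.randint(-scale, scale)) for cell, n in table.items()}

# Made-up respondents (hypothetical data, not real census records).
records = [
    {"age": "18-34", "county": "A"},
    {"age": "18-34", "county": "A"},
    {"age": "18-34", "county": "A"},
    {"age": "35-64", "county": "A"},
    {"age": "65+", "county": "B"},
]

table = cross_tab(records, ["age", "county"])
suppressed = suppress_small_cells(table, threshold=3)
noisy = add_noise(table, scale=1, rng=random.Random(0))
```

Here the cell for ages 18-34 in county A has three people and is published as is, while the two single-person cells are suppressed, since a count of one would point straight at an individual.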
To see some of the work the Statistical Research Division does, you can see research papers they have published here. I am working with Yves Thibaudeau and Robert H. Creecy.
Bonus Math Question:
The Census Bureau offers a benefit known as a "health savings account" (HSA). Employees can designate a portion of their income to go into the HSA pre-tax (meaning that money put into the HSA is deducted from your income for purposes of income tax). Money in the HSA can be used for health care expenses. However, any money left in the HSA that is not used for health care expenses is lost.
How can you determine the optimal amount of money to put in the HSA? (Assume that you are risk-neutral, that your utility of money does not depend on your health care expenses, and that you know what your probability distribution of health care expenses for the next year looks like.)
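Here is one way to set the problem up (spoiler warning). I am assuming things the question doesn't state: a single flat marginal tax rate t, and a probability distribution represented by a list of equally likely expense scenarios. Putting x dollars in the HSA costs you (1 - t) * x in after-tax money and pays back min(x, H), where H is your actual health care expenses. So the expected gain is E[min(x, H)] - (1 - t) * x. Its slope in x is P(H > x) - (1 - t), which hits zero where P(H <= x) = t. In other words, under these assumptions the best contribution is the t-quantile of your expense distribution. The function names and numbers below are hypothetical.

```python
def expected_gain(x, expenses, tax_rate):
    """Expected after-tax gain from contributing x, over equally likely scenarios."""
    covered = sum(min(x, h) for h in expenses) / len(expenses)
    return covered - (1 - tax_rate) * x

def best_contribution(expenses, tax_rate):
    """Pick the contribution with the highest expected gain.

    The gain is piecewise linear and concave in x, so the maximum sits
    at 0 or at one of the scenario values; checking those is enough.
    """
    candidates = [0.0] + sorted(expenses)
    return max(candidates, key=lambda x: expected_gain(x, expenses, tax_rate))

# Ten equally likely expense scenarios (made-up numbers) and an assumed 25% tax rate.
expenses = [0, 100, 200, 500, 1000, 1500, 2000, 3000, 5000, 10000]
best = best_contribution(expenses, tax_rate=0.25)
```

With these numbers the answer comes out surprisingly low, well below the average expense, because every unused dollar is lost in full while every used dollar only saves you the tax on it.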