Data and Map Methods
Updating the HDMT

The first version of the HDMT website was launched in March 2007. To date, the web-based version of the HDMT has undergone a number of revisions to improve its applicability and specificity. Among these edits has been an attempt to keep indicator data as current as possible. Because of the large number of indicators within the HDMT website, however, it is not always possible to include the most recently available data.

SFDPH staff is committed to one annual mass update to the HDMT website, primarily focusing on revising indicators, data and development targets. This update generally occurs after a large application of the HDMT is completed, as plan/project evaluations often highlight aspects of the HDMT that can be improved. Major reorganizations of the website also occur during this "update" period. Indicators that do not have data associated with them may, however, be updated at other times of the year. In addition, the policies and design strategies, research citations, and established standards referenced throughout the HDMT are updated on an ongoing basis throughout the year.

The majority of HDMT indicators that use U.S. Census data rely on data from the 2000 Census, obtained from the GeoLytics® CensusCD® Neighborhood Change Database (NCDB) 1970-2000. In Spring 2008, some HDMT indicators using Census-based population and household denominator data were updated with new 2007 data released by Applied Geographic Solutions (AGS) in an attempt to reflect the changing population demographics of San Francisco. Unfortunately, AGS does not provide updated estimates for all Census variables used in the HDMT. As a result, HDMT indicators are based on a combination of both 2000 and 2007 data, as noted on the individual indicator pages. SFDPH staff anticipates updating indicators using revised Census-based estimates every three years.

2000 Census data notes: According to the NCDB Data Users Guide, "The Neighborhood Change Database (NCDB) contains social, demographic, economic, and housing data on census tracts in the United States for 1970, 1980, 1990, and 2000. Data in the NCDB are based on information gathered by the U.S. Bureau of the Census in its decennial censuses. The Bureau makes census tract data available to the public in both printed and machine-readable formats, but these products generally force users to focus on one tract and one characteristic at a time. By compiling data on all census tracts into one data file, the NCDB allows users to simultaneously analyze numerous (or all) tracts on a host of dimensions." (page 1-1)

"The basic geographical unit of observation in the NCDB is the census tract. Census tracts are locally determined geographic units, ranging in size from 2,500 to 8,000 persons. Tracts are meant to approximate "neighborhoods" by capturing a group of residents with similar population characteristics, economic status, and living conditions. Tracts can be used by themselves as units of analysis or as the building blocks to create larger neighborhood areas." (page 2-1)

"It is also important to recognize that the NCDB does not provide information on individuals directly—all data are aggregated to the census tract level. The Census Bureau aggregates data to preserve the confidentiality of individual respondents, which is guaranteed by federal law." (page 1-4)

2007 AGS data notes: AGS has processed and made updates to the majority of variables from the 2000 Census, using the same boundaries (e.g. Block Groups and Census Tracts). "The data variables include base population counts, income information, educational achievement and employment information." The Mosaic methodology was used to create these updates. Mosaic is "a geodemographic segmentation system developed by Experian. Each of the nearly one-quarter million block groups was classified into sixty segments on the basis of a wide range of demographic characteristics. The basic premise of geodemographic segmentation is that people tend to gravitate towards communities with other people of similar backgrounds, interests, and means. The Mosaic assignments are updated annually by incorporating updated AGS demographics into the segmentation model, ensuring that the assignment is as accurate as possible given shifts in local area demographics. Mosaic was originally constructed using the 1990 Census, and is now based on the 2000 Census data and updated on an annual basis using AGS demographic updates."

HDMT Geographic Boundaries: Tables and Maps

We constructed the HDMT indicator tables and maps using a number of different geographic units of analysis. "Planning Neighborhoods," as defined by the San Francisco Planning Department, reflect SFDPH’s preferred method of reporting data at a neighborhood level. However, some HDMT indicators are based on zip codes and Supervisorial Districts, based on the geographic level used by the original data source. Caution is advised in making direct comparisons across neighborhoods defined by different geographic levels, as Planning Neighborhoods, zip codes and Supervisorial Districts represent different constructs and geographic boundaries.

The vast majority of HDMT indicator tables that include neighborhood-level data use Planning Neighborhood boundaries. Their accompanying maps typically present data at the census tract level, with Planning Neighborhood names placed in the vicinity of their corresponding census tracts. While Planning Neighborhoods are larger geographic areas than census tracts, census tracts do not always lie completely within a Planning Neighborhood. After importing census data into ArcGIS, SFDPH therefore used a 'centroids within' methodology: census tracts (2000 Census data) or blocks (2007 AGS data) were converted to geographic mean center points, and each unit was assigned to the Planning Neighborhood in which its center point falls. We then calculated the Planning Neighborhood totals for the tables.
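
The 'centroids within' assignment can be illustrated with a minimal sketch. This is not the ArcGIS workflow SFDPH used. It assumes that each tract and neighborhood is a simple polygon given as a list of (x, y) vertices, that the area-weighted polygon centroid stands in for the "geographic mean center point," and that the function names, coordinates and population figures are hypothetical values chosen for illustration.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):       # edge crosses the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def neighborhood_totals(tracts, neighborhoods):
    """Sum tract populations by the neighborhood containing each centroid."""
    totals = {name: 0 for name in neighborhoods}
    for tract in tracts:
        center = polygon_centroid(tract["ring"])
        for name, ring in neighborhoods.items():
            if point_in_polygon(center, ring):
                totals[name] += tract["population"]
                break                  # each tract is assigned only once
    return totals


# Hypothetical data: two adjacent neighborhoods and three tracts.
neighborhoods = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
tracts = [
    {"ring": [(1, 1), (5, 1), (5, 5), (1, 5)], "population": 3000},
    # This tract straddles the A/B boundary; its centroid (11, 4) is in B.
    {"ring": [(8, 2), (14, 2), (14, 6), (8, 6)], "population": 4000},
    {"ring": [(15, 5), (19, 5), (19, 9), (15, 9)], "population": 2500},
]
totals = neighborhood_totals(tracts, neighborhoods)
print(totals)
```

In this sketch the straddling tract contributes its entire population to neighborhood B, which shows why totals built this way are approximations when tract and neighborhood boundaries do not align.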

Creating HDMT Maps
How were the maps created?

The system used for mapping and various analyses was ArcGIS 9.2 software by ESRI (2006). This software integrated the data for mapping and analysis. The color schemes for the maps were selected by consulting ColorBrewer (http://www.colorbrewer.org), an online tool for selecting color schemes.

How were the maps classified?

The majority of the maps were classified using Jenks Natural Breaks. This method places class boundaries where there are relatively large jumps in the data values, so that similar values are grouped together. The best arrangement of values into classes is determined by comparing the sum of squared differences (SSD) of values from the means of their classes: the variance within classes is minimized, while the variance among classes is maximized. The technique first orders the values from low to high. It then calculates the SSD for every possible first break. Next, it finds the SSD for each of the next possible breaks, as if a previous break had already happened. Once it has determined the SSDs for all of the requested breaks, it chooses the best last break from the last list of SSDs, the best second-to-last break from the second-to-last list, and so on. This yields the best set of breaks from the entire list of possible breaks, and thus the classification.
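
The procedure described above can be sketched as a small dynamic program. This is an illustrative implementation of the general natural-breaks idea (often called Fisher-Jenks), not the code used inside ArcGIS; the function name and the sample values are assumptions made for the example.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for a minimum-SSD partition."""
    data = sorted(values)              # step 1: order values low to high
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the SSD of any slice data[i:j] in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared differences from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        total_sq = prefix_sq[j] - prefix_sq[i]
        return total_sq - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total SSD when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index at which the last of those k classes begins.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # try every possible previous break
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk backwards: best last break, then best second-to-last, and so on.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


# Hypothetical indicator values with three visible clusters.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
print(jenks_breaks(sample, 3))
```

For the sample values the class upper bounds fall at 3, 12 and 52, that is, at the large jumps in the ordered data. The triple loop is simple rather than fast, which is acceptable for the few hundred tracts in a single city.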

What projection and coordinate system were used for the maps?
Projected Coordinate System: NAD_1983_UTM_Zone_10N
Projection: Transverse_Mercator
False_Easting: 500000.00000000
False_Northing: 0.00000000
Central_Meridian: -123.00000000
Scale_Factor: 0.99960000
Latitude_Of_Origin: 0.00000000
Linear Unit: Meter (1.000000)

Geographic Coordinate System:
GCS_North_American_1983
Datum: D_North_American_1983
Prime Meridian: 0
        
Census Undercount

According to the NCDB Data Users Guide, "Since its inception in 1790, controversy has surrounded the decennial census's alleged undercount of individuals (Anderson 1988). This is a significant issue because data from the census are so widely used in social science research and are the basis of important political decisions, including the drawing of congressional districts and the allocation of government funding…..No one, not even the Census Bureau, denies that the census misses many people. Also, to a lesser extent, there is some enumeration of fictitious or deceased individuals and double counting. The undercount problem exists for many reasons. For instance, the Census Bureau may miss some housing units when sending out forms or some people who have received forms may not complete and return them. The former case is prevalent among individuals with no stable address (such as the homeless), while the latter is particularly common among illegal immigrants, many of whom wish to remain hidden from the government. While the Census Bureau makes several attempts to locate nonresponding households, some are inevitably missed." (page 4-7 and 4-8)

"Of particular concern is the so-called "differential undercount," which refers to the fact that certain types of individuals and households are more likely to be missed by the census than others. According to one study, the undercount for black persons remained at 5.7 percent in 1990—an improvement from the 8.4 percent mark in 1940, but an increase from 4.5 percent in 1980 (Robinson, et. al. 1991). Men and the young are more likely to be missed than women and the old, and one study estimated that for black males between 20 and 29, the undercount was 10.1 percent in 1990 (Skerry 1992). The number of illegal immigrants, most of whom are of Hispanic origin, is believed to be around 3 million, and the Census Bureau estimates that 30 percent of this population was missed in 1990." (page 4-8)

According to the U.S. Census, "data indicate that populations were undercounted at different rates. In general, Blacks, American Indians and Alaskan Natives, Asians and Pacific Islanders, and Hispanics were missed at higher rates than Whites." For more information, visit: http://www.census.gov/dmd/www/techdoc1.html

Census Weighting

The vast majority of U.S. Census-based indicators in the HDMT are reported based on the Census Summary File 1 (SF1), which contains the short form (100-percent) data, and Summary File 3 (SF3), which contains the long form (sample) data. According to the NCDB Data Users Guide, "…short form and long form counts for particular geographic areas or subpopulations do not necessarily agree with each other, since the former are based on an enumeration of the entire population and the latter only on weighted sums of a 1-in-6 sample of households." (page 4-15)

As further explained by the Census Bureau, "As in earlier censuses, the responses from the sample of households reporting on long forms must be weighted to reflect the entire population. Specifically, each responding household represents, on average, six or seven other households who reported using short forms…..One consequence of the weighting procedures is that each estimate based on the long form responses has an associated confidence interval. These confidence intervals are wider (as a percentage of the estimate) for geographic areas with smaller populations and for characteristics that occur less frequently in the area being examined (such as the proportion of people in poverty in a middle-income neighborhood)."

"In particular, for Census 2000, the Bureau of the Census created weighting areas --geographic areas from which about two hundred or more long forms were completed-- which are large enough to produce good quality estimates. If smaller weighting areas had been used, the confidence intervals around the estimates would have been significantly wider, rendering many estimates less useful due to their lower reliability……The disadvantage of using weighting areas this large is that, for smaller geographic areas within them, the estimates of characteristics that are also reported on the short form will not match the counts reported in SF 1 or SF 2. Examples of these characteristics are the total number of people, the number of people reporting specific racial categories, and the number of housing units. The official values for items reported on the short form come from SF 1 and SF 2."

For more information on Census weighting, visit:
http://www.census.gov/Press-Release/www/2002/sf3compnote.html

References

Applied Geographic Solutions, Inc. Spring 2007 Update: Current Year Estimates. Methodology available at: http://www.appliedgeographic.com/library.html

GeoLytics® in association with the Urban Institute. CensusCD® Neighborhood Change Database (NCDB) 1970-2000. US Census Tract Data User Guide. Available at: http://www.geolytics.com/USCensus,Neighborhood-Change-Database-1970-2000,Products.asp