Five sets of flat files are built from the CRSP Stock and Index databases. A number of our subscribers have long requested alternate formats of our databases in order to automate or streamline their processes and easily ingest CRSP data into them. These flat files are intended for these purposes.
- SIZ – 1925 US Stock and Indexes
- SAZ – 1925 US Stock
- SXZ – 1962 US Stock and Indexes
- S6Z – 1962 US Stock
- SFZ - 1925 US Indexes
|Delist – Daily Return Information||sfz_del.sas7bdat||sfz_del.dat|
|Delist – Monthly Return Information||sfz_mdel.sas7bdat||sfz_mdel.dat|
|Index Membership *||sfz_mbr.sas7bdat||sfz_mbr.dat|
|Portfolio Membership - Daily *||sfz_portd.sas7bdat||sfz_portd.dat|
|Portfolio Membership - Monthly *||sfz_portm.sas7bdat||sfz_portm.dat|
|Security Header Information||sfz_hdr.sas7bdat||sfz_hdr.dat|
|Time Series – Daily Primary||sfz_dp_dly.sas7bdat||sfz_dp_dly.dat|
|Time Series – Daily Secondary||sfz_ds_dly.sas7bdat||sfz_ds_dly.dat|
|Time Series - Monthly||sfz_mth.sas7bdat||sfz_mth.dat|
|Time Series – Daily Index||sfz_dind.sas7bdat||sfz_dind.dat|
|Time Series – Monthly Index||sfz_mind.sas7bdat||sfz_mind.dat|
* Available in 1925 and 1962 US Stock & Indexes, and 1925 US Indexes, but are NOT available in 1925 and 1962 US Stock.
Comparisons between the legacy CRSPAccess Stock and Index databases and the CRSP flat files will reveal differences, outlined here:
The flat files provide all of the underlying data that can be used to generate derived data items, but do not include the derived items themselves. Examples of derived data that TSQuery and ts_print can generate that are not found in the flat include excess and cumulative returns, last non-missing, recent, previous period versions of time series data items.
CRSPAccess stored most items, including commonly used Price (PRC), Ask or High Price (ASKHI), and Bid or Low Price (BIDLO), as 4-byte floating point numbers. The SAS format defaults to storing those items as 8-byte (double precision) floating point numbers. For the majority of historical values, this does not result in any difference. However, the conversion to SAS’s double precision, makes more apparent a known issue of phantom precision related to decimal pricing. For example, IBM’s price on 11/27/2015 of $138.46 will, if displayed with five decimal places, show $138.46001. The ASCII version, by outputting only seven significant digits in scientific notation, will have just 1.384600E+02, and no phantom precision.
CRSP is re-engineering our back office to correct the phantom precision and increase the real precision throughout the Stock and Indexes flat files in the coming year.
Subscribers have long-requested the addition of 9-character cusips to our databases. The flat files include them in the name- and header history files.
CRSP Total Market
The investable CRSP Total Market Index is included in the index files associated with the stock and index database. Daily price-only and total returns, levels, and counts are included.
The introduction of double precision allows for the monthly volumes to be stored in same unit (1 share) as daily volumes rather than as a unit of 100 shares. So monthly volumes in the flat files are now stored as one hundred times the value in legacy CRSPAccess. The change also allowed for four missing daily volumes for Citigroup in 2009 and 2010 to be replaced with actual values that were in excess of two billion, and for those monthly volumes to be recalculated.
|KYPERMNO||CALDT||New VOL||MCALDT||New MVOL|
In CRSPAccess, missing values were indicated by the use of defined non-null values (e.g -99 for returns and 0 for prices). In the flat files, missing values are now represented by null values.
In the SAS files, CRSP uses the default SAS missing value, displayed as a “.”
In the coming year, CRSP will be further examining all missing value conventions.