The X-Boorman platform presents a digital revival of the Biographical Dictionary of Republican China (BDRC), edited by Howard L. Boorman and published by Columbia University Press in 1967-1971. The BDRC was a reference work that generations of historians used in their research and teaching. It provided essential information both on a large group of Chinese elites (including one foreigner) and on the history of the period. The BDRC went out of print decades ago and became inaccessible outside libraries ((The four main volumes of the print edition are available for consultation, on loan, at Archive.org.)). It also became obsolete due to its use of the Wade-Giles transliteration system, the availability of new materials (especially in Chinese), and of course the rise of Wikipedia-type platforms in the United States and in China. Yet the BDRC remains a very valuable piece of scholarship, which we sought to transform and make available to the public again in a new guise.
The ENP-China project initially developed this platform as a spin-off of its experiments. The ENP-China project studies the transformation of elites in modern China. We first turned to the BDRC as an attempt to apply NLP (natural language processing) tools and methods to historical research. Our purpose was to examine the four volumes of the dictionary and to extract all the biographical data they contain. We chose the BDRC for several reasons:
Thematically, it intersects nicely with the main topic of our research, elites in modern China, and provides ready-made biographical material. The BDRC contains 589 individual biographies and one family group biography (Song family). The individuals selected for inclusion represent only a tiny fraction of the elites of Republican China. Reviewers have discussed how representative the sample is and have pointed out the overwhelming weight of political and military leaders in the BDRC. Yet, even with the inevitable imbalances in the sample, one must admit that practically all sectors of society are covered. For our purpose of engaging with the language of the biographies, the BDRC provides rich and relevant materials.
As a genre, the BDRC was composed by historians and written in natural language (as opposed to structured text), in the style and prose we expect to find in other texts produced by historians. Our main purpose was to test the retrieval of all the historical data in the BDRC and to produce datasets to feed into the Modern China Biographical Database (to be released in 2021). Printed biographical dictionaries have already served as the documentary basis for biographical databases. Beyond the BDRC, we plan to apply this approach more broadly to academic literature on China, past and present.
There was nothing straightforward in applying NLP methods to the BDRC. Although it is written in English, all the mentions of Chinese individuals, institutions, and locations are given in the mainstream transliteration system of the time (Wade-Giles). This required adapting the CoreNLP (Stanford University) pipeline with specific NLP functions to identify and extract the named entities. As much as possible, we sought to retrieve the original Chinese names of the individuals, institutions, locations, and works that appear in the BDRC. We are adding Chinese characters and pinyin to the text to make it more relevant and useful to the younger generation of scholars and to students. This is still an ongoing process and we welcome input from the users of this platform.
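To give a sense of the transliteration problem, the following is a minimal sketch of a lookup-based conversion from Wade-Giles to pinyin. It is not the adapted CoreNLP pipeline described above: the function name `convert_name` and the small syllable table are hypothetical and purely illustrative, and a real converter would need the full Wade-Giles syllabary as well as handling for irregular romanizations.

```python
# Illustrative subset of Wade-Giles -> pinyin syllable mappings.
# This table is an assumption made for the example; it is far from exhaustive.
WADE_GILES_TO_PINYIN = {
    "mao": "mao",
    "tse": "ze",
    "tung": "dong",
    "chou": "zhou",
    "en": "en",
    "lai": "lai",
    "teng": "deng",
    "hsiao": "xiao",
    "p'ing": "ping",  # aspirated initial (apostrophe) -> pinyin "p"
    "ping": "bing",   # unaspirated initial -> pinyin "b"
}


def convert_name(name):
    """Convert a Wade-Giles personal name to pinyin.

    Returns None if any syllable is missing from the lookup table,
    so that unknown names can be flagged for manual review.
    """
    words = []
    for word in name.split():
        # Normalize case and typographic apostrophes before the lookup.
        normalized = word.lower().replace("\u2019", "'")
        syllables = []
        for syllable in normalized.split("-"):
            if syllable not in WADE_GILES_TO_PINYIN:
                return None
            syllables.append(WADE_GILES_TO_PINYIN[syllable])
        # Pinyin writes the given name as one word, without hyphens.
        words.append("".join(syllables).capitalize())
    return " ".join(words)


if __name__ == "__main__":
    for wg in ["Mao Tse-tung", "Chou En-lai", "Teng Hsiao-p'ing", "Sun Yat-sen"]:
        print(wg, "->", convert_name(wg))
```

The last name in the usage loop is not in standard Wade-Giles form and is absent from the table, so the function returns `None` for it. Cases of this kind are one reason a plain lookup is insufficient and why matching names against their original Chinese forms remains an ongoing, partly manual process.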
X-Boorman is an invitation to a new journey. One can start from many angles to discover the rich information the BDRC still has to offer. The two main gateways of X-Boorman are the original biographical texts on the one hand and a graph visualization instrument — Padagraph — on the other hand. Yet, the reader will also find various alternative pathways into this central group of Chinese elites.