`APPENDIX B`

Note on Sources

Without digital methods and online repositories, this book would probably have taken an entire career to research. With these tools and sources, I was able to complete it in a decade. To avoid further confusing an already dense and sometimes meandering narrative, I have largely elided discussion of research methods from the body text. Nonetheless, these deserve some degree of explanation.

By far the most significant tool used in this research—one that underlies most recent historical work but is generally not acknowledged—was full-text search. By using full-text search for keywords like “fir” (shan 杉)—in both specialized databases of Chinese sources and more general search tools like Google and Baidu—I was able to range across an enormous body of highly varied sources. In this way, I discovered entire genres of text, some of which I did not previously know existed. This is how I found several treatises on shipyard administration (chuanzheng), most of them freestanding texts; it is also how I found treatises on logging administration (muzheng), most of them hidden in the later chapters of gazetteers. In addition to identifying highly topical treatises, full-text search also allowed me to find anecdotes scattered widely in otherwise generalist accounts. For example, the Xu zizhi tongjian changbian is a general account of Song history and government with no specialized sections on forests or timber trade. By using full-text search, I identified dozens of small instances of changing policy—anecdotes that collectively allowed me to paint a broad picture of Song forest interventions.

Just as significantly, full-text search allowed me to identify other search terms. Slowly, I built a mental map (and a Google spreadsheet) of the linkages between keywords like “bamboo and timber” (zhumu), “proportional tariff” (choufen), “logging requisition” (caiban), and dozens of others that collectively made up the bureaucratic mechanisms for managing forests and woodland. Because the premodern Chinese state did not have a single, centralized forestry bureau, it was especially important to be able to track activity across multiple institutions, each with its own preferred interventions.

There are trade-offs to this approach. What keyword search gains in breadth, it tends to lose in context. The choice of keywords is also very significant—some are too specific and yield few results, some are too general and return a lot of extraneous information. The best are keywords that map closely onto clear ontologies in the source texts. But even when these ontologies are clear, keywords structure the inquiry in less expected ways. This text is guided, in part, by the vocabulary underlying Chinese botany, tax accounting, and construction administration. Indeed, I can easily recall the keywords used to develop a line of inquiry for each of the chapters. Finally, in order to use full-text search, I relied almost entirely on digital repositories, with a strong preference for those without paywalls or other access restrictions. The bibliographic information supplied by these repositories is not always complete. In some cases, it is difficult to identify the physical edition underlying the digital one, adding a degree of uncertainty to the chain of documentation.

In addition to keyword search, I also used regular expressions (regex) as a way to access large volumes of data, including most of the data used in chapter 2 and appendix A. I worked with researchers at the Max Planck Institute for the History of Science, and their Local Gazetteer Research Tools, to develop regular expressions to tag known keywords and to identify related data based on its structure in the gazetteers. For example, knowledge of the vocabulary of tax accounting allowed me to tag and extract large volumes of tax data through a semi-automated process, quadrupling the scale of my data set in a matter of a few weeks. Like keyword search more broadly, regular expressions also build upon the underlying semantic and structural content of historical texts.
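As a rough illustration of what this kind of tagging can look like, the Python sketch below tags a few keywords and pulls out an adjacent silver amount from each clause. It is only a sketch under stated assumptions: the sample passage is invented for the example, the Chinese characters given for zhumu and choufen are my own rendering, the "silver ... taels" pattern is a simplification, and none of the names below come from the Local Gazetteer Research Tools or from the patterns actually used for this book.

```python
import re

# Invented sample passage for illustration; not quoted from any gazetteer.
SAMPLE = "竹木抽分銀三百兩。杉木稅銀一百二十兩。城隍廟在縣治東。"

# Keywords to tag: zhumu (bamboo and timber), choufen (proportional tariff), shan (fir).
# The characters for zhumu and choufen are an assumed rendering of the romanized terms.
KEYWORDS = ["竹木", "抽分", "杉"]
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS))

# Assumed structure: the character for silver, a Chinese numeral, then the tael unit.
AMOUNT_RE = re.compile(r"銀([〇一二三四五六七八九十百千]+)兩")

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_numeral(text):
    """Convert a simple Chinese numeral (below ten thousand) to an integer."""
    total = 0
    current = 0
    for ch in text:
        if ch in DIGITS:
            current = DIGITS[ch]
        elif ch in UNITS:
            # A bare unit such as the leading ten in "fifteen" counts as one unit.
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current


def extract(text):
    """Return one record per clause that has both a tagged keyword and an amount."""
    records = []
    for clause in re.split(r"[。；]", text):
        amount = AMOUNT_RE.search(clause)
        tags = KEYWORD_RE.findall(clause)
        if amount and tags:
            records.append({
                "tags": tags,
                "taels": parse_numeral(amount.group(1)),
                "clause": clause,
            })
    return records


if __name__ == "__main__":
    for record in extract(SAMPLE):
        print(record["tags"], record["taels"], record["clause"])
```

The process remains semi-automated in the sense described above: each record keeps the clause it was extracted from, so a reader can check the result against its context and discard false matches.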

All histories are a function of their sources, shaped by the archives they use and the reading biases of their authors. This book is no exception. Yet it is worth being aware that in this case the “archive” is not a set of boxes in a physical repository, or even a genre of texts, but a loose array of disparate sources, many of them digital. And the “reading” process depends in part on computationally assisted methods like full-text search and regular expression tagging, as well as on my own human perceptual and cognitive capacities.
