Abstract: In translation teaching, students regularly prepare “parallel texts” for a new translation project and build these linguistic materials into a database, or reference corpus. To be effective, this corpus requires a relatively high volume of data, but available product/service-oriented translation technology offers no ideal solution for this need. This article addresses this scalability challenge and introduces an interdisciplinary approach that combines Python and PostgreSQL. These technologies are applied to the automation challenges in processes such as corpus design, data collection, and corpus management. The entire large-scale corpus building process is presented in detail, and the relevant Python and PostgreSQL source code is disclosed.