软件
Software
图 1:Mephisto巡天数据流;Figure 1: Illustration of the Mephisto survey data flow
Mephisto 巡天正式开始后,每晚观测得到的原始数据量将达 TB 量级,年数据量达 PB 量级。针对 Mephisto 海量的观测数据和特定科学目标,为提高观测效率,保障海量数据的实时处理、分析和发布,项目组致力于研发自动化巡天系统(见巡天策略及OCS介绍)、巡天数据处理系统、变源暂现源探测分类及预警发布系统,实现无人值守自动化巡天观测,为充分发挥 Mephisto 性能优势,最大化其科学产出奠定坚实基础。各主要环节数据流及相应软件研发任务流程如图1 所示。
Once the Mephisto survey begins, each night of observation will produce raw data on the order of terabytes (TB), and the annual volume will reach the order of petabytes (PB). To handle this volume, serve the survey's specific science goals, improve observing efficiency, and ensure that the data are processed, analyzed, and released in real time, the project team is developing an automated survey system (see the introduction to the survey strategy and the OCS), a survey data processing system, and a system for detecting and classifying variable and transient sources and issuing alerts. Together these will enable unattended, automated survey observations, laying a solid foundation for fully exploiting Mephisto's performance and maximizing its scientific output. Figure 1 shows the data flow through each major stage and the corresponding software development tasks.
巡天数据处理系统
Survey Data Processing System
通过对数据处理过程中的数据预处理、天体测量及测光定标等关键技术进行深入研究,项目组致力于研发一套全自动、高性能数据处理系统(如图 2),包括针对变源暂现源实时探测与分类需求的实时、快速处理流水线(< 15秒),并对图像质量进行快速判断和反馈;针对单幅曝光图像的高精度测量和定标需求的单曝光数据处理流水线,包括对观测的图像进行细致的仪器效应改正(如偏置场、平场、宇宙射线等),对图像内的点源和展源提供高精度天测和测光,基于以上分析,形成巡天星表及各种增值星表。
Building on in-depth research into the key techniques of data pre-processing, astrometry, and photometric calibration, the project team is developing a fully automated, high-performance data processing system (Figure 2). It includes a real-time, fast processing pipeline (< 15 seconds) that serves the real-time detection and classification of variable and transient sources and provides rapid assessment of, and feedback on, image quality. It also includes a single-exposure data processing pipeline for high-precision measurement and calibration of individual exposures, which applies detailed corrections for instrumental effects (e.g., bias, flat field, and cosmic rays) and provides high-precision astrometry and photometry for point and extended sources in each image. Based on these measurements, the system produces the survey catalogs and a variety of value-added catalogs.
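The bias and flat-field corrections mentioned above can be illustrated with a minimal NumPy sketch. This is not the Mephisto pipeline's code: the function name `calibrate_exposure`, the assumption that the master flat is already bias-subtracted, and the choice to mark non-responsive pixels as NaN are all illustrative assumptions. Cosmic-ray rejection is not covered here.

```python
import numpy as np


def calibrate_exposure(raw, master_bias, master_flat):
    """Bias-subtract and flat-field one exposure (illustrative only).

    Hypothetical helper: assumes master_flat is already bias-subtracted.
    """
    raw = np.asarray(raw, dtype=float)
    # Normalize the flat so the correction preserves the overall flux scale.
    flat_norm = np.asarray(master_flat, dtype=float) / np.median(master_flat)
    # Flag pixels with a non-positive response instead of dividing by them.
    bad = flat_norm <= 0
    safe_flat = np.where(bad, 1.0, flat_norm)
    calibrated = (raw - master_bias) / safe_flat
    calibrated[bad] = np.nan
    return calibrated


# Synthetic check: a uniform sky of 100 counts seen through an uneven response.
response = np.array([[0.9, 1.0], [1.1, 1.0]])
bias = np.full((2, 2), 300.0)
raw = 100.0 * response + bias
flat = 20000.0 * response
science = calibrate_exposure(raw, bias, flat)
```

In this synthetic case the pixel-to-pixel response divides out and `science` is uniform again. A production pipeline would also propagate uncertainties and bad-pixel masks, which this sketch omits.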
图2:Mephisto数据处理系统架构流程图;Figure 2: Architecture and flow of the Mephisto data processing system
变源暂现源探测分类及预警发布系统
Variable and Transient Source Detection, Classification, and Alert System
从海量数据中实时、准确证认并区分不同的暂现源是目前很大的一个挑战。实时预警和后续光谱观测对于研究不同类型的暂现源极为关键,尤其是短时标暂现源。原始观测图像经过预处理之后,进入变源暂现源快速证认、分类及预警发布系统(TranFinder)。TranFinder中的图像对减算法由Zackay等人于2016年提出,以预处理后的新图像、参考图像和对应星表作为输入。进而,该程序在对减图像上进行天体探测,并基于多波段信息和互相关分析技术扣除假源。我们引入人工智能技术来提高暂现源的证认和分类精度。巡天中发现的高概率暂现源将触发预警系统,以便尽快展开后随观测。整个系统的执行时间不超过 5 分钟,并在 1-2 分钟内完成对海量变源暂现源候选体的可靠证认、分类以及预警信息的快速发布。图3为变源暂现源快速证认、分类及预警发布系统的处理流程技术路线。
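图3:变源暂现源快速证认、分类及预警发布系统的处理流程技术路线;Figure 3: Processing flow of TranFinder, the system for rapid identification, classification, and alerting of variable and transient sources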
Identifying and classifying transients in huge volumes of observational data, in real time and with high accuracy, is a major challenge. Real-time alerts and rapid follow-up spectroscopic observations are crucial for studying different types of transients, especially those with short timescales. For the Mephisto time-domain surveys, we develop a customized Python pipeline, TranFinder, to identify transients in near real time. Starting from the pre-processing products, TranFinder applies the image-differencing algorithm proposed by Zackay et al. (2016), which takes the new image, the reference image, and the corresponding catalogs as input, to each CCD chip separately. It then performs object detection on the difference image, and combines multi-band data with cross-correlation analysis to reject spurious detections automatically. We introduce artificial intelligence (AI) techniques to improve the efficiency and accuracy of transient identification and classification. High-probability transients trigger the alert system so that follow-up observations can begin as soon as possible. The whole system executes in under 5 minutes, and the reliable identification and classification of the large number of variable and transient source candidates, together with the rapid release of alert information, is completed within 1-2 minutes. Figure 3 shows the processing flow of the system.
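To make the image-differencing step concrete, the following is a minimal NumPy sketch of the core difference-image formula from Zackay et al. (2016), evaluated in Fourier space. It is not TranFinder's implementation. The function names, the Gaussian PSFs, the periodic (wrap-around) convolution, and all numerical values are assumptions made for illustration. The sketch also assumes that the two images are already aligned and background-subtracted, and that the PSFs, noise levels, and flux zero points are known.

```python
import numpy as np


def gaussian_psf(shape, sigma):
    """Unit-sum Gaussian PSF centred on pixel (0, 0) with wrap-around."""
    y = np.fft.fftfreq(shape[0]) * shape[0]  # pixel offsets 0, 1, ..., -1
    x = np.fft.fftfreq(shape[1]) * shape[1]
    yy, xx = np.meshgrid(y, x, indexing="ij")
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()


def convolve(image, psf):
    """Circular convolution via FFT (used only to build the synthetic test)."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf)))


def zogy_difference(new, ref, psf_new, psf_ref, sigma_new, sigma_ref,
                    f_new=1.0, f_ref=1.0):
    """Difference image D of Zackay et al. (2016), computed in Fourier space.

    sigma_* are background noise standard deviations; f_* are flux zero points.
    """
    n_hat, r_hat = np.fft.fft2(new), np.fft.fft2(ref)
    pn_hat, pr_hat = np.fft.fft2(psf_new), np.fft.fft2(psf_ref)
    # Each image is cross-filtered with the other image's PSF, then the
    # result is normalized so that the noise in D is approximately white.
    numerator = f_ref * pr_hat * n_hat - f_new * pn_hat * r_hat
    denominator = np.sqrt(sigma_new**2 * f_ref**2 * np.abs(pr_hat)**2
                          + sigma_ref**2 * f_new**2 * np.abs(pn_hat)**2)
    return np.real(np.fft.ifft2(numerator / denominator))


def demo():
    """Synthetic field: two static stars plus one transient in the new image."""
    rng = np.random.default_rng(0)
    shape = (64, 64)
    scene = np.zeros(shape)
    scene[10, 12] = 3000.0
    scene[50, 30] = 5000.0
    transient = np.zeros(shape)
    transient[40, 25] = 5000.0
    psf_new, psf_ref = gaussian_psf(shape, 2.0), gaussian_psf(shape, 1.5)
    sigma_new, sigma_ref = 1.0, 0.5
    new = convolve(scene + transient, psf_new) + rng.normal(0.0, sigma_new, shape)
    ref = convolve(scene, psf_ref) + rng.normal(0.0, sigma_ref, shape)
    return zogy_difference(new, ref, psf_new, psf_ref, sigma_new, sigma_ref)


diff = demo()
peak = np.unravel_index(np.argmax(diff), diff.shape)
```

In this synthetic case the static stars cancel even though the two images have different seeing, and the brightest pixel of `diff` falls at the injected transient position. Real data would additionally require image registration, PSF and zero-point estimation from the catalogs, and handling of image edges and bad pixels.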