Automation of analysis procedures for the Citizen CATE Experiment Eclipse Data, The


Title: The automation of analysis procedures for the Citizen CATE Experiment Eclipse Data

Authors: Logan Jensen¹, Michael Pierce¹

Affiliations: ¹University of Wyoming

Introduction

The outer atmosphere of the Sun, called the corona, presents a bountiful scientific target for astronomers. The inner corona, from 1 to 2.5 solar radii, is of particular interest. In this region the magnetic field of the Sun evolves from a complex structure to almost purely radial features (Woo and Habbal, 1999). Additionally, this region may offer insight into solutions of the infamous coronal heating problem. The solar corona is around 500 times hotter than the surface of the Sun, and its temperature increases with distance from the solar surface. Energy is thought to be transferred to the corona by the propagation of magnetic waves, but observational evidence is lacking. These curiosities, and many more like them, are behind the drive to conduct in-depth studies of the inner corona.

Unfortunately, the inner corona is notoriously difficult to observe for both ground-based and space-based telescopes. Space-based telescopes are designed either to image the Sun’s extremely bright surface using narrow wavelength filters, making the corona invisible, or to artificially block out the Sun’s bright surface with an occulting disk so that the fainter outer corona can be imaged. This occulting disk blocks light from the surface of the Sun to protect sensitive detectors from overexposure. The disk must be larger than the Sun to account for telescope instability in orbit and pointing errors; as a result, it also blocks out the inner corona. Ground-based telescopes can collect data across a wide range of wavelengths, which allows them to sample more of the corona; however, they are limited by the high background light levels caused by sunlight scattering in Earth’s atmosphere. The only way to lower these background light levels is to block the light from the Sun before it enters the atmosphere, so a solar eclipse presents the ideal opportunity for collecting data on the inner corona from the ground. However, a solar eclipse is a very short event at any given location and can occur almost anywhere in the world, making it a difficult tool to utilize.


The 2017 solar eclipse provided a rare opportunity as the path of the eclipse crossed through the middle of the continental United States. The eclipse lasted for only 150 seconds at any one location, but in total there were 93 minutes of totality across the country. The National Solar Observatory’s Citizen Continental-America Telescopic Eclipse (CATE) Experiment was designed to take full advantage of this opportunity. The experiment fielded 68 identical sets of observational equipment and trained non-professional volunteers in its operation and in eclipse data collection. These 68 sites were spaced across the path of totality to provide complete coverage of the eclipse from Oregon to South Carolina. Of the 68 sites, 65 successfully collected data of the solar corona during totality, producing a data set spanning 90 minutes and containing over 40,000 eclipse images. Each of these images needed to be calibrated, combined, filtered, and aligned to become scientifically valuable. Due to the data set’s size, automating this processing was vital. In addition, due to the first-of-its-kind nature of the data, new processes and algorithms needed to be developed and implemented. This paper describes the automation of the image processing techniques in detail and provides the initial results of these efforts.

Background

The solar corona is composed of ionized plasma that surrounds the Sun and extends millions of kilometers into space. This charged plasma travels along the magnetic field lines of the Sun due to the interaction between moving charged particles and magnetic fields. Light from the Sun scatters off the free electrons in this plasma and produces the visible white-light corona seen during an eclipse. Together, these effects allow the Sun’s magnetic field to be studied by observing the corona. Understanding the Sun’s magnetic field is important because the dynamics of this field drive all high-energy events on the Sun, such as coronal mass ejections, solar flares, and the solar wind. As an example, radiation traveling in the solar wind interacts with the Earth’s magnetic field and atmosphere to produce aurorae. However, the solar wind can also interfere with satellites in space, and a sufficiently large event, called a solar storm, can permanently disable satellites and endanger astronauts. With society becoming more dependent on satellites for navigation, communication, and internet access, it is becoming correspondingly more important to understand the Sun’s magnetic field, the corona, and the solar wind.


The goal of the Citizen CATE Experiment was to collect data from the inner solar corona on a scale previously unachieved in astronomy. It was able to accomplish this feat due to a confluence of circumstances. The first was the accessibility of the eclipse path. The 2017 solar eclipse was the first total solar eclipse visible from the contiguous United States since 1979, and its path through the middle of the country, along with America’s infrastructure, meant millions of people would have easy access to the eclipse (Penn, 2016). The second factor was ease of access to the necessary equipment for observing the corona. As manufacturing techniques improve, the cost of science-grade cameras and telescopes and their rarity in the marketplace have dropped to the point where there is now a prominent worldwide amateur astronomy community. These two factors made the purchase and deployment of 68 sets of observational equipment an affordable and realistic task. In addition, the CATE Experiment reached out to middle and high schools and student groups along the eclipse path who were willing to operate the telescopes and collect eclipse data. In return, the observing equipment was donated to the volunteer groups. Through this outreach, the CATE Experiment aimed to inspire scientific interest across the country and support that interest with the tools for exploration.

While most of the funding for the CATE Experiment came from NASA and the National Science Foundation (NSF), donating the $260,000 in equipment to the volunteers required funding from corporate and private donations. This was achieved thanks to the generosity of the manufacturers, who donated a large portion of the equipment for the project. The funding from NASA and NSF was used to employ six groups of undergraduate students and professors for two years to develop the project and to travel across the country delivering equipment and training the volunteers in observation and data collection techniques, both for the solar eclipse and for their own use after the eclipse.

In developing the Citizen CATE Experiment, five groups of amateur undergraduate students with mentor professors were trained in solar observation with the CATE equipment and sent to various locations in Indonesia to observe the March 9, 2016 solar eclipse. The observation of this eclipse served three primary purposes. First, it proved that amateurs could collect valuable and continuous data on a country-wide solar eclipse. Second, it revealed improvements that needed to be made to the CATE equipment, observation procedure, and training before 2017. Third, the data was used to develop processing methods to reduce the raw images to a scientifically suitable form. Because this data is unique, many different potential processes and algorithms had to be explored. This task was undertaken by the undergraduate students during summer internships at the National Solar Observatory. These developments were key to preparing the CATE experiment for 2017, and the improved procedures and teaching strategies streamlined the training of the vast network of volunteers.

To maximize the scientific value of the CATE data, the entire collection of over 110,000 images of calibration data and eclipse data collected on August 21, 2017, over 1.5 terabytes in total, needed to be processed and utilized. To accomplish this in a reasonable timeframe, the procedures that reduce the raw data into its most scientifically useful form needed to be automated. The analysis code was written in the programming language Python. Python has the benefit of being free and open-source software. It is also well documented and many tutorials are available, making it one of the easiest programming languages to learn. These factors will allow members of the CATE team of any age or skill level to continue working on the project or develop their own research questions without requiring expensive computer software.

Methods

Modern solid-state detectors work by exposing light-sensitive capacitors, called pixels, to light. As photons interact with the pixels, electrons build up and a charge is stored in the capacitors. These electrons are then read out of the pixels as a current to produce a digital signal representing the amount of light collected in each pixel. Ideally, these values would result only from photons from the subject of the image, but several other factors contribute to the signal and need to be measured and removed from the images to produce accurate measurements of the subject (Gamal, 2005).

The first of these error sources is called the dark current. Over time, the pixels of a detector will build up charge at a rate dependent on the temperature of the detector itself. This current is an inherent property of the material used to make the pixels, and the processes that generate the current are explained in detail by Carrere (Carrere, 2014). As its name suggests, the detector will register this current even when the shutter is closed and there is no light hitting the detector. The standard way to correct for these added values is to subtract an image that contains only the dark current from the “science” images of the subject, which contain signal from both the subject and the dark current. To collect the dark current images, the lens cap is left on the telescope and images are collected with the same exposure time as the science images and, ideally, with the camera at the same temperature. Many frames at each exposure are collected, and the median of each exposure sequence produces an image called a master dark (Figure 1). The median is used instead of the average throughout processing because it is less influenced by outlier values. These master dark images are then subtracted from all other images collected with the same exposure time to correct for the dark current.
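The dark correction described above can be sketched in a few lines of Python. This is a minimal illustration using NumPy and not the CATE pipeline itself; the function names `make_master_dark` and `subtract_dark` and the small synthetic frames are assumptions made for the example.

```python
import numpy as np

def make_master_dark(dark_frames):
    """Median-combine dark frames that share one exposure time."""
    stack = np.asarray(dark_frames, dtype=np.float64)  # shape (n_frames, ny, nx)
    return np.median(stack, axis=0)

def subtract_dark(image, master_dark):
    """Remove the dark-current contribution from a frame of the same exposure time."""
    return np.asarray(image, dtype=np.float64) - master_dark

# Synthetic 2x2 dark frames (hypothetical values); the third has one outlier pixel.
darks = [
    [[10, 12], [11, 10]],
    [[11, 12], [10, 10]],
    [[10, 500], [11, 10]],
]
master_dark = make_master_dark(darks)

# A synthetic "science" frame taken with the same exposure time.
science = np.array([[110, 112], [211, 60]])
calibrated = subtract_dark(science, master_dark)
```

Because the median is used, the outlier value of 500 in the third frame does not leak into the master dark, which is the property the text relies on.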

Figure 1: Master dark current image from CATE site 23 demonstrating the thermal noise readings from the camera.

The next step in processing is called flat field correction. This procedure corrects for sensitivity variations between pixels. These variations can be caused by the properties of the pixels themselves or by the telescope and how uniformly it delivers light to the detector. The pixels induce variation because each pixel is uniquely sensitive to light due to limits in manufacturing tolerances, so two pixels that receive the same amount of light can produce slightly different values. The telescope produces variation through dust collection and optical aberrations. To measure these effects, 100 images are collected of the out-of-focus open sky, which is uniformly illuminated and should record the system’s response to uniform (flat) illumination. The corresponding master dark is subtracted from all flat images and then the median is taken. This median flat image is then divided by its own average value to create a normalized image with an average value of one (Figure 2).

In Figure 2 the dark circles are caused by dust on the telescope lens, the darkening at the corners of the image is an optical aberration called vignetting, and the “static” is caused by the variations in pixel sensitivity. The final step in flat field correction is to divide all dark-subtracted science images by the normalized flat field image. The result is that artificially darkened pixels are divided by a number less than one to increase their value and artificially brightened pixels are divided by a number greater than one to decrease their value. This procedure can be likened to zeroing a scale to ensure measurements are not biased negatively or positively.
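As a hedged sketch of the flat field procedure, the following Python example builds a normalized flat from dark-subtracted sky frames and applies it to a dark-subtracted science frame. The function names and the tiny synthetic arrays are illustrative assumptions, not the CATE code.

```python
import numpy as np

def make_normalized_flat(flat_frames, master_dark):
    """Dark-subtract the sky flats, take the median, and scale to a mean of one."""
    stack = np.asarray(flat_frames, dtype=np.float64) - master_dark
    median_flat = np.median(stack, axis=0)
    return median_flat / median_flat.mean()

def flat_field_correct(dark_subtracted, normalized_flat):
    """Divide a dark-subtracted science frame by the normalized flat."""
    return np.asarray(dark_subtracted, dtype=np.float64) / normalized_flat

# Hypothetical 2x2 detector: one pixel is 10% less sensitive, one 10% more.
flat_dark = np.full((2, 2), 10.0)
sky_flats = [[[100, 120], [110, 110]]] * 3      # raw sky frames, dark included
normalized_flat = make_normalized_flat(sky_flats, flat_dark)

# A uniform source of 1000 counts as recorded by this uneven detector.
recorded = np.array([[900.0, 1100.0], [1000.0, 1000.0]])
corrected = flat_field_correct(recorded, normalized_flat)
```

In this example the less sensitive pixel is divided by 0.9 and the more sensitive pixel by 1.1, so the corrected frame is uniform again.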

Figure 2: Normalized flat field image from CATE site 23 demonstrating the effects of detector and telescope induced

After calibration, the data is ready for combination into high dynamic range (HDR) images. High dynamic range imaging allows cameras to capture a significantly greater range in luminance levels than would otherwise be achievable with the device. To create an HDR image, multiple images of the same object with various exposure times are combined, allowing the camera to measure a variety of lighting levels. This process is crucial for the CATE data because the intensity of the solar corona drops by a factor of 10,000 from the edge of the Sun to the edge of the CATE equipment’s field of view, about two solar radii away from the center of the Sun. This dynamic range would be impossible for the camera to correctly expose in a single image. Therefore, during the eclipse each site collected eight images with different exposure times in a rapidly repeating cycle. An example cycle is shown in Figure 3 to demonstrate how each image correctly exposes a different region of the corona.

To turn this sequence into an HDR image, the overexposed pixels from each image are removed and the remaining pixels from each frame are added together. This sum is then divided by a special image called a mask (Figure 4) that measures the number of frames in the sequence where each pixel was correctly exposed, and the result of the division is an HDR image (Figure 5).
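The combination step can be illustrated with the following Python sketch, which follows the description above literally: overexposed pixels are dropped, the remaining values are summed, and the sum is divided by the per-pixel count of correctly exposed frames. The saturation level of 4095 counts is a hypothetical value chosen for the example (the camera’s actual saturation level is not stated here), and `combine_hdr` is an illustrative name.

```python
import numpy as np

def combine_hdr(frames, saturation_level):
    """Combine an exposure sequence into an HDR image and its mask.

    The mask holds, for every pixel, the number of frames in which that
    pixel was below the saturation level.
    """
    stack = np.asarray(frames, dtype=np.float64)      # shape (n_frames, ny, nx)
    well_exposed = stack < saturation_level           # True where a pixel is usable
    total = np.where(well_exposed, stack, 0.0).sum(axis=0)
    mask = well_exposed.sum(axis=0)
    hdr = np.full(total.shape, np.nan)                # NaN where no frame was usable
    np.divide(total, mask, out=hdr, where=mask > 0)
    return hdr, mask

# Three synthetic exposures of a 1x3 strip; longer exposures saturate at 4095.
sequence = [
    [[10, 100, 1000]],
    [[40, 400, 4095]],
    [[160, 4095, 4095]],
]
hdr, mask = combine_hdr(sequence, saturation_level=4095)
```

Pixels that were saturated in every frame are left as NaN in this sketch so that they are not mistaken for real measurements.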

Figure 3: Sequence of 8 eclipse images displaying the various exposure levels. (Exposure times in milliseconds from left to right: 0.4 ms, 1.3 ms, 4.0 ms, 13.0 ms, 40.0 ms, 130.0 ms, 400.0 ms, 1300.0 ms)

Figure 4: Mask image from CATE site 23. Darker areas are where fewer frames were correctly exposed. The white disk at the center is the moon and should be ignored.

Figure 5: HDR from CATE site 23 demonstrating the improvement in dynamic range. The black disk is the moon and the light area is the corona.


Further processing of the HDR images requires a measured location for the center of the Sun in each image. However, for this work the center of the moon is used as an approximation as the difference between the center of the moon and Sun is negligible for this purpose. The first step in this process was to calculate the intensity gradient, or rate of change, along the X and Y coordinates (Figure 6).

This highlights the edges of the moon in the images because the solar corona is far brighter than the moon, resulting in a sharp decrease or increase in intensity at the lunar limb as is shown in Figure 7.

Figure 6: Intensity gradient in Y (left) and X (right) coordinates

Figure 7: Gradient plot for displaying the difference between the gradient of a row that does not include the Sun (Orange) and one that does

For the X-coordinate gradient, the locations of the maximum and minimum gradient in each row are measured. To exclude rows that don’t contain the solar image and to reduce the impact of noise, only points with gradients whose magnitudes are greater than 500 are considered. Additionally, to be saved, the locations of the maximum and minimum gradient must be separated by more than 100 pixels. This ensures that the gradients are caused by the edges of the moon rather than by other features or noise. The midpoint of each remaining max and min pair is calculated to find the center, then all the center measurements are averaged together to determine the X-coordinate for the center of the moon. The same process is repeated along the Y-coordinate to determine the Y-coordinate for the center of the moon. The result of this process is a measured location for the center of the moon to sub-pixel accuracy (Figure 8). The formal errors for the X- and Y-coordinates of the moon’s center were computed from the standard deviations of the values divided by the square root of the number of measurements. For the typical HDR image this was found to be 0.03 pixel.
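The center-finding steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the gradient is taken with `numpy.gradient` (central differences), the thresholds of 500 and 100 pixels come from the text, and the function name `find_center_1d` and the synthetic hard-edged disk are hypothetical.

```python
import numpy as np

def find_center_1d(image, axis, grad_threshold=500.0, min_separation=100):
    """Estimate the lunar center along one axis from the limb gradients.

    axis=1 scans rows and returns the X coordinate;
    axis=0 scans columns and returns the Y coordinate.
    Returns (center, formal_error).
    """
    grad = np.gradient(np.asarray(image, dtype=np.float64), axis=axis)
    if axis == 0:
        grad = grad.T                      # make every scan line a row
    i_max = grad.argmax(axis=1)            # rising edge (moon -> corona)
    i_min = grad.argmin(axis=1)            # falling edge (corona -> moon)
    lines = np.arange(grad.shape[0])
    strong = (grad[lines, i_max] > grad_threshold) & (grad[lines, i_min] < -grad_threshold)
    separated = np.abs(i_max - i_min) > min_separation
    keep = strong & separated
    centers = 0.5 * (i_max[keep] + i_min[keep])   # midpoint of each max/min pair
    if centers.size == 0:
        raise ValueError("no scan line passed the gradient and separation cuts")
    if centers.size > 1:
        formal_error = centers.std(ddof=1) / np.sqrt(centers.size)
    else:
        formal_error = float("nan")
    return centers.mean(), formal_error

# Synthetic test image: a dark disk (the moon) on a bright uniform background.
yy, xx = np.mgrid[0:240, 0:300]
image = np.where((xx - 150) ** 2 + (yy - 120) ** 2 <= 80 ** 2, 0.0, 5000.0)
center_x, error_x = find_center_1d(image, axis=1)
center_y, error_y = find_center_1d(image, axis=0)
```

On a hard-edged synthetic disk the central-difference gradient places both limbs about half a pixel early, so the recovered center is offset by roughly half a pixel; real limbs are smoother, and the formal error reported in the text refers to the scatter between scan lines rather than to such systematic offsets.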

Figure 8: Histograms for the X and Y center values. Sigma is the error tolerance on the measurement of center. Panel annotations: σ = 0.0265 and σ = 0.1093.

Measuring the center of the moon for all the HDRs allows for the use of processes that operate based on the radial distance from the Sun. For the CATE data, a Normalizing Radial Graded Filter (NRGF) is applied to enhance detail in the solar corona (Druckmüllerová, 2011). As previously addressed, the intensity of the solar corona drops by a factor of 10,000 from the limb of the Sun to the edge of the CATE field of view (Figure 9). The purpose of the NRGF is to remove this sharp decrease in intensity to enhance the detail in the corona. This filtering is accomplished by converting the Cartesian coordinates of each pixel in the HDRs into polar coordinates to organize pixel intensity by radius from the center of the moon. The result is a one-dimensional vector quantifying the average intensity as a function of distance from the center of the moon. The radial distance of each pixel in the HDR is then computed from the moon’s center, and the pixel’s intensity is divided by the average intensity at that radius. The result is a radially normalized image which removes the sharp intensity drop and enhances the remaining information in the image (Figure 10).

Figure 9: Plot of average radial intensity to display steep radial intensity drop over the image.
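A minimal sketch of the radial normalization as it is described in the text (division by the average intensity at each radius) is given below. It is not necessarily identical to the filter in the cited paper; the one-pixel-wide radial bins, the function name `radial_normalize`, and the synthetic exponential corona are assumptions made for illustration.

```python
import numpy as np

def radial_normalize(image, center_x, center_y):
    """Divide every pixel by the mean intensity at its radius from the center.

    Returns the normalized image and the 1-D mean radial profile.
    """
    img = np.asarray(image, dtype=np.float64)
    yy, xx = np.indices(img.shape)
    radius = np.hypot(xx - center_x, yy - center_y)
    r_bin = np.rint(radius).astype(int)               # one-pixel-wide radial bins
    sums = np.bincount(r_bin.ravel(), weights=img.ravel())
    counts = np.bincount(r_bin.ravel())
    profile = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=profile, where=counts > 0)
    # Note: radii where the mean is zero (e.g. inside the lunar disk)
    # would need to be masked before this division in real data.
    return img / profile[r_bin], profile

# Synthetic corona: intensity falls steeply with distance from the center.
yy, xx = np.indices((101, 101))
r = np.hypot(xx - 50, yy - 50)
corona = 1.0e4 * np.exp(-r / 8.0)
normalized, profile = radial_normalize(corona, 50, 50)
```

In this synthetic case the input spans several thousand to one in intensity, while the normalized image stays close to one everywhere, which is the flattening effect the filter is used for.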

The static that can be seen towards the edge of the image in Figure 11 is a result of a low signal to noise ratio. This is the reason why all 60 NRGF images need to be used. By averaging, also called “stacking”, all 60 NRGF frames, the signal to noise ratio in the final image can be increased by approximately a factor of 8 (square root of 60). This increased signal to noise ratio will enable the detection of subtler coronal features and improve the accuracy of position measurements of these features within the solar corona. Before the images can be combined they must be aligned to avoid smearing the details of the corona. Alignment is necessary due to slight human errors in the setup of the telescope that cause the tracking direction of the telescope to be misaligned with the true motion of the Sun. As a result, the position of the Sun on the detector changes over the duration of the eclipse. Additionally, the lunar centers measured previously cannot be used for alignment because the moon moves relative to the Sun during the eclipse. Image motion is a common problem in astronomy, and amateur astronomers have developed free software that can quickly and accurately align and stack sequences of images. For this work the software package “AutoStakkert!” (Kraaikamp, 2017) was used. This choice was made because AutoStakkert! is one of the most popular packages in the amateur community, and it allows the user to select specific regions of an image to analyze for alignment. This is a crucial feature, as the software must be able to align the small and low contrast features of the Sun and its corona rather than the large and high contrast moon. A solar prominence (pictured in Figures 11 and 12) is visible on the edge of the Sun throughout the eclipse. This feature is much brighter and sharper than the corona, so to ensure alignment accuracy this region was chosen as the focus for the alignment software. The program takes around one minute to produce the aligned and stacked frame (Figure 12). By taking the average of all 60 images, the random (Poisson) noise averages out, leaving more of the signal from the corona and reducing the noise by a factor of about 8.
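The square-root improvement from stacking can be checked numerically. The sketch below simulates 60 already-aligned frames of a uniform source with Poisson noise and compares the noise before and after averaging; the frame size, mean signal level, and random seed are arbitrary assumptions, and the alignment itself (done with AutoStakkert! in this work) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_frames = 60
mean_signal = 100.0   # hypothetical mean counts per pixel

# Simulated, already-aligned frames of a uniform source with Poisson noise.
frames = rng.poisson(lam=mean_signal, size=(n_frames, 200, 200))

stacked = frames.mean(axis=0)          # "stacking" = averaging the frames

single_noise = frames[0].std()         # noise in one frame
stacked_noise = stacked.std()          # noise after averaging 60 frames
improvement = single_noise / stacked_noise
expected = np.sqrt(n_frames)           # about 7.75 for 60 frames
```

The measured `improvement` should land within a few percent of `expected`, consistent with the factor of approximately 8 quoted above.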

Figure 11: Close-up of the solar prominence prior to stacking with the Sun/Moon in the upper left.

Figure 12: Close-up of the solar prominence in the stacked image displaying significant improvement in signal to noise

The final step in the processing sequence is the application of a Sobel filter to the stacked images. The Sobel filter is used for edge enhancement and creates an image in which the edges of features are highlighted. Sobel filtering uses localized intensity gradients of an image to find edges and sharp features (Vincent, 2009). The primary goal for the CATE eclipse data is to emphasize the thread-like filaments in the corona, particularly at the Sun’s magnetic poles. This filament highlighting will allow for future ventures into coronal filament tracking and outflow velocity measurements. The Sobel filter underscores the importance of increasing the signal to noise ratio of the CATE data because the algorithm is sensitive to noise. When comparing the result of Sobel filtering an unstacked NRGF image (Figure 13) to the result of Sobel filtering the stacked image (Figure 14), it becomes clear that the improvement in signal to noise is vital for advanced analysis of the CATE data.
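For readers unfamiliar with the Sobel operator, the following NumPy-only sketch computes the gradient magnitude with the standard 3x3 Sobel kernels. It is an illustrative implementation, not the routine used in the CATE pipeline; the helper names and the edge-replicating border handling are assumptions.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal (X) and vertical (Y) gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Correlate an image with a 3x3 kernel, replicating the border pixels."""
    padded = np.pad(np.asarray(image, dtype=np.float64), 1, mode="edge")
    ny, nx = padded.shape[0] - 2, padded.shape[1] - 2
    out = np.zeros((ny, nx))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + ny, dx:dx + nx]
    return out

def sobel_magnitude(image):
    """Gradient magnitude from the X and Y Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Synthetic image with a single vertical edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
edges = sobel_magnitude(step)
```

The response is nonzero only in the two columns adjacent to the step. Because the filter is built from pixel differences, pixel-to-pixel noise produces a response in the same way, which is why the stacked image gives a much cleaner result than a single frame.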

Results

On August 21, 2017, 65 out of the 68 sites across the United States collected over 110,000 images’ worth of calibration data and solar eclipse data. The other three sites were either clouded out or experienced technical issues. Before the automated analysis procedures were developed, a preliminary movie was made using one NRGF and Sobel filter image (both described above) from each site. These 65 frames were manually aligned and combined into one continuous movie of the eclipse spanning the country. However, this process was completed by hand and utilized only 1% of the total data collected by the CATE sites. Due to the difficulty of the task and limitations on time and personnel, it took approximately one month to develop the process and create the movie.

Figure 13: Result of Sobel filter applied to an individual NRGF image displaying the sensitivity to noise and lack of detail

Figure 14: Result of the Sobel filter applied to 60 stacked NRGF images displaying the edge finding properties of the filter

Using this 1% of the data, the CATE team was able to measure details of the inner solar corona and its evolution to a previously unachieved level of accuracy and has also made several first-of-their-kind measurements. One of the primary interests in the CATE data is to measure the velocity and acceleration of material in the region of the corona around the Sun’s magnetic poles, called the polar plumes. Due to the difficulty in observing the inner solar corona, previous measurements included large margins of error and concurrent studies disagreed with each other, as seen in Figure 15. Measurements from the CATE data (shown in red) set a new standard in precision for this region of the Sun, and the agreement between measurements will allow for the study of acceleration in the region as well as velocity. With the processing now automated and the full breadth of the CATE data available, further studies of the corona will be more accurate, and previously unseen or immeasurable features will now be accessible.

Discussion

The automation of the analysis procedures, together with improvements to the alignment process, brings the processing time for 60 CATE images at one site to approximately 30 minutes. This is a significant reduction in processing time and utilizes 60 times more data than before. Previously, the major obstacle to analysis of the CATE data was the volume of the dataset, but with the major processes developed and the automation framework built this is no longer an issue. Additional processing steps and further analysis of the corona now require only the addition of new programs to the processing pipeline.

Figure 15: Plot of previous measurements for polar plume outflow velocity with overlaid measurements from the CATE experiment in red. The yellow box indicates the region the CATE experiment hoped to study.

Work on the automated processing for the CATE data will continue. First, as addressed in the methods section under center finding, there is a systematic error in measuring and using the location of the center of the moon rather than the Sun in processing. While this error does not significantly affect the result of the current work, it will influence future analysis if not corrected. Additionally, all the data from the 65 CATE sites needs to be processed. Currently, only three sites’ data have gone through the pipeline as tests of the automation. The final step will be to align the stacked image results from each site to the images from the other sites for combination into a movie. Unlike the alignment that has already been done, this procedure will have to correct for both translational (X and Y coordinate shifts) and rotational differences between images. The images from each individual site differ only translationally, due to telescope tracking errors over time, but when comparing one site to the next it is possible for there to be a rotational difference between the images as well. For example, when the camera is mounted in the telescope it may be oriented at a different angle, or even upside down, compared to the next site. CATE volunteers were trained in a standard setup, but human error must be accounted for. AutoStakkert! is not currently capable of correcting for random rotation angles between images, so an additional Python procedure will need to be developed.

A Python procedure will be written to rotate the combined HDR images from each site into alignment, making use of the rotation angles that were determined in the preliminary analysis mentioned above. AutoStakkert! will then be used to precisely shift them into alignment. The resulting images will then be combined into the final high signal to noise movie spanning the full 90 minutes of the 2017 eclipse. This movie will form the basis for the study of the time-evolution of the solar corona and will be one of the primary data products of the CATE experiment. A complete set of the raw and processed data will be available to the entire CATE team, including the volunteers, for them to use as they see fit, and all CATE data and results will become publicly available in August 2018.
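Since the rotation procedure had not yet been written at the time of this work, the following is only a sketch of one way such a step could look: a nearest-neighbour rotation of an image about a chosen center using NumPy. The function name `rotate_about`, the interpolation choice, and the sign convention of the angle are all assumptions; a production version would likely use a library routine with proper interpolation and would need to match the convention used when the rotation angles were measured.

```python
import numpy as np

def rotate_about(image, angle_deg, center_x, center_y, fill=0.0):
    """Rotate an image about (center_x, center_y) by nearest-neighbour lookup.

    For every output pixel, the source pixel is found by applying the
    inverse rotation; pixels that map outside the frame are set to `fill`.
    """
    img = np.asarray(image, dtype=np.float64)
    theta = np.deg2rad(angle_deg)
    yy, xx = np.indices(img.shape)
    dx = xx - center_x
    dy = yy - center_y
    src_x = np.rint(center_x + np.cos(theta) * dx + np.sin(theta) * dy).astype(int)
    src_y = np.rint(center_y - np.sin(theta) * dx + np.cos(theta) * dy).astype(int)
    inside = (src_x >= 0) & (src_x < img.shape[1]) & (src_y >= 0) & (src_y < img.shape[0])
    out = np.full(img.shape, fill, dtype=np.float64)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

# Synthetic 5x5 frame with one bright pixel to the right of the center.
frame = np.zeros((5, 5))
frame[2, 4] = 1.0
unchanged = rotate_about(frame, 0.0, 2, 2)
flipped = rotate_about(frame, 180.0, 2, 2)    # camera mounted upside down
```

A 180 degree rotation moves the bright pixel to the opposite side of the center, which corresponds to the upside-down camera case mentioned above.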


References

• Woo, R., & Habbal, S. R. (1999). Radial evolution of density structure in the solar corona. Geophysical Research Letters, 26(13), 1793-1796. doi:10.1029/1999gl900366

• Penn, M. J. et al. (2016). Instrumentation for the Citizen CATE Experiment: Faroe Islands and Indonesia. Publications of the Astronomical Society of the Pacific, 129(971), 015005. doi:10.1088/1538-3873/129/971/015005

• El Gamal, A., & Eltoukhy, H. (2005). CMOS image sensors. IEEE Circuits and Devices Magazine, 21(3), 6-20. doi:10.1109/MCD.2005.1438751

• Carrere, J., Place, S., Oddou, J., Benoit, D., & Roy, F. (2014). CMOS image sensor: Process impact on dark current. 2014 IEEE International Reliability Physics Symposium. doi:10.1109/irps.2014.6860620

• Druckmüllerová, H., Morgan, H., & Habbal, S. R. (2011). Enhancing Coronal Structures with The Fourier Normalizing-Radial-Graded Filter. The Astrophysical Journal, 737(2), 88. doi:10.1088/0004-637x/737/2/88

• Vincent, O., & Folorunso, O. (2009). A Descriptive Algorithm for Sobel Image Edge Detection. Proceedings of the 2009 InSITE Conference. doi:10.28945/3351

• Kraaikamp, E. (2017). AutoStakkert! (www.autostakkert.com)

• Marshall, P., Lintott, C., & Fletcher, L. (2015). Ideas for Citizen Science in Astronomy. Annual Review of Astronomy and Astrophysics, 53(1), 247-278. doi:10.1146/annurev-astro-081913-035959