Report of the project
Informatics
Valentina Baccelliere Cornejo
June 2023
Analysis of a catalog of galaxies
1
Description of the project
The project consists of the analysis of a catalog of galaxies.
1.1
Description of the dataset
• The identifier of the galaxy: specobjid
• The redshift: z
• The apparent magnitudes of the galaxies in different bands and their
errors: "petroMag_u", "petroMagErr_u", etc. They measure the light emitted
by the galaxy as observed from the Earth.
• The flux of a given set of emission lines (Hα, Hβ, [OIII], [NII]):
"h_alpha_flux", "h_alpha_flux_err", etc. They measure the intensity of the
flux of a particular element visible in emission in a galaxy spectrum.
• The stellar mass of the galaxy: "lgm_tot_p50", "lgm_tot_p16",
"lgm_tot_p84". The logarithm of the stellar mass (in units of solar masses)
is provided, with its 16th and 84th percentiles (to measure errors). It
measures how massive a galaxy is (i.e. how many stars it contains).
• The star formation rate: "sfr_tot_p50", "sfr_tot_p16", "sfr_tot_p84".
The logarithm of the star formation rate (in units of solar masses per
year) is provided, with its 16th and 84th percentiles (to measure errors).
It measures the number of stars produced by a galaxy per unit time, that
is, how active a galaxy is.
• The absolute magnitudes: "absMagU", etc. They measure the light
emitted by the galaxy as observed at a fixed distance (absolute values).
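As an illustration of how the 16th and 84th percentiles can be turned into error estimates, the sketch below uses made-up values for three galaxies. The numbers are hypothetical, and treating half of the 16th-84th percentile range as a symmetric error is an assumption that holds only for roughly Gaussian uncertainties; the report does not state which convention was used.

```python
import numpy as np

# Hypothetical percentile values (log stellar mass) for three galaxies;
# in the catalog they would come from lgm_tot_p16, lgm_tot_p50, lgm_tot_p84.
p16 = np.array([10.10, 10.85, 9.70])
p50 = np.array([10.20, 10.95, 9.90])
p84 = np.array([10.32, 11.03, 10.05])

# Asymmetric errors: distance of the median from each percentile.
err_low = p50 - p16
err_high = p84 - p50

# Symmetric error (assumption): half of the 16th-84th percentile range,
# which corresponds to about one sigma for a Gaussian distribution.
err_sym = 0.5 * (p84 - p16)

for m, lo, hi, s in zip(p50, err_low, err_high, err_sym):
    print(f"log M = {m:.2f} (-{lo:.2f}, +{hi:.2f}), symmetric error {s:.3f}")
```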
1.2
Main steps of the project
Figure 1: Project description from slides of the course
Author: Michele Moresco
2
Execution of the project
This project was written in the Python programming language, using scientific
libraries such as numpy, astropy, matplotlib and scipy, among others.
2.1
STEP 1
This step asks to read the provided catalog and extract a subsample from it.
The subsample is selected through an ID that is personal and unique for each
student; in my case, ID = 37.
The catalog is provided in FITS format; to read it, the fits module from the
astropy.io package is needed.
The code first creates the directories that will be used to save the data and
the plots from the next steps: it checks whether the directories exist and
creates them if they do not. It then opens the FITS file, accesses its content
and prints its columns, which are used throughout the project. The column
names are stored, and the columns needed in the following steps are listed by
name as well. Then a mask is used to select the galaxies in my subsample and
count them. The number of galaxies in my subsample is then printed to screen.
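The flow of this step can be sketched as follows. This is a minimal illustration, not the project code: the helper names make_dirs and load_catalog, the directory names, and the column name "ID" used for the selection are assumptions, since the report does not name the column that holds the student ID. The demo uses a small in-memory table so that it runs without the catalog file.

```python
import os
import tempfile
import numpy as np

def make_dirs(paths):
    """Create each directory only if it does not exist yet."""
    for path in paths:
        os.makedirs(path, exist_ok=True)

def load_catalog(filename):
    """Read the table in the first extension of a FITS file (needs astropy)."""
    from astropy.io import fits  # imported here so the demo runs without astropy
    data = fits.getdata(filename, ext=1)
    print(data.columns.names)  # column names available in the catalog
    return data

# Demo with a tiny stand-in table; "ID" is a hypothetical column name.
base = tempfile.mkdtemp()
dirs = [os.path.join(base, "plots"), os.path.join(base, "data")]
make_dirs(dirs)
make_dirs(dirs)  # calling it again is harmless: existing folders are kept

catalog = np.array(
    [(1001, 37, 0.05), (1002, 12, 0.08), (1003, 37, 0.11)],
    dtype=[("specobjid", "i8"), ("ID", "i4"), ("z", "f8")],
)

my_id = 37
mask = catalog["ID"] == my_id   # True for the galaxies of my subsample
subsample = catalog[mask]
print("Galaxies in my subsample:", mask.sum())
```

With the real catalog, the stand-in table would be replaced by the array returned from load_catalog.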
2.2
STEP 2
This step asks to plot the histograms of some columns and to overplot on each
histogram several statistical quantities (mean, median and their errors) and a
Gaussian curve, for both the subsample and the parent sample. The step relies
on functions written ad hoc for it. The code first separates the columns into
groups that will be plotted together, then plots each group using the
STEP2_make_hist function. This function plots the data after cleaning it with
the clean_array function, which uses a mask to remove NaN and inf values and
sigma clipping to remove outliers. Each histogram is then saved in a folder
called dir_plots_step_2_s. After the histograms have been saved, the code
initializes four empty lists to be filled with the statistical values: mean,
standard deviation, median and median error. As requested, the values for one
column are calculated "by hand" with a function called statistics_by_hand.
Then a for loop, which skips the first column (whose values were already
calculated), computes the statistical values of the other columns using numpy.
The lists are filled using the making_lists function and then saved in the
dir_data_step_2_s folder as a FITS file. The same procedure is then repeated
for the parent sample.
The plots and the saved data for the different quantities are shown in the
figures at the end of this step.
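A "by hand" computation of this kind could look like the sketch below, which uses only the standard math module. It is a reconstruction under stated assumptions, not the project code: the report does not give the formulas, so the sample standard deviation (N - 1 in the denominator), the error of the mean as σ/√N, and the error of the median as √(π/2) σ/√N (valid for Gaussian data) are all assumed here.

```python
import math

def median_by_hand(values):
    """Median of a list: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])

def statistics_by_hand(data):
    """Mean, standard deviation, median and errors without numpy."""
    n = len(data)
    mean = sum(data) / n
    squared_sum = 0.0
    for x in data:                      # accumulate squared deviations
        squared_sum += (x - mean) ** 2
    std = math.sqrt(squared_sum / (n - 1))          # assumption: sample std
    mean_err = std / math.sqrt(n)                   # assumption: sigma / sqrt(N)
    med = median_by_hand(data)
    med_err = math.sqrt(math.pi / 2.0) * mean_err   # assumption: Gaussian data
    return mean, std, mean_err, med, med_err

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std, mean_err, med, med_err = statistics_by_hand(values)
print(mean, std, mean_err, med, med_err)
```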
(a) Redshift Subsample
(b) Redshift Parent sample
Figure 2: Redshift
(a) SFR Subsample
(b) SFR Parent sample
Figure 3: Star formation rate
(a) Fluxes Subsample
(b) Fluxes Parent sample
Figure 4: Fluxes
(a) Apparent Magnitude Subsample
(b) Apparent Magnitude Parent sample
Figure 5: Apparent Magnitudes
(a) Apparent Magnitude Subsample
(b) Apparent Magnitude Parent sample
Figure 6: Apparent Magnitudes
(a) Mass Subsample
(b) Mass Parent sample
Figure 7: Masses
(a) Absolute Magnitude Subsample
(b) Absolute Magnitude Parent sample
Figure 8: Absolute Magnitudes
(a) Absolute Magnitude Subsample
(b) Absolute Magnitude Parent sample
Figure 9: Absolute Magnitudes
Figure 10: Data of the subsample saved in fits file
2.3
STEP 3
This step asks to plot five quantities as a function of redshift and to check
whether any trend with redshift is found. The plots are then saved to file
and, if a trend is found, the redshift is divided into bins and step 2 is
repeated for each bin, saving the figure and the outputs (mean and median
values) to file. The quantities to be plotted as scatter plots against
redshift are: mass, apparent magnitude, absolute magnitude, Hα and SFR. The
first thing to do is to create and clean the arrays. The cleaning must be
applied to three arrays at a time so that they have the same length at the
end; this is executed in a for loop for each trio of values. The
make_scatters_generator function was used to plot each graph and its
polynomial fit. A trend was detected in the Redshift-Mass plot, so the
redshift values were divided using three bin edges, giving four intervals.
Then, using a mask, the masses in each interval were selected and their
histograms plotted with the make_hist function; the figure is saved to file in
dir_plots_step_3 with all the other plots. Then four lists are initialized and
filled with the statistical values by calling the making_lists function four
times; the values were saved in dir_out_step3 as a FITS file. The plots and
statistical values are shown in the figures at the end of this step.
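The trend check and the redshift binning can be illustrated with synthetic data. Everything specific in this sketch is an assumption: the data are randomly generated, the polynomial degree is taken to be 1, and the three bin edges are placed at the quartiles of the redshift distribution; the report does not state the degree or the edges actually used.

```python
import numpy as np

# Synthetic stand-in for redshift and log stellar mass (not catalog data).
rng = np.random.default_rng(0)
z = rng.uniform(0.02, 0.30, 500)
log_mass = 10.0 + 3.0 * z + rng.normal(0.0, 0.2, z.size)

# One joint mask keeps the two arrays the same length after cleaning.
finite = np.isfinite(z) & np.isfinite(log_mass)
z, log_mass = z[finite], log_mass[finite]

# Polynomial fit (degree 1 assumed) to look for a trend with redshift.
coeffs = np.polyfit(z, log_mass, deg=1)
slope = coeffs[0]
print("fitted slope:", slope)

# Three bin edges give four redshift intervals (quartile edges assumed).
edges = np.quantile(z, [0.25, 0.50, 0.75])
bin_index = np.digitize(z, edges)   # 0, 1, 2 or 3 for each galaxy

bin_means = []
for k in range(edges.size + 1):
    in_bin = bin_index == k          # mask selecting one redshift interval
    masses = log_mass[in_bin]
    bin_means.append(masses.mean())
    print(f"bin {k}: N = {in_bin.sum()}, mean = {masses.mean():.3f}, "
          f"median = {np.median(masses):.3f}")
```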
(a) Redshift-Mass
(b) Redshift-AbsMag u
(a) Redshift-AppMag z
(b) Redshift-Hα
(a) Redshift-SFR
Figure 14: Histograms of the relation found for Redshift-Mass
Figure 15: Data step 3
2.4
STEP 4
This step asks to create and analyze three different plots: the BPT,
color-mass and SFR-mass diagrams.
SFR-Mass plot: The code begins by defining the variables needed for the first
plot. Then a for loop is initiated in which the code creates a list called
lista_4 containing the three variables; after that a mask is created that
eliminates NaN, inf and outliers. The for loop applies the same mask to each
variable, so that at the end of the loop the arrays all have the same length.
After that the plotting starts: the directory dir_plots_step_4 is opened, the
figure environment is created, and the make_scatters_generator function is
used to draw the scatter plot. Then a function representing a theoretical line
is defined and the line is plotted. A mask divides the y values into two
groups, above and below the theoretical line, and the data in these two groups
are then plotted as histograms with make_hist.
Color-Mass diagram: this diagram is analyzed and plotted in the same way as
the SFR-Mass diagram.
BPT plot: this plot has some specifics to consider. It starts by defining the
five values that will be needed; then a for loop applies the masks to clean
the data and create same-length arrays. The difference between this plot and
the others is that here the theoretical relation is
log([OIII]/Hβ) > 0.61/(log([NII]/Hα) - 0.05) + 1.3, so some care is needed
before plotting: division by 0 must be avoided and the argument of the
logarithm must be > 0. This is handled by applying masks to the data. The
plotting then produces a scatter plot of the data with the curve representing
the theoretical relation and a vertical line at x = 0.05 marking the asymptote
of the function. Histograms are then plotted for the x and y values above and
below the theoretical relation. After this, the data of all the plots are
saved to a FITS file as in the previous steps, using the making_lists
function. The plots and the data images are shown in the figures at the end of
this step.
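The masking needed for the BPT relation can be sketched as below. The flux values are invented for illustration, and the function name bpt_curve is hypothetical. The sketch only evaluates the curve to the left of the asymptote at x = 0.05 and leaves the remaining points unclassified; how the project code treats those points is not stated in the report.

```python
import numpy as np

def bpt_curve(x):
    """Relation quoted in the text: y = 0.61 / (x - 0.05) + 1.3."""
    return 0.61 / (x - 0.05) + 1.3

# Hypothetical line fluxes for six galaxies (arbitrary units).
h_alpha = np.array([100.0, 50.0, 80.0, 0.0, 60.0, 50.0])
n_ii = np.array([30.0, 40.0, 20.0, 10.0, -5.0, 60.0])
h_beta = np.array([30.0, 10.0, 25.0, 5.0, 20.0, 10.0])
o_iii = np.array([15.0, 80.0, 10.0, 4.0, 30.0, 50.0])

# Logarithm domain: every flux must be > 0 (this also avoids dividing by 0).
valid = (h_alpha > 0) & (n_ii > 0) & (h_beta > 0) & (o_iii > 0)
x = np.log10(n_ii[valid] / h_alpha[valid])
y = np.log10(o_iii[valid] / h_beta[valid])

# Stay to the left of the asymptote so the curve never divides by zero.
left = x < 0.05
x_left, y_left = x[left], y[left]

# True where a galaxy lies above the theoretical relation.
above = y_left > bpt_curve(x_left)
print("valid galaxies:", valid.sum())
print("left of asymptote:", left.sum())
print("above the curve:", above)
```

The two boolean arrays `above` and `~above` can then select the x and y values for the histograms of each group.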
(a) Mass-SFR
(b) SFR-Theory
Figure 16: Mass-SFR and SFR(y) above and below theoretical line
(a) Mass-Theory
(b) Mass-Color
Figure 17: Mass(x) above and below theoretical line (Mass-SFR), Mass-Color
(a) Color-Theory
(b) Mass-Theory
Figure 18: Color(y) above and below theoretical line (u-r), Mass(x) above and
below theoretical line (u-r)
(a) BPT
(b) Flux1-Theory
Figure 19: BPT, Flux(y) < y_th
(a) Flux1-Theory
(b) Flux2-Theory
Figure 20: Flux(y) > y_th, Flux(x) < y_th
(a) Flux2-Theory
Figure 21: Flux(x) > y_th
Figure 22: Data step 4
3
Appendix: Functions description
• gauss: This function takes the number of bins, the mean and the standard
deviation as input and returns a Gaussian curve. The output needs the
central position of each bin, so the function first creates an array of
zeros with np.zeros and then fills it in a for loop with the midpoint
between the i-th and the (i+1)-th points. At this point the variable x
contains the central position of each bin; the function then computes
the y values and returns both x and y.
• clean_array: This function takes an array as input and cleans it of
outliers, inf and NaN values. It begins by creating a mask that selects
only the finite values of the array, and applies this mask to the input
array. Then three rounds of sigma clipping are applied and the function
returns the cleaned array.
• STEP2_2_make_hist: This function takes three inputs: n_cols, a number;
lista, a list of names used as titles; and data, the input data. It then
creates a histogram. A for loop is initiated and a current_data variable
is created, containing the subsample or parent sample values to put into
the histogram. The array is then cleaned with the clean_array function.
The figure is built using gridspec.GridSpec with two rows and a number of
columns equal to the n_cols input, and each histogram takes its title from
the strings in lista. The mean and the standard deviation are calculated
and saved as mean and std. Two vertical lines representing the mean and
the standard deviation are plotted, together with a shaded span
representing their errors. The gauss function is called, its two outputs
are saved as xm and mod, and the resulting curve is plotted. The residuals
for each bin are plotted as a scatter plot.
• median: This function takes a list L as input and computes its median.
• statistics_by_hand: This function takes the variable data as input and
calculates the mean, the standard deviation, the median and their errors
without using numpy, as requested in step 2. It begins by computing the
mean and initializing an accumulator, stedev, to zero; a for loop then
accumulates the terms of the standard deviation, after which the median
and its error are computed. The function returns the mean, the error of
the mean, the median and the error of the median.
• statistics_numpy: This function calculates the mean, the standard
deviation, the median and the error of the median using numpy, and returns
them.
• making_lists: This function takes five inputs: an array and four lists.
It first computes the mean, the standard deviation, the median and its
error using the statistics_numpy function, then appends each quantity to
the corresponding list. It is usually called inside a for loop that fills
each list with the requested quantities.
• sigma_clip_mask_generator: This function takes an array as input. It
starts by creating a copy of the array so that the original array is not
modified. Then it creates a mask that locates the values that are too high
(or too low) by performing a sigma clipping: using np.logical_and, it marks
as True the values of the array that lie within 4 standard deviations of
the mean. It then creates an empty list and fills it with the indexes
corresponding to the False values of the mask. Then, in a for loop, instead
of deleting the values flagged as False, the function replaces them with
the mean value. Next it builds a mask named mask1 by performing sigma
clipping again on the array that was just modified. A for loop with an if
condition then selects the elements of mask1 that are False and sets the
corresponding elements of mask to False. The operations are repeated for
the third and fourth rounds of sigma clipping, and mask is returned at the
end.
• make_scatters_generator: This function takes six variables as input: the
first is a condition, the second and the third are arrays, the fourth is
the quantity shown as a color bar, and the fifth and sixth are the x and y
labels. The function starts by creating a scatter plot in which three
variables are plotted, the third one as a color bar. If the condition is
True, the function adds a fitting polynomial to the plot; after that the
axes are labeled.
• make_hist: This function takes three variables as input. It starts by
creating the plot environment and a for loop containing the commands for
the creation of the plot. Using the data DATA, the histograms are created
in a way very similar to STEP2_make_hist.
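The iterative sigma-clipping mask described in this appendix can be sketched as follows. This is a compact reconstruction, not the project code: the function name sigma_clip_mask and its parameters are hypothetical, numpy boolean indexing replaces the explicit for loops and index lists, and the input is assumed to contain only finite values.

```python
import numpy as np

def sigma_clip_mask(values, n_sigma=4.0, n_rounds=4):
    """Return a boolean mask that is False at the positions of outliers.

    In each round, values farther than n_sigma standard deviations from
    the mean are flagged and replaced by the mean in a working copy, so
    that the next round uses a mean and a spread less affected by them.
    """
    work = np.array(values, dtype=float)       # copy: the input is untouched
    mask = np.ones(work.size, dtype=bool)
    for _ in range(n_rounds):
        mean, std = work.mean(), work.std()
        inside = np.logical_and(work >= mean - n_sigma * std,
                                work <= mean + n_sigma * std)
        mask &= inside                         # once False, always False
        work[~inside] = mean                   # replace instead of deleting
    return mask

# Demo: 30 well-behaved values plus one obvious outlier.
data = np.append(np.linspace(9.0, 11.0, 30), 1000.0)
mask = sigma_clip_mask(data)
cleaned = data[mask]
print("kept", mask.sum(), "of", data.size, "values")
```

Replacing the flagged values rather than deleting them keeps the array length fixed, so the returned mask can be applied to other arrays of the same length.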