Code Overview: Modules
=======================

analogs.py
------------

.. py:module:: analogs

   :synopsis: This module contain the principal classes and functions to make the time series preprocessing and reconstructions


   The ``rascal.analogs.Station()`` class stores station metadata (code, name, altitude, longitude and latitude) and calculate daily time series.

   .. py:class:: rascal.analogs.Station(path)

      Stores station metadata (code, name, altitude, longitude and latitude) and calculate daily time series.

      :param path: Path of the directory that contains the observations.
      :type path: str
   
      .. py:attribute:: path
   
         Path of the directory that contains the observations
   
         :type: str
   
      .. py:attribute:: meta
      
         DataFrame with the metadata of the station (code, name, altitude, longitude and latitude) 
   
         :type: pd.DataFrame
   
      .. py:attribute:: longitude
      
         Longitude of the station
   
         :type: float
   
      .. py:attribute:: latitude
   
         Latitude of the station
   
         :type: float
   
      .. py:attribute:: altitude
      
         Elevation of the station
      
         :type: float
      
      .. py:method:: get_data(variable, [skipna=True])
      
         Get the daily time series of the ``variable``
   
         :param variable: variable name.
         :type variable: str
         :param skipna: skipna when resampling to daily frequency.
         :type skipna: bool
         :return: data
         :rtype: pd.DataFrame
      
      .. py:method:: get_gridpoint(grid_latitudes, grid_longitudes)


   The ``rascal.analogs.Predictor()`` class stores the predictor data and Principal Component Analysis results:

   .. py:class:: rascal.analogs.Predictor(paths, grouping, lat_min, lat_max, lon_min, lon_max, [mosaic=True], [number=None])

      Predictor class. This contains data about the predictor variable to use for the reconstruction.

    
      :param path: Paths of the grib file to open.
      :param grouping: Method of grouping the data, str format = "frequency_method"
      
         - frequency=("hourly", "daily", "monthly", "yearly")
         - method=("mean", "max", "min", "sum")
         
      :param lat_min: Predictor field minimum latitude
      :param lat_max: Predictor field maximum latitude
      :param lon_min: Predictor field minimum longitude
      :param lon_max: Predictor field maximum longitude
      :param mosaic: if True apply ``.to_mosaic()`` method
      :param number: Ensemble member number
   
      :type path: list[str]
      :type grouping: str or None
      :type lat_min: float
      :type lat_max: float
      :type lon_min: float
      :type lon_max: float
      :type mosaic: bool or None
      :type number: int or None
                     
   
      .. py:attribute:: data
   
         :type: xr.Dataset
   
      .. py:method:: crop(lat_min, lat_max, lon_min, lon_max)
   
         Crop the domain of the dataframe
         
         :param lat_min: New minimum latitude
         :param lat_max: New maximum latitude
         :param lon_min: New minimum longitude
         :param lon_max: New maximum longitude
         :type lat_min: float
         :type lat_max: float
         :type lon_min: float
         :type lon_max: float
   
      .. py:method:: to_mosaic()
      
         To use various simultaneous predictors or a vectorial variable, concatenate the variables along the longitude
         axis to obtain a single compound variable, easier to use when performing PCA.
   
         :return: compound_predictor
         :rtype: xr.Dataset

      .. py:method:: module()
   
         Get the module of the predictor variables as if they were components of a vector.
         
         :return: self
         :rtype: Predictor

   
      .. py:method:: anomalies([seasons], [standardize], [mean_period])
      
         Calculate seasonal anomalies of the field. The definition of season is flexible, being only a list of months
         contained within it.
         
         :param seasons: Months of the season. Default = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
         :param standardize: Standardize anomalies. Default = True
         :param mean_period: Dates to use as mean climatology to calculate the anomalies.
         :type seasons: list[list[int]] or None
         :type standardize: bool or None
         :type mean_period: list[pd.DatetimeIndex] or None
         
         :return: anomalies (dims = [time, latitude, longitude, season])
         :rtype: xr.Dataset
   
      .. py:method:: pcs(path, npcs, [seasons], [standardize], [pcscaling], [overwrite], [training], [project])

         Perform Principal Component Analysis. To save computation time, the PCA object can be saved as a pickle, so
         the analysis does not have to be performed every time.
         
         :param path: Path to save the PCA results
         :param npcs: Number of components.
         :param seasons: List of list of months of every season.
         :param standardize: If True, the anomalies used in the PCA are standardized.
         :param pcscaling: Set the scaling of the PCs used to compute covariance. The following values are accepted:
         
             - 0 : Un-scaled PCs.
             - 1 : PCs are scaled to unit variance (divided by the square-root of their eigenvalue) (default).
             - 2 : PCs are multiplied by the square-root of their eigenvalue.
             
         :param overwrite: Default = False. If True recalculate the PCA and overwrite the pickle with the PCA
         :param training: Dates to use for calculating the PCA
         :param project: Data to project onto the calculated PCA results.

         :type path: str
         :type npcs: int
         :type seasons: list[list[int]] or None
         :type standardize: bool or None
         :type pcscaling: int or None
         :type overwrite: bool or None
         :type training: list[pd.DatetimeIndex] or None
         :type project: xr.Dataset or None


   The ``rascal.analogs.Analogs()`` get the pool of analog days and reconstruct the time series:

   .. py:class:: rascal.analogs.Analogs(pcs, dates, observations)

      Predictor class. This contains data about the predictor variable to use for the reconstruction.

      :param path: Optional "kind" of ingredients.
      :type path: list[str] or None
   
      .. py:method:: get_pool(size, [vw_size], [vw_type], [distance])
      
         Get the pool of ``size`` closest neighbors to each day
         
        :param size: Number of neighbors in the pool.
        :param vw_size: Validation window size. How many data points around each point is ignored to validate the
            reconstruction.
        :param vw_type: Type of validation window. Options:
        
            - forward: The original date is the last date of the window.
            - backward: The original date is the firs date of the window.
            - centered: The original date is in the center of the window.
            
        :param distance: Metric to determine the distance between points in the PCs space. Options:
        
            - euclidean
            - mahalanobis (Wishlist)
            
        :return: ``(analog_dates, analog_distances)``, dates of the analogs in the pool for each day, and distances in the PCs space of each
            
        :type size: int
        :type vw_size: int or None
        :type vw_type: str or None
        :type distance: str or None
        :rtype: (pd.DataFrame, pd.DataFrame)
        
   
      .. py:method:: reconstruct([pool_size], [method], [sample_size], [mapping_variable], [vw_size], [vw_type], [distance])
          
         Reconstruct a time series using the analog pool for each day.
         
         :param pool_size: Size of the analog pool for each day.
         :param method: Similarity method to select the best analog of the pool. Options are:
         
            - 'closest': (Selected by default) Select the closest analog in the PCs space
            - 'average': Calculate the weighted average of the 'sample_size' closest analogs in the PCs space.
            - 'quantilemap': Select the analog that represent the same quantile in the observations pool that another mapping variable.
            
         :param sample_size: Number of analogs to average in the 'average' method
         :param mapping_variable: Time series of a variable to use as mapping in 'quantilemap'
         :param vw_size: Validation window size. How many data points around each point is ignored to validate the reconstruction.
         :param vw_type: Type of validation window. Options:
         
            - forward: The original date is the last date of the window.
            - backward: The original date is the firs date of the window.
            - centered: The original date is in the center of the window.
         
         :param distance: Metric to determine the distance between points in the PCs space. Options:
         
            - euclidean
            - mahalanobis (Wishlist)
            
         :type pool_size: int or None
         :type method: str or None
         :type sample_size: int or None
         :type mapping_variable: Predictor or None
         :type vw_size: int or None
         :type vw_type: str or None
         :type distance: str or None
         
         :return: reconstruction
         :rtype: pd.DataFrame
   

analysis.py
------------

.. py:module:: analysis

   :synopsis: This module contain the principal classes and functions analyze the skill and validate the reconstructions
   
   
   You can use the ``rascal.analysis.RSkill()`` class to validate and analyze the skill of the reconstructions:

   .. py:class:: rascal.analysis.RSkill([observations], [reconstructions], [reanalysis], [data])
      
      Predictor class. This contains data about the predictor variable to use for the reconstruction.
      
      :param observations: Obstervations time series
      :type observations: pd.DataFrame or None
      :param reconstructions: Reconstructions time series
      :type reconstructions: pd.DataFrame or None
      :param reanalysis: Reanalysis time series
      :type reanalysis: pd.DataFrame or None
      :param data: All data joined (observations, reconstructions, reanalysis)
      :type data: pd.DataFrame or None
      
      .. py:attribute:: observations
      
         Obstervations time series
      
         :type: pd.DataFrame
      
      .. py:attribute:: reconstructions
      
         Reconstructions time series
         
         :type: pd.DataFrame
      
      .. py:attribute:: reanalysis
      
         Reanalysis time series
      
         :type: pd.DataFrame
      
      .. py:attribute:: data
      
         All data joined (observations, reconstructions, reanalysis) concatenated in the columns axis
      
         :type: pd.DataFrame
      
      .. py:method:: resample(freq, grouping, [hydroyear], [skipna])
      
         Resample the dataset containing observations, reconstructions and reanalysis data.
         
         :param freq: New sampling frequency.
         :param grouping: Options="mean", "median" or "sum"
         :param hydroyear: Default=False. If True, when the resampling frequency is "1Y" it takes hydrological years (from October to September) instead of natural years
         :param skipna: Default=False. If True ignore NaNs. 
         
         :type freq: str
         :type grouping: str
         :type hydroyear: bool or None
         :type skipna: bool or None
         
         :return: RSkill with resampled data
         :rtype: RSkill
        
   
      .. py:method:: plotseries([color], [start], [end], [methods])
      
         Plot the time series of the reconstructions with the reanalysis and observations series
      
         :param color: dict of which color to use (values) with each dataset (keys)
         :param start: Start date of the plot
         :param end: End date of the plot
         :param methods: Reconstruction methods to plot
         
         :type color: dict or None
         :type start: Datetime or None
         :type end: Datetime or None
         :type methods: list[str] or None
   
      .. py:method:: skill([reference=None], [threshold=None])
      
        Generate a pd.DataFrame with the table of skills of various simulations. The skill metrics are:
        
            - Mean Bias Error (bias)
            - Root Mean Squared Error (rmse)
            - Correlation Coefficient (r2)
            - Standard Deviation (std)
            - MSE-based Skill Score (ssmse)
            - Heidke Skill Score (hss)
            - Brier Score (bs)

         :param reference: Time series of a reference model to compare when calculating SSMSE and HSS.
         :param threshold: Threshold to use when computing the HSS and BS
         
         :type referece: pd.DataFrame or None
         :type threshold: float or None
   
         :return: ``(observation_std, skill_table)``, Standard deviation of the observations and table of each skill score for each simulation.
         :rtype: (float, pd.DataFrame)  
         
         
      .. py:method:: taylor()
      
         Calls ``.skill()`` method and computes the Taylor diagram
         
         :return: fig, ax
   
      .. py:method:: annual_cycle([grouping], [color])
      
         Plot the annual cycle of the reconstructions, reanalysis and observations
      
         :param grouping: (Default="mean") Monthly grouping to plot in the cylce. Options=("sum", "mean", "median", "std")
         :param color: dict of which color to use (values) with each dataset (keys)
         :type grouping: str or None
         :type color: dict or None
   
      .. py:method:: qqplot()
      
         Quantile-Quantile plot


indices.py
------------

.. py:module:: indices

   :synopsis: This module contain the principal classes and functions to calculate relevant climatic indices
   
   
   You can use the ``rascal.indices.CIndex()`` class to retrieve relevant climatic indices based on:
   Data, C. (2009). Guidelines on analysis of extremes in a changing climate in support of informed decisions for adaptation. World Meteorological Organization.


   .. py:class:: rascal.analysis.CIndex(df)


      :param df: Time series containing the relevant variables for the index calculation.
      :type df: pd.DataFrame
   
      .. py:method:: fd()
   
         Count of days where TN (daily minimum temperature) < 0°C
         Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij < 0°C.
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: su() 
   
         Count of days where TX (daily maximum temperature) > 25°C
         Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij > 25°C.
         
         :return: idx
         :rtype: pd.DataFrame
   
      .. py:method:: id()
   
         Count of days where TX < 0°C
         Let TXij be the daily maximum temperature on day i in period j. Count the number of days where TXij < 0°C.
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: tr()
   
         Count of days where TN > 20°C
         Let TNij be the daily minimum temperature on day i in period j. Count the number of days where TNij > 20°C.
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: gsl()
   
         Annual count of days between first span of at least six days where TG (daily mean temperature) > 5°C and first span
         in second half of the year of at least six days where TG < 5°C.
         Let TGij be the daily mean temperature on day i in period j. Count the annual (1 Jan to 31 Dec in
         Northern Hemisphere, 1 July to 30 June in Southern Hemisphere) number of days between the first
         occurrence of at least six consecutive days where TGij > 5°C and the first occurrence after 1 July (1 Jan
         in Southern Hemisphere) of at least six consecutive days where TGij < 5°C.
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: txx() 
   
         Monthly maximum value of daily maximum temperature:
         Let TXik be the daily maximum temperature on day i in month k. The maximum daily maximum
         temperature is then TXx = max (TXik).
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: tnx()
   
         Monthly maximum value of daily minimum temperature:
         Let TNik be the daily minium temperature on day i in month k. The maximum daily minimum temperature
         is then TNx = max (TNik).
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: txn()
   
         Monthly minimum value of daily maximum temperature:
         Let TXik be the daily maximum temperature on day i in month k. The minimum daily maximum
         temperature is then TXn = min (TXik)
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: tnn()
   
         Monthly minimum value of daily minimum temperature:
         Let TNik be the daily minimum temperature on day i in month k. The minimum daily minimum
         temperature is then TNn = min (TNik)
         
         :return: idx
         :rtype: pd.DataFrame
      
      .. py:method:: tn10p()
   
         Count of days where TN < 10th percentile
         Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th
         percentile of daily minimum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where TNij < TNin10.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: tx10p()
 
         Count of days where TX < 10th percentile
         Let TXij be the daily maximum temperature on day i in period j and let TXin10 be the calendar day 10th
         percentile of daily maximum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where TXij < TXin10
         
         :return: idx
         :rtype: pd.DataFrame
    
      .. py:method:: tn90p()
   
         Count of days where TN > 90th percentile
         Let TNij be the daily minimum temperature on day i in period j and let TNin90 be the calendar day 90th
         percentile of daily minimum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where TNij > TNin90
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: tx90p()
   
         Count of days where TX > 90th percentile
         Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th
         percentile of daily maximum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where TXij > TXin90.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: wsdi()

         Count of days in a span of at least six days where TX > 90th percentile
         Let TXij be the daily maximum temperature on day i in period j and let TXin90 be the calendar day 90th
         percentile of daily maximum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where, in intervals of at least six
         consecutive days TXij > TXin90.
         
         :return: idx
         :rtype: pd.DataFrame
    
      .. py:method:: csdi()
   
         Count of days in a span of at least six days where TN > 10th percentile
         Let TNij be the daily minimum temperature on day i in period j and let TNin10 be the calendar day 10th
         percentile of daily minimum temperature calculated for a five-day window centred on each calendar day
         in the base period n (1961-1990). Count the number of days where, in intervals of at least six
         consecutive days TNij < TNin10.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: dtr()

         Mean difference between TX and TN (°C)
         Let TXij and TNij be the daily maximum and minium temperature on day i in period j. If I represents the
         total number of days in j then the mean diurnal temperature range in period j DTRj = sum (TXij - TNij) / I.
         
         :return: idx
         :rtype: pd.DataFrame
    
      .. py:method:: rx1day()
   
         Highest precipitation amount in one-day period
         Let RRij be the daily precipitation amount on day i in period j. The maximum one-day value for period j is
         RX1dayj = max (RRij).
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: rx5day()
   
         Highest precipitation amount in five-day period
         Let RRkj be the precipitation amount for the five-day interval k in period j, where k is defined by the last
         day. The maximum five-day values for period j are RX5dayj = max (RRkj)
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: sdii()

         Mean precipitation amount on a wet day
         Let RRij be the daily precipitation amount on wet day w (RR ≥ 1 mm) in period j. If W represents the
         number of wet days in j then the simple precipitation intensity index SDIIj = sum (RRwj) / W.
         
         :return: idx
         :rtype: pd.DataFrame
    
      .. py:method:: r10mm()
   
         Count of days where RR (daily precipitation amount) ≥ 10 mm
         Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 10 mm.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: r20mm()
   
         Count of days where RR ≥ 20 mm
         Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ 20 mm.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: rnnmm(threshold)
      
         :param threshold: Precipitation threshold
         :type threshold: float
   
         Count of days where RR ≥ user-defined threshold in mm
         Let RRij be the daily precipitation amount on day i in period j. Count the number of days where RRij ≥ nn mm.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: cdd()
   
         Maximum length of dry spell (RR < 1 mm)
         Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive
         days where RRij < 1 mm.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: cwd()
   
         Maximum length of wet spell (RR ≥ 1 mm)
         Let RRij be the daily precipitation amount on day i in period j. Count the largest number of consecutive
         days where RRij ≥ 1 mm
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: r95ptot()
   
         Precipitation due to very wet days (> 95th percentile)
         Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn95 be
         the 95th percentile of precipitation on wet days in the base period n (1961-1990). Then R95pTOTj = sum (RRwj),
         where RRwj > RRwn95.
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: r99ptot()
   
         Precipitation due to extremely wet days (> 99th percentile)
         Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j and let RRwn99 be
         the 99th percentile of precipitation on wet days in the base period n (1961-1990). Then R99pTOTj = sum (RRwj),
         where RRwj > RRwn99
         
         :return: idx
         :rtype: pd.DataFrame

      .. py:method:: prcptot()
   
         Total precipitation in wet days (> 1 mm)
         Let RRwj be the daily precipitation amount on a wet day w (RR ≥ 1 mm) in period j. Then PRCPTOTj = sum (RRwj)
         
         :return: idx
         :rtype: pd.DataFrame