/*
 * Copyright (C) 2018 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.android.networkstack.ipmemorystore;

import com.android.internal.annotations.VisibleForTesting;

/**
 * A class containing the logic around the relevance value for
 * IP Memory Store.
 *
 * @hide
 */
public final class RelevanceUtils {
    /**
     * The relevance is a decaying value that gets lower and lower until it
     * reaches 0 after some time passes. It follows an exponential decay law,
     * dropping slowly at first then faster and faster, because a network is
     * likely to be visited again if it was visited not long ago, and the longer
     * it hasn't been visited the more likely it is that it won't be visited
     * again. For example, a network visited on holiday should stay fresh for
     * the duration of the holiday and persist for a while, but after the venue
     * hasn't been visited for a while it should quickly be discarded. What
     * should accelerate forgetting the network is extended periods without
     * visits, so that occasional venues get discarded but regular visits keep
     * the network relevant, even if the visits are infrequent.
     *
     * This function must be stable by iteration, meaning that adjusting the same value
     * for different dates iteratively multiple times should give the same result.
     * Formally, if f is the decay function that associates a relevance x at a date d1
     * to the value at ulterior date d3, then for any date d2 between d1 and d3 :
     * f(x, d3 - d1) = f(f(x, d3 - d2), d2 - d1). Intuitively, this property simply
     * means it should be the same to compute and store back the value after two months,
     * or to do it once after one month, store it back, and do it again after another
     * months has passed.
     * The pair of the relevance and date define the entire curve, so any pair
     * of values on the curve will define the same curve. Setting one of them to a
     * constant, so as not to have to store it, means the other one will always suffice
     * to describe the curve. For example, only storing the date for a known, constant
     * value of the relevance is an efficient way of remembering this information (and
     * to compare relevances together, as f is monotonically decreasing).
     *
     *** Choosing the function :
     * Functions of the kind described above are standard exponential decay functions
     * like the ones that govern atomic decay where the value at any given date can be
     * computed uniformly from the value at a previous date and the time elapsed since
     * that date. It is simple to picture this kind of function as one where after a
     * given period of time called the half-life, the relevance value will have been
     * halved. Decay of this kind is expressed in function of the previous value by
     * functions like
     * f(x, t) = x * F ^ (t / L)
     * ...where x is the value, t is the elapsed time, L is the half-life (or more
     * generally the F-th-life) and F the decay factor (typically 0.5, hence why L is
     * usually called the half-life). The ^ symbol here is used for exponentiation.
     * Or, starting at a given M for t = 0 :
     * f(t) = M * F ^ (t / L)
     *
     * Because a line in the store needs to become irrelevant at some point but
     * this class of functions never go to 0, a minimum cutoff has to be chosen to
     * represent irrelevance. The simpler way of doing this is to simply add this
     * minimum cutoff to the computation before and removing it after.
     * Thus the function becomes :
     * f(x, t) = ((x + K) * F ^ (t / L)) - K
     * ...where K is the minimum cutoff, L the half-life, and F the factor between
     * the original x and x after its half-life. Strictly speaking using the word
     * "half-life" implies that F = 0.5, but the relation works for any value of F.
     *
     * It is easy enough to check that this function satisfies the stability
     * relation that was given above for any value of F, L and K, which become
     * parameters that can be defined at will.
     *
     * relevance
     *  1.0 |
     *      |\
     *      | \
     *      |  \            (this graph rendered with L = 75 days and K = 1/40)
     *  0.75|   ',
     *      |     \
     *      |      '.
     *      |        \.
     *      |          \
     *  0.5 |           '\
     *      |             ''.
     *      |                ''.
     *      |                   ''.
     *  0.25|                      '''..
     *      |                           '''..
     *      |                                ''''....
     *      |                                        '''''..........
     *    0 +-------------------------------------------------------''''''''''----
     *      0       50       100      150     200      250     300      350     400 days
     *
     *** Choosing the parameters
     * The maximum M is an arbitrary parameter that simply scales the curve.
     * The tradeoff for M is pretty simple : if the relevance is going to be an
     * integer, the bigger M is the more precision there is in the relevance.
     * However, values of M that are easy for humans to read are preferable to
     * help debugging, and a suitably low value may be enough to ensure there
     * won't be integer overflows in intermediate computations.
     * A value of 1_000_000 probably is plenty for precision, while still in the
     * low range of what ints can represent.
     *
     * F and L are parameters to be chosen arbitrarily and have an impact on how
     * fast the relevance will be decaying at first, keeping in mind that
     * the 400 days value and the cap stay the same. In simpler words, F and L
     * define the steepness of the curve.
     * To keep things simple (and familiar) F is arbitrarily chosen to be 0.5, and
     * L is set to 200 days visually to achieve the desired effect. Refer to the
     * illustration above to get a feel of how that feels.
     *
     * Moreover, the memory store works on an assumption that the relevance should
     * be capped, and that an entry with capped relevance should decay in 400 days.
     * This is on premises that the networks a device will need to remember the
     * longest should be networks visited about once a year.
     * For this reason, the relevance is at the maximum M 400 days before expiry :
     * f(M, 400 days) = 0
     * From replacing this with the value of the function, K can then be derived
     * from the values of M, F and L :
     * (M + K) * F ^ (t / L) - K = 0
     * K = M * F ^ (400 days / L) / (1 - F ^ (400 days / L))
     * Replacing with actual values this gives :
     * K = 1_000_000 * 0.5 ^ (400 / 200) / (1 - 0.5 ^ (400 / 200))
     *   = 1_000_000 / 3 ≈ 333_333.3
     * This ensures the function has the desired profile, the desired value at
     * cap, and the desired value at expiry.
     *
     *** Useful relations
     * Let's define the expiry time for any given relevance x as the interval of
     * time such as :
     * f(x, expiry) = 0
     * which can be rewritten
     * ((x + K) * F ^ (expiry / L)) = K
     * ...giving an expression of the expiry in function of the relevance x as
     * expiry = L * logF(K / (x + K))
     * Conversely the relevance x can be expressed in function of the expiry as
     * x = K / F ^ (expiry / L) - K
     * These relations are useful in utility functions.
     *
     *** Bumping things up
     * The last issue therefore is to decide how to bump up the relevance. The
     * simple approach is to simply lift up the curve a little bit by a constant
     * normalized amount, delaying the time of expiry. For example increasing
     * the relevance by an amount I gives :
     * x2 = x1 + I
     * x2 and x1 correspond to two different expiry times expiry2 and expiry1,
     * and replacing x1 and x2 in the relation above with their expression in
     * function of the expiry comes :
     * K / F ^ (expiry2 / L) - K = K / F ^ (expiry1 / L) - K + I
     * which resolves to :
     * expiry2 = L * logF(K / (I + K / F ^ (expiry1 / L)))
     *
     * In this implementation, the bump is defined as 1/25th of the cap for
     * the relevance. This means a network will be remembered for the maximum
     * period of 400 days if connected 25 times in succession not accounting
     * for decay. Of course decay actually happens so it will take more than 25
     * connections for any given network to actually reach the cap, but because
     * decay is slow at first, it is a good estimate of how fast cap happens.
     *
     * Specifically, it gives the following four results :
     * - A network that a device connects to once hits irrelevance about 32.7 days after
     *   it was first registered if never connected again.
     * - A network that a device connects to once a day at a fixed hour will hit the cap
     *   on the 27th connection.
     * - A network that a device connects to once a week at a fixed hour will hit the cap
     *   on the 57th connection.
     * - A network that a device connects to every day for 7 straight days then never again
     *   expires 144 days after the last connection.
     * These metrics tend to match pretty well the requirements.
     */

    // TODO : make these constants configurable at runtime. Don't forget to build it so that
    // changes will wipe the database, migrate the values, or otherwise make sure the relevance
    // values are still meaningful.

    // How long, in milliseconds, is a capped relevance valid for, or in other
    // words how many milliseconds after its relevance was set to RELEVANCE_CAP does
    // any given line expire. 400 days.
    @VisibleForTesting
    public static final long CAPPED_RELEVANCE_LIFETIME_MS = 400L * 24 * 60 * 60 * 1000;

    // The constant that represents a normalized 1.0 value for the relevance. In other words,
    // the cap for the relevance. This is referred to as M in the explanation above.
    @VisibleForTesting
    public static final int CAPPED_RELEVANCE = 1_000_000;

    // The decay factor. After a half-life, the relevance will have decayed by this value.
    // This is referred to as F in the explanation above.
    private static final double DECAY_FACTOR = 0.5;

    // The half-life. After this time, the relevance will have decayed by a factor DECAY_FACTOR.
    // This is referred to as L in the explanation above.
    private static final long HALF_LIFE_MS = 200L * 24 * 60 * 60 * 1000;

    // The value of the frame change. This is referred to as K in the explanation above.
    private static final double IRRELEVANCE_FLOOR =
            CAPPED_RELEVANCE * powF((double) CAPPED_RELEVANCE_LIFETIME_MS / HALF_LIFE_MS)
            / (1 - powF((double) CAPPED_RELEVANCE_LIFETIME_MS / HALF_LIFE_MS));

    // How much to bump the relevance by every time a line is written to.
    @VisibleForTesting
    public static final int RELEVANCE_BUMP = CAPPED_RELEVANCE / 25;

    // Java doesn't include a function for the logarithm in an arbitrary base, so implement it
    private static final double LOG_DECAY_FACTOR = Math.log(DECAY_FACTOR);
    private static double logF(final double value) {
        return Math.log(value) / LOG_DECAY_FACTOR;
    }

    // Utility function to get a power of the decay factor, to simplify the code.
    private static double powF(final double value) {
        return Math.pow(DECAY_FACTOR, value);
    }

    /**
     * Compute the value of the relevance now given an expiry date.
     *
     * @param expiry the date at which the column in the database expires.
     * @return the adjusted value of the relevance for this moment in time.
     */
    public static int computeRelevanceForNow(final long expiry) {
        return computeRelevanceForTargetDate(expiry, System.currentTimeMillis());
    }

    /**
     * Compute the value of the relevance at a given date from an expiry date.
     *
     * Because relevance decays with time, a relevance in the past corresponds to
     * a different relevance later.
     *
     * Relevance is always a positive value. 0 means not relevant at all.
     *
     * See the explanation at the top of this file to get the justification for this
     * computation.
     *
     * @param expiry the date at which the column in the database expires.
     * @param target the target date to adjust the relevance to.
     * @return the adjusted value of the relevance for the target moment.
     */
    public static int computeRelevanceForTargetDate(final long expiry, final long target) {
        final long delay = expiry - target;
        if (delay >= CAPPED_RELEVANCE_LIFETIME_MS) return CAPPED_RELEVANCE;
        if (delay <= 0) return 0;
        return (int) (IRRELEVANCE_FLOOR / powF((float) delay / HALF_LIFE_MS) - IRRELEVANCE_FLOOR);
    }

    /**
     * Compute the expiry duration adjusted up for a new fresh write.
     *
     * Every time data is written to the memory store for a given line, the
     * relevance is bumped up by a certain amount, which will boost the priority
     * of this line for computation of group attributes, and delay (possibly
     * indefinitely, if the line is accessed regularly) forgetting the data stored
     * in that line.
     * As opposed to bumpExpiryDate, this function uses a duration from now to expiry.
     *
     * See the explanation at the top of this file for a justification of this computation.
     *
     * @param oldExpiryDuration the old expiry duration in milliseconds from now.
     * @return the expiry duration representing a bumped up relevance value.
     */
    public static long bumpExpiryDuration(final long oldExpiryDuration) {
        // L * logF(K / (I + K / F ^ (expiry1 / L))), as documented above
        final double divisionFactor = powF(((double) oldExpiryDuration) / HALF_LIFE_MS);
        final double oldRelevance = IRRELEVANCE_FLOOR / divisionFactor;
        final long newDuration =
                (long) (HALF_LIFE_MS * logF(IRRELEVANCE_FLOOR / (RELEVANCE_BUMP + oldRelevance)));
        return Math.min(newDuration, CAPPED_RELEVANCE_LIFETIME_MS);
    }

    /**
     * Compute the new expiry date adjusted up for a new fresh write.
     *
     * Every time data is written to the memory store for a given line, the
     * relevance is bumped up by a certain amount, which will boost the priority
     * of this line for computation of group attributes, and delay (possibly
     * indefinitely, if the line is accessed regularly) forgetting the data stored
     * in that line.
     * As opposed to bumpExpiryDuration, this function takes the old timestamp and returns the
     * new timestamp.
     *
     * {@see bumpExpiryDuration}, and keep in mind that the bump depends on when this is called,
     * because the relevance decays exponentially, therefore bumping up a high relevance (for a
     * date far in the future) is less potent than bumping up a low relevance (for a date in
     * a close future).
     *
     * @param oldExpiryDate the old date of expiration.
     * @return the new expiration date after the relevance bump.
     */
    public static long bumpExpiryDate(final long oldExpiryDate) {
        final long now = System.currentTimeMillis();
        final long newDuration = bumpExpiryDuration(oldExpiryDate - now);
        return now + newDuration;
    }

    // This utility class is not publicly instantiable.
    private RelevanceUtils() {}
}
