Estimating HIV incidence from grouped cross-sectional data in settings where anti-retroviral therapy is provided
Humphrey Misiri (hmisiri at gmail dot com)
Public Health Department, College of Medicine, University of Malawi, Blantyre, Malawi
DOI
http://dx.doi.org/10.13070/rs.en.2.1324
Date
2015-03-02
Cite as
Research 2015;2:1324
Abstract

Prevalence and incidence are measures used for monitoring the occurrence of a disease. Prevalence can be computed from readily available cross-sectional data, but incidence is traditionally computed from data collected in longitudinal studies. Longitudinal studies are beset by financial and logistical problems, whereas cross-sectional studies are comparatively easy to conduct. This paper introduces a new method for estimating HIV incidence from grouped cross-sectional sero-prevalence data from settings where antiretroviral therapy is provided to those who are eligible according to the recommended criteria for the administration of such drugs.

Introduction

Antiretroviral therapy (ART) has helped to alleviate the suffering of AIDS patients in the world. In many countries, patients have access to ART. In Malawi, ART is also available for free but not all HIV positive persons have access to ART. By 2011, over 30% of HIV positive persons were on ART [1].

Incidence is a very important measure of disease occurrence. If the incidence of HIV is known, it is easy to monitor its spread. On the other hand, prevalence alone does not give complete information about the magnitude of the spread of HIV or any disease in general.

Consider a virulent disease like Ebola which kills after just a few days from infection. Individuals who are infected with the Ebola virus die after a very short illness if no meaningful therapeutic intervention is available. In that case, prevalence can never give a true picture of the extent of an Ebola epidemic since those who die from the disease are never counted. As a result, a low prevalence of Ebola does not mean Ebola is about to be non-existent or is almost eradicated from a community. On the other hand, the incidence of Ebola is the best measure which can be used to monitor the disease since Ebola deaths are included in its computation. Consequently, incidence gives a true picture of an Ebola epidemic. In the same vein, HIV incidence gives a true picture of the spread of HIV in a community.

Traditionally, incidence is computed from longitudinal data. Unfortunately, there are many financial and logistical problems associated with conducting longitudinal studies. To avoid these drawbacks, a viable alternative is to estimate incidence from cross-sectional data. Two good examples of methods for achieving this are the models by Podgor and Leske (1986) and Misiri et al. (2012) [2, 3]. These models produce estimates of incidence which are adjusted for differential mortality. Both approaches are for estimating HIV incidence where ART is not properly rolled out in the community. It is, however, also possible to estimate the incidence of HIV from cross-sectional data from a population where ART is provided.

The aim of this paper is to introduce a new method of estimating HIV incidence in settings where ART is provided to HIV positive people who need it regardless of the extent of coverage of such services. This method also adjusts for differential mortality.

Materials and Methods
Motivation

Podgor and Leske (1986) proposed a method for estimating incidence from grouped cross-sectional data [3]. In the spirit of Podgor and Leske (1986), we proceed to motivate our approach. Let λ1 be the rate of natural mortality, λ2 be the HIV incidence, λ3 be the rate of HIV mortality in the absence of ART, λ4 be the rate of recruitment to ARV therapy and λ5 be the rate of mortality among ART recipients.

Let X1, X2, X3, X4 and X5 be independent random variables where X1 is the time to death from natural causes, X2 is the time to HIV infection, X3 is the time to death whilst HIV positive, X4 is the time to ART registration and X5 is the time to death whilst on ART. Since the rates above are taken to be constant, we assume that X1, X2, ... , X5 have exponential distributions with parameters λ1, λ2, λ3, λ4 and λ5 respectively.

We will proceed by dividing the population into three strata namely: HIV negative persons, HIV positives on ART and HIV positives who are not on ART. Denote the total proportion of HIV positives by P0, the proportion of positives who are not on ART by P01 and the proportion of positives who are on ART by P02. Both P01 and P02 are proportions of the population.

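Consider an interval [x, x+t]. Let N0 and N1 be the numbers of persons at the beginning and at the end of the interval, and let P1 be the proportion of HIV positives at the end of the interval (P0 being the proportion at the beginning). The number of HIV positives at the end of the interval is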

(1)    N1P1=N0P0S1+ N0(1-P0)S2

S1 is the probability of surviving the interval given that one entered the interval already infected.

S2 is the probability of being infected in the interval and surviving to its end, given that one was HIV negative at the beginning of the interval.

Furthermore, the number of HIV negatives at the end of the interval is

(2)    N1(1-P1)=N0(1-P0)S3

where S3 is the probability of surviving the interval without contracting HIV.

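According to the relationship among these exponential random variables [3, 4],

(3)    $$S_1 = 1 - \int_0^1 (\lambda_3+\lambda_4)\, e^{-(\lambda_3+\lambda_4)t}\, dt = e^{-(\lambda_3+\lambda_4)},$$

$$S_2 = \int_0^1 \lambda_2\, e^{-(\lambda_1+\lambda_2-\lambda_3)t-\lambda_3}\, dt = \frac{\lambda_2\left(e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right)}{\lambda_1+\lambda_2-\lambda_3}$$

and

$$S_3 = 1 - \int_0^1 (\lambda_1+\lambda_2)\, e^{-(\lambda_1+\lambda_2)t}\, dt = e^{-(\lambda_1+\lambda_2)}.$$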
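The closed forms for S1, S2 and S3 can be checked against direct numerical integration over the unit interval. The following is a minimal sketch; the rate values and the helper `simpson` are hypothetical and chosen only for illustration, they are not estimates from this paper.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# Hypothetical rates, for illustration only.
lam1, lam2, lam3, lam4 = 0.01, 0.12, 0.05, 0.10

# Closed forms as given in the text.
S1 = math.exp(-(lam3 + lam4))
S2 = lam2 * (math.exp(-lam3) - math.exp(-(lam1 + lam2))) / (lam1 + lam2 - lam3)
S3 = math.exp(-(lam1 + lam2))

# The same probabilities obtained by integrating numerically.
S1_num = 1 - simpson(lambda t: (lam3 + lam4) * math.exp(-(lam3 + lam4) * t), 0, 1)
S2_num = simpson(lambda t: lam2 * math.exp(-(lam1 + lam2 - lam3) * t - lam3), 0, 1)
S3_num = 1 - simpson(lambda t: (lam1 + lam2) * math.exp(-(lam1 + lam2) * t), 0, 1)

for name, closed, numeric in [("S1", S1, S1_num), ("S2", S2, S2_num), ("S3", S3, S3_num)]:
    print(name, closed, numeric)
```

Agreement between the two columns for other parameter choices is a quick way to catch a sign or index error when the formulas are transcribed.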

In the interval [x, x+t], some people may have just been registered to receive ART but some were already registered prior to entering the interval. Therefore the formula in (1) above does not capture the number of infected people in [x, x+t] in a setting where ART is provided. If ART is provided, at the end of the interval there are two groups of HIV positive individuals namely those who are not on ART and those who are on ART.

Not every infected person is eligible for ART. For example, an individual who gets infected with HIV in a 5-year interval can never be eligible for ART as the therapy is for HIV positives who are in a reasonably advanced stage of infection. Therefore, the number of HIV positive individuals who are on ART at the end of the interval is the sum of old HIV positives who entered the interval already on ART and those HIV positives who are newly registered to receive ART. This can be denoted by

(4)    N0P02S4 + N0P01S5

S4 is the probability of surviving to the end of the interval whilst on ART given that one was already on ART at the beginning of the interval.

S5 is the probability of surviving to the end of the interval having been newly recruited to receive ART given that one was not on ART at the beginning of the interval.

Using the relationship between independent exponential random variables as described in Lagakos (1976) on pages 553 through 555 [4], these probabilities are defined as follows:

$$S_4 = e^{-\lambda_5}$$

$$S_5 = \int_0^1 \lambda_4\, e^{-(\lambda_3+\lambda_4-\lambda_5)t-\lambda_5}\, dt = \frac{\lambda_4\left(e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}\right)}{\lambda_3+\lambda_4-\lambda_5}.$$

Therefore (4) becomes

(5)    $$N_0 P_{02}\, e^{-\lambda_5} + N_0 P_{01}\, \frac{\lambda_4\left(e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}\right)}{\lambda_3+\lambda_4-\lambda_5}.$$
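As an illustration of expression (5), the sketch below evaluates the expected number of persons on ART at the end of the interval. The function name `art_count_end` and all input values (including N0 = 1000) are hypothetical and serve only to show how the two terms combine.

```python
import math

def art_count_end(N0, P01, P02, lam3, lam4, lam5):
    """Expected number on ART at the end of the interval: N0*P02*S4 + N0*P01*S5."""
    # S4: already on ART at the start and survives the interval on ART.
    S4 = math.exp(-lam5)
    # S5: not on ART at the start, recruited during the interval, survives to its end.
    S5 = lam4 * (math.exp(-lam5) - math.exp(-(lam3 + lam4))) / (lam3 + lam4 - lam5)
    return N0 * P02 * S4 + N0 * P01 * S5

# Hypothetical inputs, for illustration only.
on_art = art_count_end(N0=1000, P01=0.08, P02=0.02, lam3=0.05, lam4=0.10, lam5=0.045)
print(round(on_art, 2))
```

When the recruitment rate λ4 approaches zero the second term vanishes and only those already on ART at the start contribute, which is a useful sanity check on the expression.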

The number of HIV positives at the end of the interval is therefore

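(6)    $$N_1 P_1 = N_0 P_{01}\, e^{-(\lambda_3+\lambda_4)} + N_0(1-P_0)\left[\frac{\lambda_2\left(e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right)}{\lambda_1+\lambda_2-\lambda_3}\right] + N_0 P_{02}\, e^{-\lambda_5} + N_0 P_{01}\left[\frac{\lambda_4\left(e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}\right)}{\lambda_3+\lambda_4-\lambda_5}\right].$$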

The number of HIV negative persons at the end of the interval is

(7)     N1(1-P1) =N0(1-P0)S6

where S6 is the probability of surviving the interval without contracting HIV. Now, $$S_6 = 1 - \int_0^1 (\lambda_1+\lambda_2)\, e^{-(\lambda_1+\lambda_2)t}\, dt = e^{-(\lambda_1+\lambda_2)}.$$ Therefore the equation in (7) becomes

(8)    $$N_1(1-P_1) = N_0(1-P_0)\, e^{-(\lambda_1+\lambda_2)}$$

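From (8) we have that $$N_1 = \frac{N_0(1-P_0)}{(1-P_1)}\, e^{-(\lambda_1+\lambda_2)}.$$

Therefore the left hand side of equation (6) becomes $$\frac{N_0(1-P_0)}{(1-P_1)}\, P_1\, e^{-(\lambda_1+\lambda_2)}.$$

Consequently, equation (6) becomes
$$\frac{N_0(1-P_0)\, P_1\, e^{-(\lambda_1+\lambda_2)}}{(1-P_1)} = N_0 P_{01}\, e^{-(\lambda_3+\lambda_4)} + N_0(1-P_0)\left[\frac{\lambda_2\left(e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right)}{\lambda_1+\lambda_2-\lambda_3}\right] + N_0 P_{02}\, e^{-\lambda_5} + N_0 P_{01}\left[\frac{\lambda_4\left(e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}\right)}{\lambda_3+\lambda_4-\lambda_5}\right]$$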

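From this expression, after dividing through by N0 and collecting all terms on one side, we define a function f(λ2) as follows:

$$f(\lambda_2) = \frac{(1-P_0)\, P_1\, e^{-(\lambda_1+\lambda_2)}}{(1-P_1)} - P_{01}\, e^{-(\lambda_3+\lambda_4)} - P_{02}\, e^{-\lambda_5} - P_{01}\lambda_4\left[\frac{e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}}{\lambda_3+\lambda_4-\lambda_5}\right] - (1-P_0)\lambda_2\left[\frac{e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}}{\lambda_1+\lambda_2-\lambda_3}\right]$$

where 1 − P1 > 0, λ1 + λ2 − λ3 > 0 and λ3 + λ4 − λ5 > 0.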

Using the Newton-Raphson method, the value of λ 2 can be estimated given appropriate data.

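According to the Newton-Raphson method: $$\lambda_2^{(n+1)} = \lambda_2^{(n)} - \frac{f(\lambda_2^{(n)})}{f'(\lambda_2^{(n)})}.$$

The derivative of f(λ2) is
$$f'(\lambda_2) = -\frac{(1-P_0)\, P_1\, e^{-(\lambda_1+\lambda_2)}}{(1-P_1)} - \frac{\lambda_2(1-P_0)\, e^{-(\lambda_1+\lambda_2)}}{\lambda_1+\lambda_2-\lambda_3} - \frac{(1-P_0)\left[e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right]}{\lambda_1+\lambda_2-\lambda_3} + \frac{\lambda_2(1-P_0)\left[e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right]}{(\lambda_1+\lambda_2-\lambda_3)^2}$$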
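The iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis in this paper: the function names, the starting value and every input value in `args` are hypothetical, and the per-person rate used for λ4 is purely illustrative.

```python
import math

def f(lam2, P0, P1, P01, P02, lam1, lam3, lam4, lam5):
    """f(lam2) as defined in the text; its root is the incidence estimate."""
    a = lam1 + lam2
    d = a - lam3
    s5 = lam4 * (math.exp(-lam5) - math.exp(-(lam3 + lam4))) / (lam3 + lam4 - lam5)
    s2 = lam2 * (math.exp(-lam3) - math.exp(-a)) / d
    return ((1 - P0) * P1 * math.exp(-a) / (1 - P1)
            - P01 * math.exp(-(lam3 + lam4)) - P02 * math.exp(-lam5)
            - P01 * s5 - (1 - P0) * s2)

def f_prime(lam2, P0, P1, lam1, lam3):
    """Analytic derivative of f with respect to lam2."""
    a = lam1 + lam2
    d = a - lam3
    e = math.exp(-lam3) - math.exp(-a)
    return (-(1 - P0) * P1 * math.exp(-a) / (1 - P1)
            - lam2 * (1 - P0) * math.exp(-a) / d
            - (1 - P0) * e / d
            + lam2 * (1 - P0) * e / d ** 2)

def newton_raphson(start, args, tol=1e-12, max_iter=50):
    """Iterate lam2 <- lam2 - f/f' until the step is smaller than tol."""
    lam2 = start
    for _ in range(max_iter):
        step = f(lam2, **args) / f_prime(lam2, args["P0"], args["P1"],
                                         args["lam1"], args["lam3"])
        lam2 -= step
        if abs(step) < tol:
            return lam2
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical inputs, for illustration only.
args = dict(P0=0.10, P1=0.20, P01=0.08, P02=0.02,
            lam1=0.01, lam3=0.05, lam4=0.10, lam5=0.045)
lam2_hat = newton_raphson(0.10, args)
print(lam2_hat, f(lam2_hat, **args))
```

The starting value 0.10 lies to the right of λ3 − λ1 = 0.04 for these inputs, so the iteration stays in the region where λ1 + λ2 − λ3 > 0.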

Note that the graph of f(λ2) has an asymptote at λ2 = λ3 − λ1. Because of this, it is possible for f(λ2) to have more than one root on either side of the asymptote. Nevertheless, we will retain the roots of f(λ2) which are to the right of the asymptote because these are the only values which satisfy the condition that f(λ2) = 0 given λ1 + λ2 − λ3 > 0.
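One way to restrict attention to roots on the right of λ3 − λ1 is to scan a grid on that side for sign changes of f and refine each bracket by bisection. The sketch below uses plain bisection as a stand-in for an interval root finder; the function `roots_right_of_asymptote`, the upper search limit of 1.0, the grid size and all input values are assumptions made for illustration.

```python
import math

def f(lam2, P0, P1, P01, P02, lam1, lam3, lam4, lam5):
    """f(lam2) as defined in the text."""
    a = lam1 + lam2
    s5 = lam4 * (math.exp(-lam5) - math.exp(-(lam3 + lam4))) / (lam3 + lam4 - lam5)
    s2 = lam2 * (math.exp(-lam3) - math.exp(-a)) / (a - lam3)
    return ((1 - P0) * P1 * math.exp(-a) / (1 - P1)
            - P01 * math.exp(-(lam3 + lam4)) - P02 * math.exp(-lam5)
            - P01 * s5 - (1 - P0) * s2)

def roots_right_of_asymptote(args, upper=1.0, steps=2000, tol=1e-12):
    """Scan (lam3 - lam1, upper) for sign changes of f and refine each by bisection."""
    lower = args["lam3"] - args["lam1"]
    width = (upper - lower) / steps
    # Cell midpoints, so that lam2 = lam3 - lam1 itself is never evaluated.
    grid = [lower + (i + 0.5) * width for i in range(steps)]
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        flo, fhi = f(lo, **args), f(hi, **args)
        if flo * fhi > 0:
            continue  # no sign change in this cell
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            fmid = f(mid, **args)
            if flo * fmid <= 0:
                hi = mid
            else:
                lo, flo = mid, fmid
        roots.append(0.5 * (lo + hi))
    return roots

# Hypothetical inputs, for illustration only.
args = dict(P0=0.10, P1=0.20, P01=0.08, P02=0.02,
            lam1=0.01, lam3=0.05, lam4=0.10, lam5=0.045)
roots = roots_right_of_asymptote(args)
print(roots)
```

A root located this way can also serve as a starting value for the Newton-Raphson iteration.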

The standard error of λ2 was estimated using the delta method. An explanation of how the formula for the standard error was derived is given in the Annex.

Application of the method to population-based data from the Malawi Demographic and Health Survey 2010
Description of the data

The estimated population of Malawi in 2011 was 14,388,550 [5]. The national prevalence of HIV was 10% in 2010 [6]. The provision of ARV therapy in Malawi is overseen by the HIV Unit in the Ministry of Health and Population. By 2011, 382,953 people were on ARV therapy [1]. The remaining 1,055,902 were not on ARV therapy. In the same year, the number of deaths due to HIV was 43,000 [1].

According to the ARV Supervision database for 2004-2009, which was maintained by the HIV Unit, there were 3,262 ART registrations in 2004 [7]. By the end of 2008, a total of 20,393 HIV positive persons had been recruited to receive ARV therapy. This gives an average recruitment rate (λ4) of 3,426 people per year. Studies conducted in Malawi [8, 9] found that ART reduces mortality by 10%. Therefore, given the HIV mortality rates, the rate of mortality among those on ART is λ5 = 0.9 × λ3.
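The arithmetic behind these two quantities can be written out as follows. The divisor of 5 (the calendar years 2004 to 2008) is an assumption: it is the reading that reproduces the stated figure of 3,426 and is not spelled out in the text. The second line applies λ5 = 0.9 × λ3 to the 15-19 age group, using the AIDS mortality rate for that group from Table 2.

```latex
\lambda_4 \approx \frac{20{,}393 - 3{,}262}{5} = \frac{17{,}131}{5} \approx 3{,}426 \text{ people per year},
\qquad
\lambda_5 = 0.9 \times \lambda_3 = 0.9 \times 0.0471 \approx 0.0424 \quad (\text{age group 15-19}).
```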

The age-specific HIV sero-prevalence data analysed for this paper are extracted from the database of the Malawi Demographic and Health Survey (MDHS2010) which was conducted in 2010. The data are in Table 1 below.

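Age group | HIV- Number | HIV- % (p03) | HIV+ Number | HIV+ % | Not on ARV Number | Not on ARV % (p01) | On ARV Number | On ARV % (p02)
15-19 | 3208 | 0.022 | 71 | 0.022 | 63 | 0.020 | 8 | 0.002
20-24 | 2370 | 0.051 | 122 | 0.051 | 114 | 0.048 | 8 | 0.003
25-29 | 2141 | 0.108 | 232 | 0.108 | 197 | 0.092 | 35 | 0.016
30-34 | 1560 | 0.181 | 283 | 0.181 | 227 | 0.146 | 56 | 0.036
35-39 | 1224 | 0.246 | 301 | 0.246 | 232 | 0.190 | 69 | 0.056
40-44 | 870 | 0.247 | 215 | 0.247 | 155 | 0.178 | 60 | 0.069
45-49 | 817 | 0.193 | 158 | 0.193 | 95 | 0.116 | 63 | 0.077
50-54 | 295 | 0.129 | 38 | 0.129 | 25 | 0.085 | 13 | 0.044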
Table 1. Nationally representative HIV sero-prevalence data for Malawi, 2010.

In 1992, HIV was not endemic as it is today. Mortality, in general, was mainly due to causes other than HIV. As HIV spread throughout Malawi, HIV became the leading cause of mortality. The provision of ART to HIV positives has reversed this trend in mortality. Therefore, the mortality estimates for 1992 represent natural mortality rates for Malawi which are not contaminated by HIV mortality. The source of the HIV mortality rates is a study by Crampin et al. (2002), which reports mortality rates for HIV positive persons not on ARV therapy in a rural setting representative of an average rural area in Malawi [10]. These estimates represent HIV mortality rates in rural Malawi in the absence of ARV therapy. Table 2 below contains the natural and HIV mortality rates.

Age group index (j) | Age group | Natural mortality rate (λ1), men | Natural mortality rate (λ1), women | AIDS mortality rate (λ3)
1 | 15-19 | 0.0038 | 0.0053 | 0.0471
2 | 20-24 | 0.0041 | 0.0036 | 0.0593
3 | 25-29 | 0.0068 | 0.0068 | 0.0675
4 | 30-34 | 0.0084 | 0.0072 | 0.1354
5 | 35-39 | 0.0076 | 0.009 | 0.1354
6 | 40-44 | 0.0101 | 0.0089 | 0.1427
7 | 45-49 | 0.0097 | 0.0096 | 0.1427
8 | 50+ | 0.0097 | 0.0096 | 0.2339
Table 2. Natural and AIDS mortality rates for Malawi.
Results

HIV incidence estimates for the 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54 age groups are in Table 3. The 95% confidence interval for each estimate is also presented. The incidence estimates were obtained by using the Newton-Raphson method. The initial values of λ2 plugged into the Newton-Raphson algorithm were obtained by a combination of methods including inspection, use of the R function uniroot and numerical search procedures.

Age group | FOI | SE | Incidence per 5 years | 95% CI lower limit | 95% CI upper limit
15-19 | 0.0607 | 0.000358 | 61 | 60 | 61
20-24 | 0.0858 | 0.000779 | 86 | 84 | 87
25-29 | 0.1171 | 0.001806 | 117 | 114 | 121
30-34 | 0.1628 | 0.00157 | 163 | 160 | 166
35-39 | 0.1428 | 0.00108 | 143 | 141 | 145
40-44 | 0.1446 | 0.000897 | 145 | 143 | 146
45-49 | 0.1447 | 0.000755 | 145 | 143 | 146
50-54 | - | - | - | - | -
Table 3. Nationally representative HIV incidence estimates for Malawi.
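The confidence limits in Table 3 appear consistent with a normal-approximation interval, FOI ± 1.96 × SE, expressed per 1,000 persons and rounded to whole numbers. Both the 1.96 multiplier and the per-1,000 scaling are assumptions inferred from the tabulated values rather than stated in the text. The sketch below recomputes the limits from the FOI and SE columns; the function name `wald_ci_per_1000` is hypothetical.

```python
# FOI and SE values as listed in Table 3 (the 50-54 group has no estimate).
rows = [
    ("15-19", 0.0607, 0.000358),
    ("20-24", 0.0858, 0.000779),
    ("25-29", 0.1171, 0.001806),
    ("30-34", 0.1628, 0.00157),
    ("35-39", 0.1428, 0.00108),
    ("40-44", 0.1446, 0.000897),
    ("45-49", 0.1447, 0.000755),
]

def wald_ci_per_1000(foi, se, z=1.96):
    """Normal-approximation interval for the FOI, scaled per 1,000 and rounded."""
    return round(1000 * (foi - z * se)), round(1000 * (foi + z * se))

intervals = {age: wald_ci_per_1000(foi, se) for age, foi, se in rows}
for age, (lower, upper) in intervals.items():
    print(age, lower, upper)
```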

The age group with the highest incidence is the 30-34 year age group. The smallest incidence is for the 15-19 year age group. Although the 40-44 and 45-49 age groups have the same rounded incidence estimate, the two estimates differ when reported to 6 decimal places. All the standard errors of the FOI estimates are very small. Furthermore, the 95% confidence intervals for the 15-19 through 45-49 age groups are very narrow.

Discussion

This method makes it possible to estimate HIV incidence from cross-sectional data in settings where ART is provided. It relies heavily on the existence of the roots of f(λ2). For the 50-54 year age group, no estimate is possible because the structure of the model does not permit it.

We tested the sensitivity of the method to the size of P01 and P02. According to our findings, large values of P01 and P02 resulted in an f(λ2) whose roots were hard to estimate. In order to have reasonably small values of P01 and P02, so that the Newton-Raphson method converges efficiently, both parameters must be defined as proportions of the sample for each age group. In any case, the number of people on ART is bound to be small; as a fraction of the sample for each age group, this produces proportions which make it easy to achieve convergence when using the Newton-Raphson algorithm.

The objective of the method is to produce incidence estimates. Therefore, defining P01 and P02 as proposed above does not make the results of the current method unusable. Readers who want the proportions P01 and P02 to be defined otherwise can do so and can compute the proportions from data based on their own definitions [3].

The fact that all the confidence intervals were narrow can be explained by the size of the samples for each age group. All sample sizes were very big. In such cases, standard errors are very small. These affect the size of the margin of error. Eventually, confidence intervals computed from such standard errors are likewise narrow. Besides, the narrow confidence intervals are indicative of high precision in the estimation of FOI.

Conclusion

The novel method introduced in this paper is a very good approach for estimating HIV incidence from aggregated data collected from settings where ART is provided to HIV infected individuals. This method is timely as it comes at a time when provision of ART is widespread in many countries of the world.

ANNEX
Derivation of the standard error of λ2

The force of infection (FOI) λ2 is a function of both P0 and P1, that is to say λ2 = g(P0, P1). Therefore, to find the variance of λ2 we use the delta method. Using the delta method,

$$\mathrm{Var}(\lambda_2) = \left(\frac{\partial \lambda_2}{\partial P_0}\right)^2 \mathrm{Var}(P_0) + \left(\frac{\partial \lambda_2}{\partial P_1}\right)^2 \mathrm{Var}(P_1).$$

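We will define a function y, equal to f(λ2)/(1 − P0), in this way:

(1)    $$y = \frac{P_1\, e^{-(\lambda_1+\lambda_2)}}{(1-P_1)} - \frac{1}{(1-P_0)}\left[P_{01}\, e^{-(\lambda_3+\lambda_4)} + P_{02}\, e^{-\lambda_5}\right] - \frac{\lambda_4 P_{01}}{(1-P_0)}\left[\frac{e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}}{\lambda_3+\lambda_4-\lambda_5}\right] - \frac{\lambda_2}{(\lambda_1+\lambda_2-\lambda_3)}\left[e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right]$$

Therefore, treating P01 and P02 as fixed,

(2)    $$\frac{\partial y}{\partial P_0} = -\frac{1}{(1-P_0)^2}\left[P_{01}\, e^{-(\lambda_3+\lambda_4)} + P_{02}\, e^{-\lambda_5}\right] - \frac{\lambda_4 P_{01}}{(1-P_0)^2}\left[\frac{e^{-\lambda_5}-e^{-(\lambda_3+\lambda_4)}}{\lambda_3+\lambda_4-\lambda_5}\right].$$

Similarly,

(3)    $$\frac{\partial y}{\partial \lambda_2} = -\frac{P_1\, e^{-(\lambda_1+\lambda_2)}}{(1-P_1)} - \frac{\lambda_2\, e^{-(\lambda_1+\lambda_2)}}{(\lambda_1+\lambda_2-\lambda_3)} - \frac{e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}}{(\lambda_1+\lambda_2-\lambda_3)} + \frac{\lambda_2\left(e^{-\lambda_3}-e^{-(\lambda_1+\lambda_2)}\right)}{(\lambda_1+\lambda_2-\lambda_3)^2}.$$

It is also true that

(4)    $$\frac{\partial y}{\partial P_1} = \frac{P_1\, e^{-\lambda_1-\lambda_2}}{(1-P_1)^2} + \frac{e^{-\lambda_1-\lambda_2}}{(1-P_1)}.$$

Since y = 0 defines λ2 implicitly as a function of P0 and P1,

$$\frac{\partial \lambda_2}{\partial P_0} = -\frac{\left(\partial y/\partial P_0\right)}{\left(\partial y/\partial \lambda_2\right)} \quad\text{and}\quad \frac{\partial \lambda_2}{\partial P_1} = -\frac{\left(\partial y/\partial P_1\right)}{\left(\partial y/\partial \lambda_2\right)}.$$

Up to sign, the partial derivative of λ2 with respect to P0 is the quotient when the result in (2) is divided by the result in (3) above. Similarly, the partial derivative of λ2 with respect to P1 is, up to sign, the quotient when the result in (4) is divided by the result in (3) above. The sign does not affect the variance because each partial derivative is squared.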

The variance of P0 is $$\mathrm{Var}(P_0) = \frac{P_0(1-P_0)}{N_0}.$$ Similarly the variance of P1 is $$\mathrm{Var}(P_1) = \frac{P_1(1-P_1)}{N_1}.$$
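The steps of this Annex can be put together in a short sketch. It assumes that y equals f(λ2)/(1 − P0), that P01 and P02 are held fixed when differentiating with respect to P0, and that the partial derivatives of λ2 follow from implicit differentiation of y = 0. The function names, the value used for the incidence estimate, the sample sizes N0 and N1 and all other inputs are hypothetical.

```python
import math

def y(lam2, P0, P1, P01, P02, lam1, lam3, lam4, lam5):
    """y = f(lam2) / (1 - P0); y = 0 defines lam2 implicitly."""
    a, d = lam1 + lam2, lam1 + lam2 - lam3
    s5 = lam4 * (math.exp(-lam5) - math.exp(-(lam3 + lam4))) / (lam3 + lam4 - lam5)
    art = P01 * math.exp(-(lam3 + lam4)) + P02 * math.exp(-lam5) + P01 * s5
    return (P1 * math.exp(-a) / (1 - P1) - art / (1 - P0)
            - lam2 * (math.exp(-lam3) - math.exp(-a)) / d)

def partials(lam2, P0, P1, P01, P02, lam1, lam3, lam4, lam5):
    """Partial derivatives of y with respect to P0, P1 and lam2."""
    a, d = lam1 + lam2, lam1 + lam2 - lam3
    e = math.exp(-lam3) - math.exp(-a)
    s5 = lam4 * (math.exp(-lam5) - math.exp(-(lam3 + lam4))) / (lam3 + lam4 - lam5)
    art = P01 * math.exp(-(lam3 + lam4)) + P02 * math.exp(-lam5) + P01 * s5
    dy_dP0 = -art / (1 - P0) ** 2
    dy_dP1 = P1 * math.exp(-a) / (1 - P1) ** 2 + math.exp(-a) / (1 - P1)
    dy_dlam2 = (-P1 * math.exp(-a) / (1 - P1) - lam2 * math.exp(-a) / d
                - e / d + lam2 * e / d ** 2)
    return dy_dP0, dy_dP1, dy_dlam2

def delta_se(lam2, N0, N1, **p):
    """Delta-method standard error of lam2 from binomial variances of P0 and P1."""
    dy_dP0, dy_dP1, dy_dlam2 = partials(lam2, **p)
    dl_dP0 = -dy_dP0 / dy_dlam2  # implicit differentiation of y = 0
    dl_dP1 = -dy_dP1 / dy_dlam2
    var_P0 = p["P0"] * (1 - p["P0"]) / N0
    var_P1 = p["P1"] * (1 - p["P1"]) / N1
    return math.sqrt(dl_dP0 ** 2 * var_P0 + dl_dP1 ** 2 * var_P1)

# Hypothetical inputs, for illustration only.
p = dict(P0=0.10, P1=0.20, P01=0.08, P02=0.02,
         lam1=0.01, lam3=0.05, lam4=0.10, lam5=0.045)
se = delta_se(lam2=0.1237, N0=2000, N1=1800, **p)
print(se)
```

Comparing each analytic partial derivative with a finite-difference approximation of y is a simple guard against transcription errors in the formulas.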

Declarations
Competing interests

There are no competing interests.

Acknowledgments

I am very grateful to ORC Macro International for allowing me to analyse the MDHS2010 data.

Authors' contributions

HM conceived the study, conceived the method, obtained the data, analyzed the data, drafted the manuscript and revised it.

References
  1. HIV Unit. 2012 Global AIDS Response Progress Report: Malawi Country Report for 2010 and 2011. Lilongwe, Malawi: Ministry of Health, Malawi Government; 2012.
  2. Misiri H, Edriss A, Aalen O, Dahl F. Estimation of HIV incidence in Malawi from cross-sectional population-based sero-prevalence data. J Int AIDS Soc. 2012;15:14.
  3. Podgor M, Leske M. Estimating incidence from age-specific prevalence for irreversible diseases with differential mortality. Stat Med. 1986;5:573-8.
  4. Lagakos S. A stochastic model for censored-survival data in the presence of an auxiliary variable. Biometrics. 1976;32:551-9.
  5. Population projections for Malawi. http://www.nso.malawi.net/index.php?option=com_content&view=article&id=134%3Apopulation-projections-for-malawi&catid=8&Itemid=3
  6. National Statistical Office (NSO), ORC Macro. Malawi Demographic and Health Survey 2010. Zomba: National Statistical Office (NSO) and ORC Macro; 2010.
  7. Murray C, Ortblad K, Guinovart C, Lim S, Wolock T, Roberts D, et al. Global, regional, and national incidence and mortality for HIV, tuberculosis, and malaria during 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet. 2014;384:1005-70.
  8. Jahn A, Floyd S, Crampin A, Mwaungulu F, Mvula H, Munthali F, et al. Population-level effect of HIV on adult mortality and early evidence of reversal after introduction of antiretroviral therapy in Malawi. Lancet. 2008;371:1603-11.
  9. Floyd S, Molesworth A, Dube A, Banda E, Jahn A, Mwafulirwa C, et al. Population-level reduction in adult mortality after extension of free anti-retroviral therapy provision into rural areas in northern Malawi. PLoS ONE. 2010;5:e13499.
  10. Crampin A, Floyd S, Glynn J, Sibande F, Mulawa D, Nyondo A, et al. Long-term follow-up of HIV-positive and HIV-negative individuals in rural Malawi. AIDS. 2002;16:1545-50.
ISSN : 2334-1009