3‐D Density Structure of the Lunar Mascon Basins Revealed by a High‐Efficient Gravity Inversion of the GRAIL Data

The lunar mascon basins are characterized by high gravity anomalies and thin crust. Study of the density structure beneath the mascon basins can help to understand the origin of these gravity anomalies and infer their formation and evolution. We propose an efficient forward gravity method based on the 3‐D Gauss‐Legendre quadrature (GLQ) and Fast Fourier Transforms combined with the adaptive discretization strategy to ensure high accuracy. The numerical example demonstrates that computational efficiency is increased by about three orders of magnitude compared to the traditional 3D GLQ method. We employ this forward method in a 3‐D inversion of the gravity data derived from the lunar gravity model GL1500E. The inverted results show prominent high‐density structures (namely, mascons) representing significant mantle uplift and thinned crust beneath most impact basins. Marked low‐density rings surround the mascons. Most of the corresponding low‐density anomalies extend from the near‐surface to the Moho, indicating a thick, low‐density crust. Our density model is consistent with the formation processes of mascons that an impact causes collapse of the transient crater and then the shocked mantle drives mantle flow. The low‐density rings surrounding the mascons may stem from the crust thickening following an impact and extensive fracturing of the crustal column.

. Although the observed surface features are clearly associated with large lunar impact basins, the subsurface structure of these basins is not studied yet sufficiently (Zuber et al., 2016). Several studies (e.g., Hikida & Wieczorek, 2007;Ishihara et al., 2009;Wieczorek et al., 2013) have determined the crustal thickness, providing valuable information for the lunar subsurface structure. In particular, Wieczorek et al. (2013) were able to resolve crustal density and give a crustal thickness model that matches the observed crustal thickness from Apollo. However, various studies provided different values for the crust-mantle interface beneath some large mascon basins. For example, the crustal thickness of the Orientale mascon is very different as determined from pre-GRAIL and post-GRAIL missions' data: it is close to 0 km based on the pre-GRAIL data (Hikida & Wieczorek, 2007), and is determined to be 10 km (Wieczorek et al., 2013) or 20 km (Liang et al., 2014) based on the post-GRAIL data. Recently, a localized admittance analysis of the high-resolution GRAIL data by Besserer et al. (2014) has derived lateral and vertical density variations using a high-degree (250-550) of GRAIL data. However, this method is mainly sensitive to shallow density structures typically less than 20 km, which naturally removes the mascons' signature. Liang et al. (2014) derived a 3-D density model of the lunar lithosphere (0-100 km) using a gravity inversion in spherical coordinates, but the resulting model has a relatively low resolution of 3° × 3°. A high-resolution 3-D density model is still required to provide further insights into the density structure and infer a plausible formation process of the lunar mascons (e.g., Johnson et al., 2016;Melosh et al., 2013).
Several efficient forward and inversion methods in Cartesian coordinates have been proposed to model 3-D density structures, including the fast Fourier transform method (e.g., Caratori Tontini et al., 2009;Zhao et al., 2018), the adaptive multilevel fast multipole method (Ren et al., 2017) and the block circulant extension method (e.g., Vogel, 2002;Wu, 2018). However, these methods developed in Cartesian coordinates are not applicable to deal with large-scale or global features due to the curvature of planets. The spherical harmonic analysis is widely used to obtain the thickness or bulk density of the lunar crust (Huang & Wieczorek, 2012;Ishihara et al., 2009;Wieczorek et al., 2013). On the other hand, several spatial domain methods were developed, including the 2-D, 3-D Gauss-Legendre quadrature (GLQ) (e.g., Asgharzadeh et al., 2007;Uieda et al., 2016;Zhang et al., 2018;Zhao et al., 2019) and the Taylor series expansion (e.g., Grombein et al., 2013), in which mass anomalies are generally discretized into tesseroids taking into account the curvature of planets (Anderson, 1976).
Among these spatial domain methods, the 3-D GLQ method has been well developed and widely used in gravitational forward modeling and inversion (e.g., Zhang et al., 2018;Zhao et al., 2019). It approximates the 3-D volume integral by a weighted sum of the effects of point masses. The accuracy is directly affected by the point mass number and the distance to the observation point (Zhao et al., 2019). Significant errors occur when the mass anomalies are in the polar regions or close to the observation point. Uieda et al. (2016) proposed an adaptive discretization strategy, which decreases the relative error to lower than 0.1%. Furthermore, the computational efficiency of the 3-D GLQ method with adaptive discretization is relatively low, especially for a high-resolution (fine) large-scale 3-D gravity inversion. Zhao et al. (2019) proposed an equivalent storage strategy that increases the computational efficiency by approximately two orders of magnitude and largely decreases the memory requirement, which also provides an elegant solution to storing a dense Jacobian matrix in fine 3D gravity inversions.
In this study, we follow the method of Zhao et al. (2019). First, we develop principles of the equivalent storage method based on advanced matrix analysis. Subsequently, further improvement in computational efficiency is proposed by using the Fast Fourier Transforms (FFT) algorithm. Then we design a simple sphere shell model to test its efficiency and accuracy and apply it in a global gravity inversion of the latest lunar gravity model GL1500E (Konopliv et al., 2014) to obtain a 3-D density distribution within the lunar crust and upper mantle (0-100 km). Finally, we discuss the inverted density anomalies for lunar mascon basins.

Forward Method
In the spherical coordinate system, subsurface mass distributions are generally discretized into tesseroids (Anderson, 1976) with equal intervals in the latitudinal and longitudinal directions, as shown in Figure 1. The density of each tesseroid is assumed to be constant. The gravity observations are typically considered to be on a spherical patch at a constant height above the source and aligned with the tesseroids mesh (Figure 2a). The gravity anomalies g (i.e., the gravitational accelerations in the radius direction) at the observations can be written in the matrix-vector form as: ZHAO ET AL.   Uieda et al. (2016). Q is the center of a tesseroid, and P (r, λ, φ) is an observation point outside the source region; l is the distance between the observation point P and a point inside a tesseroid. Figure 2. The sketch map shows (a) the kernel matrix K in the upper panel and the discretization of a source region and an observation surface in the lower panel, (b) the shifting equivalence, and (c) the swapping equivalence of a kernel sub-matrix K AX are expressed in matrix form in the upper panels. In (b) and (c), the lower panels show the equivalence relations in dashed lines with the same color. Taking A 1 X 2 = A 2 X 1 in (c) as an example, the swapping equivalence is determined based on the kernel expression in Equation 2 is an even function of (λ' − λ P ), therefore exchanging the subscript indexes in A 1 X 2 do not change the kernel value. Note that as for the gravity component g y , the second derivatives g xy and g yz , their kernel elements have the relation A 1 X 2 = −A 2 X 1 because their kernel expression is an odd function (Zhao et al., 2019).
where K is the kernel matrix, ρ represents density of the tesseroids. The amplitude of the elements in the kernel matrix is given by Grombein et al. (2013): where G is the gravitational constant, Δr, Δφ, and Δλ are mesh intervals.
sin sin cos cos cos P P P . Equation 2 can be evaluated by using the 3-D GLQ method.
Under the above discretization, Zhao et al. (2019) found shifting and swapping equivalences existing in the kernel matrix K, named the kernel matrix equivalence. Here, we explore the kernel matrix equivalence in a matrix form. To be explicit, a simple kernel matrix K in the upper panel of Figure 2a is taken as an example. This kernel matrix K is produced by a single-layer model with 3 × 4 tesseroids in the latitudinal and longitudinal directions (as shown in the lower panel of Figure 2a). The kernel matrix K is a block matrix, and each block in K (e.g., K AX ) is a sub-matrix with 4 × 4 elements, as shown in the upper panel of Figure 2b in this case. It is produced by the tesseroids in the first row evaluating at the observation points of the first row, as shown in the lower panel of Figure 2b.
Since the mass source and observation surface are discretized into regular meshes with equal intervals, the elements in the kernel matrix stemmed from the same sizes of tesseroids and computation distances are equal. For the tesseroids and observations in the first rows (in the lower panel of Figure 2b), the shifting equivalence reveals that the kernel elements along the parallel dashed lines are equal (e.g., A 1 X 1 = A 2 X 2 , A 2 X 1 = A 3 X 2 ). In the matrix form (e.g., a kernel sub-matrix K AX in the upper panel of Figure 2b), the shifting equivalence indicates that the kernel elements on (and parallel to) the main diagonal lines are equivalent, and they are expressed using the same color in both the upper and lower panels of Figure 2b. On the other hand, the swapping equivalence (Zhao et al., 2019) shows that the kernel elements along the crossed dashed lines have equal values represented by the same color, as shown in the lower panel of Figure 2c. In the matrix form, this swapping equivalence indicates that the elements in the kernel sub-matrix K AX are symmetric with respect to the main diagonal line (e.g., A 1 X 2 = A 2 X 1 , A 1 X 3 = A 3 X 1 ). The same relations are found in every block matrix in K.
Combining both the shifting equivalence and the swapping equivalence in the kernel matrix, we obtain a series of symmetric diagonal constant matrices (e.g., K AX ) of K in Equation 1. Such matrices have a specific name as Toeplitz matrix in the forms of where t i = t −i for i = 0, 1, …, n−1 (n = 4 in the case of Figure 2). We mark K AX = toeplitz(t), where t = (t 1−n , …, t −1 , t 0 , t 1 , …, t n−1 ).
As for the Toeplitz matrix (e.g., K AX ), its product with a vector is a 1-D discrete convolution, which can be evaluated efficiently and accurately through using an FFT algorithm (Vogel, 2002;Wu, 2018). For the case in this study, let represent the sub-density vector of tesseroids in row X and the sub-gravity vector in row A produced only by the tesseroids in row X, respectively. Then the element of the product (K AX ·ρ X ) of the kernel sub-matrix (K AX ) and sub-density vector (ρ X ) can be written as: To calculate this 1D discrete convolution using the FFT algorithm, a n × n circulant matrix C = circulant(c) is constructed (Vogel, 2002) as: where c = (c 0 , c 1 , …, c n−1 ). Accordingly, the Toeplitz matrix K AX is extended into a 2n × 2n circulant matrix K AX ext = circulant(t ext ) with t ext = (t 0 , t 1 , …, t n−1 , 0, t 1−n , …, t −1 ). Then the Toeplitz matrix-vector products K AX ·ρ X can be efficiently computed by combining circulant embedding with FFTs as follows: are the extended vectors of g AX and ρ X with a dimension of 2n.

1
 and  indicate 1D inverse and forward discrete Fourier transform operators, respectively. Finally, g AX is obtained by extracting the first n components of g AX ext .
Other Toeplitz matrix-vector products (e.g., K AY ·ρ Y and K AZ ·ρ Z ) can be calculated repeatedly using the above FFT-based method, and the final gravity anomaly g is obtained by summation: where g A , g B , and g C are gravity anomalies of row A, B and C, respectively.
More generally, we discretize a one-layer source region into N φ and N λ segments evenly in the latitudinal and longitudinal directions. The observation points are arranged on a regular grid of a constant height that aligns with the tesseroids meshes, as in Zhao et al. (2019). Then the kernel matrix K can be subdivided into N φ × N φ submatrices: where each submatrix K i, j , for i = 1, 2, …, N φ and j = 1, 2, …, N φ , is a Toeplitz matrix with dimensions of N λ × N λ . The submatrix K i, j presents a collection of kernel elements produced by tesseroids in the j-th row at the observation points in the i-th row. The density vector ρ and the gravity g can be expressed as: where each sub-vector g i and ρ j represents the gravity anomalies in i-th row and the density values in the j-th row with the length of N λ . Then the sub-vector g i is written as a sum of where each K i, j ⋅ρ j is a Toeplitz matrix-vector product that can be efficiently computed using the FFT method in Equation 3-7). Thus, the most time-consuming part of the matrix-vector product in Equation 1 has been degraded into a group of products of block Toeplitz matrices with density sub-vectors by using the FFTs method to evaluate the convolutions.
The proposed forward method is also applicable for computation of the gravity potential, gradient tensors, and magnetic fields. Furthermore, we note that the FFT method has long been used when computing expansions of spherical harmonics on an even grid (Driscoll & Healy, 1994). The similarities between our method and Driscoll and Healy (1994) are: (1) the discretization in latitude and longitude directions should be in even interval, which is the foundation both for our method and for the fast computation of spherical harmonic expansion; and (2) the proposed method and the spherical harmonic expansion are all converted to convolutions of two matrixes which are fast computed by the FFT method. However, we can also mention the distinct differences: (1) the spherical harmonic function is defined on a sphere, so the FFT method employed in computing spherical harmonic expansion is only applicable to global grids, however our forward method is applicable to both global grids and regional application; (2) the grids numbers should be only N × N or N × 2N required by Driscoll and Healy (1994), while our forward method has no limits on the grids.

Inversion Method
We apply the proposed forward method to the 3-D gravity inversion in spherical coordinates. The Tikhonov regularization-based inversion method (Li & Oldenburg, 1998;Liang et al., 2014) is applied in this study. The solution is obtained by minimizing the global objective function, which is composed of the data misfit and the model objective function weighted by the regularization parameter β: where Φ d = ||W d (g obs − K·ρ)|| 2 is the data misfit, g obs is the observed data vector. W d = diag{1/σ 1 , …, 1/σ N } is the diagonal data-weight matrix, N is the number of observations, and σ n is the error standard deviation associated with the n-th datum. The optimal regularization parameter β is determined by the L-curve criterion (e.g., Hansen, 2001). The model objective function Φ m in spherical coordinates is as proposed by Liang et al. (2014): where ρ ref represents the reference density model, dv = r' 2 cosφ'dr'dφ'dλ'; α s is the model spacing weighting function; α r' , α φ' and α λ' are the coefficients that determine the relative impact of different directions in the model objective function, which are defined by the length scales L r , L φ and L λ (Oldenburg & Li, 2005): where L r , L φ and L λ are usually set to be the double length of a tesseroid in each direction.
In Equation 13, the vector w(r') is the depth weighting function employed to attenuate the decay of the kernel function with depth. Because the kernel decays rapidly with depth, the sensitivity weighting (Li & Oldenburg, 2000) is used in the model misfit whose matrix is composed of elements, where K n,m is the element of the kernel matrix K, M is the total number of tesseroids. This sensitivity-based weighting function was also used widely in previous studies (e.g., Chasseriau & Chouteau, 2003;Galley et al., 2020;Zhdanov, 2002).
Minimizing the global objective function of Equation 12, we have the following matrix equation (Liang et al., 2014)  The conjugate gradient method (e.g., Pilkington, 1997) is employed to solve Equation 16 with the optimal regularization parameter β. The high-efficiency gravity forward method in Section 2.1 is employed to calculate the predicted gravity field K·ρ. The transposed matrix K T is also a block Toeplitz matrix. In Equation 16, the products of K T with the diagonal matrix W d T W d K ρ and W d T W d g obs can be calculated efficiently by the FFT-based matrix-vector product method in Section 2.1. We note that it is not necessary to store the huge W ρ and W ρ T matrixes fully. The final matrix-vector products βW ρ T W ρ ρ and βW ρ T W ρ ρ ref are diagonal matrices, so we just need to store all elements on their primary diagonal. Therefore, the computational storage in a fine gravity inversion can be significantly reduced.

Synthetic Forward Model
A sample spherical shell model is designed to test the efficiency and accuracy of the proposed method. The results are compared with the software Tesseroids (Uieda et al., 2016) and the kernel matrix equivalence method (Zhao et al., 2019). A homogenous spherical shell with a constant density of 1,000 kg/m 3 and a thickness of 100 km with the radius ranging from 1,638 to 1,738 km is discretized into small tesseroids by a set of intervals ranging from 0.5° to 10°, and 10 layers in depth. The observation points that align with the tesseroids are on a spherical surface at 10 km above the shell model's top surface. The gravity g along the polar axis of this sphere shell model can be evaluated analytically (Grombein et al., 2013).
To guarantee high accuracy, we adopt the adaptive discretization algorithm (Uieda et al., 2016) to calculate the kernel function. The maximum relative errors for the three methods compared to the analytical solution are shown in Figure 3a, and their relative computation time of g (normalized by the slowest calculation time) varying with different mesh intervals are shown in Figure 3b.
As shown in Figure 3a, the maximum relative errors of the three methods for the spherical shell model increase with increasing of the mesh interval but lower than 0.1%. The application of the FFT algorithm for the kernel-density vector product has almost no influence on the accuracy of the forward calculation. Figure 3b shows that the total computation time of the proposed method is reduced by approximately three orders of magnitude compared with the method of Uieda et al. (2016) and one order of magnitude compared with the method of Zhao et al. (2019) when the mesh interval is less than 6°. Table 1 presents the statistics of the absolute computation time, relative errors, and memory occupation with the discretization size of 360 × 720 × 10. In Table 1, the absolute computation time for calculating the kernel and the product between the kernel and the density vector is demonstrated separately. Comparing the proposed method with the method of Zhao et al. (2019), the time for kernel computing is almost equal with a negligible difference. The computational efficiency in the kernel-vector product is increased by about 20 times in this model compared with Zhao et al. (2019). This high time-consuming part usually repeats hundreds of times in a fine gravity inversion, while the kernel needs only to be computed and stored for one time. Therefore, the improvement in the kernel-vector product calculation is quite promising in large-scale 3D gravity inversions.
The memory occupation of Uieda et al. (2016) method is 537.48 GB in this test, while the proposed method only occupies 0.75 GB (Table 1), reducing computation storage by ∼720 times. As mentioned in Section 2.1, the kernel matrix K is a block Toeplitz matrix. We need only to calculate and store the kernel elements produced by tesseroids in the first column. Although it does not need to store the vast kernel matrix in the forward modeling by the software Tesseroids (Uieda et al., 2016), it is inevitable to compute the kernel separately and store the dense Jacobian matrix in an inversion.

Synthetic Inversion Model
To examine the performance of the inversion method, we design a synthetic global model that has density anomalies located over the entire globe. This model represents a spherical shell with a thickness of 100 km, ranging from the inner radius of 1,638 km to the outer radius of 1,738 km. It is discretized into 180 × 360 × 10 tesseroids with a spacing of 1.0°, 1.0°, and 10 km in the latitudinal, longitudinal, and radial directions, respectively. The model consists of 11 anomalous bodies with different sizes and density contrasts. Each anomalous body represents a vertical cylinder simulating lunar mascons, and six of them have low-density outer-rings surrounding the positive density cores. The density contrasts and locations of the anomalous bodies are shown in Table S1 and Figure 4a.
The gravity anomalies produced by this synthetic model are calculated at a 10 km height above a radius of 1,738 km using the proposed forward method (Section 2.1). The total number of the data points is 64,800 (180 × 360) on a 1° × 1° grid over the global sphere. Gaussian random noise with zero mean and standard deviation of 1% of the maximum amplitude of the synthetic data is added to the gravity anomalies to simulate the observation errors, as shown in Figure 4b. The gravity anomalies clearly reveal the locations of the synthetic bodies and have apparent negative rings surrounding the positive anomalies.
We perform the inversion of the synthetic data ( Figure 4b) using the inversion method described in Section 2.2. In this synthetic inversion test, the parameters α s = 1.0 × 10 −9 , α θ' = α λ' = 3.47 × 10 −6 and α r' = 4.0 × 10 −7 , which are determined by Equation 14 and ρ ref = 0. When these weighting parameters have been determined, the inverted results are controlled by the regularization parameter β. A series of regularization parameters are tested to obtain the L-curve, as shown in Figure 5. The optimal density model (Figures 4c and 6) is obtained by using the value β = 4.0 × 10 −5 , which lies in the inflection point of the L-curve. The misfit between the observed and predicted gravity anomalies is shown in Figure 4d.
As shown in Figure 4c, the inverted density model recovers the density anomalies well, with the maximum misfit of the observed and predicted gravity anomalies less than 10 mGal (Figure 4d). Notably, both the low-density rings and the high-density bodies are clearly resolved, as shown in the depth profiles ( Figure 6).
ZHAO ET AL.
10.1029/2021JE006841 9 of 22 However, due to the damping, the major anomalies (>300 kg/m 3 ) of the inverted model extend 5-10 km deeper compared with the initial model. The inverted density anomalies are also underestimated by about 30%-50%, which should be considered in practical applications. In addition, the inverted density anomalies are sagged in the center of the anomalous body. We note this feature might result from the smoothness of the L-2 norm inversion in Equation 13, and nonuniqueness of the gravity inversion. Specifically, in this test, the negative outer ring may attract and neutralize the center positive density, which likely causes the divided two density concentrations. Theoretically, there could be nonuniqueness of the gravity inversion due to the existence of null space of the inversion operator, including the finiteness of the observational data, the equivalence of source region, and the presence of observation errors. Therefore, this feature should not be further interpreted in the inversion of the actual data.
On the other hand, the choice of some parameters (e.g., the regularization parameter β, depth extent, and depth resolution) may also affect the inversion result. The depth extend refers to the maximum depth of the model covering all density anomalies that can produce the observed gravity signal. The depth resolution refers to the smallest step, which is equal to the mesh interval in the depth direction. These two parameters have to be determined before the inversion. Three extra tests are performed to examine the potential effects of the regularization parameter, depth extent, and depth resolution in the supporting information. In the first test, we perform the inversion using β = 1.0 × 10 −8 and 1.0 × 10 −2 but keeping other parameters equal to those ones in the inversion in this section. The corresponding depth profiles are shown in Figures S1 and S2.
In the second test, we perform the inversion with the depth extents 80 and 120 km, respectively, but using the same regularization parameter as in this section. The inverted depth profiles are shown in Figures S3  and S4. In the third test, the inversion is performed with higher depth resolution (5 km, discretizing into 20 layers) using the same β as in this section, as in the inverted depth profiles shown in Figure S5.
The first test shows that the increased β leads to a deeper density distribution and vice versa, so it is necessary to find an optimal regularization parameter before getting the final results. The test on the depth extent indicates that different values have little effect on the inverted results. In the third test, the recovered density anomalies are much shallower than in the initial model due to changing the optimal regularization parameter after changing the depth resolution. Therefore, it is necessary to find a new optimal regularization parameter when changing the depth resolution in the inversion.
ZHAO ET AL.

Geological Background
The geological evolution of the Moon is closely connected with cosmic impacts, which control the lunar surface morphology and formation of lunar mascon basins (Figure 7a). In the nearside, the lunar topography is relatively low and flat, dominated by volcanic maria, including the Procellarum and a large number of impact basins. The Procellarum KREEP Terrane (PKT) is a broad area on the nearside of the Moon, which is characterized by low elevation and high concentration of the heat-producing elements (Lawrence et al., 1998). It has been considered as an ancient impact basin ∼3,200 km wide (e.g., Whitaker, 1981), in which several large mascon basins, including the Imbrium, Serenitatis, Humorum, and Necteris are also located, while the recent GRAIL result revealed a quasi-rectangular pattern of narrow linear anomalies bordering Procellarum ( impact basins. Besides, there exist mascon basins outside the Procellarum region, e.g., the Crisium, Smythii, and the Humboldtianum. In contrast, the farside of the Moon is mountainous and deeply cratered (Jutzi & Asphaug, 2011), mainly represented by the Feldspathic Highlands Terrane (FHT) and South Pole-Aitken (SPA) basin (Neumann et al., 2015). The FHT characterized by high elevation is located in the northern hemisphere. Several small mascon basins, e.g., the Orientale, Freundlich-Sharonov, and Moscoviense, are located in the northeastern part of the FHT. In the southern region, the SPA basin is regarded as the largest and oldest impact basin on the Moon with low elevation.

Data and Inversion
To reveal the internal structure beneath the lunar surface and construct a 3D density model up to a depth of 100 km, we carry out the gravity inversion of the latest GRAIL data. The topography (Figure 7a) of the Moon is derived from the model LRO_LTM05_2050 . The free-air gravity disturbances (Figure 7b) are calculated using the latest lunar gravity field model GL1500E provided by the GRAIL mission (Konopliv et al., 2014) at an elevation of 10 km above the reference radius of 1,738 km. The topography correction (Figure 7c) is also calculated at 10 km height with the reference density of 2,560 kg/m 3 (Zuber et al., 2013) using Wieczorek and Phillips (1998)' method. The Bouguer gravity disturbances (Figure 7d) are obtained by removing the effect of topography (Figure 7c) from the free-air gravity disturbances (Figure 7b). All calculations are truncated to the degrees 6-450 to remove the effect of the hemispheric asymmetry and the South Pole-Aitken impact (Neumann et al., 2015) and represented on the grid with a spatial resolution of 1.0° × 1.0°.
As shown in Figures 7b and 7d, the free-air and Bouguer gravity disturbances range from about −430 to 530 mGal and −240 to 670 mGal, respectively. The most striking features are the high positive gravity anomalies associated with the lunar impact basins, implying significant mass concentrations. It's worth noting that the high-density mare basalts are widely distributed, especially on the nearside of the Moon, which influence the gravity signal. In this study, we calculate the maximal gravity effect of mare basalts using the upper limit ZHAO ET AL. of basalt thickness provided by Taguchi et al. (2017), as shown in the supporting information. The gravity effect of the mare basalts ranges from 2 to 80 mGal, so much less than the Bouguer anomalies. The results for the removed basalt effect can be found in the supporting information.
The recent study of the lunar crustal thickness (Wieczorek et al., 2013) revealed that the maximum lunar Moho depth is less than 80 km. Therefore, the model setup is designed to cover the spherical shell from 0 to the depth 100 km. This shell is divided into 180 × 360 × 10 tesseroids in latitudinal, longitudinal, and radial directions, coinciding with the resolution of the input data. In this inversion, α s = 1.0 × 10 −9 , α θ' = α λ' = 3.47 × 10 −6 , and α r' = 4.0 × 10 −7 , which are determined by Equation 14 and no reference model is used as a constraint (i.e., ρ ref = 0). The forward and inversion methods described in Section 2 are applied to carry out the 3D inversion of the observed Bouguer anomalies (Figure 7d). The optimal regularization parameter β = 2.0 × 10 −6 is determined by the L-curve criterion as shown in Figure 8. The L-curve in this gravity inversion is different from that one in the synthetic example, although with the same model smoothness and same input parameters (e.g., α s , α θ' , α λ' , and α r' ). The difference mainly results from different input signals that produce different data misfit and model misfit.

Results
The inverted 3D density structure of the lunar crust and upper mantle at different depths is shown in Figure 9. The most striking features are the circular high-density anomalies observed in almost all the mascon basins, and often surrounded by negative-density outer rings. The positive high-density anomalies vary from 250 kg/m 3 beneath the Imbrium to more than 400 kg/m 3 beneath the Orientale and correspond well to the Bouguer anomalies. The low-density outer rings are generally discontinuous. Other regions far away from mascon basins are relatively homogeneous with density anomalies less than 100 kg/m 3 . Table 2 represents the depth of the high-density anomalies of 13 large mascons obtained in this study compared with previous research by Liang et al. (2014). The density features of our inverted results are quite different from Liang et al. (2014). On the one hand, significant negative density anomalies are widely distributed on the Highland in Liang et al. (2014), while our results show there is no significant density anomaly. Additionally, the broadly distributed positive high-density anomalies in the SPA basin according to Liang et al. (2014) are not supported by our results. These differences may be due to removing the spherical harmonics of the degrees 2-5 in our data, which contain the hemispheric asymmetry and the SPA impact ZHAO ET AL.
10.1029/2021JE006841 13 of 22 Figure 8. The L-curve of the lunar gravity inversion. The optimal regularization parameter is β = 2.0 × 10 −6 , which is the inflection point of the curve pointed by a red dot. (Neumann et al., 2015). On the other hand, Liang et al. (2014) found nearly no negative density anomalies surrounding the high-density mascon basins while consistently detected by our result. A possible reason could be that the Bouguer anomaly was truncated to degrees of 60 in Liang et al. (2014), corresponding to a spatial resolution of 90 km, which is insufficient to detect small-scale structures. We list these statistics here just as a reference, which could be not consistent with other studies, e.g., 670 kg/m 3 for the Orientale by Zuber et al. (2016). On the other hand, it is expected that the density varies to some extent with depth, but this feature is not resolved in our inverted result. It may be obtained by applying some additional constraints to the inversion in the future.
As demonstrated by the synthetic example in Section 3.2, the inverted density values might be underestimated due to the smearing by about 30%-50%, and the lower boundaries of the high-density anomalies might be deepened by several tens of kilometers. As demonstrated below, the results also show positive density contrasts that are by 150-200 kg/m 3 lower in their center than in the surrounding areas, such as for ZHAO ET AL.

10.1029/2021JE006841
14 of 22 the Humorum, Nectaris, Crisium, and Smythii basins, i.e., bowl-shaped or ring-shaped with still a positive density contrast at the center, which is not as high as the density contrast away from the center. This feature is also clearly seen in the synthetic inversion but disagrees with the initial synthetic model; therefore, this decrease is likely an artifact of the inversion.

Large Impact Basins in the Nearside
In the PKT on the nearside, the nearly round high-density structures are the most notable features. These round mass concentrations (mascons) correspond to the impact basins, e.g., the Imbrium, Serenitatis, Humorum, and Nectaris. The depth profiles of the density structure across these four mascon basins are shown in Figure 10.
The Imbrium, as the largest impact basin in the PKT region, has relatively low positive density anomalies of 100-150 kg/m 3 evenly smeared to the mantle, as shown in Figures 10b and 10c. The negative ring anomaly is unevenly distributed around the basin. Similarly, the density anomalies beneath the Serenitatis basin (Figures 10d-10f) are also insignificant, but about 100 kg/m 3 larger than that of the Imbrium basin. The negative density anomalies around the Serenitatis basin are more significant compared to the Imbrium basin, reaching −250-−350 kg/m 3 .
As for another two impact basins in the PKT region, the Humorum (Figures 10g-10i) and Nectaris (Figure 10j-10l) show more striking high-density anomalies of 250-300 kg/m 3 and 300-350 kg/m 3 concentrated at depths of 20-50 km and 15-65 km, respectively. The negative density ring surrounding the Humorum basin is evident and continuous with the density anomalies reaching −150-−250 kg/m 3 , while the Nectaris basin has sporadic negative density anomalies only on the eastern and northern parts. A semicircular high-density anomaly is located in the northwest corner of the Nectaris basin, representing the mass concentration associated with the older obscured Asperitatis basin (Neumann et al., 2015). In Neumann et al. (2015), the new basin Asperitatis was regarded as 730 km of the main diameter and 345 km of the inner diameter, which are all ∼100 km smaller than that of the adjacent Nectaris basin. From Figure 10j, we can see the density structure of the Asperitatis is incomplete, with an estimated inner diameter of about 330 km (11° length). Moreover, the density structure in Figure 10j shows a more separate structure than visible from the Bouguer map in Neumann et al. (2015). We infer that the impact created the Nectaris basin may occur later than the formation of the Asperitatis basin, leading to the incomplete semicircular ZHAO ET AL. *Note. These estimated density contrasts may be underestimated by about 30%-50% by the gravity inversion, as indicated in the synthetic example in Section 3.2.
The locations and diameters are derived from Neumann et al. (2015). structure of the last one. Other remarkable features are the zones with reduced density beneath the centers of the Humorum and Nectaris basins, which fit well to the concave Moho interface estimated by Wieczorek et al. (2013). This fit might reflect the same artifact of the Bouguer anomaly inversion as demonstrated by the synthetic test (Section 3.2) with similar effects for the Moho and the density inversions, both from the Bouguer anomalies. Therefore, we do not analyze these features because they are likely not realistic.
Besides the above large mascons, the density model resolves several smaller mascons surrounding the PKT in the nearside. Three of them (the Crisium, Smythii, and Humboldtianum) are shown in Figure 11. The high-density anomalies beneath these three basins are nearly circular (Figures 11a, 11d, and 11g), with a diameter of ∼500, ∼400, and ∼300 km, respectively. All of them are evidently surrounded by low-dense rings with a density contrast of −150-−250 kg/m 3 .
ZHAO ET AL.

10.1029/2021JE006841
16 of 22 As shown in the cross-sections, the curved upper boundaries of the high-density bodies are found beneath the Crisium (Figures 11b and 11c) and Smythii (Figures 11e and 11f). However, in the Smythii, the upper boundary of the high-density anomaly is not symmetric as in the Crisum, Humorum, and Nectaris. The mascon beneath the southeastern part of the Smythii has shallower upper and deeper lower boundaries, contrary to the northwestern region. The asymmetric feature may result from an impact formation process from the southeast obliquely cutting to the northwest, which is also supported by the topographic study (Strain & El-Baz, 1979). As for the Humboldtianum (Figures 11g-11i), it is not obviously shown that the mascon is asymmetric, and the Bouguer gravity and the Moho interface are also not tilted distinctly. Therefore, it would be arbitrarily to draw a conclusion that the Humboldtianum is formed by an oblique impact.
Although the high-density structure beneath the above seven large impact basins is somewhat different, the upper boundaries of the high-density anomalies well correspond to the Moho interface estimated in a previous study (Wieczorek et al., 2013). These prominent high-density structures primarily represent significant ZHAO ET AL. mantle uplift and thinned crust beneath the impact basins. At the same time, the low-density rings surrounding most of the mascons extend from the near-surface to the Moho depth, indicating a thick, low-density crust. These results support the mascon formation process proposed by recent numerical studies (e.g., Freed et al., 2014;Melosh et al., 2013;Miljković et al., 2015). The numerical simulations predicted that the impact basins could be formed via the growth of a bowl-shaped transient cavity after a meteorite impact, followed by overturns of the excavated crust at the edge of basins as thick ejecta deposits. The collapse of the transient crater with thinned crust and uplifting of the underlying mantle material formed mascons. During the impact, approximately one-third of the excavated materials were deposited as ejecta in an annulus contributing to the increase of the crustal thickness (Zuber et al., 2016). The fractured and brecciated circle causes the negative density anomalies, likely resulted from substantial porosity extending to a depth of tens of kilometers as predicted by Wieczorek et al. (2013).

Farside Large Lunar Basins
The Moon's nearside and farside are remarkably different in surface topography, crustal structure, and mare volcanic activity. For example, only a few basins on the farside have experienced the subsequent volcanism and mare infilling, which are typical on the nearside. The 3D density structure of four farside basins (Orientale, Mendel-Rydberg, Freundlich-Sharonov, and Moscoviense) is shown in Figure 12.
The high-density anomalies beneath the farside basins ( Figure 12) are smaller but more concentrated than the high-density structures on the nearside (Figures 10 and 11). They usually don't exceed 300 km in diameter with more homogeneous density anomalies up to 350-400 kg/m 3 . In addition, it is also obvious that the farside density anomalies are shallower, fitting closer to the Moho than those of the nearside mascons. The negative density rings are also observed around these four impact basins. However, the amplitudes of these rings surrounding the Orientale, Mendel-Rydberg, and the Freundlich-Sharonov basins are slightly lower compared with the nearside mascon basins. In addition, the negative density anomalies of these three farside mascons are continuous, forming complete outer rings. It implies that the Orientale, Mendel-Rydberg, and the Freundlich-Sharonov basins are likely formed by the typical vertical impact. One possible reason for the less obvious negative density rings on the farside basins (except for the Moscoviense) may be the relatively smaller impact size that excavates less crustal material depositing around the edge of the basin.
As for the Moscoviense basin, the density structure at a depth of 35 km (Figure 12j) shows the irregular pattern that does not follow the typical concentric ring of other mascon basins. A prominent low-density ring is observed with the density contrast varying from −200 to −250 kg/m 3 in the southwest to −100-−150 kg/m 3 in the northeast, discontinued along the NW-SE direction. The most inner part is a nearly round high-density anomaly, which is elongated to the southwest, with a higher density contrast of 350-400 kg/m 3 from the center to the southwestern part, but with a lower density contrast of 200-250 kg/m 3 in the northeastern part of the most inner ring. As seen from the depth slices in Figures 12k and 12l, the Moscoviense basin has striking high-density anomalies, forming a mantle plume. In particular, the upper boundary of this mascon is oblique rather than horizontal. It varies from 10 to 22 km and deepens from the southwest to the northeast, which is slightly deeper than the crust-mantle boundary estimated by Wieczorek et al. (2013). The negative high-density structure is also asymmetric in depth. The southwestern part is more striking and with a higher density contrast than the northeastern region, consistent with the density feature in the horizontal profile.
Two possible formation mechanisms have been proposed to interpret the unusual density structure and geomorphology. One is the oblique impact supported mainly by the unique geomorphology, e.g., the discontinuous ejecta, the partial peak rings, and the elongated nature of the basin floor (Schultz, 1992). However, the oblique impact scenario does not satisfactorily explain the thin crust, high Bouguer anomalies, and mare basalt fill (Thaisen et al., 2011). Another possibility to explain these unusual features of the Moscoviense is the multiple impacts suggested by Wilhelms et al. (1987). This hypothesis supports the geomorphology and explains the thin crust and associated gravity anomalies. Recently, an alternative hypothesis called the double impact scenario was proposed (Ishihara et al., 2011;Thaisen et al., 2011). In this model, the second impact occurred off-center on the floor of the pre-Moscovinse basin by the first impact. Furthermore, Thaisen et al. (2011) suggested that the two impacts may occur nearly simultaneously, which would excavate a very deep transient cavity without significantly increasing the crater's diameter at the surface.
These hypotheses have been proposed mainly based on a geomorphic analysis. However, the density results only are not enough to support one mechanism over another. The formation mechanism of the Moscoviense basin is still an open issue that could be solved by providing further impact study. Our results only give some intuitive support for this hypothesis.

Conclusions
A high-efficiency forward gravity method in spherical coordinates is proposed based on the 3D GLQ. In this method, the gravity anomalies are calculated in two steps: first the kernel matrix is calculated using the equivalent storage strategy, and then the product of the kernel matrix with the density vector is computed using the FFT-based algorithm. The computational efficiency is greatly improved by about three orders of magnitude compared with the traditional 3D GLQ method. The proposed method is also flexible to combine ZHAO ET AL.

10.1029/2021JE006841
19 of 22 with the adaptive discretization strategy to ensure high accuracy. This forward method is also applicable for computations of the gravity potential, gradient tensor, and magnetic field in spherical coordinates. Moreover, the improvement in computational efficiency will be more significant when applying it to each iteration in an inversion process.
We employed the proposed forward method in a 3D gravity inversion of the Bouguer gravity anomalies derived from the latest lunar gravity model GL1500E in the spherical harmonic range 6-450. The inverted results show that the high-density anomalies beneath mascon basins (250-400 kg/m 3 ) are mainly concentrated at depths 20-60 km often surrounded by circle negative density anomalies. The top boundary of the high-density anomalies of most mascon basins corresponds to the Moho interface estimated in previous studies. These prominent high-density structures primarily represent significant mantle uplift and thinned crust beneath the impact basins. The low-density rings surrounding most of the mascons extend from the near-surface to the Moho depth, indicating a thick, low-density crust.
The density distribution is remarkably different in the nearside and farside of the Moon. On the nearside, the high-density anomalies beneath most mascon basins are inhomogeneous and don't exceed 350 kg/m 3 . Another striking feature is the significant but discontinuous negative density anomalies surrounding the mascons. On the farside, the high-density anomalies are smaller in size but more concentrated than that of the nearside mascon basins with more homogeneous density distributions up to 350-400 kg/m 3 . In contrast, the amplitude of low-density rings (except for the Moscoviense basin) is lower compared with the nearside mascon basins. The difference in the density structure of the nearside and farside mascon basins may correspond to a different crustal structure. As for the Moscoviense basin, it is clearly seen the asymmetric negative outer ring, the elongated high-density anomalies, together with the huge mantle plume. However, with the only density model we still cannot favor one mechanism over another without actual impact studies.
The high-density mare basalts may produce a noticeable signal in the gravity field, especially on the nearside, and this effect has been evaluated in the inversion. Our tests indicate that the mare basalts have little impact on the inverted density structure, and lead to a decrease of density anomaly for not more than 50 kg/ m 3 for the basalt-filled mascon basins (e.g., Imbrium) where the basalt thickness is more than 5 km. Further works on determining more realistic mare basalt thickness would improve the density structure of the nearside mascon basins. Besides, the results could be improved in the future by applying a joint inversion constrained by other geology and geophysical data, which would decrease the nonuniqueness of the gravity inversion and reduce the vertical smearing.

Data Availability Statement
The lunar gravity field model GL1500E and the topography model LRO_LTM05_2050 used in this study are available at PDS Geosciences Node by NASA (https://pds-geosciences.wustl.edu/). The forward and inversion codes used in this study are freely available online (Zhao et al., 2021) under the CC BY 4.0 open-source license. The relevant data for this study can be found in Figshare repository .