Ownership of battery electric vehicles is uneven in Norwegian households

Research design: specifications of the research problem

The research questions of this study have several characteristics. First, individual and household heterogeneity needs to be controlled for over time to identify the variables that contribute to changes in private vehicle ownership while avoiding biased results. Second, answering such questions requires more informative data with greater variability (variation in the data that can reveal otherwise unnoticed relationships); one consequence is more reliable parameter estimates. Third, these questions also involve investigating whether households are in a transient or steady state with respect to socioeconomic variables such as income, education, and the presence and age of children. These characteristics make panel data analysis a suitable method for answering the main research question of this study.

We benefit from panel data, also known as longitudinal data47,48,49. This data type allows observing the same entity, such as households, over several periods and following their developments.

The statistical unit of analysis for this research is the household. The main reason is that decisions about vehicle choice are often shaped by the needs of the household and by how the household makes decisions. Put simply, the vehicle user is the whole household, although the vehicle owners are the individuals who pay for the vehicles.

Different definitions and compositions of households affect the inferred socioeconomic status and income of households50. We use the formal dwelling (residential) definition of households according to Statistics Norway (SSB)--see Supplementary Table 1.

Any household registered as resident at any time from 2005 to 2022 is included in the analysis of the gray and green adopter populations if it meets the criteria mentioned. Households are dropped only in the years in which they were not registered as residents. By Norwegian households, we mean households within the borders of Norway, regardless of the nationality or migration status of the individuals residing in them.

The term therefore encompasses households that include Norwegian citizens, migrants, or individuals without Norwegian citizenship.

Data and data treatment

The population of this research is all households that resided in Norway and were registered as residents as of January 1st in any year from 2005 to 2022, with at least one privately owned passenger vehicle in any year (not necessarily in all years). To find these households in the database, we start by omitting all households without any record of vehicle ownership in the mentioned period, where ownership is reported as of December 31 each year. We are then left with a population that comprises every household in Norway that has owned a passenger vehicle at least once during this period, conditional on the report date.

That accounts for around 2,400,600 unique households over 18 years with the above criterion. Note that there might be years when these households have owned no vehicle. By including households that have owned vehicles at different points within the selected timeframe, we can compare and analyze the factors influencing vehicle ownership over the years, allowing us to investigate the transition from emitting vehicles to greener alternatives.

Among these households, any household without a record of BEV ownership, i.e., one that owned only emitting vehicles, is called a gray adopter in this study. Those with at least one record of BEV ownership in this period are called green adopters, even if they have also owned gray vehicles. These categories align with previous studies, which found that gray vehicle owners are more likely to keep their emitting vehicle even after buying BEVs10,20. The two groups are mutually exclusive (2,048,239 gray adopter households + 352,467 green adopter households over 18 years). Of course, a household is a dynamic entity: households are formed and dissolved, and the population increases.

Given the criteria mentioned, that accounts for 1,421,753 adopter households in 2005 and 1,834,869 adopter households in 2022 (see Supplementary Note 3; a detailed description of the household profiles used in this study can be found in Supplementary Table 1). Our data form an unbalanced panel, which occurs for various reasons: temporary unit non-response, where some participants do not engage at all time points (e.g., when people are out of the country in some years); panel attrition, when participants drop out at specific points (e.g., the contact person passes away, or the household emigrates); and late entry, when new participants join the panel at later periods (e.g., survivors form a new household, people leave their current household and form a new one, or immigrants arrive)51. Despite its apparent flaws, this data type is more representative of the ongoing reality in society.

It avoids a myopic focus on a set of respondents with an uninterrupted record of data, which may lead to selection bias as a cause of endogeneity (endogeneity is further discussed in the following sections). The data source for this research is Statistics Norway (SSB), accessible through the microdata.no platform. The analysis is also conducted on this platform, and all charts (except for the Sankey diagram) are our own creation in Microsoft Excel, using data manually collected from the analysis results on the platform.

The Sankey diagram is made on microdata.no and exported to the Vega graph editor (https://vega.github.io/editor/). Data is treated following the platform confidentiality obligations and restrictions52.

Data procedure: measures derived from data

Variables for this study are initially shortlisted from the 474 variables available on microdata.no, based on their relevance to the socioeconomic status of individual persons and to privately owned vehicles, and on their longitudinal availability.

The shortlisted variables for the panel data regression analysis are checked for correlation and multicollinearity. Regarding multicollinearity, we analyze the variance inflation factor (VIF) and tolerance (1/VIF) values (see Supplementary Table 2). The most common recommendation is that VIF values higher than 10 (refs. 53,54) and a tolerance of less than 0.20 (ref. 55) are alarming.
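As a rough illustration of the two diagnostics, the sketch below computes the VIF and tolerance for the special case of a model with exactly two predictors, where the VIF reduces to 1/(1 - r^2) with r the Pearson correlation between them. The function names and the sample values are hypothetical and are not taken from the study; with more than two predictors, each VIF would instead come from regressing one predictor on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

def tolerance(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif

def is_alarming(vif, vif_limit=10.0, tolerance_limit=0.20):
    """Flag a predictor using the two thresholds quoted in the text."""
    return vif > vif_limit or tolerance(vif) < tolerance_limit

# Hypothetical values for two predictors (illustration only).
predictor_a = [1, 2, 3, 4, 5]
predictor_b = [2, 1, 4, 3, 5]
example_vif = vif_two_predictors(predictor_a, predictor_b)
```

Note that the two thresholds are not equivalent: a tolerance of 0.20 corresponds to a VIF of 5, so the tolerance rule is the stricter of the two.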

The variables used for this study are measured and operationalized as follows.

Household size

This variable indicates the number of persons registered under the same unique household identification number. Household size is calculated by counting the people with the same household identification number in our analysis.

Household background

Immigration background is an aggregated dummy variable. In this study, all people other than those born in Norway with two Norwegian-born parents, including those foreign-born with two Norwegian-born parents, are marked as having immigrant backgrounds.

Any household that comprises at least one person with an immigrant background is given a dummy value of 1 (see notes in Supplementary Table 1 for more details). This variable is used as an instrument in this study.

Household income, wealth, debt

Individual persons' income after tax, wealth, and debt are aggregated to the household level by the unique household identification number. This number is the identification number of the contact person in the household and identifies the persons who live in the same household.

In the case of zero or negative household income after tax, often related to family businesses that suffered losses within a specific year, such values are set to 1 in our regression analysis. This avoids null values in the next transformation step, which would cause the microdata.no platform to drop those households from the analysis. The aggregated values are then transformed into natural logarithms.
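The aggregation and transformation steps described above can be sketched as follows. This is a minimal illustration, assuming person-level records arrive as (household ID, income after tax) pairs; the function name and the sample figures are hypothetical, and the actual processing was done on the microdata.no platform.

```python
import math
from collections import defaultdict

def household_log_income(person_records):
    """Sum income after tax per household, then take the natural log.

    Zero or negative household totals are replaced by 1 first, so the
    log is defined (ln 1 = 0) and the household stays in the sample.
    """
    totals = defaultdict(float)
    for household_id, income_after_tax in person_records:
        totals[household_id] += income_after_tax
    return {
        household_id: math.log(total if total > 0 else 1.0)
        for household_id, total in totals.items()
    }

# Hypothetical records: household "B" has a net loss for the year.
records = [("A", 300000), ("A", 250000), ("B", -40000), ("B", 10000)]
log_income = household_log_income(records)
```

Household "B" ends up with a log income of 0 rather than an undefined value, which is what keeps it in the regression sample.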

Education (the highest in the household)

The highest education level that each person has achieved or holds within a year is retrieved from the database (see the structure of the Norwegian education system56).

Then, the highest education in the household is found by the unique household identification number.

Household type by children

To examine the influence of children on vehicle adoption, we aggregate the 24 categories of households defined by SSB. For the descriptive analysis, we present an overview of households without children, households with small children (youngest child 0-5 years), with older children (youngest child 6-17 years), and with adult children (youngest child 18 years and over). These four categories are mutually exclusive. In the panel regression analysis, we introduce a dummy for households with children, which are assigned a value of 1.

Not having children is the reference. See notes in Supplementary Table 1 for more details.
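The four descriptive categories and the regression dummy can be sketched as below. This is only an illustration: it takes the age of the youngest child as a hypothetical input, whereas the study derives the categories from the SSB household types, and it assumes that all three child categories, including adult children, count as "with children" for the dummy (Supplementary Table 1 is the authoritative definition).

```python
def child_category(youngest_child_age):
    """Map the youngest child's age (None if no children) to a category."""
    if youngest_child_age is None:
        return "no children"
    if youngest_child_age <= 5:
        return "small children"
    if youngest_child_age <= 17:
        return "older children"
    return "adult children"

def has_children_dummy(youngest_child_age):
    """1 for households with children, 0 for the reference group.

    Assumption: adult children also count as children here.
    """
    return 0 if youngest_child_age is None else 1
```

Because each household has exactly one youngest-child age (or none), the four categories cannot overlap.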

Urban vs. non-urban settlement

We account for urban vs. non-urban settlement of the households. Consistent with the practice of connecting persons using a household ID, we make a simplistic assumption that all household members live at the same address as the contact person of the household, i.e., the person whose identification number is used to identify the household members.

Cross municipal residence and workplaces

To find those households with at least one person working outside the residence municipality--indicating a need for vehicle ownership--we retrieve every registered person's residence and work municipalities.

Workplace information includes the primary employment of employed residents aged 15-74 in November. Residence information consists of every resident in January. We make two simplified assumptions here.

First, the residence and workplace recorded on those dates apply to the whole year. Second, persons with missing data on either residence or work, including those under 15 and over 74 years old, are assumed to reside and work in the same municipality. Any household with at least one person working outside the residence municipality receives a dummy value of 1.
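The construction of this dummy, including the second simplifying assumption, can be sketched as follows. The function names and the municipality codes are hypothetical and used only for illustration; a missing value is represented by None.

```python
def works_outside_home_municipality(residence, workplace):
    """True if a person works in a different municipality than they live in.

    Missing residence or workplace is treated as "same municipality",
    mirroring the second simplifying assumption in the text.
    """
    if residence is None or workplace is None:
        return False
    return residence != workplace

def household_commute_dummy(members):
    """1 if at least one member works outside the residence municipality.

    members: iterable of (residence_municipality, work_municipality) pairs.
    """
    return int(any(works_outside_home_municipality(r, w) for r, w in members))

# Hypothetical households (codes are illustrative only).
commuting_household = [("0301", "0301"), ("0301", "3024")]
household_with_missing_work = [("0301", None), ("0301", "0301")]
```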

Vehicle classifications and ownership

Hybrid, electric, hydrogen, and biofuel-powered vehicles seem promising options for the transition toward a low-carbon, sustainable private transport fleet57,58. Each engine type has its advantages and disadvantages.

Except for BEVs, which are solely powered by electricity, other types emit carbon dioxide and other pollutants to different degrees59,60,61,62,63.

Various powertrains will likely coexist in the future, while BEVs will most likely lead the way64. Some references and studies we cite use the term electric vehicles (EVs) collectively for battery electric vehicles (BEVs), plug-in hybrid electric vehicles (PHEVs) or plug-in electric vehicles (PEVs), and hybrid electric vehicles (HEVs).

We use the following classification in our study to assign any registered vehicle in the country to the green or gray category:

Gray: except for battery electric vehicles, privately owned passenger vehicles with fuel types such as gasoline (petrol), diesel, paraffin (kerosene), gas, hybrid gasoline, hybrid diesel, biodiesel, bio-gasoline, LPG-gas, CNG-gas, methanol, ethanol, and other fuel.
Green: privately owned battery electric vehicles, solely powered by electricity. Any hybrid vehicle running on petrol or diesel is considered gray. Note that hydrogen-powered vehicles are green but omitted from the analysis because of negligible household ownership.

The number of privately owned hydrogen cars in Norway was zero in 2005 and 212 in 2022 (see Fig. 2).

The ownership of all registered passenger vehicles by year is identified with a unique vehicle-person ID and linked to individual persons, indicating ownership as of December 31 each year (see Supplementary Note 4). Finally, gray and green vehicles are linked to households by the unique household number. Any household that has only owned gray vehicles, i.e., never owned any green vehicle, is assigned to the gray adopter population group.

The other group, the green adopter population, consists of households with at least one green vehicle in any year, not necessarily all years (note that green adopters could also own gray vehicles). Furthermore, note that there might be years in this period when adopter households have not owned any vehicle. These two groups are mutually exclusive.
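The classification rules above can be sketched in a few lines. This is an illustration under stated assumptions: the fuel-type labels are hypothetical strings rather than the codes used in the register, and hydrogen vehicles are skipped to mirror their omission from the analysis.

```python
def vehicle_category(fuel_type):
    """Classify one vehicle as "green", "gray", or None (omitted)."""
    if fuel_type == "electric":
        return "green"   # battery electric, powered solely by electricity
    if fuel_type == "hydrogen":
        return None      # green, but omitted from the analysis
    return "gray"        # every other fuel type, including hybrids

def adopter_group(fuel_types_ever_owned):
    """Assign a household to one group from all vehicles owned in the period."""
    categories = {vehicle_category(f) for f in fuel_types_ever_owned}
    if "green" in categories:
        return "green adopter"   # at least one BEV in any year
    if "gray" in categories:
        return "gray adopter"    # only emitting vehicles, never a BEV
    return None                  # no vehicle counted in the analysis
```

Checking for a green vehicle first is what makes the two groups mutually exclusive: a household that has owned both types is always a green adopter.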

Potential endogeneity

Among the causes of endogeneity in the literature, omitted variables and simultaneity are more relevant to studies such as this research49,65,66. Endogeneity may be rooted in omitted variables when important variables correlated with the independent and dependent variables are left out of the model. This problem typically arises from three different categories.

First, the variable may exist and be measurable but be overlooked and left out of the model66. This issue should not be a concern here because of the comprehensive selection of independent variables. We have accounted for several socioeconomic factors, covering various variables that potentially influence the number of gray (emitting) and green (BEV) vehicles in households.

Furthermore, in a sufficiently large sample, such as that of the current study, the omitted variable can be assumed to be evenly distributed across all households (and thus the predictor will not vary systematically with the residual). Second, unobservable individual-specific variables such as environmental awareness, which are correlated with educational attainment, might affect the choice of engine type. Fixed effects models in panel data analysis handle unobserved heterogeneity to various degrees by accounting for such individual-specific or time-invariant effects47,66. Third, exogenous variables such as taxes and subsidies (which may have a suppressing effect on the results), specific regional policies, or other unmeasured aspects (like social networks) could influence the ownership of both BEVs and emitting vehicles, thereby possibly leading to omitted variable bias. Incorporating influencing factors such as urban residence and residence-workplace proximity captures the essential elements needed to provide the big picture in this study.
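To illustrate how a fixed effects model removes time-invariant, household-specific effects, the sketch below implements the within (demeaning) estimator for a single regressor. It is a simplified illustration with hypothetical data, not the specification estimated in this study; subtracting each household's own mean cancels any constant household effect, and the approach works on unbalanced panels.

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed effects slope for one regressor via within-unit demeaning.

    rows: iterable of (unit_id, x, y); units may have different numbers
    of observations (unbalanced panel).
    """
    by_unit = defaultdict(list)
    for unit_id, x, y in rows:
        by_unit[unit_id].append((x, y))
    sxy = sxx = 0.0
    for observations in by_unit.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

# Hypothetical panel: y = household effect + 2 * x, with effects 10 and -5.
panel = [("A", 1, 12), ("A", 2, 14), ("A", 3, 16), ("B", 4, 3), ("B", 5, 5)]
fe_slope = within_estimator(panel)
```

In this constructed example the within estimator recovers the slope of 2 even though the two households have very different unobserved levels.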

A more pressing issue within our study could be the simultaneity problem. Simultaneity occurs when the independent and dependent variables affect each other at the same time49,65,66. For example, it is plausible that a household's decision to purchase an emitting vehicle or a BEV influences their economic situation (through various factors like expenses and savings on fuel or taxes). In turn, their economic situation might influence their decision on vehicle purchase.

This feedback loop between the choice of vehicles and socioeconomic status could lead to biased estimates in the regression model. While we have accounted for various socioeconomic factors, comprehensively capturing this simultaneous relationship's direction and intensity is complex. We employ instrumental variable (IV) regression analysis to address this challenge and mitigate probable endogeneity resulting from simultaneity65,66.
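The logic of an instrumental variable estimate can be illustrated with the simplest possible case: one endogenous regressor, one instrument, and no further controls, where the IV slope is cov(z, y) / cov(z, x). The data below are constructed by hand so that the instrument z is uncorrelated with the confounder u; the names and numbers are hypothetical and do not represent the model estimated in this study, which includes several covariates.

```python
def centered_cross_product(a, b):
    """Sum of products of deviations from the mean (n times the covariance)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return centered_cross_product(x, y) / centered_cross_product(x, x)

def iv_slope(z, x, y):
    """Simple IV slope of y on x using instrument z."""
    return centered_cross_product(z, y) / centered_cross_product(z, x)

# Constructed example: true effect of x on y is 2; u confounds x and y.
z = [0, 0, 1, 1]                          # instrument, uncorrelated with u
u = [1, -1, 1, -1]                        # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]     # endogenous regressor
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]

biased = ols_slope(x, y)      # absorbs the confounder's effect
corrected = iv_slope(z, x, y) # uses only variation in x driven by z
```

Here OLS overstates the effect because x and y both move with u, while the IV estimate recovers the true slope of 2.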

Model specification

We employ the Hausman test to diagnose the model and check whether fixed effects (FE) or random effects (RE) estimation should be used for the panel regressions. The Hausman test provides a standard regression result for the fixed and random effects estimations, respectively. The P value based on the chi-square statistic, an aggregate measure, indicates which variant is best suited to the current dataset.
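For intuition, the Hausman statistic for a single coefficient is the squared difference between the FE and RE estimates divided by the difference of their variances, compared against a chi-square distribution with one degree of freedom. The sketch below covers only that one-coefficient case, with hypothetical inputs; the test used in this study compares the full coefficient vectors on the microdata.no platform.

```python
import math

def hausman_single_coefficient(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and P value for one coefficient (1 degree of freedom).

    H = (b_fe - b_re)^2 / (var_fe - var_re)
    For a chi-square variable with 1 degree of freedom,
    P(X > h) = erfc(sqrt(h / 2)).
    """
    variance_difference = var_fe - var_re
    if variance_difference <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    statistic = (b_fe - b_re) ** 2 / variance_difference
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical estimates and variances (illustration only).
statistic, p_value = hausman_single_coefficient(0.5, 0.02, 0.3, 0.01)
```

A small P value indicates that the FE and RE estimates differ systematically, which is conventionally read as support for the fixed effects specification.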

P values