CMIP ensembles are often informally labelled "ensembles of opportunity": they are a mix of initial-condition ensembles, perturbed-parameter ensembles and single simulations from whichever modelling centres have the resources and motivation to contribute. Different models often use different component models and different treatments of processes, but just as commonly, nominally different models share sub-models, parametrisations or approaches to tuning. This landscape is only likely to become harder to navigate with CMIP6, which features a broader range of model complexity than CMIP5, several entirely new participating institutions, and an increasing number of effective model duplicates.
How can we make the best use of the information that CMIP contains? It seems increasingly clear that treating all models equally is likely to bias any inferences we wish to make. Using weighting, sub-selection, or perhaps emergent constraints relevant to the problem of interest seems a logical next step, but how can we be sure that a constraint derived from historical data is applicable to projections? What could go wrong? What kinds of out-of-sample testing strategies could we employ to test efficacy in this case? How might doing this affect uncertainty as represented by ensemble spread?
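One common out-of-sample strategy is a "perfect model" test: each ensemble member in turn is treated as the truth, the remaining models are weighted by their agreement with that member's historical metric, and the weighted projection is compared against the withheld member's own projection. The sketch below illustrates the idea on purely synthetic data; it is not a specific published weighting scheme, and names such as `historical`, `projection` and `weights_from_distance` are illustrative placeholders rather than CMIP conventions.

```python
# A minimal sketch of a leave-one-out "perfect model" test of historical-based
# weighting. All data are synthetic; a shared "sensitivity" links the
# historical metric to the projected change, mimicking an emergent constraint.
import numpy as np

rng = np.random.default_rng(42)

n_models = 30
sensitivity = rng.normal(1.0, 0.3, n_models)
historical = 0.8 * sensitivity + rng.normal(0.0, 0.1, n_models)   # observable metric
projection = 3.0 * sensitivity + rng.normal(0.0, 0.3, n_models)   # future change


def weights_from_distance(hist, target, sigma=0.1):
    """Down-weight models whose historical metric lies far from the target."""
    w = np.exp(-((hist - target) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


weighted_err, unweighted_err, spreads = [], [], []
for i in range(n_models):
    # Treat model i as the "truth"; weight the others by historical agreement.
    truth_hist, truth_proj = historical[i], projection[i]
    others = np.delete(np.arange(n_models), i)

    w = weights_from_distance(historical[others], truth_hist)
    est_weighted = np.sum(w * projection[others])
    est_unweighted = projection[others].mean()

    weighted_err.append(abs(est_weighted - truth_proj))
    unweighted_err.append(abs(est_unweighted - truth_proj))
    # Effective spread of the weighted ensemble (weighted standard deviation).
    spreads.append(np.sqrt(np.sum(w * (projection[others] - est_weighted) ** 2)))

print(f"mean |error|, weighted:   {np.mean(weighted_err):.3f}")
print(f"mean |error|, unweighted: {np.mean(unweighted_err):.3f}")
print(f"mean weighted spread: {np.mean(spreads):.3f} vs raw spread {projection.std():.3f}")
```

In this idealised setup the weighted estimate tends to beat the equal-weight mean and the weighted spread narrows, but the same machinery also exposes the failure mode the questions above point to: if the synthetic link between `historical` and `projection` is removed (or differs between models that share components), the weighting can narrow the spread without reducing the error, giving false confidence.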