I'm new to learning-to-rank and am working through the learning-to-rank example provided by xgboost. The core code in rank.py is as follows:
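import xgboost as xgb
from xgboost import DMatrix
# x_*, y_* and group_* are loaded earlier in rank.py (not shown here);
# group_train / group_valid hold the number of rows in each query group.
train_dmatrix = DMatrix(x_train, y_train)
valid_dmatrix = DMatrix(x_valid, y_valid)
test_dmatrix = DMatrix(x_test)  # no group information for the test set
train_dmatrix.set_group(group_train)
valid_dmatrix.set_group(group_valid)
params = {'objective': 'rank:pairwise', 'eta': 0.1, 'gamma': 1.0,
          'min_child_weight': 0.1, 'max_depth': 6}
xgb_model = xgb.train(params, train_dmatrix, num_boost_round=4,
                      evals=[(valid_dmatrix, 'validation')])
pred = xgb_model.predict(test_dmatrix)  # one score per row of x_test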
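In the code above, group_train and group_valid are group sizes: each entry is the number of consecutive rows that belong to one query, and the sizes should add up to the number of rows in the DMatrix. If your data instead carries a query ID on every row, a small helper can produce those sizes. This is a sketch assuming rows are already sorted so that each query's rows are contiguous; group_sizes_from_qids is a hypothetical name, not part of XGBoost.

```python
from itertools import groupby
from typing import List, Sequence


def group_sizes_from_qids(qids: Sequence[int]) -> List[int]:
    """Turn a per-row query-ID column into a list of group sizes.

    Hypothetical helper. Assumes rows of the same query are contiguous; a
    query ID that reappears later raises an error instead of silently
    becoming a second group.
    """
    sizes: List[int] = []
    seen = set()
    for qid, run in groupby(qids):
        if qid in seen:
            raise ValueError(f"query {qid!r} is not contiguous; sort rows by query first")
        seen.add(qid)
        sizes.append(sum(1 for _ in run))
    return sizes


# Hypothetical query IDs for seven rows: query 10 has 3 rows, 11 and 12 have 2 each.
print(group_sizes_from_qids([10, 10, 10, 11, 11, 12, 12]))  # [3, 2, 2]
```

The resulting list could then be passed to set_group() in place of a precomputed group array.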
Group data is used for both the training and validation sets, but test-set prediction does not use group data. I also read some explanations of the model output, such as the question "What is the output of XGboost using 'rank:pairwise'?".
In learning to rank, we try to predict a relative score for each document with respect to a specific query.
My understanding is that if the test set has no group data, no query is specified. How, then, does the model output a score relative to a specific query?
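One way to see why groups matter for training but not for prediction: a pairwise objective only compares two rows of the same group, while the trained model turns each row into a score on its own. Below is a minimal plain-Python sketch of that idea. The function names are hypothetical, the RankNet-style logistic pair loss is chosen for illustration, and this is not XGBoost's internal code.

```python
import math
from typing import List, Sequence, Tuple


def within_group_pairs(labels: Sequence[float],
                       group_sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (i, j) row pairs where row i is more relevant than row j.

    Pairs are formed only inside a query group, so rows belonging to
    different queries are never compared with each other.
    """
    pairs: List[Tuple[int, int]] = []
    start = 0
    for size in group_sizes:
        rows = range(start, start + size)
        for i in rows:
            for j in rows:
                if labels[i] > labels[j]:
                    pairs.append((i, j))
        start += size
    return pairs


def pairwise_logistic_loss(scores: Sequence[float], labels: Sequence[float],
                           group_sizes: Sequence[int]) -> float:
    """Sum of log(1 + exp(-(s_i - s_j))) over the within-group pairs.

    Group sizes are needed here, at training time, to decide which pairs
    exist. The scores themselves are computed for each row independently.
    """
    return sum(math.log1p(math.exp(-(scores[i] - scores[j])))
               for i, j in within_group_pairs(labels, group_sizes))


labels = [2, 0, 1, 1, 0]  # hypothetical relevance labels
groups = [3, 2]           # rows 0-2 belong to one query, rows 3-4 to another
print(within_group_pairs(labels, groups))  # [(0, 1), (0, 2), (2, 1), (3, 4)]
```

Row 0 (label 2) is never paired with row 3 (label 1) because they belong to different queries. Changing the grouping changes which pairs enter the loss, but it does not change how a fixed model scores an individual row, which is consistent with getting the same predictions with and without a test group.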
I also tried adding test_dmatrix.set_group(group_test). The outputs of the two versions agree, for example:
[ 1.3535978 -2.9462705 0.86084974 ... -0.23594362 0.712791 -1.633297 ]
So my questions are:
Why is it not necessary to set the test groups when using 'rank:pairwise' in xgboost?
How can I recover the ranking within each query group from the predicted scores?
Can anybody explain it to me? Thanks in advance.
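For the second question: the flat array returned by predict() is in the same row order as x_test, so it can be cut back into queries using the same group sizes you would pass to set_group. A minimal sketch, assuming group_test lists the test group sizes in row order; rank_within_groups is a hypothetical helper, not an XGBoost function.

```python
from typing import List, Sequence


def rank_within_groups(scores: Sequence[float],
                       group_sizes: Sequence[int]) -> List[List[int]]:
    """Split a flat prediction array into query groups and rank each group.

    For every group, returns the row indices (positions in the flat array)
    sorted from highest to lowest predicted score. Scores from different
    groups are never compared.
    """
    if sum(group_sizes) != len(scores):
        raise ValueError("group sizes must add up to the number of predictions")
    rankings: List[List[int]] = []
    start = 0
    for size in group_sizes:
        rows = range(start, start + size)
        rankings.append(sorted(rows, key=lambda r: scores[r], reverse=True))
        start += size
    return rankings


# Hypothetical scores for five test rows split into two queries of sizes 3 and 2.
print(rank_within_groups([0.4, -1.2, 2.0, 0.3, 0.9], [3, 2]))  # [[2, 0, 1], [4, 3]]
```

With the names from rank.py this would be rank_within_groups(pred, group_test). The first index in each inner list is the row predicted to be most relevant for that query; the raw score values only carry meaning relative to other rows of the same query.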
For scoring on the test set, it might matter what the specified groups are, but not for just making predictions.
…in detail? I understand that the group data is needed for training. But if there is no group ID in the test set, how can the model compare documents within the same group to output predictions? – giser_yugang Apr 29 '19 at 02:47
The interpretation (and hence also scoring the model on the test set) should use these scores to rank the samples only within groups, because the model was trained to ignore inter-group interactions. But, thinking again in the context of ranking search results, you'll only predict on a set of pages matching a given query.
– Ben Reiniger Apr 30 '19 at 03:05
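Following the comment above, scoring the model on the test set means ranking rows only within each group. Below is a plain-Python sketch of mean NDCG computed that way. The exponential gain 2^rel - 1 and the choice to score a group with no relevant rows as 1.0 are assumptions of this sketch, and XGBoost's built-in ndcg metric may handle such details differently.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of labels listed in ranked order (best first)."""
    # Gain 2^rel - 1, discounted by log2(position + 1) with positions starting at 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(list(relevances)[:k]))


def mean_ndcg(scores: Sequence[float], labels: Sequence[float],
              group_sizes: Sequence[int], k: Optional[int] = None) -> float:
    """Average NDCG over query groups; rows are ranked only within their group."""
    total, start = 0.0, 0
    for size in group_sizes:
        rows = range(start, start + size)
        by_score = sorted(rows, key=lambda r: scores[r], reverse=True)
        ideal = sorted((labels[r] for r in rows), reverse=True)
        best = dcg(ideal, k)
        # Assumption: a group with no relevant rows contributes a perfect 1.0.
        total += (dcg([labels[r] for r in by_score], k) / best) if best > 0 else 1.0
        start += size
    return total / len(group_sizes)
```

For example, mean_ndcg(pred, y_test, group_test, k=10) would evaluate the flat test predictions per query, assuming y_test holds the test relevance labels in the same row order.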