Poverty example#66
Conversation
| transform (torchvision.transforms): transform to apply to the data""" | ||
|
|
||
| super().__init__() | ||
| dataframe = pd.read_csv(dataset_path + labels_name) |
There was a problem hiding this comment.
Consider encapsulating your strings in pathlib.Path objects to avoid '/' (slash) sensitiveness.
e.g.:
from pathlib import Path
dataframe = pd.read_csv(Path(dataset_path) / Path(labels_name))
There was a problem hiding this comment.
As we discussed, I removed the dataset from the PR.
I did make the chage though, thanks for the advice!
| self.dataframe_train = dataframe[dataframe['subset'] == 'train'] | ||
| self.dataframe_val = dataframe[dataframe['subset'] == 'val'] | ||
| self.dataframe_test = dataframe[dataframe['subset'] == 'test'] | ||
| self.tif_dir = dataset_path + tif_dir |
There was a problem hiding this comment.
Consider encapsulating your strings in pathlib.Path objects to avoid '/' (slash) sensitiveness.
e.g.:
from pathlib import Path
self.tif_dir = Path(dataset_path) / Path(tif_dir)
There was a problem hiding this comment.
As we discussed, I removed the dataset from the PR.
I did make the chage though, thanks for the advice!
| self.inference_batch_size = inference_batch_size | ||
| self.dict_normalize = json.load(open('examples/poverty/mean_std_normalize.json', 'r')) | ||
| self.num_workers = num_workers | ||
| self.task = 'regression' |
There was a problem hiding this comment.
The task argument is expected to be registered in an example config file. However here it is hardcoded.
Consider either adding the 'task' argument as input of your init() method; or checking in **kwargs if such and argument exists (taking priority over the default value 'regression')
There was a problem hiding this comment.
As we discussed, I removed the dataset from the PR.
I did make the chage though, thanks for the advice!
| dataset = MSDataset(self.dataframe_test, self.tif_dir, transform=self.test_transform()) | ||
| return dataset | ||
|
|
||
| def train_dataloader(self): |
There was a problem hiding this comment.
As your code is, it is only useful to re-define train/val/test_dataloaders to use persistent_workers=True, which is relevant if your epochs are fast and you want to make your training faster by avoiding to re-create workers at each epoch. This is done at the cost of memory.
If this is the intended use, ignore this comment. In the contrary case, you can remove altogether the re-defined train/val/test_dataloaders
There was a problem hiding this comment.
As we discussed, I removed the dataset from the PR.
I did make the chage though, thanks for the advice!
|
|
||
| loss = self.loss(y_hat, self._cast_type_to_loss(y)) # Shape mismatch for binary: need to 'y = y.unsqueeze(1)' (or use .reshape(2)) to cast from [2] to [2,1] and cast y to float with .float() | ||
| self.log(f"loss/{split}", loss, **log_kwargs) | ||
| self.log(f"loss_{split}", loss, **log_kwargs) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Changing that character would also mean updating every example
There was a problem hiding this comment.
I didn't think of that. I undid it
| else: | ||
| score = metric_func(y_hat, y) | ||
| self.log(f"{metric_name}/{split}", score, **log_kwargs) | ||
| self.log(f"{metric_name}_{split}", score, **log_kwargs) |
There was a problem hiding this comment.
| for metric_name, metric_func in self.metrics.items(): | ||
| if isinstance(metric_func, dict): | ||
| score = metric_func['callable'](y_hat, y, **metric_func['kwargs']) | ||
| if metric_func['kwargs']: |
There was a problem hiding this comment.
Indeed, metric kwargs are expected to be added in the config file; however, when the kwargs is empty, calling metric_func with kwargs will trigger a None-type error.
I suggest you replace that if condition by the following line instead:
score = metric_func['callable'](y_hat, y, metric_func.get('kwargs', {}))
replacing the old line 167.
This is a good addition, thx!
| """ | ||
| device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
| self.model.to(device) | ||
| data = data.to(device) |
There was a problem hiding this comment.
data isn't guaranteed to be a Tensor since this method doesn't require it to be (intentional behavior since the method should be callable by hand on any data type).
If data isn't a Tensor, the calling of .to() will fail and crash.
The data (or its elements) are cast to device anyway in the following lines.
=> Recommendation: delete this line
| lr=lr, | ||
| weight_decay=weight_decay) | ||
|
|
||
| print(optimizer) |
|
|
||
| print(optimizer) | ||
|
|
||
| if 'regression' in task: |
There was a problem hiding this comment.
No default value for loss. Risk of error if task doesn't contain 'regression' string as loss will be None. The error is caught by malpolon.models.utils.check_loss() but I advise handling a 'default loss value' case
There was a problem hiding this comment.
removed the condition since the task is filtered
|
|
||
| model = check_model(model) | ||
|
|
||
| optimizer = torch.optim.AdamW(model.parameters(), |
There was a problem hiding this comment.
INFO: optimizers instantiation have changed behavior since update v2.1.0 and is now a user choice (via config file.
This is still 100% compatible with the update, but consider adding a optimizer input argument and nesting your optimizer variable instantiation inside the condition if optimizer is None:
(see malpolon.models.standard_prediction_systems.ClassificationSystem)
|
|
||
| super().__init__(model, loss, optimizer, metrics=metrics) | ||
|
|
||
| def configure_optimizers(self): |
There was a problem hiding this comment.
INFO: since the v2.1.0 update, this method doesn't need to be re-defined unless you want to impose a fixed optimizer/scheduler as both of these objects can be defined by the user in his experiment config file
| from pathlib import Path | ||
| from typing import Mapping, Union | ||
|
|
||
| import torchmetrics |
tlarcher
left a comment
There was a problem hiding this comment.
Reviewed the engine side of your PR as requested. Couple of things to address. I'll let you remove the example/dataset part for now.
When everything is addressed I'll merge this in a new branch which will wait for you examples when they are ready.
If you want, you can adapt the (optimizer + config file) parts in accordance to the v2.1.0 update but that isn't necessary.
tlarcher
left a comment
There was a problem hiding this comment.
Reviewed the engine side of your PR as requested. Couple of things to address. I'll let you remove the example/dataset part for now.
When everything is addressed I'll merge this in a new branch which will wait for you examples when they are ready.
If you want, you can adapt the (optimizer + config file) parts in accordance to the v2.1.0 update but that isn't necessary.
|
|
||
| class RegressionSystem(GenericPredictionSystem): | ||
| """Regression task class.""" | ||
| def __init__( |
There was a problem hiding this comment.
Some arguments (optimizer kwargs) are not used because incompatible with the proposed default optimizer. I suggest replacing them with the default optimizer's; or simply removing them.
Please update the docstring accordingly too
Added Malpolon paper shield badge



📝 Changelog
✅ Checklist