In the use case on Text anonymization on sensitive data, SIESTA develops a tool that allows machine learning models to be trained on anonymised text data. In addition, a synthetic dataset will be generated to evaluate the tool's performance on both the original and the anonymised datasets.
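A minimal sketch of this kind of evaluation is shown below, assuming scikit-learn and entirely hypothetical toy texts and labels (not the SIESTA data): a classifier is trained on original texts, then scored on both the original and an anonymised parallel version, since a faithful anonymisation should preserve the predicted categories.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy parallel corpora standing in for the original and anonymised
# datasets; texts, labels, and substitutions are illustrative only.
original = [
    "Alice Smith reported chest pain at the Donostia clinic.",
    "Bob Jones was billed twice for his appointment in Bilbao.",
    "Carol Diaz described persistent headaches to Dr. Ortiz.",
    "David Lee disputed an invoice from the Vitoria office.",
]
anonymised = [
    "Jane Doe reported chest pain at the Springfield clinic.",
    "John Roe was billed twice for his appointment in Riverton.",
    "Mary Major described persistent headaches to Dr. Poe.",
    "Richard Roe disputed an invoice from the Lakeside office.",
]
labels = ["medical", "billing", "medical", "billing"]

# Train a simple text classifier on the original data.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(original, labels)

# Compare performance on original vs. anonymised texts.
print(model.score(original, labels))    # accuracy on original data
print(model.score(anonymised, labels))  # accuracy on anonymised data
```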
Current methods are mainly based on masking or suppressing the original sensitive data, on pseudonymisation techniques that partially replace sensitive data with its category, or on noising, where the sensitive data is perturbed. The proposed approach instead generates data similar to the sensitive original and substitutes it, avoiding any disclosure of critical information while still allowing machine learning models to classify the text into the same category as the original, with very similar performance.
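To make the contrast between these strategies concrete, the following sketch applies masking, pseudonymisation, and synthetic replacement to a toy sentence. The entity patterns, surrogate pools, and function names are hypothetical illustrations; a real system would detect entities with a trained named-entity recogniser rather than fixed regular expressions.

```python
import random
import re

# Hypothetical entity patterns; a real system would use an NER model.
PATTERNS = {
    "PERSON": re.compile(r"\bAlice Smith\b"),
    "CITY": re.compile(r"\bDonostia\b"),
}

# Illustrative surrogate pools for synthetic replacement.
SURROGATES = {
    "PERSON": ["Jane Doe", "John Roe"],
    "CITY": ["Springfield", "Riverton"],
}

def mask(text: str) -> str:
    """Suppress sensitive spans entirely."""
    for pattern in PATTERNS.values():
        text = pattern.sub("***", text)
    return text

def pseudonymise(text: str) -> str:
    """Replace sensitive spans with their category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def synthesise(text: str, rng: random.Random) -> str:
    """Replace sensitive spans with plausible surrogates of the same
    category, so the text stays natural for a downstream classifier."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda _m, l=label: rng.choice(SURROGATES[l]), text)
    return text

sentence = "Alice Smith was admitted to the clinic in Donostia."
rng = random.Random(0)
print(mask(sentence))              # masking destroys context
print(pseudonymise(sentence))      # category labels keep some structure
print(synthesise(sentence, rng))   # surrogates keep the text natural
```

The design intuition is that masking and pseudonymisation remove or flatten context that classifiers rely on, whereas category-consistent surrogates keep the text distributionally close to the original.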