Comparing the performance of string operations across programming languages
Pelkonen, Niko (2020-01-16)
Pelkonen, Niko
N. Pelkonen
16.01.2020
© 2020 Niko Pelkonen. Tämä Kohde on tekijänoikeuden ja/tai lähioikeuksien suojaama. Voit käyttää Kohdetta käyttöösi sovellettavan tekijänoikeutta ja lähioikeuksia koskevan lainsäädännön sallimilla tavoilla. Muunlaista käyttöä varten tarvitset oikeudenhaltijoiden luvan.
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:oulu-202001201035
https://urn.fi/URN:NBN:fi:oulu-202001201035
Tiivistelmä
In this thesis, the performance of string operations are compared across programming languages. Handling strings effectively is important especially when performance is a crucial factor and large string sizes may emerge. Common examples where large string sizes emerge are during digitalization of a product, reading string data from a database, reading and handling large CSV-files and Excel-files, converting file format to another file format (e.g. CSV to Excel and vice versa), and reading and handling a DOM-tree of a website.
There has been a lot of corresponding research where programming languages are benchmarked, but none of them focus directly on string operations. The main goal of this thesis is to fill this gap in literature and try to find out which programming languages have the best results on string operations in terms of execution time and memory (maximum RSS) usage.
The test environment was formed by creating randomly generated string files with sizes varying from ten thousand characters to 100 million characters. The generated characters were ‘a’, ‘b’, and ‘ ‘ (whitespace character). The programming languages selected for this thesis were Python, C, C++, Java, Perl, Ruby, Go, and Swift.
Go seemed to be the most effective language in execution times, although it was not the fastest in many operations. C used very little memory, but only five operations were implemented in it. Every operation was implemented in Python, and it used additional memory to loading the string file in only one operation, which was sorting a string. Swift had quite bad results, and this could be caused by the Linux version of Swift that was used. In regular expressions, Perl and C++ were overwhelmingly effective. Java used the most memory in every operation.
There has been a lot of corresponding research where programming languages are benchmarked, but none of them focus directly on string operations. The main goal of this thesis is to fill this gap in literature and try to find out which programming languages have the best results on string operations in terms of execution time and memory (maximum RSS) usage.
The test environment was formed by creating randomly generated string files with sizes varying from ten thousand characters to 100 million characters. The generated characters were ‘a’, ‘b’, and ‘ ‘ (whitespace character). The programming languages selected for this thesis were Python, C, C++, Java, Perl, Ruby, Go, and Swift.
Go seemed to be the most effective language in execution times, although it was not the fastest in many operations. C used very little memory, but only five operations were implemented in it. Every operation was implemented in Python, and it used additional memory to loading the string file in only one operation, which was sorting a string. Swift had quite bad results, and this could be caused by the Linux version of Swift that was used. In regular expressions, Perl and C++ were overwhelmingly effective. Java used the most memory in every operation.
Kokoelmat
- Avoin saatavuus [31934]