In Go, you can extract a substring with slice notation, s[start:end], where start is inclusive and end is exclusive. Both indices are byte offsets, not character positions. This works directly on the string and is very efficient for ASCII text, where every character occupies exactly one byte. With Unicode text, however, byte-based slicing can cut a multi-byte UTF-8 character in the middle and leave you with invalid output.
For Unicode-safe substring extraction, convert the string to a rune slice, slice that, and convert the result back to a string. Each element of a rune slice is one Unicode code point, so every character is treated as a single unit regardless of its byte length. The conversion copies the string, so it costs more than byte slicing. In both cases the original string is left unmodified, since strings in Go are immutable.
Here’s a comprehensive example showing both basic and Unicode-safe substring extraction:
package main

import (
	"fmt"
)

func main() {
	// Basic ASCII substring extraction (indices are byte offsets)
	text := "programming"
	fmt.Println(text[0:3]) // Output: pro
	fmt.Println(text[3:7]) // Output: gram
	fmt.Println(text[8:])  // Output: ing (from position 8 to end)
	fmt.Println(text[:4])  // Output: prog (from start up to, not including, position 4)

	// Unicode-safe substring extraction (indices are rune positions)
	unicode := "Hello 世界 🚀"
	runes := []rune(unicode)

	// Extract the first 7 characters (including the space and one Chinese character)
	substring := string(runes[0:7])
	fmt.Println(substring) // Output: Hello 世

	// Extract the characters at rune positions 6 and 7
	substring2 := string(runes[6:8])
	fmt.Println(substring2) // Output: 世界

	// Helper function for safe substring extraction:
	// out-of-range indices are clamped instead of causing a panic
	safeSubstring := func(s string, start, end int) string {
		runes := []rune(s)
		if start < 0 {
			start = 0
		}
		if end > len(runes) {
			end = len(runes)
		}
		if start > end {
			return ""
		}
		return string(runes[start:end])
	}
	fmt.Println(safeSubstring("Привет мир", 0, 6)) // Output: Привет
}
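Choosing between byte-based and rune-based substring extraction depends on your use case. For text you know to be ASCII, byte slicing is faster because it needs no conversion. For internationalized applications or user-generated content, prefer rune-based extraction so that multi-byte characters are never split. Keep in mind that a rune is a single code point, so a character composed of several code points (for example, a letter followed by a combining accent) can still be separated.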
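If the cost of the rune conversion matters, one possible middle ground is to walk the string once and slice at byte offsets that fall on rune boundaries. The following is a minimal sketch under that assumption, not the approach used in the example above; substringRunes is a hypothetical name, and it relies only on the fact that ranging over a string yields the starting byte index of each rune.

```go
package main

import "fmt"

// substringRunes (hypothetical helper) returns the runes in [start, end) of s.
// It finds the matching byte offsets with a single pass and slices the
// original string, so no intermediate []rune is allocated.
func substringRunes(s string, start, end int) string {
	if start < 0 {
		start = 0
	}
	if end <= start {
		return ""
	}
	// Default to len(s) so that out-of-range indices are clamped.
	startByte, endByte := len(s), len(s)
	runeIndex := 0
	for byteIndex := range s { // byteIndex is the first byte of each rune
		if runeIndex == start {
			startByte = byteIndex
		}
		if runeIndex == end {
			endByte = byteIndex
			break
		}
		runeIndex++
	}
	return s[startByte:endByte]
}

func main() {
	fmt.Println(substringRunes("Hello 世界 🚀", 6, 8)) // 世界
	fmt.Println(substringRunes("Привет мир", 0, 6))  // Привет
	fmt.Println(substringRunes("abc", 2, 10))        // c
}
```

The indices are still rune positions, as in the rune-slice version, but the result shares the original string's bytes instead of being rebuilt from a copy.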