Updated April 14, 2023
Introduction to Bash Split String
Big data workloads often involve large volumes of unstructured data arriving as message streams or text files. To get the most out of that data, developers need to parse the strings, extract the essential information, and turn it into structured form. Text analytics relies heavily on string splitting for the same reason. String splitting has far more elaborate uses as well, but the question here is narrower: why, and how, would one split a string in Bash?
Bash can handle fairly complex text processing with tools such as sed and awk, along with a few other commands, and a common first step is to split a long message stream into tokens. The key thing to decide is where the string should be split. The split point can be a single character or a combination of several characters, and the character or pattern on which the string is split is known as the delimiter.
Before starting, we need to know about IFS (Internal Field Separator), since most of the methods below rely on it. IFS is a shell variable whose characters Bash uses to separate a string into tokens (fields). The tokens are then used however the problem at hand requires. In simple terms, IFS holds the characters that break a series of characters into recognizable parts, much as a space separates words and a newline separates lines of text.
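The effect of IFS on word splitting can be seen in a minimal sketch like the one below. The sample string and the variable names (sentence, default_words, colon_words, saved_ifs) are illustrative assumptions, not part of the original article.

```shell
#!/usr/bin/env bash
sentence="one two:three"     # hypothetical sample input

# With the default IFS (space, tab, newline), an unquoted expansion
# is split on the space only.
default_words=( $sentence )

# With IFS set to a colon, the same expansion is split on ':' instead.
saved_ifs=$IFS
IFS=':'
colon_words=( $sentence )
IFS=$saved_ifs               # restore the previous value

printf 'default IFS -> %d fields, second is "%s"\n' "${#default_words[@]}" "${default_words[1]}"
printf 'IFS=":"     -> %d fields, first is "%s"\n' "${#colon_words[@]}" "${colon_words[0]}"
```

The same input yields different fields depending only on the value of IFS, which is why the variable should be restored after a temporary change.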
Methods of Bash Split String
Given below are the methods mentioned:
1. Split by single character
Bash reserves IFS as an internal variable for recognizing word boundaries, so the first step is to set IFS to the character on which the split should happen. By default, IFS holds space, tab, and newline. Next, the string is read into an array with the read command: read -ra <array_name> <<< "$str". The -r option stops read from treating a backslash as an escape character, and -a <array_name> assigns the words to the named array sequentially, starting from index 0 (zero); any array name may be used in place of <array_name>. If IFS is changed for the whole script in this way, be careful to restore its original value once the split is done, because later word splitting in the script depends on it.
Syntax:
IFS='<symbol_for_separation>'
read -ra <array_name> <<< "$str"
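One way to avoid restoring IFS by hand is to place the assignment on the same line as read, so that it applies to that single command only. The sketch below assumes a hypothetical date string and the array name parts.

```shell
#!/usr/bin/env bash
str="2023-04-14"     # hypothetical sample input

# The IFS assignment is a prefix of the read command, so it is in
# effect for this one command and the shell's own IFS is untouched.
IFS='-' read -ra parts <<< "$str"

echo "year=${parts[0]} month=${parts[1]} day=${parts[2]}"

# Print the current IFS in escaped form to confirm it was not changed.
printf 'IFS afterwards: %q\n' "$IFS"
```

This form is usually the safer choice in longer scripts, since no other command ever sees the modified IFS.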
2. Split without using IFS variable
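If you would rather not touch the IFS variable, there is an alternative way to split a string.
For this, we use the readarray command with its -d option, which sets the delimiter and requires Bash 4.4 or later; -t strips the delimiter from each element. Note that a here-string (<<<) appends a newline to its input, which would end up in the last array element, so the string is supplied through printf instead.
Code:
readarray -d '<symbol_for_separation>' -t <array_name> < <(printf '%s' "$str")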
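One detail worth checking with readarray is how the input is supplied, because a here-string adds a trailing newline that becomes part of the last element. The following sketch compares the two input styles; the sample string and the array names with_newline and clean are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Assumes Bash 4.4 or newer, where readarray accepts -d.
str="red,green,blue"     # hypothetical sample input

# Here-string: Bash appends a newline, which stays in the last element.
readarray -d , -t with_newline <<< "$str"

# Process substitution with printf: the string is passed as-is.
readarray -d , -t clean < <(printf '%s' "$str")

# %q shows any hidden newline in escaped form.
printf 'last element via here-string: %q\n' "${with_newline[2]}"
printf 'last element via printf:      %q\n' "${clean[2]}"
```

Both calls produce three elements; only the content of the last one differs.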
3. Split a string with multiple character delimiter
This technique is used when the delimiter consists of more than one character. For example, in a message log, a particular string might appear after every sentence in place of a full stop, and that string is what separates one sentence from the next. At a pseudocode level, we use a while loop that repeatedly peels the first token off the string with parameter expansion (which uses shell pattern matching rather than regular expressions) and stores each token at its own index in an array.
Two parameter expansions do the work in this approach:
- ${<variable>%%<delimiter>*}: removes the longest suffix matching <delimiter>*, which leaves the text before the first delimiter.
- ${<variable>#*<delimiter>}: removes the shortest prefix matching *<delimiter>, which leaves the text after the first delimiter.
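To see what each expansion returns on its own, a small sketch such as the following may help. The sample record, the "::" delimiter, and the variable names (head, rest, init, last) are hypothetical; the % and ## forms are included only for contrast.

```shell
#!/usr/bin/env bash
record="key1::key2::key3"     # hypothetical sample input
delimiter="::"

# %% removes the longest suffix matching "::*" -> text before the first delimiter
head=${record%%"$delimiter"*}

# # removes the shortest prefix matching "*::" -> text after the first delimiter
rest=${record#*"$delimiter"}

# For contrast: % removes the shortest suffix, ## the longest prefix.
init=${record%"$delimiter"*}      # everything before the last delimiter
last=${record##*"$delimiter"}     # text after the last delimiter

printf 'head=%s\nrest=%s\ninit=%s\nlast=%s\n' "$head" "$rest" "$init" "$last"
```

Applying the first pair in a loop, taking head and then continuing with rest, is what walks through the string one token at a time.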
Examples of Bash Split String
Given below are the examples mentioned:
Example #1
Code:
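echo "****Example to show use of IFS to split a string****"
IFS='-'
str="Learn-Bash-From-EduCBA"
echo "The string we are going to split by hyphen '-' is: $str"
# -r keeps backslashes literal; -a stores the words in the array splitIFS
read -ra splitIFS <<< "$str"
echo "Print out the different words separated by hyphen '-'"
for word in "${splitIFS[@]}"; do
    echo "$word"
done
echo "Setting IFS back to whitespace"
# Restore the default value: space, tab, newline (IFS='' would disable splitting)
IFS=$' \t\n'
Output:
****Example to show use of IFS to split a string****
The string we are going to split by hyphen '-' is: Learn-Bash-From-EduCBA
Print out the different words separated by hyphen '-'
Learn
Bash
From
EduCBA
Setting IFS back to whitespace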
Example #2
Code:
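echo "****Example to show split a string without IFS****"
str="Learn,Bash,From,EduCBA"
echo "The string we are going to split by comma ',' is: $str"
# printf passes the string without the trailing newline that <<< would add
readarray -d , -t splitNoIFS < <(printf '%s' "$str")
echo "Print out the different words separated by comma ','"
for word in "${splitNoIFS[@]}"; do
    echo "$word"
done
Output:
****Example to show split a string without IFS****
The string we are going to split by comma ',' is: Learn,Bash,From,EduCBA
Print out the different words separated by comma ','
Learn
Bash
From
EduCBA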
Example #3
Code:
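echo "****Example to show split a string by a multiple character delimiter****"
str="Learn||Bash||From||EduCBA"
echo "The string we are going to split by double pipe '||' is: $str"
delimiter="||"
# Append the delimiter so that the last word is handled like all the others
conCatString=$str$delimiter
splitMultiChar=()
# Loop until the remaining string is empty
while [[ $conCatString ]]; do
    # Store the text before the first delimiter
    splitMultiChar+=( "${conCatString%%"$delimiter"*}" )
    # Keep only the text after the first delimiter
    conCatString=${conCatString#*"$delimiter"}
done
echo "Print out the different words separated by double pipe '||'"
for word in "${splitMultiChar[@]}"; do
    echo "$word"
done
Output:
****Example to show split a string by a multiple character delimiter****
The string we are going to split by double pipe '||' is: Learn||Bash||From||EduCBA
Print out the different words separated by double pipe '||'
Learn
Bash
From
EduCBA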
Conclusion
In this article we have worked through examples framed as simple, easy-to-follow problem statements, so that the usage feels intuitive when you apply it to a real problem. Splitting strings in Bash remains useful in practice, especially when a message flow uses a multiple character delimiter.