First, I would recommend Expresso. It is free, but you do have to register it. I find it very valuable both for working with regular expressions and for learning to use them better. One warning before we start: parsing strings with regex (especially web page content, which is what yours appears to be) is brittle. A regular expression that works right now can easily start failing if the text changes even slightly.
With that out of the way, on to your specific question. I am assuming the result set that you are looking for is 0,13,135,171,1148,732,10 (all of the competition ids).
We will start by opening Expresso and pasting all of the text into the Sample Text (bottom left) area (make sure you are on the Test Mode tab). Now we will start writing a regular expression to find the text we are looking for. Put competition_id": into the Regular Expression area (top left). If you expand out the tree in the Regex Analyzer (top right), it will show each of the individual characters. This indicates that all of these characters will be matched literally. If you click the Run Match button, you will see a list of matches displayed in the Search Results (bottom right). Perfect, it found all 8 areas where that text appears. You can click on each of the Search Results and Expresso will highlight the corresponding area in the Sample Text.
Now we need to expand this to match the number after it. If you click on the Design Mode tab you will see an area at the bottom that lists all of the regular expression symbols and what they mean. I find this area helpful for looking up various matching patterns. Change the regular expression to be competition_id":\d+
The \d means match any digit (0-9) and the + means match one or more of them. If you click Run Match you will see that each of the matches now contains the text competition_id":<number>
If we use this regular expression in C#, it will return all of the matched text, but in this case we just want the number. One final change to the regex: competition_id":(\d+). Note that the Regex Analyzer now indicates that we have a numbered capture group. All this means is that the portion of the match inside the parentheses will be put into its own group that we can easily extract. Click Run Match, and you will notice that the matches still contain the full text match, but now there is a sub group under each that contains the individual value.
Now back in C#, I will assume that you have that large script block in a string variable named data.
using System.Text.RegularExpressions;

string data = ...;
//Get all of the matches
MatchCollection matches = Regex.Matches(data, "competition_id\":(\\d+)");
foreach (Match match in matches)
{
    //Group 1 is the capture group that we saw in Expresso. Group 0 would be the full match.
    Group group = match.Groups[1];
    //Get the value out of the group. We can do an int.Parse since we know it will only contain digits
    int competition_id = int.Parse(group.Value);
    //TODO: Do something with competition_id
}
Note: We do have to escape the regular expression when it is written as a C# string literal: the quote becomes \" and the backslash in \d becomes \\.
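If you want a quick end-to-end check, here is a minimal self-contained sketch. The sample string is made up for illustration (it is not your actual page content), and I am assuming C# 9 or later so the statements can sit at the top level of Program.cs. It also uses a verbatim string (@"..."), where a quote is written as "" and backslashes need no doubling, as an alternative to the escaped literal.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input, invented for this sketch.
string data = "{\"competition_id\":13,\"name\":\"A\"},{\"competition_id\":135,\"name\":\"B\"}";

// Verbatim string: "" is a literal quote, and \d needs no extra backslash.
// Equivalent to the escaped literal "competition_id\":(\\d+)".
string pattern = @"competition_id"":(\d+)";

var ids = new List<int>();
foreach (Match match in Regex.Matches(data, pattern))
{
    // Groups[0] is the whole match; Groups[1] is the first capture group.
    ids.Add(int.Parse(match.Groups[1].Value));
}

Console.WriteLine(string.Join(",", ids)); // prints 13,135
```

Either form of the pattern should behave the same; the verbatim string is just easier to compare against what you typed into Expresso.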
This is only a small introduction to regular expressions. I would encourage you to play around with Expresso and poke around online. There are lots of good resources out there. The most important thing to do is practice.
Hi, thanks for the amazing answer. Unfortunately I get matches.Count = 0; the content of data is this: pastebin.com/AS7vMBYq
Sorry, I had a small typo when I moved the regular expression pattern into the C#. I accidentally put a space between the \d and the +. This causes the regular expression to fail because the + only affects the thing directly before it (which was the space and not the \d as intended). The updated code should work.
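A small sketch of that difference, using a made-up input string and assuming C# 9 or later top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input, invented for this sketch.
string data = "\"competition_id\":171";

// With the stray space, the group requires one digit followed by one or more spaces.
int withSpace = Regex.Matches(data, "competition_id\":(\\d +)").Count;

// Without the space, + applies to \d and matches one or more digits.
int withoutSpace = Regex.Matches(data, "competition_id\":(\\d+)").Count;

Console.WriteLine($"{withSpace} {withoutSpace}"); // prints 0 1
```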
Another option is to use HtmlAgilityPack together with either the Jurassic library or XPath to get what you want.