I'm working on a project that has thrown some curves at me and I thought that maybe someone here might have some suggestions for me.
What I have is a web site that's been ripped and needs to be searchable. What I'm doing is reading each .htm file into a variable, then doing a Find using the string that a user enters to search on. If there is a hit, I add the .htm file to a listbox.
This works, but there are some glitches. Say someone wants to search on the word Oak. I would set my search string to " oak " so that it is looking for the whole word. The problem is that in the .htm file the word could appear as <b>oak</b>, or even "oak,", or ... you get the picture.
So the question is, can anyone come up with a good way to grab these types of entries? I could simply do a substring search, but then there would be more false hits than I would be comfortable with.
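To make the problem concrete, here is a rough sketch of the kind of thing I'm picturing, written in Python purely for illustration. The idea is to replace the tags with spaces first and then use a regular expression with word boundaries instead of padding the search string with spaces. The names `visible_text` and `has_whole_word` are made up for this example, and the tag-stripping pattern is deliberately crude: it assumes reasonably simple markup and doesn't treat script blocks or HTML comments specially.

```python
import html
import re

# Crude tag matcher: anything from "<" up to the next ">".
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(page_source: str) -> str:
    """Replace tags with spaces and decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub(" ", page_source))


def has_whole_word(page_source: str, term: str) -> bool:
    """Return True if term appears as a whole word, ignoring case."""
    # re.escape keeps characters in the user's input from being read as regex syntax.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, visible_text(page_source), re.IGNORECASE) is not None
```

With something like this, `<b>oak</b>` and `"oak,"` would both count as hits for Oak, while "soaked" and "cloak" would not.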
I also have to come up with a way of doing this for "starts with" and "ends with" types of search.
Without actually searching for every possible combination, I'm not too sure what to do.
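For the "starts with" and "ends with" cases, I wonder whether the same word-boundary idea could be stretched with `\w*` on one side of the term. Below is a minimal Python sketch of that thought, assuming the text has already had its tags removed. The function names and the mode strings ("whole", "starts", "ends") are just placeholders I invented for the example.

```python
import re


def build_pattern(term: str, mode: str) -> re.Pattern:
    """Build a case-insensitive pattern for one of three search modes."""
    escaped = re.escape(term)
    if mode == "whole":
        body = r"\b" + escaped + r"\b"      # the term as a complete word
    elif mode == "starts":
        body = r"\b" + escaped + r"\w*"     # words that begin with the term
    elif mode == "ends":
        body = r"\w*" + escaped + r"\b"     # words that end with the term
    else:
        raise ValueError("unknown mode: " + mode)
    return re.compile(body, re.IGNORECASE)


def matching_words(text: str, term: str, mode: str) -> list:
    """Return every word in text that matches the term under the given mode."""
    return [m.group(0) for m in build_pattern(term, mode).finditer(text)]
```

For the text "Oak oaken cloak soaked", a "starts" search on oak would pick up Oak and oaken, and an "ends" search would pick up Oak and cloak. A file would go into the listbox whenever the returned list is not empty.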