  • Keyword, Contains, Starts with, and End With Search

    I'm working on a project that has thrown some curves at me and I thought that maybe someone here might have some suggestions for me.

    What I have is a website that's been ripped and needs to be searchable. What I'm doing is reading each .htm file into a variable, then doing a Find using the string the user enters. If there's a hit, I add the .htm file to a listbox.

    This works, but there are some glitches. Say someone wants to search on the word Oak. I would set my search string to " oak " so that it only finds the whole word. The problem is that in the HTM file the word could appear as <b>oak</b>, or even "oak,", or ... you get the picture.
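To make the problem concrete, here is a minimal Python sketch of the approach described above: read each .htm file into a string, then do a plain find for the space-padded term. The function name `naive_search` and the `*.htm` glob are illustrative assumptions; the original project used its own Find action, not Python.

```python
from pathlib import Path

def naive_search(folder: str, term: str) -> list[str]:
    """Return the .htm files whose raw text contains ' term ' (space-padded)."""
    needle = f" {term.lower()} "
    hits = []
    for path in sorted(Path(folder).glob("*.htm")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if needle in text:
            hits.append(path.name)
    return hits

# The padded needle misses every match that isn't surrounded by spaces:
samples = ["a solid oak table", "<b>oak</b> flooring", 'grain: "oak," maple']
print([" oak " in s for s in samples])  # [True, False, False]
```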

    So the question is, can anyone come up with a good way to grab these types of entries? I could simply do a substring search, but then there would be more false hits than I would be comfortable with.

    I also have to come up with a way of doing this for "starts with" and "ends with" searches.

    Without actually searching for every possible combination, I'm not too sure what to do.

  • #2
    Re: Keyword, Contains, Starts with, and End With Search

    Maybe something out of the box is in order? Perhaps doing a conversion of all html into identically named text files would be a starting place?
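One way to do the HTML-to-text conversion suggested here is sketched below with Python's standard `html.parser`. It keeps only visible text, skipping `<script>` and `<style>` contents, tag attributes and comments, and writes `page.txt` next to each `page.htm`. The class and function names are hypothetical, and this is a rough converter, not a full renderer.

```python
from html.parser import HTMLParser
from pathlib import Path

class TextExtractor(HTMLParser):
    """Collect visible text; attributes and comments are never reported as data."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with a space so adjacent elements don't run together,
    # then collapse runs of whitespace. (A tag in mid-word would split it.)
    return " ".join(" ".join(parser.parts).split())

def convert_site(folder: str) -> None:
    """Write an identically named .txt file next to each .htm file."""
    for path in Path(folder).glob("*.htm"):
        html = path.read_text(encoding="utf-8", errors="ignore")
        path.with_suffix(".txt").write_text(html_to_text(html), encoding="utf-8")

sample = '<style>.oak{}</style><p>Solid <b>oak</b> table</p><script>var oak = 1;</script>'
print(html_to_text(sample))  # Solid oak table
```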

    Sorry Worm, you're probably over my head already. :)
    Eric Darling
    eThree Media
    http://www.ethreemedia.com



    • #3
      Re: Keyword, Contains, Starts with, and End With Search

      Not to mention: what if a page uses styles or JavaScript or something similarly "behind the scenes," and has the search text in a class or function name? :)

      Getting around this could be tricky, for sure. It depends on how "perfect" you want your search results to be. The brute-force approach is to search for the target string and then analyze the text around it to see whether it's a viable match. The problem with HTML is that you'd have to search backwards to see if there's a < somewhere "before" the target that has no closing >, and then make sure that < isn't part of a comment, etc.
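A minimal Python sketch of that backwards check, assuming plain ASCII pages: for each hit, look left for the nearest `<` and `>`; if the `<` is closer, the hit is inside a tag. A second check does the same for `<!--` and `-->`. The names `in_markup` and `find_visible` are hypothetical, and this does not handle a `>` inside an attribute value or text inside `<script>` blocks.

```python
def in_markup(text: str, pos: int) -> bool:
    """True if text[pos] appears to sit inside a tag or an HTML comment."""
    before = text[:pos]
    # Inside a tag: the nearest '<' to the left has no '>' after it.
    if before.rfind("<") > before.rfind(">"):
        return True
    # Inside a comment: the nearest '<!--' has no closing '-->' after it.
    return before.rfind("<!--") > before.rfind("-->")

def find_visible(text: str, term: str) -> list[int]:
    """Positions of term (case-insensitive) that are not inside markup."""
    low, needle = text.lower(), term.lower()
    hits, start = [], low.find(needle)
    while start != -1:
        if not in_markup(low, start):
            hits.append(start)
        start = low.find(needle, start + 1)
    return hits

page = '<p class="oak">Solid <b>oak</b></p><!-- > oak -->'
print(find_visible(page, "Oak"))  # [24]: the class name and the comment are skipped
```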

      If you don't care about stuff inside tags like that, you could just verify that the target is a whole word. To do that, check the previous and next characters in the file, and compare them to a list of "okay" characters. (You'll need a different list for the preceding character, and for the trailing character.)

      An easy way to do that is to just put all the "okay" characters in a string, and then search in that string for the character you're testing. If a match is found, then it's one of those "okay" characters.

      Example of "okay" trailing characters (the list includes a space): .,;" -:!<
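Sketched in Python, the "okay characters in a string" idea is a membership test on the characters just before and after each hit. `OK_AFTER` is the trailing list from the post above; `OK_BEFORE` is a hypothetical preceding list (it includes `>` so that `<b>oak` counts), and the function names are illustrative.

```python
OK_BEFORE = ' "(>\t\n'     # hypothetical list of okay preceding characters
OK_AFTER = '.,;" -:!<'     # okay trailing characters, as listed in the post

def is_whole_word(text: str, start: int, length: int) -> bool:
    """Check the characters just before and after text[start:start + length]."""
    end = start + length
    # The very start or end of the file also counts as a word boundary.
    before_ok = start == 0 or text[start - 1] in OK_BEFORE
    after_ok = end == len(text) or text[end] in OK_AFTER
    return before_ok and after_ok

def count_whole_words(text: str, term: str) -> int:
    low, needle = text.lower(), term.lower()
    count, pos = 0, low.find(needle)
    while pos != -1:
        if is_whole_word(low, pos, len(needle)):
            count += 1
        pos = low.find(needle, pos + 1)
    return count

print(count_whole_words('<b>Oak</b>, oaken "oak," cloak oak', "oak"))  # 3
```

"oaken" fails the trailing check and "cloak" fails the preceding check, so only the three real occurrences are counted.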

      You could also build an index of sorts, or append a list of search terms to the end of each HTML file and just search through that. (Use a tool to mine the whole words out of each html file and stuff them into a comment at the end.)
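Here is a rough Python sketch of the "stuff the words into a comment" idea, with one assumption added: the words are stored between `|` delimiters, as in `|and|cloak|oak|`. That way all four search types become plain substring finds on the list: keyword is `|oak|`, starts with is `|oak`, ends with is `oak|`, and contains is `oak`. The marker text, the delimiter and the tag-stripping regex are all hypothetical, and the stripping is crude (script text is still indexed).

```python
import re

TAG = re.compile(r"<[^>]*>")      # crude: drops anything that looks like a tag
WORD = re.compile(r"[a-z0-9]+")   # hypothetical definition of a "word"
MARKER = "<!-- words: "

def word_list(html: str) -> str:
    """Return the page's distinct lowercase words as '|word1|word2|...|'."""
    text = TAG.sub(" ", html.lower())
    return "|" + "|".join(sorted(set(WORD.findall(text)))) + "|"

def add_word_comment(html: str) -> str:
    """Append the word list to the page inside an HTML comment."""
    return html + "\n" + MARKER + word_list(html) + " -->\n"

def matches(page: str, term: str, mode: str) -> bool:
    """Plain substring search, but only inside the appended word list."""
    start = page.rfind(MARKER)
    if start == -1:
        return False
    start += len(MARKER)
    words = page[start:page.find(" -->", start)]
    t = term.lower()
    needle = {"keyword": f"|{t}|", "starts": f"|{t}", "ends": f"{t}|", "contains": t}[mode]
    return needle in words

page = add_word_comment("<p>Solid <b>oak</b> and cloak, pine.</p>")
print(matches(page, "oak", "keyword"), matches(page, "clo", "starts"))  # True True
```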

      There's probably a better way to handle all of this, overall, though. (Come on, Worm...just write a DLL. ;))
      --[[ Indigo Rose Software Developer ]]



      • #4
        Re: Keyword, Contains, Starts with, and End With Search

        Ah -- what you really need is a way to do a regexp search. You could do that with a DLL, or with a free utility if you can find one that does the job.
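To show why a regexp helps, here is a short Python sketch using the standard `re` module (a stand-in for whatever DLL or utility you pick). The `\b` word-boundary token covers `<b>oak</b>` and `"oak,"` without needing a list of okay characters. However, a regexp on its own still can't tell markup from text, so `class="oak"` is a false hit unless you search converted text instead. The helper name `whole_word` is hypothetical.

```python
import re

def whole_word(term: str) -> re.Pattern:
    # \b matches where a word character [A-Za-z0-9_] meets a non-word character
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

pattern = whole_word("oak")
samples = ["a solid oak table", "<b>Oak</b> flooring", 'grain: "oak," maple',
           "cloak", "oaken beams", '<p class="oak">pine</p>']
print([bool(pattern.search(s)) for s in samples])
# [True, True, True, False, False, True]  <- the last one is a hit inside a tag
```

`\b` assumes the search term begins and ends with a letter or digit; a term like `c++` would need a different boundary rule.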
        --[[ Indigo Rose Software Developer ]]



        • #5
          Re: Keyword, Contains, Starts with, and End With Search


          Quote: "There's probably a better way to handle all of this, overall, though. (Come on, Worm...just write a DLL.)"
          I was hoping that I was overlooking some obvious, right-under-my-nose solution. Can't anything I do be simple?

          Thanks for the tips!



          • #6
            Re: Keyword, Contains, Starts with, and End With Search

            The regexp search would be the way to go, IMO. Although coming up with the appropriate regular expression wouldn't be what I'd call "easy," :) it would be a lot easier (and in the end much faster) than trying to do that kind of search "from scratch."
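For the four search types in the thread title, the "appropriate regular expressions" might look like the Python sketch below. It assumes that "starts with" and "ends with" mean a word that begins or ends with the search term; the mode names and the `build_pattern` helper are hypothetical. Compiling the pattern once per search and reusing it for every file is what keeps this fast.

```python
import re

# Hypothetical mode names; "starts"/"ends" are taken to mean a *word* that
# begins or ends with the search term.
TEMPLATES = {
    "keyword": r"\b{}\b",    # the whole word only
    "starts": r"\b{}\w*",    # a word beginning with the term
    "ends": r"\w*{}\b",      # a word ending with the term
    "contains": r"{}",       # the term anywhere, even mid-word
}

def build_pattern(term: str, mode: str) -> re.Pattern:
    """Compile once per search, then reuse the pattern for every file."""
    return re.compile(TEMPLATES[mode].format(re.escape(term)), re.IGNORECASE)

words = ["oak", "Oaken", "cloak", "soaking", "pine"]
for mode in TEMPLATES:
    pattern = build_pattern("oak", mode)
    print(mode, [w for w in words if pattern.search(w)])
# keyword ['oak']
# starts ['oak', 'Oaken']
# ends ['oak', 'cloak']
# contains ['oak', 'Oaken', 'cloak', 'soaking']
```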
            --[[ Indigo Rose Software Developer ]]

