1. The Daum Movie Scraper I had been using without problems stopped working after Daum changed its service.

2. The other parts (detail.json, cast_crew.json, list.json) still work, but the search API (movie.json) no longer does.

3. I used the Kodi scraper how-to as a reference: https://kodi.wiki/view/Scrapers

4. Starting from an existing scraper, I modified it to parse the Daum search page so that it at least works again.

    - Installed the scraper from https://github.com/hojel/metadata.movie.daum.net (* by the same person who made the Plex Daum agent).

    - Modified the search URL (http://~~) in <CreateSearchUrl> and the <expression> part of <GetSearchResults>.

    - Before

    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://movie.daum.net/data/movie/search/v2/movie.json?size=20&amp;start=1&amp;searchText=\1&lt;/url&gt;" dest="3">
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="4">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="4">
            <RegExp conditional="!OrigTitleInSrchResult" input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\4&lt;/year&gt;&lt;id&gt;\1&lt;/id&gt;&lt;url cache=&quot;daum-movie-\1.json&quot;&gt;http://movie.daum.net/data/movie/movie_info/detail.json?movieId=\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" trim="2,3">"movieId":(\d+),"titleKo":"([^"]*)","titleEn":"?([^",]*)"?,[^}]*"prodYear":(\d*)</expression>
            </RegExp>
            <RegExp conditional="OrigTitleInSrchResult" input="$$6" output="\1" dest="5">
                <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2(\3)&lt;/title&gt;&lt;year&gt;\4&lt;/year&gt;&lt;id&gt;\1&lt;/id&gt;&lt;url cache=&quot;daum-movie-\1.json&quot;&gt;http://movie.daum.net/data/movie/movie_info/detail.json?movieId=\1&lt;/url&gt;&lt;/entity&gt;" dest="6">
                    <expression repeat="yes" trim="2,3">"movieId":(\d+),"titleKo":"([^"]*)","titleEn":"?([^",]*)"?,[^}]*"prodYear":(\d*)</expression>
                </RegExp>
                <expression noclean="1" />
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetSearchResults>

    - After

    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://search.daum.net/search?w=tot&amp;q=\1&lt;/url&gt;" dest="3">
            <expression noclean="1" />
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="4">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="4">
            <RegExp conditional="!OrigTitleInSrchResult" input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;year&gt;\4&lt;/year&gt;&lt;id&gt;\1&lt;/id&gt;&lt;url cache=&quot;daum-movie-\1.json&quot;&gt;http://movie.daum.net/data/movie/movie_info/detail.json?movieId=\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression clean="1">movie\.daum\.net[^\?]*\?movieId=(\d*)[^\?]*tit_name&quot;&gt;&lt;b&gt;(.[^"]*)&lt;/b&gt;[^\?]*tit_sub&quot;&gt;(.[^"]*),[^\?]*(\d{4})</expression>
            </RegExp>
            <RegExp conditional="OrigTitleInSrchResult" input="$$6" output="\1" dest="5">
                <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2(\3)&lt;/title&gt;&lt;year&gt;\4&lt;/year&gt;&lt;id&gt;\1&lt;/id&gt;&lt;url cache=&quot;daum-movie-\1.json&quot;&gt;http://movie.daum.net/data/movie/movie_info/detail.json?movieId=\1&lt;/url&gt;&lt;/entity&gt;" dest="6">
                    <expression clean="1">movie\.daum\.net[^\?]*\?movieId=(\d*)[^\?]*tit_name&quot;&gt;&lt;b&gt;(.[^"]*)&lt;/b&gt;[^\?]*tit_sub&quot;&gt;(.[^"]*),[^\?]*(\d{4})</expression>
                </RegExp>
                <expression noclean="1" />
            </RegExp>
            <expression noclean="1" />
        </RegExp>
    </GetSearchResults>
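To check what the old and new expressions actually match, here is a minimal Python sketch that applies them outside Kodi. Assumptions: SAMPLE_JSON and SAMPLE_HTML are made-up inputs shaped after the regexes themselves (not real Daum responses), Python's re module stands in for Kodi's regex engine, quote_plus stands in for the URL encoding of $$1, and only the !OrigTitleInSrchResult branch is reproduced, without the outer <?xml?> header or trim="2,3". Since the new <expression> no longer has repeat="yes", my understanding of the Kodi scraper docs is that only the first match becomes a result, so the sketch does the same.

```python
import re
from urllib.parse import quote_plus

# Old <expression> used against movie.json (copied from the XML above).
OLD_EXPR = r'"movieId":(\d+),"titleKo":"([^"]*)","titleEn":"?([^",]*)"?,[^}]*"prodYear":(\d*)'

# New <expression> used against the search.daum.net HTML, with the XML
# entities (&quot; &gt; &lt;) decoded to " > <.
NEW_EXPR = (r'movie\.daum\.net[^\?]*\?movieId=(\d*)[^\?]*tit_name"><b>(.[^"]*)</b>'
            r'[^\?]*tit_sub">(.[^"]*),[^\?]*(\d{4})')

# Output template of the !OrigTitleInSrchResult branch (unchanged before/after).
ENTITY = (r'<entity><title>\2</title><year>\4</year><id>\1</id>'
          r'<url cache="daum-movie-\1.json">'
          r'http://movie.daum.net/data/movie/movie_info/detail.json?movieId=\1</url></entity>')


def new_search_url(title: str) -> str:
    """CreateSearchUrl after the change; quote_plus stands in for Kodi's encoding of $$1."""
    return "http://search.daum.net/search?w=tot&q=" + quote_plus(title)


def old_search_results(json_text: str) -> str:
    """repeat="yes": every match becomes one <entity>."""
    entities = [m.expand(ENTITY) for m in re.finditer(OLD_EXPR, json_text)]
    return "<results>" + "".join(entities) + "</results>"


def new_search_results(html_text: str) -> str:
    """No repeat="yes": only the first match becomes an <entity>."""
    m = re.search(NEW_EXPR, html_text)
    return "<results>" + (m.expand(ENTITY) if m else "") + "</results>"


# Hypothetical inputs: the real movie.json and search-page markup are not shown in this post.
SAMPLE_JSON = ('{"data":[{"movieId":12345,"titleKo":"기생충","titleEn":"Parasite","prodYear":2019},'
               '{"movieId":678,"titleKo":"올드보이","titleEn":"Oldboy","prodYear":2003}]}')
SAMPLE_HTML = ('<a href="https://movie.daum.net/moviedb/main?movieId=12345" class="tit_name"><b>기생충</b></a>'
               '<span class="tit_sub">Parasite, 2019</span>'
               '<a href="https://movie.daum.net/moviedb/main?movieId=678" class="tit_name"><b>올드보이</b></a>'
               '<span class="tit_sub">Oldboy, 2003</span>')

if __name__ == "__main__":
    print(new_search_url("old boy"))
    print(old_search_results(SAMPLE_JSON))
    print(new_search_results(SAMPLE_HTML))
```

With these samples the old version yields two <entity> elements and the new one only the first movie. If that reading of repeat="yes" is right, it is one concrete thing to look at when refining the regex.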

 

5. TODO

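    - Restrict the Daum search to movies and use the suggest feature of the movie-page search (the Plex Daum agent already uses it)

    - CreateSearchUrl: extract only the Hangul and numeric parts of the title and use them as search keywords

    - GetSearchResults: refine the regular expressions (I copied them without really understanding them. ^^;;)

    - For movies the search cannot find, allow looking them up manually and entering just the movieId

    - Distribute it as a zip file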
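The keyword and manual-movieId TODO items could work roughly as follows. This is a minimal Python sketch only: the function names, the regex for "Hangul and numbers", and the convention of typing movieId=12345 (or pasting a Daum movie URL containing it) in place of a title are all hypothetical, and in the actual scraper this logic would presumably have to be written as <RegExp> rules in CreateSearchUrl.

```python
import re
from typing import Optional

# Hypothetical helpers for two TODO items; not part of the existing scraper.
# Precomposed Hangul syllables (U+AC00..U+D7A3) and ASCII digits.
HANGUL_OR_DIGITS = re.compile(r"[가-힣0-9]+")
# Hypothetical manual-entry convention: "movieId=NNN" anywhere in the query.
MANUAL_ID = re.compile(r"movieId=(\d+)")

DETAIL_URL = "http://movie.daum.net/data/movie/movie_info/detail.json?movieId={}"


def extract_keywords(title: str) -> str:
    """Keep only the Hangul and numeric runs of a title; fall back to the full title."""
    words = HANGUL_OR_DIGITS.findall(title)
    return " ".join(words) if words else title


def manual_detail_url(query: str) -> Optional[str]:
    """If the query contains movieId=NNN, skip the search and go straight to detail.json."""
    m = MANUAL_ID.search(query)
    return DETAIL_URL.format(m.group(1)) if m else None


if __name__ == "__main__":
    print(extract_keywords("기생충 (Parasite, 2019)"))
    print(manual_detail_url("https://movie.daum.net/moviedb/main?movieId=678"))
```

Requiring the explicit movieId= prefix keeps a purely numeric title such as 1987 from being mistaken for an ID.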

(To avoid confusion, I kept only the latest version and deleted the older ones.)

Posted by 옴팡진